summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-12-05Merge branch 'bnxt_en-dcbnl'David S. Miller
Michael Chan says: ==================== bnxt_en: Add DCBNL support. This series adds DCBNL operations to support host-based IEEE DCBX. v2: Updated to the latest firmware interface spec. David, please consider this series for net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bnxt_en: Add PFC statistics.Michael Chan
Report PFC statistics to ethtool -S and DCBNL. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bnxt_en: Implement DCBNL to support host-based DCBX.Michael Chan
Support only IEEE DCBX initially. Add IEEE DCBNL ops and functions to get and set the hardware DCBX parameters. The DCB code is conditional on Kconfig CONFIG_BNXT_DCB. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bnxt_en: Update firmware header file to latest 1.6.0.Michael Chan
Latest interface has the latest DCB command structs. Get and store the max number of lossless TCs the hardware can support. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bnxt_en: Re-factor bnxt_setup_tc().Michael Chan
Add a new function bnxt_setup_mq_tc() to handle MQPRIO. This new function will be called during ETS setup when we add DCBNL in the next patch. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05Merge branch 'fib-suffix-length-fixes'David S. Miller
Alexander Duyck says: ==================== IPv4 FIB suffix length fixes In reviewing the patch from Robert Shearman and looking over the code I realized there were a few different bugs we were still carrying in the IPv4 FIB lookup code. These two patches are based off of Robert's original patch, but take things one step further by splitting them up to address two additional issues I found. So first have Robert's original patch which was addressing the fact that us calling update_suffix in resize is expensive when it is called per add. To address that I incorporated the core bit of the patch which was us dropping the update_suffix call from resize. The first patch in the series does a rename and fix on the push_suffix and pull_suffix code. Specifically we drop the need to pass a leaf and secondly we fix things so we pull the suffix as long as the value of the suffix in the node is dropping. The second patch addresses the original issue reported as well as optimizing the code for the fact that update_suffix is only really meant to go through and clean things up when we are decreasing a suffix. I had originally added code for it to somehow cause an increase, but if we push the suffix when a new leaf is added we only ever have to handle pulling down the suffix with update_suffix so I updated the code to reflect that. As far as side effects the only ones I think that will be obvious should be the fact that some routes may be able to be found earlier since before we relied on resize to update the suffix lengths, and now we are updating them before we add or remove the leaf. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05ipv4: Drop suffix update from resize codeAlexander Duyck
It has been reported that update_suffix can be expensive when it is called on a large node in which most of the suffix lengths are the same. The time required to add 200K entries had increased from around 3 seconds to almost 49 seconds. In order to address this we need to move the code for updating the suffix out of resize and instead just have it handled in the cases where we are pushing a node that increases the suffix length, or will decrease the suffix length. Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length") Reported-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Reviewed-by: Robert Shearman <rshearma@brocade.com> Tested-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05ipv4: Drop leaf from suffix pull/push functionsAlexander Duyck
It wasn't necessary to pass a leaf in when doing the suffix updates so just drop it. Instead just pass the suffix and work with that. Since we dropped the leaf there is no need to include that in the name so the names are updated to node_push_suffix and node_pull_suffix. Finally I noticed that the logic for pulling the suffix length back actually had some issues. Specifically it would stop prematurely if there was a longer suffix, but it was not as long as the original suffix. I updated the code to address that in node_pull_suffix. Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length") Suggested-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Reviewed-by: Robert Shearman <rshearma@brocade.com> Tested-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: phy: dp83848: Support ethernet pause framesJesper Nilsson
According to the documentation, the PHYs supported by this driver can also support pause frames. Announce this to be so. Tested with a TI83822I. Acked-by: Andrew F. Davis <afd@ti.com> Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes the following issues: - Intermittent build failure in RSA - Memory corruption in chelsio crypto driver - Regression in DRBG due to vmalloced stack" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: rsa - Add Makefile dependencies to fix parallel builds crypto: chcr - Fix memory corruption crypto: drbg - prevent invalid SG mappings
2016-12-04Linux 4.9-rc8v4.9-rc8Linus Torvalds
2016-12-03net: dcb: set error code on failuresPan Bian
In function dcbnl_cee_fill(), returns the value of variable err on errors. However, on some error paths (e.g. nla put fails), its value may be 0. It may be better to explicitly set a negative errno to variable err before returning. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188881 Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03ipv6 addrconf: Implemented enhanced DAD (RFC7527)Erik Nordmark
Implemented RFC7527 Enhanced DAD. IPv6 duplicate address detection can fail if there is some temporary loopback of Ethernet frames. RFC7527 solves this by including a random nonce in the NS messages used for DAD, and if an NS is received with the same nonce it is assumed to be a looped back DAD probe and is ignored. RFC7527 is enabled by default. Can be disabled by setting both of conf/{all,interface}/enhanced_dad to zero. Signed-off-by: Erik Nordmark <nordmark@arista.com> Signed-off-by: Bob Gilligan <gilligan@arista.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge branch 'mv88e6390-batch-three'David S. Miller
Andrew Lunn says: ==================== mv88e6390 batch 3 More patches to support the MV88e6390. This is mostly refactoring existing code and adding implementations for the mv88e6390. This patchset set which reserved frames are sent to the cpu, the size of jumbo frames that will be accepted, turn off egress rate limiting, and configuration of pause frames. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03net: dsa: mv88e6xxx: Implement mv88e6390 pause controlAndrew Lunn
The mv88e6390 has a number flow control registers accessed via the Flow Control register. Use these to set the pause control. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03net: dsa: mv88e6xxx: Refactor pause configurationAndrew Lunn
The mv88e6390 has a different mechanism for configuring pause. Refactor the code into an ops function, and for the moment, don't add any mv88e6390 code yet. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03net: dsa: mv88e6xxx: Refactor egress rate limitingAndrew Lunn
There are two different rate limiting configurations, depending on the switch generation. Refactor this into ops. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03net: dsa: mv88e6xxx: Refactor setting of jumbo framesAndrew Lunn
Some switches support jumbo frames. Refactor this code into operations in the ops structure. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03net: dsa: mv88e6xxx: Reserved Management frames to CPUAndrew Lunn
Older devices have a couple of registers in global2. The mv88e6390 family has a single register in global1 behind which hides similar configuration. Implement and op for this. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge branch 'mv88e6390-batch-two'David S. Miller
Andrew Lunn says: ==================== MV88E6390 batch two This is the second batch of patches adding support for the MV88e6390. They are not sufficient to make it work properly. The mv88e6390 has a much expanded set of priority maps. Refactor the existing code, and implement basic support for the new device. Similarly, the monitor control register has been reworked. The mv88e6390 has something odd in its EDSA tagging implementation, which means it is not possible to use it. So we need to use DSA tagging. This is the first device with EDSA support where we need to use DSA, and the code does not support this. So two patches refactor the existing code. The two different register definitions are separated out, and using DSA on an EDSA capable device is added. v2: Add port prefix Add helper function for 6390 Add _IEEE_ into #defines Split monitor_ctrl into a number of separate ops. Remove 6390 code which is management, used in a later patch s/EGREES/EGRESS/. Broke up setup_port_dsa() and set_port_dsa() into a number of ops v3: Verify mandatory ops for port setup Don't set ether type for DSA port. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03net: dsa: mv88e6xxx: Refactor CPU and DSA port setupAndrew Lunn
Older chips only support DSA tagging. Newer chips have both DSA and EDSA tagging. Refactor the code by adding port functions for setting the frame mode, egress mode, and if to forward unknown frames. This results in the helper mv88e6xxx_6065_family() becoming unused, so remove it. Signed-off-by: Andrew Lunn <andrew@lunn.ch> v3: Verify mandatory ops for port setup Don't set ether type for DSA port. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03net: dsa: mv88e6xxx: Move the tagging protocol into infoAndrew Lunn
Older chips support a single tagging protocol, DSA. New chips support both DSA and EDSA, an enhanced version. Having both as an option changes the register layouts. Up until now, it has been assumed that if EDSA is supported, it will be used. Hence the register layout has been determined by which protocol should be used. However, mv88e6390 has a different implementation of EDSA, which requires we need to use the DSA tagging. Hence separate the selection of the protocol from the register layout. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03net: dsa: mv88e6xxx: Monitor and Management tablesAndrew Lunn
The mv88e6390 changes the monitor control register into the Monitor and Management control, which is an indirection register to various registers. Add ops to set the CPU port and the ingress/egress port for both register layouts, to global1 Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03net: dsa: mv88e6xxx: Implement mv88e6390 tag remapAndrew Lunn
The mv88e6390 does not have the two registers to set the frame priority map. Instead it has an indirection registers for setting a number of different priority maps. Refactor the old code into an function, implement the mv88e6390 version, and use an op to call the right one. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge tag 'drm-fixes-for-v4.9-rc8' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "A pretty small pull request: a couple of AMD powerxpress regression fixes and a power management fix, a couple of i915 fixes and one hdlcd fix, along with one core don't oops because of incorrect API usage fix" * tag 'drm-fixes-for-v4.9-rc8' of git://people.freedesktop.org/~airlied/linux: drm/i915: drop the struct_mutex when wedged or trying to reset drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error drm: Don't call drm_for_each_crtc with a non-KMS driver drm/radeon: fix check for port PM availability drm/amdgpu: fix check for port PM availability drm/amd/powerplay: initialize the soft_regs offset in struct smu7_hwmgr drm: hdlcd: Fix cleanup order
2016-12-03Merge branch 'fib-notifier-event-replay'David S. Miller
Jiri Pirko says: ==================== ipv4: fib: Replay events when registering FIB notifier Ido says: In kernel 4.9 the switchdev-specific FIB offload mechanism was replaced by a new FIB notification chain to which modules could register in order to be notified about the addition and deletion of FIB entries. The motivation for this change was that switchdev drivers need to be able to reflect the entire FIB table and not only FIBs configured on top of the port netdevs themselves. This is useful in case of in-band management. The fundamental problem with this approach is that upon registration listeners lose all the information previously sent in the chain and thus have an incomplete view of the FIB tables, which can result in packet loss. This patchset fixes that by dumping the FIB tables and replaying notifications previously sent in the chain for the registered notification block. The entire dump process is done under RCU and thus the FIB notification chain is converted to be atomic. The listeners are modified accordingly. This is done in the first eight patches. The ninth patch adds a change sequence counter to ensure the integrity of the FIB dump. The last patch adds the dump itself to the FIB chain registration function and modifies existing listeners to pass a callback to be executed in case dump was inconsistent. --- v3->v4: - Register the notification block after the dump and protect it using the change sequence counter (Hannes Frederic Sowa). - Since we now integrate the dump into the registration function, drop the sysctl to set maximum number of retries and instead set it to a fixed number. Lets see if it's really a problem before adding something we can never remove. - For the same reason, dump FIB tables for all net namespaces. - Add a comment regarding guarantees provided by mutex semantics. v2->v3: - Add sysctl to set the number of FIB dump retries (Hannes Frederic Sowa). - Read the sequence counter under RTNL to ensure synchronization between the dump process and other processes changing the routing tables (Hannes Frederic Sowa). - Pass a callback to the dump function to be executed prior to a retry. - Limit the dump to a single net namespace. v1->v2: - Add a sequence counter to ensure the integrity of the FIB dump (David S. Miller, Hannes Frederic Sowa). - Protect notifications from re-ordering in listeners by using an ordered workqueue (Hannes Frederic Sowa). - Introduce fib_info_hold() (Jiri Pirko). - Relieve rocker from the need to invoke the FIB dump by registering to the FIB notification chain prior to ports creation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03ipv4: fib: Replay events when registering FIB notifierIdo Schimmel
Commit b90eb7549499 ("fib: introduce FIB notification infrastructure") introduced a new notification chain to notify listeners (f.e., switchdev drivers) about addition and deletion of routes. However, upon registration to the chain the FIB tables can already be populated, which means potential listeners will have an incomplete view of the tables. Solve that by dumping the FIB tables and replaying the events to the passed notification block. The dump itself is done using RCU in order not to starve consumers that need RTNL to make progress. The integrity of the dump is ensured by reading the FIB change sequence counter before and after the dump under RTNL. This allows us to avoid the problematic situation in which the dumping process sends a ENTRY_ADD notification following ENTRY_DEL generated by another process holding RTNL. Callers of the registration function may pass a callback that is executed in case the dump was inconsistent with current FIB tables. The number of retries until a consistent dump is achieved is set to a fixed number to prevent callers from looping for long periods of time. In case current limit proves to be problematic in the future, it can be easily converted to be configurable using a sysctl. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03ipv4: fib: Allow for consistent FIB dumpingIdo Schimmel
The next patch will enable listeners of the FIB notification chain to request a dump of the FIB tables. However, since RTNL isn't taken during the dump, it's possible for the FIB tables to change mid-dump, which will result in inconsistency between the listener's table and the kernel's. Allow listeners to know about changes that occurred mid-dump, by adding a change sequence counter to each net namespace. The counter is incremented just before a notification is sent in the FIB chain. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03ipv4: fib: Convert FIB notification chain to be atomicIdo Schimmel
In order not to hold RTNL for long periods of time we're going to dump the FIB tables using RCU. Convert the FIB notification chain to be atomic, as we can't block in RCU critical sections. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03rocker: Register FIB notifier before creating portsIdo Schimmel
We can miss FIB notifications sent between the time the ports were created and the FIB notification block registered. Instead of receiving these notifications only when they are replayed for the FIB notification block during registration, just register the notification block before the ports are created. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03rocker: Implement FIB offload in deferred workIdo Schimmel
Convert rocker to offload FIBs in deferred work in a similar fashion to mlxsw, which was converted in the previous commits. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03rocker: Create an ordered workqueue for FIB offloadIdo Schimmel
As explained in the previous commits, we need to process FIB entries addition / deletion events in FIFO order or otherwise we can have a mismatch between the kernel's FIB table and the device's. Create an ordered workqueue for rocker to which these work items will be submitted to. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03mlxsw: spectrum_router: Implement FIB offload in deferred workIdo Schimmel
FIB offload is currently done in process context with RTNL held, but we're about to dump the FIB tables in RCU critical section, so we can no longer sleep. Instead, defer the operation to process context using deferred work. Make sure fib info isn't freed while the work is queued by taking a reference on it and releasing it after the operation is done. Deferring the operation is valid because the upper layers always assume the operation was successful. If it's not, then the driver-specific abort mechanism is called and all routed traffic is directed to slow path. The work items are submitted to an ordered workqueue to prevent a mismatch between the kernel's FIB table and the device's. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03mlxsw: core: Create an ordered workqueue for FIB offloadIdo Schimmel
We're going to start processing FIB entries addition / deletion events in deferred work. These work items must be processed in the order they were submitted or otherwise we can have differences between the kernel's FIB table and the device's. Solve this by creating an ordered workqueue to which these work items will be submitted to. Note that we can't simply convert the current workqueue to be ordered, as EMADs re-transmissions are also processed in deferred work. Later on, we can migrate other work items to this workqueue, such as FDB notification processing and nexthop resolution, since they all take the same lock anyway. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03ipv4: fib: Add fib_info_hold() helperIdo Schimmel
As explained in the previous commit, modules are going to need to take a reference on fib info and then drop it using fib_info_put(). Add the fib_info_hold() helper to make the code more readable and also symmetric with fib_info_put(). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Suggested-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03ipv4: fib: Export free_fib_info()Ido Schimmel
The FIB notification chain is going to be converted to an atomic chain, which means switchdev drivers will have to offload FIB entries in deferred work, as hardware operations entail sleeping. However, while the work is queued fib info might be freed, so a reference must be taken. To release the reference (and potentially free the fib info) fib_info_put() will be called, which in turn calls free_fib_info(). Export free_fib_info() so that modules will be able to invoke fib_info_put(). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03act_mirred: fix a typo in get_devWANG Cong
Fixes: 255cb30425c0 ("net/sched: act_mirred: Add new tc_action_ops get_dev()") Cc: Hadar Hen Zion <hadarh@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2016-12-02 This series contains updates to i40e and i40evf only. Alex provides changes so that we are much more robust about defining what we can and cannot offload in i40e and i40evf by doing additional checks other than L4 tunnel header length. Jake provides several fixes/changes, first cleaning up a label that is unnecessary, as well as cleaned up the use of a "magic number". Clarified the code by separating the global private flags and the regular private flags per interface into two arrays, so that future additions will not produce duplication and buggy code. Adds additional checks to protect against NULL values for msix_entries and q_vectors pointers. Michal adds Clause22 method for accessing registers for some external PHYs. Piotr adds additional protocol support for the admin queue discover capabilities function. Tushar Dave fixes a panic seen on SPARC, where writel() should not be used to write directly to a memory address but only to a memory mapped I/O address otherwise it causes data access exceptions. Joe Perches separates out a section of code into its own function, to help reduce i40evf_reset_task() a bit. Alan fixes an issue by checking for NULL before dereferencing msix_entries and returning early in the case where it is NULL within the i40evf_close() code path. Henry provides code cleanup to remove unreachable and redundant sections of code. Fixed up an issue where new NICs were not identifying "unknown PHYs" correctly. Harshitha fixes a issue where the ethtool "Supported Link" modes list backplane interfaces on X722 devices for 10 GbE with SFP+ and Cortina retimer, where these interfaces should not be visible to the user since they cannot use them. Carolyn changes an X722 informational message so that it only appears when extra messages are desired. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03tcp: fix the missing avr32 SOF_TIMESTAMPING_OPT_STATSYuchung Cheng
The commit of SOF_TIMESTAMPING_OPT_STATS didn't include the new header for avr32, causing build to break. The patch fixes it. Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING") Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03udp: be less conservative with sock rmem accountingPaolo Abeni
Before commit 850cbaddb52d ("udp: use it's own memory accounting schema"), the udp protocol allowed sk_rmem_alloc to grow beyond the rcvbuf by the whole current packet's truesize. After said commit we allow sk_rmem_alloc to exceed the rcvbuf only if the receive queue is empty. As reported by Jesper this cause a performance regression for some (small) values of rcvbuf. This commit is intended to fix the regression restoring the old handling of the rcvbuf limit. Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Fixes: 850cbaddb52d ("udp: use it's own memory accounting schema") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge tag 'batadv-net-for-davem-20161202' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== Here is another batman-adv bugfix: - fix checking for failed allocation of TVLV blocks in TT local data, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03net_sched: gen_estimator: account for timer driftsEric Dumazet
Under heavy stress, timer used in estimators tend to slowly be delayed by a few jiffies, leading to inaccuracies. Lets remember what was the last scheduled jiffies so that we get more precise estimations, without having to add a multiply/divide in the loop to account for the drifts. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03sfc: remove EFX_BUG_ON_PARANOID, use EFX_WARN_ON_[ONCE_]PARANOID insteadEdward Cree
Logically, EFX_BUG_ON_PARANOID can never be correct. For, BUG_ON should only be used if it is not possible to continue without potential harm; and since the non-DEBUG driver will continue regardless (as the BUG_ON is compiled out), clearly the BUG_ON cannot be needed in the DEBUG driver. So, replace every EFX_BUG_ON_PARANOID with either an EFX_WARN_ON_PARANOID or the newly defined EFX_WARN_ON_ONCE_PARANOID. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge branch 'samples-bpf-automated-cgroup-tests'David S. Miller
Sargun Dhillon says: ==================== samples, bpf: Refactor; Add automated tests for cgroups These two patches are around refactoring out some old, reusable code from the existing test_current_task_under_cgroup_user test, and adding a new, automated test. There is some generic cgroupsv2 setup & cleanup code, given that most environment still don't have it setup by default. With this code, we're able to pretty easily add an automated test for future cgroupsv2 functionality. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03samples, bpf: Add automated test for cgroup filter attachmentsSargun Dhillon
This patch adds the sample program test_cgrp2_attach2. This program is similar to test_cgrp2_attach, but it performs automated testing of the cgroupv2 BPF attached filters. It runs the following checks: * Simple filter attachment * Application of filters to child cgroups * Overriding filters on child cgroups * Checking that this still works when the parent filter is removed The filters that are used here are simply allow all / deny all filters, so it isn't checking the actual functionality of the filters, but rather the behaviour around detachment / attachment. If net_cls is enabled, this test will fail. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03samples, bpf: Refactor test_current_task_under_cgroup - separate out helpersSargun Dhillon
This patch modifies test_current_task_under_cgroup_user. The test has several helpers around creating a temporary environment for cgroup testing, and moving the current task around cgroups. This set of helpers can then be used in other tests. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03samples/bpf: silence compiler warningsAlexei Starovoitov
silence some of the clang compiler warnings like: include/linux/fs.h:2693:9: warning: comparison of unsigned enum expression < 0 is always false arch/x86/include/asm/processor.h:491:30: warning: taking address of packed member 'sp0' of class or structure 'x86_hw_tss' may result in an unaligned pointer value include/linux/cgroup-defs.h:326:16: warning: field 'cgrp' with variable sized type 'struct cgroup' not at the end of a struct or class is a GNU extension since they add too much noise to samples/bpf/ build. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03netns: fix net_generic() "id - 1" bloatAlexey Dobriyan
net_generic() function is both a) inline and b) used ~600 times. It has the following code inside ... ptr = ng->ptr[id - 1]; ... "id" is never compile time constant so compiler is forced to subtract 1. And those decrements or LEA [r32 - 1] instructions add up. We also start id'ing from 1 to catch bugs where pernet sybsystem id is not initialized and 0. This is quite pointless idea (nothing will work or immediate interference with first registered subsystem) in general but it hints what needs to be done for code size reduction. Namely, overlaying allocation of pointer array and fixed part of structure in the beginning and using usual base-0 addressing. Ids are just cookies, their exact values do not matter, so lets start with 3 on x86_64. Code size savings (oh boy): -4.2 KB As usual, ignore the initial compiler stupidity part of the table. add/remove: 0/0 grow/shrink: 12/670 up/down: 89/-4297 (-4208) function old new delta tipc_nametbl_insert_publ 1250 1270 +20 nlmclnt_lookup_host 686 703 +17 nfsd4_encode_fattr 5930 5941 +11 nfs_get_client 1050 1061 +11 register_pernet_operations 333 342 +9 tcf_mirred_init 843 849 +6 tcf_bpf_init 1143 1149 +6 gss_setup_upcall 990 994 +4 idmap_name_to_id 432 434 +2 ops_init 274 275 +1 nfsd_inject_forget_client 259 260 +1 nfs4_alloc_client 612 613 +1 tunnel_key_walker 164 163 -1 ... tipc_bcbase_select_primary 392 360 -32 mac80211_hwsim_new_radio 2808 2767 -41 ipip6_tunnel_ioctl 2228 2186 -42 tipc_bcast_rcv 715 672 -43 tipc_link_build_proto_msg 1140 1089 -51 nfsd4_lock 3851 3796 -55 tipc_mon_rcv 1012 956 -56 Total: Before=156643951, After=156639743, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03netns: add dummy struct inside "struct net_generic"Alexey Dobriyan
This is precursor to fixing "[id - 1]" bloat inside net_generic(). Name "s" is chosen to complement name "u" often used for dummy unions. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03netns: publish net_generic correctlyAlexey Dobriyan
Publishing net_generic pointer is done with silly mistake: new array is published BEFORE setting freshly acquired pernet subsystem pointer. memcpy rcu_assign_pointer kfree_rcu ng->ptr[id - 1] = data; This bug was introduced with commit dec827d174d7f76c457238800183ca864a639365 ("[NETNS]: The generic per-net pointers.") in the glorious days of chopping networking stack into containers proper 8.5 years ago (whee...) How it didn't trigger for so long? Well, you need quite specific set of conditions: *) race window opens once per pernet subsystem addition (read: modprobe or boot) *) not every pernet subsystem is eligible (need ->id and ->size) *) not every pernet subsystem is vulnerable (need incorrect or absense of ordering of register_pernet_sybsys() and actually using net_generic()) *) to hide the bug even more, default is to preallocate 13 pointers which is actually quite a lot. You need IPv6, netfilter, bridging etc together loaded to trigger reallocation in the first place. Trimmed down config are OK. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>