summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-08-30netfilter: conntrack: add gc worker to remove timed-out entriesFlorian Westphal
Conntrack gc worker to evict stale entries. GC happens once every 5 seconds, but we only scan at most 1/64th of the table (and not more than 8k) buckets to avoid hogging cpu. This means that a complete scan of the table will take several minutes of wall-clock time. Considering that the gc run will never have to evict any entries during normal operation because those will happen from packet path this should be fine. We only need gc to make sure userspace (conntrack event listeners) eventually learn of the timeout, and for resource reclaim in case the system becomes idle. We do not disable BH and cond_resched for every bucket so this should not introduce noticeable latencies either. A followup patch will add a small change to speed up GC for the extreme case where most entries are timed out on an otherwise idle system. v2: Use cond_resched_rcu_qs & add comment wrt. missing restart on nulls value change in gc worker, suggested by Eric Dumazet. v3: don't call cancel_delayed_work_sync twice (again, Eric). Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-30netfilter: evict stale entries on netlink dumpsFlorian Westphal
When dumping we already have to look at the entire table, so we might as well toss those entries whose timeout value is in the past. We also look at every entry during resize operations. However, eviction there is not as simple because we hold the global resize lock so we can't evict without adding a 'expired' list to drop from later. Considering that resizes are very rare it doesn't seem worth doing it. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-30netfilter: conntrack: get rid of conntrack timerFlorian Westphal
With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as Eric Dumazet pointed out during netfilter workshop 2016. Eric also says: "Another reason was the fact that Thomas was about to change max timer range [..]" (500462a9de657f8, 'timers: Switch to a non-cascading wheel'). Remove the timer and use a 32bit jiffies value containing timestamp until entry is valid. During conntrack lookup, even before doing tuple comparision, check the timeout value and evict the entry in case it is too old. The dying bit is used as a synchronization point to avoid races where multiple cpus try to evict the same entry. Because lookup is always lockless, we need to bump the refcnt once when we evict, else we could try to evict already-dead entry that is being recycled. This is the standard/expected way when conntrack entries are destroyed. Followup patches will introduce garbage colliction via work queue and further places where we can reap obsoleted entries (e.g. during netlink dumps), this is needed to avoid expired conntracks from hanging around for too long when lookup rate is low after a busy period. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-30netfilter: don't rely on DYING bit to detect when destroy event was sentFlorian Westphal
The reliable event delivery mode currently (ab)uses the DYING bit to detect which entries on the dying list have to be skipped when re-delivering events from the eache worker in reliable event mode. Currently when we delete the conntrack from main table we only set this bit if we could also deliver the netlink destroy event to userspace. If we fail we move it to the dying list, the ecache worker will reattempt event delivery for all confirmed conntracks on the dying list that do not have the DYING bit set. Once timer is gone, we can no longer use if (del_timer()) to detect when we 'stole' the reference count owned by the timer/hash entry, so we need some other way to avoid racing with other cpu. Pablo suggested to add a marker in the ecache extension that skips entries that have been unhashed from main table but are still waiting for the last reference count to be dropped (e.g. because one skb waiting on nfqueue verdict still holds a reference). We do this by adding a tristate. If we fail to deliver the destroy event, make a note of this in the eache extension. The worker can then skip all entries that are in a different state. Either they never delivered a destroy event, e.g. because the netlink backend was not loaded, or redelivery took place already. Once the conntrack timer is removed we will now be able to replace del_timer() test with test_and_set_bit(DYING, &ct->status) to avoid racing with other cpu that tries to evict the same conntrack. Because DYING will then be set right before we report the destroy event we can no longer skip event reporting when dying bit is set. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-30netfilter: restart search if moved to other chainFlorian Westphal
In case nf_conntrack_tuple_taken did not find a conflicting entry check that all entries in this hash slot were tested and restart in case an entry was moved to another chain. Reported-by: Eric Dumazet <edumazet@google.com> Fixes: ea781f197d6a ("netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu()") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-30Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2016-08-29 This series contains updates to fm10k only. Jake provides all the changes in this series starting with fixes an issue where VF devices may fail during an unbind/bind and we will never zero the reference counter for the pci_dev structure. Updated the hot path to use SW counters instead of checking for hardware Tx pending for possible transmit hangs, which will improve performance. Fixed the NAPI budget accounting so that fm10k_poll will return actual work done, capped at (budget - 1) instead of returning 0. Added a check to ensure that the device is in the normal IO state before continuing to probe, which allows us to give a more descriptive message of what is wrong in the case of uncorrectable AER error. In preparation for adding Geneve Rx offload support, refactored the current VXLAN offload flow to be a bit more generic. Added support for receive offloads on one Geneve tunnel. Ensure that other bits in the RXQCTL register do not get cleared, to make sure that bits related to queue ownership are maintained. Fixed an issue in queue ownership assignment which casued a race condition between the PF and the VF such that potentially a VF could cause FUM fault errors due to normal PF/VF driver behavior. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29Merge tag 'hwmon-for-linus-v4.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fix from Guenter Roeck: "Add missing sysfs attribute group terminator to it87 driver" * tag 'hwmon-for-linus-v4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (it87) Add missing sysfs attribute group terminator
2016-08-29Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix bugs that could cause kernel deadlocks or file system corruption while moving xattrs to expand the extended inode. Also add some sanity checks to the block group descriptors to make sure we don't end up overwriting the superblock" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: avoid deadlock when expanding inode size ext4: properly align shifted xattrs when expanding inodes ext4: fix xattr shifting when expanding inodes part 2 ext4: fix xattr shifting when expanding inodes ext4: validate that metadata blocks do not overlap superblock ext4: reserve xattr index for the Hurd
2016-08-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Segregate namespaces properly in conntrack dumps, from Liping Zhang. 2) tcp listener refcount fix in netfilter tproxy, from Eric Dumazet. 3) Fix timeouts in qed driver due to xmit_more, from Yuval Mintz. 4) Fix use-after-free in tcp_xmit_retransmit_queue(). 5) Userspace header fixups (use of __u32, missing includes, etc.) from Mikko Rapeli. 6) Further refinements to fragmentation wrt gso and tunnels, from Shmulik Ladkani. 7) Trigger poll correctly for zero length UDP packets, from Eric Dumazet. 8) TCP window scaling fix, also from Eric Dumazet. 9) SLAB_DESTROY_BY_RCU is not relevant any more for UDP sockets. 10) Module refcount leak in qdisc_create_dflt(), from Eric Dumazet. 11) Fix deadlock in cp_rx_poll() of 8139cp driver, from Gao Feng. 12) Memory leak in rhashtable's alloc_bucket_locks(), from Eric Dumazet. 13) Add new device ID to alx driver, from Owen Lin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits) Add Killer E2500 device ID in alx driver. net: smc91x: fix SMC accesses Documentation: networking: dsa: Remove platform device TODO net/mlx5: Increase number of ethtool steering priorities net/mlx5: Add error prints when validate ETS failed net/mlx5e: Fix memory leak if refreshing TIRs fails net/mlx5e: Add ethtool counter for TX xmit_more net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ net/mlx5e: Don't wait for SQ completions on close net/mlx5e: Don't post fragmented MPWQE when RQ is disabled net/mlx5e: Don't wait for RQ completions on close net/mlx5e: Limit UMR length to the device's limitation rhashtable: fix a memory leak in alloc_bucket_locks() sfc: fix potential stack corruption from running past stat bitmask team: loadbalance: push lacpdus to exact delivery net: hns: dereference ppe_cb->ppe_common_cb if it is non-null 8139cp: Fix one possible deadloop in cp_rx_poll i40e: Change some init flow for the client Revert "phy: IRQ cannot be shared" net: dsa: bcm_sf2: Fix race condition while unmasking interrupts ...
2016-08-29Merge tag 'platform-drivers-x86-v4.8-4' of ↵Linus Torvalds
git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull x86 platform driver fixes from Darren Hart: "Remove module related code from two drivers that are only configurable as built-in: intel_pmic_gpio and platform/olpc" * tag 'platform-drivers-x86-v4.8-4' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: intel_pmic_gpio: Make explicitly non-modular platform/olpc: Make ec explicitly non-modular
2016-08-29Merge tag 'powerpc-4.8-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Ben Herrenschmidt: "This was meant to be sent early last week, but I has a change pending on one of the fixes and other things made me forget all about. Ugh. We have some misc fixes for powerpc 4.8. Some trivial bits and some regressions, and a trivial cleanup or two that I saw no point in letting rot in patchwork" * tag 'powerpc-4.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: signals: Discard transaction state from signal frames powerpc/powernv : Drop reference added by kset_find_obj() powerpc/tm: do not use r13 for tabort_syscall powerpc: move hmi.c to arch/powerpc/kvm/ powerpc: sysdev: cpm: fix gpio save_regs functions powerpc/pseries: PACA save area fix for MCE vs MCE powerpc/pseries: PACA save area fix for general exception vs MCE powerpc/prom: Fix sub-processor option passed to ibm, client-architecture-support powerpc, hotplug: Avoid to touch non-existent cpumasks. powerpc: migrate exception table users off module.h and onto extable.h powerpc/powernv/pci: fix iterator signedness powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb) cxl: use pcibios_free_controller_deferred() when removing vPHBs powerpc: mpc8349emitx: Delete unnecessary assignment for the field "owner" powerpc/512x: Delete unnecessary assignment for the field "owner" drivers/macintosh: Delete owner assignment powerpc: cputhreads: Add missing include file
2016-08-29hwmon: (it87) Add missing sysfs attribute group terminatorJean Delvare
Attribute array it87_attributes_in lacks its NULL terminator, causing random behavior when operating on the attribute group. Fixes: 52929715634a ("hwmon: (it87) Use is_visible for voltage sensors") Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-08-29fm10k: don't re-map queues when a mailbox message sufficesJacob Keller
When the PF assigns a new MAC address to a VF it uses the base address registers to store the MAC address. This allows a VF which loads after this setup the ability to get the initial address without having to wait for a mailbox message. Unfortunately to do this, the PF must take queue ownership away from the VF, which can cause fault errors when there is already an active VF driver. This queue ownership assignment causes race condition between the PF and the VF such that potentially a VF can cause FUM fault errors due to normal PF/VF driver behavior. It is not safe to simply allow the PF to write the base address registers without taking queue ownership back as the PF must also disable the queues, and this would impact active VF use. The current code is safe because the queue ownership will prevent the VF from actually writing but does trigger the FUM fault. We can do better by simply avoiding the register write process when a mailbox message suffices. If the message can be sent over the mailbox, then we will not perform the queue ownership assignment and we won't update the base address to be the same as the MAC address. We do still have to write the TXQCTL registers in order to update the VID of the queue. This is necessary because the TXQCTL register is read-only from the VF, and thus the VF cannot do this for itself. This register does not need to wait for the Tx queue to be disabled and is safe for the PF to write during normal VF operation, so we move this write to the top of the function above the mailbox message. Without this, the TXQCTL register would be misconfigured and cause the VF to Tx hang. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29fm10k: don't clear the RXQCTL register when enabling or disabling queuesJacob Keller
Ensure that other bits in the RXQCTL register do not get cleared. This ensures that bits related to queue ownership are maintained. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29fm10k: remove unnecessary extra parenthesis around ((~value))Jacob Keller
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29fm10k: add support for Rx offloads on one Geneve tunnelJacob Keller
Similar to how we handle VXLAN offload, enable support for a single Geneve tunnel. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29fm10k: rework vxlan_port offload before adding geneve supportJacob Keller
In preparation for adding Geneve Rx offload support, refactor the current VXLAN offload flow to be a bit more generic so that it will be easier to add the new Geneve code. The fm10k hardware supports one VXLAN and one Geneve tunnel, so we will eventually treat the VXLAN and Geneve tunnels identically. To this end, factor out the code that handles the current list so that we can use the generic flow for both tunnels in the next patch. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29fm10k: don't try to stop queues if we've lost hw_addrJacob Keller
In the event of a surprise remove, we expect the driver to go down, which includes calling .stop_hw(). However, this function will return an error because the queues won't appear to cleanly disable. Prevent this and avoid the unnecessary checks by just returning when FM10K_REMOVED(hw->hw_addr) is true. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29fm10k: don't continue probe if PCI device not in normal IO stateJacob Keller
In the event of an uncorrectable AER error occurring when the driver has not loaded, the recovery routines are not done. This is done because future loads of the driver may not be aware of the IO state and may not be able to recover at all. In this case, when we next load the driver it fails due to what appears to be a surprise remove event. Instead, add a check to ensure that the device is in the normal IO state before continuing to probe. This allows us to give a more descriptive message of what is wrong. Without this change, the driver will attempt to probe up to our first call of .reset_hw() which will be unable to read registers and act as if a surprise remove event occurred. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29fm10k: print error code when pci_enable_device_mem fails during probeJacob Keller
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29fm10k: NAPI polling routine must return actual work doneJacob Keller
When fm10k_poll fully cleans rings it returns 0. This is incorrect as it messes up the budget accounting in the core NAPI code. Fix this by returning actual work done, capped at budget - 1 since the core doesn't expect a return of the full budget when the driver modifies the NAPI status. Cc: Paolo Abeni <pabeni@redhat.com> Cc: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29fm10k: prefer READ_ONCE instead of ACCESS_ONCEJacob Keller
While technically not needed, as all our uses of ACCESS_ONCE are scalar types, we already use READ_ONCE in a few places, and for code readability we can swap all the uses of the older ACCESS_ONCE into READ_ONCE. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29fm10k: remove fm10k_get_reta_size from namespaceJacob Keller
The function is only used in fm10k_ethtool.c, so make it static. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29fm10k: use variadic form of alloc_workqueueJacob Keller
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29fm10k: use software values when checking for Tx hangs in hot pathJacob Keller
A previous patch added support to check for hardware Tx pending in the fm10k_down routine. This support was intended to ensure that we accurately check what the hardware state is. However, checking for Tx hangs in this manor during the hotpath results in a large performance hit. Avoid this by making the hotpath check use the SW counters instead. Fixes: a0f53cf49cb0 ("fm10k: use actual hardware registers when checking for pending Tx", 2016-06-08) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-29fm10k: fix PCI device enable_cnt leak in .io_slot_resetJacob Keller
A previous patch removed the pci_disable_device() call in .io_error_detected. This call corresponded to a pci_enable_device_mem() call within .io_slot_reset handler. Change the call here to a pci_reenable_device() so that it does not increment and leak the enable_cnt reference count for the device. Without this change, VF devices may fail during an unbind/bind, and we'll never zero the reference counter for the pci_dev structure. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-28intel_pmic_gpio: Make explicitly non-modularPaul Gortmaker
The Kconfig entry controlling compilation of this code is: drivers/platform/x86/Kconfig:config GPIO_INTEL_PMIC drivers/platform/x86/Kconfig: bool "Intel PMIC GPIO support" ...meaning that it currently is not being built as a module by anyone. Lets remove the couple traces of modular infrastructure use, so that when reading the driver there is no doubt it is builtin-only. We delete the MODULE_LICENSE tag etc. since all that information was (or is now) contained at the top of the file in the comments. We don't replace module.h with init.h since the file already has that. Cc: Alek Du <alek.du@intel.com> Cc: platform-driver-x86@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-08-28platform/olpc: Make ec explicitly non-modularPaul Gortmaker
The Kconfig entry controlling compilation of this code is: arch/x86/Kconfig:config OLPC arch/x86/Kconfig: bool "One Laptop Per Child support" ...meaning that it currently is not being built as a module by anyone. Lets remove the couple traces of modular infrastructure use, so that when reading the driver there is no doubt it is builtin-only. We delete the MODULE_LICENSE tag etc. since all that information was (or is now) contained at the top of the file in the comments. Cc: platform-driver-x86@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-08-29net_sched: fix use of uninitialized ethertype variable in cls_flowerArnd Bergmann
The addition of VLAN support caused a possible use of uninitialized data if we encounter a zero TCA_FLOWER_KEY_ETH_TYPE key, as pointed out by "gcc -Wmaybe-uninitialized": net/sched/cls_flower.c: In function 'fl_change': net/sched/cls_flower.c:366:22: error: 'ethertype' may be used uninitialized in this function [-Werror=maybe-uninitialized] This changes the code to only set the ethertype field if it was nonzero, as before the patch. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 9399ae9a6cb2 ("net_sched: flower: Add vlan support") Cc: Hadar Hen Zion <hadarh@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29net/xgene: fix error handling during resetArnd Bergmann
The newly added reset logic uses helper functions for the MMIO that may fail. However, when the read operation fails, we end up writing back uninitialized data to the register, as gcc warns: drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c: In function 'xgene_enet_link_state': drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c:213:2: error: 'data' may be used uninitialized in this function [-Werror=maybe-uninitialized] drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c:209:6: note: 'data' was declared here u32 data; We already print a warning to the console log if that happens, the best alternative that I can see is skip the rest of the reset sequence if the register value cannot be read: Most likely the write would fail as well, and if it succeeded, worse things could happen. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 3eb7cb9dc946 ("drivers: net: xgene: XFI PCS reset when link is down") Cc: Fushen Chen <fchen@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29net: ethtool: add support for 1000BaseX and missing 10G link modesVidya Sagar Ravipati
This patch enhances ethtool link mode bitmap to include missing interface modes for 1G/10G speeds Changes: 1000baseX is the mode introduced to cover all 1G Fiber cases. All modes under 1000BaseX i.e. 1000BASE-SX, 1000BASE-LX, 1000BASE-LX10 and 1000BASE-BX10 are not explicitly defined at this moment. 10G CR,SR,LR and ER link modes are included for 10G speed.. Issue: ethtool on 1G/10G SFP port reports Base-T as this port supports 1000baseX,10G CR, SR and LR modes. root@tor-02$ ethtool swp1 Settings for swp1: Supported ports: [ FIBRE ] Supported link modes: 1000baseT/Full 10000baseT/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: Yes Advertised link modes: 1000baseT/Full Advertised pause frame use: No Advertised auto-negotiation: No Speed: 10000Mb/s Duplex: Full Port: FIBRE PHYAD: 0 Transceiver: external Auto-negotiation: off Current message level: 0x00000000 (0) Link detected: yes After fix: root@tor-02$ ethtool swp1 Settings for swp1: Supported ports: [ FIBRE ] Supported link modes: 1000baseX/Full 10000baseCR/Full 10000baseSR/Full 10000baseLR/Full 10000baseER/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: Yes Advertised link modes: 1000baseT/Full Advertised pause frame use: No Advertised auto-negotiation: No Speed: 10000Mb/s Duplex: Full Port: FIBRE PHYAD: 0 Transceiver: external Auto-negotiation: off Current message level: 0x00000000 (0) Link detected: yes Signed-off-by: Vidya Sagar Ravipati <vidya@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29amd-xgbe: Reset running devices after resume from hibernateJames Morse
After resume from hibernate on arm64, any amd-xgbe devices that were running when we hibernated are reported as down, even when it is not. Re-plugging the cables does not cause the interface to come back, the link must be marked as down then up via 'ip set link' using the serial console. This happens because the device has been power-cycled and possibly re-initialised by firmware, whereas the driver's memory structures have been restored from the hibernate image and the two do not agree. Schedule a restart of the device after powerup in case the world changed while we were asleep. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29Add Killer E2500 device ID in alx driver.Owen Lin
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29tcp: add tcp_add_backlog()Eric Dumazet
When TCP operates in lossy environments (between 1 and 10 % packet losses), many SACK blocks can be exchanged, and I noticed we could drop them on busy senders, if these SACK blocks have to be queued into the socket backlog. While the main cause is the poor performance of RACK/SACK processing, we can try to avoid these drops of valuable information that can lead to spurious timeouts and retransmits. Cause of the drops is the skb->truesize overestimation caused by : - drivers allocating ~2048 (or more) bytes as a fragment to hold an Ethernet frame. - various pskb_may_pull() calls bringing the headers into skb->head might have pulled all the frame content, but skb->truesize could not be lowered, as the stack has no idea of each fragment truesize. The backlog drops are also more visible on bidirectional flows, since their sk_rmem_alloc can be quite big. Let's add some room for the backlog, as only the socket owner can selectively take action to lower memory needs, like collapsing receive queues or partial ofo pruning. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28net: smc91x: fix SMC accessesRussell King
Commit b70661c70830 ("net: smc91x: use run-time configuration on all ARM machines") broke some ARM platforms through several mistakes. Firstly, the access size must correspond to the following rule: (a) at least one of 16-bit or 8-bit access size must be supported (b) 32-bit accesses are optional, and may be enabled in addition to the above. Secondly, it provides no emulation of 16-bit accesses, instead blindly making 16-bit accesses even when the platform specifies that only 8-bit is supported. Reorganise smc91x.h so we can make use of the existing 16-bit access emulation already provided - if 16-bit accesses are supported, use 16-bit accesses directly, otherwise if 8-bit accesses are supported, use the provided 16-bit access emulation. If neither, BUG(). This exactly reflects the driver behaviour prior to the commit being fixed. Since the conversion incorrectly cut down the available access sizes on several platforms, we also need to go through every platform and fix up the overly-restrictive access size: Arnd assumed that if a platform can perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access size needed to be specified - not so, all available access sizes must be specified. This likely fixes some performance regressions in doing this: if a platform does not support 8-bit accesses, 8-bit accesses have been emulated by performing a 16-bit read-modify-write access. Tested on the Intel Assabet/Neponset platform, which supports only 8-bit accesses, which was broken by the original commit. Fixes: b70661c70830 ("net: smc91x: use run-time configuration on all ARM machines") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28Documentation: networking: dsa: Remove platform device TODOFlorian Fainelli
Since commit 83c0afaec7b7 ("net: dsa: Add new binding implementation"), the shortcomings of the dsa platform device have been addressed, remove that TODO item. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28cxgb4/cxgb4vf: fix spelling mistake "provissioned" -> "provisioned"Colin Ian King
Trivial fix to spelling mistake in dev_warn message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28net: ucc_geth: fix spelling mistake "propperty" -> "property"Colin Ian King
Trivial fix to spelling mistake in dev_warn message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28wan/fsl_ucc_hdlc: fix spelling mistake "prameter" -> "parameter"Colin Ian King
Trivial fix to spelling mistake in dev_err message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28Merge branch 'strp-generalization'David S. Miller
Tom Herbert says: ==================== strp: Generalize stream parser to work with other socket types Add a read_sock protocol operation function that allows something like tcp_read_sock to be called for other protocol types. Specific changes in this patch set: - Add read_sock function to proto_ops. This has the same signature as tcp_read_sock. sk_read_actor_t is also defined in net.h. - Set peek_len and read_sock proto_op functions for TCPv4 and TCPv6 stream ops. - Remove references to tcp in strparser. - Call peek_len and read_sock operations from strparser instead of calling TCP specific functions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28kcm: Remove TCP specific references from kcm and strparserTom Herbert
kcm and strparser need to work with any type of stream socket not just TCP. Eliminate references to TCP and call generic proto_ops functions of read_sock and peek_len. Also in strp_init check if the socket support the proto_ops read_sock and peek_len. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28tcp: Set read_sock and peek_len proto_opsTom Herbert
In inet_stream_ops we set read_sock to tcp_read_sock and peek_len to tcp_peek_len (which is just a stub function that calls tcp_inq). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28net: Add read_sock proto_opTom Herbert
Add new function in proto_ops structure. This includes moving the typedef got sk_read_actor into net.h and removing the definition from tcp.h. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28Merge branch 'mlx5-series'David S. Miller
Saeed Mahameed says: ==================== Mellanox 100G mlx5 fixes 2016-08-29 This series contains some bug fixes for the mlx5 core and mlx5 ethernet driver. From Saeed, Fix UMR to consider hardware translation table field size limitation when calculating the maximum number of MTTs required by the driver. Three patches to speed-up netdevice close time by serializing channel (SQs & RQs) destruction rather than issuing and waiting for hardware interrupts to free them. From Eran, Fix ethtool ring parameter reporting for striding RQ layout. Add error prints on ETS validation failure. From Kamal, Fix memory leak on error flow. From Maor, Fix ethtool steering priorities number. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28net/mlx5: Increase number of ethtool steering prioritiesMaor Gottlieb
Ethtool has 11 flow tables, each flow table has its own priority. Increase the number of priorities to be aligned with the number of flow tables. Fixes: 1174fce8d141 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28net/mlx5: Add error prints when validate ETS failedEran Ben Elisha
Upon set ETS failure due to user invalid input, add error prints to specify the exact error to the user. Fixes: cdcf11212b22 ('net/mlx5e: Validate BW weight values of ETS') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28net/mlx5e: Fix memory leak if refreshing TIRs failsKamal Heib
Free 'in' command object also when mlx5_core_modify_tir fails. Fixes: 724b2aa15126 ("net/mlx5e: TIRs management refactoring") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28net/mlx5e: Add ethtool counter for TX xmit_moreTariq Toukan
Add a counter in ethtool for the number of times that TX xmit_more was used. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQEran Ben Elisha
The driver RQ has two possible configurations: striding RQ and non-striding RQ. Until this patch, the driver always reported the number of hardware WQEs (ring descriptors). For non striding RQ configuration, this was OK since we have one WQE per pending packet For striding RQ, multiple packets can fit into one WQE. For better user experience we normalize the rx_pending parameter (size of wqe/mtu) as the average ring size in case of striding RQ. Fixes: 461017cb006a ('net/mlx5e: Support RX multi-packet WQE ...') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>