Age | Commit message | Author
2020-05-22 | Merge branch 'Support-for-fdb-ECMP-nexthop-groups' | David S. Miller
Roopa Prabhu says:

====================
Support for fdb ECMP nexthop groups

This series introduces ecmp nexthops and nexthop groups for mac fdb entries. In subsequent patches this is used by the vxlan driver fdb entries. The use case is E-VPN multihoming [1,2,3], which requires bridged vxlan traffic to be load balanced to remote switches (vteps) belonging to the same multi-homed ethernet segment (this is analogous to a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops referenced by fdb entries. These nexthops only have ip. The patches make sure that routes don't reference such nexthops.

example:
$ ip nexthop add id 12 via 172.16.1.2 fdb
$ ip nexthop add id 13 via 172.16.1.3 fdb
$ ip nexthop add id 102 group 12/13 fdb
$ bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 102 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

v4 -
  - fix error path free_skb in vxlan_xmit_nh
  - fix atomic notifier initialization issue (Reported-by: kernel test robot <rong.a.chen@intel.com>)
    The reported error was easy to locate and fix, but I was not able to re-test with the robot reproducer script due to some other issues with running the script on my test system.

v3 - fix wording in selftest print as pointed out by davidA

v2 -
  - dropped Nikolay's fixes for nexthop multipath null pointer deref (he will send those separately)
  - added negative tests for route add with fdb nexthop + a few more
  - Fixes for a few fdb replace conditions found during more testing
  - Moved to rcu_dereference_rtnl in vxlan_fdb_info and consolidate rcu dereferences
  - Fixes to build failures Reported-by: kbuild test robot <lkp@intel.com>
  - DavidA, I am going to send a separate patch for the neighbor code validation for NDA_NH_ID if that's ok.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 | selftests: net: add fdb nexthop tests | Roopa Prabhu
This commit adds ipv4 and ipv6 fdb nexthop api tests to fib_nexthops.sh. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 | vxlan: support for nexthop notifiers | Roopa Prabhu
vxlan driver registers for nexthop add/del notifiers to clean up fdb entries pointing to such nexthops. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 | nexthop: add support for notifiers | Roopa Prabhu
This patch adds nexthop add/del notifiers. To be used by vxlan driver in a later patch. Could possibly be used by switchdev drivers in the future. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
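The add/del notifier pattern described here can be sketched in userspace as a plain callback chain. This is only an illustration in Python, not the kernel code: `NotifierChain`, `EVENT_ADD`, `EVENT_DEL` and `VxlanFdb` are hypothetical names invented for the sketch, and the kernel uses its own notifier infrastructure.

```python
EVENT_ADD = "add"   # hypothetical event names for this sketch
EVENT_DEL = "del"


class NotifierChain:
    """Toy stand-in for a notifier chain: an ordered list of callbacks."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def unregister(self, callback):
        self._callbacks.remove(callback)

    def call(self, event, nh_id):
        # Iterate over a copy so a callback may unregister itself safely.
        for callback in list(self._callbacks):
            callback(event, nh_id)


class VxlanFdb:
    """Toy fdb: maps a MAC address to the nexthop id it points at."""

    def __init__(self, chain):
        self.entries = {}
        chain.register(self.on_nexthop_event)

    def add(self, mac, nh_id):
        self.entries[mac] = nh_id

    def on_nexthop_event(self, event, nh_id):
        # On nexthop removal, drop every fdb entry that still points at it.
        if event == EVENT_DEL:
            self.entries = {m: n for m, n in self.entries.items() if n != nh_id}
```

A subscriber that ignores add events and only reacts to deletions is enough to keep the toy fdb free of dangling nexthop ids.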
2020-05-22 | vxlan: ecmp support for mac fdb entries | Roopa Prabhu
Today's vxlan mac fdb entries can point to multiple remote ips (rdsts) with the sole purpose of replicating broadcast-multicast and unknown unicast packets to those remote ips.

E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be load balanced to remote switches (vteps) belonging to the same multi-homed ethernet segment (E-VPN multihoming is analogous to multi-homed LAG implementations, but with the inter-switch peerlink replaced with a vxlan tunnel). In other words it needs support for mac ecmp. Furthermore, for faster convergence, E-VPN multihoming needs the ability to update fdb ecmp nexthops independent of the fdb entries.

New route nexthop API is perfect for this use case. This patch extends the vxlan fdb code to take a nexthop id pointing to an ecmp nexthop group.

Changes include:
- New NDA_NH_ID attribute for fdbs
- Use the newly added fdb nexthop groups
- makes vxlan rdsts and nexthop handling code mutually exclusive
- since this is a new use-case and the requirement is for ecmp nexthop groups, the fdb add and update path checks that the nexthop is really an ecmp nexthop group. This check can be relaxed in the future, if we want to introduce replication fdb nexthop groups and allow its use in lieu of current rdst lists.
- fdb update requests with nexthop ids are only allowed for existing fdbs that have nexthop ids
- learning will not override an existing fdb entry with nexthop group
- I have wrapped the switchdev offload code around the presence of rdst

[1] E-VPN RFC https://tools.ietf.org/html/rfc7432
[2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
[3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

Includes a null check fix in vxlan_xmit from Nikolay

v2 - Fixed build issue:
Reported-by: kbuild test robot <lkp@intel.com>

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
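The acceptance rules listed in this commit message can be restated as a small decision function. The following is a rough Python sketch under stated assumptions, not the vxlan driver logic: `check_fdb_request`, `learning_may_override`, the dictionary layout and the reason strings are all hypothetical.

```python
def check_fdb_request(existing, nh_id, remote_ip, nexthops):
    """Return None if the request is acceptable, else a reason string.

    existing  -- None for a new entry, or {"nh_id": id or None, "rdsts": [...]}
    nh_id     -- nexthop id carried by the request, or None
    remote_ip -- remote VTEP ip carried by the request, or None
    nexthops  -- {id: {"fdb": bool, "group": bool}}
    """
    # rdst handling and nexthop handling are mutually exclusive.
    if nh_id is not None and remote_ip is not None:
        return "nexthop id and remote ip are mutually exclusive"
    if nh_id is not None:
        nh = nexthops.get(nh_id)
        if nh is None:
            return "nexthop does not exist"
        if not nh["fdb"]:
            return "nexthop is not an fdb nexthop"
        # The add and update paths insist on an ecmp nexthop group.
        if not nh["group"]:
            return "nexthop is not an ecmp group"
        # Updates with a nexthop id need an entry that already has one.
        if existing is not None and existing["nh_id"] is None:
            return "existing entry has no nexthop id"
    return None


def learning_may_override(existing):
    """Learning never replaces an entry that points at a nexthop group."""
    return existing is None or existing["nh_id"] is None
```

Returning a reason string rather than raising keeps the sketch close to how a netlink handler would report an extended error message.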
2020-05-22 | nexthop: support for fdb ecmp nexthops | Roopa Prabhu
This patch introduces ecmp nexthops and nexthop groups for mac fdb entries. In subsequent patches this is used by the vxlan driver fdb entries. The use case is E-VPN multihoming [1,2,3], which requires bridged vxlan traffic to be load balanced to remote switches (vteps) belonging to the same multi-homed ethernet segment (this is analogous to a multi-homed LAG but over vxlan).

Changes include new nexthop flag NHA_FDB for nexthops referenced by fdb entries. These nexthops only have ip. This patch includes appropriate checks to avoid routes referencing such nexthops.

example:
$ ip nexthop add id 12 via 172.16.1.2 fdb
$ ip nexthop add id 13 via 172.16.1.3 fdb
$ ip nexthop add id 102 group 12/13 fdb
$ bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 102 self

[1] E-VPN https://tools.ietf.org/html/rfc7432
[2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365
[3] LPC talk with mention of nexthop groups for L2 ecmp http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

v4 - fixed uninitialized variable reported by kernel test robot
Reported-by: kernel test robot <rong.a.chen@intel.com>

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
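To make the data model behind the example commands concrete, here is a small Python sketch of fdb nexthops, an fdb group, the route-side rejection, and per-flow member selection. It is a simplification under stated assumptions: `NexthopTable` and its methods are hypothetical, the requirement that group and member fdb flags match is an assumption of the sketch, and member selection by CRC32 modulo the member count merely stands in for whatever hashing the kernel applies.

```python
import zlib


class NexthopTable:
    def __init__(self):
        self.nexthops = {}   # id -> {"via": str, "fdb": bool}
        self.groups = {}     # id -> {"members": [ids], "fdb": bool}

    def add_nexthop(self, nh_id, via, fdb=False):
        self.nexthops[nh_id] = {"via": via, "fdb": fdb}

    def add_group(self, group_id, members, fdb=False):
        # Assumption of this sketch: fdb and non-fdb nexthops never mix.
        for member in members:
            if self.nexthops[member]["fdb"] != fdb:
                raise ValueError("group and member fdb flags must match")
        self.groups[group_id] = {"members": list(members), "fdb": fdb}

    def is_fdb(self, nh_id):
        entry = self.nexthops.get(nh_id) or self.groups[nh_id]
        return entry["fdb"]

    def route_add(self, prefix, nh_id):
        # Routes must not reference nexthops created for fdb entries.
        if self.is_fdb(nh_id):
            raise ValueError("route cannot reference an fdb nexthop")
        return (prefix, nh_id)

    def select_remote(self, group_id, flow_key):
        # Same flow key -> same member, so one flow sticks to one vtep.
        members = self.groups[group_id]["members"]
        index = zlib.crc32(flow_key) % len(members)
        return self.nexthops[members[index]]["via"]
```

Mirroring the commands in the commit message, one would add nexthops 12 and 13 with `fdb=True`, group them as 102, and call `select_remote(102, flow_key)` for each bridged flow.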
2020-05-22 | Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue | David S. Miller
Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2020-05-21

This series contains updates to igc and e1000.

Andre cleans up code that was left over from the igb driver that handled MAC address filters based on the source address, which is not currently supported. Simplifies the MAC address filtering code and prepares the igc driver for future source address support. Updated the MAC address filter internal APIs to support filters based on source address. Added support for Network Flow Classification (NFC) rules based on source MAC address. Cleaned up the 'cookie' field which is not used anywhere in the code and cleaned up a wrapper function that was not needed. Simplified the filtering code for readability and aligned the ethtool functions, so that function names were consistent.

Alex provides a fix for e1000 to resolve a deadlock issue when NAPI is being disabled.

Sasha does additional cleanup of the igc driver of dead code that is not used or needed.

v2: Fix the function header comment in patch 3 of the series, based on the feedback from Jakub Kicinski.
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 | Merge tag 'io_uring-5.7-2020-05-22' of git://git.kernel.dk/linux-block | Linus Torvalds
Pull io_uring fixes from Jens Axboe:

"A small collection of small fixes that should go into this release:
- Two fixes for async request preparation (Pavel)
- Busy clear fix for SQPOLL (Xiaoguang)
- Don't use kiocb->private for O_DIRECT buf index, some file systems use it (Bijan)
- Kill dead check in io_splice()
- Ensure sqo_wait is initialized early
- Cancel task_work if we fail adding to original process
- Only add (IO)pollable requests to iopoll list, fixing a regression in this merge window"

* tag 'io_uring-5.7-2020-05-22' of git://git.kernel.dk/linux-block:
  io_uring: reset -EBUSY error when io sq thread is waken up
  io_uring: don't add non-IO requests to iopoll pending list
  io_uring: don't use kiocb.private to store buf_index
  io_uring: cancel work if task_work_add() fails
  io_uring: remove dead check in io_splice()
  io_uring: fix FORCE_ASYNC req preparation
  io_uring: don't prepare DRAIN reqs twice
  io_uring: initialize ctx->sqo_wait earlier
2020-05-22 | Merge tag 'block-5.7-2020-05-22' of git://git.kernel.dk/linux-block | Linus Torvalds
Pull block fixes from Jens Axboe: "Two fixes for null_blk zone mode" * tag 'block-5.7-2020-05-22' of git://git.kernel.dk/linux-block: null_blk: don't allow discard for zoned mode null_blk: return error for invalid zone size
2020-05-22 | Merge tag 'riscv-for-linus-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux | Linus Torvalds
Pull RISC-V fixes from Palmer Dabbelt:

"Two fixes:
- Another !MMU build fix that was a straggler from last week
- A fix to use the "register" keyword for the GP global register variable"

* tag 'riscv-for-linus-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: gp_in_global needs register keyword
  riscv: Fix print_vm_layout build error if NOMMU
2020-05-22 | Merge tag 'efi-fixes-for-v5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/urgent | Borislav Petkov
Pull EFI fixes from Ard Biesheuvel:

"- fix EFI framebuffer earlycon for wide fonts
- avoid filling screen_info with garbage if the EFI framebuffer is not available
- fix a potential host tool build error due to a symbol clash on x86
- work around a EFI firmware bug regarding the binary format of the TPM final events table
- fix a missing memory free by reworking the E820 table sizing routine to not do the allocation in the first place
- add CPER parsing for firmware errors"
2020-05-22 | x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks | Josh Poimboeuf
Normally, show_trace_log_lvl() scans the stack, looking for text addresses to print. In parallel, it unwinds the stack with unwind_next_frame(). If the stack address matches the pointer returned by unwind_get_return_address_ptr() for the current frame, the text address is printed normally without a question mark. Otherwise it's considered a breadcrumb (potentially from a previous call path) and it's printed with a question mark to indicate that the address is unreliable and typically can be ignored.

Since the following commit:

  f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks")

... for inactive tasks, show_trace_log_lvl() prints *only* unreliable addresses (prepended with '?').

That happens because, for the first frame of an inactive task, unwind_get_return_address_ptr() returns the wrong return address pointer: one word *below* the task stack pointer. show_trace_log_lvl() starts scanning at the stack pointer itself, so it never finds the first 'reliable' address, causing only guesses to be printed.

The first frame of an inactive task isn't a normal stack frame. It's actually just an instance of 'struct inactive_task_frame' which is left behind by __switch_to_asm(). Now that this inactive frame is actually exposed to callers, fix unwind_get_return_address_ptr() to interpret it properly.

Fixes: f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200522135435.vbxs7umku5pyrdbk@treble
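The interaction between the stack scan and the unwinder can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the kernel code: `show_trace`, the dictionary-based stack and the `is_text` predicate are hypothetical, and the addresses in any usage are made up.

```python
WORD = 8  # bytes per stack word on x86-64


def show_trace(stack, sp, ret_addr_ptrs, is_text):
    """Scan stack words upward from sp and label text addresses.

    stack         -- {address: value} for each stack word
    sp            -- address at which scanning starts
    ret_addr_ptrs -- per-frame return-address locations from the unwinder
    is_text       -- predicate: does this value look like a text address?

    Returns a list of (value, reliable) pairs; an unreliable entry is what
    the real trace would print with a leading '?'.
    """
    out = []
    frame = 0
    addr = sp
    while addr in stack:
        value = stack[addr]
        if is_text(value):
            reliable = frame < len(ret_addr_ptrs) and ret_addr_ptrs[frame] == addr
            out.append((value, reliable))
            if reliable:
                frame += 1   # only a match advances the unwinder
        addr += WORD
    return out
```

If the first entry of `ret_addr_ptrs` is `sp - WORD`, the scan never reaches that address, the unwinder never advances, and every printed address ends up marked unreliable, which is the symptom the commit describes.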
2020-05-22 | Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux | Linus Torvalds
Pull arm64 fixes from Catalin Marinas:

- Bring the PTRACE_SYSEMU semantics in line with the man page.
- Annotate variable assignment in get_user() with the type to avoid sparse warnings.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Add get_user() type annotation on the !access_ok() path
  arm64: Fix PTRACE_SYSEMU semantics
2020-05-22 | Merge tag 'sound-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound | Linus Torvalds
Pull sound fixes from Takashi Iwai:

"Just a few small fixes: the only significant one is a slight improvement for PCM running position update with no-period-elapsed case while the rest are HD-audio fixups and ice1712 model quirk"

* tag 'sound-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek - Add more fixup entries for Clevo machines
  ALSA: iec1712: Initialize STDSP24 properly when using the model=staudio option
  ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Xtreme
  ALSA: pcm: fix incorrect hw_base increase
2020-05-22 | arm64: Add get_user() type annotation on the !access_ok() path | Al Viro
Sparse reports "Using plain integer as NULL pointer" when the arm64 __get_user_error() assigns 0 to a pointer type. Use proper type annotation. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: kbuild test robot <lkp@intel.com> Link: http://lkml.kernel.org/r/20200522142321.GP23230@ZenIV.linux.org.uk Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-05-22 | Merge tag 'powerpc-5.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux | Linus Torvalds
Pull powerpc fixes from Michael Ellerman:

- a revert of a recent change to the PTE bits for 32-bit BookS, which broke swap.
- a "fix" to disable STRICT_KERNEL_RWX for 64-bit in Kconfig, as it's causing crashes for some people.

Thanks to Christophe Leroy and Rui Salvaterra.

* tag 'powerpc-5.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Disable STRICT_KERNEL_RWX
  Revert "powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits."
2020-05-22 | mt76: mt7915: Fix build error | YueHaibing
In file included from ./include/linux/firmware.h:6:0,
                 from drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:4:
In function ‘__mt7915_mcu_msg_send’,
    inlined from ‘mt7915_mcu_send_message’ at drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:370:6:
./include/linux/compiler.h:396:38: error: call to ‘__compiletime_assert_545’ declared with attribute error: BUILD_BUG_ON failed: cmd == MCU_EXT_CMD_EFUSE_ACCESS && mcu_txd->set_query != MCU_Q_QUERY
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                      ^
./include/linux/compiler.h:377:4: note: in definition of macro ‘__compiletime_assert’
    prefix ## suffix();    \
    ^~~~~~
./include/linux/compiler.h:396:2: note: in expansion of macro ‘_compiletime_assert’
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
  ^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
 #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                     ^~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
  ^~~~~~~~~~~~~~~~
drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:280:2: note: in expansion of macro ‘BUILD_BUG_ON’
  BUILD_BUG_ON(cmd == MCU_EXT_CMD_EFUSE_ACCESS &&
  ^~~~~~~~~~~~

BUILD_BUG_ON is meaningless here, change it to WARN_ON.

Fixes: e57b7901469f ("mt76: add mac80211 driver for MT7915 PCIe-based chipsets")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20200522034533.61716-1-yuehaibing@huawei.com
2020-05-22 | batman-adv: use rcu_replace_pointer() where appropriate | Antonio Quartulli
In commit a63fc6b75cca ("rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()") a new helper macro named rcu_replace_pointer() was introduced to simplify code that needs to switch an rcu pointer to a new value while extracting the old one. Use rcu_replace_pointer() where appropriate to make the code slimmer. Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-05-22 | batman-adv: Revert "Drop lockdep.h include for soft-interface.c" | Sven Eckelmann
The commit 1a33e10e4a95 ("net: partially revert dynamic lockdep key changes") reverts the commit ab92d68fc22f ("net: core: add generic lockdep keys"). But it forgot to also revert the commit 5759af0682b3 ("batman-adv: Drop lockdep.h include for soft-interface.c") which depends on the latter. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-05-22 | misc: rtsx: Add short delay after exit from ASPM | Klaus Doth
DMA transfers to and from the SD card stall for 10 seconds and run into timeout on RTS5260 card readers after ASPM was enabled. Adding a short msleep after disabling ASPM fixes the issue on several Dell Precision 7530/7540 systems I tested. This function is only called when waking up after the chip went into power-save after not transferring data for a few seconds. The added msleep does therefore not change anything in data transfer speed or induce any excessive waiting while data transfers are running, or the chip is sleeping. Only the transition from sleep to active is affected. Signed-off-by: Klaus Doth <kdlnx@doth.eu> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/4434eaa7-2ee3-a560-faee-6cee63ebd6d4@doth.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-21 | ice: Rename build_ctob to ice_build_ctob | Tony Nguyen
To make the function easier to identify as being part of the ice driver, prepend ice to the function name. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 | ice: remove unnecessary backslash | Bruce Allan
Self-explanatory. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 | ice: remove unnecessary check | Bruce Allan
The variable status cannot be zero due to a prior check of it; remove this check. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 | ice: remove unnecessary expression that is always true | Bruce Allan
The else conditional expression is always true due to the if conditional expression; remove it and add a comment to make it obvious still. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 | ice: Fix check for removing/adding mac filters | Lihong Yang
In function ice_set_mac_address, we will remove old dev_addr before adding the new MAC. While removing and adding the MAC, there is no need to return an error if the to-be-removed dev_addr does not exist in the MAC filter list or the to-be-added MAC already exists; keep going or return success accordingly. Signed-off-by: Lihong Yang <lihong.yang@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
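The tolerant remove-then-add flow can be illustrated with a toy filter list. This is a Python sketch with hypothetical names and status codes (`ERR_DOES_NOT_EXIST`, `ERR_ALREADY_EXISTS`, `remove_mac`, `add_mac`, `set_mac_address`); it only models the control flow described, not the driver's functions.

```python
ERR_DOES_NOT_EXIST = -1   # hypothetical status codes for this sketch
ERR_ALREADY_EXISTS = -2


def remove_mac(filters, mac):
    if mac not in filters:
        return ERR_DOES_NOT_EXIST
    filters.remove(mac)
    return 0


def add_mac(filters, mac):
    if mac in filters:
        return ERR_ALREADY_EXISTS
    filters.add(mac)
    return 0


def set_mac_address(filters, old_mac, new_mac):
    """Replace old_mac with new_mac in the filter set; return 0 on success."""
    status = remove_mac(filters, old_mac)
    # A missing old address is not fatal: keep going.
    if status not in (0, ERR_DOES_NOT_EXIST):
        return status
    status = add_mac(filters, new_mac)
    # An address that is already present counts as success.
    if status == ERR_ALREADY_EXISTS:
        return 0
    return status
```

Either benign condition leaves the filter set in the state the caller asked for, which is why neither is reported as a failure.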
2020-05-21 | ice: refactor filter functions | Michal Swiatkowski
Move filter functions to separate file.

Add functions that prepare suitable ice_fltr_info struct depending on the filter type and add this struct to earlier created list:
- ice_fltr_add_mac_to_list
- ice_fltr_add_vlan_to_list
- ice_fltr_add_eth_to_list
These functions are used in adding and removing filters.

Create wrappers for functions mentioned above that alloc list, add suitable ice_fltr_info to it and call add or remove function.
- ice_fltr_prepare_mac
- ice_fltr_prepare_mac_and_broadcast
- ice_fltr_prepare_vlan
- ice_fltr_prepare_eth

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 | ice: Fix resource leak on early exit from function | Eric Joyner
Memory allocated in the ice_add_prof_id_vsig() function wasn't being properly freed if an error occurred inside the for-loop in the function. In particular, 'p' wasn't being freed if an error occurred before it was added to the resource list at the end of the for-loop. Signed-off-by: Eric Joyner <eric.joyner@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 | ice: cleanup vf_id signedness | Jesse Brandeburg
The vf_id variable is dealt with in the code in inconsistent ways of sign usage, preventing compilation with -Werror=sign-compare. Fix this problem in the code by always treating vf_id as unsigned, since there are no valid values of vf_id that are negative. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 | ice: Fix casting issues | Karol Kolacinski
Change min() macros to min_t(), which specifies the comparison type and helps avoid precision loss. In some cases there was precision loss during calls or assignments. Some fields in structs were unnecessarily large and gave multiple warnings. There were also some minor type differences which are now fixed as well as some cases where a simple cast was needed. Callers were passing data that is a u16 to ice_sched_cfg_node_bw_alloc() but the function was truncating that to a u8. Fix that by changing the function to take a u16. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
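The truncation described here is easy to reproduce with explicit masking. The Python sketch below only models fixed-width integer behaviour: `as_u8`, `as_u16`, `min_t` and the two `cfg_node_bw_alloc_*` helpers are hypothetical stand-ins, and the value 300 is an arbitrary example.

```python
def as_u8(value):
    """Keep only the low 8 bits, as an assignment to a u8 would."""
    return value & 0xFF


def as_u16(value):
    """Keep only the low 16 bits, as an assignment to a u16 would."""
    return value & 0xFFFF


def min_t(cast, a, b):
    """Model of min_t(type, a, b): both operands are cast before comparing."""
    return min(cast(a), cast(b))


def cfg_node_bw_alloc_u8(bw_alloc):
    """Callee with a u8 parameter: a wider argument is silently truncated."""
    return as_u8(bw_alloc)


def cfg_node_bw_alloc_u16(bw_alloc):
    """Callee widened to u16: the caller's value arrives intact."""
    return as_u16(bw_alloc)
```

With these helpers, `cfg_node_bw_alloc_u8(300)` yields 44 while `cfg_node_bw_alloc_u16(300)` yields 300, and `min_t(as_u8, 300, 100)` picks 44 where `min_t(as_u16, 300, 100)` picks 100, so the type chosen for the comparison changes the result.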
2020-05-21 | ice: Provide more meaningful error message | Lihong Yang
When printing the ice status or AQ error codes, instead of printing out the numerical value, provide the description of the error code. This provides more info about the issue than a number. Signed-off-by: Lihong Yang <lihong.yang@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 | ice: Fix probe/open race condition | Anirudh Venkataramanan
As soon as the driver registers the PF netdev, userspace utilities like NetworkManager try to bring up the associated interface. When this happens, the driver may not have finished initializing fully, resulting in a bunch of errors in the interface up flow. The driver already has a mechanism to indicate that it's not up yet, namely setting the __ICE_DOWN bit in pf->state, but this bit gets cleared too early in the current flow. So clear this bit only when the driver is fully up. Also check for the same bit in the ice_open flow, and return -EBUSY if the bit is set. Also in ice_open, replace references of vsi->back with a local variable. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 | ice: only drop link once when setting pauseparams | Dave Ertman
Currently, the ice driver is setting a PHY configuration, which causes a link drop, and then additionally it calls for a nway_reset, which restarts auto-negotiation on the link, which also causes a link drop. These two link events, so close together in time, leave the FW unable to generate a link interrupt for the driver to respond to. Remove the unnecessary auto-negotiation restart from the set pauseparams flow. Also remove the error path that would have performed an ice_down/ice_up as that is also unnecessary. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 | ice: Fix check for contiguous TCs | Dave Ertman
The current implementation for contiguous TC check is assuming that the UPs will be mapped to TCs in a linear progressing fashion. This is obviously not always true. Change the check to allow for various UP2TC mapping configurations. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
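One way to check TC contiguity without assuming that UPs map to TCs in increasing order is to first collect the set of TCs in use and then look for a gap. The Python sketch below shows that idea under stated assumptions; `tcs_are_contiguous` and `MAX_TC` are hypothetical names, and this is not claimed to be the driver's exact algorithm.

```python
MAX_TC = 8  # assumed number of traffic classes in this sketch


def tcs_are_contiguous(up2tc):
    """up2tc[i] is the traffic class (TC) assigned to user priority (UP) i."""
    used = [False] * MAX_TC
    for tc in up2tc:
        used[tc] = True
    seen_gap = False
    for in_use in used:
        if not in_use:
            seen_gap = True
        elif seen_gap:
            return False   # a used TC follows an unused one
    return True
```

Because the order in which UPs reference TCs no longer matters, a mapping such as `[0, 0, 2, 1, 0, 0, 0, 0]` passes, while `[0, 2, 0, 0, 0, 0, 0, 0]` fails since TC 1 is unused but TC 2 is not.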
2020-05-21 | ice: Don't reset and rebuild for Tx timeout on PFC enabled queue | Avinash JD
When there's a Tx timeout for a queue which belongs to a PFC enabled TC, then it's not because the queue is hung but because PFC is in action. In PFC, peer sends a pause frame for a specified period of time when its buffer threshold is exceeded (due to congestion). Netdev on the other hand checks if ACK is received within a specified time for a TX packet, if not, it'll invoke the tx_timeout routine. Signed-off-by: Avinash JD <avinash.dayanand@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 | ice: Add VF promiscuous support | Brett Creeley
Implement promiscuous support for VF VSIs. Behaviour of promiscuous support is based on VF trust as well as the newly introduced vf-true-promisc flag. A trusted VF with vf-true-promisc disabled will be the default VSI, which means that all traffic without a matching destination MAC address in the device's internal switch will be forwarded to this VF VSI. A trusted VF with vf-true-promisc enabled will go into "true promiscuous mode". This amounts to the VF receiving all ingress and egress traffic that hits the device's internal switch. An untrusted VF will only receive traffic destined for that VF. The vf-true-promisc-support flag cannot be toggled while any VF is in promiscuous mode. This flag should be set prior to loading the iavf driver or spawning VF(s). Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
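The behaviour described in this message reduces to a small decision table. The Python sketch below restates it; `vf_promisc_mode`, `can_toggle_flag` and the returned mode strings are hypothetical labels chosen for the illustration.

```python
def vf_promisc_mode(trusted, true_promisc_flag):
    """Return which traffic a VF requesting promiscuous mode will see."""
    if not trusted:
        return "own-traffic-only"     # untrusted: only traffic destined to the VF
    if true_promisc_flag:
        return "true-promiscuous"     # all ingress and egress traffic on the switch
    return "default-vsi"              # traffic with no matching destination MAC


def can_toggle_flag(vf_in_promisc):
    """The flag may only change while no VF is in promiscuous mode.

    vf_in_promisc -- iterable of booleans, one per VF
    """
    return not any(vf_in_promisc)
```

Trust is checked first, so the flag has no effect on an untrusted VF.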
2020-05-21 | ice: Add support for tunnel offloads | Tony Nguyen
Create a boost TCAM entry for each tunnel port in order to get a tunnel PTYPE. Update netdev feature flags and implement the appropriate logic to get and set values for hardware offloads. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 | ice: report netlist version in .info_get | Jacob Keller
The flash memory for the ice hardware contains a block of information used for link management called the Netlist module. As this essentially represents another section of firmware, add its version information to the output of the driver's .info_get handler. This includes both a version and the first few bytes of a hash of the module contents.

  fw.netlist -> the version information extracted from the netlist module
  fw.netlist.build -> first 4 bytes of the hash of the contents, similar to fw.mgmt.build

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 | flow_dissector: Drop BPF flow dissector prog ref on netns cleanup | Jakub Sitnicki
When attaching a flow dissector program to a network namespace with bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog. If netns gets destroyed while a flow dissector is still attached, and there are no other references to the prog, we leak the reference and the program remains loaded.

Leak can be reproduced by running flow dissector tests from selftests/bpf:

  # bpftool prog list
  # ./test_flow_dissector.sh
  ...
  selftests: test_flow_dissector [PASS]
  # bpftool prog list
  4: flow_dissector  name _dissect  tag e314084d332a5338  gpl
          loaded_at 2020-05-20T18:50:53+0200  uid 0
          xlated 552B  jited 355B  memlock 4096B  map_ids 3,4
          btf_id 4
  #

Fix it by detaching the flow dissector program when netns is going away.

Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20200521083435.560256-1-jakub@cloudflare.com
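The leak and its fix can be modelled with an explicit reference count. This Python sketch is purely illustrative: `Prog`, `Netns` and the `detach_on_exit` switch are hypothetical, with the switch existing only so that both the leaking and the fixed behaviour can be shown side by side.

```python
class Prog:
    """Toy program object with an explicit reference count."""

    def __init__(self):
        self.refcnt = 1   # reference held by whoever loaded the program

    def get(self):
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1

    @property
    def loaded(self):
        return self.refcnt > 0


class Netns:
    """Toy network namespace with one flow dissector attachment slot."""

    def __init__(self):
        self.flow_dissector = None

    def attach(self, prog):
        prog.get()                    # attaching takes a reference
        self.flow_dissector = prog

    def detach(self):
        if self.flow_dissector is not None:
            self.flow_dissector.put() # detaching drops it again
            self.flow_dissector = None

    def destroy(self, detach_on_exit=True):
        # With detach_on_exit=False the attachment reference is never
        # dropped, which models the leak described in the commit message.
        if detach_on_exit:
            self.detach()
```

After the loader drops its own reference, only the detach on namespace teardown brings the count to zero and lets the program go away.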
2020-05-21 | Merge branch 'improve-branch_taken' | Alexei Starovoitov
John Fastabend says:

====================
This series adds logic to the verifier to handle the case where a pointer is known to be non-null but then the verifier encounters an instruction, such as 'if ptr == 0 goto X' or 'if ptr != 0 goto X', where the pointer is compared against 0. Because the verifier tracks if a pointer may be null in many cases we can improve the branch tracking by following the case known to be true.

The first patch adds the verifier logic and patches 2-4 add the test cases.

v1->v2: fix verifier logic to return -1 indicating both paths must still be walked if value is not zero. Move mark_precision skip for this case into caller of mark_precision to ensure mark_precision can still catch other misuses. And add PTR_TYPE_BTF_ID to our list of supported types. Finally add one more test to catch the value not equal zero case. Thanks to Andrii for original review.

Also fixed up commit messages; hopefully it's better now.
====================

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-05-21 | bpf: Selftests, add printk to test_sk_lookup_kern to encode null ptr check | John Fastabend
Adding a printk to test_sk_lookup_kern created the reported failure where a pointer type is checked twice for NULL. Let's add it to the progs test test_sk_lookup_kern.c so we test the case from C all the way into the verifier. We already have printk's in selftests so seems OK to add another one. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/159009170603.6313.1715279795045285176.stgit@john-Precision-5820-Tower
2020-05-21 | bpf: Selftests, verifier case for non null pointer map value branch | John Fastabend
When we have pointer type that is known to be non-null we only follow the non-null branch. This adds tests to cover the map_value pointer returned from a map lookup. To force an error if both branches are followed we do an ALU op on R10. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/159009168650.6313.7434084136067263554.stgit@john-Precision-5820-Tower
2020-05-21 | bpf: Selftests, verifier case for non null pointer check branch taken | John Fastabend
When we have pointer type that is known to be non-null and comparing against zero we only follow the non-null branch. This adds tests to cover this case for reference tracking. Also add the other case when comparison against a non-zero value and ensure we still fail with unreleased reference. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/159009166599.6313.1593680633787453767.stgit@john-Precision-5820-Tower
2020-05-21 | bpf: Verifier track null pointer branch_taken with JNE and JEQ | John Fastabend
Currently, when considering the branches that may be taken for a jump instruction if the register being compared is a pointer the verifier assumes both branches may be taken. But, if the jump instruction is comparing if a pointer is NULL we have this information in the verifier encoded in the reg->type so we can do better in these cases. Specifically, these two common cases can be handled. * If the instruction is BPF_JEQ and we are comparing against a zero value. This test is 'if ptr == 0 goto +X' then using the type information in reg->type we can decide if the ptr is not null. This allows us to avoid pushing both branches onto the stack and instead only use the != 0 case. For example PTR_TO_SOCK and PTR_TO_SOCK_OR_NULL encode the null pointer. Note if the type is PTR_TO_SOCK_OR_NULL we can not learn anything. And also if the value is non-zero we learn nothing because it could be any arbitrary value a different pointer for example * If the instruction is BPF_JNE and ware comparing against a zero value then a similar analysis as above can be done. The test in asm looks like 'if ptr != 0 goto +X'. Again using the type information if the non null type is set (from above PTR_TO_SOCK) we know the jump is taken. In this patch we extend is_branch_taken() to consider this extra information and to return only the branch that will be taken. This resolves a verifier issue reported with C code like the following. See progs/test_sk_lookup_kern.c in selftests. sk = bpf_sk_lookup_tcp(skb, tuple, tuple_len, BPF_F_CURRENT_NETNS, 0); bpf_printk("sk=%d\n", sk ? 1 : 0); if (sk) bpf_sk_release(sk); return sk ? TC_ACT_OK : TC_ACT_UNSPEC; In the above the bpf_printk() will resolve the pointer from PTR_TO_SOCK_OR_NULL to PTR_TO_SOCK. Then the second test guarding the release will cause the verifier to walk both paths resulting in the an unreleased sock reference. See verifier/ref_tracking.c in selftests for an assembly version of the above. 
After the above additional logic is added the C code above passes as expected. Reported-by: Andrey Ignatov <rdna@fb.com> Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/159009164651.6313.380418298578070501.stgit@john-Precision-5820-Tower
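The two cases described in this commit message can be sketched as a small userspace model. This is not the kernel's is_branch_taken(); the toy_* names, the enums and the return convention (1 = jump always taken, 0 = never taken, -1 = unknown, so both branches must be explored) are assumptions made for illustration only.

```c
#include <stdint.h>

/* Toy stand-ins for verifier register types; NOT the kernel's enum. */
enum toy_reg_type {
	TOY_SCALAR,
	TOY_PTR_TO_SOCK,         /* pointer already known to be non-NULL */
	TOY_PTR_TO_SOCK_OR_NULL, /* pointer that may still be NULL */
};

/* Toy jump opcodes; only JEQ and JNE are interesting here. */
enum toy_jmp_op { TOY_JEQ, TOY_JNE, TOY_JGT };

/*
 * Decide a "ptr <op> imm" jump from type information alone.
 * Returns 1 if the jump is always taken, 0 if it is never taken,
 * and -1 if nothing can be learned.
 */
static int toy_ptr_branch_taken(enum toy_reg_type type, uint64_t imm,
				enum toy_jmp_op op)
{
	/* Only a known non-NULL pointer type tells us anything. */
	if (type != TOY_PTR_TO_SOCK)
		return -1;

	/* A non-zero immediate could be any other pointer value. */
	if (imm != 0)
		return -1;

	switch (op) {
	case TOY_JEQ:
		return 0; /* 'if ptr == 0 goto +X' never jumps */
	case TOY_JNE:
		return 1; /* 'if ptr != 0 goto +X' always jumps */
	default:
		return -1;
	}
}
```

With this model, the second `if (sk)` in the C snippet above corresponds to a JEQ or JNE against zero on a register that the first test already narrowed to the non-NULL type, so only one path needs to be walked.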
2020-05-21Merge branch 'af_xdp-common-alloc'Alexei Starovoitov
Björn Töpel says:

====================
Overview
========

Driver adoption for AF_XDP has been slow. The amount of code required to properly support AF_XDP is substantial and the driver/core APIs are vague or even non-existent. Drivers have to manually adjust data offsets and update AF_XDP handles differently for different modes (aligned/unaligned).

This series attempts to improve the situation by introducing an AF_XDP buffer allocation API. The implementation is based on a single core (single producer/consumer) buffer pool for the AF_XDP UMEM.

A buffer is allocated using the xsk_buff_alloc() function, and returned using xsk_buff_free(). If a buffer is disassociated from the pool, e.g. when a buffer is passed to an AF_XDP socket, the buffer is said to be released. Currently, the release function is only used by the AF_XDP internals and not visible to the driver.

Drivers using this API should register the XDP memory model with the new MEM_TYPE_XSK_BUFF_POOL type, which will supersede the MEM_TYPE_ZERO_COPY type.

The buffer type is struct xdp_buff, and follows the lifetime of regular xdp_buffs, i.e. the lifetime of an xdp_buff is restricted to a NAPI context. In other words, the API is not replacing xdp_frames.

DMA mapping/synching is folded into the buffer handling as well.

@JeffK The Intel drivers changes should go through the bpf-next tree, and not your regular Intel tree, since multiple (non-Intel) drivers are affected.

The outline of the series is as follows:

Patch 1 is a fix for xsk_umem_xdp_frame_sz().

Patches 2 to 4 are restructures/clean ups. The XSKMAP implementation is moved to net/xdp/. Functions/defines/enums that are only used by the AF_XDP internals are moved from the global include/net/xdp_sock.h to net/xdp/xsk.h. We are also introducing a new "driver include file", include/net/xdp_sock_drv.h, which is the only file NIC driver developers adding AF_XDP zero-copy support should care about.

Patch 5 adds the new API, and migrates the "copy-mode"/skb-mode AF_XDP path to the new API.

Patches 6 to 11 migrate the existing zero-copy drivers to the new API.

Patch 12 removes the MEM_TYPE_ZERO_COPY memory type, and the "handle" member of struct xdp_buff.

Patch 13 simplifies the xdp_return_{frame,frame_rx_napi,buff} functions.

Patch 14 is a performance patch, where some functions are inlined.

Finally, patch 15 updates the MAINTAINERS file to correctly mirror the new file layout.

Note that this series removes the "handle" member from struct xdp_buff, which reduces the xdp_buff size.

After this series, the diff stat of drivers/net/ is:
  27 files changed, 419 insertions(+), 1288 deletions(-)

This series is a first step toward simplifying the driver side of AF_XDP. I think more of the AF_XDP logic can be moved from the drivers to the AF_XDP core, e.g. the "need wakeup" set/clear functionality.

Statistics when allocation fails can now be added to the socket statistics via the XDP_STATISTICS getsockopt(). This will be added in a follow up series.

Performance
===========

As a nice side effect, performance is up a bit as well.

  * i40e: 3% higher pps for rxdrop, zero-copy, aligned and unaligned (40 GbE, 64B packets).
  * mlx5: RX +0.8 Mpps, TX +0.4 Mpps

Changelog
=========

v4->v5:
  * Fix various kdoc and GCC warnings (W=1). (Jakub)

v3->v4:
  * mlx5: Remove unused variable num_xsk_frames. (Jakub)
  * i40e: Made i40e_fd_handle_status() static. (kbuild test robot)

v2->v3:
  * Added xsk_umem_xdp_frame_sz() fix to the series. (Björn)
  * Initialize struct xdp_buff member frame_sz. (Björn)
  * Add API to query the DMA address of a frame. (Maxim)
  * Do DMA sync for CPU till the end of the frame to handle possible growth (frame_sz). (Maxim)
  * mlx5: Handle frame_sz, use xsk_buff_xdp_get_frame_dma, use xsk_buff API for DMA sync on TX, add performance numbers. (Maxim)

v1->v2:
  * mlx5: Fix DMA address handling, set XDP metadata to invalid. (Maxim)
  * ixgbe: Fixed xdp_buff data_end update. (Björn)
  * Swapped SoBs in patch 4. (Maxim)

rfc->v1:
  * Fixed build errors/warnings for m68k and riscv. (kbuild test robot)
  * Added headroom/chunk size getter. (Maxim/Björn)
  * mlx5: Put back the sanity check for XSK params, use XSK API to get the total headroom size. (Maxim)
  * Fixed spelling in commit message. (Björn)
  * Make sure xp_validate_desc() is inlined for Tx perf. (Maxim)
  * Sorted file entries. (Joe)
  * Added xdp_return_{frame,frame_rx_napi,buff} simplification (Björn)

Thanks for all the comments/input/help!
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
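To make the alloc/free lifecycle of a single producer/consumer buffer pool concrete, here is a minimal userspace sketch. It is only a model of the idea described in the cover letter: the toy_* names, the fixed pool and chunk sizes, and the stack-style free list are assumptions for illustration and do not reflect the actual xsk_buff_alloc()/xsk_buff_free() implementation or signatures.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_POOL_SIZE  4    /* number of buffers in the pool (arbitrary) */
#define TOY_CHUNK_SIZE 2048 /* bytes of backing memory per buffer (arbitrary) */

struct toy_pool;

/* One buffer: a pointer into the backing memory plus its owning pool. */
struct toy_buff {
	uint8_t *data;
	struct toy_pool *pool;
};

/* Backing memory, buffer descriptors and a stack of free buffers. */
struct toy_pool {
	uint8_t umem[TOY_POOL_SIZE * TOY_CHUNK_SIZE];
	struct toy_buff buffs[TOY_POOL_SIZE];
	struct toy_buff *free_list[TOY_POOL_SIZE];
	size_t free_cnt;
};

/* Carve the backing memory into chunks and mark every buffer as free. */
static void toy_pool_init(struct toy_pool *p)
{
	for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
		p->buffs[i].data = &p->umem[i * TOY_CHUNK_SIZE];
		p->buffs[i].pool = p;
		p->free_list[i] = &p->buffs[i];
	}
	p->free_cnt = TOY_POOL_SIZE;
}

/* Pop a free buffer, or return NULL when the pool is exhausted.
 * No locking: the model assumes a single producer/consumer. */
static struct toy_buff *toy_buff_alloc(struct toy_pool *p)
{
	if (p->free_cnt == 0)
		return NULL;
	return p->free_list[--p->free_cnt];
}

/* Push a buffer back onto its pool's free stack.
 * The caller must not free the same buffer twice; this is not checked. */
static void toy_buff_free(struct toy_buff *b)
{
	struct toy_pool *p = b->pool;

	p->free_list[p->free_cnt++] = b;
}
```

Because each buffer carries a pointer to its pool, the free path needs only the buffer itself, which is one way a driver-facing API can stay small.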
2020-05-21MAINTAINERS, xsk: Update AF_XDP section after moves/addsBjörn Töpel
Update MAINTAINERS to correctly mirror the current AF_XDP socket file layout. Also, add the AF_XDP files of libbpf. rfc->v1: Sorted file entries. (Joe) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Joe Perches <joe@perches.com> Link: https://lore.kernel.org/bpf/20200520192103.355233-16-bjorn.topel@gmail.com
2020-05-21xsk: Explicitly inline functions and move definitionsBjörn Töpel
In order to reduce the number of function calls, the struct xsk_buff_pool definition is moved to xsk_buff_pool.h. The functions xp_get_dma(), xp_dma_sync_for_cpu(), xp_dma_sync_for_device(), xp_validate_desc() and various helper functions are explicitly inlined. Further, move xp_get_handle() and xp_release() to xsk.c, to allow for the compiler to perform inlining. rfc->v1: Make sure xp_validate_desc() is inlined for Tx perf. (Maxim) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-15-bjorn.topel@gmail.com
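The general pattern behind this change, moving a struct definition into a header so that small accessors can be static inline, can be sketched as follows. The toy_dma_pool layout and toy_get_dma() are hypothetical and are not the kernel's struct xsk_buff_pool or xp_get_dma(); they only illustrate why the full struct definition must be visible for the compiler to inline the accessor.

```c
#include <stdint.h>

/* Hypothetical pool layout. In a real tree this definition would live in a
 * shared header so that every includer sees the fields. */
struct toy_dma_pool {
	const uint64_t *dma_pages; /* DMA address of each backing page */
	unsigned int page_shift;   /* log2 of the page size */
};

/* With the struct visible, the lookup can be a static inline function:
 * the compiler may expand it at each call site instead of emitting a call. */
static inline uint64_t toy_get_dma(const struct toy_dma_pool *p, uint64_t addr)
{
	uint64_t page = addr >> p->page_shift;
	uint64_t off = addr & ((1ULL << p->page_shift) - 1);

	return p->dma_pages[page] + off;
}
```

If the struct were opaque (declared but not defined in the header), the accessor would have to live in a .c file and every use would cost a real function call unless link-time optimization is enabled.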
2020-05-21xdp: Simplify xdp_return_{frame, frame_rx_napi, buff}Björn Töpel
The xdp_return_{frame,frame_rx_napi,buff} functions are never used with the MEM_TYPE_XSK_BUFF_POOL memory type, except in xdp_convert_zc_to_xdp_frame(). To simplify and reduce code, change xdp_convert_zc_to_xdp_frame() so that it calls xsk_buff_free() directly, since the type is known, and remove MEM_TYPE_XSK_BUFF_POOL from the switch statement in the __xdp_return() function. Suggested-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-14-bjorn.topel@gmail.com
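The shape of this simplification can be illustrated with a small userspace model: when the only caller that handles a given memory type already knows that type, it can call the type-specific free function directly, and the generic switch no longer needs a case for it. All toy_* names are hypothetical; this is a sketch of the pattern, not the kernel's __xdp_return() or xdp_convert_zc_to_xdp_frame().

```c
#include <stdlib.h>

/* Toy memory types; NOT the kernel's enum xdp_mem_type. */
enum toy_mem_type { TOY_MEM_HEAP, TOY_MEM_POOL };

struct toy_buf {
	enum toy_mem_type type;
	void *data;
	int *pool_free_cnt; /* only meaningful for pool-backed buffers */
};

/* Type-specific free for pool-backed buffers: here it just counts returns. */
static void toy_pool_buf_free(struct toy_buf *b)
{
	(*b->pool_free_cnt)++;
}

/* Generic return path. After the simplification it has no TOY_MEM_POOL
 * case, because pool-backed buffers never reach it. */
static void toy_generic_return(struct toy_buf *b)
{
	switch (b->type) {
	case TOY_MEM_HEAP:
		free(b->data);
		b->data = NULL;
		break;
	default:
		/* Other types are not handled on this path. */
		break;
	}
}

/* Conversion path where the type is statically known to be pool-backed.
 * A real conversion would first copy the payload out; omitted here. */
static void toy_convert_pool_buf(struct toy_buf *b)
{
	toy_pool_buf_free(b); /* direct call, no dispatch through the switch */
}
```

The benefit is twofold: the generic path loses a branch that was effectively dead, and the conversion path avoids an indirect trip through the dispatcher.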
2020-05-21xsk: Remove MEM_TYPE_ZERO_COPY and corresponding codeBjörn Töpel
There are no users of MEM_TYPE_ZERO_COPY. Remove all corresponding code, including the "handle" member of struct xdp_buff. rfc->v1: Fixed spelling in commit message. (Björn) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-13-bjorn.topel@gmail.com
2020-05-21mlx5, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOLBjörn Töpel
Use the new MEM_TYPE_XSK_BUFF_POOL API in lieu of MEM_TYPE_ZERO_COPY in mlx5e. This allows dropping a lot of code from the driver (code that is now common in the AF_XDP core and was related to XSK RX frame allocation, DMA mapping, etc.) and slightly improves performance (RX +0.8 Mpps, TX +0.4 Mpps). rfc->v1: Put back the sanity check for XSK params, use XSK API to get the total headroom size. (Maxim) v1->v2: Fix DMA address handling, set XDP metadata to invalid. (Maxim) v2->v3: Handle frame_sz, use xsk_buff_xdp_get_frame_dma, use xsk_buff API for DMA sync on TX, add performance numbers. (Maxim) v3->v4: Remove unused variable num_xsk_frames. (Jakub) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-12-bjorn.topel@gmail.com
2020-05-21ixgbe, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOLBjörn Töpel
Remove MEM_TYPE_ZERO_COPY in favor of the new MEM_TYPE_XSK_BUFF_POOL APIs. v1->v2: Fixed xdp_buff data_end update. (Björn) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/bpf/20200520192103.355233-11-bjorn.topel@gmail.com