linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2021-06-28	bnxt_en: Transmit and retrieve packet timestamps	Pavan Chebbi
	Setup the TXBD to enable TX timestamp if requested. At TX packet DMA completion, if we requested TX timestamp on that packet, we defer to .do_aux_work() to obtain the TX timestamp from the firmware before we free the TX SKB. v2: Use .do_aux_work() to get the TX timestamp from firmware. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28	bnxt_en: Get the RX packet timestamp	Pavan Chebbi
	If the RX packet is timestamped by the hardware, the RX completion record will contain the lower 32-bit of the timestamp. This needs to be combined with the upper 16-bit of the periodic timestamp that we get from the timer. The previous snapshot in ptp->old_timer is used to make sure that the snapshot is not ahead of the RX timestamp and we adjust for wrap-around if needed. v2: Make ptp->old_time read access safe on 32-bit CPUs. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28	bnxt_en: Get the full 48-bit hardware timestamp periodically	Pavan Chebbi
	From the bnxt_timer(), read the 48-bit hardware running clock periodically and store it in ptp->current_time. The previous snapshot of the clock will be stored in ptp->old_time. The old_time snapshot will be used in the next patches to compute the RX packet timestamps. v2: Use .do_aux_work() to read the timer periodically. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28	bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods	Michael Chan
	Add the clock APIs to set/get/adjust the hw clock, and the related ioctls and ethtool methods. v2: Propagate error code from ptp_clock_register(). Add spinlock to serialize access to the timecounter. The timecounter is accessed in process context and the RX datapath. Read the PHC using direct registers. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28	bnxt_en: Get PTP hardware capability from firmware	Michael Chan
	Store PTP hardware info in a structure if hardware and firmware support PTP. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28	bnxt_en: Update firmware interface to 1.10.2.47	Michael Chan
	Adding the PTP related firmware interface is the main change. There is also a name change for admin_mtu, requiring code fixup. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28	net: hns3: add support for dumping MAC umv counter in debugfs	Jian Shen
	This patch adds support of dumping MAC umv counter in debugfs, which will be helpful for debugging. The display style is below: $ cat umv_info num_alloc_vport : 2 max_umv_size : 256 wanted_umv_size : 256 priv_umv_size : 85 share_umv_size : 86 vport(0) used_umv_num : 1 vport(1) used_umv_num : 1 Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28	net: hns3: add support for FD counter in debugfs	Jian Shen
	Previously, the flow director counter is not enabled. To improve the maintainability for chechking whether flow director hit or not, enable flow director counter for each function, and add debugfs query inerface to query the counters for each function. The debugfs command is below: cat fd_counter func_id hit_times pf 0 vf0 0 vf1 0 Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28	Merge tag 'sched-core-2021-06-28' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler udpates from Ingo Molnar: - Changes to core scheduling facilities: - Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables coordinated scheduling across SMT siblings. This is a much requested feature for cloud computing platforms, to allow the flexible utilization of SMT siblings, without exposing untrusted domains to information leaks & side channels, plus to ensure more deterministic computing performance on SMT systems used by heterogenous workloads. There are new prctls to set core scheduling groups, which allows more flexible management of workloads that can share siblings. - Fix task->state access anti-patterns that may result in missed wakeups and rename it to ->__state in the process to catch new abuses. - Load-balancing changes: - Tweak newidle_balance for fair-sched, to improve 'memcache'-like workloads. - "Age" (decay) average idle time, to better track & improve workloads such as 'tbench'. - Fix & improve energy-aware (EAS) balancing logic & metrics. - Fix & improve the uclamp metrics. - Fix task migration (taskset) corner case on !CONFIG_CPUSET. - Fix RT and deadline utilization tracking across policy changes - Introduce a "burstable" CFS controller via cgroups, which allows bursty CPU-bound workloads to borrow a bit against their future quota to improve overall latencies & batching. Can be tweaked via /sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us. - Rework assymetric topology/capacity detection & handling. - Scheduler statistics & tooling: - Disable delayacct by default, but add a sysctl to enable it at runtime if tooling needs it. Use static keys and other optimizations to make it more palatable. - Use sched_clock() in delayacct, instead of ktime_get_ns(). - Misc cleanups and fixes. * tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits) sched/doc: Update the CPU capacity asymmetry bits sched/topology: Rework CPU capacity asymmetry detection sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag psi: Fix race between psi_trigger_create/destroy sched/fair: Introduce the burstable CFS controller sched/uclamp: Fix uclamp_tg_restrict() sched/rt: Fix Deadline utilization tracking during policy change sched/rt: Fix RT utilization tracking during policy change sched: Change task_struct::state sched,arch: Remove unused TASK_STATE offsets sched,timer: Use __set_current_state() sched: Add get_current_state() sched,perf,kvm: Fix preemption condition sched: Introduce task_is_running() sched: Unbreak wakeups sched/fair: Age the average idle time sched/cpufreq: Consider reduced CPU capacity in energy calculation sched/fair: Take thermal pressure into account while estimating energy thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure sched/fair: Return early from update_tg_cfs_load() if delta == 0 ...
2021-06-26	net/mlx5e: Add IPsec support to uplink representor	Raed Salem
	Add the xfrm xdo and ipsec_init/cleanup to uplink representor to support IPsec in SRIOV switchdev mode. Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Huy Nguyen <huyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-26	net/mlx5e: kTLS, Add stats for number of deleted kTLS TX offloaded connections	Tariq Toukan
	Expose ethtool SW counter for the number of kTLS device-offloaded TX connections that are finished and deleted. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-26	net/mlx5: SF, Improve performance in SF allocation	Eli Cohen
	Avoid second traversal on the SF table by recording the first free entry and using it in case the looked up entry was not found in the table. Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-26	net/mlx5: Increase hairpin buffer size	Ariel Levkovich
	The max packet size a hairpin queue is able to handle is determined by the total hairpin buffer size divided by 4. Currently the buffer size is set to 32KB which makes the max packet size to be 8KB and doesn't support jumbo frames of size 9KB. This change increases the buffer size to 64KB to increase the max frame size and support 9KB frames. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-26	net/mlx5: DR, Add support for flow sampler offload	Yevgeny Kliteynik
	Add SW steering support for sFlow / flow sampler action. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-26	net/mlx5: Compare sampler flow destination ID in fs_core	Yevgeny Kliteynik
	When comparing sampler flow destinations, in fs_core, consider sampler ID as well. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-25	Merge branch '100GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-06-25 This series contains updates to ice driver only. Jesse adds support for tracepoints to aide in debugging. Maciej adds support for PTP auxiliary pin support. Victor removes the VSI info from the old aggregator when moving the VSI to another aggregator. Tony removes an unnecessary VSI assignment. Christophe Jaillet fixes a memory leak for failed allocation in ice_pf_dcb_cfg(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-25	net: mdiobus: withdraw fwnode_mdbiobus_register	Marcin Wojtas
	The newly implemented fwnode_mdbiobus_register turned out to be problematic - in case the fwnode_/of_/acpi_mdio are built as modules, a dependency cycle can be observed during the depmod phase of modules_install, eg.: depmod: ERROR: Cycle detected: fwnode_mdio -> of_mdio -> fwnode_mdio depmod: ERROR: Found 2 modules in dependency cycles! OR: depmod: ERROR: Cycle detected: acpi_mdio -> fwnode_mdio -> acpi_mdio depmod: ERROR: Found 2 modules in dependency cycles! A possible solution could be to rework fwnode_mdiobus_register, so that to merge the contents of acpi_mdiobus_register and of_mdiobus_register. However feasible, such change would be very intrusive and affect huge amount of the of_mdiobus_register users. Since there are currently 2 users of ACPI and MDIO (xgmac_mdio and mvmdio), withdraw the fwnode_mdbiobus_register and roll back to a simple 'if' condition in affected drivers. Fixes: 62a6ef6a996f ("net: mdiobus: Introduce fwnode_mdbiobus_register()") Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-25	Revert "be2net: disable bh with spin_lock in be_process_mcc"	Petr Oros
	Patch was based on wrong presumption that be_poll can be called only from bh context. It reintroducing old regression (also reverted) and causing deadlock when we use netconsole with benet in bonding. Old revert: commit 072a9c486004 ("netpoll: revert 6bdb7fe3104 and fix be_poll() instead") [ 331.269715] bond0: (slave enp0s7f0): Releasing backup interface [ 331.270121] CPU: 4 PID: 1479 Comm: ifenslave Not tainted 5.13.0-rc7+ #2 [ 331.270122] Call Trace: [ 331.270122] [c00000001789f200] [c0000000008c505c] dump_stack+0x100/0x174 (unreliable) [ 331.270124] [c00000001789f240] [c008000001238b9c] be_poll+0x64/0xe90 [be2net] [ 331.270125] [c00000001789f330] [c000000000d1e6e4] netpoll_poll_dev+0x174/0x3d0 [ 331.270127] [c00000001789f400] [c008000001bc167c] bond_poll_controller+0xb4/0x130 [bonding] [ 331.270128] [c00000001789f450] [c000000000d1e624] netpoll_poll_dev+0xb4/0x3d0 [ 331.270129] [c00000001789f520] [c000000000d1ed88] netpoll_send_skb+0x448/0x470 [ 331.270130] [c00000001789f5d0] [c0080000011f14f8] write_msg+0x180/0x1b0 [netconsole] [ 331.270131] [c00000001789f640] [c000000000230c0c] console_unlock+0x54c/0x790 [ 331.270132] [c00000001789f7b0] [c000000000233098] vprintk_emit+0x2d8/0x450 [ 331.270133] [c00000001789f810] [c000000000234758] vprintk+0xc8/0x270 [ 331.270134] [c00000001789f850] [c000000000233c28] printk+0x40/0x54 [ 331.270135] [c00000001789f870] [c000000000ccf908] __netdev_printk+0x150/0x198 [ 331.270136] [c00000001789f910] [c000000000ccfdb4] netdev_info+0x68/0x94 [ 331.270137] [c00000001789f950] [c008000001bcbd70] __bond_release_one+0x188/0x6b0 [bonding] [ 331.270138] [c00000001789faa0] [c008000001bcc6f4] bond_do_ioctl+0x42c/0x490 [bonding] [ 331.270139] [c00000001789fb60] [c000000000d0d17c] dev_ifsioc+0x17c/0x400 [ 331.270140] [c00000001789fbc0] [c000000000d0db70] dev_ioctl+0x390/0x890 [ 331.270141] [c00000001789fc10] [c000000000c7c76c] sock_do_ioctl+0xac/0x1b0 [ 331.270142] [c00000001789fc90] [c000000000c7ffac] sock_ioctl+0x31c/0x6e0 [ 331.270143] [c00000001789fd60] [c0000000005b9728] sys_ioctl+0xf8/0x150 [ 331.270145] [c00000001789fdb0] [c0000000000336c0] system_call_exception+0x160/0x2f0 [ 331.270146] [c00000001789fe10] [c00000000000d35c] system_call_common+0xec/0x278 [ 331.270147] --- interrupt: c00 at 0x7fffa6c6ec00 [ 331.270147] NIP: 00007fffa6c6ec00 LR: 0000000105c4185c CTR: 0000000000000000 [ 331.270148] REGS: c00000001789fe80 TRAP: 0c00 Not tainted (5.13.0-rc7+) [ 331.270148] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000428 XER: 00000000 [ 331.270155] IRQMASK: 0 [ 331.270156] GPR00: 0000000000000036 00007fffd494d5b0 00007fffa6d57100 0000000000000003 [ 331.270158] GPR04: 0000000000008991 00007fffd494d6d0 0000000000000008 00007fffd494f28c [ 331.270161] GPR08: 0000000000000003 0000000000000000 0000000000000000 0000000000000000 [ 331.270164] GPR12: 0000000000000000 00007fffa6dfa220 0000000000000000 0000000000000000 [ 331.270167] GPR16: 0000000105c44880 0000000000000000 0000000105c60088 0000000105c60318 [ 331.270170] GPR20: 0000000105c602c0 0000000105c44560 0000000000000000 0000000000000000 [ 331.270172] GPR24: 00007fffd494dc50 00007fffd494d6a8 0000000105c60008 00007fffd494d6d0 [ 331.270175] GPR28: 00007fffd494f27e 0000000105c6026c 00007fffd494f284 0000000000000000 [ 331.270178] NIP [00007fffa6c6ec00] 0x7fffa6c6ec00 [ 331.270178] LR [0000000105c4185c] 0x105c4185c [ 331.270179] --- interrupt: c00 This reverts commit d0d006a43e9a7a796f6f178839c92fcc222c564d. Fixes: d0d006a43e9a7a ("be2net: disable bh with spin_lock in be_process_mcc") Signed-off-by: Petr Oros <poros@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-25	ice: Fix a memory leak in an error handling path in 'ice_pf_dcb_cfg()'	Christophe JAILLET
	If this 'kzalloc()' fails we must free some resources as in all the other error handling paths of this function. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-25	ice: remove unnecessary VSI assignment	Tony Nguyen
	ice_get_vf_vsi() is being called twice for the same VSI. Remove the unnecessary call/assignment. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
2021-06-25	ice: remove the VSI info from previous agg	Victor Raj
	Remove the VSI info from previous aggregator after moving the VSI to a new aggregator. Signed-off-by: Victor Raj <victor.raj@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-25	ice: add support for auxiliary input/output pins	Maciej Machnikowski
	The E810 device supports programmable pins for enabling both input and output events related to the PTP hardware clock. This includes both output signals with programmable period, as well as timestamping of events on input pins. Add support for enabling these using the CONFIG_PTP_1588_CLOCK interface. This allows programming the software defined pins to take advantage of the hardware clock features. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-25	gve: Fix swapped vars when fetching max queues	Bailey Forrest
	Fixes: 893ce44df565 ("gve: Add basic driver framework for Compute Engine Virtual NIC") Signed-off-by: Bailey Forrest <bcf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-25	Add Mellanox BlueField Gigabit Ethernet driver	David Thompson
	This patch adds build and driver logic for the "mlxbf_gige" Ethernet driver from Mellanox Technologies. The second generation BlueField SoC from Mellanox supports an out-of-band GigaBit Ethernet management port to the Arm subsystem. This driver supports TCP/IP network connectivity for that port, and provides back-end routines to handle basic ethtool requests. The driver interfaces to the Gigabit Ethernet block of BlueField SoC via MMIO accesses to registers, which contain control information or pointers describing transmit and receive resources. There is a single transmit queue, and the port supports transmit ring sizes of 4 to 256 entries. There is a single receive queue, and the port supports receive ring sizes of 32 to 32K entries. The transmit and receive rings are allocated from DMA coherent memory. There is a 16-bit producer and consumer index per ring to denote software ownership and hardware ownership, respectively. The main driver logic such as probe(), remove(), and netdev ops are in "mlxbf_gige_main.c". Logic in "mlxbf_gige_rx.c" and "mlxbf_gige_tx.c" handles the packet processing for receive and transmit respectively. The logic in "mlxbf_gige_ethtool.c" supports the handling of some basic ethtool requests: get driver info, get ring parameters, get registers, and get statistics. The logic in "mlxbf_gige_mdio.c" is the driver controlling the Mellanox BlueField hardware that interacts with a PHY device via MDIO/MDC pins. This driver does the following: - At driver probe time, it configures several BlueField MDIO parameters such as sample rate, full drive, voltage and MDC - It defines functions to read and write MDIO registers and registers the MDIO bus. - It defines the phy interrupt handler reporting a link up/down status change - This driver's probe is invoked from the main driver logic while the phy interrupt handler is registered in ndo_open. Driver limitations - Only supports 1Gbps speed - Only supports GMII protocol - Supports maximum packet size of 2KB - Does not support scatter-gather buffering Testing - Successful build of kernel for ARM64, ARM32, X86_64 - Tested ARM64 build on FastModels & Palladium - Tested ARM64 build on several Mellanox boards that are built with the BlueField-2 SoC. The testing includes coverage in the areas of networking (e.g. ping, iperf, ifconfig, route), file transfers (e.g. SCP), and various ethtool options relevant to this driver. Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com> Reviewed-by: Liming Sun <limings@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-25	ice: add tracepoints	Jesse Brandeburg
	This patch is modeled after one by Scott Peterson for i40e. Add tracepoints to the driver, via a new file ice_trace.h and some new trace calls added in interesting places in the driver. Add some tracing for DIMLIB to help debug interrupt moderation problems. Performance should not be affected, and this can be very useful for debugging and adding new trace events to paths in the future. Note eBPF programs can attach to these events, as well as perf can count them since we're attaching to the events subsystem in the kernel. Co-developed-by: Ben Shelton <benjamin.h.shelton@intel.com> Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-24	net: bcmgenet: Add mdio-bcm-unimac soft dependency	Jian-Hong Pan
	The Broadcom UniMAC MDIO bus from mdio-bcm-unimac module comes too late. So, GENET cannot find the ethernet PHY on UniMAC MDIO bus. This leads GENET fail to attach the PHY as following log: bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000 ... could not attach to PHY bcmgenet fd580000.ethernet eth0: failed to connect to PHY uart-pl011 fe201000.serial: no DMA platform data libphy: bcmgenet MII bus: probed ... unimac-mdio unimac-mdio.-19: Broadcom UniMAC MDIO bus It is not just coming too late, there is also no way for the module loader to figure out the dependency between GENET and its MDIO bus driver unless we provide this MODULE_SOFTDEP hint. This patch adds the soft dependency to load mdio-bcm-unimac module before genet module to fix this issue. Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=213485 Fixes: 9a4e79697009 ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver") Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	mlxsw: core_env: Avoid unnecessary memcpy()s	Ido Schimmel
	Simply get a pointer to the data in the register payload instead of copying it to a temporary buffer. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Fix warnings reported for DQO patchset	Bailey Forrest
	https://patchwork.kernel.org/project/netdevbpf/list/?series=506637&state=* - Remove unused variable - Use correct integer type for string formatting. - Remove `inline` in C files Fixes: 9c1a59a2f4bc ("gve: DQO: Add ring allocation and initialization") Fixes: a57e5de476be ("gve: DQO: Add TX path") Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	e1000e: Check the PCIm state	Sasha Neftin
	Complete to commit def4ec6dce393e ("e1000e: PCIm function state support") Check the PCIm state only on CSME systems. There is no point to do this check on non CSME systems. This patch fixes a generation a false-positive warning: "Error in exiting dmoff" Fixes: def4ec6dce39 ("e1000e: PCIm function state support") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	Merge branch '40GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-06-24 This series contains updates to i40e driver only. Dinghao Liu corrects error handling for failed i40e_vsi_request_irq() call. Mateusz allows for disabling of autonegotiation for all BaseT media. Jesse corrects the multiplier being used on 5Gb speeds for PTP. Jan adds locking in paths calling i40e_setup_pf_switch() that were missing it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: DQO: Add RX path	Bailey Forrest
	The RX queue has an array of `gve_rx_buf_state_dqo` objects. All allocated pages have an associated buf_state object. When a buffer is posted on the RX buffer queue, the buffer ID will be the buf_state's index into the RX queue's array. On packet reception, the RX queue will have one descriptor for each buffer associated with a received packet. Each RX descriptor will have a buffer_id that was posted on the buffer queue. Notable mentions: - We use a default buffer size of 2048 bytes. Based on page size, we may post separate sections of a single page as separate buffers. - The driver holds an extra reference on pages passed up the receive path with an skb and keeps these pages on a list. When posting new buffers to the NIC, we check if any of these pages has only our reference, or another buffer sized segment of the page has no references. If so, it is free to reuse. This page recycling approach is a common netdev optimization that reduces page alloc/free calls. - Pages in the free list have a page_count bias in order to avoid an atomic increment of pagecount every time we attempt to reuse a page. # references = page_count() - bias - In order to track when a page is safe to reuse, we keep track of the last offset which had a single SKB reference. When this occurs, it implies that every single other offset is reusable. Otherwise, we don't know if offsets can be safely reused. - We maintain two free lists of pages. List #1 (recycled_buf_states) contains pages we know can be reused right away. List #2 (used_buf_states) contains pages which cannot be used right away. We only attempt to get pages from list #2 when list #1 is empty. We only attempt to use a small fixed number pages from list #2 before giving up and allocating a new page. Both lists are FIFOs in hope that by the time we attempt to reuse a page, the references were dropped. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: DQO: Add TX path	Bailey Forrest
	TX SKBs will have their buffers DMA mapped with the device. Each buffer will have at least one TX descriptor associated. Each SKB will also have a metadata descriptor. Each TX queue maintains an array of `gve_tx_pending_packet_dqo` objects. Every TX SKB will have an associated pending_packet object. A TX SKB's descriptors will use its pending_packet's index as the completion tag, which will be returned on the TX completion queue. The device implements a "flow-miss model". Most packets will simply receive a packet completion. The flow-miss system may choose to process a packet based on its contents. A TX packet which experiences a flow miss would receive a miss completion followed by a later reinjection completion. The miss-completion is received when the packet starts to be processed by the flow-miss system and the reinjection completion is received when the flow-miss system completes processing the packet and sends it on the wire. Notable mentions: - Buffers may be freed after receiving the miss-completion, but in order to avoid packet reordering, we do not complete the SKB until receiving the reinjection completion. - The driver must robustly handle the unlikely scenario where a miss completion does not have an associated reinjection completion. This is accomplished by maintaining a list of packets which have a pending reinjection completion. After a short timeout (5 seconds), the SKB and buffers are released and the pending_packet is moved to a second list which has a longer timeout (60 seconds), where the pending_packet will not be reused. When the longer timeout elapses, the driver may assume the reinjection completion would never be received and the pending_packet may be reused. - Completion handling is triggered by an interrupt and is done in the NAPI poll function. Because the TX path and completion exist in different threading contexts they maintain their own lists for free pending_packet objects. The TX path uses a lock-free approach to steal the list from the completion path. - Both the TSO context and general context descriptors have metadata bytes. The device requires that if multiple descriptors contain the same field, each descriptor must have the same value set for that field. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: DQO: Configure interrupts on device up	Bailey Forrest
	When interrupts are first enabled, we also set the ratelimits, which will be static for the entire usage of the device. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: DQO: Add ring allocation and initialization	Bailey Forrest
	Allocate the buffer and completion ring structures. Do not populate the rings yet. That will happen in the respective rx and tx datapath follow-on patches Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: DQO: Add core netdev features	Bailey Forrest
	Add napi netdev device registration, interrupt handling and initial tx and rx polling stubs. The stubs will be filled in follow-on patches. Also: - LRO feature advertisement and handling - Also update ethtool logic Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Update adminq commands to support DQO queues	Bailey Forrest
	DQO queue creation requires additional parameters: - TX completion/RX buffer queue size - TX completion/RX buffer queue address - TX/RX queue size - RX buffer size Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Add DQO fields for core data structures	Bailey Forrest
	- Add new DQO datapath structures: - `gve_rx_buf_queue_dqo` - `gve_rx_compl_queue_dqo` - `gve_rx_buf_state_dqo` - `gve_tx_desc_dqo` - `gve_tx_pending_packet_dqo` - Incorporate these into the existing ring data structures: - `gve_rx_ring` - `gve_tx_ring` Noteworthy mentions: - `gve_rx_buf_state` represents an RX buffer which was posted to HW. Each RX queue has an array of these objects and the index into the array is used as the buffer_id when posted to HW. - `gve_tx_pending_packet_dqo` is treated similarly for TX queues. The completion_tag is the index into the array. - These two structures have links for linked lists which are represented by 16b indexes into a contiguous array of these structures. This reduces memory footprint compared to 64b pointers. - We use unions for the writeable datapath structures to reduce cache footprint. GQI specific members will renamed like DQO members in a future patch. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Add dqo descriptors	Bailey Forrest
	General description of rings and descriptors: TX ring is used for sending TX packet buffers to the NIC. It has the following descriptors: - `gve_tx_pkt_desc_dqo` - Data buffer descriptor - `gve_tx_tso_context_desc_dqo` - TSO context descriptor - `gve_tx_general_context_desc_dqo` - Generic metadata descriptor Metadata is a collection of 12 bytes. We define `gve_tx_metadata_dqo` which represents the logical interpetation of the metadata bytes. It's helpful to define this structure because the metadata bytes exist in multiple descriptor types (including `gve_tx_tso_context_desc_dqo`), and the device requires same field has the same value in all descriptors. The TX completion ring is used to receive completions from the NIC. Having a separate ring allows for completions to be out of order. The completion descriptor `gve_tx_compl_desc` has several different types, most important are packet and descriptor completions. Descriptor completions are used to notify the driver when descriptors sent on the TX ring are done being consumed. The descriptor completion is only used to signal that space is cleared in the TX ring. A packet completion will be received when a packet transmitted on the TX queue is done being transmitted. In addition there are "miss" and "reinjection" completions. The device implements a "flow-miss model". Most packets will simply receive a packet completion. The flow-miss system may choose to process a packet based on its contents. A TX packet which experiences a flow miss would receive a miss completion followed by a later reinjection completion. The miss-completion is received when the packet starts to be processed by the flow-miss system and the reinjection completion is received when the flow-miss system completes processing the packet and sends it on the wire. The RX buffer ring is used to send buffers to HW via the `gve_rx_desc_dqo` descriptor. Received packets are put into the RX queue by the device, which populates the `gve_rx_compl_desc_dqo` descriptor. The RX descriptors refer to buffers posted by the buffer queue. Received buffers may be returned out of order, such as when HW LRO is enabled. Important concepts: - "TX" and "RX buffer" queues, which send descriptors to the device, use MMIO doorbells to notify the device of new descriptors. - "RX" and "TX completion" queues, which receive descriptors from the device, use a "generation bit" to know when a descriptor was populated by the device. The driver initializes all bits with the "current generation". The device will populate received descriptors with the "next generation" which is inverted from the current generation. When the ring wraps, the current/next generation are swapped. - It's the driver's responsibility to ensure that the RX and TX completion queues are not overrun. This can be accomplished by limiting the number of descriptors posted to HW. - TX packets have a 16 bit completion_tag and RX buffers have a 16 bit buffer_id. These will be returned on the TX completion and RX queues respectively to let the driver know which packet/buffer was completed. Bitfields are used to describe descriptor fields. This notation is more concise and readable than shift-and-mask. It is possible because the driver is restricted to little endian platforms. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Add support for DQO RX PTYPE map	Bailey Forrest
	Unlike GQI, DQO RX descriptors do not contain the L3 and L4 type of the packet. L3 and L4 types are necessary in order to set the hash and csum on RX SKBs correctly. DQO RX descriptors instead contain a 10 bit PTYPE index. The PTYPE map enables the device to tell the driver how to map from PTYPE index to L3/L4 type. The device doesn't provide any guarantees about the range of possible PTYPEs, so we just use a 1024 entry array to implement a fast mapping structure. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: adminq: DQO specific device descriptor logic	Bailey Forrest
	- In addition to TX and RX queues, DQO has TX completion and RX buffer queues. - TX completions are received when the device has completed sending a packet on the wire. - RX buffers are posted on a separate queue form the RX completions. - DQO descriptor rings are allowed to be smaller than PAGE_SIZE. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Introduce per netdev `enum gve_queue_format`	Bailey Forrest
	The currently supported queue formats are: - GQI_RDA - GQI with raw DMA addressing - GQI_QPL - GQI with queue page list - DQO_RDA - DQO with raw DMA addressing The old `gve_priv.raw_addressing` value is only used for GQI_RDA, so we remove it in favor of just checking against GQI_RDA Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Introduce a new model for device options	Bailey Forrest
	The current model uses an integer ID and a fixed size struct for the parameters of each device option. The new model allows the device option structs to grow in size over time. A driver may assume that changes to device option structs will always be appended. New device options will also generally have a `supported_features_mask` so that the driver knows which fields within a particular device option are enabled. `gve_device_option.feat_mask` is changed to `required_features_mask`, and it is a bitmask which must match the value expected by the driver. This gives the device the ability to break backwards compatibility with old drivers for certain features by blocking the old drivers from trying to use the feature. We maintain ABI compatibility with the old model for GVE_DEV_OPT_ID_RAW_ADDRESSING in case a driver is using a device which does not support the new model. This patch introduces some new terminology: RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA mapped and read/updated by the device. QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped with the device for read/write and data is copied from/to SKBs. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Make gve_rx_slot_page_info.page_offset an absolute offset	Bailey Forrest
	Using `page_offset` like a boolean means a page may only be split into two sections. With page sizes larger than 4k, this can be very wasteful. Future commits in this patchset use `struct gve_rx_slot_page_info` in a way which supports a fixed buffer size and a variable page size. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: gve_rx_copy: Move padding to an argument	Bailey Forrest
	Future use cases will have a different padding value. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	gve: Move some static functions to a common file	Bailey Forrest
	These functions will be shared by the GQI and DQO variants of the GVNIC driver as of follow-up patches in this series. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: atlantic: fix the macsec key length	Antoine Tenart
	The key length used to store the macsec key was set to MACSEC_KEYID_LEN (16), which is an issue as: - This was never meant to be the key length. - The key length can be > 16. Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in uAPI). Fixes: 27736563ce32 ("net: atlantic: MACSec egress offload implementation") Fixes: 9ff40a751a6f ("net: atlantic: MACSec ingress offload implementation") Reported-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: sparx5: add ethtool configuration and statistics support	Steen Hegelund
	This adds statistic counters for the network interfaces provided by the driver. It also adds CPU port counters (which are not exposed by ethtool). This also adds support for configuring the network interface parameters via ethtool: speed, duplex, aneg etc. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: sparx5: add calendar bandwidth allocation support	Steen Hegelund
	This configures the Sparx5 calendars according to the bandwidth requested in the Device Tree nodes. It also checks if the total requested bandwidth is within the specs of the detected Sparx5 models limits. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: sparx5: add switching support	Steen Hegelund
	This adds SwitchDev support by hardware offloading the software bridge. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24	net: sparx5: add vlan support	Steen Hegelund
	This adds Sparx5 VLAN support. Sparx5 has more VLAN features than provided here, but these will be added in later series. For now we only add the basic L2 features. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>