summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-11-19powerpc: Only include kup-radix.h for 64-bit Book3SMichael Ellerman
In kup.h we currently include kup-radix.h for all 64-bit builds, which includes Book3S and Book3E. The latter doesn't make sense, Book3E never uses the Radix MMU. This has worked up until now, but almost by accident, and the recent uaccess flush changes introduced a build breakage on Book3E because of the bad structure of the code. So disentangle things so that we only use kup-radix.h for Book3S. This requires some more stubs in kup.h and fixing an include in syscall_64.c. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc/64s: flush L1D after user accessesNicholas Piggin
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache after user accesses. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc/64s: flush L1D on kernel entryNicholas Piggin
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache on kernel entry. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19selftests/powerpc: rfi_flush: disable entry flush if presentRussell Currey
We are about to add an entry flush. The rfi (exit) flush test measures the number of L1D flushes over a syscall with the RFI flush enabled and disabled. But if the entry flush is also enabled, the effect of enabling and disabling the RFI flush is masked. If there is a debugfs entry for the entry flush, disable it during the RFI flush and restore it later. Reported-by: Spoorthy S <spoorts2@in.ibm.com> Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-18Merge tag 'linux-can-fixes-for-5.10-20201118' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2020-11-18 Jimmy Assarsson provides two patches for the kvaser_pciefd and kvaser_usb drivers, where the can_bittiming_const are fixed. The next patch is by me and fixes an erroneous flexcan_transceiver_enable() during bus-off recovery in the flexcan driver. Jarkko Nikula's patch for the m_can driver fixes the IRQ handler to only process the interrupts if the device is not suspended. * tag 'linux-can-fixes-for-5.10-20201118' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: m_can: process interrupt only when not runtime suspended can: flexcan: flexcan_chip_start(): fix erroneous flexcan_transceiver_enable() during bus-off recovery can: kvaser_usb: kvaser_usb_hydra: Fix KCAN bittiming limits can: kvaser_pciefd: Fix KCAN bittiming limits ==================== Link: https://lore.kernel.org/r/20201118160414.2731659-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18net/mlx4_core: Fix init_hca fields offsetAya Levin
Slave function read the following capabilities from the wrong offset: 1. log_mc_entry_sz 2. fs_log_entry_sz 3. log_mc_hash_sz Fix that by adjusting these capabilities offset to match firmware layout. Due to the wrong offset read, the following issues might occur: 1+2. Negative value reported at max_mcast_qp_attach. 3. Driver to init FW with multicast hash size of zero. Fixes: a40ded604365 ("net/mlx4_core: Add masking for a few queries on HCA caps") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20201118081922.553-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18Merge tag 'mlx5-fixes-2020-11-17' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2020-11-17 This series introduces some fixes to mlx5 driver. * tag 'mlx5-fixes-2020-11-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: fix error return code in mlx5e_tc_nic_init() net/mlx5: E-Switch, Fail mlx5_esw_modify_vport_rate if qos disabled net/mlx5: Disable QoS when min_rates on all VFs are zero net/mlx5: Clear bw_share upon VF disable net/mlx5: Add handling of port type in rule deletion net/mlx5e: Fix check if netdev is bond slave net/mlx5e: Fix IPsec packet drop by mlx5e_tc_update_skb net/mlx5e: Set IPsec WAs only in IP's non checksum partial case. net/mlx5e: Fix refcount leak on kTLS RX resync ==================== Link: https://lore.kernel.org/r/20201117195702.386113-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18atm: nicstar: Unmap DMA on send errorSebastian Andrzej Siewior
The `skb' is mapped for DMA in ns_send() but does not unmap DMA in case push_scqe() fails to submit the `skb'. The memory of the `skb' is released so only the DMA mapping is leaking. Unmap the DMA mapping in case push_scqe() failed. Fixes: 864a3ff635fa7 ("atm: [nicstar] remove virt_to_bus() and support 64-bit platforms") Cc: Chas Williams <3chas3@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18page_frag: Recover from memory pressureDongli Zhang
The ethernet driver may allocate skb (and skb->data) via napi_alloc_skb(). This ends up to page_frag_alloc() to allocate skb->data from page_frag_cache->va. During the memory pressure, page_frag_cache->va may be allocated as pfmemalloc page. As a result, the skb->pfmemalloc is always true as skb->data is from page_frag_cache->va. The skb will be dropped if the sock (receiver) does not have SOCK_MEMALLOC. This is expected behaviour under memory pressure. However, once kernel is not under memory pressure any longer (suppose large amount of memory pages are just reclaimed), the page_frag_alloc() may still re-use the prior pfmemalloc page_frag_cache->va to allocate skb->data. As a result, the skb->pfmemalloc is always true unless page_frag_cache->va is re-allocated, even if the kernel is not under memory pressure any longer. Here is how kernel runs into issue. 1. The kernel is under memory pressure and allocation of PAGE_FRAG_CACHE_MAX_ORDER in __page_frag_cache_refill() will fail. Instead, the pfmemalloc page is allocated for page_frag_cache->va. 2: All skb->data from page_frag_cache->va (pfmemalloc) will have skb->pfmemalloc=true. The skb will always be dropped by sock without SOCK_MEMALLOC. This is an expected behaviour. 3. Suppose a large amount of pages are reclaimed and kernel is not under memory pressure any longer. We expect skb->pfmemalloc drop will not happen. 4. Unfortunately, page_frag_alloc() does not proactively re-allocate page_frag_alloc->va and will always re-use the prior pfmemalloc page. The skb->pfmemalloc is always true even kernel is not under memory pressure any longer. Fix this by freeing and re-allocating the page instead of recycling it. References: https://lore.kernel.org/lkml/20201103193239.1807-1-dongli.zhang@oracle.com/ References: https://lore.kernel.org/linux-mm/20201105042140.5253-1-willy@infradead.org/ Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Bert Barbe <bert.barbe@oracle.com> Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com> Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Cc: Manjunath Patil <manjunath.b.patil@oracle.com> Cc: Joe Jin <joe.jin@oracle.com> Cc: SRINIVAS <srinivas.eeda@oracle.com> Fixes: 79930f5892e1 ("net: do not deplete pfmemalloc reserve") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201115201029.11903-1-dongli.zhang@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18Merge tag 'gfs2-v5.10-rc4-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: "Fix gfs2 freeze/thaw" * tag 'gfs2-v5.10-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix regression in freeze_go_sync
2020-11-18Merge tag 'nfsd-5.10-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fix from Bruce Fields: "Just one quick fix for a tracing oops" * tag 'nfsd-5.10-2' of git://linux-nfs.org/~bfields/linux: SUNRPC: Fix oops in the rpc_xdr_buf event class
2020-11-18Merge tag 'linux-kselftest-kunit-fixes-5.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kunit fixes from Shuah Khan: "Several fixes to Kunit documentation and tools, and to not pollute the source directory. Also remove the incorrect kunit .gitattributes file" * tag 'linux-kselftest-kunit-fixes-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: fix display of failed expectations for strings kunit: tool: fix extra trailing \n in raw + parsed test output kunit: tool: print out stderr from make (like build warnings) KUnit: Docs: usage: wording fixes KUnit: Docs: style: fix some Kconfig example issues KUnit: Docs: fix a wording typo kunit: Do not pollute source directory with generated files (test.log) kunit: Do not pollute source directory with generated files (.kunitconfig) kunit: tool: fix pre-existing python type annotation errors kunit: Fix kunit.py parse subcommand (use null build_dir) kunit: tool: unmark test_data as binary blobs
2020-11-18net: dsa: mv88e6xxx: Wait for EEPROM done after HW resetAndrew Lunn
When the switch is hardware reset, it reads the contents of the EEPROM. This can contain instructions for programming values into registers and to perform waits between such programming. Reading the EEPROM can take longer than the 100ms mv88e6xxx_hardware_reset() waits after deasserting the reset GPIO. So poll the EEPROM done bit to ensure it is complete. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Ruslan Sushko <rus@sushko.dev> Link: https://lore.kernel.org/r/20201116164301.977661-1-rus@sushko.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18Merge branch 'mlxsw-couple-of-fixes'Jakub Kicinski
Ido Schimmel says: ==================== mlxsw: Couple of fixes Patch #1 fixes firmware flashing when CONFIG_MLXSW_CORE=y and CONFIG_MLXFW=m. Patch #2 prevents EMAD transactions from needlessly failing when the system is under heavy load by using exponential backoff. Please consider patch #2 for stable. ==================== Link: https://lore.kernel.org/r/20201117173352.288491-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18mlxsw: core: Use variable timeout for EMAD retriesIdo Schimmel
The driver sends Ethernet Management Datagram (EMAD) packets to the device for configuration purposes and waits for up to 200ms for a reply. A request is retried up to 5 times. When the system is under heavy load, replies are not always processed in time and EMAD transactions fail. Make the process more robust to such delays by using exponential backoff. First wait for up to 200ms, then retransmit and wait for up to 400ms and so on. Fixes: caf7297e7ab5 ("mlxsw: core: Introduce support for asynchronous EMAD register access") Reported-by: Denis Yulevich <denisyu@nvidia.com> Tested-by: Denis Yulevich <denisyu@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18mlxsw: Fix firmware flashingIdo Schimmel
The commit cited below moved firmware flashing functionality from mlxsw_spectrum to mlxsw_core, but did not adjust the Kconfig dependencies. This makes it possible to have mlxsw_core as built-in and mlxfw as a module. The mlxfw code is therefore not reachable from mlxsw_core and firmware flashing fails: # devlink dev flash pci/0000:01:00.0 file mellanox/mlxsw_spectrum-13.2008.1310.mfa2 devlink answers: Operation not supported Fix by having mlxsw_core select mlxfw. Fixes: b79cb787ac70 ("mlxsw: Move fw flashing code into core.c") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Vadim Pasternak <vadimp@nvidia.com> Tested-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18net: Have netpoll bring-up DSA management interfaceFlorian Fainelli
DSA network devices rely on having their DSA management interface up and running otherwise their ndo_open() will return -ENETDOWN. Without doing this it would not be possible to use DSA devices as netconsole when configured on the command line. These devices also do not utilize the upper/lower linking so the check about the netpoll device having upper is not going to be a problem. The solution adopted here is identical to the one done for net/ipv4/ipconfig.c with 728c02089a0e ("net: ipv4: handle DSA enabled master network devices"), with the network namespace scope being restricted to that of the process configuring netpoll. Fixes: 04ff53f96a93 ("net: dsa: Add netconsole support") Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201117035236.22658-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18atl1e: fix error return code in atl1e_probe()Zhang Changzhong
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: a6a5325239c2 ("atl1e: Atheros L1E Gigabit Ethernet driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1605581875-36281-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18atl1c: fix error return code in atl1c_probe()Zhang Changzhong
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 43250ddd75a3 ("atl1c: Atheros L1C Gigabit Ethernet driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1605581721-36028-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18ah6: fix error return code in ah6_input()Zhang Changzhong
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1605581105-35295-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18net: usb: qmi_wwan: Set DTR quirk for MR400Filip Moc
LTE module MR400 embedded in TL-MR6400 v4 requires DTR to be set. Signed-off-by: Filip Moc <dev@moc6.cz> Acked-by: Bjørn Mork <bjorn@mork.no> Link: https://lore.kernel.org/r/20201117173631.GA550981@moc6.cz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-18regulator: ti-abb: Fix array out of bound read access on the first transitionNishanth Menon
At the start of driver initialization, we do not know what bias setting the bootloader has configured the system for and we only know for certain the very first time we do a transition. However, since the initial value of the comparison index is -EINVAL, this negative value results in an array out of bound access on the very first transition. Since we don't know what the setting is, we just set the bias configuration as there is nothing to compare against. This prevents the array out of bound access. NOTE: Even though we could use a more relaxed check of "< 0" the only valid values(ignoring cosmic ray induced bitflips) are -EINVAL, 0+. Fixes: 40b1936efebd ("regulator: Introduce TI Adaptive Body Bias(ABB) on-chip LDO driver") Link: https://lore.kernel.org/linux-mm/CA+G9fYuk4imvhyCN7D7T6PMDH6oNp6HDCRiTUKMQ6QXXjBa4ag@mail.gmail.com/ Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Nishanth Menon <nm@ti.com> Link: https://lore.kernel.org/r/20201118145009.10492-1-nm@ti.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-11-18can: m_can: process interrupt only when not runtime suspendedJarkko Nikula
Avoid processing bogus interrupt statuses when the HW is runtime suspended and the M_CAN_IR register read may get all bits 1's. Handler can be called if the interrupt request is shared with other peripherals or at the end of free_irq(). Therefore check the runtime suspended status before processing. Fixes: cdf8259d6573 ("can: m_can: Add PM Support") Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Link: https://lore.kernel.org/r/20200915134715.696303-1-jarkko.nikula@linux.intel.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-18gfs2: Fix regression in freeze_go_syncBob Peterson
Patch 541656d3a513 ("gfs2: freeze should work on read-only mounts") changed the check for glock state in function freeze_go_sync() from "gl->gl_state == LM_ST_SHARED" to "gl->gl_req == LM_ST_EXCLUSIVE". That's wrong and it regressed gfs2's freeze/thaw mechanism because it caused only the freezing node (which requests the glock in EX) to queue freeze work. All nodes go through this go_sync code path during the freeze to drop their SHared hold on the freeze glock, allowing the freezing node to acquire it in EXclusive mode. But all the nodes must freeze access to the file system locally, so they ALL must queue freeze work. The freeze_work calls freeze_func, which makes a request to reacquire the freeze glock in SH, effectively blocking until the thaw from the EX holder. Once thawed, the freezing node drops its EX hold on the freeze glock, then the (blocked) freeze_func reacquires the freeze glock in SH again (on all nodes, including the freezer) so all nodes go back to a thawed state. This patch changes the check back to gl_state == LM_ST_SHARED like it was prior to 541656d3a513. Fixes: 541656d3a513 ("gfs2: freeze should work on read-only mounts") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-18can: flexcan: flexcan_chip_start(): fix erroneous ↵Marc Kleine-Budde
flexcan_transceiver_enable() during bus-off recovery If the CAN controller goes into bus off, the do_set_mode() callback with CAN_MODE_START can be used to recover the controller, which then calls flexcan_chip_start(). If configured, this is done automatically by the framework or manually by the user. In flexcan_chip_start() there is an explicit call to flexcan_transceiver_enable(), which does a regulator_enable() on the transceiver regulator. This results in a net usage counter increase, as there is no corresponding flexcan_transceiver_disable() in the bus off code path. This further leads to the transceiver stuck enabled, even if the CAN interface is shut down. To fix this problem the flexcan_transceiver_enable()/flexcan_transceiver_disable() are moved out of flexcan_chip_start()/flexcan_chip_stop() into flexcan_open()/flexcan_close(). Fixes: e955cead0311 ("CAN: Add Flexcan CAN controller driver") Link: https://lore.kernel.org/r/20201118150148.2664024-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-18can: kvaser_usb: kvaser_usb_hydra: Fix KCAN bittiming limitsJimmy Assarsson
Use correct bittiming limits for the KCAN CAN controller. Fixes: aec5fb2268b7 ("can: kvaser_usb: Add support for Kvaser USB hydra family") Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://lore.kernel.org/r/20201115163027.16851-2-jimmyassarsson@gmail.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-18can: kvaser_pciefd: Fix KCAN bittiming limitsJimmy Assarsson
Use correct bittiming limits for the KCAN CAN controller. Fixes: 26ad340e582d ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices") Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://lore.kernel.org/r/20201115163027.16851-1-jimmyassarsson@gmail.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-17ipv4: use IS_ENABLED instead of ifdefFlorian Klink
Checking for ifdef CONFIG_x fails if CONFIG_x=m. Use IS_ENABLED instead, which is true for both built-ins and modules. Otherwise, a > ip -4 route add 1.2.3.4/32 via inet6 fe80::2 dev eth1 fails with the message "Error: IPv6 support not enabled in kernel." if CONFIG_IPV6 is `m`. In the spirit of b8127113d01e53adba15b41aefd37b90ed83d631. Fixes: d15662682db2 ("ipv4: Allow ipv6 gateway with ipv4 routes") Cc: Kim Phillips <kim.phillips@arm.com> Signed-off-by: Florian Klink <flokli@flokli.de> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20201115224509.2020651-1-flokli@flokli.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17qed: fix ILT configuration of SRC blockDmitry Bogdanov
The code refactoring of ILT configuration was not complete, the old unused variables were used for the SRC block. That could lead to the memory corruption by HW when rx filters are configured. This patch completes that refactoring. Fixes: 8a52bbab39c9 (qed: Debug feature: ilt and mdump) Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Link: https://lore.kernel.org/r/20201116132944.2055-1-dbogdanov@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17inet_diag: Fix error path to cancel the meseage in inet_req_diag_fill()Wang Hai
nlmsg_cancel() needs to be called in the error path of inet_req_diag_fill to cancel the message. Fixes: d545caca827b ("net: inet: diag: expose the socket mark to privileged processes.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201116082018.16496-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17tools/testing/scatterlist: Fix test to compile and runMaor Gottlieb
Add missing define of ALIGN_DOWN to make the test build and run. In addition, __sg_alloc_table_from_pages now support unaligned maximum segment, so adapt the test result accordingly. Fixes: 07da1223ec93 ("lib/scatterlist: Add support in dynamic allocation of SG table from pages") Link: https://lore.kernel.org/r/20201115120623.139113-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-18bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_listJohn Fastabend
When skb has a frag_list its possible for skb_to_sgvec() to fail. This happens when the scatterlist has fewer elements to store pages than would be needed for the initial skb plus any of its frags. This case appears rare, but is possible when running an RX parser/verdict programs exposed to the internet. Currently, when this happens we throw an error, break the pipe, and kfree the msg. This effectively breaks the application or forces it to do a retry. Lets catch this case and handle it by doing an skb_linearize() on any skb we receive with frags. At this point skb_to_sgvec should not fail because the failing conditions would require frags to be in place. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/160556576837.73229.14800682790808797635.stgit@john-XPS-13-9370
2020-11-18bpf, sockmap: Handle memory acct if skb_verdict prog redirects to selfJohn Fastabend
If the skb_verdict_prog redirects an skb knowingly to itself, fix your BPF program this is not optimal and an abuse of the API please use SK_PASS. That said there may be cases, such as socket load balancing, where picking the socket is hashed based or otherwise picks the same socket it was received on in some rare cases. If this happens we don't want to confuse userspace giving them an EAGAIN error if we can avoid it. To avoid double accounting in these cases. At the moment even if the skb has already been charged against the sockets rcvbuf and forward alloc we check it again and do set_owner_r() causing it to be orphaned and recharged. For one this is useless work, but more importantly we can have a case where the skb could be put on the ingress queue, but because we are under memory pressure we return EAGAIN. The trouble here is the skb has already been accounted for so any rcvbuf checks include the memory associated with the packet already. This rolls up and can result in unnecessary EAGAIN errors in userspace read() calls. Fix by doing an unlikely check and skipping checks if skb->sk == sk. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/160556574804.73229.11328201020039674147.stgit@john-XPS-13-9370
2020-11-18bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to selfJohn Fastabend
If a socket redirects to itself and it is under memory pressure it is possible to get a socket stuck so that recv() returns EAGAIN and the socket can not advance for some time. This happens because when redirecting a skb to the same socket we received the skb on we first check if it is OK to enqueue the skb on the receiving socket by checking memory limits. But, if the skb is itself the object holding the memory needed to enqueue the skb we will keep retrying from kernel side and always fail with EAGAIN. Then userspace will get a recv() EAGAIN error if there are no skbs in the psock ingress queue. This will continue until either some skbs get kfree'd causing the memory pressure to reduce far enough that we can enqueue the pending packet or the socket is destroyed. In some cases its possible to get a socket stuck for a noticeable amount of time if the socket is only receiving skbs from sk_skb verdict programs. To reproduce I make the socket memory limits ridiculously low so sockets are always under memory pressure. More often though if under memory pressure it looks like a spurious EAGAIN error on user space side causing userspace to retry and typically enough has moved on the memory side that it works. To fix skip memory checks and skb_orphan if receiving on the same sock as already assigned. For SK_PASS cases this is easy, its always the same socket so we can just omit the orphan/set_owner pair. For backlog cases we need to check skb->sk and decide if the orphan and set_owner pair are needed. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/160556572660.73229.12566203819812939627.stgit@john-XPS-13-9370
2020-11-18bpf, sockmap: Use truesize with sk_rmem_schedule()John Fastabend
We use skb->size with sk_rmem_scheduled() which is not correct. Instead use truesize to align with socket and tcp stack usage of sk_rmem_schedule. Suggested-by: Daniel Borkman <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/160556570616.73229.17003722112077507863.stgit@john-XPS-13-9370
2020-11-18bpf, sockmap: Ensure SO_RCVBUF memory is observed on ingress redirectJohn Fastabend
Fix sockmap sk_skb programs so that they observe sk_rcvbuf limits. This allows users to tune SO_RCVBUF and sockmap will honor them. We can refactor the if(charge) case out in later patches. But, keep this fix to the point. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/160556568657.73229.8404601585878439060.stgit@john-XPS-13-9370
2020-11-18bpf, sockmap: Fix partial copy_page_to_iter so progress can still be madeJohn Fastabend
If copy_page_to_iter() fails or even partially completes, but with fewer bytes copied than expected we currently reset sg.start and return EFAULT. This proves problematic if we already copied data into the user buffer before we return an error. Because we leave the copied data in the user buffer and fail to unwind the scatterlist so kernel side believes data has been copied and user side believes data has _not_ been received. Expected behavior should be to return number of bytes copied and then on the next read we need to return the error assuming its still there. This can happen if we have a copy length spanning multiple scatterlist elements and one or more complete before the error is hit. The error is rare enough though that my normal testing with server side programs, such as nginx, httpd, envoy, etc., I have never seen this. The only reliable way to reproduce that I've found is to stream movies over my browser for a day or so and wait for it to hang. Not very scientific, but with a few extra WARN_ON()s in the code the bug was obvious. When we review the errors from copy_page_to_iter() it seems we are hitting a page fault from copy_page_to_iter_iovec() where the code checks fault_in_pages_writeable(buf, copy) where buf is the user buffer. It also seems typical server applications don't hit this case. The other way to try and reproduce this is run the sockmap selftest tool test_sockmap with data verification enabled, but it doesn't reproduce the fault. Perhaps we can trigger this case artificially somehow from the test tools. I haven't sorted out a way to do that yet though. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/160556566659.73229.15694973114605301063.stgit@john-XPS-13-9370
2020-11-17net/tls: Fix wrong record sn in async mode of device resyncTariq Toukan
In async_resync mode, we log the TCP seq of records until the async request is completed. Later, in case one of the logged seqs matches the resync request, we return it, together with its record serial number. Before this fix, we mistakenly returned the serial number of the current record instead. Fixes: ed9b7646b06a ("net/tls: Add asynchronous resync") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Link: https://lore.kernel.org/r/20201115131448.2702-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17netdevsim: set .owner to THIS_MODULETaehee Yoo
If THIS_MODULE is not set, the module would be removed while debugfs is being used. It eventually makes kernel panic. Fixes: 82c93a87bf8b ("netdevsim: implement couple of testing devlink health reporters") Fixes: 424be63ad831 ("netdevsim: add UDP tunnel port offload support") Fixes: 4418f862d675 ("netdevsim: implement support for devlink region and snapshots") Fixes: d3cbb907ae57 ("netdevsim: add ACL trap reporting cookie as a metadata") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20201115103041.30701-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17enetc: Workaround for MDIO register access issueAlex Marginean
Due to a hardware issue, an access to MDIO registers that is concurrent with other ENETC register accesses may lead to the MDIO access being dropped or corrupted. The workaround introduces locking for all register accesses to the ENETC register space. To reduce performance impact, a readers-writers locking scheme has been implemented. The writer in this case is the MDIO access code (irrelevant whether that MDIO access is a register read or write), and the reader is any access code to non-MDIO ENETC registers. Also, the datapath functions acquire the read lock fewer times and use _hot accessors. All the rest of the code uses the _wa accessors which lock every register access. The commit introducing MDIO support is - commit ebfcb23d62ab ("enetc: Add ENETC PF level external MDIO support") but due to subsequent refactoring this patch is applicable on top of a later commit. Fixes: 6517798dd343 ("enetc: Make MDIO accessors more generic and export to include/linux/fsl") Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20201112182608.26177-1-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "A fix for use-after-free in the Sun keyboard driver, a fix to firmware updates on newer ICs in the Elan touchpad diver, and a couple misc driver fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: elan_i2c - fix firmware update on newer ICs Input: resistive-adc-touch - fix kconfig dependency on IIO_BUFFER Input: sunkbd - avoid use-after-free in teardown paths Input: i8042 - allow insmod to succeed on devices without an i8042 controller Input: adxl34x - clean up a data type in adxl34x_probe()
2020-11-17net/mlx5: fix error return code in mlx5e_tc_nic_init()Wang Hai
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: aedd133d17bc ("net/mlx5e: Support CT offload for tc nic flows") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-11-17net/mlx5: E-Switch, Fail mlx5_esw_modify_vport_rate if qos disabledEli Cohen
Avoid calling mlx5_esw_modify_vport_rate() if qos is not enabled and avoid unnecessary syndrome messages from firmware. Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-11-17net/mlx5: Disable QoS when min_rates on all VFs are zeroVladyslav Tarasiuk
Currently when QoS is enabled for VF and any min_rate is configured, the driver sets bw_share value to at least 1 and doesn’t allow to set it to 0 to make minimal rate unlimited. It means there is always a minimal rate configured for every VF, even if user tries to remove it. In order to make QoS disable possible, check whether all vports have configured min_rate = 0. If this is true, set their bw_share to 0 to disable min_rate limitations. Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate") Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-11-17net/mlx5: Clear bw_share upon VF disableVladyslav Tarasiuk
Currently, if user disables VFs with some min and max rates configured, they are cleared. But QoS data is not cleared and restored upon next VF enable placing limits on minimal rate for given VF, when user expects none. To match cleared vport->info struct with QoS-related min and max rates upon VF disable, clear vport->qos struct too. Fixes: 556b9d16d3f5 ("net/mlx5: Clear VF's configuration on disabling SRIOV") Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-11-17net/mlx5: Add handling of port type in rule deletionMichael Guralnik
Handle destruction of rules with port destination type to enable full destruction of flow. Without this handling of TX rules the deletion of these rules fails. Dmesg of flow destruction failure: [ 203.714146] mlx5_core 0000:00:0b.0: mlx5_cmd_check:753:(pid 342): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x144b7a) [ 210.547387] ------------[ cut here ]------------ [ 210.548663] refcount_t: decrement hit 0; leaking memory. [ 210.550651] WARNING: CPU: 4 PID: 342 at lib/refcount.c:31 refcount_warn_saturate+0x5c/0x110 [ 210.550654] Modules linked in: mlx5_ib mlx5_core ib_ipoib rdma_ucm rdma_cm iw_cm ib_cm ib_umad ib_uverbs ib_core [ 210.550675] CPU: 4 PID: 342 Comm: test Not tainted 5.8.0-rc2+ #116 [ 210.550678] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 210.550680] RIP: 0010:refcount_warn_saturate+0x5c/0x110 [ 210.550685] Code: c6 d1 1b 01 00 0f 84 ad 00 00 00 5b 5d c3 80 3d b5 d1 1b 01 00 75 f4 48 c7 c7 20 d1 15 82 c6 05 a5 d1 1b 01 01 e8 a7 eb af ff <0f> 0b eb dd 80 3d 99 d1 1b 01 00 75 d4 48 c7 c7 c0 cf 15 82 c6 05 [ 210.550687] RSP: 0018:ffff8881642e77e8 EFLAGS: 00010282 [ 210.550691] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000 [ 210.550694] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102c85ceef [ 210.550696] RBP: ffff888161720428 R08: ffffffff8124c10e R09: ffffed103243beae [ 210.550698] R10: ffff8881921df56b R11: ffffed103243bead R12: ffff8881841b4180 [ 210.550701] R13: ffff888161720428 R14: ffff8881616d0000 R15: ffff888161720380 [ 210.550704] FS: 00007fc27f025740(0000) GS:ffff888192000000(0000) knlGS:0000000000000000 [ 210.550706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 210.550708] CR2: 0000557e4b41a6a0 CR3: 0000000002415004 CR4: 0000000000360ea0 [ 210.550711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 210.550713] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 210.550715] Call Trace: [ 210.550717] mlx5_del_flow_rules+0x484/0x490 [mlx5_core] [ 210.550720] ? mlx5_cmd_set_fte+0xa80/0xa80 [mlx5_core] [ 210.550722] mlx5_ib_destroy_flow+0x17f/0x280 [mlx5_ib] [ 210.550724] uverbs_free_flow+0x4c/0x90 [ib_uverbs] [ 210.550726] destroy_hw_idr_uobject+0x41/0xb0 [ib_uverbs] [ 210.550728] uverbs_destroy_uobject+0xaa/0x390 [ib_uverbs] [ 210.550731] __uverbs_cleanup_ufile+0x129/0x1b0 [ib_uverbs] [ 210.550733] ? uverbs_destroy_uobject+0x390/0x390 [ib_uverbs] [ 210.550735] uverbs_destroy_ufile_hw+0x78/0x190 [ib_uverbs] [ 210.550737] ib_uverbs_close+0x36/0x140 [ib_uverbs] [ 210.550739] __fput+0x181/0x380 [ 210.550741] task_work_run+0x88/0xd0 [ 210.550743] do_exit+0x5f6/0x13b0 [ 210.550745] ? sched_clock_cpu+0x30/0x140 [ 210.550747] ? is_current_pgrp_orphaned+0x70/0x70 [ 210.550750] ? lock_downgrade+0x360/0x360 [ 210.550752] ? mark_held_locks+0x1d/0x90 [ 210.550754] do_group_exit+0x8a/0x140 [ 210.550756] get_signal+0x20a/0xf50 [ 210.550758] do_signal+0x8c/0xbe0 [ 210.550760] ? hrtimer_nanosleep+0x1d8/0x200 [ 210.550762] ? nanosleep_copyout+0x50/0x50 [ 210.550764] ? restore_sigcontext+0x320/0x320 [ 210.550766] ? __hrtimer_init+0xf0/0xf0 [ 210.550768] ? timespec64_add_safe+0x150/0x150 [ 210.550770] ? mark_held_locks+0x1d/0x90 [ 210.550772] ? lockdep_hardirqs_on_prepare+0x14c/0x240 [ 210.550774] __prepare_exit_to_usermode+0x119/0x170 [ 210.550776] do_syscall_64+0x65/0x300 [ 210.550778] ? trace_hardirqs_off+0x10/0x120 [ 210.550781] ? mark_held_locks+0x1d/0x90 [ 210.550783] ? asm_sysvec_apic_timer_interrupt+0xa/0x20 [ 210.550785] ? lockdep_hardirqs_on+0x112/0x190 [ 210.550787] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 210.550789] RIP: 0033:0x7fc27f1cd157 [ 210.550791] Code: Bad RIP value. [ 210.550793] RSP: 002b:00007ffd4db27ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000023 [ 210.550798] RAX: fffffffffffffdfc RBX: ffffffffffffff80 RCX: 00007fc27f1cd157 [ 210.550800] RDX: 00007fc27f025740 RSI: 00007ffd4db27eb0 RDI: 00007ffd4db27eb0 [ 210.550803] RBP: 0000000000000016 R08: 0000000000000000 R09: 000000000000000e [ 210.550805] R10: 00007ffd4db27dc7 R11: 0000000000000246 R12: 0000000000400c00 [ 210.550808] R13: 00007ffd4db285f0 R14: 0000000000000000 R15: 0000000000000000 [ 210.550809] irq event stamp: 49399 [ 210.550812] hardirqs last enabled at (49399): [<ffffffff81172d36>] console_unlock+0x556/0x6f0 [ 210.550815] hardirqs last disabled at (49398): [<ffffffff81172897>] console_unlock+0xb7/0x6f0 [ 210.550818] softirqs last enabled at (48706): [<ffffffff81e0037b>] __do_softirq+0x37b/0x60c [ 210.550820] softirqs last disabled at (48697): [<ffffffff81c00e2f>] asm_call_on_stack+0xf/0x20 [ 210.550822] ---[ end trace ad18c0e6fa846454 ]--- [ 210.581862] mlx5_core 0000:00:0c.0: mlx5_destroy_flow_table:2132:(pid 342): Flow table 262150 wasn't destroyed, refcount > 1 Fixes: a7ee18bdee83 ("RDMA/mlx5: Allow creating a matcher for a NIC TX flow table") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-11-17net/mlx5e: Fix check if netdev is bond slaveMaor Dickman
Bond events handler uses bond_slave_get_rtnl to check if net device is bond slave. bond_slave_get_rtnl return the rcu rx_handler pointer from the netdev which exists for bond slaves but also exists for devices that are attached to linux bridge so using it as indication for bond slave is wrong. Fix by using netif_is_lag_port instead. Fixes: 7e51891a237f ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-11-17net/mlx5e: Fix IPsec packet drop by mlx5e_tc_update_skbHuy Nguyen
Both TC and IPsec crypto offload use metadata_regB to store private information. Since TC does not use bit 31 of regB, IPsec will use bit 31 as the IPsec packet marker. The IPsec's regB usage is changed to: Bit31: IPsec marker Bit30-24: IPsec syndrome Bit23-0: IPsec obj id Fixes: b2ac7541e377 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-11-17net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.Huy Nguyen
The IP's checksum partial still requires L4 csum flag on Ethernet WQE. Make the IPsec WAs only for the IP's non checksum partial case (for example icmd packet) Fixes: 5be019040cb7 ("net/mlx5e: IPsec: Add Connect-X IPsec Tx data path offload") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Alaa Hleihel <alaa@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-11-17net/mlx5e: Fix refcount leak on kTLS RX resyncMaxim Mikityanskiy
On resync, the driver calls inet_lookup_established (__inet6_lookup_established) that increases sk_refcnt of the socket. To decrease it, the driver set skb->destructor to sock_edemux. However, it didn't work well, because the TCP stack also sets this destructor for early demux, and the refcount gets decreased only once, while increased two times (in mlx5e and in the TCP stack). It leads to a socket leak, a TLS context leak, which in the end leads to calling tls_dev_del twice: on socket close and on driver unload, which in turn leads to a crash. This commit fixes the refcount leak by calling sock_gen_put right away after using the socket, thus fixing all the subsequent issues. Fixes: 0419d8c9d8f8 ("net/mlx5e: kTLS, Add kTLS RX resync support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>