summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-12-08mlxsw: spectrum_ipip: Add Spectrum-1 ip6gre supportIdo Schimmel
As explained in the previous patch, the existing Spectrum-2 ip6gre implementation can be reused for Spectrum-1. Change the Spectrum-1 ip6gre operations structure to use the common operations. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08mlxsw: spectrum_ipip: Rename Spectrum-2 ip6gre operationsIdo Schimmel
There are two main differences between Spectrum-1 and newer ASICs in terms of IP-in-IP support: 1. In Spectrum-1, RIFs representing ip6gre tunnels require two entries in the RIF table. 2. In Spectrum-2 and newer ASICs, packets ingress the underlay (during encapsulation) and egress the underlay (during decapsulation) via a special generic loopback RIF. The first difference was handled in previous patches by adding the 'double_rif_entry' field to the Spectrum-1 operations structure of ip6gre RIFs. The second difference is handled during RIF creation, by only creating a generic loopback RIF in Spectrum-2 and newer ASICs. Therefore, the ip6gre operations can be shared between Spectrum-1 and newer ASIC in a similar fashion to how the ipgre operations are shared. Rename the operations to not be Spectrum-2 specific and move them earlier in the file so that they could later be used for Spectrum-1. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08mlxsw: spectrum_router: Add support for double entry RIFsIdo Schimmel
In Spectrum-1, loopback router interfaces (RIFs) used for IP-in-IP encapsulation with an IPv6 underlay require two RIF entries and the RIF index must be even. Prepare for this change by extending the RIF parameters structure with a 'double_entry' field that indicates if the RIF being created requires two RIF entries or not. Only set it for RIFs representing ip6gre tunnels in Spectrum-1. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08mlxsw: spectrum_router: Parametrize RIF allocation sizeIdo Schimmel
Currently, each router interface (RIF) consumes one entry in the RIFs table. This is going to change in subsequent patches where some RIFs will consume two table entries. Prepare for this change by parametrizing the RIF allocation size. For now, always pass '1'. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08mlxsw: spectrum_router: Use gen_pool for RIF index allocationIdo Schimmel
Currently, each router interface (RIF) consumes one entry in the RIFs table and there are no alignment constraints. This is going to change in subsequent patches where some RIFs will consume two table entries and their indexes will need to be aligned to the allocation size (even). Prepare for this change by converting the RIF index allocation to use gen_pool with the 'gen_pool_first_fit_order_align' algorithm. No Kconfig changes necessary as mlxsw already selects 'GENERIC_ALLOCATOR'. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08Merge tag 'net-6.1-rc9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, can and netfilter. Current release - new code bugs: - bonding: ipv6: correct address used in Neighbour Advertisement parsing (src vs dst typo) - fec: properly scope IRQ coalesce setup during link up to supported chips only Previous releases - regressions: - Bluetooth fixes for fake CSR clones (knockoffs): - re-add ERR_DATA_REPORTING quirk - fix crash when device is replugged - Bluetooth: - silence a user-triggerable dmesg error message - L2CAP: fix u8 overflow, oob access - correct vendor codec definition - fix support for Read Local Supported Codecs V2 - ti: am65-cpsw: fix RGMII configuration at SPEED_10 - mana: fix race on per-CQ variable NAPI work_done Previous releases - always broken: - af_unix: diag: fetch user_ns from in_skb in unix_diag_get_exact(), avoid null-deref - af_can: fix NULL pointer dereference in can_rcv_filter - can: slcan: fix UAF with a freed work - can: can327: flush TX_work on ldisc .close() - macsec: add missing attribute validation for offload - ipv6: avoid use-after-free in ip6_fragment() - nft_set_pipapo: actually validate intervals in fields after the first one - mvneta: prevent oob access in mvneta_config_rss() - ipv4: fix incorrect route flushing when table ID 0 is used, or when source address is deleted - phy: mxl-gpy: add workaround for IRQ bug on GPY215B and GPY215C" * tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing() s390/qeth: fix use-after-free in hsci macsec: add missing attribute validation for offload net: mvneta: Fix an out of bounds check net: thunderbolt: fix memory leak in tbnet_open() ipv6: avoid use-after-free in ip6_fragment() net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq() net: phy: mxl-gpy: add MDINT workaround net: dsa: mv88e6xxx: accept phy-mode = "internal" for internal PHY ports xen/netback: don't call kfree_skb() under spin_lock_irqsave() dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove() ethernet: aeroflex: fix potential skb leak in greth_init_rings() tipc: call tipc_lxc_xmit without holding node_read_lock can: esd_usb: Allow REC and TEC to return to zero can: can327: flush TX_work on ldisc .close() can: slcan: fix freed work crash can: af_can: fix NULL pointer dereference in can_rcv_filter net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions() ipv4: Fix incorrect route flushing when table ID 0 is used ipv4: Fix incorrect route flushing when source address is deleted ...
2022-12-08Merge branch 'mlx4-better-big-tcp-support'Jakub Kicinski
Eric Dumazet says: ==================== mlx4: better BIG-TCP support mlx4 uses a bounce buffer in TX whenever the tx descriptors wrap around the right edge of the ring. Size of this bounce buffer was hard coded and can be increased if/when needed. ==================== Link: https://lore.kernel.org/r/20221207141237.2575012-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx4: small optimization in mlx4_en_xmit()Eric Dumazet
Test against MLX4_MAX_DESC_TXBBS only matters if the TX bounce buffer is going to be used. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGSEric Dumazet
Google production kernel has increased MAX_SKB_FRAGS to 45 for BIG-TCP rollout. Unfortunately mlx4 TX bounce buffer is not big enough whenever an skb has up to 45 page fragments. This can happen often with TCP TX zero copy, as one frag usually holds 4096 bytes of payload (order-0 page). Tested: Kernel built with MAX_SKB_FRAGS=45 ip link set dev eth0 gso_max_size 185000 netperf -t TCP_SENDFILE I made sure that "ethtool -G eth0 tx 64" was properly working, ring->full_size being set to 15. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Wei Wang <weiwan@google.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx4: rename two constantsEric Dumazet
MAX_DESC_SIZE is really the size of the bounce buffer used when reaching the right side of TX ring buffer. MAX_DESC_TXBBS get a MLX4_ prefix. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08Merge branch 'mlx5-Support-tc-police-jump-conform-exceed-attribute'Jakub Kicinski
Saeed Mahameed says: ==================== Support tc police jump conform-exceed attribute The tc police action conform-exceed option defines how to handle packets which exceed or conform to the configured bandwidth limit. One of the possible conform-exceed values is jump, which skips over a specified number of actions. This series adds support for conform-exceed jump action. The series adds platform support for branching actions by providing true/false flow attributes to the branching action. This is necessary for supporting police jump, as each branch may execute a different action list. The first five patches are preparation patches: - Patches 1 and 2 add support for actions with no destinations (e.g. drop) - Patch 3 refactor the code for subsequent function reuse - Patch 4 defines an abstract way for identifying terminating actions - Patch 5 updates action list validations logic considering branching actions The following three patches introduce an interface for abstracting branching actions: - Patch 6 introduces an abstract api for defining branching actions - Patch 7 generically instantiates the branching flow attributes using the abstract API Patch 8 adds the platform support for jump actions, by executing the following sequence: a. Store the jumping flow attr b. Identify the jump target action while iterating the actions list. c. Instantiate a new flow attribute after the jump target action. This is the flow attribute that the branching action should jump to. d. Set the target post action id on: d.1. The jumping attribute, thus realizing the jump functionality. d.2. The attribute preceding the target jump attr, if not terminating. The next patches apply the platform's branching attributes to the police action: - Patch 9 is a refactor patch - Patch 10 initializes the post meter table with the red/green flow attributes, as were initialized by the platform - Patch 11 enables the offload of meter actions using jump conform-exceed value. ==================== Link: https://lore.kernel.org/all/20221203221337.29267-1-saeed@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx5e: TC, allow meter jump control actionOz Shlomo
Separate the matchall police action validation from flower validation. Isolate the action validation logic in the police action parser. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-12-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx5e: TC, init post meter rules with branching attributesOz Shlomo
Instantiate the post meter actions with the platform initialized branching action attributes. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-11-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx5e: TC, rename post_meter actionsOz Shlomo
Currently post meter supports only the pipe/drop conform-exceed policy. This assumption is reflected in several variable names. Rename the following variables as a pre-step for using the generalized branching action platform. Rename fwd_green_rule/drop_red_rule to green_rule/red_rule respectively. Repurpose red_counter/green_counter to act_counter/drop_counter to allow police conform-exceed configurations that do not drop. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-10-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx5e: TC, initialize branching action with target attrOz Shlomo
Identify the jump target action when iterating the action list. Initialize the jump target attr with the jumping attribute during the parsing phase. Initialize the jumping attr post action with the target during the offload phase. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx5e: TC, initialize branch flow attributesOz Shlomo
Initialize flow attribute for drop, accept, pipe and jump branching actions. Instantiate a flow attribute instance according to the specified branch control action. Store the branching attributes on the branching action flow attribute during the parsing phase. Then, during the offload phase, allocate the relevant mod header objects to the branching actions. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx5e: TC, set control params for branching actionsOz Shlomo
Extend the act tc api to set the branch control params aligning with the police conform/exceed use case. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-7-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx5e: TC, validate action list per attributeOz Shlomo
Currently the entire flow action list is validate for offload limitations. For example, flow with both forward and drop actions are declared invalid due to hardware restrictions. However, a multi-table hardware model changes the limitations from a flow scope to a single flow attribute scope. Apply offload limitations to flow attributes instead of the entire flow. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-6-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx5e: TC, add terminating actionsOz Shlomo
Extend act api to identify actions that terminate action list. Pre-step for terminating branching actions. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx5e: TC, reuse flow attribute post parser processingOz Shlomo
After the tc action parsing phase the flow attribute is initialized with relevant eswitch offload objects such as tunnel, vlan, header modify and counter attributes. The post processing is done both for fdb and post-action attributes. Reuse the flow attribute post parsing logic by both fdb and post-action offloads. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-4-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx5: fs, assert null dest pointer when dest_num is 0Oz Shlomo
Currently create_flow_handle() assumes a null dest pointer when there are no destinations. This might not be the case as the caller may pass an allocated dest array while setting the dest_num parameter to 0. Assert null dest array for flow rules that have no destinations (e.g. drop rule). Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx5e: E-Switch, handle flow attribute with no destinationsOz Shlomo
Rules with drop action are not required to have a destination. Currently the destination list is allocated with the maximum number of destinations and passed to the fs_core layer along with the actual number of destinations. Remove redundant passing of dest pointer when count of dest is 0. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08Merge tag 'for-linus-2022120801' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: "A regression fix for handling Logitech HID++ devices and memory corruption fixes: - regression fix (revert) for catch-all handling of Logitech HID++ Bluetooth devices; there are devices that turn out not to work with this, and the root cause is yet to be properly understood. So we are dropping it for now, and it will be revisited for 6.2 or 6.3 (Benjamin Tissoires) - memory corruption fix in HID core (ZhangPeng) - memory corruption fix in hid-lg4ff (Anastasia Belova) - Kconfig fix for I2C_HID (Benjamin Tissoires) - a few device-id specific quirks that piggy-back on top of the important fixes above" * tag 'for-linus-2022120801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: Revert "HID: logitech-hidpp: Enable HID++ for all the Logitech Bluetooth devices" Revert "HID: logitech-hidpp: Remove special-casing of Bluetooth devices" HID: usbhid: Add ALWAYS_POLL quirk for some mice HID: core: fix shift-out-of-bounds in hid_report_raw_event HID: uclogic: Add HID_QUIRK_HIDINPUT_FORCE quirk HID: fix I2C_HID not selected when I2C_HID_OF_ELAN is HID: hid-lg4ff: Add check for empty lbuf HID: ite: Enable QUIRK_TOUCHPAD_ON_OFF_REPORT on Acer Aspire Switch V 10 HID: uclogic: Fix frame templates for big endian architectures
2022-12-08Merge tag 'soc-fixes-6.1-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fix from Arnd Bergmann: "One last build fix came in, addressing a link failure when building without CONFIG_OUTER_CACHE" * tag 'soc-fixes-6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: ARM: at91: fix build for SAMA5D3 w/o L2 cache
2022-12-08Revert "HID: logitech-hidpp: Enable HID++ for all the Logitech Bluetooth ↵Benjamin Tissoires
devices" This reverts commit 532223c8ac57605a10e46dc0ab23dcf01c9acb43. As reported in [0], hid-logitech-hidpp now binds on all bluetooth mice, but there are corner cases where hid-logitech-hidpp just gives up on the mouse. This leads the end user with a dead mouse. Given that we are at -rc8, we are definitively too late to find a proper fix. We already identified 2 issues less than 24 hours after the bug report. One in that ->match() was never designed to be used anywhere else than in hid-generic, and the other that hid-logitech-hidpp has corner cases where it gives up on devices it is not supposed to. So we have no choice but postpone this patch to the next kernel release. [0] https://lore.kernel.org/linux-input/CAJZ5v0g-_o4AqMgNwihCb0jrwrcJZfRrX=jv8aH54WNKO7QB8A@mail.gmail.com/ Reported-by: Rafael J . Wysocki <rjw@rjwysocki.net> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-12-08Revert "HID: logitech-hidpp: Remove special-casing of Bluetooth devices"Benjamin Tissoires
This reverts commit 8544c812e43ab7bdf40458411b83987b8cba924d. We need to revert commit 532223c8ac57 ("HID: logitech-hidpp: Enable HID++ for all the Logitech Bluetooth devices") because that commit might make hid-logitech-hidpp bind on mice that are not well enough supported by hid-logitech-hidpp, and the end result is that the probe of those mice is now returning -ENODEV, leaving the end user with a dead mouse. Given that commit 8544c812e43a ("HID: logitech-hidpp: Remove special-casing of Bluetooth devices") is a direct dependency of 532223c8ac57, revert it too. Note that this also adapt according to commit 908d325e1665 ("HID: logitech-hidpp: Detect hi-res scrolling support") to re-add support of the devices that were removed from that commit too. I have locally an MX Master and I tested this device with that revert, ensuring we still have high-res scrolling. Reported-by: Rafael J . Wysocki <rjw@rjwysocki.net> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-12-08Merge tag 'loongarch-fixes-6.1-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Export smp_send_reschedule() for modules use, fix a huge page entry update issue, and add documents for booting description" * tag 'loongarch-fixes-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: docs/zh_CN: Add LoongArch booting description's translation docs/LoongArch: Add booting description LoongArch: mm: Fix huge page entry update for virtual machine LoongArch: Export symbol for function smp_send_reschedule()
2022-12-08Merge tag 'for-linus-xsa-6.1-rc9b-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "A single fix for the recent security issue XSA-423" * tag 'for-linus-xsa-6.1-rc9b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/netback: fix build warning
2022-12-08Merge tag 'gpio-fixes-for-v6.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix a memory leak in gpiolib core - fix reference leaks in gpio-amd8111 and gpio-rockchip * tag 'gpio-fixes-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio/rockchip: fix refcount leak in rockchip_gpiolib_register() gpio: amd8111: Fix PCI device reference count leak gpiolib: fix memory leak in gpiochip_setup_dev()
2022-12-08Merge tag 'ata-6.1-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ATA fix from Damien Le Moal: - Avoid a NULL pointer dereference in the libahci platform code that can happen on initialization when a device tree does not specify names for the adapter clocks (from Anders) * tag 'ata-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: libahci_platform: ahci_platform_find_clk: oops, NULL pointer
2022-12-08memcg: Fix possible use-after-free in memcg_write_event_control()Tejun Heo
memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too. Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a call to __file_cft() which verified that the specified file is a regular cgroupfs file before further accesses. The cftype pointer returned from __file_cft() was no longer necessary and the commit inadvertently dropped the file type check with it allowing any file to slip through. With the invarients broken, the d_name and parent accesses can now race against renames and removals of arbitrary files and cause use-after-free's. Fix the bug by resurrecting the file type check in __file_cft(). Now that cgroupfs is implemented through kernfs, checking the file operations needs to go through a layer of indirection. Instead, let's check the superblock and dentry type. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft") Cc: stable@kernel.org # v3.14+ Reported-by: Jann Horn <jannh@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-08net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing()Radu Nicolae Pirea (OSS)
The SJA1105 family has 45 L2 policing table entries (SJA1105_MAX_L2_POLICING_COUNT) and SJA1110 has 110 (SJA1110_MAX_L2_POLICING_COUNT). Keeping the table structure but accounting for the difference in port count (5 in SJA1105 vs 10 in SJA1110) does not fully explain the difference. Rather, the SJA1110 also has L2 ingress policers for multicast traffic. If a packet is classified as multicast, it will be processed by the policer index 99 + SRCPORT. The sja1105_init_l2_policing() function initializes all L2 policers such that they don't interfere with normal packet reception by default. To have a common code between SJA1105 and SJA1110, the index of the multicast policer for the port is calculated because it's an index that is out of bounds for SJA1105 but in bounds for SJA1110, and a bounds check is performed. The code fails to do the proper thing when determining what to do with the multicast policer of port 0 on SJA1105 (ds->num_ports = 5). The "mcast" index will be equal to 45, which is also equal to table->ops->max_entry_count (SJA1105_MAX_L2_POLICING_COUNT). So it passes through the check. But at the same time, SJA1105 doesn't have multicast policers. So the code programs the SHARINDX field of an out-of-bounds element in the L2 Policing table of the static config. The comparison between index 45 and 45 entries should have determined the code to not access this policer index on SJA1105, since its memory wasn't even allocated. With enough bad luck, the out-of-bounds write could even overwrite other valid kernel data, but in this case, the issue was detected using KASAN. Kernel log: sja1105 spi5.0: Probed switch chip: SJA1105Q ================================================================== BUG: KASAN: slab-out-of-bounds in sja1105_setup+0x1cbc/0x2340 Write of size 8 at addr ffffff880bd57708 by task kworker/u8:0/8 ... Workqueue: events_unbound deferred_probe_work_func Call trace: ... sja1105_setup+0x1cbc/0x2340 dsa_register_switch+0x1284/0x18d0 sja1105_probe+0x748/0x840 ... Allocated by task 8: ... sja1105_setup+0x1bcc/0x2340 dsa_register_switch+0x1284/0x18d0 sja1105_probe+0x748/0x840 ... Fixes: 38fbe91f2287 ("net: dsa: sja1105: configure the multicast policers, if present") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Radu Nicolae Pirea (OSS) <radu-nicolae.pirea@oss.nxp.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20221207132347.38698-1-radu-nicolae.pirea@oss.nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08s390/qeth: fix use-after-free in hsciAlexandra Winter
KASAN found that addr was dereferenced after br2dev_event_work was freed. ================================================================== BUG: KASAN: use-after-free in qeth_l2_br2dev_worker+0x5ba/0x6b0 Read of size 1 at addr 00000000fdcea440 by task kworker/u760:4/540 CPU: 17 PID: 540 Comm: kworker/u760:4 Tainted: G E 6.1.0-20221128.rc7.git1.5aa3bed4ce83.300.fc36.s390x+kasan #1 Hardware name: IBM 8561 T01 703 (LPAR) Workqueue: 0.0.8000_event qeth_l2_br2dev_worker Call Trace: [<000000016944d4ce>] dump_stack_lvl+0xc6/0xf8 [<000000016942cd9c>] print_address_description.constprop.0+0x34/0x2a0 [<000000016942d118>] print_report+0x110/0x1f8 [<0000000167a7bd04>] kasan_report+0xfc/0x128 [<000000016938d79a>] qeth_l2_br2dev_worker+0x5ba/0x6b0 [<00000001673edd1e>] process_one_work+0x76e/0x1128 [<00000001673ee85c>] worker_thread+0x184/0x1098 [<000000016740718a>] kthread+0x26a/0x310 [<00000001672c606a>] __ret_from_fork+0x8a/0xe8 [<00000001694711da>] ret_from_fork+0xa/0x40 Allocated by task 108338: kasan_save_stack+0x40/0x68 kasan_set_track+0x36/0x48 __kasan_kmalloc+0xa0/0xc0 qeth_l2_switchdev_event+0x25a/0x738 atomic_notifier_call_chain+0x9c/0xf8 br_switchdev_fdb_notify+0xf4/0x110 fdb_notify+0x122/0x180 fdb_add_entry.constprop.0.isra.0+0x312/0x558 br_fdb_add+0x59e/0x858 rtnl_fdb_add+0x58a/0x928 rtnetlink_rcv_msg+0x5f8/0x8d8 netlink_rcv_skb+0x1f2/0x408 netlink_unicast+0x570/0x790 netlink_sendmsg+0x752/0xbe0 sock_sendmsg+0xca/0x110 ____sys_sendmsg+0x510/0x6a8 ___sys_sendmsg+0x12a/0x180 __sys_sendmsg+0xe6/0x168 __do_sys_socketcall+0x3c8/0x468 do_syscall+0x22c/0x328 __do_syscall+0x94/0xf0 system_call+0x82/0xb0 Freed by task 540: kasan_save_stack+0x40/0x68 kasan_set_track+0x36/0x48 kasan_save_free_info+0x4c/0x68 ____kasan_slab_free+0x14e/0x1a8 __kasan_slab_free+0x24/0x30 __kmem_cache_free+0x168/0x338 qeth_l2_br2dev_worker+0x154/0x6b0 process_one_work+0x76e/0x1128 worker_thread+0x184/0x1098 kthread+0x26a/0x310 __ret_from_fork+0x8a/0xe8 ret_from_fork+0xa/0x40 Last potentially related work creation: kasan_save_stack+0x40/0x68 __kasan_record_aux_stack+0xbe/0xd0 insert_work+0x56/0x2e8 __queue_work+0x4ce/0xd10 queue_work_on+0xf4/0x100 qeth_l2_switchdev_event+0x520/0x738 atomic_notifier_call_chain+0x9c/0xf8 br_switchdev_fdb_notify+0xf4/0x110 fdb_notify+0x122/0x180 fdb_add_entry.constprop.0.isra.0+0x312/0x558 br_fdb_add+0x59e/0x858 rtnl_fdb_add+0x58a/0x928 rtnetlink_rcv_msg+0x5f8/0x8d8 netlink_rcv_skb+0x1f2/0x408 netlink_unicast+0x570/0x790 netlink_sendmsg+0x752/0xbe0 sock_sendmsg+0xca/0x110 ____sys_sendmsg+0x510/0x6a8 ___sys_sendmsg+0x12a/0x180 __sys_sendmsg+0xe6/0x168 __do_sys_socketcall+0x3c8/0x468 do_syscall+0x22c/0x328 __do_syscall+0x94/0xf0 system_call+0x82/0xb0 Second to last potentially related work creation: kasan_save_stack+0x40/0x68 __kasan_record_aux_stack+0xbe/0xd0 kvfree_call_rcu+0xb2/0x760 kernfs_unlink_open_file+0x348/0x430 kernfs_fop_release+0xc2/0x320 __fput+0x1ae/0x768 task_work_run+0x1bc/0x298 exit_to_user_mode_prepare+0x1a0/0x1a8 __do_syscall+0x94/0xf0 system_call+0x82/0xb0 The buggy address belongs to the object at 00000000fdcea400 which belongs to the cache kmalloc-96 of size 96 The buggy address is located 64 bytes inside of 96-byte region [00000000fdcea400, 00000000fdcea460) The buggy address belongs to the physical page: page:000000005a9c26e8 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xfdcea flags: 0x3ffff00000000200(slab|node=0|zone=1|lastcpupid=0x1ffff) raw: 3ffff00000000200 0000000000000000 0000000100000122 000000008008cc00 raw: 0000000000000000 0020004100000000 ffffffff00000001 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: 00000000fdcea300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc 00000000fdcea380: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc >00000000fdcea400: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ^ 00000000fdcea480: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc 00000000fdcea500: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ================================================================== Fixes: f7936b7b2663 ("s390/qeth: Update MACs of LEARNING_SYNC device") Reported-by: Thorsten Winkler <twinkler@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com> Link: https://lore.kernel.org/r/20221207105304.20494-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08macsec: add missing attribute validation for offloadEmeel Hakim
Add missing attribute validation for IFLA_MACSEC_OFFLOAD to the netlink policy. Fixes: 791bb3fcafce ("net: macsec: add support for specifying offload upon link creation") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20221207101618.989-1-ehakim@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net: mvneta: Fix an out of bounds checkDan Carpenter
In an earlier commit, I added a bounds check to prevent an out of bounds read and a WARN(). On further discussion and consideration that check was probably too aggressive. Instead of returning -EINVAL, a better fix would be to just prevent the out of bounds read but continue the process. Background: The value of "pp->rxq_def" is a number between 0-7 by default, or even higher depending on the value of "rxq_number", which is a module parameter. If the value is more than the number of available CPUs then it will trigger the WARN() in cpu_max_bits_warn(). Fixes: e8b4fc13900b ("net: mvneta: Prevent out of bounds read in mvneta_config_rss()") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/Y5A7d1E5ccwHTYPf@kadam Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net: thunderbolt: fix memory leak in tbnet_open()Zhengchao Shao
When tb_ring_alloc_rx() failed in tbnet_open(), ida that allocated in tb_xdomain_alloc_out_hopid() is not released. Add tb_xdomain_release_out_hopid() to the error path to release ida. Fixes: 180b0689425c ("thunderbolt: Allow multiple DMA tunnels over a single XDomain connection") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20221207015001.1755826-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net: dsa: microchip: add stats64 support for ksz8 series of switchesOleksij Rempel
Add stats64 support for ksz8xxx series of switches. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20221205052904.2834962-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08docs/zh_CN: Add LoongArch booting description's translationYanteng Si
Translate ../loongarch/booting.rst into Chinese. Suggested-by: Xiaotian Wu <wuxiaotian@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-12-08docs/LoongArch: Add booting descriptionYanteng Si
1, Describe the information passed from BootLoader to kernel. 2, Describe the meaning and values of the kernel image header field. Suggested-by: Xiaotian Wu <wuxiaotian@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-12-08LoongArch: mm: Fix huge page entry update for virtual machineHuacai Chen
In virtual machine (guest mode), the tlbwr instruction can not write the last entry of MTLB, so we need to make it non-present by invtlb and then write it by tlbfill. This also simplify the whole logic. Signed-off-by: Rui Wang <wangrui@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-12-08LoongArch: Export symbol for function smp_send_reschedule()Bibo Mao
Function smp_send_reschedule() is standard kernel API, which is defined in header file include/linux/smp.h. However, on LoongArch it is defined as an inline function, this is confusing and kernel modules can not use this function. Now we define smp_send_reschedule() as a general function, and add a EXPORT_SYMBOL_GPL on this function, so that kernel modules can use it. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-12-07Merge branch 'net-ethernet-ti-am65-cpsw-fix-set-channel-operation'Jakub Kicinski
Roger Quadros says: ==================== net: ethernet: ti: am65-cpsw: Fix set channel operation This contains a critical bug fix for the recently merged suspend/resume support [1] that broke set channel operation. (ethtool -L eth0 tx <n>) As there were 2 dependent patches on top of the offending commit [1] first revert them and then apply them back after the correct fix. [1] fd23df72f2be ("net: ethernet: ti: am65-cpsw: Add suspend/resume support") ==================== Link: https://lore.kernel.org/r/20221206094419.19478-1-rogerq@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-07net: ethernet: ti: am65-cpsw: Fix hardware switch mode on suspend/resumeRoger Quadros
On low power during system suspend the ALE table context is lost. Save the ALE context before suspend and restore it after resume. Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-07net: ethernet: ti: am65-cpsw: retain PORT_VLAN_REG after suspend/resumeRoger Quadros
During suspend resume the context of PORT_VLAN_REG is lost so save it during suspend and restore it during resume for host port and slave ports. Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-07net: ethernet: ti: am65-cpsw: Add suspend/resume supportRoger Quadros
Add PM handlers for System suspend/resume. As DMA driver doesn't yet support suspend/resume we free up the DMA channels at suspend and acquire and initialize them at resume. In this revised approach we do not free the TX/RX IRQs at am65_cpsw_nuss_common_stop() as it causes problems. We will now free them only on .suspend() as we need to release the DMA channels (as DMA looses context) and re-acquiring them on .resume() may not necessarily give us the same IRQs. To make this easier: - introduce am65_cpsw_nuss_remove_rx_chns() which is similar to am65_cpsw_nuss_remove_tx_chns(). These will be invoked in pm.suspend() to release the DMA channels and free up the IRQs. - move napi_add() and request_irq() calls to am65_cpsw_nuss_init_rx/tx_chns() so we can invoke them in pm.resume() to acquire the DMA channels and IRQs. As CPTS looses contect during suspend/resume, invoke the necessary CPTS suspend/resume helpers. ALE_CLEAR command is issued in cpsw_ale_start() so no need to issue it before the call to cpsw_ale_start(). Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-07Revert "net: ethernet: ti: am65-cpsw: Add suspend/resume support"Roger Quadros
This reverts commit fd23df72f2be317d38d9fde0a8996b8e7454fd2a. This commit broke set channel operation. Revert this and implement it with a different approach in a separate patch. Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-07Revert "net: ethernet: ti: am65-cpsw: retain PORT_VLAN_REG after suspend/resume"Roger Quadros
This reverts commit 643cf0e3ab5ccee37b3c53c018bd476c45c4b70e. This is to make it easier to revert the offending commit fd23df72f2be ("net: ethernet: ti: am65-cpsw: Add suspend/resume support") Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-07Revert "net: ethernet: ti: am65-cpsw: Fix hardware switch mode on ↵Roger Quadros
suspend/resume" This reverts commit 1af3cb3702d02167926a2bd18580cecb2d64fd94. This is to make it easier to revert the offending commit fd23df72f2be ("net: ethernet: ti: am65-cpsw: Add suspend/resume support") Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-07ipv6: avoid use-after-free in ip6_fragment()Eric Dumazet
Blamed commit claimed rcu_read_lock() was held by ip6_fragment() callers. It seems to not be always true, at least for UDP stack. syzbot reported: BUG: KASAN: use-after-free in ip6_dst_idev include/net/ip6_fib.h:245 [inline] BUG: KASAN: use-after-free in ip6_fragment+0x2724/0x2770 net/ipv6/ip6_output.c:951 Read of size 8 at addr ffff88801d403e80 by task syz-executor.3/7618 CPU: 1 PID: 7618 Comm: syz-executor.3 Not tainted 6.1.0-rc6-syzkaller-00012-g4312098baf37 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:284 [inline] print_report+0x15e/0x45d mm/kasan/report.c:395 kasan_report+0xbf/0x1f0 mm/kasan/report.c:495 ip6_dst_idev include/net/ip6_fib.h:245 [inline] ip6_fragment+0x2724/0x2770 net/ipv6/ip6_output.c:951 __ip6_finish_output net/ipv6/ip6_output.c:193 [inline] ip6_finish_output+0x9a3/0x1170 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:445 [inline] ip6_local_out+0xb3/0x1a0 net/ipv6/output_core.c:161 ip6_send_skb+0xbb/0x340 net/ipv6/ip6_output.c:1966 udp_v6_send_skb+0x82a/0x18a0 net/ipv6/udp.c:1286 udp_v6_push_pending_frames+0x140/0x200 net/ipv6/udp.c:1313 udpv6_sendmsg+0x18da/0x2c80 net/ipv6/udp.c:1606 inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 sock_write_iter+0x295/0x3d0 net/socket.c:1108 call_write_iter include/linux/fs.h:2191 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x9ed/0xdd0 fs/read_write.c:584 ksys_write+0x1ec/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fde3588c0d9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fde365b6168 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007fde359ac050 RCX: 00007fde3588c0d9 RDX: 000000000000ffdc RSI: 00000000200000c0 RDI: 000000000000000a RBP: 00007fde358e7ae9 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fde35acfb1f R14: 00007fde365b6300 R15: 0000000000022000 </TASK> Allocated by task 7618: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 __kasan_slab_alloc+0x82/0x90 mm/kasan/common.c:325 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slab.h:737 [inline] slab_alloc_node mm/slub.c:3398 [inline] slab_alloc mm/slub.c:3406 [inline] __kmem_cache_alloc_lru mm/slub.c:3413 [inline] kmem_cache_alloc+0x2b4/0x3d0 mm/slub.c:3422 dst_alloc+0x14a/0x1f0 net/core/dst.c:92 ip6_dst_alloc+0x32/0xa0 net/ipv6/route.c:344 ip6_rt_pcpu_alloc net/ipv6/route.c:1369 [inline] rt6_make_pcpu_route net/ipv6/route.c:1417 [inline] ip6_pol_route+0x901/0x1190 net/ipv6/route.c:2254 pol_lookup_func include/net/ip6_fib.h:582 [inline] fib6_rule_lookup+0x52e/0x6f0 net/ipv6/fib6_rules.c:121 ip6_route_output_flags_noref+0x2e6/0x380 net/ipv6/route.c:2625 ip6_route_output_flags+0x76/0x320 net/ipv6/route.c:2638 ip6_route_output include/net/ip6_route.h:98 [inline] ip6_dst_lookup_tail+0x5ab/0x1620 net/ipv6/ip6_output.c:1092 ip6_dst_lookup_flow+0x90/0x1d0 net/ipv6/ip6_output.c:1222 ip6_sk_dst_lookup_flow+0x553/0x980 net/ipv6/ip6_output.c:1260 udpv6_sendmsg+0x151d/0x2c80 net/ipv6/udp.c:1554 inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 __sys_sendto+0x23a/0x340 net/socket.c:2117 __do_sys_sendto net/socket.c:2129 [inline] __se_sys_sendto net/socket.c:2125 [inline] __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 7599: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2e/0x40 mm/kasan/generic.c:511 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free+0x160/0x1c0 mm/kasan/common.c:200 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1724 [inline] slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1750 slab_free mm/slub.c:3661 [inline] kmem_cache_free+0xee/0x5c0 mm/slub.c:3683 dst_destroy+0x2ea/0x400 net/core/dst.c:127 rcu_do_batch kernel/rcu/tree.c:2250 [inline] rcu_core+0x81f/0x1980 kernel/rcu/tree.c:2510 __do_softirq+0x1fb/0xadc kernel/softirq.c:571 Last potentially related work creation: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:481 call_rcu+0x9d/0x820 kernel/rcu/tree.c:2798 dst_release net/core/dst.c:177 [inline] dst_release+0x7d/0xe0 net/core/dst.c:167 refdst_drop include/net/dst.h:256 [inline] skb_dst_drop include/net/dst.h:268 [inline] skb_release_head_state+0x250/0x2a0 net/core/skbuff.c:838 skb_release_all net/core/skbuff.c:852 [inline] __kfree_skb net/core/skbuff.c:868 [inline] kfree_skb_reason+0x151/0x4b0 net/core/skbuff.c:891 kfree_skb_list_reason+0x4b/0x70 net/core/skbuff.c:901 kfree_skb_list include/linux/skbuff.h:1227 [inline] ip6_fragment+0x2026/0x2770 net/ipv6/ip6_output.c:949 __ip6_finish_output net/ipv6/ip6_output.c:193 [inline] ip6_finish_output+0x9a3/0x1170 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:445 [inline] ip6_local_out+0xb3/0x1a0 net/ipv6/output_core.c:161 ip6_send_skb+0xbb/0x340 net/ipv6/ip6_output.c:1966 udp_v6_send_skb+0x82a/0x18a0 net/ipv6/udp.c:1286 udp_v6_push_pending_frames+0x140/0x200 net/ipv6/udp.c:1313 udpv6_sendmsg+0x18da/0x2c80 net/ipv6/udp.c:1606 inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 sock_write_iter+0x295/0x3d0 net/socket.c:1108 call_write_iter include/linux/fs.h:2191 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x9ed/0xdd0 fs/read_write.c:584 ksys_write+0x1ec/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Second to last potentially related work creation: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:481 call_rcu+0x9d/0x820 kernel/rcu/tree.c:2798 dst_release net/core/dst.c:177 [inline] dst_release+0x7d/0xe0 net/core/dst.c:167 refdst_drop include/net/dst.h:256 [inline] skb_dst_drop include/net/dst.h:268 [inline] __dev_queue_xmit+0x1b9d/0x3ba0 net/core/dev.c:4211 dev_queue_xmit include/linux/netdevice.h:3008 [inline] neigh_resolve_output net/core/neighbour.c:1552 [inline] neigh_resolve_output+0x51b/0x840 net/core/neighbour.c:1532 neigh_output include/net/neighbour.h:546 [inline] ip6_finish_output2+0x56c/0x1530 net/ipv6/ip6_output.c:134 __ip6_finish_output net/ipv6/ip6_output.c:195 [inline] ip6_finish_output+0x694/0x1170 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:445 [inline] NF_HOOK include/linux/netfilter.h:302 [inline] NF_HOOK include/linux/netfilter.h:296 [inline] mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820 mld_send_cr net/ipv6/mcast.c:2121 [inline] mld_ifc_work+0x720/0xdc0 net/ipv6/mcast.c:2653 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x669/0x1090 kernel/workqueue.c:2436 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 The buggy address belongs to the object at ffff88801d403dc0 which belongs to the cache ip6_dst_cache of size 240 The buggy address is located 192 bytes inside of 240-byte region [ffff88801d403dc0, ffff88801d403eb0) The buggy address belongs to the physical page: page:ffffea00007500c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d403 memcg:ffff888022f49c81 flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000000200 ffffea0001ef6580 dead000000000002 ffff88814addf640 raw: 0000000000000000 00000000800c000c 00000001ffffffff ffff888022f49c81 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 3719, tgid 3719 (kworker/0:6), ts 136223432244, free_ts 136222971441 prep_new_page mm/page_alloc.c:2539 [inline] get_page_from_freelist+0x10b5/0x2d50 mm/page_alloc.c:4288 __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5555 alloc_pages+0x1aa/0x270 mm/mempolicy.c:2285 alloc_slab_page mm/slub.c:1794 [inline] allocate_slab+0x213/0x300 mm/slub.c:1939 new_slab mm/slub.c:1992 [inline] ___slab_alloc+0xa91/0x1400 mm/slub.c:3180 __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3279 slab_alloc_node mm/slub.c:3364 [inline] slab_alloc mm/slub.c:3406 [inline] __kmem_cache_alloc_lru mm/slub.c:3413 [inline] kmem_cache_alloc+0x31a/0x3d0 mm/slub.c:3422 dst_alloc+0x14a/0x1f0 net/core/dst.c:92 ip6_dst_alloc+0x32/0xa0 net/ipv6/route.c:344 icmp6_dst_alloc+0x71/0x680 net/ipv6/route.c:3261 mld_sendpack+0x5de/0xe70 net/ipv6/mcast.c:1809 mld_send_cr net/ipv6/mcast.c:2121 [inline] mld_ifc_work+0x720/0xdc0 net/ipv6/mcast.c:2653 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x669/0x1090 kernel/workqueue.c:2436 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1459 [inline] free_pcp_prepare+0x65c/0xd90 mm/page_alloc.c:1509 free_unref_page_prepare mm/page_alloc.c:3387 [inline] free_unref_page+0x1d/0x4d0 mm/page_alloc.c:3483 __unfreeze_partials+0x17c/0x1a0 mm/slub.c:2586 qlink_free mm/kasan/quarantine.c:168 [inline] qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187 kasan_quarantine_reduce+0x184/0x210 mm/kasan/quarantine.c:294 __kasan_slab_alloc+0x66/0x90 mm/kasan/common.c:302 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slab.h:737 [inline] slab_alloc_node mm/slub.c:3398 [inline] kmem_cache_alloc_node+0x304/0x410 mm/slub.c:3443 __alloc_skb+0x214/0x300 net/core/skbuff.c:497 alloc_skb include/linux/skbuff.h:1267 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1191 [inline] netlink_sendmsg+0x9a6/0xe10 net/netlink/af_netlink.c:1896 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 __sys_sendto+0x23a/0x340 net/socket.c:2117 __do_sys_sendto net/socket.c:2129 [inline] __se_sys_sendto net/socket.c:2125 [inline] __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 1758fd4688eb ("ipv6: remove unnecessary dst_hold() in ip6_fragment()") Reported-by: syzbot+8c0ac31aa9681abb9e2d@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20221206101351.2037285-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>