summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-02net: hisilicon: Annotate struct rcb_common_cb with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct rcb_common_cb. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Yisen Zhuang <yisen.zhuang@huawei.com> Cc: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230922172858.3822653-6-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-02net: enetc: Annotate struct enetc_int_vector with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct enetc_int_vector. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230922172858.3822653-5-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-02net: hns: Annotate struct ppe_common_cb with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct ppe_common_cb. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Yisen Zhuang <yisen.zhuang@huawei.com> Cc: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230922172858.3822653-4-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-02ipv6: Annotate struct ip6_sf_socklist with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct ip6_sf_socklist. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230922172858.3822653-3-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-02ipv4/igmp: Annotate struct ip_sf_socklist with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct ip_sf_socklist. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Martin KaFai Lau <martin.lau@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230922172858.3822653-2-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-02ipv4: Annotate struct fib_info with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct fib_info. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: David Ahern <dsahern@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230922172858.3822653-1-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-02Merge tag 'iommu-fixes-v6.6-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: - Arm SMMU fixes from Will Deacon: - Fix TLB range command encoding when TTL, Num and Scale are all zero - Fix soft lockup by limiting TLB invalidation ops issued by SVA - Fix clocks description for SDM630 platform in arm-smmu DT binding - Intel VT-d fix from Lu Baolu: - Fix a suspend/hibernation problem in iommu_suspend() - Mediatek driver: Fix page table sharing for addresses over 4GiB - Apple/Dart: DMA_FQ handling fix in attach_dev() * tag 'iommu-fixes-v6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Avoid memory allocation in iommu_suspend() iommu/apple-dart: Handle DMA_FQ domains in attach_dev() iommu/mediatek: Fix share pgtable for iova over 4GB iommu/arm-smmu-v3: Fix soft lockup triggered by arm_smmu_mm_invalidate_range dt-bindings: arm-smmu: Fix SDM630 clocks description iommu/arm-smmu-v3: Avoid constructing invalid range commands
2023-10-02ovl: make use of ->layers safe in rcu pathwalkAmir Goldstein
ovl_permission() accesses ->layers[...].mnt; we can't have ->layers freed without an RCU delay on fs shutdown. Fortunately, kern_unmount_array() that is used to drop those mounts does include an RCU delay, so freeing is delayed; unfortunately, the array passed to kern_unmount_array() is formed by mangling ->layers contents and that happens without any delays. The ->layers[...].name string entries are used to store the strings to display in "lowerdir=..." by ovl_show_options(). Those entries are not accessed in RCU walk. Move the name strings into a separate array ofs->config.lowerdirs and reuse the ofs->config.lowerdirs array as the temporary mount array to pass to kern_unmount_array(). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20231002023711.GP3389589@ZenIV/ Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-02ovl: fetch inode once in ovl_dentry_revalidate_common()Al Viro
d_inode_rcu() is right - we might be in rcu pathwalk; however, OVL_E() hides plain d_inode() on the same dentry... Fixes: a6ff2bc0be17 ("ovl: use OVL_E() and OVL_E_FLAGS() accessors") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-02ovl: move freeing ovl_entry past rcu delayAl Viro
... into ->free_inode(), that is. Fixes: 0af950f57fef "ovl: move ovl_entry into ovl_inode" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-02ovl: fix file reference leak when submitting aioAmir Goldstein
Commit 724768a39374 ("ovl: fix incorrect fdput() on aio completion") took a refcount on real file before submitting aio, but forgot to avoid clearing FDPUT_FPUT from real.flags stack variable. This can result in a file reference leak. Fixes: 724768a39374 ("ovl: fix incorrect fdput() on aio completion") Reported-by: Gil Lev <contact@levgil.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-02Merge branch 'mlxsw-next'David S. Miller
Petr Machata says: ==================== mlxsw: Provide enhancements and new feature Vadim Pasternak writes: Patch #1 - Optimize transaction size for efficient retrieval of module data. Patch #3 - Enable thermal zone binding with new cooling device. Patch #4 - Employ standard macros for dividing buffer into the chunks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-02mlxsw: i2c: Utilize standard macros for dividing buffer into chunksVadim Pasternak
Use standard macro DIV_ROUND_UP() to determine the number of chunks required for a given buffer. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-02mlxsw: core: Extend allowed list of external cooling devices for thermal ↵Vadim Pasternak
zone binding Extend the list of allowed external cooling devices for thermal zone binding to include devices of type "emc2305". The motivation is to provide support for the system SN2201, which is equipped with the Spectrum-1 ASIC. The system's airflow control is managed by the EMC2305 RPM-based PWM Fan Speed Controller as the cooling device. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-02mlxsw: reg: Limit MTBR register payload to a single data recordVadim Pasternak
The MTBR register is used to read temperatures from multiple sensors in one transaction, but the driver only reads from a single sensor in each transaction. Rrestrict the payload size of the MTBR register to prevent the transmission of redundant data to the firmware. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-02sky2: Make sure there is at least one frag_addr availableKees Cook
In the pathological case of building sky2 with 16k PAGE_SIZE, the frag_addr[] array would never be used, so the original code was correct that size should be 0. But the compiler now gets upset with 0 size arrays in places where it hasn't eliminated the code that might access such an array (it can't figure out that in this case an rx skb with fragments would never be created). To keep the compiler happy, make sure there is at least 1 frag_addr in struct rx_ring_info: In file included from include/linux/skbuff.h:28, from include/net/net_namespace.h:43, from include/linux/netdevice.h:38, from drivers/net/ethernet/marvell/sky2.c:18: drivers/net/ethernet/marvell/sky2.c: In function 'sky2_rx_unmap_skb': include/linux/dma-mapping.h:416:36: warning: array subscript i is outside array bounds of 'dma_addr_t[0]' {aka 'long long unsigned int[]'} [-Warray-bounds=] 416 | #define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/marvell/sky2.c:1257:17: note: in expansion of macro 'dma_unmap_page' 1257 | dma_unmap_page(&pdev->dev, re->frag_addr[i], | ^~~~~~~~~~~~~~ In file included from drivers/net/ethernet/marvell/sky2.c:41: drivers/net/ethernet/marvell/sky2.h:2198:25: note: while referencing 'frag_addr' 2198 | dma_addr_t frag_addr[ETH_JUMBO_MTU >> PAGE_SHIFT]; | ^~~~~~~~~ With CONFIG_PAGE_SIZE_16KB=y, PAGE_SHIFT == 14, so: #define ETH_JUMBO_MTU 9000 causes "ETH_JUMBO_MTU >> PAGE_SHIFT" to be 0. Use "?: 1" to solve this build warning. Cc: Mirko Lindner <mlindner@marvell.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309191958.UBw1cjXk-lkp@intel.com/ Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-02net: dsa: mv88e6xxx: Avoid EEPROM timeout when EEPROM is absentFabio Estevam
Since commit 23d775f12dcd ("net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset") the following error is seen on a imx8mn board with a 88E6320 switch: mv88e6085 30be0000.ethernet-1:00: Timeout waiting for EEPROM done This board does not have an EEPROM attached to the switch though. This problem is well explained by Andrew Lunn: "If there is an EEPROM, and the EEPROM contains a lot of data, it could be that when we perform a hardware reset towards the end of probe, it interrupts an I2C bus transaction, leaving the I2C bus in a bad state, and future reads of the EEPROM do not work. The work around for this was to poll the EEInt status and wait for it to go true before performing the hardware reset. However, we have discovered that for some boards which do not have an EEPROM, EEInt never indicates complete. As a result, mv88e6xxx_g1_wait_eeprom_done() spins for a second and then prints a warning. We probably need a different solution than calling mv88e6xxx_g1_wait_eeprom_done(). The datasheet for 6352 documents the EEPROM Command register: bit 15 is: EEPROM Unit Busy. This bit must be set to a one to start an EEPROM operation (see EEOp below). Only one EEPROM operation can be executing at one time so this bit must be zero before setting it to a one. When the requested EEPROM operation completes this bit will automatically be cleared to a zero. The transition of this bit from a one to a zero can be used to generate an interrupt (the EEInt in Global 1, offset 0x00). and more interesting is bit 11: Register Loader Running. This bit is set to one whenever the register loader is busy executing instructions contained in the EEPROM." Change to using mv88e6xxx_g2_eeprom_wait() to fix the timeout error when the EEPROM chip is not present. Fixes: 23d775f12dcd ("net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset") Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Fabio Estevam <festevam@denx.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-02ptp: ocp: Fix error handling in ptp_ocp_device_initDinghao Liu
When device_add() fails, ptp_ocp_dev_release() will be called after put_device(). Therefore, it seems that the ptp_ocp_dev_release() before put_device() is redundant. Fixes: 773bda964921 ("ptp: ocp: Expose various resources on the timecard.") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Vadim Feodrenko <vadim.fedorenko@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01Linux 6.6-rc4v6.6-rc4Linus Torvalds
2023-10-01Merge tag 'kbuild-fixes-v6.6-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix the module compression with xz so the in-kernel decompressor works - Document a kconfig idiom to express an optional dependency between modules - Make modpost, when W=1 is given, detect broken drivers that reference .exit.* sections - Remove unused code * tag 'kbuild-fixes-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: remove stale code for 'source' symlink in packaging scripts modpost: Don't let "driver"s reference .exit.* vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macros modpost: add missing else to the "of" check Documentation: kbuild: explain handling optional dependencies kbuild: Use CRC32 and a 1MiB dictionary for XZ compressed modules
2023-10-01Merge tag 'mm-hotfixes-stable-2023-10-01-08-34' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Fourteen hotfixes, eleven of which are cc:stable. The remainder pertain to issues which were introduced after 6.5" * tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: Crash: add lock to serialize crash hotplug handling selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions() mm, memcg: reconsider kmem.limit_in_bytes deprecation mm: zswap: fix potential memory corruption on duplicate store arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries mm: hugetlb: add huge page size param to set_huge_pte_at() maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states maple_tree: add mas_is_active() to detect in-tree walks nilfs2: fix potential use after free in nilfs_gccache_submit_read_data() mm: abstract moving to the next PFN mm: report success more often from filemap_map_folio_range() fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
2023-10-01Merge tag 'char-misc-6.6-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull misc driver fix from Greg KH: "Here is a single, much requested, fix for a set of misc drivers to resolve a much reported regression in the -rc series that has also propagated back to the stable releases. Sorry for the delay, lots of conference travel for a few weeks put me very far behind in patch wrangling. It has been reported by many to resolve the reported problem, and has been in linux-next with no reported issues" * tag 'char-misc-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: misc: rtsx: Fix some platforms can not boot and move the l1ss judgment to probe
2023-10-01Merge tag 'tty-6.6-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial driver fixes from Greg KH: "Here are two tty/serial driver fixes for 6.6-rc4 that resolve some reported regressions: - revert a n_gsm change that ended up causing problems - 8250_port fix for irq data both have been in linux-next for over a week with no reported problems" * tag 'tty-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: Revert "tty: n_gsm: fix UAF in gsm_cleanup_mux" serial: 8250_port: Check IRQ data before use
2023-10-01Merge branch 'inet-more-data-race-fixes'David S. Miller
Eric Dumazet says: ==================== inet: more data-race fixes This series fixes some existing data-races on inet fields: inet->mc_ttl, inet->pmtudisc, inet->tos, inet->uc_index, inet->mc_index and inet->mc_addr. While fixing them, we convert eight socket options to lockless implementation. v2: addressed David Ahern feedback on ("inet: implement lockless IP_TOS") Added David Reviewed-by: tag on other patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01inet: implement lockless getsockopt(IP_MULTICAST_IF)Eric Dumazet
Add missing annotations to inet->mc_index and inet->mc_addr to fix data-races. getsockopt(IP_MULTICAST_IF) can be lockless. setsockopt() side is left for later. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01inet: lockless IP_PKTOPTIONS implementationEric Dumazet
Current implementation is already lockless, because the socket lock is released before reading socket fields. Add missing READ_ONCE() annotations. Note that corresponding WRITE_ONCE() are needed, the order of the patches do not really matter. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01inet: implement lockless getsockopt(IP_UNICAST_IF)Eric Dumazet
Add missing READ_ONCE() annotations when reading inet->uc_index Implementing getsockopt(IP_UNICAST_IF) locklessly seems possible, the setsockopt() part might not be possible at the moment. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01inet: lockless getsockopt(IP_MTU)Eric Dumazet
sk_dst_get() does not require socket lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01inet: lockless getsockopt(IP_OPTIONS)Eric Dumazet
inet->inet_opt being RCU protected, we can use RCU instead of locking the socket. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01inet: implement lockless IP_TOSEric Dumazet
Some reads of inet->tos are racy. Add needed READ_ONCE() annotations and convert IP_TOS option lockless. v2: missing changes in include/net/route.h (David Ahern) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01inet: implement lockless IP_MTU_DISCOVEREric Dumazet
inet->pmtudisc can be read locklessly. Implement proper lockless reads and writes to inet->pmtudisc ip_sock_set_mtu_discover() can now be called from arbitrary contexts. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01inet: implement lockless IP_MULTICAST_TTLEric Dumazet
inet->mc_ttl can be read locklessly. Implement proper lockless reads and writes to inet->mc_ttl Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01net: prevent address rewrite in kernel_bind()Jordan Rife
Similar to the change in commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect"), BPF hooks run on bind may rewrite the address passed to kernel_bind(). This change 1) Makes a copy of the bind address in kernel_bind() to insulate callers. 2) Replaces direct calls to sock->ops->bind() in net with kernel_bind() Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind") Cc: stable@vger.kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jordan Rife <jrife@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01net: prevent rewrite of msg_name in sock_sendmsg()Jordan Rife
Callers of sock_sendmsg(), and similarly kernel_sendmsg(), in kernel space may observe their value of msg_name change in cases where BPF sendmsg hooks rewrite the send address. This has been confirmed to break NFS mounts running in UDP mode and has the potential to break other systems. This patch: 1) Creates a new function called __sock_sendmsg() with same logic as the old sock_sendmsg() function. 2) Replaces calls to sock_sendmsg() made by __sys_sendto() and __sys_sendmsg() with __sock_sendmsg() to avoid an unnecessary copy, as these system calls are already protected. 3) Modifies sock_sendmsg() so that it makes a copy of msg_name if present before passing it down the stack to insulate callers from changes to the send address. Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg") Cc: stable@vger.kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jordan Rife <jrife@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01net: replace calls to sock->ops->connect() with kernel_connect()Jordan Rife
commit 0bdf399342c5 ("net: Avoid address overwrite in kernel_connect") ensured that kernel_connect() will not overwrite the address parameter in cases where BPF connect hooks perform an address rewrite. This change replaces direct calls to sock->ops->connect() in net with kernel_connect() to make these call safe. Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect") Cc: stable@vger.kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jordan Rife <jrife@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01Merge branch 'socket-option-lockless'David S. Miller
Eric Dumazet says: ==================== net: more data-races fixes and lockless socket options This is yet another round of data-races fixes, and lockless socket options. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01net: annotate data-races around sk->sk_dst_pending_confirmEric Dumazet
This field can be read or written without socket lock being held. Add annotations to avoid load-store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01net: annotate data-races around sk->sk_tx_queue_mappingEric Dumazet
This field can be read or written without socket lock being held. Add annotations to avoid load-store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01net: lockless implementation of SO_TXREHASHEric Dumazet
sk->sk_txrehash readers are already safe against concurrent change of this field. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01net: implement lockless SO_MAX_PACING_RATEEric Dumazet
SO_MAX_PACING_RATE setsockopt() does not need to hold the socket lock, because sk->sk_pacing_rate readers can run fine if the value is changed by other threads, after adding READ_ONCE() accessors. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01net: lockless implementation of SO_BUSY_POLL, SO_PREFER_BUSY_POLL, ↵Eric Dumazet
SO_BUSY_POLL_BUDGET Setting sk->sk_ll_usec, sk_prefer_busy_poll and sk_busy_poll_budget do not require the socket lock, readers are lockless anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01net: lockless SO_{TYPE|PROTOCOL|DOMAIN|ERROR } setsockopt()Eric Dumazet
This options can not be set and return -ENOPROTOOPT, no need to acqure socket lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01net: lockless SO_PASSCRED, SO_PASSPIDFD and SO_PASSSECEric Dumazet
sock->flags are atomic, no need to hold the socket lock in sk_setsockopt() for SO_PASSCRED, SO_PASSPIDFD and SO_PASSSEC. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01net: implement lockless SO_PRIORITYEric Dumazet
This is a followup of 8bf43be799d4 ("net: annotate data-races around sk->sk_priority"). sk->sk_priority can be read and written without holding the socket lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01openvswitch: reduce stack usage in do_execute_actionsIlya Maximets
do_execute_actions() function can be called recursively multiple times while executing actions that require pipeline forking or recirculations. It may also be re-entered multiple times if the packet leaves openvswitch module and re-enters it through a different port. Currently, there is a 256-byte array allocated on stack in this function that is supposed to hold NSH header. Compilers tend to pre-allocate that space right at the beginning of the function: a88: 48 81 ec b0 01 00 00 sub $0x1b0,%rsp NSH is not a very common protocol, but the space is allocated on every recursive call or re-entry multiplying the wasted stack space. Move the stack allocation to push_nsh() function that is only used if NSH actions are actually present. push_nsh() is also a simple function without a possibility for re-entry, so the stack is returned right away. With this change the preallocated space is reduced by 256 B per call: b18: 48 81 ec b0 00 00 00 sub $0xb0,%rsp Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eelco Chaudron echaudro@redhat.com Reviewed-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()David Howells
Including the transhdrlen in length is a problem when the packet is partially filled (e.g. something like send(MSG_MORE) happened previously) when appending to an IPv4 or IPv6 packet as we don't want to repeat the transport header or account for it twice. This can happen under some circumstances, such as splicing into an L2TP socket. The symptom observed is a warning in __ip6_append_data(): WARNING: CPU: 1 PID: 5042 at net/ipv6/ip6_output.c:1800 __ip6_append_data.isra.0+0x1be8/0x47f0 net/ipv6/ip6_output.c:1800 that occurs when MSG_SPLICE_PAGES is used to append more data to an already partially occupied skbuff. The warning occurs when 'copy' is larger than the amount of data in the message iterator. This is because the requested length includes the transport header length when it shouldn't. This can be triggered by, for example: sfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP); bind(sfd, ...); // ::1 connect(sfd, ...); // ::1 port 7 send(sfd, buffer, 4100, MSG_MORE); sendfile(sfd, dfd, NULL, 1024); Fix this by only adding transhdrlen into the length if the write queue is empty in l2tp_ip6_sendmsg(), analogously to how UDP does things. l2tp_ip_sendmsg() looks like it won't suffer from this problem as it builds the UDP packet itself. Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6") Reported-by: syzbot+62cbf263225ae13ff153@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/0000000000001c12b30605378ce8@google.com/ Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Eric Dumazet <edumazet@google.com> cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> cc: "David S. Miller" <davem@davemloft.net> cc: David Ahern <dsahern@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Jakub Kicinski <kuba@kernel.org> cc: netdev@vger.kernel.org cc: bpf@vger.kernel.org cc: syzkaller-bugs@googlegroups.com Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01Merge tag 'x86-urgent-2023-10-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: a kerneldoc build warning fix, add SRSO mitigation for AMD-derived Hygon processors, and fix a SGX kernel crash in the page fault handler that can trigger when ksgxd races to reclaim the SECS special page, by making the SECS page unswappable" * tag 'x86-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Resolves SECS reclaim vs. page fault for EAUG race x86/srso: Add SRSO mitigation for Hygon processors x86/kgdb: Fix a kerneldoc warning when build with W=1
2023-10-01Merge tag 'timers-urgent-2023-10-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix a spurious kernel warning during CPU hotplug events that may trigger when timer/hrtimer softirqs are pending, which are otherwise hotplug-safe and don't merit a warning" * tag 'timers-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Tag (hr)timer softirq as hotplug safe
2023-10-01Merge tag 'sched-urgent-2023-10-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a RT tasks related lockup/live-lock during CPU offlining" * tag 'sched-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/rt: Fix live lock between select_fallback_rq() and RT push
2023-10-01Merge tag 'perf-urgent-2023-10-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fixes from Ingo Molnar: "Misc fixes: work around an AMD microcode bug on certain models, and fix kexec kernel PMI handlers on AMD systems that get loaded on older kernels that have an unexpected register state" * tag 'perf-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/amd: Do not WARN() on every IRQ perf/x86/amd/core: Fix overflow reset on hotplug