summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-11-22Merge branch 'nh-group-refcnt'David S. Miller
Nikolay Aleksandrov says: ==================== net: nexthop: fix refcount issues when replacing groups This set fixes a refcount bug when replacing nexthop groups and modifying routes. It is complex because the objects look valid when debugging memory dumps, but we end up having refcount dependency between unlinked objects which can never be released, so in turn they cannot free their resources and refcounts. The problem happens because we can have stale IPv6 per-cpu dsts in nexthops which were removed from a group. Even though the IPv6 gen is bumped, the dsts won't be released until traffic passes through them or the nexthop is freed, that can take arbitrarily long time, and even worse we can create a scenario[1] where it can never be released. The fix is to release the IPv6 per-cpu dsts of replaced nexthops after an RCU grace period so no new ones can be created. To do that we add a new IPv6 stub - fib6_nh_release_dsts, which is used by the nexthop code only when necessary. We can further optimize group replacement, but that is more suited for net-next as these patches would have to be backported to stable releases. v2: patch 02: update commit msg patch 03: check for mausezahn before testing and make a few comments more verbose [1] This info is also present in patch 02's commit message. Initial state: $ ip nexthop list id 200 via 2002:db8::2 dev bridge.10 scope link onlink id 201 via 2002:db8::3 dev bridge scope link onlink id 203 group 201/200 $ ip -6 route 2001:db8::10 nhid 203 metric 1024 pref medium nexthop via 2002:db8::3 dev bridge weight 1 onlink nexthop via 2002:db8::2 dev bridge.10 weight 1 onlink Create rt6_info through one of the multipath legs, e.g.: $ taskset -a -c 1 ./pkt_inj 24 bridge.10 2001:db8::10 (pkt_inj is just a custom packet generator, nothing special) Then remove that leg from the group by replace (let's assume it is id 200 in this case): $ ip nexthop replace id 203 group 201 Now remove the IPv6 route: $ ip -6 route del 2001:db8::10/128 The route won't be really deleted due to the stale rt6_info holding 1 refcnt in nexthop id 200. At this point we have the following reference count dependency: (deleted) IPv6 route holds 1 reference over nhid 203 nh 203 holds 1 ref over id 201 nh 200 holds 1 ref over the net device and the route due to the stale rt6_info Now to create circular dependency between nh 200 and the IPv6 route, and also to get a reference over nh 200, restore nhid 200 in the group: $ ip nexthop replace id 203 group 201/200 And now we have a permanent circular dependncy because nhid 203 holds a reference over nh 200 and 201, but the route holds a ref over nh 203 and is deleted. To trigger the bug just delete the group (nhid 203): $ ip nexthop del id 203 It won't really be deleted due to the IPv6 route dependency, and now we have 2 unlinked and deleted objects that reference each other: the group and the IPv6 route. Since the group drops the reference it holds over its entries at free time (i.e. its own refcount needs to drop to 0) that will never happen and we get a permanent ref on them, since one of the entries holds a reference over the IPv6 route it will also never be released. At this point the dependencies are: (deleted, only unlinked) IPv6 route holds reference over group nh 203 (deleted, only unlinked) group nh 203 holds reference over nh 201 and 200 nh 200 holds 1 ref over the net device and the route due to the stale rt6_info This is the last point where it can be fixed by running traffic through nh 200, and specifically through the same CPU so the rt6_info (dst) will get released due to the IPv6 genid, that in turn will free the IPv6 route, which in turn will free the ref count over the group nh 203. If nh 200 is deleted at this point, it will never be released due to the ref from the unlinked group 203, it will only be unlinked: $ ip nexthop del id 200 $ ip nexthop $ Now we can never release that stale rt6_info, we have IPv6 route with ref over group nh 203, group nh 203 with ref over nh 200 and 201, nh 200 with rt6_info (dst) with ref over the net device and the IPv6 route. All of these objects are only unlinked, and cannot be released, thus they can't release their ref counts. Message from syslogd@dev at Nov 19 14:04:10 ... kernel:[73501.828730] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3 Message from syslogd@dev at Nov 19 14:04:20 ... kernel:[73512.068811] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22selftests: net: fib_nexthops: add test for group refcount imbalance bugNikolay Aleksandrov
The new selftest runs a sequence which causes circular refcount dependency between deleted objects which cannot be released and results in a netdevice refcount imbalance. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop groupNikolay Aleksandrov
When replacing a nexthop group, we must release the IPv6 per-cpu dsts of the removed nexthop entries after an RCU grace period because they contain references to the nexthop's net device and to the fib6 info. With specific series of events[1] we can reach net device refcount imbalance which is unrecoverable. IPv4 is not affected because dsts don't take a refcount on the route. [1] $ ip nexthop list id 200 via 2002:db8::2 dev bridge.10 scope link onlink id 201 via 2002:db8::3 dev bridge scope link onlink id 203 group 201/200 $ ip -6 route 2001:db8::10 nhid 203 metric 1024 pref medium nexthop via 2002:db8::3 dev bridge weight 1 onlink nexthop via 2002:db8::2 dev bridge.10 weight 1 onlink Create rt6_info through one of the multipath legs, e.g.: $ taskset -a -c 1 ./pkt_inj 24 bridge.10 2001:db8::10 (pkt_inj is just a custom packet generator, nothing special) Then remove that leg from the group by replace (let's assume it is id 200 in this case): $ ip nexthop replace id 203 group 201 Now remove the IPv6 route: $ ip -6 route del 2001:db8::10/128 The route won't be really deleted due to the stale rt6_info holding 1 refcnt in nexthop id 200. At this point we have the following reference count dependency: (deleted) IPv6 route holds 1 reference over nhid 203 nh 203 holds 1 ref over id 201 nh 200 holds 1 ref over the net device and the route due to the stale rt6_info Now to create circular dependency between nh 200 and the IPv6 route, and also to get a reference over nh 200, restore nhid 200 in the group: $ ip nexthop replace id 203 group 201/200 And now we have a permanent circular dependncy because nhid 203 holds a reference over nh 200 and 201, but the route holds a ref over nh 203 and is deleted. To trigger the bug just delete the group (nhid 203): $ ip nexthop del id 203 It won't really be deleted due to the IPv6 route dependency, and now we have 2 unlinked and deleted objects that reference each other: the group and the IPv6 route. Since the group drops the reference it holds over its entries at free time (i.e. its own refcount needs to drop to 0) that will never happen and we get a permanent ref on them, since one of the entries holds a reference over the IPv6 route it will also never be released. At this point the dependencies are: (deleted, only unlinked) IPv6 route holds reference over group nh 203 (deleted, only unlinked) group nh 203 holds reference over nh 201 and 200 nh 200 holds 1 ref over the net device and the route due to the stale rt6_info This is the last point where it can be fixed by running traffic through nh 200, and specifically through the same CPU so the rt6_info (dst) will get released due to the IPv6 genid, that in turn will free the IPv6 route, which in turn will free the ref count over the group nh 203. If nh 200 is deleted at this point, it will never be released due to the ref from the unlinked group 203, it will only be unlinked: $ ip nexthop del id 200 $ ip nexthop $ Now we can never release that stale rt6_info, we have IPv6 route with ref over group nh 203, group nh 203 with ref over nh 200 and 201, nh 200 with rt6_info (dst) with ref over the net device and the IPv6 route. All of these objects are only unlinked, and cannot be released, thus they can't release their ref counts. Message from syslogd@dev at Nov 19 14:04:10 ... kernel:[73501.828730] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3 Message from syslogd@dev at Nov 19 14:04:20 ... kernel:[73512.068811] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3 Fixes: 7bf4796dd099 ("nexthops: add support for replace") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22net: ipv6: add fib6_nh_release_dsts stubNikolay Aleksandrov
We need a way to release a fib6_nh's per-cpu dsts when replacing nexthops otherwise we can end up with stale per-cpu dsts which hold net device references, so add a new IPv6 stub called fib6_nh_release_dsts. It must be used after an RCU grace period, so no new dsts can be created through a group's nexthop entry. Similar to fib6_nh_release it shouldn't be used if fib6_nh_init has failed so it doesn't need a dummy stub when IPv6 is not enabled. Fixes: 7bf4796dd099 ("nexthops: add support for replace") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22net, neigh: Fix crash in v6 module initialization error pathDaniel Borkmann
When IPv6 module gets initialized, but it's hitting an error in inet6_init() where it then needs to undo all the prior initialization work, it also might do a call to ndisc_cleanup() which then calls neigh_table_clear(). In there is a missing timer cancellation of the table's managed_work item. The kernel test robot explicitly triggered this error path and caused a UAF crash similar to the below: [...] [ 28.833183][ C0] BUG: unable to handle page fault for address: f7a43288 [ 28.833973][ C0] #PF: supervisor write access in kernel mode [ 28.834660][ C0] #PF: error_code(0x0002) - not-present page [ 28.835319][ C0] *pde = 06b2c067 *pte = 00000000 [ 28.835853][ C0] Oops: 0002 [#1] PREEMPT [ 28.836367][ C0] CPU: 0 PID: 303 Comm: sed Not tainted 5.16.0-rc1-00233-g83ff5faa0d3b #7 [ 28.837293][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014 [ 28.838338][ C0] EIP: __run_timers.constprop.0+0x82/0x440 [...] [ 28.845607][ C0] Call Trace: [ 28.845942][ C0] <SOFTIRQ> [ 28.846333][ C0] ? check_preemption_disabled.isra.0+0x2a/0x80 [ 28.846975][ C0] ? __this_cpu_preempt_check+0x8/0xa [ 28.847570][ C0] run_timer_softirq+0xd/0x40 [ 28.848050][ C0] __do_softirq+0xf5/0x576 [ 28.848547][ C0] ? __softirqentry_text_start+0x10/0x10 [ 28.849127][ C0] do_softirq_own_stack+0x2b/0x40 [ 28.849749][ C0] </SOFTIRQ> [ 28.850087][ C0] irq_exit_rcu+0x7d/0xc0 [ 28.850587][ C0] common_interrupt+0x2a/0x40 [ 28.851068][ C0] asm_common_interrupt+0x119/0x120 [...] Note that IPv6 module cannot be unloaded as per 8ce440610357 ("ipv6: do not allow ipv6 module to be removed") hence this can only be seen during module initialization error. Tested with kernel test robot's reproducer. Fixes: 7482e3841d52 ("net, neigh: Add NTF_MANAGED flag for managed neighbor entries") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Li Zhijian <zhijianx.li@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22nixge: fix mac address error handling againArnd Bergmann
The change to eth_hw_addr_set() caused gcc to correctly spot a bug that was introduced in an earlier incorrect fix: In file included from include/linux/etherdevice.h:21, from drivers/net/ethernet/ni/nixge.c:7: In function '__dev_addr_set', inlined from 'eth_hw_addr_set' at include/linux/etherdevice.h:319:2, inlined from 'nixge_probe' at drivers/net/ethernet/ni/nixge.c:1286:3: include/linux/netdevice.h:4648:9: error: 'memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread] 4648 | memcpy(dev->dev_addr, addr, len); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ As nixge_get_nvmem_address() can return either NULL or an error pointer, the NULL check is wrong, and we can end up reading from ERR_PTR(-EOPNOTSUPP), which gcc knows to contain zero readable bytes. Make the function always return an error pointer again but fix the check to match that. Fixes: f3956ebb3bf0 ("ethernet: use eth_hw_addr_set() instead of ether_addr_copy()") Fixes: abcd3d6fc640 ("net: nixge: Fix error path for obtaining mac address") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22net/smc: Avoid warning of possible recursive lockingWen Gu
Possible recursive locking is detected by lockdep when SMC falls back to TCP. The corresponding warnings are as follows: ============================================ WARNING: possible recursive locking detected 5.16.0-rc1+ #18 Tainted: G E -------------------------------------------- wrk/1391 is trying to acquire lock: ffff975246c8e7d8 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0x109/0x250 [smc] but task is already holding lock: ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&ei->socket.wq.wait); lock(&ei->socket.wq.wait); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by wrk/1391: #0: ffff975246040130 (sk_lock-AF_SMC){+.+.}-{0:0}, at: smc_connect+0x43/0x150 [smc] #1: ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc] stack backtrace: Call Trace: <TASK> dump_stack_lvl+0x56/0x7b __lock_acquire+0x951/0x11f0 lock_acquire+0x27a/0x320 ? smc_switch_to_fallback+0x109/0x250 [smc] ? smc_switch_to_fallback+0xfe/0x250 [smc] _raw_spin_lock_irq+0x3b/0x80 ? smc_switch_to_fallback+0x109/0x250 [smc] smc_switch_to_fallback+0x109/0x250 [smc] smc_connect_fallback+0xe/0x30 [smc] __smc_connect+0xcf/0x1090 [smc] ? mark_held_locks+0x61/0x80 ? __local_bh_enable_ip+0x77/0xe0 ? lockdep_hardirqs_on+0xbf/0x130 ? smc_connect+0x12a/0x150 [smc] smc_connect+0x12a/0x150 [smc] __sys_connect+0x8a/0xc0 ? syscall_enter_from_user_mode+0x20/0x70 __x64_sys_connect+0x16/0x20 do_syscall_64+0x34/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae The nested locking in smc_switch_to_fallback() is considered to possibly cause a deadlock because smc_wait->lock and clc_wait->lock are the same type of lock. But actually it is safe so far since there is no other place trying to obtain smc_wait->lock when clc_wait->lock is held. So the patch replaces spin_lock() with spin_lock_nested() to avoid false report by lockdep. Link: https://lkml.org/lkml/2021/11/19/962 Fixes: 2153bd1e3d3d ("Transfer remaining wait queue entries during fallback") Reported-by: syzbot+e979d3597f48262cb4ee@syzkaller.appspotmail.com Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22vsock/virtio: suppress used length validationMichael S. Tsirkin
It turns out that vhost vsock violates the virtio spec by supplying the out buffer length in the used length (should just be the in length). As a result, attempts to validate the used length fail with: vmw_vsock_virtio_transport virtio1: tx: used len 44 is larger than in buflen 0 Since vsock driver does not use the length fox tx and validates the length before use for rx, it is safe to suppress the validation in virtio core for this driver. Reported-by: Halil Pasic <pasic@linux.ibm.com> Fixes: 939779f5152d ("virtio_ring: validate used buffer length") Cc: "Jason Wang" <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22net: ax88796c: do not receive data in pointerNicolas Iooss
Function axspi_read_status calls: ret = spi_write_then_read(ax_spi->spi, ax_spi->cmd_buf, 1, (u8 *)&status, 3); status is a pointer to a struct spi_status, which is 3-byte wide: struct spi_status { u16 isr; u8 status; }; But &status is the pointer to this pointer, and spi_write_then_read does not dereference this parameter: int spi_write_then_read(struct spi_device *spi, const void *txbuf, unsigned n_tx, void *rxbuf, unsigned n_rx) Therefore axspi_read_status currently receive a SPI response in the pointer status, which overwrites 24 bits of the pointer. Thankfully, on Little-Endian systems, the pointer is only used in le16_to_cpus(&status->isr); ... which is a no-operation. So there, the overwritten pointer is not dereferenced. Nevertheless on Big-Endian systems, this can lead to dereferencing pointers after their 24 most significant bits were overwritten. And in all systems this leads to possible use of uninitialized value in functions calling spi_write_then_read which expect status to be initialized when the function returns. Moreover function axspi_read_status (and macro AX_READ_STATUS) do not seem to be used anywhere. So currently this seems to be dead code. Fix the issue anyway so that future code works properly when using function axspi_read_status. Fixes: a97c69ba4f30 ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver") Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Acked-by: Łukasz Stelmach <l.stelmach@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22net: stmmac: retain PTP clock time during SIOCSHWTSTAMP ioctlsHolger Assmann
Currently, when user space emits SIOCSHWTSTAMP ioctl calls such as enabling/disabling timestamping or changing filter settings, the driver reads the current CLOCK_REALTIME value and programming this into the NIC's hardware clock. This might be necessary during system initialization, but at runtime, when the PTP clock has already been synchronized to a grandmaster, a reset of the timestamp settings might result in a clock jump. Furthermore, if the clock is also controlled by phc2sys in automatic mode (where the UTC offset is queried from ptp4l), that UTC-to-TAI offset (currently 37 seconds in 2021) would be temporarily reset to 0, and it would take a long time for phc2sys to readjust so that CLOCK_REALTIME and the PHC are apart by 37 seconds again. To address the issue, we introduce a new function called stmmac_init_tstamp_counter(), which gets called during ndo_open(). It contains the code snippet moved from stmmac_hwtstamp_set() that manages the time synchronization. Besides, the sub second increment configuration is also moved here since the related values are hardware dependent and runtime invariant. Furthermore, the hardware clock must be kept running even when no time stamping mode is selected in order to retain the synchronized time base. That way, timestamping can be enabled again at any time only with the need to compensate the clock's natural drifting. As a side effect, this patch fixes the issue that ptp_clock_info::enable can be called before SIOCSHWTSTAMP and the driver (which looks at priv->systime_flags) was not prepared to handle that ordering. Fixes: 92ba6888510c ("stmmac: add the support for PTP hw clock driver") Reported-by: Michael Olbrich <m.olbrich@pengutronix.de> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Holger Assmann <h.assmann@pengutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22MAINTAINERS: Add entry to MAINTAINERS for MilbeautSugaya Taichi
Add entry to MAINTAINERS for Milbeaut that supported minimal drivers. Signed-off-by: Sugaya Taichi <sugaya.taichi@socionext.com> Link: https://lore.kernel.org/r/1636968656-14033-5-git-send-email-sugaya.taichi@socionext.com' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-11-22nfp: checking parameter process for rx-usecs/tx-usecs is invalidDiana Wang
Use nn->tlv_caps.me_freq_mhz instead of nn->me_freq_mhz to check whether rx-usecs/tx-usecs is valid. This is because nn->tlv_caps.me_freq_mhz represents the clock_freq (MHz) of the flow processing cores (FPC) on the NIC. While nn->me_freq_mhz is not be set. Fixes: ce991ab6662a ("nfp: read ME frequency from vNIC ctrl memory") Signed-off-by: Diana Wang <na.wang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22ipv6: fix typos in __ip6_finish_output()Eric Dumazet
We deal with IPv6 packets, so we need to use IP6CB(skb)->flags and IP6SKB_REROUTED, instead of IPCB(skb)->flags and IPSKB_REROUTED Found by code inspection, please double check that fixing this bug does not surface other bugs. Fixes: 09ee9dba9611 ("ipv6: Reinject IPv6 packets if IPsec policy matches after SNAT") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tobias Brunner <tobias@strongswan.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Tested-by: Tobias Brunner <tobias@strongswan.org> Acked-by: Tobias Brunner <tobias@strongswan.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22selftests/tc-testings: Be compatible with newer tc outputLi Zhijian
old tc(iproute2-5.9.0) output: action order 1: bpf action.o:[action-ok] id 60 tag bcf7977d3b93787c jited default-action pipe newer tc(iproute2-5.14.0) output: action order 1: bpf action.o:[action-ok] id 64 name tag bcf7977d3b93787c jited default-action pipe It can fix below errors: # ok 260 f84a - Add cBPF action with invalid bytecode # not ok 261 e939 - Add eBPF action with valid object-file # Could not match regex pattern. Verify command output: # total acts 0 # # action order 1: bpf action.o:[action-ok] id 42 name tag bcf7977d3b93787c jited default-action pipe # index 667 ref 1 bind 0 Signed-off-by: Li Zhijian <zhijianx.li@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22selftests/tc-testing: match any qdisc typeLi Zhijian
We should not always presume all kernels use pfifo_fast as the default qdisc. For example, a fq_codel qdisk could have below output: qdisc fq_codel 0: parent 1:4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: Li Zhijian <zhijianx.li@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22net: dsa: qca8k: fix MTU calculationRobert Marko
qca8k has a global MTU, so its tracking the MTU per port to make sure that the largest MTU gets applied. Since it uses the frame size instead of MTU the driver MTU change function will then add the size of Ethernet header and checksum on top of MTU. The driver currently populates the per port MTU size as Ethernet frame length + checksum which equals 1518. The issue is that then MTU change function will go through all of the ports, find the largest MTU and apply the Ethernet header + checksum on top of it again, so for a desired MTU of 1500 you will end up with 1536. This is obviously incorrect, so to correct it populate the per port struct MTU with just the MTU and not include the Ethernet header + checksum size as those will be added by the MTU change function. Fixes: f58d2598cf70 ("net: dsa: qca8k: implement the port MTU callbacks") Signed-off-by: Robert Marko <robert.marko@sartura.hr> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22net: dsa: qca8k: fix internal delay applied to the wrong PAD configAnsuel Smith
With SGMII phy the internal delay is always applied to the PAD0 config. This is caused by the falling edge configuration that hardcode the reg to PAD0 (as the falling edge bits are present only in PAD0 reg) Move the delay configuration before the reg overwrite to correctly apply the delay. Fixes: cef08115846e ("net: dsa: qca8k: set internal delay also for sgmii") Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-22firmware: smccc: Fix check for ARCH_SOC_ID not implementedMichael Kelley
The ARCH_FEATURES function ID is a 32-bit SMC call, which returns a 32-bit result per the SMCCC spec. Current code is doing a 64-bit comparison against -1 (SMCCC_RET_NOT_SUPPORTED) to detect that the feature is unimplemented. That check doesn't work in a Hyper-V VM, where the upper 32-bits are zero as allowed by the spec. Cast the result as an 'int' so the comparison works. The change also makes the code consistent with other similar checks in this file. Fixes: 821b67fa4639 ("firmware: smccc: Add ARCH_SOC_ID support") Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-11-22Merge tag 'socfpga_fix_for_v5.16' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into arm/fixes SoCFPGA fix for v5.16 - Fix crash when CONFIG_FORTIRY_SOURCE is enabled * tag 'socfpga_fix_for_v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux: ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE Link: https://lore.kernel.org/r/20211119153224.2761257-1-dinguyen@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-11-22Merge tag 'scmi-fixes-5.16' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes Arm SCMI fixes for v5.16 Couple of fixes for sparse warnings(type error assignment in voltage and sensor protocols), add proper propagation of error from scmi_pm_domain_probe handling agent discovery response in base protocol correctly and a fix to avoid null pointer de-reference in the error path. * tag 'scmi-fixes-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: firmware: arm_scmi: Fix type error assignment in voltage protocol firmware: arm_scmi: Fix type error in sensor protocol firmware: arm_scmi: pm: Propagate return value to caller firmware: arm_scmi: Fix base agent discover response firmware: arm_scmi: Fix null de-reference on error path Link: https://lore.kernel.org/r/20211118121656.4014764-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-11-22Merge tag 'optee-fix-for-v5.16' of ↵Arnd Bergmann
git://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes Fix possible NULL pointer dereference in OP-TEE driver * tag 'optee-fix-for-v5.16' of git://git.linaro.org/people/jens.wiklander/linux-tee: optee: fix kfree NULL pointer Link: https://lore.kernel.org/r/20211117125747.GA2896197@jade Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-11-22Merge tag 'arm-soc/for-5.16/devicetree-fixes' of ↵Arnd Bergmann
https://github.com/Broadcom/stblinux into arm/fixes This pull request contains Broadcom ARM-based SoCs Device Tree fixes for 5.16, please pull the following: - Florian fixes the BCM5310x DTS include file to have the appropriate I2C controller interrupt line, and allows the BCMA GPIO controller to be used as an interrupt controller. Finally, the BCM2711 (Raspberry Pi 4) PCIe Device Tree node interrupts are fixed to list the correct interrupt output as well as the INTB/C/D lines. * tag 'arm-soc/for-5.16/devicetree-fixes' of https://github.com/Broadcom/stblinux: ARM: dts: bcm2711: Fix PCIe interrupts ARM: dts: BCM5301X: Add interrupt properties to GPIO node ARM: dts: BCM5301X: Fix I2C controller interrupt Link: https://lore.kernel.org/r/20211116201429.2692786-1-f.fainelli@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-11-22USB: serial: option: add Telit LE910S1 0x9200 compositionDaniele Palmas
Add the following Telit LE910S1 composition: 0x9200: tty Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Link: https://lore.kernel.org/r/20211119140319.10448-1-dnlplm@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org>
2021-11-22Revert "parisc: Fix backtrace to always include init funtion names"Helge Deller
This reverts commit 279917e27edc293eb645a25428c6ab3f3bca3f86. With the CONFIG_HARDENED_USERCOPY option enabled, this patch triggers kernel bugs at runtime: usercopy: Kernel memory overwrite attempt detected to kernel text (offset 2084839, size 6)! kernel BUG at mm/usercopy.c:99! Backtrace: IAOQ[0]: usercopy_abort+0xc4/0xe8 [<00000000406ed1c8>] __check_object_size+0x174/0x238 [<00000000407086d4>] copy_strings.isra.0+0x3e8/0x708 [<0000000040709a20>] do_execveat_common.isra.0+0x1bc/0x328 [<000000004070b760>] compat_sys_execve+0x7c/0xb8 [<0000000040303eb8>] syscall_exit+0x0/0x14 The problem is, that we have an init section of at least 2MB size which starts at _stext and is freed after bootup. If then later some kernel data is (temporarily) stored in this free memory, check_kernel_text_object() will trigger a bug since the data appears to be inside the kernel text (>=_stext) area: if (overlaps(ptr, len, _stext, _etext)) usercopy_abort("kernel text"); Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@kernel.org # 5.4+
2021-11-22parisc: Convert PTE lookup to use extru_safe() macroHelge Deller
Convert the PTE lookup functions to use the safer extru_safe macro. Signed-off-by: Helge Deller <deller@gmx.de>
2021-11-22parisc: Fix extraction of hash lock bits in syscall.SJohn David Anglin
The extru instruction leaves the most significant 32 bits of the target register in an undefined state on PA 2.0 systems. If any of these bits are nonzero, this will break the calculation of the lock pointer. Fix by using extrd,u instruction via extru_safe macro on 64-bit kernels. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2021-11-22parisc: Provide an extru_safe() macro to extract unsigned bitsHelge Deller
The extru instruction leaves the most significant 32 bits of the target register in an undefined state on PA 2.0 systems. Provide a macro to safely use extru on 32- and 64-bit machines. Suggested-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2021-11-22parisc: Increase FRAME_WARN to 2048 bytes on pariscHelge Deller
PA-RISC uses a much bigger frame size for functions than other architectures. So increase it to 2048 for 32- and 64-bit kernels. This fixes e.g. a warning in lib/xxhash.c. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Helge Deller <deller@gmx.de>
2021-11-21iomap: Fix inline extent handling in iomap_readpageAndreas Gruenbacher
Before commit 740499c78408 ("iomap: fix the iomap_readpage_actor return value for inline data"), when hitting an IOMAP_INLINE extent, iomap_readpage_actor would report having read the entire page. Since then, it only reports having read the inline data (iomap->length). This will force iomap_readpage into another iteration, and the filesystem will report an unaligned hole after the IOMAP_INLINE extent. But iomap_readpage_actor (now iomap_readpage_iter) isn't prepared to deal with unaligned extents, it will get things wrong on filesystems with a block size smaller than the page size, and we'll eventually run into the following warning in iomap_iter_advance: WARN_ON_ONCE(iter->processed > iomap_length(iter)); Fix that by changing iomap_readpage_iter to return 0 when hitting an inline extent; this will cause iomap_iter to stop immediately. To fix readahead as well, change iomap_readahead_iter to pass on iomap_readpage_iter return values less than or equal to zero. Fixes: 740499c78408 ("iomap: fix the iomap_readpage_actor return value for inline data") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-11-21Linux 5.16-rc2v5.16-rc2Linus Torvalds
2021-11-21Merge tag 'x86-urgent-2021-11-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - Move the command line preparation and the early command line parsing earlier so that the command line parameters which affect early_reserve_memory(), e.g. efi=nosftreserve, are taken into account. This was broken when the invocation of early_reserve_memory() was moved recently. - Use an atomic type for the SGX page accounting, which is read and written locklessly, to plug various race conditions related to it. * tag 'x86-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Fix free page accounting x86/boot: Pull up cmdline preparation and early param parsing
2021-11-21Merge tag 'perf-urgent-2021-11-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Thomas Gleixner: - Remove unneded PEBS disabling when taking LBR snapshots to prevent an unchecked MSR access error. - Fix IIO event constraints for Snowridge and Skylake server chips. * tag 'perf-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/perf: Fix snapshot_branch_stack warning in VM perf/x86/intel/uncore: Fix IIO event constraints for Snowridge perf/x86/intel/uncore: Fix IIO event constraints for Skylake Server perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake Server
2021-11-21Merge tag 'powerpc-5.16-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull more powerpc fixes from Michael Ellerman: - Fix a bug in copying of sigset_t for 32-bit systems, which caused X to not start. - Fix handling of shared LSIs (rare) with the xive interrupt controller (Power9/10). - Fix missing TOC setup in some KVM code, which could result in oopses depending on kernel data layout. - Fix DMA mapping when we have persistent memory and only one DMA window available. - Fix further problems with STRICT_KERNEL_RWX on 8xx, exposed by a recent fix. - A couple of other minor fixes. Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Cédric Le Goater, Christian Zigotzky, Christophe Leroy, Daniel Axtens, Finn Thain, Greg Kurz, Masahiro Yamada, Nicholas Piggin, and Uwe Kleine-König. * tag 'powerpc-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/xive: Change IRQ domain to a tree domain powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWX powerpc/signal32: Fix sigset_t copy powerpc/book3e: Fix TLBCAM preset at boot powerpc/pseries/ddw: Do not try direct mapping with persistent memory and one window powerpc/pseries/ddw: simplify enable_ddw() powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory" powerpc/pseries: Fix numa FORM2 parsing fallback code powerpc/pseries: rename numa_dist_table to form2_distances powerpc: clean vdso32 and vdso64 directories powerpc/83xx/mpc8349emitx: Drop unused variable KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()
2021-11-21pstore/blk: Use "%lu" to format unsigned longGeert Uytterhoeven
On 32-bit: fs/pstore/blk.c: In function ‘__best_effort_init’: include/linux/kern_levels.h:5:18: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 3 has type ‘long unsigned int’ [-Wformat=] 5 | #define KERN_SOH "\001" /* ASCII Start Of Header */ | ^~~~~~ include/linux/kern_levels.h:14:19: note: in expansion of macro ‘KERN_SOH’ 14 | #define KERN_INFO KERN_SOH "6" /* informational */ | ^~~~~~~~ include/linux/printk.h:373:9: note: in expansion of macro ‘KERN_INFO’ 373 | printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) | ^~~~~~~~~ fs/pstore/blk.c:314:3: note: in expansion of macro ‘pr_info’ 314 | pr_info("attached %s (%zu) (no dedicated panic_write!)\n", | ^~~~~~~ Cc: stable@vger.kernel.org Fixes: 7bb9557b48fcabaa ("pstore/blk: Use the normal block device I/O path") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210629103700.1935012-1-geert@linux-m68k.org Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "15 patches. Subsystems affected by this patch series: ipc, hexagon, mm (swap, slab-generic, kmemleak, hugetlb, kasan, damon, and highmem), and proc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: proc/vmcore: fix clearing user buffer by properly using clear_user() kmap_local: don't assume kmap PTEs are linear arrays in memory mm/damon/dbgfs: fix missed use of damon_dbgfs_lock mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation kasan: test: silence intentional read overflow warnings hugetlb, userfaultfd: fix reservation restore on userfaultfd error hugetlb: fix hugetlb cgroup refcounting during mremap mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag hexagon: ignore vmlinux.lds hexagon: clean up timer-regs.h hexagon: export raw I/O routines for modules mm: emit the "free" trace report before freeing memory in kmem_cache_free() shm: extend forced shm destroy to support objects from several IPC nses ipc: WARN if trying to remove ipc object which is absent mm/swap.c:put_pages_list(): reinitialise the page list
2021-11-20Merge tag 'block-5.16-2021-11-19' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Flip a cap check to avoid a selinux error (Alistair) - Fix for a regression this merge window where we can miss a queue ref put (me) - Un-mark pstore-blk as broken, as the condition that triggered that change has been rectified (Kees) - Queue quiesce and sync fixes (Ming) - FUA insertion fix (Ming) - blk-cgroup error path put fix (Yu) * tag 'block-5.16-2021-11-19' of git://git.kernel.dk/linux-block: blk-mq: don't insert FUA request with data into scheduler queue blk-cgroup: fix missing put device in error path from blkg_conf_pref() block: avoid to quiesce queue in elevator_init_mq Revert "mark pstore-blk as broken" blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release() block: fix missing queue put in error path block: Check ADMIN before NICE for IOPRIO_CLASS_RT
2021-11-20Merge tag 'pinctrl-v5.16-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "There is an ACPI stubs fix which is ACKed by the ACPI maintainer for merging through my tree. One item stand out and that is that I delete the <linux/sdb.h> header that is used by nothing. I deleted this subsystem (through the GPIO tree) a while back so I feel responsible for tidying up the floor. Other than that it is the usual mistakes, a bit noisy around build issue and Kconfig then driver fixes. Specifics: - Fix some stubs causing compile issues for ACPI. - Fix some wakeups on AMD IRQs shared between GPIO and SCI. - Fix a build warning in the Tegra driver. - Fix a Kconfig issue in the Qualcomm driver. - Add a missing include the RALink driver. - Return a valid type for the Apple pinctrl IRQs. - Implement some Qualcomm SDM845 dual-edge errata. - Remove the unused <linux/sdb.h> header. (The subsystem was once deleted by the pinctrl maintainer...) - Fix a duplicate initialized in the Tegra driver. - Fix register offsets for UFS and SDC in the Qualcomm SM8350 driver" * tag 'pinctrl-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: qcom: sm8350: Correct UFS and SDC offsets pinctrl: tegra194: remove duplicate initializer again Remove unused header <linux/sdb.h> pinctrl: qcom: sdm845: Enable dual edge errata pinctrl: apple: Always return valid type in apple_gpio_irq_type pinctrl: ralink: include 'ralink_regs.h' in 'pinctrl-mt7620.c' pinctrl: qcom: fix unmet dependencies on GPIOLIB for GPIOLIB_IRQCHIP pinctrl: tegra: Return const pointer from tegra_pinctrl_get_group() pinctrl: amd: Fix wakeups when IRQ is shared with SCI ACPI: Add stubs for wakeup handler functions
2021-11-20Merge tag 's390-5.16-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Add missing Kconfig option for ftrace direct multi sample, so it can be compiled again, and also add s390 support for this sample. - Update Christian Borntraeger's email address. - Various fixes for memory layout setup. Besides other this makes it possible to load shared DCSS segments again. - Fix copy to user space of swapped kdump oldmem. - Remove -mstack-guard and -mstack-size compile options when building vdso binaries. This can happen when CONFIG_VMAP_STACK is disabled and results in broken vdso code which causes more or less random exceptions. Also remove the not needed -nostdlib option. - Fix memory leak on cpu hotplug and return code handling in kexec code. - Wire up futex_waitv system call. - Replace snprintf with sysfs_emit where appropriate. * tag 's390-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: ftrace/samples: add s390 support for ftrace direct multi sample ftrace/samples: add missing Kconfig option for ftrace direct multi sample MAINTAINERS: update email address of Christian Borntraeger s390/kexec: fix memory leak of ipl report buffer s390/kexec: fix return code handling s390/dump: fix copying to user-space of swapped kdump oldmem s390: wire up sys_futex_waitv system call s390/vdso: filter out -mstack-guard and -mstack-size s390/vdso: remove -nostdlib compiler flag s390: replace snprintf in show functions with sysfs_emit s390/boot: simplify and fix kernel memory layout setup s390/setup: re-arrange memblock setup s390/setup: avoid using memblock_enforce_memory_limit s390/setup: avoid reserving memory above identity mapping
2021-11-20Merge tag '5.16-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Three small cifs/smb3 fixes: two to address minor coverity issues and one cleanup" * tag '5.16-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: introduce cifs_ses_mark_for_reconnect() helper cifs: protect srv_count with cifs_tcp_ses_lock cifs: move debug print out of spinlock
2021-11-20proc/vmcore: fix clearing user buffer by properly using clear_user()David Hildenbrand
To clear a user buffer we cannot simply use memset, we have to use clear_user(). With a virtio-mem device that registers a vmcore_cb and has some logically unplugged memory inside an added Linux memory block, I can easily trigger a BUG by copying the vmcore via "cp": systemd[1]: Starting Kdump Vmcore Save Service... kdump[420]: Kdump is using the default log level(3). kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ kdump[465]: saving vmcore-dmesg.txt complete kdump[467]: saving vmcore BUG: unable to handle page fault for address: 00007f2374e01000 #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867 Oops: 0003 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014 RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86 Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81 RSP: 0018:ffffc9000073be08 EFLAGS: 00010212 RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000 RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008 RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50 R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000 R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8 FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0 Call Trace: read_vmcore+0x236/0x2c0 proc_reg_read+0x55/0xa0 vfs_read+0x95/0x190 ksys_read+0x4f/0xc0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access Prevention (SMAP)", which is used to detect wrong access from the kernel to user buffers like this: SMAP triggers a permissions violation on wrong access. In the x86-64 variant of clear_user(), SMAP is properly handled via clac()+stac(). To fix, properly use clear_user() when we're dealing with a user buffer. Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Philipp Rudo <prudo@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20kmap_local: don't assume kmap PTEs are linear arrays in memoryArd Biesheuvel
The kmap_local conversion broke the ARM architecture, because the new code assumes that all PTEs used for creating kmaps form a linear array in memory, and uses array indexing to look up the kmap PTE belonging to a certain kmap index. On ARM, this cannot work, not only because the PTE pages may be non-adjacent in memory, but also because ARM/!LPAE interleaves hardware entries and extended entries (carrying software-only bits) in a way that is not compatible with array indexing. Fortunately, this only seems to affect configurations with more than 8 CPUs, due to the way the per-CPU kmap slots are organized in memory. Work around this by permitting an architecture to set a Kconfig symbol that signifies that the kmap PTEs do not form a lineary array in memory, and so the only way to locate the appropriate one is to walk the page tables. Link: https://lore.kernel.org/linux-arm-kernel/20211026131249.3731275-1-ardb@kernel.org/ Link: https://lkml.kernel.org/r/20211116094737.7391-1-ardb@kernel.org Fixes: 2a15ba82fa6c ("ARM: highmem: Switch to generic kmap atomic") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reported-by: Quanyang Wang <quanyang.wang@windriver.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20mm/damon/dbgfs: fix missed use of damon_dbgfs_lockSeongJae Park
DAMON debugfs is supposed to protect dbgfs_ctxs, dbgfs_nr_ctxs, and dbgfs_dirs using damon_dbgfs_lock. However, some of the code is accessing the variables without the protection. This fixes it by protecting all such accesses. Link: https://lkml.kernel.org/r/20211110145758.16558-3-sj@kernel.org Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocationSeongJae Park
Patch series "DAMON fixes". This patch (of 2): DAMON users can trigger below warning in '__alloc_pages()' by invoking write() to some DAMON debugfs files with arbitrarily high count argument, because DAMON debugfs interface allocates some buffers based on the user-specified 'count'. if (unlikely(order >= MAX_ORDER)) { WARN_ON_ONCE(!(gfp & __GFP_NOWARN)); return NULL; } Because the DAMON debugfs interface code checks failure of the 'kmalloc()', this commit simply suppresses the warnings by adding '__GFP_NOWARN' flag. Link: https://lkml.kernel.org/r/20211110145758.16558-1-sj@kernel.org Link: https://lkml.kernel.org/r/20211110145758.16558-2-sj@kernel.org Fixes: 4bc05954d007 ("mm/damon: implement a debugfs-based user space interface") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20kasan: test: silence intentional read overflow warningsKees Cook
As done in commit d73dad4eb5ad ("kasan: test: bypass __alloc_size checks") for __write_overflow warnings, also silence some more cases that trip the __read_overflow warnings seen in 5.16-rc1[1]: In file included from include/linux/string.h:253, from include/linux/bitmap.h:10, from include/linux/cpumask.h:12, from include/linux/mm_types_task.h:14, from include/linux/mm_types.h:5, from include/linux/page-flags.h:13, from arch/arm64/include/asm/mte.h:14, from arch/arm64/include/asm/pgtable.h:12, from include/linux/pgtable.h:6, from include/linux/kasan.h:29, from lib/test_kasan.c:10: In function 'memcmp', inlined from 'kasan_memcmp' at lib/test_kasan.c:897:2: include/linux/fortify-string.h:263:25: error: call to '__read_overflow' declared with attribute error: detected read beyond size of object (1st parameter) 263 | __read_overflow(); | ^~~~~~~~~~~~~~~~~ In function 'memchr', inlined from 'kasan_memchr' at lib/test_kasan.c:872:2: include/linux/fortify-string.h:277:17: error: call to '__read_overflow' declared with attribute error: detected read beyond size of object (1st parameter) 277 | __read_overflow(); | ^~~~~~~~~~~~~~~~~ [1] http://kisskb.ellerman.id.au/kisskb/buildresult/14660585/log/ Link: https://lkml.kernel.org/r/20211116004111.3171781-1-keescook@chromium.org Fixes: d73dad4eb5ad ("kasan: test: bypass __alloc_size checks") Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20hugetlb, userfaultfd: fix reservation restore on userfaultfd errorMina Almasry
Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we bail out using "goto out_release_unlock;" in the cases where idx >= size, or !huge_pte_none(), the code will detect that new_pagecache_page == false, and so call restore_reserve_on_error(). In this case I see restore_reserve_on_error() delete the reservation, and the following call to remove_inode_hugepages() will increment h->resv_hugepages causing a 100% reproducible leak. We should treat the is_continue case similar to adding a page into the pagecache and set new_pagecache_page to true, to indicate that there is no reservation to restore on the error path, and we need not call restore_reserve_on_error(). Rename new_pagecache_page to page_in_pagecache to make that clear. Link: https://lkml.kernel.org/r/20211117193825.378528-1-almasrymina@google.com Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error") Signed-off-by: Mina Almasry <almasrymina@google.com> Reported-by: James Houghton <jthoughton@google.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20hugetlb: fix hugetlb cgroup refcounting during mremapBui Quang Minh
When hugetlb_vm_op_open() is called during copy_vma(), we may take the reference to resv_map->css. Later, when clearing the reservation pointer of old_vma after transferring it to new_vma, we forget to drop the reference to resv_map->css. This leads to a reference leak of css. Fixes this by adding a check to drop reservation css reference in clear_vma_resv_huge_pages() Link: https://lkml.kernel.org/r/20211113154412.91134-1-minhquangbui99@gmail.com Fixes: 550a7d60bd5e35 ("mm, hugepages: add mremap() support for hugepage backed vma") Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flagRustam Kovhaev
When kmemleak is enabled for SLOB, system does not boot and does not print anything to the console. At the very early stage in the boot process we hit infinite recursion from kmemleak_init() and eventually kernel crashes. kmemleak_init() specifies SLAB_NOLEAKTRACE for KMEM_CACHE(), but kmem_cache_create_usercopy() removes it because CACHE_CREATE_MASK is not valid for SLOB. Let's fix CACHE_CREATE_MASK and make kmemleak work with SLOB Link: https://lkml.kernel.org/r/20211115020850.3154366-1-rkovhaev@gmail.com Fixes: d8843922fba4 ("slab: Ignore internal flags in cache creation") Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Glauber Costa <glommer@parallels.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20hexagon: ignore vmlinux.ldsNathan Chancellor
After building allmodconfig, there is an untracked vmlinux.lds file in arch/hexagon/kernel: $ git ls-files . --exclude-standard --others arch/hexagon/kernel/vmlinux.lds Ignore it as all other architectures have. Link: https://lkml.kernel.org/r/20211115174250.1994179-4-nathan@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20hexagon: clean up timer-regs.hNathan Chancellor
When building allmodconfig, there is a warning about TIMER_ENABLE being redefined: drivers/clocksource/timer-oxnas-rps.c:39:9: error: 'TIMER_ENABLE' macro redefined [-Werror,-Wmacro-redefined] #define TIMER_ENABLE BIT(7) ^ arch/hexagon/include/asm/timer-regs.h:13:9: note: previous definition is here #define TIMER_ENABLE 0 ^ 1 error generated. The values in this header are only used in one file each, if they are used at all. Remove the header and sink all of the constants into their respective files. TCX0_CLK_RATE is only used in arch/hexagon/include/asm/timex.h TIMER_ENABLE, RTOS_TIMER_INT, RTOS_TIMER_REGS_ADDR are only used in arch/hexagon/kernel/time.c. SLEEP_CLK_RATE and TIMER_CLR_ON_MATCH have both been unused since the file's introduction in commit 71e4a47f32f4 ("Hexagon: Add time and timer functions"). TIMER_ENABLE is redefined as BIT(0) so the shift is moved into the definition, rather than its use. Link: https://lkml.kernel.org/r/20211115174250.1994179-3-nathan@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Brian Cain <bcain@codeaurora.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20hexagon: export raw I/O routines for modulesNathan Chancellor
Patch series "Fixes for ARCH=hexagon allmodconfig", v2. This series fixes some issues noticed with ARCH=hexagon allmodconfig. This patch (of 3): When building ARCH=hexagon allmodconfig, the following errors occur: ERROR: modpost: "__raw_readsl" [drivers/i3c/master/svc-i3c-master.ko] undefined! ERROR: modpost: "__raw_writesl" [drivers/i3c/master/dw-i3c-master.ko] undefined! ERROR: modpost: "__raw_readsl" [drivers/i3c/master/dw-i3c-master.ko] undefined! ERROR: modpost: "__raw_writesl" [drivers/i3c/master/i3c-master-cdns.ko] undefined! ERROR: modpost: "__raw_readsl" [drivers/i3c/master/i3c-master-cdns.ko] undefined! Export these symbols so that modules can use them without any errors. Link: https://lkml.kernel.org/r/20211115174250.1994179-1-nathan@kernel.org Link: https://lkml.kernel.org/r/20211115174250.1994179-2-nathan@kernel.org Fixes: 013bf24c3829 ("Hexagon: Provide basic implementation and/or stubs for I/O routines.") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Brian Cain <bcain@codeaurora.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>