summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-07-10efi: Fix efi_power_off() not being run before acpi_power_off() when necessaryHans de Goede
Commit 98f30d0ecf79 ("ACPI: power: Switch to sys-off handler API") switched the ACPI sleep code from directly setting the old global pm_power_off handler to using the new register_sys_off_handler() mechanism with a priority of SYS_OFF_PRIO_FIRMWARE. This is a problem when the old global pm_power_off handler would later be overwritten, such as done by the late_initcall(efi_shutdown_init): if (efi_poweroff_required()) pm_power_off = efi_power_off; The old global pm_power_off handler gets run with a priority of SYS_OFF_PRIO_DEFAULT which is lower then SYS_OFF_PRIO_FIRMWARE, causing acpi_power_off() to run first, changing the behavior from before the ACPI sleep code switched to the new register_sys_off_handler(). Switch the registering of efi_power_off over to register_sys_off_handler() with a priority of SYS_OFF_PRIO_FIRMWARE + 1 so that it will run before acpi_power_off() as before. Note since the new sys-off-handler code will try all handlers in priority order, there is no more need for the EFI code to store and call the original pm_power_off handler. Fixes: 98f30d0ecf79 ("ACPI: power: Switch to sys-off handler API") Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220708131412.81078-3-hdegoede@redhat.com
2022-07-10platform/x86: x86-android-tablets: Fix Lenovo Yoga Tablet 2 830/1050 ↵Hans de Goede
poweroff again Commit 98f30d0ecf79 ("ACPI: power: Switch to sys-off handler API") switched the ACPI sleep code from directly setting the old global pm_power_off handler to using the new register_sys_off_handler() mechanism with a priority of SYS_OFF_PRIO_FIRMWARE. This is a problem in special cases where the old global pm_power_off handler later gets overwritten, such as the Lenovo Tab2 poweroff bugfix in x86-android-tablets. The old global pm_power_off handler gets run with a priority of SYS_OFF_PRIO_DEFAULT which is lower then SYS_OFF_PRIO_FIRMWARE, causing the troublesome ACPI poweroff (which freezes the system) to run first. Switch the registering of lenovo_yoga_tab2_830_1050_power_off over to register_sys_off_handler() with a priority of SYS_OFF_PRIO_FIRMWARE + 1 so that it will run before acpi_power_off() to fix this. Fixes: 98f30d0ecf79 ("ACPI: power: Switch to sys-off handler API") Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20220708131412.81078-2-hdegoede@redhat.com
2022-07-10platform/x86: gigabyte-wmi: add support for B660I AORUS PRO DDR4Pär Eriksson
Add support for the B660I AORUS PRO DDR4. Signed-off-by: Pär Eriksson <parherman@gmail.com> Link: https://lore.kernel.org/r/20220705184407.14181-1-parherman@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-07-10platform/x86/amd/pmc: Add new platform supportShyam Sundar S K
PMC driver can be supported on a new upcoming platform. Add this information to the support list. Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20220630050324.3780654-2-Shyam-sundar.S-k@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-07-10platform/x86/amd/pmc: Add new acpi id for PMC controllerShyam Sundar S K
New version of PMC controller will have a separate ACPI id, add that to the support list. Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20220630050324.3780654-1-Shyam-sundar.S-k@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-07-10kbuild: remove unused cmd_none in scripts/Makefile.modinstMasahiro Yamada
Commit 65ce9c38326e ("kbuild: move module strip/compression code into scripts/Makefile.modinst") added this unused code. Perhaps, I thought cmd_none was useful for CONFIG_MODULE_COMPRESS_NONE, but I did not use it after all. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
2022-07-10x86/boot: Fix the setup data types max limitBorislav Petkov
Commit in Fixes forgot to change the SETUP_TYPE_MAX definition which contains the highest valid setup data type. Correct that. Fixes: 5ea98e01ab52 ("x86/boot: Add Confidential Computing type to setup_data") Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/ddba81dd-cc92-699c-5274-785396a17fb5@zytor.com
2022-07-09Merge tag 'i2c-for-5.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Two I2C driver bugfixes preventing resource leaks" * tag 'i2c-for-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: cadence: Unregister the clk notifier in error path i2c: piix4: Fix a memory leak in the EFCH MMIO support
2022-07-09drm/aperture: Run fbdev removal before internal helpersThomas Zimmermann
Always run fbdev removal first to remove simpledrm via sysfb_disable(). This clears the internal state. The later call to drm_aperture_detach_drivers() then does nothing. Otherwise, with drm_aperture_detach_drivers() running first, the call to sysfb_disable() uses inconsistent state. Example backtrace show below: BUG: KASAN: use-after-free in device_del+0x79/0x5f0 Read of size 8 at addr ffff888108185050 by task systemd-udevd/311 CPU: 0 PID: 311 Comm: systemd-udevd Tainted: G E 5.19.0-rc2-1-default+ #1689 Hardware name: HP ProLiant DL120 G7, BIOS J01 04/21/2011 Call Trace: device_del+0x79/0x5f0 platform_device_del.part.0+0x19/0xe0 platform_device_unregister+0x1c/0x30 sysfb_disable+0x2d/0x70 remove_conflicting_framebuffers+0x1c/0xf0 remove_conflicting_pci_framebuffers+0x130/0x1a0 drm_aperture_remove_conflicting_pci_framebuffers+0x86/0xb0 mgag200_pci_probe+0x2d/0x140 [mgag200] Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 873eb3b11860 ("fbdev: Disable sysfb device registration when removing conflicting FBs") Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Helge Deller <deller@gmx.de> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: Changcheng Deng <deng.changcheng@zte.com.cn> Reviewed-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-09ptrace: fix clearing of JOBCTL_TRACED in ptrace_unfreeze_traced()Sven Schnelle
CI reported the following splat while running the strace testsuite: WARNING: CPU: 1 PID: 3570031 at kernel/ptrace.c:272 ptrace_check_attach+0x12e/0x178 CPU: 1 PID: 3570031 Comm: strace Tainted: G OE 5.19.0-20220624.rc3.git0.ee819a77d4e7.300.fc36.s390x #1 Hardware name: IBM 3906 M04 704 (z/VM 7.1.0) Call Trace: [<00000000ab4b645a>] ptrace_check_attach+0x132/0x178 ([<00000000ab4b6450>] ptrace_check_attach+0x128/0x178) [<00000000ab4b6cde>] __s390x_sys_ptrace+0x86/0x160 [<00000000ac03fcec>] __do_syscall+0x1d4/0x200 [<00000000ac04e312>] system_call+0x82/0xb0 Last Breaking-Event-Address: [<00000000ab4ea3c8>] wait_task_inactive+0x98/0x190 This is because JOBCTL_TRACED is set, but the task is not in TASK_TRACED state. Caused by ptrace_unfreeze_traced() which does: task->jobctl &= ~TASK_TRACED but it should be: task->jobctl &= ~JOBCTL_TRACED Fixes: 31cae1eaae4f ("sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Tested-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-09Merge tag 'powerpc-5.19-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: - On Power8 bare metal, fix creation of RNG platform devices, which are needed for the /dev/hwrng driver to probe correctly. Thanks to Jason A. Donenfeld, and Sachin Sant. * tag 'powerpc-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv: delay rng platform device creation until later in boot
2022-07-09Merge tag 'asoc-fix-v5.19-rc4' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v5.19 Quite a large batch due to things building up for a couple of weeks but all driver specific apart from Marek's documentation fix.
2022-07-09netfilter: nf_tables: replace BUG_ON by element length checkPablo Neira Ayuso
BUG_ON can be triggered from userspace with an element with a large userdata area. Replace it by length check and return EINVAL instead. Over time extensions have been growing in size. Pick a sufficiently old Fixes: tag to propagate this fix. Fixes: 7d7402642eaf ("netfilter: nf_tables: variable sized set element keys / data") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-09io_uring: check that we have a file table when allocating update slotsJens Axboe
If IORING_FILE_INDEX_ALLOC is set asking for an allocated slot, the helper doesn't check if we actually have a file table or not. The non alloc path does do that correctly, and returns -ENXIO if we haven't set one up. Do the same for the allocated path, avoiding a NULL pointer dereference when trying to find a free bit. Fixes: a7c41b4687f5 ("io_uring: let IORING_OP_FILES_UPDATE support choosing fixed file slots") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-09vlan: fix memory leak in vlan_newlink()Eric Dumazet
Blamed commit added back a bug I fixed in commit 9bbd917e0bec ("vlan: fix memory leak in vlan_dev_set_egress_priority") If a memory allocation fails in vlan_changelink() after other allocations succeeded, we need to call vlan_dev_free_egress_priority() to free all allocated memory because after a failed ->newlink() we do not call any methods like ndo_uninit() or dev->priv_destructor(). In following example, if the allocation for last element 2000:2001 fails, we need to free eight prior allocations: ip link add link dummy0 dummy0.100 type vlan id 100 \ egress-qos-map 1:2 2:3 3:4 4:5 5:6 6:7 7:8 8:9 2000:2001 syzbot report was: BUG: memory leak unreferenced object 0xffff888117bd1060 (size 32): comm "syz-executor408", pid 3759, jiffies 4294956555 (age 34.090s) hex dump (first 32 bytes): 09 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff83fc60ad>] kmalloc include/linux/slab.h:600 [inline] [<ffffffff83fc60ad>] vlan_dev_set_egress_priority+0xed/0x170 net/8021q/vlan_dev.c:193 [<ffffffff83fc6628>] vlan_changelink+0x178/0x1d0 net/8021q/vlan_netlink.c:128 [<ffffffff83fc67c8>] vlan_newlink+0x148/0x260 net/8021q/vlan_netlink.c:185 [<ffffffff838b1278>] rtnl_newlink_create net/core/rtnetlink.c:3363 [inline] [<ffffffff838b1278>] __rtnl_newlink+0xa58/0xdc0 net/core/rtnetlink.c:3580 [<ffffffff838b1629>] rtnl_newlink+0x49/0x70 net/core/rtnetlink.c:3593 [<ffffffff838ac66c>] rtnetlink_rcv_msg+0x21c/0x5c0 net/core/rtnetlink.c:6089 [<ffffffff839f9c37>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2501 [<ffffffff839f8da7>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] [<ffffffff839f8da7>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345 [<ffffffff839f9266>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921 [<ffffffff8384dbf6>] sock_sendmsg_nosec net/socket.c:714 [inline] [<ffffffff8384dbf6>] sock_sendmsg+0x56/0x80 net/socket.c:734 [<ffffffff8384e15c>] ____sys_sendmsg+0x36c/0x390 net/socket.c:2488 [<ffffffff838523cb>] ___sys_sendmsg+0x8b/0xd0 net/socket.c:2542 [<ffffffff838525b8>] __sys_sendmsg net/socket.c:2571 [inline] [<ffffffff838525b8>] __do_sys_sendmsg net/socket.c:2580 [inline] [<ffffffff838525b8>] __se_sys_sendmsg net/socket.c:2578 [inline] [<ffffffff838525b8>] __x64_sys_sendmsg+0x78/0xf0 net/socket.c:2578 [<ffffffff845ad8d5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff845ad8d5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff8460006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 37aa50c539bc ("vlan: introduce vlan_dev_free_egress_priority") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09nfp: fix issue of skb segments exceeds descriptor limitationBaowen Zheng
TCP packets will be dropped if the segments number in the tx skb exceeds limitation when sending iperf3 traffic with --zerocopy option. we make the following changes: Get nr_frags in nfp_nfdk_tx_maybe_close_block instead of passing from outside because it will be changed after skb_linearize operation. Fill maximum dma_len in first tx descriptor to make sure the whole head is included in the first descriptor. Fixes: c10d12e3dce8 ("nfp: add support for NFDK data path") Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09x86/speculation: Disable RRSBA behaviorPawan Gupta
Some Intel processors may use alternate predictors for RETs on RSB-underflow. This condition may be vulnerable to Branch History Injection (BHI) and intramode-BTI. Kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines, eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against such attacks. However, on RSB-underflow, RET target prediction may fallback to alternate predictors. As a result, RET's predicted target may get influenced by branch history. A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback behavior when in kernel mode. When set, RETs will not take predictions from alternate predictors, hence mitigating RETs as well. Support for this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit2). For spectre v2 mitigation, when a user selects a mitigation that protects indirect CALLs and JMPs against BHI and intramode-BTI, set RRSBA_DIS_S also to protect RETs for RSB-underflow case. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-09x86/kexec: Disable RET on kexecKonrad Rzeszutek Wilk
All the invocations unroll to __x86_return_thunk and this file must be PIC independent. This fixes kexec on 64-bit AMD boxes. [ bp: Fix 32-bit build. ] Reported-by: Edward Tran <edward.tran@oracle.com> Reported-by: Awais Tanveer <awais.tanveer@oracle.com> Suggested-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-09netfilter: nf_log: incorrect offset to network headerPablo Neira Ayuso
NFPROTO_ARP is expecting to find the ARP header at the network offset. In the particular case of ARP, HTYPE= field shows the initial bytes of the ethernet header destination MAC address. netdev out: IN= OUT=bridge0 MACSRC=c2:76:e5:71:e1:de MACDST=36:b0:4a:e2:72:ea MACPROTO=0806 ARP HTYPE=14000 PTYPE=0x4ae2 OPCODE=49782 NFPROTO_NETDEV egress hook is also expecting to find the IP headers at the network offset. Fixes: 35b9395104d5 ("netfilter: add generic ARP packet logger") Reported-by: Tom Yan <tom.ty89@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-08Input: document the units for resolution of size axesSiarhei Vishniakou
Today, the resolution of size axes is not documented. As a result, it's not clear what the canonical interpretation of this value should be. On Android, there is a need to calculate the size of the touch ellipse in physical units (millimeters). After reviewing linux source, it turned out that most of the existing usages are already interpreting this value as "units/mm". This documentation will make it explicit. This will help device implementations with correctly following the linux specs, and will ensure that the devices will work on Android without needing further customized parameters for scaling of major/minor values. Signed-off-by: Siarhei Vishniakou <svv@google.com> Reviewed-by: Jeff LaBundy <jeff@labundy.com> Link: https://lore.kernel.org/r/20220520084514.3451193-1-svv@google.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-07-08Input: goodix - call acpi_device_fix_up_power() in some casesHans de Goede
On ACPI boards, when we cannot get the GPIOs to do a reset ourselves if necessary, call acpi_device_fix_up_power() to force the ACPI _PS0 method to run. On some devices without proper GPIO descriptions this will reset the touchscreen for us and this may be necessary for us to be able to communicate to the touchscreen at all. Specifically on an Aya Neo Next this change will cause the _PS0() ACPI function to call INIT() which does: Method (INIT, 0, Serialized) { TP_I = 0x00A50000 TP_R = 0x00A50000 Sleep (0x0A) TP_I = 0x00E50000 Sleep (One) TP_R = 0x00E50000 Sleep (0x06) TP_I = 0x00A50000 Sleep (0x3C) TP_I = 0x00041800 } On older kernels the ACPI core assumed a power-on was necessary by itself and would run _PS0 before our probe function runs, which can be seen from the GPIO pin ctrl registers in /sys/kernel/debug/gpio which match the above hex values with older kernels. With newer kernels before this change the GPIO pin ctrl registers do not match, indicating INIT() has not run and probing the touchscreen fails. This change makes Linux run _PS0() again fixing the touchscreen not working on the Aya Neo Next. Reported-and-tested-by: Maya Matuszczyk <maccraft123mc@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20220618210233.208027-1-hdegoede@redhat.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-07-08Merge branch 'selftests-forwarding-install-two-missing-tests'Jakub Kicinski
Martin Blumenstingl says: ==================== selftests: forwarding: Install two missing tests For some distributions (e.g. OpenWrt) we don't want to rely on rsync to copy the tests to the target as some extra dependencies need to be installed. The Makefile in tools/testing/selftests/net/forwarding already installs most of the tests. This series adds the two missing tests to the list of installed tests. That way a downstream distribution can build a package using this Makefile (and add dependencies there as needed). ==================== Link: https://lore.kernel.org/r/20220707135532.1783925-1-martin.blumenstingl@googlemail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08selftests: forwarding: Install no_forwarding.shMartin Blumenstingl
When using the Makefile from tools/testing/selftests/net/forwarding/ all tests should be installed. Add no_forwarding.sh to the list of "to be installed tests" where it has been missing so far. Fixes: 476a4f05d9b83f ("selftests: forwarding: add a no_forwarding.sh test") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08selftests: forwarding: Install local_termination.shMartin Blumenstingl
When using the Makefile from tools/testing/selftests/net/forwarding/ all tests should be installed. Add local_termination.sh to the list of "to be installed tests" where it has been missing so far. Fixes: 90b9566aa5cd3f ("selftests: forwarding: add a test for local_termination.sh") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08Merge tag 'fscache-fixes-20220708' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache fixes from David Howells: - Fix a check in fscache_wait_on_volume_collision() in which the polarity is reversed. It should complain if a volume is still marked acquisition-pending after 20s, but instead complains if the mark has been cleared (ie. the condition has cleared). Also switch an open-coded test of the ACQUIRE_PENDING volume flag to use the helper function for consistency. - Not a fix per se, but neaten the code by using a helper to check for the DROPPED state. - Fix cachefiles's support for erofs to only flush requests associated with a released control file, not all requests. - Fix a race between one process invalidating an object in the cache and another process trying to look it up. * tag 'fscache-fixes-20220708' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache: Fix invalidation/lookup race cachefiles: narrow the scope of flushed requests when releasing fd fscache: Introduce fscache_cookie_is_dropped() fscache: Fix if condition in fscache_wait_on_volume_collision()
2022-07-08Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== bpf 2022-07-08 We've added 3 non-merge commits during the last 2 day(s) which contain a total of 7 files changed, 40 insertions(+), 24 deletions(-). The main changes are: 1) Fix cBPF splat triggered by skb not having a mac header, from Eric Dumazet. 2) Fix spurious packet loss in generic XDP when pushing packets out (note that native XDP is not affected by the issue), from Johan Almbladh. 3) Fix bpf_dynptr_{read,write}() helper signatures with flag argument before its set in stone as UAPI, from Joanne Koong. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs bpf: Make sure mac_header was set before using it xdp: Fix spurious packet loss in generic XDP TX path ==================== Link: https://lore.kernel.org/r/20220708213418.19626-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08ptrace: fix clearing of JOBCTL_TRACED in ptrace_unfreeze_traced()Sven Schnelle
CI reported the following splat while running the strace testsuite: [ 3976.640309] WARNING: CPU: 1 PID: 3570031 at kernel/ptrace.c:272 ptrace_check_attach+0x12e/0x178 [ 3976.640391] CPU: 1 PID: 3570031 Comm: strace Tainted: G OE 5.19.0-20220624.rc3.git0.ee819a77d4e7.300.fc36.s390x #1 [ 3976.640410] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0) [ 3976.640452] Call Trace: [ 3976.640454] [<00000000ab4b645a>] ptrace_check_attach+0x132/0x178 [ 3976.640457] ([<00000000ab4b6450>] ptrace_check_attach+0x128/0x178) [ 3976.640460] [<00000000ab4b6cde>] __s390x_sys_ptrace+0x86/0x160 [ 3976.640463] [<00000000ac03fcec>] __do_syscall+0x1d4/0x200 [ 3976.640468] [<00000000ac04e312>] system_call+0x82/0xb0 [ 3976.640470] Last Breaking-Event-Address: [ 3976.640471] [<00000000ab4ea3c8>] wait_task_inactive+0x98/0x190 This is because JOBCTL_TRACED is set, but the task is not in TASK_TRACED state. Caused by ptrace_unfreeze_traced() which does: task->jobctl &= ~TASK_TRACED but it should be: task->jobctl &= ~JOBCTL_TRACED Fixes: 31cae1eaae4f ("sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Link: https://lkml.kernel.org/r/20220706101625.2100298-1-svens@linux.ibm.com Link: https://lkml.kernel.org/r/YrHA5UkJLornOdCz@li-4a3a4a4c-28e5-11b2-a85c-a8d192c6f089.ibm.com Link: https://bugzilla.redhat.com/show_bug.cgi?id=2101641 Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Alexander Gordeev <agordeev@linux.ibm.com> Tested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2022-07-08Merge tag 'sunxi-fixes-for-5.19-2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes - Fix SPI NOR compatible on Orange Pi Zero * tag 'sunxi-fixes-for-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero Link: https://lore.kernel.org/r/Ysh44qUmdmF6TWS6@kista.localdomain Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-07-08Merge tag 'sunxi-fixes-for-5.19-1' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes - fix binding for D1 display pipeline * tag 'sunxi-fixes-for-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: dt-bindings: display: sun4i: Fix D1 pipeline count Link: https://lore.kernel.org/r/YshiPKZRq6NHxPzO@kista.localdomain Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-07-08Merge tag 'at91-fixes-5.19-2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/fixes AT91 fixes for 5.19 #2 It contains 2 DT fixes: - one for SAMA5D2 to fix the i2s1 assigned-clock-parents property - one for kswitch-d10 (LAN966 based) enforcing proper settings on GPIO pins * tag 'at91-fixes-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux: ARM: dts: at91: sama5d2: Fix typo in i2s1 node ARM: dts: kswitch-d10: use open drain mode for coma-mode pins Link: https://lore.kernel.org/r/20220708151621.860339-1-claudiu.beznea@microchip.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-07-08Input: wm97xx - make .remove() obviously always return 0Uwe Kleine-König
wm97xx_remove() returns zero unconditionally. To prepare changing the prototype for platform remove callbacks to return void, make it explicit that wm97xx_mfd_remove() always returns zero. The prototype for wm97xx_remove cannot be changed, as it's also used as a plain device remove callback. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20220708062718.240013-1-u.kleine-koenig@pengutronix.de Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-07-08Merge tag 'acpi-5.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix two recent regressions related to CPPC support. Specifics: - Prevent _CPC from being used if the platform firmware does not confirm CPPC v2 support via _OSC (Mario Limonciello) - Allow systems with X86_FEATURE_CPPC set to use _CPC even if CPPC support cannot be agreed on via _OSC (Mario Limonciello)" * tag 'acpi-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported ACPI: CPPC: Only probe for _CPC if CPPC v2 is acked
2022-07-08Merge tag 'pm-5.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a NULL pointer dereference in a devfreq driver and a runtime PM framework issue that may cause a supplier device to be suspended before its consumer. Specifics: - Fix NULL pointer dereference related to printing a diagnostic message in the exynos-bus devfreq driver (Christian Marangi) - Fix race condition in the runtime PM framework which in some cases may cause a supplier device to be suspended when its consumer is still active (Rafael Wysocki)" * tag 'pm-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / devfreq: exynos-bus: Fix NULL pointer dereference PM: runtime: Fix supplier device management during consumer probe PM: runtime: Redefine pm_runtime_release_supplier()
2022-07-08Merge tag 'cxl-fixes-for-5.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Vishal Verma: - Update MAINTAINERS for Ben's email - Fix cleanup of port devices on failure to probe driver - Fix endianness in get/set LSA mailbox command structures - Fix memregion_free() fallback definition - Fix missing variable payload checks in CXL cmd size validation * tag 'cxl-fixes-for-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/mbox: Fix missing variable payload checks in cmd size validation memregion: Fix memregion_free() fallback definition cxl/mbox: Use __le32 in get,set_lsa mailbox structures cxl/core: Use is_endpoint_decoder cxl: Fix cleanup of port devices on failure to probe driver. MAINTAINERS: Update Ben's email address
2022-07-08Merge tag 'iommu-fixes-v5.19-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: - fix device setup failures in the Intel VT-d driver when the PASID table is shared - fix Intel VT-d device hot-add failure due to wrong device notifier order - remove the old IOMMU mailing list from the MAINTAINERS file now that it has been retired * tag 'iommu-fixes-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: MAINTAINERS: Remove iommu@lists.linux-foundation.org iommu/vt-d: Fix RID2PASID setup/teardown failure iommu/vt-d: Fix PCI bus rescan device hot add
2022-07-08arm64: dts: broadcom: bcm4908: Fix cpu node for smp bootWilliam Zhang
Add spin-table enable-method and cpu-release-addr properties for cpu0 node. This is required by all ARMv8 SoC. Otherwise some bootloader like u-boot can not update cpu-release-addr and linux fails to start up secondary cpus. Fixes: 2961f69f151c ("arm64: dts: broadcom: add BCM4908 and Asus GT-AC5300 early DTS files") Signed-off-by: William Zhang <william.zhang@broadcom.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2022-07-08arm64: dts: broadcom: bcm4908: Fix timer node for BCM4906 SoCWilliam Zhang
The cpu mask value in interrupt property inherits from bcm4908.dtsi which sets to four cpus. Correct the value to two cpus for dual core BCM4906 SoC. Fixes: c8b404fb05dc ("arm64: dts: broadcom: bcm4908: add BCM4906 Netgear R8000P DTS files") Signed-off-by: William Zhang <william.zhang@broadcom.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2022-07-08Merge tag 'gpio-fixes-for-v5.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix a build error in gpio-vf610 - fix a null-pointer dereference in the GPIO character device code * tag 'gpio-fixes-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: cdev: fix null pointer dereference in linereq_free() gpio: vf610: fix compilation error
2022-07-08Merge branch 'pm-core'Rafael J. Wysocki
Merge a runtime PM framework cleanup and fix related to device links. * pm-core: PM: runtime: Fix supplier device management during consumer probe PM: runtime: Redefine pm_runtime_release_supplier()
2022-07-08Merge tag 'block-5.19-2022-07-08' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "NVMe pull request with another id quirk addition, and a tracing fix" * tag 'block-5.19-2022-07-08' of git://git.kernel.dk/linux-block: nvme: use struct group for generic command dwords nvme-pci: phison e16 has bogus namespace ids
2022-07-08ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi ZeroMichal Suchanek
The device tree should include generic "jedec,spi-nor" compatible, and a manufacturer-specific one. The macronix part is what is shipped on the boards that come with a flash chip. Fixes: 45857ae95478 ("ARM: dts: orange-pi-zero: add node for SPI NOR") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20220708174529.3360-1-msuchanek@suse.de
2022-07-08Merge tag 'io_uring-5.19-2022-07-08' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring tweak from Jens Axboe: "Just a minor tweak to an addition made in this release cycle: padding a 32-bit value that's in a 64-bit union to avoid any potential funkiness from that" * tag 'io_uring-5.19-2022-07-08' of git://git.kernel.dk/linux-block: io_uring: explicit sqe padding for ioctl commands
2022-07-08Merge tag 'for-5.19/fbdev-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev Pull fbdev fixes from Helge Deller: - fbcon now prevents switching to screen resolutions which are smaller than the font size, and prevents enabling a font which is bigger than the current screen resolution. This fixes vmalloc-out-of-bounds accesses found by KASAN. - Guiling Deng fixed a bug where the centered fbdev logo wasn't displayed correctly if the screen size matched the logo size. - Hsin-Yi Wang provided a patch to include errno.h to fix build when CONFIG_OF isn't enabled. * tag 'for-5.19/fbdev-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: fbcon: Use fbcon_info_from_console() in fbcon_modechange_possible() fbmem: Check virtual screen sizes in fb_set_var() fbcon: Prevent that screen size is smaller than font size fbcon: Disallow setting font bigger than screen size video: of_display_timing.h: include errno.h fbdev: fbmem: Fix logo center image dx issue
2022-07-08btrfs: zoned: drop optimization of zone finishNaohiro Aota
We have an optimization in do_zone_finish() to send REQ_OP_ZONE_FINISH only when necessary, i.e. we don't send REQ_OP_ZONE_FINISH when we assume we wrote fully into the zone. The assumption is determined by "alloc_offset == capacity". This condition won't work if the last ordered extent is canceled due to some errors. In that case, we consider the zone is deactivated without sending the finish command while it's still active. This inconstancy results in activating another block group while we cannot really activate the underlying zone, which causes the active zone exceeds errors like below. BTRFS error (device nvme3n2): allocation failed flags 1, wanted 520192 tree-log 0, relocation: 0 nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0 nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0 Fix the issue by removing the optimization for now. Fixes: 8376d9e1ed8f ("btrfs: zoned: finish superblock zone once no space left for new SB") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-07-08btrfs: zoned: fix a leaked bioc in read_zone_infoChristoph Hellwig
The bioc would leak on the normal completion path and also on the RAID56 check (but that one won't happen in practice due to the invalid combination with zoned mode). Fixes: 7db1c5d14dcd ("btrfs: zoned: support dev-replace in zoned filesystems") CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [ update changelog ] Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-07-08btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline ↵Filipe Manana
extents When doing a direct IO read or write, we always return -ENOTBLK when we find a compressed extent (or an inline extent) so that we fallback to buffered IO. This however is not ideal in case we are in a NOWAIT context (io_uring for example), because buffered IO can block and we currently have no support for NOWAIT semantics for buffered IO, so if we need to fallback to buffered IO we should first signal the caller that we may need to block by returning -EAGAIN instead. This behaviour can also result in short reads being returned to user space, which although it's not incorrect and user space should be able to deal with partial reads, it's somewhat surprising and even some popular applications like QEMU (Link tag #1) and MariaDB (Link tag #2) don't deal with short reads properly (or at all). The short read case happens when we try to read from a range that has a non-compressed and non-inline extent followed by a compressed extent. After having read the first extent, when we find the compressed extent we return -ENOTBLK from btrfs_dio_iomap_begin(), which results in iomap to treat the request as a short read, returning 0 (success) and waiting for previously submitted bios to complete (this happens at fs/iomap/direct-io.c:__iomap_dio_rw()). After that, and while at btrfs_file_read_iter(), we call filemap_read() to use buffered IO to read the remaining data, and pass it the number of bytes we were able to read with direct IO. Than at filemap_read() if we get a page fault error when accessing the read buffer, we return a partial read instead of an -EFAULT error, because the number of bytes previously read is greater than zero. So fix this by returning -EAGAIN for NOWAIT direct IO when we find a compressed or an inline extent. Reported-by: Dominique MARTINET <dominique.martinet@atmark-techno.com> Link: https://lore.kernel.org/linux-btrfs/YrrFGO4A1jS0GI0G@atmark-techno.com/ Link: https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582 Tested-by: Dominique MARTINET <dominique.martinet@atmark-techno.com> CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-07-08ARM: dts: at91: sama5d2: Fix typo in i2s1 nodeRyan Wanner
Fix typo in i2s1 causing errors in dt binding validation. Change assigned-parrents to assigned-clock-parents to match i2s0 node formatting. Fixes: 1ca81883c557 ("ARM: dts: at91: sama5d2: add nodes for I2S controllers") Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com> [claudiu.beznea: use imperative addressing in commit description, remove blank line after fixes tag, fix typo in commit message] Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Link: https://lore.kernel.org/r/20220707215812.193008-1-Ryan.Wanner@microchip.com
2022-07-08Merge tag 'tee-fixes-for-v5.19' of ↵Arnd Bergmann
https://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes Fixes for TEE subsystem A fix for the recently merged commit ed8faf6c8f8c ("optee: add OPTEE_SMC_CALL_WITH_RPC_ARG and OPTEE_SMC_CALL_WITH_REGD_ARG"). Two small fixes in comment, repeated words etc. * tag 'tee-fixes-for-v5.19' of https://git.linaro.org/people/jens.wiklander/linux-tee: tee: tee_get_drvdata(): fix description of return value optee: Remove duplicate 'of' in two places. optee: smc_abi.c: fix wrong pointer passed to IS_ERR/PTR_ERR() Link: https://lore.kernel.org/r/20220708134607.GA901814@jade Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-07-08ovl: turn of SB_POSIXACL with idmapped layers temporarilyChristian Brauner
This cycle we added support for mounting overlayfs on top of idmapped mounts. Recently I've started looking into potential corner cases when trying to add additional tests and I noticed that reporting for POSIX ACLs is currently wrong when using idmapped layers with overlayfs mounted on top of it. I have sent out an patch that fixes this and makes POSIX ACLs work correctly but the patch is a bit bigger and we're already at -rc5 so I recommend we simply don't raise SB_POSIXACL when idmapped layers are used. Then we can fix the VFS part described below for the next merge window so we can have good exposure in -next. I'm going to give a rather detailed explanation to both the origin of the problem and mention the solution so people know what's going on. Let's assume the user creates the following directory layout and they have a rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you would expect files on your host system to be owned. For example, ~/.bashrc for your regular user would be owned by 1000:1000 and /root/.bashrc would be owned by 0:0. IOW, this is just regular boring filesystem tree on an ext4 or xfs filesystem. The user chooses to set POSIX ACLs using the setfacl binary granting the user with uid 4 read, write, and execute permissions for their .bashrc file: setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc Now they to expose the whole rootfs to a container using an idmapped mount. So they first create: mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap} mkdir -pv /vol/contpool/ctrover/{over,work} chown 10000000:10000000 /vol/contpool/ctrover/{over,work} The user now creates an idmapped mount for the rootfs: mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \ /var/lib/lxc/c2/rootfs \ /vol/contpool/lowermap This for example makes it so that /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc which is owned by uid and gid 1000 as being owned by uid and gid 10001000 at /vol/contpool/lowermap/home/ubuntu/.bashrc. Assume the user wants to expose these idmapped mounts through an overlayfs mount to a container. mount -t overlay overlay \ -o lowerdir=/vol/contpool/lowermap, \ upperdir=/vol/contpool/overmap/over, \ workdir=/vol/contpool/overmap/work \ /vol/contpool/merge The user can do this in two ways: (1) Mount overlayfs in the initial user namespace and expose it to the container. (2) Mount overlayfs on top of the idmapped mounts inside of the container's user namespace. Let's assume the user chooses the (1) option and mounts overlayfs on the host and then changes into a container which uses the idmapping 0:10000000:65536 which is the same used for the two idmapped mounts. Now the user tries to retrieve the POSIX ACLs using the getfacl command getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc and to their surprise they see: # file: vol/contpool/merge/home/ubuntu/.bashrc # owner: 1000 # group: 1000 user::rw- user:4294967295:rwx group::r-- mask::rwx other::r-- indicating the uid wasn't correctly translated according to the idmapped mount. The problem is how we currently translate POSIX ACLs. Let's inspect the callchain in this example: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */ sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get == ovl_posix_acl_xattr_get() | -> ovl_xattr_get() | -> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get() /* lower filesystem callback */ |> posix_acl_fix_xattr_to_user() { 4 = make_kuid(&init_user_ns, 4); 4 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 4); /* FAILURE */ -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4); } If the user chooses to use option (2) and mounts overlayfs on top of idmapped mounts inside the container things don't look that much better: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get == ovl_posix_acl_xattr_get() | -> ovl_xattr_get() | -> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get() /* lower filesystem callback */ |> posix_acl_fix_xattr_to_user() { 4 = make_kuid(&init_user_ns, 4); 4 = mapped_kuid_fs(&init_user_ns, 4); /* FAILURE */ -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4); } As is easily seen the problem arises because the idmapping of the lower mount isn't taken into account as all of this happens in do_gexattr(). But do_getxattr() is always called on an overlayfs mount and inode and thus cannot possible take the idmapping of the lower layers into account. This problem is similar for fscaps but there the translation happens as part of vfs_getxattr() already. Let's walk through an fscaps overlayfs callchain: setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc The expected outcome here is that we'll receive the cap_net_raw capability as we are able to map the uid associated with the fscap to 0 within our container. IOW, we want to see 0 as the result of the idmapping translations. If the user chooses option (1) we get the following callchain for fscaps: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */ sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() -> vfs_getxattr() -> xattr_getsecurity() -> security_inode_getsecurity() ________________________________ -> cap_inode_getsecurity() | | { V | 10000000 = make_kuid(0:0:4k /* overlayfs idmapping */, 10000000); | 10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000); | /* Expected result is 0 and thus that we own the fscap. */ | 0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000); | } | -> vfs_getxattr_alloc() | -> handler->get == ovl_other_xattr_get() | -> vfs_getxattr() | -> xattr_getsecurity() | -> security_inode_getsecurity() | -> cap_inode_getsecurity() | { | 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0); | 10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); | 10000000 = from_kuid(0:0:4k /* overlayfs idmapping */, 10000000); | |____________________________________________________________________| } -> vfs_getxattr_alloc() -> handler->get == /* lower filesystem callback */ And if the user chooses option (2) we get: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() -> vfs_getxattr() -> xattr_getsecurity() -> security_inode_getsecurity() _______________________________ -> cap_inode_getsecurity() | | { V | 10000000 = make_kuid(0:10000000:65536 /* overlayfs idmapping */, 0); | 10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000); | /* Expected result is 0 and thus that we own the fscap. */ | 0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000); | } | -> vfs_getxattr_alloc() | -> handler->get == ovl_other_xattr_get() | |-> vfs_getxattr() | -> xattr_getsecurity() | -> security_inode_getsecurity() | -> cap_inode_getsecurity() | { | 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0); | 10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); | 0 = from_kuid(0:10000000:65536 /* overlayfs idmapping */, 10000000); | |____________________________________________________________________| } -> vfs_getxattr_alloc() -> handler->get == /* lower filesystem callback */ We can see how the translation happens correctly in those cases as the conversion happens within the vfs_getxattr() helper. For POSIX ACLs we need to do something similar. However, in contrast to fscaps we cannot apply the fix directly to the kernel internal posix acl data structure as this would alter the cached values and would also require a rework of how we currently deal with POSIX ACLs in general which almost never take the filesystem idmapping into account (the noteable exception being FUSE but even there the implementation is special) and instead retrieve the raw values based on the initial idmapping. The correct values are then generated right before returning to userspace. The fix for this is to move taking the mount's idmapping into account directly in vfs_getxattr() instead of having it be part of posix_acl_fix_xattr_to_user(). To this end we simply move the idmapped mount translation into a separate step performed in vfs_{g,s}etxattr() instead of in posix_acl_fix_xattr_{from,to}_user(). To see how this fixes things let's go back to the original example. Assume the user chose option (1) and mounted overlayfs on top of idmapped mounts on the host: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */ sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | |> __vfs_getxattr() | | -> handler->get == ovl_posix_acl_xattr_get() | | -> ovl_xattr_get() | | -> vfs_getxattr() | | |> __vfs_getxattr() | | | -> handler->get() /* lower filesystem callback */ | | |> posix_acl_getxattr_idmapped_mnt() | | { | | 4 = make_kuid(&init_user_ns, 4); | | 10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4); | | 10000004 = from_kuid(&init_user_ns, 10000004); | | |_______________________ | | } | | | | | |> posix_acl_getxattr_idmapped_mnt() | | { | | V | 10000004 = make_kuid(&init_user_ns, 10000004); | 10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004); | 10000004 = from_kuid(&init_user_ns, 10000004); | } |_________________________________________________ | | | | |> posix_acl_fix_xattr_to_user() | { V 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004); /* SUCCESS */ 4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004); } And similarly if the user chooses option (1) and mounted overayfs on top of idmapped mounts inside the container: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | |> __vfs_getxattr() | | -> handler->get == ovl_posix_acl_xattr_get() | | -> ovl_xattr_get() | | -> vfs_getxattr() | | |> __vfs_getxattr() | | | -> handler->get() /* lower filesystem callback */ | | |> posix_acl_getxattr_idmapped_mnt() | | { | | 4 = make_kuid(&init_user_ns, 4); | | 10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4); | | 10000004 = from_kuid(&init_user_ns, 10000004); | | |_______________________ | | } | | | | | |> posix_acl_getxattr_idmapped_mnt() | | { V | 10000004 = make_kuid(&init_user_ns, 10000004); | 10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004); | 10000004 = from_kuid(0(&init_user_ns, 10000004); | |_________________________________________________ | } | | | |> posix_acl_fix_xattr_to_user() | { V 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004); /* SUCCESS */ 4 = from_kuid(0:10000000:65536 /* caller's idmappings */, 10000004); } The last remaining problem we need to fix here is ovl_get_acl(). During ovl_permission() overlayfs will call: ovl_permission() -> generic_permission() -> acl_permission_check() -> check_acl() -> get_acl() -> inode->i_op->get_acl() == ovl_get_acl() > get_acl() /* on the underlying filesystem) ->inode->i_op->get_acl() == /*lower filesystem callback */ -> posix_acl_permission() passing through the get_acl request to the underlying filesystem. This will retrieve the acls stored in the lower filesystem without taking the idmapping of the underlying mount into account as this would mean altering the cached values for the lower filesystem. The simple solution is to have ovl_get_acl() simply duplicate the ACLs, update the values according to the idmapped mount and return it to acl_permission_check() so it can be used in posix_acl_permission(). Since overlayfs doesn't cache ACLs they'll be released right after. Link: https://github.com/brauner/mount-idmapped/issues/9 Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: linux-unionfs@vger.kernel.org Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Fixes: bc70682a497c ("ovl: support idmapped layers") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-07-08tty: use new tty_insert_flip_string_and_push_buffer() in pty_write()Jiri Slaby
There is a race in pty_write(). pty_write() can be called in parallel with e.g. ioctl(TIOCSTI) or ioctl(TCXONC) which also inserts chars to the buffer. Provided, tty_flip_buffer_push() in pty_write() is called outside the lock, it can commit inconsistent tail. This can lead to out of bounds writes and other issues. See the Link below. To fix this, we have to introduce a new helper called tty_insert_flip_string_and_push_buffer(). It does both tty_insert_flip_string() and tty_flip_buffer_commit() under the port lock. It also calls queue_work(), but outside the lock. See 71a174b39f10 (pty: do tty_flip_buffer_push without port->lock in pty_write) for the reasons. Keep the helper internal-only (in drivers' tty.h). It is not intended to be used widely. Link: https://seclists.org/oss-sec/2022/q2/155 Fixes: 71a174b39f10 (pty: do tty_flip_buffer_push without port->lock in pty_write) Cc: 一只狗 <chennbnbnb@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Suggested-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20220707082558.9250-2-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>