summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-04-26serial: 8250: Handle UART without interrupt on TEMTIlpo Järvinen
Add UART_CAP_NOTEMT for UARTs that lack interrupt on TEMT but want to use em485. Em485 framework needs to ensure not only FIFO is empty but also that tx shift register is empty. This approach uses Uwe Kleine-König's suggestion on simply using/incrementing stop_tx timer rather than adding another timer. When UART_CAP_NOTEMT is set and THRE is present w/o TEMT, stop tx timer is reused to wait for the emptying of the shift register. This change does not add the UART_CAP_NOTEMT define as it already exist but is currently no-op. See 7a107b2c6b81 (Revert "serial: 8250: Handle UART without interrupt on TEMT using em485") for further details. Vicente Bergas reported that RTS is deasserted roughly one bit too early losing stop bit tx. To address this problem, stop_delay now accounts for one extra bit using rough formula /7 (assumes worst-case of 2+5 bits). I suspect this glitch had to do with when THRE is getting asserted. If FIFO is emptied already during the tx of the stop bit, perhaps it leads to HW asserting THRE early for the normal frame time formula to work accurately. Suggested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Eric Tremblay <etremblay@distech-controls.com> Tested-by: Vicente Bergas <vicencb@gmail.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20220425143410.12703-4-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26serial: 8250: use THRE & __stop_tx also with DMAIlpo Järvinen
8250 DMA tx complete path lacks calls to normal 8250 stop handling. It does not use THRE to detect true completion of the tx and also doesn't call __stop_tx. This leads to problems with em485 that needs to handle RTS timing. Instead of handling tx stop internally within 8250 dma code, enable THRE when tx'able data runs out and tweak serial8250_handle_irq to call only __stop_tx when uart is using DMA. It also seems bit early to call serial8250_rpm_put_tx from there while tx is still underway(?). Tested-by: Vicente Bergas <vicencb@gmail.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20220425143410.12703-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26serial: Store character timing information to uart_portIlpo Järvinen
Struct uart_port currently stores FIFO timeout. Having character timing information readily available is useful. Even serial core itself determines char_time from port->timeout using inverse calculation. Store frame_time directly into uart_port. Character time is stored in nanoseconds to have reasonable precision with high rates. To avoid overflow, 64-bit math is necessary. It might be possible to determine timeout from frame_time by multiplying it with fifosize as needed but only part of the users seem to be protected by a lock. Thus, this patch does not pursue storing only frame_time in uart_port. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20220425143410.12703-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26serial: 8250: dw: Improve RZN1 supportPhil Edworthy
Renesas RZ/N1 SoC features a slightly modified DW UART. On this SoC, the CPR register value is known but not synthetized in hardware. We hence need to provide a CPR value in the platform data. This version of the controller also relies on acting as flow controller when using DMA, so we need to provide the "is dma flow controller" quirk. Co-developed-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220422180615.9098-10-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26serial: 8250: dw: Add support for DMA flow controlling devicesPhil Edworthy
DW based controllers like the one on Renesas RZ/N1 must be programmed as flow controllers when using DMA. * Table 11.45 of the system manual, "Flow Control Combinations", states that using UART with DMA requires setting the DMA in the peripheral flow controller mode regardless of the direction. * Chapter 11.6.1.3 of the system manual, "Basic Interface Definitions", explains that the burst size in the above case must be configured in the peripheral's register DEST/SRC_BURST_SIZE. Experiments shown that upon Rx timeout, the DMA transaction needed to be manually cleared as well. Co-developed-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220422180615.9098-9-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26serial: 8250: dw: Move the IO accessors to 8250_dwlib.hMiquel Raynal
These accessors should be used instead of the regular readl/writel() helpers. In order to use them also from 8250_dw.c in this directory, move the helpers to 8250_dwlib.h There is no functional change. There is no need for declaring `struct uart_port` or even UPIO_MEM32BE which both are already included in the 8250_dwlib.h header by 8250.h. Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220422180615.9098-8-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26serial: 8250: dw: Introduce an rx_timeout variable in the IRQ pathMiquel Raynal
In a next change we are going to need the same Rx timeout condition as we already have in the IRQ handling code. Let's just create a boolean to clarify what this operation does before reusing it. There is no functional change. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220422180615.9098-7-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26serial: 8250: dma: Allow driver operations before starting DMA transfersMiquel Raynal
One situation where this could be used is when configuring the UART controller to be the DMA flow controller. This is a typical case where the driver might need to program a few more registers before starting a DMA transfer. Provide the necessary infrastructure to support this case. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220422180615.9098-6-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26serial: 8250: dw: Allow to use a fallback CPR value if not synthesizedMiquel Raynal
DW UART controllers can be synthesized without the CPR register. In this case, allow to the platform information to provide a CPR value. Co-developed-by: Phil Edworthy <phil.edworthy@renesas.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220422180615.9098-5-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26serial: 8250: dw: Move the USR register to pdataMiquel Raynal
This offset is a good candidate to pdata's because it changes depending on the vendor implementation. Let's move the usr_reg entry from regular to pdata. This way we can drop initializing it at run time. Let's also use a define for it instead of defining only the default value. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220422180615.9098-4-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26serial: 8250: dw: Create a generic platform data structureEmil Renner Berthing
Use device tree match data rather than multiple calls to of_device_is_compatible() by introducing a platform data structure and adding a quirks mask. Provide a stub to the compatibles without quirks to simplify the handling of the upcoming changes. Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> [<miquel.raynal@bootlin.com: Minor changes + creation of a real pdata structure] Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220422180615.9098-3-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26serial: 8250: dw: Move definitions to the shared headerPhil Edworthy
Move the per-device structure and a helper out of the main .c file, into a shared header as they will both be reused from another .c file. There is no functional change. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com> [miquel.raynal@bootlin.com: Extracted from a bigger change] Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220422180615.9098-2-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26virtio_net: fix wrong buf address calculation when using xdpNikolay Aleksandrov
We received a report[1] of kernel crashes when Cilium is used in XDP mode with virtio_net after updating to newer kernels. After investigating the reason it turned out that when using mergeable bufs with an XDP program which adjusts xdp.data or xdp.data_meta page_to_buf() calculates the build_skb address wrong because the offset can become less than the headroom so it gets the address of the previous page (-X bytes depending on how lower offset is): page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256 This is a pr_err() I added in the beginning of page_to_skb which clearly shows offset that is less than headroom by adding 4 bytes of metadata via an xdp prog. The calculations done are: receive_mergeable(): headroom = VIRTIO_XDP_HEADROOM; // VIRTIO_XDP_HEADROOM == 256 bytes offset = xdp.data - page_address(xdp_page) - vi->hdr_len - metasize; page_to_skb(): p = page_address(page) + offset; ... buf = p - headroom; Now buf goes -4 bytes from the page's starting address as can be seen above which is set as skb->head and skb->data by build_skb later. Depending on what's done with the skb (when it's freed most often) we get all kinds of corruptions and BUG_ON() triggers in mm[2]. We have to recalculate the new headroom after the xdp program has run, similar to how offset and len are recalculated. Headroom is directly related to data_hard_start, data and data_meta, so we use them to get the new size. The result is correct (similar pr_err() in page_to_skb, one case of xdp_page and one case of virtnet buf): a) Case with 4 bytes of metadata [ 115.949641] page_to_skb: page addr ffff8b4dcfad2000 offset 252 headroom 252 [ 121.084105] page_to_skb: page addr ffff8b4dcf018000 offset 20732 headroom 252 b) Case of pushing data +32 bytes [ 153.181401] page_to_skb: page addr ffff8b4dd0c4d000 offset 288 headroom 288 [ 158.480421] page_to_skb: page addr ffff8b4dd00b0000 offset 24864 headroom 288 c) Case of pushing data -33 bytes [ 835.906830] page_to_skb: page addr ffff8b4dd3270000 offset 223 headroom 223 [ 840.839910] page_to_skb: page addr ffff8b4dcdd68000 offset 12511 headroom 223 Offset and headroom are equal because offset points to the start of reserved bytes for the virtio_net header which are at buf start + headroom, while data points at buf start + vnet hdr size + headroom so when data or data_meta are adjusted by the xdp prog both the headroom size and the offset change equally. We can use data_hard_start to compute the new headroom after the xdp prog (linearized / page start case, the virtnet buf case is similar just with bigger base offset): xdp.data_hard_start = page_address + vnet_hdr xdp.data = page_address + vnet_hdr + headroom new headroom after xdp prog = xdp.data - xdp.data_hard_start - metasize An example reproducer xdp prog[3] is below. [1] https://github.com/cilium/cilium/issues/19453 [2] Two of the many traces: [ 40.437400] BUG: Bad page state in process swapper/0 pfn:14940 [ 40.916726] BUG: Bad page state in process systemd-resolve pfn:053b7 [ 41.300891] kernel BUG at include/linux/mm.h:720! [ 41.301801] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 41.302784] CPU: 1 PID: 1181 Comm: kubelet Kdump: loaded Tainted: G B W 5.18.0-rc1+ #37 [ 41.304458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014 [ 41.306018] RIP: 0010:page_frag_free+0x79/0xe0 [ 41.306836] Code: 00 00 75 ea 48 8b 07 a9 00 00 01 00 74 e0 48 8b 47 48 48 8d 50 ff a8 01 48 0f 45 fa eb d0 48 c7 c6 18 b8 30 a6 e8 d7 f8 fc ff <0f> 0b 48 8d 78 ff eb bc 48 8b 07 a9 00 00 01 00 74 3a 66 90 0f b6 [ 41.310235] RSP: 0018:ffffac05c2a6bc78 EFLAGS: 00010292 [ 41.311201] RAX: 000000000000003e RBX: 0000000000000000 RCX: 0000000000000000 [ 41.312502] RDX: 0000000000000001 RSI: ffffffffa6423004 RDI: 00000000ffffffff [ 41.313794] RBP: ffff993c98823600 R08: 0000000000000000 R09: 00000000ffffdfff [ 41.315089] R10: ffffac05c2a6ba68 R11: ffffffffa698ca28 R12: ffff993c98823600 [ 41.316398] R13: ffff993c86311ebc R14: 0000000000000000 R15: 000000000000005c [ 41.317700] FS: 00007fe13fc56740(0000) GS:ffff993cdd900000(0000) knlGS:0000000000000000 [ 41.319150] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.320152] CR2: 000000c00008a000 CR3: 0000000014908000 CR4: 0000000000350ee0 [ 41.321387] Call Trace: [ 41.321819] <TASK> [ 41.322193] skb_release_data+0x13f/0x1c0 [ 41.322902] __kfree_skb+0x20/0x30 [ 41.343870] tcp_recvmsg_locked+0x671/0x880 [ 41.363764] tcp_recvmsg+0x5e/0x1c0 [ 41.384102] inet_recvmsg+0x42/0x100 [ 41.406783] ? sock_recvmsg+0x1d/0x70 [ 41.428201] sock_read_iter+0x84/0xd0 [ 41.445592] ? 0xffffffffa3000000 [ 41.462442] new_sync_read+0x148/0x160 [ 41.479314] ? 0xffffffffa3000000 [ 41.496937] vfs_read+0x138/0x190 [ 41.517198] ksys_read+0x87/0xc0 [ 41.535336] do_syscall_64+0x3b/0x90 [ 41.551637] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 41.568050] RIP: 0033:0x48765b [ 41.583955] Code: e8 4a 35 fe ff eb 88 cc cc cc cc cc cc cc cc e8 fb 7a fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 [ 41.632818] RSP: 002b:000000c000a2f5b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000 [ 41.664588] RAX: ffffffffffffffda RBX: 000000c000062000 RCX: 000000000048765b [ 41.681205] RDX: 0000000000005e54 RSI: 000000c000e66000 RDI: 0000000000000016 [ 41.697164] RBP: 000000c000a2f608 R08: 0000000000000001 R09: 00000000000001b4 [ 41.713034] R10: 00000000000000b6 R11: 0000000000000212 R12: 00000000000000e9 [ 41.728755] R13: 0000000000000001 R14: 000000c000a92000 R15: ffffffffffffffff [ 41.744254] </TASK> [ 41.758585] Modules linked in: br_netfilter bridge veth netconsole virtio_net and [ 33.524802] BUG: Bad page state in process systemd-network pfn:11e60 [ 33.528617] page ffffe05dc0147b00 ffffe05dc04e7a00 ffff8ae9851ec000 (1) len 82 offset 252 metasize 4 hroom 0 hdr_len 12 data ffff8ae9851ec10c data_meta ffff8ae9851ec108 data_end ffff8ae9851ec14e [ 33.529764] page:000000003792b5ba refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x11e60 [ 33.532463] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) [ 33.532468] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000 [ 33.532470] raw: 0000000000000000 0000000000000000 00000000fffffdff 0000000000000000 [ 33.532471] page dumped because: nonzero mapcount [ 33.532472] Modules linked in: br_netfilter bridge veth netconsole virtio_net [ 33.532479] CPU: 0 PID: 791 Comm: systemd-network Kdump: loaded Not tainted 5.18.0-rc1+ #37 [ 33.532482] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014 [ 33.532484] Call Trace: [ 33.532496] <TASK> [ 33.532500] dump_stack_lvl+0x45/0x5a [ 33.532506] bad_page.cold+0x63/0x94 [ 33.532510] free_pcp_prepare+0x290/0x420 [ 33.532515] free_unref_page+0x1b/0x100 [ 33.532518] skb_release_data+0x13f/0x1c0 [ 33.532524] kfree_skb_reason+0x3e/0xc0 [ 33.532527] ip6_mc_input+0x23c/0x2b0 [ 33.532531] ip6_sublist_rcv_finish+0x83/0x90 [ 33.532534] ip6_sublist_rcv+0x22b/0x2b0 [3] XDP program to reproduce(xdp_pass.c): #include <linux/bpf.h> #include <bpf/bpf_helpers.h> SEC("xdp_pass") int xdp_pkt_pass(struct xdp_md *ctx) { bpf_xdp_adjust_head(ctx, -(int)32); return XDP_PASS; } char _license[] SEC("license") = "GPL"; compile: clang -O2 -g -Wall -target bpf -c xdp_pass.c -o xdp_pass.o load on virtio_net: ip link set enp1s0 xdpdrv obj xdp_pass.o sec xdp_pass CC: stable@vger.kernel.org CC: Jason Wang <jasowang@redhat.com> CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com> CC: Daniel Borkmann <daniel@iogearbox.net> CC: "Michael S. Tsirkin" <mst@redhat.com> CC: virtualization@lists.linux-foundation.org Fixes: 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220425103703.3067292-1-razor@blackwall.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-26sysrq: do not omit current cpu when showing backtrace of all active CPUsChangbin Du
The backtrace of current CPU also should be printed as it is active. This change add stack trace for current CPU and print a hint for idle CPU for the generic workqueue based printing. (x86 already does this) Now it looks like below: [ 279.401567] sysrq: Show backtrace of all active CPUs [ 279.407234] sysrq: CPU5: [ 279.407505] Call Trace: [ 279.408789] [<ffffffff8000606c>] dump_backtrace+0x2c/0x3a [ 279.411698] [<ffffffff800060ac>] show_stack+0x32/0x3e [ 279.411809] [<ffffffff80542258>] sysrq_handle_showallcpus+0x4c/0xc6 [ 279.411929] [<ffffffff80542f16>] __handle_sysrq+0x106/0x26c [ 279.412034] [<ffffffff805436a8>] write_sysrq_trigger+0x64/0x74 [ 279.412139] [<ffffffff8029cd48>] proc_reg_write+0x8e/0xe2 [ 279.412252] [<ffffffff8021a8f8>] vfs_write+0x90/0x2be [ 279.412362] [<ffffffff8021acd2>] ksys_write+0xa6/0xce [ 279.412467] [<ffffffff8021ad24>] sys_write+0x2a/0x38 [ 279.412689] [<ffffffff80003ff8>] ret_from_syscall+0x0/0x2 [ 279.417173] sysrq: CPU6: backtrace skipped as idling [ 279.417185] sysrq: CPU4: backtrace skipped as idling [ 279.417187] sysrq: CPU0: backtrace skipped as idling [ 279.417181] sysrq: CPU7: backtrace skipped as idling [ 279.417190] sysrq: CPU1: backtrace skipped as idling [ 279.417193] sysrq: CPU3: backtrace skipped as idling [ 279.417219] sysrq: CPU2: [ 279.419179] Call Trace: [ 279.419440] [<ffffffff8000606c>] dump_backtrace+0x2c/0x3a [ 279.419782] [<ffffffff800060ac>] show_stack+0x32/0x3e [ 279.420015] [<ffffffff80542b30>] showacpu+0x5c/0x96 [ 279.420317] [<ffffffff800ba71c>] flush_smp_call_function_queue+0xd6/0x218 [ 279.420569] [<ffffffff800bb438>] generic_smp_call_function_single_interrupt+0x14/0x1c [ 279.420798] [<ffffffff800079ae>] handle_IPI+0xaa/0x13a [ 279.421024] [<ffffffff804dcb92>] riscv_intc_irq+0x56/0x70 [ 279.421274] [<ffffffff80a05b70>] generic_handle_arch_irq+0x6a/0xfa [ 279.421518] [<ffffffff80004006>] ret_from_exception+0x0/0x10 [ 279.421750] [<ffffffff80096492>] rcu_idle_enter+0x16/0x1e Signed-off-by: Changbin Du <changbin.du@gmail.com> Link: https://lore.kernel.org/r/20220117154300.2808-1-changbin.du@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26tty: hvcs: simplify if-if to if-elseWan Jiabing
Use if and else instead of if(A) and if (!A) and fix a coding style. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Link: https://lore.kernel.org/r/20220424091310.98780-1-wanjiabing@vivo.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26tty/hvc_opal: simplify if-if to if-elseWan Jiabing
Use if and else instead of if(A) and if (!A). Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Link: https://lore.kernel.org/r/20220426071041.168282-1-wanjiabing@vivo.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-26net: dsa: mv88e6xxx: Fix port_hidden_wait to account for port_base_addrNathan Rossi
The other port_hidden functions rely on the port_read/port_write functions to access the hidden control port. These functions apply the offset for port_base_addr where applicable. Update port_hidden_wait to use the port_wait_bit so that port_base_addr offsets are accounted for when waiting for the busy bit to change. Without the offset the port_hidden_wait function would timeout on devices that have a non-zero port_base_addr (e.g. MV88E6141), however devices that have a zero port_base_addr would operate correctly (e.g. MV88E6390). Fixes: 609070133aff ("net: dsa: mv88e6xxx: update code operating on hidden registers") Signed-off-by: Nathan Rossi <nathan@nathanrossi.com> Reviewed-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220425070454.348584-1-nathan@nathanrossi.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-26net: phy: marvell10g: fix return value on errorBaruch Siach
Return back the error value that we get from phy_read_mmd(). Fixes: c84786fa8f91 ("net: phy: marvell10g: read copper results from CSSR1") Signed-off-by: Baruch Siach <baruch.siach@siklu.com> Reviewed-by: Marek Behún <kabel@kernel.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/f47cb031aeae873bb008ba35001607304a171a20.1650868058.git.baruch@tkos.co.il Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-26bug: Have __warn() prototype defined unconditionallyShida Zhang
The __warn() prototype is declared in CONFIG_BUG scope but the function definition in panic.c is unconditional. The IBT enablement started using it unconditionally but a CONFIG_X86_KERNEL_IBT=y, CONFIG_BUG=n .config will trigger a arch/x86/kernel/traps.c: In function ‘__exc_control_protection’: arch/x86/kernel/traps.c:249:17: error: implicit declaration of function \ ‘__warn’; did you mean ‘pr_warn’? [-Werror=implicit-function-declaration] Pull up the declarations so that they're unconditionally visible too. [ bp: Rewrite commit message. ] Fixes: 991625f3dd2c ("x86/ibt: Add IBT feature, MSR and #CP handling") Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220426032007.510245-1-starzhangzsd@gmail.com
2022-04-26net: bcmgenet: hide status block before TX timestampingJonathan Lemon
The hardware checksum offloading requires use of a transmit status block inserted before the outgoing frame data, this was updated in '9a9ba2a4aaaa ("net: bcmgenet: always enable status blocks")' However, skb_tx_timestamp() assumes that it is passed a raw frame and PTP parsing chokes on this status block. Fix this by calling __skb_pull(), which hides the TSB before calling skb_tx_timestamp(), so an outgoing PTP packet is parsed correctly. As the data in the skb has already been set up for DMA, and the dma_unmap_* calls use a separately stored address, there is no no effective change in the data transmission. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220424165307.591145-1-jonathan.lemon@gmail.com Fixes: d03825fba459 ("net: bcmgenet: add skb_tx_timestamp call") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-26mctp: defer the kfree of object mdev->addrsLin Ma
The function mctp_unregister() reclaims the device's relevant resource when a netcard detaches. However, a running routine may be unaware of this and cause the use-after-free of the mdev->addrs object. The race condition can be demonstrated below cleanup thread another thread | unregister_netdev() | mctp_sendmsg() ... | ... mctp_unregister() | rt = mctp_route_lookup() ... | mctl_local_output() kfree(mdev->addrs) | ... | saddr = rt->dev->addrs[0]; | An attacker can adopt the (recent provided) mtcpserial driver with pty to fake the device detaching and use the userfaultfd to increase the race success chance (in mctp_sendmsg). The KASan report for such a POC is shown below: [ 86.051955] ================================================================== [ 86.051955] BUG: KASAN: use-after-free in mctp_local_output+0x4e9/0xb7d [ 86.051955] Read of size 1 at addr ffff888005f298c0 by task poc/295 [ 86.051955] [ 86.051955] Call Trace: [ 86.051955] <TASK> [ 86.051955] dump_stack_lvl+0x33/0x42 [ 86.051955] print_report.cold.13+0xb2/0x6b3 [ 86.051955] ? preempt_schedule_irq+0x57/0x80 [ 86.051955] ? mctp_local_output+0x4e9/0xb7d [ 86.051955] kasan_report+0xa5/0x120 [ 86.051955] ? mctp_local_output+0x4e9/0xb7d [ 86.051955] mctp_local_output+0x4e9/0xb7d [ 86.051955] ? mctp_dev_set_key+0x79/0x79 [ 86.051955] ? copyin+0x38/0x50 [ 86.051955] ? _copy_from_iter+0x1b6/0xf20 [ 86.051955] ? sysvec_apic_timer_interrupt+0x97/0xb0 [ 86.051955] ? asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 86.051955] ? mctp_local_output+0x1/0xb7d [ 86.051955] mctp_sendmsg+0x64d/0xdb0 [ 86.051955] ? mctp_sk_close+0x20/0x20 [ 86.051955] ? __fget_light+0x2fd/0x4f0 [ 86.051955] ? mctp_sk_close+0x20/0x20 [ 86.051955] sock_sendmsg+0xdd/0x110 [ 86.051955] __sys_sendto+0x1cc/0x2a0 [ 86.051955] ? __ia32_sys_getpeername+0xa0/0xa0 [ 86.051955] ? new_sync_write+0x335/0x550 [ 86.051955] ? alloc_file+0x22f/0x500 [ 86.051955] ? __ip_do_redirect+0x820/0x1820 [ 86.051955] ? vfs_write+0x44d/0x7b0 [ 86.051955] ? vfs_write+0x44d/0x7b0 [ 86.051955] ? fput_many+0x15/0x120 [ 86.051955] ? ksys_write+0x155/0x1b0 [ 86.051955] ? __ia32_sys_read+0xa0/0xa0 [ 86.051955] __x64_sys_sendto+0xd8/0x1b0 [ 86.051955] ? exit_to_user_mode_prepare+0x2f/0x120 [ 86.051955] ? syscall_exit_to_user_mode+0x12/0x20 [ 86.051955] do_syscall_64+0x3a/0x80 [ 86.051955] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 86.051955] RIP: 0033:0x7f82118a56b3 [ 86.051955] RSP: 002b:00007ffdb154b110 EFLAGS: 00000293 ORIG_RAX: 000000000000002c [ 86.051955] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f82118a56b3 [ 86.051955] RDX: 0000000000000010 RSI: 00007f8211cd4000 RDI: 0000000000000007 [ 86.051955] RBP: 00007ffdb154c1d0 R08: 00007ffdb154b164 R09: 000000000000000c [ 86.051955] R10: 0000000000000000 R11: 0000000000000293 R12: 000055d779800db0 [ 86.051955] R13: 00007ffdb154c2b0 R14: 0000000000000000 R15: 0000000000000000 [ 86.051955] </TASK> [ 86.051955] [ 86.051955] Allocated by task 295: [ 86.051955] kasan_save_stack+0x1c/0x40 [ 86.051955] __kasan_kmalloc+0x84/0xa0 [ 86.051955] mctp_rtm_newaddr+0x242/0x610 [ 86.051955] rtnetlink_rcv_msg+0x2fd/0x8b0 [ 86.051955] netlink_rcv_skb+0x11c/0x340 [ 86.051955] netlink_unicast+0x439/0x630 [ 86.051955] netlink_sendmsg+0x752/0xc00 [ 86.051955] sock_sendmsg+0xdd/0x110 [ 86.051955] __sys_sendto+0x1cc/0x2a0 [ 86.051955] __x64_sys_sendto+0xd8/0x1b0 [ 86.051955] do_syscall_64+0x3a/0x80 [ 86.051955] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 86.051955] [ 86.051955] Freed by task 301: [ 86.051955] kasan_save_stack+0x1c/0x40 [ 86.051955] kasan_set_track+0x21/0x30 [ 86.051955] kasan_set_free_info+0x20/0x30 [ 86.051955] __kasan_slab_free+0x104/0x170 [ 86.051955] kfree+0x8c/0x290 [ 86.051955] mctp_dev_notify+0x161/0x2c0 [ 86.051955] raw_notifier_call_chain+0x8b/0xc0 [ 86.051955] unregister_netdevice_many+0x299/0x1180 [ 86.051955] unregister_netdevice_queue+0x210/0x2f0 [ 86.051955] unregister_netdev+0x13/0x20 [ 86.051955] mctp_serial_close+0x6d/0xa0 [ 86.051955] tty_ldisc_kill+0x31/0xa0 [ 86.051955] tty_ldisc_hangup+0x24f/0x560 [ 86.051955] __tty_hangup.part.28+0x2ce/0x6b0 [ 86.051955] tty_release+0x327/0xc70 [ 86.051955] __fput+0x1df/0x8b0 [ 86.051955] task_work_run+0xca/0x150 [ 86.051955] exit_to_user_mode_prepare+0x114/0x120 [ 86.051955] syscall_exit_to_user_mode+0x12/0x20 [ 86.051955] do_syscall_64+0x46/0x80 [ 86.051955] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 86.051955] [ 86.051955] The buggy address belongs to the object at ffff888005f298c0 [ 86.051955] which belongs to the cache kmalloc-8 of size 8 [ 86.051955] The buggy address is located 0 bytes inside of [ 86.051955] 8-byte region [ffff888005f298c0, ffff888005f298c8) [ 86.051955] [ 86.051955] The buggy address belongs to the physical page: [ 86.051955] flags: 0x100000000000200(slab|node=0|zone=1) [ 86.051955] raw: 0100000000000200 dead000000000100 dead000000000122 ffff888005c42280 [ 86.051955] raw: 0000000000000000 0000000080660066 00000001ffffffff 0000000000000000 [ 86.051955] page dumped because: kasan: bad access detected [ 86.051955] [ 86.051955] Memory state around the buggy address: [ 86.051955] ffff888005f29780: 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00 [ 86.051955] ffff888005f29800: fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc [ 86.051955] >ffff888005f29880: fc fc fc fb fc fc fc fc fa fc fc fc fc fa fc fc [ 86.051955] ^ [ 86.051955] ffff888005f29900: fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc [ 86.051955] ffff888005f29980: fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc [ 86.051955] ================================================================== To this end, just like the commit e04480920d1e ("Bluetooth: defer cleanup of resources in hci_unregister_dev()") this patch defers the destructive kfree(mdev->addrs) in mctp_unregister to the mctp_dev_put, where the refcount of mdev is zero and the entire device is reclaimed. This prevents the use-after-free because the sendmsg thread holds the reference of mdev in the mctp_route object. Fixes: 583be982d934 (mctp: Add device handling and netlink interface) Signed-off-by: Lin Ma <linma@zju.edu.cn> Acked-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://lore.kernel.org/r/20220422114340.32346-1-linma@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-26drm/i915/fbc: Consult hw.crtc instead of uapi.crtcVille Syrjälä
plane_state->uapi.crtc is not what we want to be looking at. If bigjoiner is used hw.crtc is what tells us what crtc the plane is supposedly using. Not an actual problem on current hardware as the only FBC capable pipe (A) can't be a bigjoiner slave and thus uapi.crtc==hw.crtc always here. But when we get more FBC instances this will become actually important. Fixes: 2e6c99f88679 ("drm/i915/fbc: Nuke lots of crap from intel_fbc_state_cache") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220413152852.7336-1-ville.syrjala@linux.intel.com Reviewed-by: Manasi Navare <manasi.d.navare@intel.com> (cherry picked from commit 3e1faae3398789abe8d4797255bfe28d95d81308) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2022-04-26drm/i915: Fix SEL_FETCH_PLANE_*(PIPE_B+) register addressesImre Deak
Fix typo in the _SEL_FETCH_PLANE_BASE_1_B register base address. Fixes: a5523e2ff074a5 ("drm/i915: Add PSR2 selective fetch registers") References: https://gitlab.freedesktop.org/drm/intel/-/issues/5400 Cc: José Roberto de Souza <jose.souza@intel.com> Cc: <stable@vger.kernel.org> # v5.9+ Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220421162221.2261895-1-imre.deak@intel.com (cherry picked from commit af2cbc6ef967f61711a3c40fca5366ea0bc7fecc) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2022-04-26cpufreq: qcom-cpufreq-hw: Clear dcvs interruptsVladimir Zapolskiy
It's noted that dcvs interrupts are not self-clearing, thus an interrupt handler runs constantly, which leads to a severe regression in runtime. To fix the problem an explicit write to clear interrupt register is required, note that on OSM platforms the register may not be present. Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support") Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2022-04-26tty: n_gsm: fix sometimes uninitialized warning in gsm_dlci_modem_output()Daniel Starke
'size' may be used uninitialized in gsm_dlci_modem_output() if called with an adaption that is neither 1 nor 2. The function is currently only called by gsm_modem_upd_via_data() and only for adaption 2. Properly handle every invalid case by returning -EINVAL to silence the compiler warning and avoid future regressions. Fixes: c19ffe00fed6 ("tty: n_gsm: fix invalid use of MSC in advanced option") Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Daniel Starke <daniel.starke@siemens.com> Link: https://lore.kernel.org/r/20220425104726.7986-1-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-25Merge tag 'sunxi-clk-fixes-for-5.18-2' of ↵Stephen Boyd
https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into clk-fixes Pull Allwinner clk fixes from Jernej Skrabec: - Add missing sentinel - check return value for platform_get_resource() - mark rtc-32k as critical * tag 'sunxi-clk-fixes-for-5.18-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource() clk: sunxi-ng: sun6i-rtc: Mark rtc-32k as critical clk: sunxi-ng: fix not NULL terminated coccicheck error
2022-04-25video: fbdev: clps711x-fb: Use syscon_regmap_lookup_by_phandleAlexander Shiyan
Since version 5.13, the standard syscon bindings have been added to all clps711x DT nodes, so we can now use the more general syscon_regmap_lookup_by_phandle function to get the syscon pointer. Signed-off-by: Alexander Shiyan <eagle.alexander923@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
2022-04-25Merge branch 'net-smc-two-fixes-for-smc-fallback'Jakub Kicinski
Wen Gu says: ==================== net/smc: Two fixes for smc fallback This patch set includes two fixes for smc fallback: Patch 1/2 introduces some simple helpers to wrap the replacement and restore of clcsock's callback functions. Make sure that only the original callbacks will be saved and not overwritten. Patch 2/2 fixes a syzbot reporting slab-out-of-bound issue where smc_fback_error_report() accesses the already freed smc sock (see https://lore.kernel.org/r/00000000000013ca8105d7ae3ada@google.com/). The patch fixes it by resetting sk_user_data and restoring clcsock callback functions timely in fallback situation. But it should be noted that although patch 2/2 can fix the issue of 'slab-out-of-bounds/use-after-free in smc_fback_error_report', it can't pass the syzbot reproducer test. Because after applying these two patches in upstream, syzbot reproducer triggered another known issue like this: ================================================================== BUG: KASAN: use-after-free in tcp_retransmit_timer+0x2ef3/0x3360 net/ipv4/tcp_timer.c:511 Read of size 8 at addr ffff888020328380 by task udevd/4158 CPU: 1 PID: 4158 Comm: udevd Not tainted 5.18.0-rc3-syzkaller-00074-gb05a5683eba6-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313 print_report mm/kasan/report.c:429 [inline] kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491 tcp_retransmit_timer+0x2ef3/0x3360 net/ipv4/tcp_timer.c:511 tcp_write_timer_handler+0x5e6/0xbc0 net/ipv4/tcp_timer.c:622 tcp_write_timer+0xa2/0x2b0 net/ipv4/tcp_timer.c:642 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421 expire_timers kernel/time/timer.c:1466 [inline] __run_timers.part.0+0x679/0xa80 kernel/time/timer.c:1737 __run_timers kernel/time/timer.c:1715 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1750 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637 irq_exit_rcu+0x5/0x20 kernel/softirq.c:649 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> ... (detail report can be found in https://syzkaller.appspot.com/text?tag=CrashReport&x=15406b44f00000) IMHO, the above issue is the same as this known one: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed, and it doesn't seem to be related with SMC. The discussion about this known issue is ongoing and can be found in https://lore.kernel.org/bpf/000000000000f75af905d3ba0716@google.com/T/. And I added the temporary solution mentioned in the above discussion on top of my two patches, the syzbot reproducer of 'slab-out-of-bounds/ use-after-free in smc_fback_error_report' no longer triggers any issue. ==================== Link: https://lore.kernel.org/r/1650614179-11529-1-git-send-email-guwen@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-25net/smc: Fix slab-out-of-bounds issue in fallbackWen Gu
syzbot reported a slab-out-of-bounds/use-after-free issue, which was caused by accessing an already freed smc sock in fallback-specific callback functions of clcsock. This patch fixes the issue by restoring fallback-specific callback functions to original ones and resetting clcsock sk_user_data to NULL before freeing smc sock. Meanwhile, this patch introduces sk_callback_lock to make the access and assignment to sk_user_data mutually exclusive. Reported-by: syzbot+b425899ed22c6943e00b@syzkaller.appspotmail.com Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback") Link: https://lore.kernel.org/r/00000000000013ca8105d7ae3ada@google.com/ Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-25net/smc: Only save the original clcsock callback functionsWen Gu
Both listen and fallback process will save the current clcsock callback functions and establish new ones. But if both of them happen, the saved callback functions will be overwritten. So this patch introduces some helpers to ensure that only save the original callback functions of clcsock. Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-25Merge tag 'f2fs-fix-5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "This includes major bug fixes introduced in 5.18-rc1 and 5.17+: - Remove obsolete whint_mode (5.18-rc1) - Fix IO split issue caused by op_flags change in f2fs (5.18-rc1) - Fix a wrong condition check to detect IO failure loop (5.18-rc1) - Fix wrong data truncation during roll-forward (5.17+)" * tag 'f2fs-fix-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: should not truncate blocks during roll-forward recovery f2fs: fix wrong condition check when failing metapage read f2fs: keep io_flags to avoid IO split due to different op_flags in two fio holders f2fs: remove obsolete whint_mode
2022-04-25clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource()Yang Yingliang
It will cause null-ptr-deref if platform_get_resource() returns NULL, we need check the return value. Fixes: 7a6fca879f59 ("clk: sunxi: Add driver for A80 MMC config clocks/resets") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20220421134308.2885094-1-yangyingliang@huawei.com
2022-04-25bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create()Christophe JAILLET
This code is really spurious. It always returns an ERR_PTR, even when err is known to be 0 and calls put_device() after a successful device_register() call. It is likely that the return statement in the normal path is missing. Add 'return rdev;' to fix it. Fixes: d787dcdb9c8f ("bus: sunxi-rsb: Add driver for Allwinner Reduced Serial Bus") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Samuel Holland <samuel@sholland.org> Tested-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/ef2b9576350bba4c8e05e669e9535e9e2a415763.1650551719.git.christophe.jaillet@wanadoo.fr
2022-04-25no-MMU: expose vmalloc_huge() for alloc_large_system_hash()Linus Torvalds
It turns out that for the CONFIG_MMU=n builds, vmalloc_huge() was never defined, since it's defined in mm/vmalloc.c, which doesn't get built for the no-MMU configurations. Just implement the trivial wrapper for the no-MMU case too. In fact, just make it an alias to the existing __vmalloc() function that has the same signature. Link: https://lore.kernel.org/all/CAMuHMdVdx2V1uhv_152Sw3_z2xE0spiaWp1d6Ko8-rYmAxUBAg@mail.gmail.com/ Link: https://lore.kernel.org/all/CA+G9fYscb1y4a17Sf5G_Aibt+WuSf-ks_Qjw9tYFy=A4sjCEug@mail.gmail.com/ Link: https://lore.kernel.org/all/20220425150356.GA4138752@roeck-us.net/ Reported-and-tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-25Documentation: siphash: disambiguate HalfSipHash algorithm from hsiphash ↵Eric Biggers
functions Fix the documentation for the hsiphash functions to avoid conflating the HalfSipHash algorithm with the hsiphash functions, since these functions actually implement either HalfSipHash or SipHash, and random.c now uses HalfSipHash (in a very special way) without the hsiphash functions. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-25Documentation: siphash: enclose HalfSipHash usage example in the literal blockBagas Sanjaya
Render usage example of HalfSipHash function as code block by using literal block syntax. Cc: Jonathan Corbet <corbet@lwn.net> Cc: Eric Biggers <ebiggers@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-25Documentation: siphash: convert danger note to warning for HalfSipHashBagas Sanjaya
Render danger paragraph into warning block for emphasization. Cc: Jonathan Corbet <corbet@lwn.net> Cc: Eric Biggers <ebiggers@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-25random: document crng_fast_key_erasure() destination possibilityJason A. Donenfeld
This reverts 35a33ff3807d ("random: use memmove instead of memcpy for remaining 32 bytes"), which was made on a totally bogus basis. The thing it was worried about overlapping came from the stack, not from one of its arguments, as Eric pointed out. But the fact that this confusion even happened draws attention to the fact that it's a bit non-obvious that the random_data parameter can alias chacha_state, and in fact should do so when the caller can't rely on the stack being cleared in a timely manner. So this commit documents that. Reported-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-25Revert "arm64: dts: tegra: Fix boolean properties with values"Arnd Bergmann
This reverts commit 1a67653de0dd, which caused a boot regression. The behavior of the "drive-push-pull" in the kernel does not match what the binding document describes. Revert Rob's patch to make the DT match the kernel again, rather than the binding. Link: https://lore.kernel.org/lkml/YlVAy95eF%2F9b1nmu@orome/ Reported-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-25tcp: make sure treq->af_specific is initializedEric Dumazet
syzbot complained about a recent change in TCP stack, hitting a NULL pointer [1] tcp request sockets have an af_specific pointer, which was used before the blamed change only for SYNACK generation in non SYNCOOKIE mode. tcp requests sockets momentarily created when third packet coming from client in SYNCOOKIE mode were not using treq->af_specific. Make sure this field is populated, in the same way normal TCP requests sockets do in tcp_conn_request(). [1] TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies. Check SNMP counters. general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 1 PID: 3695 Comm: syz-executor864 Not tainted 5.18.0-rc3-syzkaller-00224-g5fd1fe4807f9 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:tcp_create_openreq_child+0xe16/0x16b0 net/ipv4/tcp_minisocks.c:534 Code: 48 c1 ea 03 80 3c 02 00 0f 85 e5 07 00 00 4c 8b b3 28 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7e 08 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 c9 07 00 00 48 8b 3c 24 48 89 de 41 ff 56 08 48 RSP: 0018:ffffc90000de0588 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff888076490330 RCX: 0000000000000100 RDX: 0000000000000001 RSI: ffffffff87d67ff0 RDI: 0000000000000008 RBP: ffff88806ee1c7f8 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff87d67f00 R11: 0000000000000000 R12: ffff88806ee1bfc0 R13: ffff88801b0e0368 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f517fe58700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffcead76960 CR3: 000000006f97b000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> tcp_v6_syn_recv_sock+0x199/0x23b0 net/ipv6/tcp_ipv6.c:1267 tcp_get_cookie_sock+0xc9/0x850 net/ipv4/syncookies.c:207 cookie_v6_check+0x15c3/0x2340 net/ipv6/syncookies.c:258 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1131 [inline] tcp_v6_do_rcv+0x1148/0x13b0 net/ipv6/tcp_ipv6.c:1486 tcp_v6_rcv+0x3305/0x3840 net/ipv6/tcp_ipv6.c:1725 ip6_protocol_deliver_rcu+0x2e9/0x1900 net/ipv6/ip6_input.c:422 ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:464 NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:473 dst_input include/net/dst.h:461 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ipv6_rcv+0x27f/0x3b0 net/ipv6/ip6_input.c:297 __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405 __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519 process_backlog+0x3a0/0x7c0 net/core/dev.c:5847 __napi_poll+0xb3/0x6e0 net/core/dev.c:6413 napi_poll net/core/dev.c:6480 [inline] net_rx_action+0x8ec/0xc60 net/core/dev.c:6567 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637 irq_exit_rcu+0x5/0x20 kernel/softirq.c:649 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097 Fixes: 5b0b9e4c2c89 ("tcp: md5: incorrect tcp_header_len for incoming connections") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWATEric Dumazet
I had this bug sitting for too long in my pile, it is time to fix it. Thanks to Doug Porter for reminding me of it! We had various attempts in the past, including commit 0cbe6a8f089e ("tcp: remove SOCK_QUEUE_SHRUNK"), but the issue is that TCP stack currently only generates EPOLLOUT from input path, when tp->snd_una has advanced and skb(s) cleaned from rtx queue. If a flow has a big RTT, and/or receives SACKs, it is possible that the notsent part (tp->write_seq - tp->snd_nxt) reaches 0 and no more data can be sent until tp->snd_una finally advances. What is needed is to also check if POLLOUT needs to be generated whenever tp->snd_nxt is advanced, from output path. This bug triggers more often after an idle period, as we do not receive ACK for at least one RTT. tcp_notsent_lowat could be a fraction of what CWND and pacing rate would allow to send during this RTT. In a followup patch, I will remove the bogus call to tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED) from tcp_check_space(). Fact that we have decided to generate an EPOLLOUT does not mean the application has immediately refilled the transmit queue. This optimistic call might have been the reason the bug seemed not too serious. Tested: 200 ms rtt, 1% packet loss, 32 MB tcp_rmem[2] and tcp_wmem[2] $ echo 500000 >/proc/sys/net/ipv4/tcp_notsent_lowat $ cat bench_rr.sh SUM=0 for i in {1..10} do V=`netperf -H remote_host -l30 -t TCP_RR -- -r 10000000,10000 -o LOCAL_BYTES_SENT | egrep -v "MIGRATED|Bytes"` echo $V SUM=$(($SUM + $V)) done echo SUM=$SUM Before patch: $ bench_rr.sh 130000000 80000000 140000000 140000000 140000000 140000000 130000000 40000000 90000000 110000000 SUM=1140000000 After patch: $ bench_rr.sh 430000000 590000000 530000000 450000000 450000000 350000000 450000000 490000000 480000000 460000000 SUM=4680000000 # This is 410 % of the value before patch. Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Doug Porter <dsp@fb.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25net: mscc: ocelot: don't add VID 0 to ocelot->vlans when leaving VLAN-aware ↵Vladimir Oltean
bridge DSA, through dsa_port_bridge_leave(), first notifies the port of the fact that it left a bridge, then, if that bridge was VLAN-aware, it notifies the port of the change in VLAN awareness state, towards VLAN-unaware mode. So ocelot_port_vlan_filtering() can be called when ocelot_port->bridge is NULL, and this makes ocelot_add_vlan_unaware_pvid() create a struct ocelot_bridge_vlan with a vid of 0 and an "untagged" setting of true on that port. In a way this structure correctly reflects the reality, but by design, VID 0 (OCELOT_STANDALONE_PVID) was not meant to be kept in the bridge VLAN list of the driver, but managed separately. Having OCELOT_STANDALONE_PVID in ocelot->vlans makes us trip up on several sanity checks that did not expect to have this VID there. For example, after we leave a VLAN-aware bridge and we re-join it, we can no longer program egress-tagged VLANs to hardware: # ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up # ip link set swp0 master br0 # ip link set swp0 nomaster # ip link set swp0 master br0 # bridge vlan add dev swp0 vid 100 Error: mscc_ocelot_switch_lib: Port with more than one egress-untagged VLAN cannot have egress-tagged VLANs. But this configuration is in fact supported by the hardware, since we could use OCELOT_PORT_TAG_NATIVE. According to its comment: /* all VLANs except the native VLAN and VID 0 are egress-tagged */ yet when assessing the eligibility for this mode, we do not check for VID 0 in ocelot_port_uses_native_vlan(), instead we just ensure that ocelot_port_num_untagged_vlans() == 1. This is simply because VID 0 doesn't have a bridge VLAN structure. The way I identify the problem is that ocelot_port_vlan_filtering(false) only means to call ocelot_add_vlan_unaware_pvid() when we dynamically turn off VLAN awareness for a bridge we are under, and the PVID changes from the bridge PVID to a reserved PVID based on the bridge number. Since OCELOT_STANDALONE_PVID is statically added to the VLAN table during ocelot_vlan_init() and never removed afterwards, calling ocelot_add_vlan_unaware_pvid() for it is not intended and does not serve any purpose. Fix the issue by avoiding the call to ocelot_add_vlan_unaware_pvid(vid=0) when we're resetting VLAN awareness after leaving the bridge, to become a standalone port. Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25net: mscc: ocelot: ignore VID 0 added by 8021q moduleVladimir Oltean
Both the felix DSA driver and ocelot switchdev driver declare dev->features & NETIF_F_HW_VLAN_CTAG_FILTER under certain circumstances*, so the 8021q module will add VID 0 to our RX filter when the port goes up, to ensure 802.1p traffic is not dropped. We treat VID 0 as a special value (OCELOT_STANDALONE_PVID) which deliberately does not have a struct ocelot_bridge_vlan associated with it. Instead, this gets programmed to the VLAN table in ocelot_vlan_init(). If we allow external calls to modify VID 0, we reach the following situation: # ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up # ip link set swp0 master br0 # ip link set swp0 up # this adds VID 0 to ocelot->vlans with untagged=false bridge vlan port vlan-id swp0 1 PVID Egress Untagged # the bridge also adds VID 1 br0 1 PVID Egress Untagged # bridge vlan add dev swp0 vid 100 untagged Error: mscc_ocelot_switch_lib: Port with egress-tagged VLANs cannot have more than one egress-untagged (native) VLAN. This configuration should have been accepted, because ocelot_port_manage_port_tag() should select OCELOT_PORT_TAG_NATIVE. Yet it isn't, because we have an entry in ocelot->vlans which says VID 0 should be egress-tagged, something the hardware can't do. Fix this by suppressing additions/deletions on VID 0 and managing this VLAN exclusively using OCELOT_STANDALONE_PVID. *DSA toggles it when the port becomes VLAN-aware by joining a VLAN-aware bridge. Ocelot declares it unconditionally for some reason. Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25net: dsa: flood multicast to CPU when slave has IFF_PROMISCVladimir Oltean
Certain DSA switches can eliminate flooding to the CPU when none of the ports have the IFF_ALLMULTI or IFF_PROMISC flags set. This is done by synthesizing a call to dsa_port_bridge_flags() for the CPU port, a call which normally comes from the bridge driver via switchdev. The bridge port flags and IFF_PROMISC|IFF_ALLMULTI have slightly different semantics, and due to inattention/lack of proper testing, the IFF_PROMISC flag allows unknown unicast to be flooded to the CPU, but not unknown multicast. This must be fixed by setting both BR_FLOOD (unicast) and BR_MCAST_FLOOD in the synthesized dsa_port_bridge_flags() call, since IFF_PROMISC means that packets should not be filtered regardless of their MAC DA. Fixes: 7569459a52c9 ("net: dsa: manage flooding on the CPU ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md modePeilin Ye
As pointed out by Jakub Kicinski, currently using TUNNEL_SEQ in collect_md mode is racy for [IP6]GRE[TAP] devices. Consider the following sequence of events: 1. An [IP6]GRE[TAP] device is created in collect_md mode using "ip link add ... external". "ip" ignores "[o]seq" if "external" is specified, so TUNNEL_SEQ is off, and the device is marked as NETIF_F_LLTX (i.e. it uses lockless TX); 2. Someone sets TUNNEL_SEQ on outgoing skb's, using e.g. bpf_skb_set_tunnel_key() in an eBPF program attached to this device; 3. gre_fb_xmit() or __gre6_xmit() processes these skb's: gre_build_header(skb, tun_hlen, flags, protocol, tunnel_id_to_key32(tun_info->key.tun_id), (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++) : 0); ^^^^^^^^^^^^^^^^^ Since we are not using the TX lock (&txq->_xmit_lock), multiple CPUs may try to do this tunnel->o_seqno++ in parallel, which is racy. Fix it by making o_seqno atomic_t. As mentioned by Eric Dumazet in commit b790e01aee74 ("ip_gre: lockless xmit"), making o_seqno atomic_t increases "chance for packets being out of order at receiver" when NETIF_F_LLTX is on. Maybe a better fix would be: 1. Do not ignore "oseq" in external mode. Users MUST specify "oseq" if they want the kernel to allow sequencing of outgoing packets; 2. Reject all outgoing TUNNEL_SEQ packets if the device was not created with "oseq". Unfortunately, that would break userspace. We could now make [IP6]GRE[TAP] devices always NETIF_F_LLTX, but let us do it in separate patches to keep this fix minimal. Suggested-by: Jakub Kicinski <kuba@kernel.org> Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.") Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25ip6_gre: Make o_seqno start from 0 in native modePeilin Ye
For IP6GRE and IP6GRETAP devices, currently o_seqno starts from 1 in native mode. According to RFC 2890 2.2., "The first datagram is sent with a sequence number of 0." Fix it. It is worth mentioning that o_seqno already starts from 0 in collect_md mode, see the "if (tunnel->parms.collect_md)" clause in __gre6_xmit(), where tunnel->o_seqno is passed to gre_build_header() before getting incremented. Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25ip_gre: Make o_seqno start from 0 in native modePeilin Ye
For GRE and GRETAP devices, currently o_seqno starts from 1 in native mode. According to RFC 2890 2.2., "The first datagram is sent with a sequence number of 0." Fix it. It is worth mentioning that o_seqno already starts from 0 in collect_md mode, see gre_fb_xmit(), where tunnel->o_seqno is passed to gre_build_header() before getting incremented. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25net: lan966x: fix a couple off by one bugsDan Carpenter
The lan966x->ports[] array has lan966x->num_phys_ports elements. These are assigned in lan966x_probe(). That means the > comparison should be changed to >=. The first off by one check is harmless but the second one could lead to an out of bounds access and a crash. Fixes: 5ccd66e01cbe ("net: lan966x: add support for interrupts from analyzer") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25net/smc: sync err code when tcp connection was refusedliuyacan
In the current implementation, when TCP initiates a connection to an unavailable [ip,port], ECONNREFUSED will be stored in the TCP socket, but SMC will not. However, some apps (like curl) use getsockopt(,,SO_ERROR,,) to get the error information, which makes them miss the error message and behave strangely. Fixes: 50717a37db03 ("net/smc: nonblocking connect rework") Signed-off-by: liuyacan <liuyacan@corp.netease.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25net: hns: Add missing fwnode_handle_put in hns_mac_initPeng Wu
In one of the error paths of the device_for_each_child_node() loop in hns_mac_init, add missing call to fwnode_handle_put. Signed-off-by: Peng Wu <wupeng58@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>