summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-11-30dpaa_eth: add basic XDP supportCamelia Groza
Implement the XDP_DROP and XDP_PASS actions. Avoid draining and reconfiguring the buffer pool at each XDP setup/teardown by increasing the frame headroom and reserving XDP_PACKET_HEADROOM bytes from the start. Since we always reserve an entire page per buffer, this change only impacts Jumbo frame scenarios where the maximum linear frame size is reduced by 256 bytes. Multi buffer Scatter/Gather frames are now used instead in these scenarios. Allow XDP programs to access the entire buffer. The data in the received frame's headroom can be overwritten by the XDP program. Extract the relevant fields from the headroom while they are still available, before running the XDP program. Since the headroom might be resized before the frame is passed up to the stack, remove the check for a fixed headroom value when building an skb. Allow the meta data to be updated and pass the information up the stack. Scatter/Gather frames are dropped when XDP is enabled. Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-30dpaa_eth: add struct for software backpointersCamelia Groza
We maintain an skb backpointer in the software annotations area of Tx frames. Introduce a structure for explicit handling. Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-01gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and delete_work_funcAndreas Gruenbacher
In gfs2_create_inode and gfs2_inode_lookup, make sure to cancel any pending delete work before taking the inode glock. Otherwise, gfs2_cancel_delete_work may block waiting for delete_work_func to complete, and delete_work_func may block trying to acquire the inode glock in gfs2_inode_lookup. Reported-by: Alexander Aring <aahringo@redhat.com> Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-12-01Merge branch 'xdp-preferred-busy-polling'Daniel Borkmann
Björn Töpel says: ==================== This series introduces three new features: 1. A new "heavy traffic" busy-polling variant that works in concert with the existing napi_defer_hard_irqs and gro_flush_timeout knobs. 2. A new socket option that let a user change the busy-polling NAPI budget. 3. Allow busy-polling to be performed on XDP sockets. The existing busy-polling mode, enabled by the SO_BUSY_POLL socket option or system-wide using the /proc/sys/net/core/busy_read knob, is an opportunistic. That means that if the NAPI context is not scheduled, it will poll it. If, after busy-polling, the budget is exceeded the busy-polling logic will schedule the NAPI onto the regular softirq handling. One implication of the behavior above is that a busy/heavy loaded NAPI context will never enter/allow for busy-polling. Some applications prefer that most NAPI processing would be done by busy-polling. This series adds a new socket option, SO_PREFER_BUSY_POLL, that works in concert with the napi_defer_hard_irqs and gro_flush_timeout knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature"), and allows for a user to defer interrupts to be enabled and instead schedule the NAPI context from a watchdog timer. When a user enables the SO_PREFER_BUSY_POLL, again with the other knobs enabled, and the NAPI context is being processed by a softirq, the softirq NAPI processing will exit early to allow the busy-polling to be performed. If the application stops performing busy-polling via a system call, the watchdog timer defined by gro_flush_timeout will timeout, and regular softirq handling will resume. In summary; Heavy traffic applications that prefer busy-polling over softirq processing should use this option. Patch 6 touches a lot of drivers, so the Cc: list is grossly long. Example usage: $ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs $ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout Note that the timeout should be larger than the userspace processing window, otherwise the watchdog will timeout and fall back to regular softirq processing. Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket. Performance simple UDP ping-pong: A packet generator blasts UDP packets from a packet generator to a certain {src,dst}IP/port, so a dedicated ksoftirq will be busy handling the packets at a certain core. A simple UDP test program that simply does recvfrom/sendto is running at the host end. Throughput in pps and RTT latency is measured at the packet generator. /proc/sys/net/core/busy_read is set (20). Min Max Avg (usec) 1. Blocking 2-cores: 490Kpps 1218.192 1335.427 1271.083 2. Blocking, 1-core: 155Kpps 1327.195 17294.855 4761.367 3. Non-blocking, 2-cores: 475Kpps 1221.197 1330.465 1270.740 4. Non-blocking, 1-core: 3Kpps 29006.482 37260.465 33128.367 5. Non-blocking, prefer busy-poll, 1-core: 420Kpps 1202.535 5494.052 4885.443 Scenario 2 and 5 shows when the new option should be used. Throughput go from 155 to 420Kpps, average latency are similar, but the tail latencies are much better for the latter. Performance XDP sockets: Again, a packet generator blasts UDP packets from a packet generator to a certain {src,dst}IP/port. Today, running XDP sockets sample on the same core as the softirq handling, performance tanks mainly because we do not yield to user-space when the XDP socket Rx queue is full. # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r Rx: 64Kpps # # preferred busy-polling, budget 8 # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 8 Rx 9.9Mpps # # preferred busy-polling, budget 64 # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 64 Rx: 19.3Mpps # # preferred busy-polling, budget 256 # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 256 Rx: 21.4Mpps # # preferred busy-polling, budget 512 # taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 512 Rx: 21.7Mpps Compared to the two-core case: # taskset -c 4 ./xdpsock -i ens785f1 -q 20 -n 1 -r Rx: 20.7Mpps We're getting better single-core performance than two, for this naïve drop scenario. Performance netperf UDP_RR: Note that netperf UDP_RR is not a heavy traffic tests, and preferred busy-polling is not typically something we want to use here. $ echo 20 | sudo tee /proc/sys/net/core/busy_read $ netperf -H 192.168.1.1 -l 30 -t UDP_RR -v 2 -- \ -o min_latency,mean_latency,max_latency,stddev_latency,transaction_rate busy-polling blocking sockets: 12,13.33,224,0.63,74731.177 I hacked netperf to use non-blocking sockets and re-ran: busy-polling non-blocking sockets: 12,13.46,218,0.72,73991.172 prefer busy-polling non-blocking sockets: 12,13.62,221,0.59,73138.448 Using the preferred busy-polling mode does not impact performance. The above tests was done for the 'ice' driver. Thanks to Jakub for suggesting this busy-polling addition [1], and Eric for all input/review! Changes: rfc-v1 [2] -> rfc-v2: * Changed name from bias to prefer. * Base the work on Eric's/Luigi's defer irq/gro timeout work. * Proper GRO flushing. * Build issues for some XDP drivers. rfc-v2 [3] -> v1: * Fixed broken qlogic build. * Do not trigger an IPI (XDP socket wakeup) when busy-polling is enabled. v1 [4] -> v2: * Added napi_id to socionext driver, and added Ilias Acked-by:. (Ilias) * Added a samples patch to improve busy-polling for xdpsock/l2fwd. * Correctly mark atomic operations with {WRITE,READ}_ONCE, to make KCSAN and the code readers happy. (Eric) * Check NAPI budget not to exceed U16_MAX. (Eric) * Added kdoc. v2 [5] -> v3: * Collected Acked-by. * Check NAPI disable prior prefer busy-polling. (Jakub) * Added napi_id registration for virtio-net. (Michael) * Added napi_id registration for veth. v3 [6] -> v4: * Collected Acked-by/Reviewed-by. [1] https://lore.kernel.org/netdev/20200925120652.10b8d7c5@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ [2] https://lore.kernel.org/bpf/20201028133437.212503-1-bjorn.topel@gmail.com/ [3] https://lore.kernel.org/bpf/20201105102812.152836-1-bjorn.topel@gmail.com/ [4] https://lore.kernel.org/bpf/20201112114041.131998-1-bjorn.topel@gmail.com/ [5] https://lore.kernel.org/bpf/20201116110416.10719-1-bjorn.topel@gmail.com/ [6] https://lore.kernel.org/bpf/20201119083024.119566-1-bjorn.topel@gmail.com/ ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2020-12-01samples/bpf: Add option to set the busy-poll budgetBjörn Töpel
Support for the SO_BUSY_POLL_BUDGET setsockopt, via the batching option ('b'). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-11-bjorn.topel@gmail.com
2020-12-01samples/bpf: Add busy-poll support to xdpsockBjörn Töpel
Add a new option to xdpsock, 'B', for busy-polling. This option will also set the batching size, 'b' option, to the busy-poll budget. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-10-bjorn.topel@gmail.com
2020-12-01samples/bpf: Use recvfrom() in xdpsock/l2fwdBjörn Töpel
Start using recvfrom() the l2fwd scenario, instead of poll() which is more expensive and need additional knobs for busy-polling. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-9-bjorn.topel@gmail.com
2020-12-01samples/bpf: Use recvfrom() in xdpsock/rxdropBjörn Töpel
Start using recvfrom() the rxdrop scenario. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-8-bjorn.topel@gmail.com
2020-12-01xsk: Propagate napi_id to XDP socket Rx pathBjörn Töpel
Add napi_id to the xdp_rxq_info structure, and make sure the XDP socket pick up the napi_id in the Rx path. The napi_id is used to find the corresponding NAPI structure for socket busy polling. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com
2020-12-01xsk: Add busy-poll support for {recv,send}msg()Björn Töpel
Wire-up XDP socket busy-poll support for recvmsg() and sendmsg(). If the XDP socket prefers busy-polling, make sure that no wakeup/IPI is performed. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-6-bjorn.topel@gmail.com
2020-12-01xsk: Check need wakeup flag in sendmsg()Björn Töpel
Add a check for need wake up in sendmsg(), so that if a user calls sendmsg() when no wakeup is needed, do not trigger a wakeup. To simplify the need wakeup check in the syscall, unconditionally enable the need wakeup flag for Tx. This has a side-effect for poll(); If poll() is called for a socket without enabled need wakeup, a Tx wakeup is unconditionally performed. The wakeup matrix for AF_XDP now looks like: need wakeup | poll() | sendmsg() | recvmsg() ------------+--------------+-------------+------------ disabled | wake Tx | wake Tx | nop enabled | check flag; | check flag; | check flag; | wake Tx/Rx | wake Tx | wake Rx Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-5-bjorn.topel@gmail.com
2020-12-01xsk: Add support for recvmsg()Björn Töpel
Add support for non-blocking recvmsg() to XDP sockets. Previously, only sendmsg() was supported by XDP socket. Now, for symmetry and the upcoming busy-polling support, recvmsg() is added. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-4-bjorn.topel@gmail.com
2020-12-01net: Add SO_BUSY_POLL_BUDGET socket optionBjörn Töpel
This option lets a user set a per socket NAPI budget for busy-polling. If the options is not set, it will use the default of 8. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/20201130185205.196029-3-bjorn.topel@gmail.com
2020-12-01net: Introduce preferred busy-pollingBjörn Töpel
The existing busy-polling mode, enabled by the SO_BUSY_POLL socket option or system-wide using the /proc/sys/net/core/busy_read knob, is an opportunistic. That means that if the NAPI context is not scheduled, it will poll it. If, after busy-polling, the budget is exceeded the busy-polling logic will schedule the NAPI onto the regular softirq handling. One implication of the behavior above is that a busy/heavy loaded NAPI context will never enter/allow for busy-polling. Some applications prefer that most NAPI processing would be done by busy-polling. This series adds a new socket option, SO_PREFER_BUSY_POLL, that works in concert with the napi_defer_hard_irqs and gro_flush_timeout knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature"), and allows for a user to defer interrupts to be enabled and instead schedule the NAPI context from a watchdog timer. When a user enables the SO_PREFER_BUSY_POLL, again with the other knobs enabled, and the NAPI context is being processed by a softirq, the softirq NAPI processing will exit early to allow the busy-polling to be performed. If the application stops performing busy-polling via a system call, the watchdog timer defined by gro_flush_timeout will timeout, and regular softirq handling will resume. In summary; Heavy traffic applications that prefer busy-polling over softirq processing should use this option. Example usage: $ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs $ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout Note that the timeout should be larger than the userspace processing window, otherwise the watchdog will timeout and fall back to regular softirq processing. Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/20201130185205.196029-2-bjorn.topel@gmail.com
2020-11-30selftests/bpf: Fix flavored variants of test_imaKP Singh
Flavored variants of test_progs (e.g. test_progs-no_alu32) change their working directory to the corresponding subdirectory (e.g. no_alu32). Since the setup script required by test_ima (ima_setup.sh) is not mentioned in the dependencies, it does not get copied to these subdirectories and causes flavored variants of test_ima to fail. Adding the script to TRUNNER_EXTRA_FILES ensures that the file is also copied to the subdirectories for the flavored variants of test_progs. Fixes: 34b82d3ac105 ("bpf: Add a selftest for bpf_ima_inode_hash") Reported-by: Yonghong Song <yhs@fb.com> Suggested-by: Yonghong Song <yhs@fb.com> Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201126184946.1708213-1-kpsingh@chromium.org
2020-11-30cifs: fix potential use-after-free in cifs_echo_request()Paulo Alcantara
This patch fixes a potential use-after-free bug in cifs_echo_request(). For instance, thread 1 -------- cifs_demultiplex_thread() clean_demultiplex_info() kfree(server) thread 2 (workqueue) -------- apic_timer_interrupt() smp_apic_timer_interrupt() irq_exit() __do_softirq() run_timer_softirq() call_timer_fn() cifs_echo_request() <- use-after-free in server ptr Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-11-30cifs: allow syscalls to be restarted in __smb_send_rqst()Paulo Alcantara
A customer has reported that several files in their multi-threaded app were left with size of 0 because most of the read(2) calls returned -EINTR and they assumed no bytes were read. Obviously, they could have fixed it by simply retrying on -EINTR. We noticed that most of the -EINTR on read(2) were due to real-time signals sent by glibc to process wide credential changes (SIGRT_1), and its signal handler had been established with SA_RESTART, in which case those calls could have been automatically restarted by the kernel. Let the kernel decide to whether or not restart the syscalls when there is a signal pending in __smb_send_rqst() by returning -ERESTARTSYS. If it can't, it will return -EINTR anyway. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-11-30ring-buffer: Set the right timestamp in the slow path of __rb_reserve_next()Andrea Righi
In the slow path of __rb_reserve_next() a nested event(s) can happen between evaluating the timestamp delta of the current event and updating write_stamp via local_cmpxchg(); in this case the delta is not valid anymore and it should be set to 0 (same timestamp as the interrupting event), since the event that we are currently processing is not the last event in the buffer. Link: https://lkml.kernel.org/r/X8IVJcp1gRE+FJCJ@xps-13-7390 Cc: Ingo Molnar <mingo@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Link: https://lwn.net/Articles/831207 Fixes: a389d86f7fd0 ("ring-buffer: Have nested events still record running time stamp") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-30ring-buffer: Update write stamp with the correct tsSteven Rostedt (VMware)
The write stamp, used to calculate deltas between events, was updated with the stale "ts" value in the "info" structure, and not with the updated "ts" variable. This caused the deltas between events to be inaccurate, and when crossing into a new sub buffer, had time go backwards. Link: https://lkml.kernel.org/r/20201124223917.795844-1-elavila@google.com Cc: stable@vger.kernel.org Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp") Reported-by: "J. Avila" <elavila@google.com> Tested-by: Daniel Mentz <danielmentz@google.com> Tested-by: Will McVicker <willmcvicker@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-30arm64: mte: Fix typo in macro definitionVincenzo Frascino
UL in the definition of SYS_TFSR_EL1_TF1 was misspelled causing compilation issues when trying to implement in kernel MTE async mode. Fix the macro correcting the typo. Note: MTE async mode will be introduced with a future series. Fixes: c058b1c4a5ea ("arm64: mte: system register definitions") Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20201130170709.22309-1-vincenzo.frascino@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30can: m_can: m_can_class_unregister(): move right after m_can_class_register()Marc Kleine-Budde
This patch moves the function m_can_class_unregister() directly after the m_can_class_register() function. Link: https://lore.kernel.org/r/20201130133713.269256-7-mkl@pengutronix.de Reviewed-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-30can: m_can: m_can_plat_remove(): remove unneeded platform_set_drvdata()Marc Kleine-Budde
There's no need to unset the drvdata on remove, so remove the unneeded call to platform_set_drvdata() in m_can_plat_remove(). Link: https://lore.kernel.org/r/20201130133713.269256-6-mkl@pengutronix.de Reviewed-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-30can: m_can: remove not used variable struct m_can_classdev::freqMarc Kleine-Budde
This patch removes the unused variable freq from the struct m_can_classdev. Link: https://lore.kernel.org/r/20201130133713.269256-5-mkl@pengutronix.de Reviewed-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-30can: m_can: Kconfig: convert the into menuMarc Kleine-Budde
Since there is more than one base driver for the m_can core, let's convert this into a menu. Link: https://lore.kernel.org/r/20201130133713.269256-4-mkl@pengutronix.de Reviewed-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-30can: tcan4x5x: tcan4x5x_can_probe(): remove probe failed error messageMarc Kleine-Budde
The driver core already emits a probe failed error message, so remove this one from the driver. Link: https://lore.kernel.org/r/20201130133713.269256-3-mkl@pengutronix.de Reviewed-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-30can: tcan4x5x: remove mram_start and reg_offset from struct tcan4x5x_privMarc Kleine-Budde
Both struct tcan4x5x_priv::mram_start and struct tcan4x5x_priv::reg_offset are only assigned once with a constant and then always used read-only. This patch changes the driver to use the constant directly instead. Link: https://lore.kernel.org/r/20201130133713.269256-2-mkl@pengutronix.de Reviewed-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-30can: tcan4x5x: rename parse_config() functionDan Murphy
Rename the tcan4x5x_parse_config() function to tcan4x5x_get_gpios() since the function retrieves the gpio configurations from the firmware. Signed-off-by: Dan Murphy <dmurphy@ti.com> Link: http://lore.kernel.org/r/20200226140358.30017-1-dmurphy@ti.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-30arm64: entry: fix EL1 debug transitionsMark Rutland
In debug_exception_enter() and debug_exception_exit() we trace hardirqs on/off while RCU isn't guaranteed to be watching, and we don't save and restore the hardirq state, and so may return with this having changed. Handle this appropriately with new entry/exit helpers which do the bare minimum to ensure this is appropriately maintained, without marking debug exceptions as NMIs. These are placed in entry-common.c with the other entry/exit helpers. In future we'll want to reconsider whether some debug exceptions should be NMIs, but this will require a significant refactoring, and for now this should prevent issues with lockdep and RCU. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marins <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-12-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30arm64: entry: fix NMI {user, kernel}->kernel transitionsMark Rutland
Exceptions which can be taken at (almost) any time are consdiered to be NMIs. On arm64 that includes: * SDEI events * GICv3 Pseudo-NMIs * Kernel stack overflows * Unexpected/unhandled exceptions ... but currently debug exceptions (BRKs, breakpoints, watchpoints, single-step) are not considered NMIs. As these can be taken at any time, kernel features (lockdep, RCU, ftrace) may not be in a consistent kernel state. For example, we may take an NMI from the idle code or partway through an entry/exit path. While nmi_enter() and nmi_exit() handle most of this state, notably they don't save/restore the lockdep state across an NMI being taken and handled. When interrupts are enabled and an NMI is taken, lockdep may see interrupts become disabled within the NMI code, but not see interrupts become enabled when returning from the NMI, leaving lockdep believing interrupts are disabled when they are actually disabled. The x86 code handles this in idtentry_{enter,exit}_nmi(), which will shortly be moved to the generic entry code. As we can't use either yet, we copy the x86 approach in arm64-specific helpers. All the NMI entrypoints are marked as noinstr to prevent any instrumentation handling code being invoked before the state has been corrected. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-11-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30arm64: entry: fix non-NMI kernel<->kernel transitionsMark Rutland
There are periods in kernel mode when RCU is not watching and/or the scheduler tick is disabled, but we can still take exceptions such as interrupts. The arm64 exception handlers do not account for this, and it's possible that RCU is not watching while an exception handler runs. The x86/generic entry code handles this by ensuring that all (non-NMI) kernel exception handlers call irqentry_enter() and irqentry_exit(), which handle RCU, lockdep, and IRQ flag tracing. We can't yet move to the generic entry code, and already hadnle the user<->kernel transitions elsewhere, so we add new kernel<->kernel transition helpers alog the lines of the generic entry code. Since we now track interrupts becoming masked when an exception is taken, local_daif_inherit() is modified to track interrupts becoming re-enabled when the original context is inherited. To balance the entry/exit paths, each handler masks all DAIF exceptions before exit_to_kernel_mode(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-10-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30arm64: ptrace: prepare for EL1 irq/rcu trackingMark Rutland
Exceptions from EL1 may be taken when RCU isn't watching (e.g. in idle sequences), or when the lockdep hardirqs transiently out-of-sync with the hardware state (e.g. in the middle of local_irq_enable()). To correctly handle these cases, we'll need to save/restore this state across some exceptions taken from EL1. A series of subsequent patches will update EL1 exception handlers to handle this. In preparation for this, and to avoid dependencies between those patches, this patch adds two new fields to struct pt_regs so that exception handlers can track this state. Note that this is placed in pt_regs as some entry/exit sequences such as el1_irq are invoked from assembly, which makes it very difficult to add a separate structure as with the irqentry_state used by x86. We can separate this once more of the exception logic is moved to C. While the fields only need to be bool, they are both made u64 to keep pt_regs 16-byte aligned. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-9-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30arm64: entry: fix non-NMI user<->kernel transitionsMark Rutland
When built with PROVE_LOCKING, NO_HZ_FULL, and CONTEXT_TRACKING_FORCE will WARN() at boot time that interrupts are enabled when we call context_tracking_user_enter(), despite the DAIF flags indicating that IRQs are masked. The problem is that we're not tracking IRQ flag changes accurately, and so lockdep believes interrupts are enabled when they are not (and vice-versa). We can shuffle things so to make this more accurate. For kernel->user transitions there are a number of constraints we need to consider: 1) When we call __context_tracking_user_enter() HW IRQs must be disabled and lockdep must be up-to-date with this. 2) Userspace should be treated as having IRQs enabled from the PoV of both lockdep and tracing. 3) As context_tracking_user_enter() stops RCU from watching, we cannot use RCU after calling it. 4) IRQ flag tracing and lockdep have state that must be manipulated before RCU is disabled. ... with similar constraints applying for user->kernel transitions, with the ordering reversed. The generic entry code has enter_from_user_mode() and exit_to_user_mode() helpers to handle this. We can't use those directly, so we add arm64 copies for now (without the instrumentation markers which aren't used on arm64). These replace the existing user_exit() and user_exit_irqoff() calls spread throughout handlers, and the exception unmasking is left as-is. Note that: * The accounting for debug exceptions from userspace now happens in el0_dbg() and ret_to_user(), so this is removed from debug_exception_enter() and debug_exception_exit(). As user_exit_irqoff() wakes RCU, the userspace-specific check is removed. * The accounting for syscalls now happens in el0_svc(), el0_svc_compat(), and ret_to_user(), so this is removed from el0_svc_common(). This does not adversely affect the workaround for erratum 1463225, as this does not depend on any of the state tracking. * In ret_to_user() we mask interrupts with local_daif_mask(), and so we need to inform lockdep and tracing. Here a trace_hardirqs_off() is sufficient and safe as we have not yet exited kernel context and RCU is usable. * As PROVE_LOCKING selects TRACE_IRQFLAGS, the ifdeferry in entry.S only needs to check for the latter. * EL0 SError handling will be dealt with in a subsequent patch, as this needs to be treated as an NMI. Prior to this patch, booting an appropriately-configured kernel would result in spats as below: | DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled()) | WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5280 check_flags.part.54+0x1dc/0x1f0 | Modules linked in: | CPU: 2 PID: 1 Comm: init Not tainted 5.10.0-rc3 #3 | Hardware name: linux,dummy-virt (DT) | pstate: 804003c5 (Nzcv DAIF +PAN -UAO -TCO BTYPE=--) | pc : check_flags.part.54+0x1dc/0x1f0 | lr : check_flags.part.54+0x1dc/0x1f0 | sp : ffff80001003bd80 | x29: ffff80001003bd80 x28: ffff66ce801e0000 | x27: 00000000ffffffff x26: 00000000000003c0 | x25: 0000000000000000 x24: ffffc31842527258 | x23: ffffc31842491368 x22: ffffc3184282d000 | x21: 0000000000000000 x20: 0000000000000001 | x19: ffffc318432ce000 x18: 0080000000000000 | x17: 0000000000000000 x16: ffffc31840f18a78 | x15: 0000000000000001 x14: ffffc3184285c810 | x13: 0000000000000001 x12: 0000000000000000 | x11: ffffc318415857a0 x10: ffffc318406614c0 | x9 : ffffc318415857a0 x8 : ffffc31841f1d000 | x7 : 647261685f706564 x6 : ffffc3183ff7c66c | x5 : ffff66ce801e0000 x4 : 0000000000000000 | x3 : ffffc3183fe00000 x2 : ffffc31841500000 | x1 : e956dc24146b3500 x0 : 0000000000000000 | Call trace: | check_flags.part.54+0x1dc/0x1f0 | lock_is_held_type+0x10c/0x188 | rcu_read_lock_sched_held+0x70/0x98 | __context_tracking_enter+0x310/0x350 | context_tracking_enter.part.3+0x5c/0xc8 | context_tracking_user_enter+0x6c/0x80 | finish_ret_to_user+0x2c/0x13cr Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-8-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30arm64: entry: move el1 irq/nmi logic to CMark Rutland
In preparation for reworking the EL1 irq/nmi entry code, move the existing logic to C. We no longer need the asm_nmi_enter() and asm_nmi_exit() wrappers, so these are removed. The new C functions are marked noinstr, which prevents compiler instrumentation and runtime probing. In subsequent patches we'll want the new C helpers to be called in all cases, so we don't bother wrapping the calls with ifdeferry. Even when the new C functions are stubs the trivial calls are unlikely to have a measurable impact on the IRQ or NMI paths anyway. Prototypes are added to <asm/exception.h> as otherwise (in some configurations) GCC will complain about the lack of a forward declaration. We already do this for existing function, e.g. enter_from_user_mode(). The new helpers are marked as noinstr (which prevents all instrumentation, tracing, and kprobes). Otherwise, there should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-7-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30arm64: entry: prepare ret_to_user for function callMark Rutland
In a subsequent patch ret_to_user will need to make a C function call (in some configurations) which may clobber x0-x18 at the start of the finish_ret_to_user block, before enable_step_tsk consumes the flags loaded into x1. In preparation for this, let's load the flags into x19, which is preserved across C function calls. This avoids a redundant reload of the flags and ensures we operate on a consistent shapshot regardless. There should be no functional change as a result of this patch. At this point of the entry/exit paths we only need to preserve x28 (tsk) and the sp, and x19 is free for this use. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-6-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30arm64: entry: move enter_from_user_mode to entry-common.cMark Rutland
In later patches we'll want to extend enter_from_user_mode() and add a corresponding exit_to_user_mode(). As these will be common for all entries/exits from userspace, it'd be better for these to live in entry-common.c with the rest of the entry logic. This patch moves enter_from_user_mode() into entry-common.c. As with other functions in entry-common.c it is marked as noinstr (which prevents all instrumentation, tracing, and kprobes) but there are no other functional changes. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-5-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30arm64: entry: mark entry code as noinstrMark Rutland
Functions in entry-common.c are marked as notrace and NOKPROBE_SYMBOL(), but they're still subject to other instrumentation which may rely on lockdep/rcu/context-tracking being up-to-date, and may cause nested exceptions (e.g. for WARN/BUG or KASAN's use of BRK) which will corrupt exceptions registers which have not yet been read. Prevent this by marking all functions in entry-common.c as noinstr to prevent compiler instrumentation. This also blacklists the functions for tracing and kprobes, so we don't need to handle that separately. Functions elsewhere will be dealt with in subsequent patches. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30arm64: mark idle code as noinstrMark Rutland
Core code disables RCU when calling arch_cpu_idle(), so it's not safe for arch_cpu_idle() or its calees to be instrumented, as the instrumentation callbacks may attempt to use RCU or other features which are unsafe to use in this context. Mark them noinstr to prevent issues. The use of local_irq_enable() in arch_cpu_idle() is similarly problematic, and the "sched/idle: Fix arch_cpu_idle() vs tracing" patch queued in the tip tree addresses that case. Reported-by: Marco Elver <elver@google.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30arm64: syscall: exit userspace before unmasking exceptionsMark Rutland
In el0_svc_common() we unmask exceptions before we call user_exit(), and so there's a window where an IRQ or debug exception can be taken while RCU is not watching. In do_debug_exception() we account for this in via debug_exception_{enter,exit}(), but in the el1_irq asm we do not and we call trace functions which rely on RCU before we have a guarantee that RCU is watching. Let's avoid this by having el0_svc_common() exit userspace before unmasking exceptions, matching what we do for all other EL0 entry paths. We can use user_exit_irqoff() to avoid the pointless save/restore of IRQ flags while we're sure exceptions are masked in DAIF. The workaround for Cortex-A76 erratum 1463225 may trigger a debug exception before this point, but the debug code invoked in this case is safe even when RCU is not watching. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30can: kvaser_pciefd: kvaser_pciefd_open(): fix error handlingZhang Qilong
If kvaser_pciefd_bus_on() failed, we should call close_candev() to avoid reference leak. Fixes: 26ad340e582d3 ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20201128133922.3276973-3-zhangqilong3@huawei.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-30can: c_can: c_can_power_up(): fix error handlingZhang Qilong
In the error handling in c_can_power_up(), there are two bugs: 1) c_can_pm_runtime_get_sync() will increase usage counter if device is not empty. Forgetting to call c_can_pm_runtime_put_sync() will result in a reference leak here. 2) c_can_reset_ram() operation will set start bit when enable is true. We should clear it in the error handling. We fix it by adding c_can_pm_runtime_put_sync() for 1), and c_can_reset_ram(enable is false) for 2) in the error handling. Fixes: 8212003260c60 ("can: c_can: Add d_can suspend resume support") Fixes: 52cde85acc23f ("can: c_can: Add d_can raminit support") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20201128133922.3276973-2-zhangqilong3@huawei.com [mkl: return "0" instead of "ret"] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-30can: sun4i_can: sun4i_can_err(): don't count arbitration lose as an errorJeroen Hofstee
Losing arbitration is normal in a CAN-bus network, it means that a higher priority frame is being send and the pending message will be retried later. Hence most driver only increment arbitration_lost, but the sun4i driver also incremeants tx_error, causing errors to be reported on a normal functioning CAN-bus. So stop counting them as errors. Fixes: 0738eff14d81 ("can: Allwinner A10/A20 CAN Controller support - Kernel module") Signed-off-by: Jeroen Hofstee <jhofstee@victronenergy.com> Link: https://lore.kernel.org/r/20201127095941.21609-1-jhofstee@victronenergy.com [mkl: split into two seperate patches] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-30can: sja1000: sja1000_err(): don't count arbitration lose as an errorJeroen Hofstee
Losing arbitration is normal in a CAN-bus network, it means that a higher priority frame is being send and the pending message will be retried later. Hence most driver only increment arbitration_lost, but the sja1000 driver also incremeants tx_error, causing errors to be reported on a normal functioning CAN-bus. So stop counting them as errors. Fixes: 8935f57e68c4 ("can: sja1000: fix network statistics update") Signed-off-by: Jeroen Hofstee <jhofstee@victronenergy.com> Link: https://lore.kernel.org/r/20201127095941.21609-1-jhofstee@victronenergy.com [mkl: split into two seperate patches] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-30can: m_can: tcan4x5x_can_probe(): fix error path: remove erroneous ↵Marc Kleine-Budde
clk_disable_unprepare() The clocks mcan_class->cclk and mcan_class->hclk are not prepared by any call during tcan4x5x_can_probe(), so remove erroneous clk_disable_unprepare() on them. Fixes: 5443c226ba91 ("can: tcan4x5x: Add tcan4x5x driver to the kernel") Link: http://lore.kernel.org/r/20201130114252.215334-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-29Linux 5.10-rc6v5.10-rc6Linus Torvalds
2020-11-29can: tcan4x5x: tcan4x5x_clear_interrupts(): remove redundant return statementSean Nyekjaer
This patch removes a redundant return at the end of tcan4x5x_clear_interrupts(). Signed-off-by: Sean Nyekjaer <sean@geanix.com> Link: http://lore.kernel.org/r/20191211141635.322577-1-sean@geanix.com Reported-by: Daniels Umanovskis <daniels@umanovskis.se> Acked-by: Dan Murphy <dmurphy@ti.com> Fixes: 5443c226ba91 ("can: tcan4x5x: Add tcan4x5x driver to the kernel") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-29can: mcp251xfd: tef-path: reduce number of SPI core requests to set UINC bitMarc Kleine-Budde
Reduce the number of separate SPI core requests when setting the UINC bit in the TEF FIFO, and instead batch them up into a single SPI core request. Link: https://lore.kernel.org/r/20201126132144.351154-6-mkl@pengutronix.de Tested-by: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-29can: mcp251xfd: move struct mcp251xfd_tef_ring definitionMarc Kleine-Budde
This patch moves the struct mcp251xfd_tef_ring upwards, so that the union mcp251xfd_write_reg_buf and struct spi_transfer can be made members of it. Link: https://lore.kernel.org/r/20201126132144.351154-5-mkl@pengutronix.de Tested-by: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-29can: mcp251xfd: struct mcp251xfd_priv::tef to array of length 1Marc Kleine-Budde
This patch converts the struct mcp251xfd_tef_ring member within the struct mcp251xfd_priv into an array of length one. This way all rings (tef, tx and rx) can be accessed in the same way. Link: https://lore.kernel.org/r/20201126132144.351154-4-mkl@pengutronix.de Tested-by: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-29can: mcp25xxfd: rx-path: reduce number of SPI core requests to set UINC bitUrsula Maplehurst
Reduce the number of separate SPI core requests when setting the UINC bit in the RX FIFO, and instead batch them up into a single SPI core request. Link: https://github.com/marckleinebudde/linux/issues/4 Link: https://lore.kernel.org/r/20201126132144.351154-3-mkl@pengutronix.de Tested-by: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Ursula Maplehurst <ursula@kangatronix.co.uk> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-29can: mcp251xfd: mcp25xxfd_ring_alloc(): add define instead open coding the ↵Marc Kleine-Budde
maximum number of RX objects This patch add a define for the maximum number of RX objects instead of open coding it. Link: https://lore.kernel.org/r/20201126132144.351154-2-mkl@pengutronix.de Tested-by: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>