summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-03-30bpf: Verifier, refine 32bit bound in do_refine_retval_rangeJohn Fastabend
Further refine return values range in do_refine_retval_range by noting these are int return types (We will assume here that int is a 32-bit type). Two reasons to pull this out of original patch. First it makes the original fix impossible to backport. And second I've not seen this as being problematic in practice unlike the other case. Fixes: 849fa50662fbc ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158560421952.10843.12496354931526965046.stgit@john-Precision-5820-Tower
2020-03-30bpf: Verifier, do explicit ALU32 bounds trackingJohn Fastabend
It is not possible for the current verifier to track ALU32 and JMP ops correctly. This can result in the verifier aborting with errors even though the program should be verifiable. BPF codes that hit this can work around it by changin int variables to 64-bit types, marking variables volatile, etc. But this is all very ugly so it would be better to avoid these tricks. But, the main reason to address this now is do_refine_retval_range() was assuming return values could not be negative. Once we fixed this code that was previously working will no longer work. See do_refine_retval_range() patch for details. And we don't want to suddenly cause programs that used to work to fail. The simplest example code snippet that illustrates the problem is likely this, 53: w8 = w0 // r8 <- [0, S32_MAX], // w8 <- [-S32_MIN, X] 54: w8 <s 0 // r8 <- [0, U32_MAX] // w8 <- [0, X] The expected 64-bit and 32-bit bounds after each line are shown on the right. The current issue is without the w* bounds we are forced to use the worst case bound of [0, U32_MAX]. To resolve this type of case, jmp32 creating divergent 32-bit bounds from 64-bit bounds, we add explicit 32-bit register bounds s32_{min|max}_value and u32_{min|max}_value. Then from branch_taken logic creating new bounds we can track 32-bit bounds explicitly. The next case we observed is ALU ops after the jmp32, 53: w8 = w0 // r8 <- [0, S32_MAX], // w8 <- [-S32_MIN, X] 54: w8 <s 0 // r8 <- [0, U32_MAX] // w8 <- [0, X] 55: w8 += 1 // r8 <- [0, U32_MAX+1] // w8 <- [0, X+1] In order to keep the bounds accurate at this point we also need to track ALU32 ops. To do this we add explicit ALU32 logic for each of the ALU ops, mov, add, sub, etc. Finally there is a question of how and when to merge bounds. The cases enumerate here, 1. MOV ALU32 - zext 32-bit -> 64-bit 2. MOV ALU64 - copy 64-bit -> 32-bit 3. op ALU32 - zext 32-bit -> 64-bit 4. op ALU64 - n/a 5. jmp ALU32 - 64-bit: var32_off | upper_32_bits(var64_off) 6. jmp ALU64 - 32-bit: (>> (<< var64_off)) Details for each case, For "MOV ALU32" BPF arch zero extends so we simply copy the bounds from 32-bit into 64-bit ensuring we truncate var_off and 64-bit bounds correctly. See zext_32_to_64. For "MOV ALU64" copy all bounds including 32-bit into new register. If the src register had 32-bit bounds the dst register will as well. For "op ALU32" zero extend 32-bit into 64-bit the same as move, see zext_32_to_64. For "op ALU64" calculate both 32-bit and 64-bit bounds no merging is done here. Except we have a special case. When RSH or ARSH is done we can't simply ignore shifting bits from 64-bit reg into the 32-bit subreg. So currently just push bounds from 64-bit into 32-bit. This will be correct in the sense that they will represent a valid state of the register. However we could lose some accuracy if an ARSH is following a jmp32 operation. We can handle this special case in a follow up series. For "jmp ALU32" mark 64-bit reg unknown and recalculate 64-bit bounds from tnum by setting var_off to ((<<(>>var_off)) | var32_off). We special case if 64-bit bounds has zero'd upper 32bits at which point we can simply copy 32-bit bounds into 64-bit register. This catches a common compiler trick where upper 32-bits are zeroed and then 32-bit ops are used followed by a 64-bit compare or 64-bit op on a pointer. See __reg_combine_64_into_32(). For "jmp ALU64" cast the bounds of the 64bit to their 32-bit counterpart. For example s32_min_value = (s32)reg->smin_value. For tnum use only the lower 32bits via, (>>(<<var_off)). See __reg_combine_64_into_32(). Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158560419880.10843.11448220440809118343.stgit@john-Precision-5820-Tower
2020-03-30Merge tag 'regulator-spi-v5.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc Pull spi and regulator updates from Mark Brown: "At one point in the release cycle I managed to fat finger things and apply some SPI fixes onto a regulator branch and merge that into the SPI tree, then pull in a change shared with the MTD tree moving the Mediatek quadspi driver over to become the Mediatek spi-nor driver in the SPI tree. This has made a mess which I only just noticed while preparing this and I can't see a sensible way to unpick things due to other subsequent merge commits especially the pull from MTD so it looks like the most sensible thing to do is give up and combine the two pull requests. Fortunately both subsystems were fairly quiet this cycle, the highlights are: regulator: - Support for Monoloithic Power Systems MP5416, MP8867 and MPS8869 and Qualcomm PMI8994 and SMB208. SPI: - Lots of enhancements for spi-fsl-dspi, including XSPI mode support, from Vladimir Oltean. - Support for amlogic Meson G12A, IBM FSI, Mediatek spi-nor (moved from MTD), NXP i.MX8Mx, Rockchip PX30, RK3308 and RK3328, and Qualcomm Atheros AR934x/QCA95xx" * tag 'regulator-spi-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc: (118 commits) spi: efm32: Convert to use GPIO descriptors regulator: qcom_smd: Add pmi8994 regulator support regulator: da9063: Fix get_mode() functions to read sleep field spi: spi-fsl-lpspi: Replace zero-length array with flexible-array member spi: spi-s3c24xx: Replace zero-length array with flexible-array member spi: stm32: Fix comments compilation warnings spi: atmel-quadspi: Add verbose debug facilities to monitor register accesses spi: spi-fsl-dspi: Add support for LS1028A spi: spi-fsl-dspi: Move invariant configs out of dspi_transfer_one_message spi: spi-fsl-dspi: Fix interrupt-less DMA mode taking an XSPI code path spi: spi-fsl-dspi: Avoid NULL pointer in dspi_slave_abort for non-DMA mode spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion spi: spi-fsl-dspi: Protect against races on dspi->words_in_flight spi: spi-fsl-dspi: Avoid reading more data than written in EOQ mode spi: spi-fsl-dspi: Fix bits-per-word acceleration in DMA mode spi: spi-fsl-dspi: Fix little endian access to PUSHR CMD and TXDATA spi: spi-fsl-dspi: Don't access reserved fields in SPI_MCR regulator: driver.h: fix regulator_map_* function names regulator: da9063: fix suspend spi: mxs: Drop GPIO includes ...
2020-03-30Merge tag 'regmap-v5.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regma: update from Mark Brown: "Only one small documentation fix for regmap this time around" * tag 'regmap-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: wrong descriptions in regmap_range_cfg
2020-03-30bpf: Verifier, do_refine_retval_range may clamp umin to 0 incorrectlyJohn Fastabend
do_refine_retval_range() is called to refine return values from specified helpers, probe_read_str and get_stack at the moment, the reasoning is because both have a max value as part of their input arguments and because the helper ensure the return value will not be larger than this we can set smax values of the return register, r0. However, the return value is a signed integer so setting umax is incorrect It leads to further confusion when the do_refine_retval_range() then calls, __reg_deduce_bounds() which will see a umax value as meaning the value is unsigned and then assuming it is unsigned set the smin = umin which in this case results in 'smin = 0' and an 'smax = X' where X is the input argument from the helper call. Here are the comments from _reg_deduce_bounds() on why this would be safe to do. /* Learn sign from unsigned bounds. Signed bounds cross the sign * boundary, so we must be careful. */ if ((s64)reg->umax_value >= 0) { /* Positive. We can't learn anything from the smin, but smax * is positive, hence safe. */ reg->smin_value = reg->umin_value; reg->smax_value = reg->umax_value = min_t(u64, reg->smax_value, reg->umax_value); But now we incorrectly have a return value with type int with the signed bounds (0,X). Suppose the return value is negative, which is possible the we have the verifier and reality out of sync. Among other things this may result in any error handling code being falsely detected as dead-code and removed. For instance the example below shows using bpf_probe_read_str() causes the error path to be identified as dead code and removed. >From the 'llvm-object -S' dump, r2 = 100 call 45 if r0 s< 0 goto +4 r4 = *(u32 *)(r7 + 0) But from dump xlate (b7) r2 = 100 (85) call bpf_probe_read_compat_str#-96768 (61) r4 = *(u32 *)(r7 +0) <-- dropped if goto Due to verifier state after call being R0=inv(id=0,umax_value=100,var_off=(0x0; 0x7f)) To fix omit setting the umax value because its not safe. The only actual bounds we know is the smax. This results in the correct bounds (SMIN, X) where X is the max length from the helper. After this the new verifier state looks like the following after call 45. R0=inv(id=0,smax_value=100) Then xlated version no longer removed dead code giving the expected result, (b7) r2 = 100 (85) call bpf_probe_read_compat_str#-96768 (c5) if r0 s< 0x0 goto pc+4 (61) r4 = *(u32 *)(r7 +0) Note, bpf_probe_read_* calls are root only so we wont hit this case with non-root bpf users. v3: comment had some documentation about meta set to null case which is not relevant here and confusing to include in the comment. v2 note: In original version we set msize_smax_value from check_func_arg() and propagated this into smax of retval. The logic was smax is the bound on the retval we set and because the type in the helper is ARG_CONST_SIZE we know that the reg is a positive tnum_const() so umax=smax. Alexei pointed out though this is a bit odd to read because the register in check_func_arg() has a C type of u32 and the umax bound would be the normally relavent bound here. Pulling in extra knowledge about future checks makes reading the code a bit tricky. Further having a signed meta data that can only ever be positive is also a bit odd. So dropped the msize_smax_value metadata and made it a u64 msize_max_value to indicate its unsigned. And additionally save bound from umax value in check_arg_funcs which is the same as smax due to as noted above tnumx_cont and negative check but reads better. By my analysis nothing functionally changes in v2 but it does get easier to read so that is win. Fixes: 849fa50662fbc ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158560417900.10843.14351995140624628941.stgit@john-Precision-5820-Tower
2020-03-30Merge tag 'staging-5.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging and IIO driver updates from Greg KH: "Here is the big staging and IIO driver pull request for 5.7-rc1. We again end up deleting more code than we added here, thanks to finally getting rid of the old and obsolete wireless USB stuff, and the exfat code (which is coming in again through the vfs tree in a much cleaner version). But some code does come back, with the octeon drivers being found to actually be used in the wild, so those deletions are now reverted. Other than those major things, just loads and loads of tiny checkpatch cleanups all over the place, along with new IIO drivers and fixes. All have been in linux-next with no reported issues" [ Stephen Rothwell points out some reported issues due to merge conflicts ] * tag 'staging-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (415 commits) staging: vt6656: Use DIV_ROUND_UP macro instead of specific code staging: remove hp100 driver staging: wilc1000: Use crc7 in lib/ rather than a private copy Staging: rtl8192u: ieee80211: Use netdev_alert(). Staging: rtl8192u: ieee80211: Use netdev_info() with network devices. Staging: rtl8192u: ieee80211: Use netdev_warn() for network devices. Staging: rtl8192u: ieee80211: Use netdev_dbg() for debug messages. staging: wlan-ng: fix use-after-free Read in hfa384x_usbin_callback staging: rtl8723bs: hal: Remove NULL check before kfree staging: rtl8723bs: hal: Correct typos in comments staging: rtl8723bs: os_dep: Correct typos in comments staging: rtl8723bs: core: Correct typos in comments staging: rtl8723bs: hal: Remove unnecessary cast on void pointer staging: rtl8188eu: cleanup long line in odm.c staging: rtl8723bs: hal: Compress return logic staging: rtl8723bs: rtw_cmd: Compress lines for immediate return staging: rtl8723bs: rtw_efuse: Compress lines for immediate return staging: wilc1000: remove label from examples in DT binding documentation staging: rtl8723bs: Remove blank line before '}' brace Staging: rtl8188eu: hal: Add space around operators ...
2020-03-30Merge tag 'driver-core-5.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the "big" set of driver core changes for 5.7-rc1. Nothing huge in here, just lots of little firmware core changes and use of new apis, a libfs fix, a debugfs api change, and some driver core deferred probe rework. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (44 commits) Revert "driver core: Set fw_devlink to "permissive" behavior by default" driver core: Set fw_devlink to "permissive" behavior by default driver core: Replace open-coded list_last_entry() driver core: Read atomic counter once in driver_probe_done() libfs: fix infoleak in simple_attr_read() driver core: Add device links from fwnode only for the primary device platform/x86: touchscreen_dmi: Add info for the Chuwi Vi8 Plus tablet platform/x86: touchscreen_dmi: Add EFI embedded firmware info support Input: icn8505 - Switch to firmware_request_platform for retreiving the fw Input: silead - Switch to firmware_request_platform for retreiving the fw selftests: firmware: Add firmware_request_platform tests test_firmware: add support for firmware_request_platform firmware: Add new platform fallback mechanism and firmware_request_platform() Revert "drivers: base: power: wakeup.c: Use built-in RCU list checking" drivers: base: power: wakeup.c: Use built-in RCU list checking component: allow missing unbind callback debugfs: remove return value of debugfs_create_file_size() debugfs: Check module state before warning in {full/open}_proxy_open() firmware: fix a double abort case with fw_load_sysfs_fallback arch_topology: Fix putting invalid cpu clk ...
2020-03-30bpf, lsm: Make BPF_LSM depend on BPF_EVENTSKP Singh
LSM and tracing programs share their helpers with bpf_tracing_func_proto which is only defined (in bpf_trace.c) when BPF_EVENTS is enabled. Instead of adding __weak symbol, make BPF_LSM depend on BPF_EVENTS so that both tracing and LSM programs can actually share helpers. Fixes: fc611f47f218 ("bpf: Introduce BPF_PROG_TYPE_LSM") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200330204059.13024-1-kpsingh@chromium.org
2020-03-30Merge tag 'usb-5.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB / PHY updates from Greg KH: "Here are the big set of USB and PHY driver patches for 5.7-rc1. Nothing huge here, some new PHY drivers, loads of USB gadget fixes and updates, xhci updates, usb-serial driver updates and new device ids, and other minor things. Full details in the shortlog. All have been in linux-next for a while with no reported issues" * tag 'usb-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (239 commits) USB: cdc-acm: restore capability check order usb: cdns3: make signed 1 bit bitfields unsigned usb: gadget: fsl: remove unused variable 'driver_desc' usb: gadget: f_fs: Fix use after free issue as part of queue failure usb: typec: Correct the documentation for typec_cable_put() USB: serial: io_edgeport: fix slab-out-of-bounds read in edge_interrupt_callback USB: serial: option: add Wistron Neweb D19Q1 USB: serial: option: add BroadMobi BM806U USB: serial: option: add support for ASKEY WWHC050 usb: core: Add ACPI support for USB interface devices driver core: platform: Reimplement devm_platform_ioremap_resource usb: dwc2: convert to devm_platform_get_and_ioremap_resource usb: host: hisilicon: convert to devm_platform_get_and_ioremap_resource usb: host: xhci-plat: convert to devm_platform_get_and_ioremap_resource drivers: provide devm_platform_get_and_ioremap_resource() phy: qcom-qusb2: Add new overriding tuning parameters in QUSB2 V2 PHY phy: qcom-qusb2: Add support for overriding tuning parameters in QUSB2 V2 PHY dt-bindings: phy: qcom-qusb2: Add support for overriding Phy tuning parameters phy: qcom-qusb2: Add generic QUSB2 V2 PHY support dt-bindings: phy: qcom,qusb2: Add compatibles for QUSB2 V2 phy and SC7180 ...
2020-03-30Merge branch 'bpf_sk_assign'Alexei Starovoitov
Joe Stringer says: ==================== Introduce a new helper that allows assigning a previously-found socket to the skb as the packet is received towards the stack, to cause the stack to guide the packet towards that socket subject to local routing configuration. The intention is to support TProxy use cases more directly from eBPF programs attached at TC ingress, to simplify and streamline Linux stack configuration in scale environments with Cilium. Normally in ip{,6}_rcv_core(), the skb will be orphaned, dropping any existing socket reference associated with the skb. Existing tproxy implementations in netfilter get around this restriction by running the tproxy logic after ip_rcv_core() in the PREROUTING table. However, this is not an option for TC-based logic (including eBPF programs attached at TC ingress). This series introduces the BPF helper bpf_sk_assign() to associate the socket with the skb on the ingress path as the packet is passed up the stack. The initial patch in the series simply takes a reference on the socket to ensure safety, but later patches relax this for listen sockets. To ensure delivery to the relevant socket, we still consult the routing table, for full examples of how to configure see the tests in patch #5; the simplest form of the route would look like this: $ ip route add local default dev lo This series is laid out as follows: * Patch 1 extends the eBPF API to add sk_assign() and defines a new socket free function to allow the later paths to understand when the socket associated with the skb should be kept through receive. * Patches 2-3 optimize the receive path to avoid taking a reference on listener sockets during receive. * Patches 4-5 extends the selftests with examples of the new functionality and validation of correct behaviour. Changes since v4: * Fix build with CONFIG_INET disabled * Rebase Changes since v3: * Use sock_gen_put() directly instead of sock_edemux() from sock_pfree() * Commit message wording fixups * Add acks from Martin, Lorenz * Rebase Changes since v2: * Add selftests for UDP socket redirection * Drop the early demux optimization patch (defer for more testing) * Fix check for orphaning after TC act return * Tidy up the tests to clean up properly and be less noisy. Changes since v1: * Replace the metadata_dst approach with using the skb->destructor to determine whether the socket has been prefetched. This is much simpler. * Avoid taking a reference on listener sockets during receive * Restrict assigning sockets across namespaces * Restrict assigning SO_REUSEPORT sockets * Fix cookie usage for socket dst check * Rebase the tests against test_progs infrastructure * Tidy up commit messages ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-03-30selftests: bpf: Extend sk_assign tests for UDPJoe Stringer
Add support for testing UDP sk_assign to the existing tests. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenz Bauer <lmb@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200329225342.16317-6-joe@wand.net.nz
2020-03-30selftests: bpf: Add test for sk_assignLorenz Bauer
Attach a tc direct-action classifier to lo in a fresh network namespace, and rewrite all connection attempts to localhost:4321 to localhost:1234 (for port tests) and connections to unreachable IPv4/IPv6 IPs to the local socket (for address tests). Includes implementations for both TCP and UDP. Keep in mind that both client to server and server to client traffic passes the classifier. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200329225342.16317-5-joe@wand.net.nz Co-authored-by: Joe Stringer <joe@wand.net.nz>
2020-03-30bpf: Don't refcount LISTEN sockets in sk_assign()Joe Stringer
Avoid taking a reference on listen sockets by checking the socket type in the sk_assign and in the corresponding skb_steal_sock() code in the the transport layer, and by ensuring that the prefetch free (sock_pfree) function uses the same logic to check whether the socket is refcounted. Suggested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200329225342.16317-4-joe@wand.net.nz
2020-03-30net: Track socket refcounts in skb_steal_sock()Joe Stringer
Refactor the UDP/TCP handlers slightly to allow skb_steal_sock() to make the determination of whether the socket is reference counted in the case where it is prefetched by earlier logic such as early_demux. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200329225342.16317-3-joe@wand.net.nz
2020-03-30bpf: Add socket assign supportJoe Stringer
Add support for TPROXY via a new bpf helper, bpf_sk_assign(). This helper requires the BPF program to discover the socket via a call to bpf_sk*_lookup_*(), then pass this socket to the new helper. The helper takes its own reference to the socket in addition to any existing reference that may or may not currently be obtained for the duration of BPF processing. For the destination socket to receive the traffic, the traffic must be routed towards that socket via local route. The simplest example route is below, but in practice you may want to route traffic more narrowly (eg by CIDR): $ ip route add local default dev lo This patch avoids trying to introduce an extra bit into the skb->sk, as that would require more invasive changes to all code interacting with the socket to ensure that the bit is handled correctly, such as all error-handling cases along the path from the helper in BPF through to the orphan path in the input. Instead, we opt to use the destructor variable to switch on the prefetch of the socket. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200329225342.16317-2-joe@wand.net.nz
2020-03-30Merge tag 'media/v5.7-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - New sensor driver: imx219 - Support for some new pixelformats - Support for Sun8i SoC - Added more codecs to meson vdec driver - Prepare for removing the legacy usbvision driver by moving it to staging. This driver has issues and use legacy core APIs. If nobody steps up to address those, it is time for its retirement. - Several cleanups and improvements on drivers, with the addition of new supported boards * tag 'media/v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (236 commits) media: venus: firmware: Ignore secure call error on first resume media: mtk-vpu: load vpu firmware from the new location media: i2c: video-i2c: fix build errors due to 'imply hwmon' media: MAINTAINERS: add myself to co-maintain Hantro G1/G2 for i.MX8MQ media: hantro: add initial i.MX8MQ support media: dt-bindings: Document i.MX8MQ VPU bindings media: vivid: fix incorrect PA assignment to HDMI outputs media: hantro: Add linux-rockchip mailing list to MAINTAINERS media: cedrus: h264: Fix 4K decoding on H6 media: siano: Use scnprintf() for avoiding potential buffer overflow media: rc: Use scnprintf() for avoiding potential buffer overflow media: allegro: create new struct for channel parameters media: allegro: move mail definitions to separate file media: allegro: pass buffers through firmware media: allegro: verify source and destination buffer in VCU response media: allegro: handle dependency of bitrate and bitrate_peak media: allegro: read bitrate mode directly from control media: allegro: make QP configurable media: allegro: make frame rate configurable media: allegro: skip filler data if possible ...
2020-03-30bpf, doc: Add John as official reviewer to BPF subsystemDaniel Borkmann
We've added John Fastabend to our weekly BPF patch review rotation over last months now where he provided excellent and timely feedback on BPF patches. Therefore, add him to the BPF core reviewer team to the MAINTAINERS file to reflect that. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/0e9a74933b3f21f4c5b5a3bc7f8e900b39805639.1585556231.git.daniel@iogearbox.net
2020-03-30Merge tag 'hwmon-for-v5.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: - New driver for AXI fan control - Attenuator bypass support and support for inverting pwm output in adt7475 driver - Support for new power supply version in ibm-cffps driver - PMBus drivers: * support for multi-phase chips * ltc2978 driver: add support for LTC2972, LTC2979, LTC3884, LTC3889, LTC7880, LTM4664, LTM4677, LTM4678, LTM4680, and LTM4700/ * tps53679 driver: add support for TPS53681, TPS53647, and TPS53667 * isl68137 driver: support for various 2nd Gen Renesas digital multiphase chips added to isl68137 driver - Minor improvements and fixes in nct7904, ibmpowernv, lm73, ibmaem, and k10temp drivers * tag 'hwmon-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (29 commits) docs: hwmon: Update documentation for isl68137 pmbus driver hwmon: (pmbus) add support for 2nd Gen Renesas digital multiphase hwmon: (pmbus/ibm-cffps) Add another PSU CCIN to version detection hwmon: (nct7904) Fix the incorrect quantity for fan & temp attributes hwmon: (ibmpowernv) Use scnprintf() for avoiding potential buffer overflow hwmon: (adt7475) Add support for inverting pwm output hwmon: (adt7475) Add attenuator bypass support dt-bindings: hwmon: Document adt7475 pwm-active-state property dt-bindings: hwmon: Document adt7475 bypass-attenuator property dt-bindings: hwmon: Document adt7475 binding hwmon: (lm73) Add support for of_match_table dt-bindings: Add TI LM73 as a trivial device hwmon: (pmbus/tps53679) Add documentation hwmon: (pmbus/tps53679) Add support for TPS53647 and TPS53667 hwmon: (pmbus/tps53679) Add support for TPS53681 hwmon: (pmbus/tps53679) Add support for IIN and PIN to TPS53679 and TPS53688 hwmon: (pmbus/tps53679) Add support for multiple chips IDs hwmon: (pmbus) Implement multi-phase support hwmon: (pmbus) Add 'phase' parameter where needed for multi-phase support hwmon: (pmbus) Add IC_DEVICE_ID and IC_DEVICE_REV command definitions ...
2020-03-30bpf: btf: Fix arg verification in btf_ctx_access()KP Singh
The bounds checking for the arguments accessed in the BPF program breaks when the expected_attach_type is not BPF_TRACE_FEXIT, BPF_LSM_MAC or BPF_MODIFY_RETURN resulting in no check being done for the default case (the programs which do not receive the return value of the attached function in its arguments) when the index of the argument being accessed is equal to the number of arguments (nr_args). This was a result of a misplaced "else if" block introduced by the Commit 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN") Fixes: 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200330144246.338-1-kpsingh@chromium.org
2020-03-30Merge tag 'ras_updates_for_5.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Do not report spurious MCEs on some Intel platforms caused by errata; by Prarit Bhargava. - Change dev-mcelog's hardcoded limit of 32 error records to a dynamic one, controlled by the number of logical CPUs, by Tony Luck. - Add support for the processor identification number (PPIN) on AMD, by Wei Huang. * tag 'ras_updates_for_5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/amd: Add PPIN support for AMD MCE x86/mce/dev-mcelog: Dynamically allocate space for machine check records x86/mce: Do not log spurious corrected mce errors
2020-03-30Merge tag 'edac_updates_for_5.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - A substantial edac_mc cleanup, sanitizing object freeing, streamlining and simplifying code flow, and getting rid of a lot of needless complexity in memory controller representation code, by Robert Richter. - A new EDAC driver for the ARM DMC-520 memory controller, by Lei Wang, Shiping Ji and others. - The usual sprinkling of misc cleanups and fixes all over the subsystem. * tag 'edac_updates_for_5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/armada_xp: Use scnprintf() for avoiding potential buffer overflow EDAC/synopsys: Do not dump uninitialized pinf->col EDAC: Add EDAC driver for DMC520 dt-bindings: edac: Dmc-520.yaml EDAC/mce_amd: Print !SMCA processor warning only once EDAC/mc: Remove per layer counters EDAC/mc: Remove detail[] string and cleanup error string generation EDAC/mc: Pass the error descriptor to error reporting functions EDAC/mc: Remove enable_per_layer_report function argument EDAC/mc: Report "unknown memory" on too many DIMM labels found EDAC/mc: Carve out error increment into a separate function EDAC/mc: Determine mci pointer from the error descriptor EDAC: Store error type in struct edac_raw_error_desc EDAC/mc: Reorder functions edac_mc_alloc*() EDAC/mc: Split edac_mc_alloc() into smaller functions EDAC/mc: Change mci device removal to use put_device()
2020-03-30Merge tag 'pstore-v5.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: "These mostly some minor cleanups and a bug fix for an ftrace corner case: - Improve failure paths (chenqiwu) - Fix ftrace position index (Vasily Averin) - Use proper flexible-array member (Gustavo A. R. Silva)" * tag 'pstore-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ram: Replace zero-length array with flexible-array member pstore: pstore_ftrace_seq_next should increase position index pstore/ram: remove unnecessary ramoops_unregister_dummy() pstore/platform: fix potential mem leak if pstore_init_fs failed
2020-03-30Merge tag 'seccomp-v5.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "A couple of seccomp updates. They're both mostly bug fixes that I wanted to have sit in linux-next for a while: - allow TSYNC and USER_NOTIF together (Tycho Andersen) - add missing compat_ioctl for notify (Sven Schnelle)" * tag 'seccomp-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: Add missing compat_ioctl for notify seccomp: allow TSYNC and USER_NOTIF together
2020-03-30Merge tag 'erofs-for-5.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "Updates with a XArray adaptation, several fixes for shrinker and corrupted images are ready for this cycle. All commits have been stress tested with no noticeable smoke out and have been in linux-next as well. Summary: - Convert radix tree usage to XArray - Fix shrink scan count on multiple filesystem instances - Better handling for specific corrupted images - Update my email address in MAINTAINERS" * tag 'erofs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: MAINTAINERS: erofs: update my email address erofs: handle corrupted images whose decompressed size less than it'd be erofs: use LZ4_decompress_safe() for full decoding erofs: correct the remaining shrink objects erofs: convert workstn to XArray
2020-03-30Merge tag 'docs-5.7' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "This has been a busy cycle for documentation work. Highlights include: - Lots of RST conversion work by Mauro, Daniel ALmeida, and others. Maybe someday we'll get to the end of this stuff...maybe... - Some organizational work to bring some order to the core-api manual. - Various new docs and additions to the existing documentation. - Typo fixes, warning fixes, ..." * tag 'docs-5.7' of git://git.lwn.net/linux: (123 commits) Documentation: x86: exception-tables: document CONFIG_BUILDTIME_TABLE_SORT MAINTAINERS: adjust to filesystem doc ReST conversion docs: deprecated.rst: Add BUG()-family doc: zh_CN: add translation for virtiofs doc: zh_CN: index files in filesystems subdirectory docs: locking: Drop :c:func: throughout docs: locking: Add 'need' to hardirq section docs: conf.py: avoid thousands of duplicate label warning on Sphinx docs: prevent warnings due to autosectionlabel docs: fix reference to core-api/namespaces.rst docs: fix pointers to io-mapping.rst and io_ordering.rst files Documentation: Better document the softlockup_panic sysctl docs: hw-vuln: tsx_async_abort.rst: get rid of an unused ref docs: perf: imx-ddr.rst: get rid of a warning docs: filesystems: fuse.rst: supress a Sphinx warning docs: translations: it: avoid duplicate refs at programming-language.rst docs: driver.rst: supress two ReSt warnings docs: trace: events.rst: convert some new stuff to ReST format Documentation: Add io_ordering.rst to driver-api manual Documentation: Add io-mapping.rst to driver-api manual ...
2020-03-30Merge tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: "Here are the io_uring changes for this merge window. Light on new features this time around (just splice + buffer selection), lots of cleanups, fixes, and improvements to existing support. In particular, this contains: - Cleanup fixed file update handling for stack fallback (Hillf) - Re-work of how pollable async IO is handled, we no longer require thread offload to handle that. Instead we rely using poll to drive this, with task_work execution. - In conjunction with the above, allow expendable buffer selection, so that poll+recv (for example) no longer has to be a split operation. - Make sure we honor RLIMIT_FSIZE for buffered writes - Add support for splice (Pavel) - Linked work inheritance fixes and optimizations (Pavel) - Async work fixes and cleanups (Pavel) - Improve io-wq locking (Pavel) - Hashed link write improvements (Pavel) - SETUP_IOPOLL|SETUP_SQPOLL improvements (Xiaoguang)" * tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-block: (54 commits) io_uring: cleanup io_alloc_async_ctx() io_uring: fix missing 'return' in comment io-wq: handle hashed writes in chains io-uring: drop 'free_pfile' in struct io_file_put io-uring: drop completion when removing file io_uring: Fix ->data corruption on re-enqueue io-wq: close cancel gap for hashed linked work io_uring: make spdxcheck.py happy io_uring: honor original task RLIMIT_FSIZE io-wq: hash dependent work io-wq: split hashing and enqueueing io-wq: don't resched if there is no work io-wq: remove duplicated cancel code io_uring: fix truncated async read/readv and write/writev retry io_uring: dual license io_uring.h uapi header io_uring: io_uring_enter(2) don't poll while SETUP_IOPOLL|SETUP_SQPOLL enabled io_uring: Fix unused function warnings io_uring: add end-of-bits marker and build time verify it io_uring: provide means of removing buffers io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_RECVMSG ...
2020-03-30ipvs: fix uninitialized variable warningHaishuang Yan
If outer_proto is not set, GCC warning as following: In file included from net/netfilter/ipvs/ip_vs_core.c:52: net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_in_icmp': include/net/ip_vs.h:233:4: warning: 'outer_proto' may be used uninitialized in this function [-Wmaybe-uninitialized] 233 | printk(KERN_DEBUG pr_fmt(msg), ##__VA_ARGS__); \ | ^~~~~~ net/netfilter/ipvs/ip_vs_core.c:1666:8: note: 'outer_proto' was declared here 1666 | char *outer_proto; | ^~~~~~~~~~~ Fixes: 73348fed35d0 ("ipvs: optimize tunnel dumps for icmp errors") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: nft_exthdr: fix endianness of tcp option castSergey Marinkevich
I got a problem on MIPS with Big-Endian is turned on: every time when NF trying to change TCP MSS it returns because of new.v16 was greater than old.v16. But real MSS was 1460 and my rule was like this: add rule table chain tcp option maxseg size set 1400 And 1400 is lesser that 1460, not greater. Later I founded that main causer is cast from u32 to __be16. Debugging: In example MSS = 1400(HEX: 0x578). Here is representation of each byte like it is in memory by addresses from left to right(e.g. [0x0 0x1 0x2 0x3]). LE — Little-Endian system, BE — Big-Endian, left column is type. LE BE u32: [78 05 00 00] [00 00 05 78] As you can see, u32 representation will be casted to u16 from different half of 4-byte address range. But actually nf_tables uses registers and store data of various size. Actually TCP MSS stored in 2 bytes. But registers are still u32 in definition: struct nft_regs { union { u32 data[20]; struct nft_verdict verdict; }; }; So, access like regs->data[priv->sreg] exactly u32. So, according to table presents above, per-byte representation of stored TCP MSS in register will be: LE BE (u32)regs->data[]: [78 05 00 00] [05 78 00 00] ^^ ^^ We see that register uses just half of u32 and other 2 bytes may be used for some another data. But in nft_exthdr_tcp_set_eval() it casted just like u32 -> __be16: new.v16 = src But u32 overfill __be16, so it get 2 low bytes. For clarity draw one more table(<xx xx> means that bytes will be used for cast). LE BE u32: [<78 05> 00 00] [00 00 <05 78>] (u32)regs->data[]: [<78 05> 00 00] [05 78 <00 00>] As you can see, for Little-Endian nothing changes, but for Big-endian we take the wrong half. In my case there is some other data instead of zeros, so new MSS was wrongly greater. For shooting this bug I used solution for ports ranges. Applying of this patch does not affect Little-Endian systems. Signed-off-by: Sergey Marinkevich <sergey.marinkevich@eltex-co.ru> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30bpf: Simplify reg_set_min_max_inv handlingJann Horn
reg_set_min_max_inv() contains exactly the same logic as reg_set_min_max(), just flipped around. While this makes sense in a cBPF verifier (where ALU operations are not symmetric), it does not make sense for eBPF. Replace reg_set_min_max_inv() with a helper that flips the opcode around, then lets reg_set_min_max() do the complicated work. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200330160324.15259-4-daniel@iogearbox.net
2020-03-30bpf: Fix tnum constraints for 32-bit comparisonsJann Horn
The BPF verifier tried to track values based on 32-bit comparisons by (ab)using the tnum state via 581738a681b6 ("bpf: Provide better register bounds after jmp32 instructions"). The idea is that after a check like this: if ((u32)r0 > 3) exit We can't meaningfully constrain the arithmetic-range-based tracking, but we can update the tnum state to (value=0,mask=0xffff'ffff'0000'0003). However, the implementation from 581738a681b6 didn't compute the tnum constraint based on the fixed operand, but instead derives it from the arithmetic-range-based tracking. This means that after the following sequence of operations: if (r0 >= 0x1'0000'0001) exit if ((u32)r0 > 7) exit The verifier assumed that the lower half of r0 is in the range (0, 0) and apply the tnum constraint (value=0,mask=0xffff'ffff'0000'0000) thus causing the overall tnum to be (value=0,mask=0x1'0000'0000), which was incorrect. Provide a fixed implementation. Fixes: 581738a681b6 ("bpf: Provide better register bounds after jmp32 instructions") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200330160324.15259-3-daniel@iogearbox.net
2020-03-30bpf: Undo incorrect __reg_bound_offset32 handlingDaniel Borkmann
Anatoly has been fuzzing with kBdysch harness and reported a hang in one of the outcomes: 0: (b7) r0 = 808464432 1: (7f) r0 >>= r0 2: (14) w0 -= 808464432 3: (07) r0 += 808464432 4: (b7) r1 = 808464432 5: (de) if w1 s<= w0 goto pc+0 R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x30303020;0x10000001f)) R1_w=invP808464432 R10=fp0 6: (07) r0 += -2144337872 7: (14) w0 -= -1607454672 8: (25) if r0 > 0x30303030 goto pc+0 R0_w=invP(id=0,umin_value=271581184,umax_value=271581311,var_off=(0x10300000;0x7f)) R1_w=invP808464432 R10=fp0 9: (76) if w0 s>= 0x303030 goto pc+2 12: (95) exit from 8 to 9: safe from 5 to 6: R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x30303020;0x10000001f)) R1_w=invP808464432 R10=fp0 6: (07) r0 += -2144337872 7: (14) w0 -= -1607454672 8: (25) if r0 > 0x30303030 goto pc+0 R0_w=invP(id=0,umin_value=271581184,umax_value=271581311,var_off=(0x10300000;0x7f)) R1_w=invP808464432 R10=fp0 9: safe from 8 to 9: safe verification time 589 usec stack depth 0 processed 17 insns (limit 1000000) [...] The underlying program was xlated as follows: # bpftool p d x i 9 0: (b7) r0 = 808464432 1: (7f) r0 >>= r0 2: (14) w0 -= 808464432 3: (07) r0 += 808464432 4: (b7) r1 = 808464432 5: (de) if w1 s<= w0 goto pc+0 6: (07) r0 += -2144337872 7: (14) w0 -= -1607454672 8: (25) if r0 > 0x30303030 goto pc+0 9: (76) if w0 s>= 0x303030 goto pc+2 10: (05) goto pc-1 11: (05) goto pc-1 12: (95) exit The verifier rewrote original instructions it recognized as dead code with 'goto pc-1', but reality differs from verifier simulation in that we're actually able to trigger a hang due to hitting the 'goto pc-1' instructions. Taking different examples to make the issue more obvious: in this example we're probing bounds on a completely unknown scalar variable in r1: [...] 5: R0_w=inv1 R1_w=inv(id=0) R10=fp0 5: (18) r2 = 0x4000000000 7: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R10=fp0 7: (18) r3 = 0x2000000000 9: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R3_w=inv137438953472 R10=fp0 9: (18) r4 = 0x400 11: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R10=fp0 11: (18) r5 = 0x200 13: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0 13: (2d) if r1 > r2 goto pc+4 R0_w=inv1 R1_w=inv(id=0,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0 14: R0_w=inv1 R1_w=inv(id=0,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0 14: (ad) if r1 < r3 goto pc+3 R0_w=inv1 R1_w=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0 15: R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0 15: (2e) if w1 > w4 goto pc+2 R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7f00000000)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0 16: R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7f00000000)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0 16: (ae) if w1 < w5 goto pc+1 R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7f00000000)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0 [...] We're first probing lower/upper bounds via jmp64, later we do a similar check via jmp32 and examine the resulting var_off there. After fall-through in insn 14, we get the following bounded r1 with 0x7fffffffff unknown marked bits in the variable section. Thus, after knowing r1 <= 0x4000000000 and r1 >= 0x2000000000: max: 0b100000000000000000000000000000000000000 / 0x4000000000 var: 0b111111111111111111111111111111111111111 / 0x7fffffffff min: 0b010000000000000000000000000000000000000 / 0x2000000000 Now, in insn 15 and 16, we perform a similar probe with lower/upper bounds in jmp32. Thus, after knowing r1 <= 0x4000000000 and r1 >= 0x2000000000 and w1 <= 0x400 and w1 >= 0x200: max: 0b100000000000000000000000000000000000000 / 0x4000000000 var: 0b111111100000000000000000000000000000000 / 0x7f00000000 min: 0b010000000000000000000000000000000000000 / 0x2000000000 The lower/upper bounds haven't changed since they have high bits set in u64 space and the jmp32 tests can only refine bounds in the low bits. However, for the var part the expectation would have been 0x7f000007ff or something less precise up to 0x7fffffffff. A outcome of 0x7f00000000 is not correct since it would contradict the earlier probed bounds where we know that the result should have been in [0x200,0x400] in u32 space. Therefore, tests with such info will lead to wrong verifier assumptions later on like falsely predicting conditional jumps to be always taken, etc. The issue here is that __reg_bound_offset32()'s implementation from commit 581738a681b6 ("bpf: Provide better register bounds after jmp32 instructions") makes an incorrect range assumption: static void __reg_bound_offset32(struct bpf_reg_state *reg) { u64 mask = 0xffffFFFF; struct tnum range = tnum_range(reg->umin_value & mask, reg->umax_value & mask); struct tnum lo32 = tnum_cast(reg->var_off, 4); struct tnum hi32 = tnum_lshift(tnum_rshift(reg->var_off, 32), 32); reg->var_off = tnum_or(hi32, tnum_intersect(lo32, range)); } In the above walk-through example, __reg_bound_offset32() as-is chose a range after masking with 0xffffffff of [0x0,0x0] since umin:0x2000000000 and umax:0x4000000000 and therefore the lo32 part was clamped to 0x0 as well. However, in the umin:0x2000000000 and umax:0x4000000000 range above we'd end up with an actual possible interval of [0x0,0xffffffff] for u32 space instead. In case of the original reproducer, the situation looked as follows at insn 5 for r0: [...] 5: R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x0; 0x1ffffffff)) R1_w=invP808464432 R10=fp0 0x30303030 0x13030302f 5: (de) if w1 s<= w0 goto pc+0 R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x30303020; 0x10000001f)) R1_w=invP808464432 R10=fp0 0x30303030 0x13030302f [...] After the fall-through, we similarly forced the var_off result into the wrong range [0x30303030,0x3030302f] suggesting later on that fixed bits must only be of 0x30303020 with 0x10000001f unknowns whereas such assumption can only be made when both bounds in hi32 range match. Originally, I was thinking to fix this by moving reg into a temp reg and use proper coerce_reg_to_size() helper on the temp reg where we can then based on that define the range tnum for later intersection: static void __reg_bound_offset32(struct bpf_reg_state *reg) { struct bpf_reg_state tmp = *reg; struct tnum lo32, hi32, range; coerce_reg_to_size(&tmp, 4); range = tnum_range(tmp.umin_value, tmp.umax_value); lo32 = tnum_cast(reg->var_off, 4); hi32 = tnum_lshift(tnum_rshift(reg->var_off, 32), 32); reg->var_off = tnum_or(hi32, tnum_intersect(lo32, range)); } In the case of the concrete example, this gives us a more conservative unknown section. Thus, after knowing r1 <= 0x4000000000 and r1 >= 0x2000000000 and w1 <= 0x400 and w1 >= 0x200: max: 0b100000000000000000000000000000000000000 / 0x4000000000 var: 0b111111111111111111111111111111111111111 / 0x7fffffffff min: 0b010000000000000000000000000000000000000 / 0x2000000000 However, above new __reg_bound_offset32() has no effect on refining the knowledge of the register contents. Meaning, if the bounds in hi32 range mismatch we'll get the identity function given the range reg spans [0x0,0xffffffff] and we cast var_off into lo32 only to later on binary or it again with the hi32. Likewise, if the bounds in hi32 range match, then we mask both bounds with 0xffffffff, use the resulting umin/umax for the range to later intersect the lo32 with it. However, _prior_ called __reg_bound_offset() did already such intersection on the full reg and we therefore would only repeat the same operation on the lo32 part twice. Given this has no effect and the original commit had false assumptions, this patch reverts the code entirely which is also more straight forward for stable trees: apparently 581738a681b6 got auto-selected by Sasha's ML system and misclassified as a fix, so it got sucked into v5.4 where it should never have landed. A revert is low-risk also from a user PoV since it requires a recent kernel and llc to opt-into -mcpu=v3 BPF CPU to generate jmp32 instructions. A proper bounds refinement would need a significantly more complex approach which is currently being worked, but no stable material [0]. Hence revert is best option for stable. After the revert, the original reported program gets rejected as follows: 1: (7f) r0 >>= r0 2: (14) w0 -= 808464432 3: (07) r0 += 808464432 4: (b7) r1 = 808464432 5: (de) if w1 s<= w0 goto pc+0 R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x0; 0x1ffffffff)) R1_w=invP808464432 R10=fp0 6: (07) r0 += -2144337872 7: (14) w0 -= -1607454672 8: (25) if r0 > 0x30303030 goto pc+0 R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x3fffffff)) R1_w=invP808464432 R10=fp0 9: (76) if w0 s>= 0x303030 goto pc+2 R0=invP(id=0,umax_value=3158063,var_off=(0x0; 0x3fffff)) R1=invP808464432 R10=fp0 10: (30) r0 = *(u8 *)skb[808464432] BPF_LD_[ABS|IND] uses reserved fields processed 11 insns (limit 1000000) [...] [0] https://lore.kernel.org/bpf/158507130343.15666.8018068546764556975.stgit@john-Precision-5820-Tower/T/ Fixes: 581738a681b6 ("bpf: Provide better register bounds after jmp32 instructions") Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200330160324.15259-2-daniel@iogearbox.net
2020-03-30Merge branch 'split-phylink-PCS-operations'David S. Miller
Russell King says: ==================== split phylink PCS operations This series splits the phylink_mac_ops structure so that PCS can be supported separately with their own PCS operations, separating them from the MAC layer. This may need adaption later as more users come along. v2: change pcs_config() and associated called function prototypes to only pass the information that is required, and add some documention. v3: change phylink_create() prototype ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: phylink: add separate pcs operations structureRussell King
Add a separate set of PCS operations, which MAC drivers can use to couple phylink with their associated MAC PCS layer. The PCS operations include: - pcs_get_state() - reads the link up/down, resolved speed, duplex and pause from the PCS. - pcs_config() - configures the PCS for the specified mode, PHY interface type, and setting the advertisement. - pcs_an_restart() - restarts 802.3 in-band negotiation with the link partner - pcs_link_up() - informs the PCS that link has come up, and the parameters of the link. Link parameters are used to program the PCS for fixed speed and non-inband modes. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: phylink: rename 'ops' to 'mac_ops'Russell King
Rename the bland 'ops' member of struct phylink to be a more descriptive 'mac_ops' - this is necessary as we're about to introduce another set of operations. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: phylink: change phylink_mii_c22_pcs_set_advertisement() prototypeRussell King
Change phylink_mii_c22_pcs_set_advertisement() to take only the PHY interface and advertisement mask, rather than the full phylink state. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30r8169: factor out rtl8169_tx_mapHeiner Kallweit
Factor out mapping the tx skb to a new function rtl8169_tx_map(). This allows to remove redundancies, and rtl8169_get_txd_opts1() has only one user left, so it can be inlined. As a result rtl8169_xmit_frags() is significantly simplified, and in rtl8169_start_xmit() the code is simplified and better readable. No functional change intended. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2020-03-29 Here are a few more Bluetooth patches for the 5.7 kernel: - Fix assumption of encryption key size when reading fails - Add support for DEFER_SETUP with L2CAP Enhanced Credit Based Mode - Fix issue with auto-connected devices - Fix suspend handling when entering the state fails ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30qed: Fix use after free in qed_chain_freeYuval Basson
The qed_chain data structure was modified in commit 1a4a69751f4d ("qed: Chain support for external PBL") to support receiving an external pbl (due to iWARP FW requirements). The pages pointed to by the pbl are allocated in qed_chain_alloc and their virtual address are stored in an virtual addresses array to enable accessing and freeing the data. The physical addresses however weren't stored and were accessed directly from the external-pbl during free. Destroy-qp flow, leads to freeing the external pbl before the chain is freed, when the chain is freed it tries accessing the already freed external pbl, leading to a use-after-free. Therefore we need to store the physical addresses in additional to the virtual addresses in a new data structure. Fixes: 1a4a69751f4d ("qed: Chain support for external PBL") Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: Yuval Bason <ybason@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30r8169: improve handling of TD_MSS_MAXHeiner Kallweit
If the mtu is greater than TD_MSS_MAX, then TSO is disabled, see rtl8169_fix_features(). Because mss is less than mtu, we can't have the case mss > TD_MSS_MAX in the TSO path. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30Merge branch 'Port-and-flow-policers-for-DSA'David S. Miller
Vladimir Oltean says: ==================== Port and flow policers for DSA (SJA1105, Felix/Ocelot) This series adds support for 2 types of policers: - port policers, via tc matchall filter - flow policers, via tc flower filter for 2 DSA drivers: - sja1105 - felix/ocelot First we start with ocelot/felix. Prior to this patch, the ocelot core library currently only supported: - Port policers - Flow-based dropping and trapping But the felix wrapper could not actually use the port policers due to missing linkage and support in the DSA core. So one of the patches addresses exactly that limitation by adding the missing support to the DSA core. The other patch for felix flow policers (via the VCAP IS2 engine) is actually in the ocelot library itself, since the linkage with the ocelot flower classifier has already been done in an earlier patch set. Then with the newly added .port_policer_add and .port_policer_del, we can also start supporting the L2 policers on sja1105. Then, for full functionality of these L2 policers on sja1105, we also implement a more limited set of flow-based policing keys for this switch, namely for broadcast and VLAN PCP. Series version 1 was submitted here: https://patchwork.ozlabs.org/cover/1263353/ Nothing functional changed in v2, only a rebase. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: dsa: sja1105: add broadcast and per-traffic class policersVladimir Oltean
This patch adds complete support for manipulating the L2 Policing Tables from this switch. There are 45 table entries, one entry per each port and traffic class, and one dedicated entry for broadcast traffic for each ingress port. Policing entries are shareable, and we use this functionality to support shared block filters. We are modeling broadcast policers as simple tc-flower matches on dst_mac. As for the traffic class policers, the switch only deduces the traffic class from the VLAN PCP field, so it makes sense to model this as a tc-flower match on vlan_prio. How to limit broadcast traffic coming from all front-panel ports to a cumulated total of 10 Mbit/s: tc qdisc add dev sw0p0 ingress_block 1 clsact tc qdisc add dev sw0p1 ingress_block 1 clsact tc qdisc add dev sw0p2 ingress_block 1 clsact tc qdisc add dev sw0p3 ingress_block 1 clsact tc filter add block 1 flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \ action police rate 10mbit burst 64k How to limit traffic with VLAN PCP 0 (also includes untagged traffic) to 100 Mbit/s on port 0 only: tc filter add dev sw0p0 ingress protocol 802.1Q flower skip_sw \ vlan_prio 0 action police rate 100mbit burst 64k The broadcast, VLAN PCP and port policers are compatible with one another (can be installed at the same time on a port). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: dsa: sja1105: add configuration of port policersVladimir Oltean
This adds partial configuration support for the L2 Policing Table. Out of the 45 policing entries, only 5 are used (one for each port), in a shared manner. All 8 traffic classes, and the broadcast policer, are redirected to a common instance which belongs to the ingress port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: dsa: felix: add port policersVladimir Oltean
This patch is a trivial passthrough towards the ocelot library, which support port policers since commit 2c1d029a017f ("net: mscc: ocelot: Implement port policers via tc command"). Some data structure conversion between the DSA core and the Ocelot library is necessary, for policer parameters. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: dsa: add port policersVladimir Oltean
The approach taken to pass the port policer methods on to drivers is pragmatic. It is similar to the port mirroring implementation (in that the DSA core does all of the filter block interaction and only passes simple operations for the driver to implement) and dissimilar to how flow-based policers are going to be implemented (where the driver has full control over the flow_cls_offload data structure). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: dsa: refactor matchall mirred action to separate functionVladimir Oltean
Make room for other actions for the matchall filter by keeping the mirred argument parsing self-contained in its own function. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: mscc: ocelot: add action of police on vcap_is2Xiaoliang Yang
Ocelot has 384 policers that can be allocated to ingress ports, QoS classes per port, and VCAP IS2 entries. ocelot_police.c supports to set policers which can be allocated to police action of VCAP IS2. We allocate policers from maximum pol_id, and decrease the pol_id when add a new vcap_is2 entry which is police action. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30Merge tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: - floppy driver cleanup series from Willy - NVMe updates and fixes (Various) - null_blk trace improvements (Chaitanya) - bcache fixes (Coly) - md fixes (via Song) - loop block size change optimizations (Martijn) - scnprintf() use (Takashi) * tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits) null_blk: add trace in null_blk_zoned.c null_blk: add tracepoint helpers for zoned mode block: add a zone condition debug helper nvme: cleanup namespace identifier reporting in nvme_init_ns_head nvme: rename __nvme_find_ns_head to nvme_find_ns_head nvme: refactor nvme_identify_ns_descs error handling nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl nvme: Fix controller creation races with teardown flow nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl nvme: Fix ctrl use-after-free during sysfs deletion nvme-pci: Re-order nvme_pci_free_ctrl nvme: Remove unused return code from nvme_delete_ctrl_sync nvme: Use nvme_state_terminal helper nvme: release ida resources nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO nvmet-tcp: optimize tcp stack TX when data digest is used nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow nvme-multipath: do not reset on unknown status nvmet-rdma: allocate RW ctxs according to mdts ...
2020-03-30Merge branch 'ionic-support-for-firmware-upgrade'David S. Miller
Shannon Nelson says: ==================== ionic support for firmware upgrade The Pensando Distributed Services Card can get firmware upgrades from the off-host centralized management suite, and can be upgraded without a host reboot or driver reload. This patchset sets up the support for fw upgrade in the Linux driver. When the upgrade begins, the DSC first brings the link down, then stops the firmware. The driver will notice this and quiesce itself by stopping the queues and releasing DMA resources, then monitoring for firmware to start back up. When the upgrade is finished the firmware is restarted and link is brought up, and the driver rebuilds the queues and restarts traffic flow. First we separate the Link state from the netdev state, then reorganize a few things to prepare for partial tear-down of the queues. Next we fix up the state machine so that we take the Tx and Rx queues down and back up when we get LINK_DOWN and LINK_UP events. Lastly, we add handling of the FW reset itself by tearing down the lif internals and rebuilding them with the new FW setup. v2: This changes the design from (ab)using the full .ndo_stop and .ndo_open routines to getting a better separation between the alloc and the init functions so that we can keep our resource allocations as long as possible. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30ionic: remove lifs on fw resetShannon Nelson
When the FW RESET event comes to the driver from the firmware, or the fw_status goes to 0 (stopped) or to 0xff (no PCI connection), then shut down the driver activity. This event signals a FW upgrade where we need to quiesce all operations and wait for the FW to restart. The FW will continue the update process once it sees all the LIFs are reset. When the update process is done it will set the fw_status back to RUNNING. Meanwhile, the heartbeat check continues and when the fw_status is seen as set to running we can restart the driver operations. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30ionic: disable the queues on link downShannon Nelson
When the link goes down, we need to disable the queues on the NIC in addition to stopping the netdev stack. This lets the FW know that the driver has stopped queue activity, and then the FW can do internal reconfiguration work, whether actually Link related, or for other internal FW needs. To do this, we pull out the queue enable and disable from ionic_open() and ionic_stop() so they can be used by other routines. To help keep things sane, we swap the queue enables so that the rx queue and its napi are enabled before the tx queue which rides on the rx queues napi. We also drop the ionic_lif_quiesce() as it doesn't do anything more than what the queue disable has already taken care of. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>