summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-07-02Merge tag 'block-5.8-2020-07-01' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Use kvfree_sensitive() for the block keyslot free (Eric) - Sync blk-mq debugfs flags (Hou) - Memory leak fix in virtio-blk error path (Hou) * tag 'block-5.8-2020-07-01' of git://git.kernel.dk/linux-block: virtio-blk: free vblk-vqs in error path of virtblk_probe() block/keyslot-manager: use kvfree_sensitive() blk-mq-debugfs: update blk_queue_flag_name[] accordingly for new flags
2020-07-02Merge tag 'io_uring-5.8-2020-07-01' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "One fix in here, for a regression in 5.7 where a task is waiting in the kernel for a condition, but that condition won't become true until task_work is run. And the task_work can't be run exactly because the task is waiting in the kernel, so we'll never make any progress. One example of that is registering an eventfd and queueing io_uring work, and then the task goes and waits in eventfd read with the expectation that it'll get woken (and read an event) when the io_uring request completes. The io_uring request is finished through task_work, which won't get run while the task is looping in eventfd read" * tag 'io_uring-5.8-2020-07-01' of git://git.kernel.dk/linux-block: io_uring: use signal based task_work running task_work: teach task_work_add() to do signal_wake_up()
2020-07-02MAINTAINERS: net: macb: add Claudiu as co-maintainerNicolas Ferre
I would like that Claudiu becomes co-maintainer of the Cadence macb driver. He's already participating to lots of reviews and enhancements to this driver and knows the different versions of this controller. Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-02net: dsa: microchip: set the correct number of portsCodrin Ciubotariu
The number of ports is incorrectly set to the maximum available for a DSA switch. Even if the extra ports are not used, this causes some functions to be called later, like port_disable() and port_stp_state_set(). If the driver doesn't check the port index, it will end up modifying unknown registers. Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477") Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-02xen/xenbus: let xenbus_map_ring_valloc() return errno values onlyJuergen Gross
Today xenbus_map_ring_valloc() can return either a negative errno value (-ENOMEM or -EINVAL) or a grant status value. This is a mess as e.g -ENOMEM and GNTST_eagain have the same numeric value. Fix that by turning all grant mapping errors into -ENOENT. This is no problem as all callers of xenbus_map_ring_valloc() only use the return value to print an error message, and in case of mapping errors the grant status value has already been printed by __xenbus_map_ring() before. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20200701121638.19840-3-jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2020-07-02xen/xenbus: avoid large structs and arrays on the stackJuergen Gross
xenbus_map_ring_valloc() and its sub-functions are putting quite large structs and arrays on the stack. This is problematic at runtime, but might also result in build failures (e.g. with clang due to the option -Werror,-Wframe-larger-than=... used). Fix that by moving most of the data from the stack into a dynamically allocated struct. Performance is no issue here, as xenbus_map_ring_valloc() is used only when adding a new PV device to a backend driver. While at it move some duplicated code from pv/hvm specific mapping functions to the single caller. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20200701121638.19840-2-jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2020-07-02tcp: md5: allow changing MD5 keys in all socket statesEric Dumazet
This essentially reverts commit 721230326891 ("tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets") Mathieu reported that many vendors BGP implementations can actually switch TCP MD5 on established flows. Quoting Mathieu : Here is a list of a few network vendors along with their behavior with respect to TCP MD5: - Cisco: Allows for password to be changed, but within the hold-down timer (~180 seconds). - Juniper: When password is initially set on active connection it will reset, but after that any subsequent password changes no network resets. - Nokia: No notes on if they flap the tcp connection or not. - Ericsson/RedBack: Allows for 2 password (old/new) to co-exist until both sides are ok with new passwords. - Meta-Switch: Expects the password to be set before a connection is attempted, but no further info on whether they reset the TCP connection on a change. - Avaya: Disable the neighbor, then set password, then re-enable. - Zebos: Would normally allow the change when socket connected. We can revert my prior change because commit 9424e2e7ad93 ("tcp: md5: fix potential overestimation of TCP option space") removed the leak of 4 kernel bytes to the wire that was the main reason for my patch. While doing my investigations, I found a bug when a MD5 key is changed, leading to these commits that stable teams want to consider before backporting this revert : Commit 6a2febec338d ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()") Commit e6ced831ef11 ("tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers") Fixes: 721230326891 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets" Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-02IB/sa: Resolv use-after-free in ib_nl_make_request()Divya Indi
There is a race condition where ib_nl_make_request() inserts the request data into the linked list but the timer in ib_nl_request_timeout() can see it and destroy it before ib_nl_send_msg() is done touching it. This could happen, for instance, if there is a long delay allocating memory during nlmsg_new() This causes a use-after-free in the send_mad() thread: [<ffffffffa02f43cb>] ? ib_pack+0x17b/0x240 [ib_core] [ <ffffffffa032aef1>] ib_sa_path_rec_get+0x181/0x200 [ib_sa] [<ffffffffa0379db0>] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm] [<ffffffffa0374450>] ? cma_bind_port+0xa0/0xa0 [rdma_cm] [<ffffffffa040f850>] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 [rds_rdma] [<ffffffffa040f22c>] rds_rdma_cm_event_handler_cmn+0x22c/0x850 [rds_rdma] [<ffffffffa040f860>] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma] [<ffffffffa037778e>] addr_handler+0x9e/0x140 [rdma_cm] [<ffffffffa026cdb4>] process_req+0x134/0x190 [ib_addr] [<ffffffff810a02f9>] process_one_work+0x169/0x4a0 [<ffffffff810a0b2b>] worker_thread+0x5b/0x560 [<ffffffff810a0ad0>] ? flush_delayed_work+0x50/0x50 [<ffffffff810a68fb>] kthread+0xcb/0xf0 [<ffffffff816ec49a>] ? __schedule+0x24a/0x810 [<ffffffff816ec49a>] ? __schedule+0x24a/0x810 [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180 [<ffffffff816f25a7>] ret_from_fork+0x47/0x90 [<ffffffff810a6830>] ? kthread_create_on_node+0x180/0x180 The ownership rule is once the request is on the list, ownership transfers to the list and the local thread can't touch it any more, just like for the normal MAD case in send_mad(). Thus, instead of adding before send and then trying to delete after on errors, move the entire thing under the spinlock so that the send and update of the lists are atomic to the conurrent threads. Lightly reoganize things so spinlock safe memory allocations are done in the final NL send path and the rest of the setup work is done before and outside the lock. Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending") Link: https://lore.kernel.org/r/1592964789-14533-1-git-send-email-divya.indi@oracle.com Signed-off-by: Divya Indi <divya.indi@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02block: make function __bio_integrity_free() staticWei Yongjun
Fix sparse build warning: block/bio-integrity.c:27:6: warning: symbol '__bio_integrity_free' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-02Merge branch 'nvme-5.8' of git://git.infradead.org/nvme into block-5.8Jens Axboe
Pull NVMe fixes from Christoph. * 'nvme-5.8' of git://git.infradead.org/nvme: nvme: fix a crash in nvme_mpath_add_disk nvme: fix identify error status silent ignore
2020-07-02IB/hfi1: Do not destroy link_wq when the device is shut downKaike Wan
The workqueue link_wq should only be destroyed when the hfi1 driver is unloaded, not when the device is shut down. Fixes: 71d47008ca1b ("IB/hfi1: Create workqueue for link events") Link: https://lore.kernel.org/r/20200623204053.107638.70315.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02IB/hfi1: Do not destroy hfi1_wq when the device is shut downKaike Wan
The workqueue hfi1_wq is destroyed in function shutdown_device(), which is called by either shutdown_one() or remove_one(). The function shutdown_one() is called when the kernel is rebooted while remove_one() is called when the hfi1 driver is unloaded. When the kernel is rebooted, hfi1_wq is destroyed while all qps are still active, leading to a kernel crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000102 IP: [<ffffffff94cb7b02>] __queue_work+0x32/0x3e0 PGD 0 Oops: 0000 [#1] SMP Modules linked in: dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) ib_isert iscsi_target_mod target_core_mod ib_ucm mlx4_ib iTCO_wdt iTCO_vendor_support mxm_wmi sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm rpcrdma sunrpc irqbypass crc32_pclmul ghash_clmulni_intel rdma_ucm aesni_intel ib_uverbs lrw gf128mul opa_vnic glue_helper ablk_helper ib_iser cryptd ib_umad rdma_cm iw_cm ses enclosure libiscsi scsi_transport_sas pcspkr joydev ib_ipoib(OE) scsi_transport_iscsi ib_cm sg ipmi_ssif mei_me lpc_ich i2c_i801 mei ioatdma ipmi_si dm_multipath ipmi_devintf ipmi_msghandler wmi acpi_pad acpi_power_meter hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm hfi1(OE) crct10dif_pclmul crct10dif_common crc32c_intel drm ahci mlx4_core libahci rdmavt(OE) igb megaraid_sas ib_core libata drm_panel_orientation_quirks ptp pps_core devlink dca i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod CPU: 19 PID: 0 Comm: swapper/19 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Phegda X2226A/S2600CW, BIOS SE5C610.86B.01.01.0024.021320181901 02/13/2018 task: ffff8a799ba0d140 ti: ffff8a799bad8000 task.ti: ffff8a799bad8000 RIP: 0010:[<ffffffff94cb7b02>] [<ffffffff94cb7b02>] __queue_work+0x32/0x3e0 RSP: 0018:ffff8a90dde43d80 EFLAGS: 00010046 RAX: 0000000000000082 RBX: 0000000000000086 RCX: 0000000000000000 RDX: ffff8a90b924fcb8 RSI: 0000000000000000 RDI: 000000000000001b RBP: ffff8a90dde43db8 R08: ffff8a799ba0d6d8 R09: ffff8a90dde53900 R10: 0000000000000002 R11: ffff8a90dde43de8 R12: ffff8a90b924fcb8 R13: 000000000000001b R14: 0000000000000000 R15: ffff8a90d2890000 FS: 0000000000000000(0000) GS:ffff8a90dde40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000102 CR3: 0000001a70410000 CR4: 00000000001607e0 Call Trace: [<ffffffff94cb8105>] queue_work_on+0x45/0x50 [<ffffffffc03f781e>] _hfi1_schedule_send+0x6e/0xc0 [hfi1] [<ffffffffc03f78a2>] hfi1_schedule_send+0x32/0x70 [hfi1] [<ffffffffc02cf2d9>] rvt_rc_timeout+0xe9/0x130 [rdmavt] [<ffffffff94ce563a>] ? trigger_load_balance+0x6a/0x280 [<ffffffffc02cf1f0>] ? rvt_free_qpn+0x40/0x40 [rdmavt] [<ffffffff94ca7f58>] call_timer_fn+0x38/0x110 [<ffffffffc02cf1f0>] ? rvt_free_qpn+0x40/0x40 [rdmavt] [<ffffffff94caa3bd>] run_timer_softirq+0x24d/0x300 [<ffffffff94ca0f05>] __do_softirq+0xf5/0x280 [<ffffffff9537832c>] call_softirq+0x1c/0x30 [<ffffffff94c2e675>] do_softirq+0x65/0xa0 [<ffffffff94ca1285>] irq_exit+0x105/0x110 [<ffffffff953796c8>] smp_apic_timer_interrupt+0x48/0x60 [<ffffffff95375df2>] apic_timer_interrupt+0x162/0x170 <EOI> [<ffffffff951adfb7>] ? cpuidle_enter_state+0x57/0xd0 [<ffffffff951ae10e>] cpuidle_idle_call+0xde/0x230 [<ffffffff94c366de>] arch_cpu_idle+0xe/0xc0 [<ffffffff94cfc3ba>] cpu_startup_entry+0x14a/0x1e0 [<ffffffff94c57db7>] start_secondary+0x1f7/0x270 [<ffffffff94c000d5>] start_cpu+0x5/0x14 The solution is to destroy the workqueue only when the hfi1 driver is unloaded, not when the device is shut down. In addition, when the device is shut down, no more work should be scheduled on the workqueues and the workqueues are flushed. Fixes: 8d3e71136a08 ("IB/{hfi1, qib}: Add handling of kernel restart") Link: https://lore.kernel.org/r/20200623204047.107638.77646.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02xtensa: update *pos in cpuinfo_op.nextMax Filippov
Increment *pos in the cpuinfo_op.next to fix the following warning triggered by cat /proc/cpuinfo: seq_file: buggy .next function c_next did not update position index Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2020-07-02xtensa: fix __sync_fetch_and_{and,or}_4 declarationsMax Filippov
Building xtensa kernel with gcc-10 produces the following warnings: arch/xtensa/kernel/xtensa_ksyms.c:90:15: warning: conflicting types for built-in function ‘__sync_fetch_and_and_4’; expected ‘unsigned int(volatile void *, unsigned int)’ [-Wbuiltin-declaration-mismatch] arch/xtensa/kernel/xtensa_ksyms.c:96:15: warning: conflicting types for built-in function ‘__sync_fetch_and_or_4’; expected ‘unsigned int(volatile void *, unsigned int)’ [-Wbuiltin-declaration-mismatch] Fix declarations of these functions to avoid the warning. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2020-07-02tpm_tis: Remove the HID IFX0102Jarkko Sakkinen
Acer C720 running Linux v5.3 reports this in klog: tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16) tpm tpm0: tpm_try_transmit: send(): error -5 tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts tpm_tis tpm_tis: Could not get TPM timeouts and durations tpm_tis 00:08: 1.2 TPM (device-id 0xB, rev-id 16) tpm tpm0: tpm_try_transmit: send(): error -5 tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts tpm_tis 00:08: Could not get TPM timeouts and durations ima: No TPM chip found, activating TPM-bypass! tpm_inf_pnp 00:08: Found TPM with ID IFX0102 % git --no-pager grep IFX0102 drivers/char/tpm drivers/char/tpm/tpm_infineon.c: {"IFX0102", 0}, drivers/char/tpm/tpm_tis.c: {"IFX0102", 0}, /* Infineon */ Obviously IFX0102 was added to the HID table for the TCG TIS driver by mistake. Fixes: 93e1b7d42e1e ("[PATCH] tpm: add HID module parameter") Link: https://bugzilla.kernel.org/show_bug.cgi?id=203877 Cc: stable@vger.kernel.org Cc: Kylene Jo Hall <kjhall@us.ibm.com> Reported-by: Ferry Toth: <ferry.toth@elsinga.info> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
2020-07-02tpm_tis_spi: Prefer async probeDouglas Anderson
On a Chromebook I'm working on I noticed a big (~1 second) delay during bootup where nothing was happening. Right around this big delay there were messages about the TPM: [ 2.311352] tpm_tis_spi spi0.0: TPM ready IRQ confirmed on attempt 2 [ 3.332790] tpm_tis_spi spi0.0: Cr50 firmware version: ... I put a few printouts in and saw that tpm_tis_spi_init() (specifically tpm_chip_register() in that function) was taking the lion's share of this time, though ~115 ms of the time was in cr50_print_fw_version(). Let's make a one-line change to prefer async probe for tpm_tis_spi. There's no reason we need to block other drivers from probing while we load. NOTES: * It's possible that other hardware runs through the init sequence faster than Cr50 and this isn't such a big problem for them. However, even if they are faster they are still doing _some_ transfers over a SPI bus so this should benefit everyone even if to a lesser extent. * It's possible that there are extra delays in the code that could be optimized out. I didn't dig since once I enabled async probe they no longer impacted me. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
2020-07-02tpm: ibmvtpm: Wait for ready buffer before probing for TPM2 attributesDavid Gibson
The tpm2_get_cc_attrs_tbl() call will result in TPM commands being issued, which will need the use of the internal command/response buffer. But, we're issuing this *before* we've waited to make sure that buffer is allocated. This can result in intermittent failures to probe if the hypervisor / TPM implementation doesn't respond quickly enough. I find it fails almost every time with an 8 vcpu guest under KVM with software emulated TPM. To fix it, just move the tpm2_get_cc_attrs_tlb() call after the existing code to wait for initialization, which will ensure the buffer is allocated. Fixes: 18b3670d79ae9 ("tpm: ibmvtpm: Add support for TPM2") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
2020-07-02tpm/st33zp24: fix spelling mistake "drescription" -> "description"Binbin Zhou
Trivial fix, the spelling of "drescription" is incorrect in function comment. Fix this. Signed-off-by: Binbin Zhou <zhoubinbin@uniontech.com> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
2020-07-02tpm_tis: extra chip->ops check on error path in tpm_tis_core_initVasily Averin
Found by smatch: drivers/char/tpm/tpm_tis_core.c:1088 tpm_tis_core_init() warn: variable dereferenced before check 'chip->ops' (see line 979) 'chip->ops' is assigned in the beginning of function in tpmm_chip_alloc->tpm_chip_alloc and is used before first possible goto to error path. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
2020-07-02tpm_tis_spi: Don't send anything during flow controlDouglas Anderson
During flow control we are just reading from the TPM, yet our spi_xfer has the tx_buf and rx_buf both non-NULL which means we're requesting a full duplex transfer. SPI is always somewhat of a full duplex protocol anyway and in theory the other side shouldn't really be looking at what we're sending it during flow control, but it's still a bit ugly to be sending some "random" data when we shouldn't. The default tpm_tis_spi_flow_control() tries to address this by setting 'phy->iobuf[0] = 0'. This partially avoids the problem of sending "random" data, but since our tx_buf and rx_buf both point to the same place I believe there is the potential of us sending the TPM's previous byte back to it if we hit the retry loop. Another flow control implementation, cr50_spi_flow_control(), doesn't address this at all. Let's clean this up and just make the tx_buf NULL before we call flow_control(). Not only does this ensure that we're not sending any "random" bytes but it also possibly could make the SPI controller behave in a slightly more optimal way. NOTE: no actual observed problems are fixed by this patch--it's was just made based on code inspection. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
2020-07-02tpm: Fix TIS locality timeout problemsJames Bottomley
It has been reported that some TIS based TPMs are giving unexpected errors when using the O_NONBLOCK path of the TPM device. The problem is that some TPMs don't like it when you get and then relinquish a locality (as the tpm_try_get_ops()/tpm_put_ops() pair does) without sending a command. This currently happens all the time in the O_NONBLOCK write path. Fix this by moving the tpm_try_get_ops() further down the code to after the O_NONBLOCK determination is made. This is safe because the priv->buffer_mutex still protects the priv state being modified. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206275 Fixes: d23d12484307 ("tpm: fix invalid locking in NONBLOCKING mode") Reported-by: Mario Limonciello <Mario.Limonciello@dell.com> Tested-by: Alex Guzman <alex@guzman.io> Cc: stable@vger.kernel.org Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
2020-07-02RDMA/mlx5: Fix legacy IPoIB QP initializationLeon Romanovsky
Legacy IPoIB sets IB_QP_CREATE_NETIF_QP QP create flag and because mlx5 doesn't use this flag, the process_create_flags() failed to create IPoIB QPs. Fixes: 2978975ce7f1 ("RDMA/mlx5: Process create QP flags in one place") Link: https://lore.kernel.org/r/20200630122147.445847-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02IB/hfi1: Add explicit cast OPA_MTU_8192 to 'enum ib_mtu'Nathan Chancellor
Clang warns: drivers/infiniband/hw/hfi1/qp.c:198:9: warning: implicit conversion from enumeration type 'enum opa_mtu' to different enumeration type 'enum ib_mtu' [-Wenum-conversion] mtu = OPA_MTU_8192; ~ ^~~~~~~~~~~~ enum opa_mtu extends enum ib_mtu. There are typically two ways to deal with this: * Remove the expected types and just use 'int' for all parameters and types. * Explicitly cast the enums between each other. This driver chooses to do the later so do the same thing here. Fixes: 6d72344cf6c4 ("IB/ipoib: Increase ipoib Datagram mode MTU's upper limit") Link: https://lore.kernel.org/r/20200623005224.492239-1-natechancellor@gmail.com Link: https://github.com/ClangBuiltLinux/linux/issues/1062 Link: https://lore.kernel.org/linux-rdma/20200527040350.GA3118979@ubuntu-s3-xlarge-x86/ Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-02arm64/alternatives: use subsections for replacement sequencesArd Biesheuvel
When building very large kernels, the logic that emits replacement sequences for alternatives fails when relative branches are present in the code that is emitted into the .altinstr_replacement section and patched in at the original site and fixed up. The reason is that the linker will insert veneers if relative branches go out of range, and due to the relative distance of the .altinstr_replacement from the .text section where its branch targets usually live, veneers may be emitted at the end of the .altinstr_replacement section, with the relative branches in the sequence pointed at the veneers instead of the actual target. The alternatives patching logic will attempt to fix up the branch to point to its original target, which will be the veneer in this case, but given that the patch site is likely to be far away as well, it will be out of range and so patching will fail. There are other cases where these veneers are problematic, e.g., when the target of the branch is in .text while the patch site is in .init.text, in which case putting the replacement sequence inside .text may not help either. So let's use subsections to emit the replacement code as closely as possible to the patch site, to ensure that veneers are only likely to be emitted if they are required at the patch site as well, in which case they will be in range for the replacement sequence both before and after it is transported to the patch site. This will prevent alternative sequences in non-init code from being released from memory after boot, but this is tolerable given that the entire section is only 512 KB on an allyesconfig build (which weighs in at 500+ MB for the entire Image). Also, note that modules today carry the replacement sequences in non-init sections as well, and any of those that target init code will be emitted into init sections after this change. This fixes an early crash when booting an allyesconfig kernel on a system where any of the alternatives sequences containing relative branches are activated at boot (e.g., ARM64_HAS_PAN on TX2) Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Dave P Martin <dave.martin@arm.com> Link: https://lore.kernel.org/r/20200630081921.13443-1-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-07-02kvm: use more precise cast and do not drop __userPaolo Bonzini
Sparse complains on a call to get_compat_sigset, fix it. The "if" right above explains that sigmask_arg->sigset is basically a compat_sigset_t. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-02nvme: fix a crash in nvme_mpath_add_diskChristoph Hellwig
For private namespaces ns->head_disk is NULL, so add a NULL check before updating the BDI capabilities. Fixes: b2ce4d90690b ("nvme-multipath: set bdi capabilities once") Reported-by: Avinash M N <Avinash.M.N@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
2020-07-02nvme: fix identify error status silent ignoreSagi Grimberg
Commit 59c7c3caaaf8 intended to only silently ignore non retry-able errors (DNR bit set) such that we can still identify misbehaving controllers, and in the other hand propagate retry-able errors (DNR bit cleared) so we don't wrongly abandon a namespace just because it happens to be temporarily inaccessible. The goal remains the same as the original commit where this was introduced but unfortunately had the logic backwards. Fixes: 59c7c3caaaf8 ("nvme: fix possible hang when ns scanning fails during error recovery") Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-02drm/meson: viu: fix setting the OSD burst length in VIU_OSD1_FIFO_CTRL_STATMartin Blumenstingl
The burst length is configured in VIU_OSD1_FIFO_CTRL_STAT[31] and VIU_OSD1_FIFO_CTRL_STAT[11:10]. The public S905D3 datasheet describes this as: - 0x0 = up to 24 per burst - 0x1 = up to 32 per burst - 0x2 = up to 48 per burst - 0x3 = up to 64 per burst - 0x4 = up to 96 per burst - 0x5 = up to 128 per burst The lower two bits map to VIU_OSD1_FIFO_CTRL_STAT[11:10] while the upper bit maps to VIU_OSD1_FIFO_CTRL_STAT[31]. Replace meson_viu_osd_burst_length_reg() with pre-defined macros which set these values. meson_viu_osd_burst_length_reg() always returned 0 (for the two used values: 32 and 64 at least) and thus incorrectly set the burst size to 24. Fixes: 147ae1cbaa1842 ("drm: meson: viu: use proper macros instead of magic constants") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Tested-by: Christian Hewitt <christianshewitt@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200620155752.21065-1-martin.blumenstingl@googlemail.com
2020-07-02btrfs: reset tree root pointer after error in init_tree_rootsJosef Bacik
Eric reported an issue where mounting -o recovery with a fuzzed fs resulted in a kernel panic. This is because we tried to free the tree node, except it was an error from the read. Fix this by properly resetting the tree_root->node == NULL in this case. The panic was the following BTRFS warning (device loop0): failed to read tree root BUG: kernel NULL pointer dereference, address: 000000000000001f RIP: 0010:free_extent_buffer+0xe/0x90 [btrfs] Call Trace: free_root_extent_buffers.part.0+0x11/0x30 [btrfs] free_root_pointers+0x1a/0xa2 [btrfs] open_ctree+0x1776/0x18a5 [btrfs] btrfs_mount_root.cold+0x13/0xfa [btrfs] ? selinux_fs_context_parse_param+0x37/0x80 legacy_get_tree+0x27/0x40 vfs_get_tree+0x25/0xb0 fc_mount+0xe/0x30 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x147/0x3e0 [btrfs] ? cred_has_capability+0x7c/0x120 ? legacy_get_tree+0x27/0x40 legacy_get_tree+0x27/0x40 vfs_get_tree+0x25/0xb0 do_mount+0x735/0xa40 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Nik says: this is problematic only if we fail on the last iteration of the loop as this results in init_tree_roots returning err value with tree_root->node = -ERR. Subsequently the caller does: fail_tree_roots which calls free_root_pointers on the bogus value. Reported-by: Eric Sandeen <sandeen@redhat.com> Fixes: b8522a1e5f42 ("btrfs: Factor out tree roots initialization during mount") CC: stable@vger.kernel.org # 5.5+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add details how the pointer gets dereferenced ] Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-02btrfs: fix reclaim_size counter leak after stealing from global reserveFilipe Manana
Commit 7f9fe614407692 ("btrfs: improve global reserve stealing logic"), added in the 5.8 merge window, introduced another leak for the space_info's reclaim_size counter. This is very often triggered by the test cases generic/269 and generic/416 from fstests, producing a stack trace like the following during unmount: [37079.155499] ------------[ cut here ]------------ [37079.156844] WARNING: CPU: 2 PID: 2000423 at fs/btrfs/block-group.c:3422 btrfs_free_block_groups+0x2eb/0x300 [btrfs] [37079.158090] Modules linked in: dm_snapshot btrfs dm_thin_pool (...) [37079.164440] CPU: 2 PID: 2000423 Comm: umount Tainted: G W 5.7.0-rc7-btrfs-next-62 #1 [37079.165422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), (...) [37079.167384] RIP: 0010:btrfs_free_block_groups+0x2eb/0x300 [btrfs] [37079.168375] Code: bd 58 ff ff ff 00 4c 8d (...) [37079.170199] RSP: 0018:ffffaa53875c7de0 EFLAGS: 00010206 [37079.171120] RAX: ffff98099e701cf8 RBX: ffff98099e2d4000 RCX: 0000000000000000 [37079.172057] RDX: 0000000000000001 RSI: ffffffffc0acc5b1 RDI: 00000000ffffffff [37079.173002] RBP: ffff98099e701cf8 R08: 0000000000000000 R09: 0000000000000000 [37079.173886] R10: 0000000000000000 R11: 0000000000000000 R12: ffff98099e701c00 [37079.174730] R13: ffff98099e2d5100 R14: dead000000000122 R15: dead000000000100 [37079.175578] FS: 00007f4d7d0a5840(0000) GS:ffff9809ec600000(0000) knlGS:0000000000000000 [37079.176434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [37079.177289] CR2: 0000559224dcc000 CR3: 000000012207a004 CR4: 00000000003606e0 [37079.178152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [37079.178935] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [37079.179675] Call Trace: [37079.180419] close_ctree+0x291/0x2d1 [btrfs] [37079.181162] generic_shutdown_super+0x6c/0x100 [37079.181898] kill_anon_super+0x14/0x30 [37079.182641] btrfs_kill_super+0x12/0x20 [btrfs] [37079.183371] deactivate_locked_super+0x31/0x70 [37079.184012] cleanup_mnt+0x100/0x160 [37079.184650] task_work_run+0x68/0xb0 [37079.185284] exit_to_usermode_loop+0xf9/0x100 [37079.185920] do_syscall_64+0x20d/0x260 [37079.186556] entry_SYSCALL_64_after_hwframe+0x49/0xb3 [37079.187197] RIP: 0033:0x7f4d7d2d9357 [37079.187836] Code: eb 0b 00 f7 d8 64 89 01 48 (...) [37079.189180] RSP: 002b:00007ffee4e0d368 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [37079.189845] RAX: 0000000000000000 RBX: 00007f4d7d3fb224 RCX: 00007f4d7d2d9357 [37079.190515] RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000559224dc5c90 [37079.191173] RBP: 0000559224dc1970 R08: 0000000000000000 R09: 00007ffee4e0c0e0 [37079.191815] R10: 0000559224dc7b00 R11: 0000000000000246 R12: 0000000000000000 [37079.192451] R13: 0000559224dc5c90 R14: 0000559224dc1a80 R15: 0000559224dc1ba0 [37079.193096] irq event stamp: 0 [37079.193729] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [37079.194379] hardirqs last disabled at (0): [<ffffffff97ab8935>] copy_process+0x755/0x1ea0 [37079.195033] softirqs last enabled at (0): [<ffffffff97ab8935>] copy_process+0x755/0x1ea0 [37079.195700] softirqs last disabled at (0): [<0000000000000000>] 0x0 [37079.196318] ---[ end trace b32710d864dea887 ]--- In the past commit d611add48b717a ("btrfs: fix reclaim counter leak of space_info objects") fixed similar cases. That commit however has a date more recent (April 7 2020) then the commit mentioned before (March 13 2020), however it was merged in kernel 5.7 while the older commit, which introduces a new leak, was merged only in the 5.8 merge window. So the leak sneaked in unnoticed. Fix this by making steal_from_global_rsv() remove the ticket using the helper remove_ticket(), which decrements the reclaim_size counter of the space_info object. Fixes: 7f9fe614407692 ("btrfs: improve global reserve stealing logic") Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-02btrfs: fix fatal extent_buffer readahead vs releasepage raceBoris Burkov
Under somewhat convoluted conditions, it is possible to attempt to release an extent_buffer that is under io, which triggers a BUG_ON in btrfs_release_extent_buffer_pages. This relies on a few different factors. First, extent_buffer reads done as readahead for searching use WAIT_NONE, so they free the local extent buffer reference while the io is outstanding. However, they should still be protected by TREE_REF. However, if the system is doing signficant reclaim, and simultaneously heavily accessing the extent_buffers, it is possible for releasepage to race with two concurrent readahead attempts in a way that leaves TREE_REF unset when the readahead extent buffer is released. Essentially, if two tasks race to allocate a new extent_buffer, but the winner who attempts the first io is rebuffed by a page being locked (likely by the reclaim itself) then the loser will still go ahead with issuing the readahead. The loser's call to find_extent_buffer must also race with the reclaim task reading the extent_buffer's refcount as 1 in a way that allows the reclaim to re-clear the TREE_REF checked by find_extent_buffer. The following represents an example execution demonstrating the race: CPU0 CPU1 CPU2 reada_for_search reada_for_search readahead_tree_block readahead_tree_block find_create_tree_block find_create_tree_block alloc_extent_buffer alloc_extent_buffer find_extent_buffer // not found allocates eb lock pages associate pages to eb insert eb into radix tree set TREE_REF, refs == 2 unlock pages read_extent_buffer_pages // WAIT_NONE not uptodate (brand new eb) lock_page if !trylock_page goto unlock_exit // not an error free_extent_buffer release_extent_buffer atomic_dec_and_test refs to 1 find_extent_buffer // found try_release_extent_buffer take refs_lock reads refs == 1; no io atomic_inc_not_zero refs to 2 mark_buffer_accessed check_buffer_tree_ref // not STALE, won't take refs_lock refs == 2; TREE_REF set // no action read_extent_buffer_pages // WAIT_NONE clear TREE_REF release_extent_buffer atomic_dec_and_test refs to 1 unlock_page still not uptodate (CPU1 read failed on trylock_page) locks pages set io_pages > 0 submit io return free_extent_buffer release_extent_buffer dec refs to 0 delete from radix tree btrfs_release_extent_buffer_pages BUG_ON(io_pages > 0)!!! We observe this at a very low rate in production and were also able to reproduce it in a test environment by introducing some spurious delays and by introducing probabilistic trylock_page failures. To fix it, we apply check_tree_ref at a point where it could not possibly be unset by a competing task: after io_pages has been incremented. All the codepaths that clear TREE_REF check for io, so they would not be able to clear it after this point until the io is done. Stack trace, for reference: [1417839.424739] ------------[ cut here ]------------ [1417839.435328] kernel BUG at fs/btrfs/extent_io.c:4841! [1417839.447024] invalid opcode: 0000 [#1] SMP [1417839.502972] RIP: 0010:btrfs_release_extent_buffer_pages+0x20/0x1f0 [1417839.517008] Code: ed e9 ... [1417839.558895] RSP: 0018:ffffc90020bcf798 EFLAGS: 00010202 [1417839.570816] RAX: 0000000000000002 RBX: ffff888102d6def0 RCX: 0000000000000028 [1417839.586962] RDX: 0000000000000002 RSI: ffff8887f0296482 RDI: ffff888102d6def0 [1417839.603108] RBP: ffff88885664a000 R08: 0000000000000046 R09: 0000000000000238 [1417839.619255] R10: 0000000000000028 R11: ffff88885664af68 R12: 0000000000000000 [1417839.635402] R13: 0000000000000000 R14: ffff88875f573ad0 R15: ffff888797aafd90 [1417839.651549] FS: 00007f5a844fa700(0000) GS:ffff88885f680000(0000) knlGS:0000000000000000 [1417839.669810] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1417839.682887] CR2: 00007f7884541fe0 CR3: 000000049f609002 CR4: 00000000003606e0 [1417839.699037] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1417839.715187] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1417839.731320] Call Trace: [1417839.737103] release_extent_buffer+0x39/0x90 [1417839.746913] read_block_for_search.isra.38+0x2a3/0x370 [1417839.758645] btrfs_search_slot+0x260/0x9b0 [1417839.768054] btrfs_lookup_file_extent+0x4a/0x70 [1417839.778427] btrfs_get_extent+0x15f/0x830 [1417839.787665] ? submit_extent_page+0xc4/0x1c0 [1417839.797474] ? __do_readpage+0x299/0x7a0 [1417839.806515] __do_readpage+0x33b/0x7a0 [1417839.815171] ? btrfs_releasepage+0x70/0x70 [1417839.824597] extent_readpages+0x28f/0x400 [1417839.833836] read_pages+0x6a/0x1c0 [1417839.841729] ? startup_64+0x2/0x30 [1417839.849624] __do_page_cache_readahead+0x13c/0x1a0 [1417839.860590] filemap_fault+0x6c7/0x990 [1417839.869252] ? xas_load+0x8/0x80 [1417839.876756] ? xas_find+0x150/0x190 [1417839.884839] ? filemap_map_pages+0x295/0x3b0 [1417839.894652] __do_fault+0x32/0x110 [1417839.902540] __handle_mm_fault+0xacd/0x1000 [1417839.912156] handle_mm_fault+0xaa/0x1c0 [1417839.921004] __do_page_fault+0x242/0x4b0 [1417839.930044] ? page_fault+0x8/0x30 [1417839.937933] page_fault+0x1e/0x30 [1417839.945631] RIP: 0033:0x33c4bae [1417839.952927] Code: Bad RIP value. [1417839.960411] RSP: 002b:00007f5a844f7350 EFLAGS: 00010206 [1417839.972331] RAX: 000000000000006e RBX: 1614b3ff6a50398a RCX: 0000000000000000 [1417839.988477] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002 [1417840.004626] RBP: 00007f5a844f7420 R08: 000000000000006e R09: 00007f5a94aeccb8 [1417840.020784] R10: 00007f5a844f7350 R11: 0000000000000000 R12: 00007f5a94aecc79 [1417840.036932] R13: 00007f5a94aecc78 R14: 00007f5a94aecc90 R15: 00007f5a94aecc40 CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-02btrfs: convert comments to fallthrough annotationsMarcos Paulo de Souza
Convert fall through comments to the pseudo-keyword which is now the preferred way. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-02Merge tag 'amd-drm-fixes-5.8-2020-07-01' of ↵Dave Airlie
git://people.freedesktop.org/~agd5f/linux into drm-fixes amd-drm-fixes-5.8-2020-07-01: amdgpu: - Fix for vega20 boards without RAS support - DC bandwidth revalidation fix - Fix Renoir vram info fetching - Fix hwmon freq printing Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200701194415.4065-1-alexander.deucher@amd.com
2020-07-02Merge tag 'drm-intel-fixes-2020-07-01' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.8-rc4: - GVT fixes - Include asm sources for render cache clear batches Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87imf7l6ee.fsf@intel.com
2020-07-01cifs: prevent truncation from long to int in wait_for_free_creditsRonnie Sahlberg
The wait_event_... defines evaluate to long so we should not assign it an int as this may truncate the value. Reported-by: Marshall Midden <marshallmidden@gmail.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-07-01net: dsa: microchip: enable ksz9893 via i2c in the ksz9477 driverHelmut Grohne
The KSZ9893 3-Port Gigabit Ethernet Switch can be controlled via SPI, I²C or MDIO (very limited and not supported by this driver). While there is already a compatible entry for the SPI bus, it was missing for I²C. Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01tcp: fix SO_RCVLOWAT possible hangs under high mem pressureEric Dumazet
Whenever tcp_try_rmem_schedule() returns an error, we are under trouble and should make sure to wakeup readers so that they can drain socket queues and eventually make room. Fixes: 03f45c883c6f ("tcp: avoid extra wakeups for SO_RCVLOWAT users") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01cifs: Fix the target file was deleted when rename failed.Zhang Xiaoxu
When xfstest generic/035, we found the target file was deleted if the rename return -EACESS. In cifs_rename2, we unlink the positive target dentry if rename failed with EACESS or EEXIST, even if the target dentry is positived before rename. Then the existing file was deleted. We should just delete the target file which created during the rename. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01SMB3: Honor 'posix' flag for multiuser mountsPaul Aurich
The flag from the primary tcon needs to be copied into the volume info so that cifs_get_tcon will try to enable extensions on the per-user tcon. At that point, since posix extensions must have already been enabled on the superblock, don't try to needlessly adjust the mount flags. Fixes: ce558b0e17f8 ("smb3: Add posix create context for smb3.11 posix mounts") Fixes: b326614ea215 ("smb3: allow "posix" mount option to enable new SMB311 protocol extensions") Signed-off-by: Paul Aurich <paul@darkrain42.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01SMB3: Honor 'handletimeout' flag for multiuser mountsPaul Aurich
Fixes: ca567eb2b3f0 ("SMB3: Allow persistent handle timeout to be configurable on mount") Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01SMB3: Honor lease disabling for multiuser mountsPaul Aurich
Fixes: 3e7a02d47872 ("smb3: allow disabling requesting leases") Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01SMB3: Honor persistent/resilient handle flags for multiuser mountsPaul Aurich
Without this: - persistent handles will only be enabled for per-user tcons if the server advertises the 'Continuous Availabity' capability - resilient handles would never be enabled for per-user tcons Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01SMB3: Honor 'seal' flag for multiuser mountsPaul Aurich
Ensure multiuser SMB3 mounts use encryption for all users' tcons if the mount options are configured to require encryption. Without this, only the primary tcon and IPC tcons are guaranteed to be encrypted. Per-user tcons would only be encrypted if the server was configured to require encryption. Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01ip: Fix SO_MARK in RST, ACK and ICMP packetsWillem de Bruijn
When no full socket is available, skbs are sent over a per-netns control socket. Its sk_mark is temporarily adjusted to match that of the real (request or timewait) socket or to reflect an incoming skb, so that the outgoing skb inherits this in __ip_make_skb. Introduction of the socket cookie mark field broke this. Now the skb is set through the cookie and cork: <caller> # init sockc.mark from sk_mark or cmsg ip_append_data ip_setup_cork # convert sockc.mark to cork mark ip_push_pending_frames ip_finish_skb __ip_make_skb # set skb->mark to cork mark But I missed these special control sockets. Update all callers of __ip(6)_make_skb that were originally missed. For IPv6, the same two icmp(v6) paths are affected. The third case is not, as commit 92e55f412cff ("tcp: don't annotate mark on control socket from tcp_v6_send_response()") replaced the ctl_sk->sk_mark with passing the mark field directly as a function argument. That commit predates the commit that introduced the bug. Fixes: c6af0c227a22 ("ip: support SO_MARK cmsg") Signed-off-by: Willem de Bruijn <willemb@google.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01cifs: Display local UID details for SMB sessions in DebugDataPaul Aurich
This is useful for distinguishing SMB sessions on a multiuser mount. Signed-off-by: Paul Aurich <paul@darkrain42.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01tcp: md5: do not send silly options in SYNCOOKIESEric Dumazet
Whenever cookie_init_timestamp() has been used to encode ECN,SACK,WSCALE options, we can not remove the TS option in the SYNACK. Otherwise, tcp_synack_options() will still advertize options like WSCALE that we can not deduce later when receiving the packet from the client to complete 3WHS. Note that modern linux TCP stacks wont use MD5+TS+SACK in a SYN packet, but we can not know for sure that all TCP stacks have the same logic. Before the fix a tcpdump would exhibit this wrong exchange : 10:12:15.464591 IP C > S: Flags [S], seq 4202415601, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 456965269 ecr 0,nop,wscale 8], length 0 10:12:15.464602 IP S > C: Flags [S.], seq 253516766, ack 4202415602, win 65535, options [nop,nop,md5 valid,mss 1400,nop,nop,sackOK,nop,wscale 8], length 0 10:12:15.464611 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid], length 0 10:12:15.464678 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid], length 12 10:12:15.464685 IP S > C: Flags [.], ack 13, win 65535, options [nop,nop,md5 valid], length 0 After this patch the exchange looks saner : 11:59:59.882990 IP C > S: Flags [S], seq 517075944, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508483 ecr 0,nop,wscale 8], length 0 11:59:59.883002 IP S > C: Flags [S.], seq 1902939253, ack 517075945, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508479 ecr 1751508483,nop,wscale 8], length 0 11:59:59.883012 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 0 11:59:59.883114 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 12 11:59:59.883122 IP S > C: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508483], length 0 11:59:59.883152 IP S > C: Flags [P.], seq 1:13, ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508483], length 12 11:59:59.883170 IP C > S: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508484], length 0 Of course, no SACK block will ever be added later, but nothing should break. Technically, we could remove the 4 nops included in MD5+TS options, but again some stacks could break seeing not conventional alignment. Fixes: 4957faade11b ("TCPCT part 1g: Responder Cookie => Initiator") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01rds: If one path needs re-connection, check all and re-connectRao Shoaib
In testing with mprds enabled, Oracle Cluster nodes after reboot were not able to communicate with others nodes and so failed to rejoin the cluster. Peers with lower IP address initiated connection but the node could not respond as it choose a different path and could not initiate a connection as it had a higher IP address. With this patch, when a node sends out a packet and the selected path is down, all other paths are also checked and any down paths are re-connected. Reviewed-by: Ka-cheong Poon <ka-cheong.poon@oracle.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriersEric Dumazet
My prior fix went a bit too far, according to Herbert and Mathieu. Since we accept that concurrent TCP MD5 lookups might see inconsistent keys, we can use READ_ONCE()/WRITE_ONCE() instead of smp_rmb()/smp_wmb() Clearing all key->key[] is needed to avoid possible KMSAN reports, if key->keylen is increased. Since tcp_md5_do_add() is not fast path, using __GFP_ZERO to clear all struct tcp_md5sig_key is simpler. data_race() was added in linux-5.8 and will prevent KCSAN reports, this can safely be removed in stable backports, if data_race() is not yet backported. v2: use data_race() both in tcp_md5_hash_key() and tcp_md5_do_add() Fixes: 6a2febec338d ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Marco Elver <elver@google.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01genetlink: remove genl_bindSean Tranchetti
A potential deadlock can occur during registering or unregistering a new generic netlink family between the main nl_table_lock and the cb_lock where each thread wants the lock held by the other, as demonstrated below. 1) Thread 1 is performing a netlink_bind() operation on a socket. As part of this call, it will call netlink_lock_table(), incrementing the nl_table_users count to 1. 2) Thread 2 is registering (or unregistering) a genl_family via the genl_(un)register_family() API. The cb_lock semaphore will be taken for writing. 3) Thread 1 will call genl_bind() as part of the bind operation to handle subscribing to GENL multicast groups at the request of the user. It will attempt to take the cb_lock semaphore for reading, but it will fail and be scheduled away, waiting for Thread 2 to finish the write. 4) Thread 2 will call netlink_table_grab() during the (un)registration call. However, as Thread 1 has incremented nl_table_users, it will not be able to proceed, and both threads will be stuck waiting for the other. genl_bind() is a noop, unless a genl_family implements the mcast_bind() function to handle setting up family-specific multicast operations. Since no one in-tree uses this functionality as Cong pointed out, simply removing the genl_bind() function will remove the possibility for deadlock, as there is no attempt by Thread 1 above to take the cb_lock semaphore. Fixes: c380d9a7afff ("genetlink: pass multicast bind/unbind to families") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Johannes Berg <johannes.berg@intel.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01dt-bindings: clock: imx: Fix e-mail addressFabio Estevam
The freescale.com domain is gone for quite some time. Use the nxp.com domain instead. Signed-off-by: Fabio Estevam <festevam@gmail.com> Link: https://lore.kernel.org/r/20200701005346.1008-1-festevam@gmail.com Signed-off-by: Rob Herring <robh@kernel.org>