summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-11-06ASoC: SOF: topology: Fix bytes control size checksDragos Tarcatu
When using the example SOF amp widget topology, KASAN dumps this when the AMP bytes kcontrol gets loaded: [ 9.579548] BUG: KASAN: slab-out-of-bounds in sof_control_load+0x8cc/0xac0 [snd_sof] [ 9.588194] Write of size 40 at addr ffff8882314559dc by task systemd-udevd/2411 Fix that by rejecting the topology if the bytes data size > max_size Fixes: 311ce4fe7637d ("ASoC: SOF: Add support for loading topologies") Reviewed-by: Jaska Uimonen <jaska.uimonen@intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Signed-off-by: Dragos Tarcatu <dragos_tarcatu@mentor.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20191106145816.9367-1-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-11-06ARM: dts: stm32: change joystick pinctrl definition on stm32mp157c-ev1Amelie Delaunay
Pins used for joystick are all configured as input. "push-pull" is not a valid setting for an input pin. Fixes: a502b343ebd0 ("pinctrl: stmfx: update pinconf settings") Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: Amelie Delaunay <amelie.delaunay@st.com> Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
2019-11-06ARM: dts: stm32: remove OV5640 pinctrl definition on stm32mp157c-ev1Amelie Delaunay
"push-pull" configuration is now fully handled by the gpiolib and the STMFX pinctrl driver. There is no longer need to declare a pinctrl group to only configure "push-pull" setting for the line. It is done directly by the gpiolib. Fixes: a502b343ebd0 ("pinctrl: stmfx: update pinconf settings") Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: Amelie Delaunay <amelie.delaunay@st.com> Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
2019-11-06ARM: dts: stm32: Fix CAN RAM mapping on stm32mp157cChristophe Roullier
Split the 10Kbytes CAN message RAM to be able to use simultaneously FDCAN1 and FDCAN2 instances. First 5Kbytes are allocated to FDCAN1 and last 5Kbytes are used for FDCAN2. To do so, set the offset to 0x1400 in mram-cfg for FDCAN2. Fixes: d44d6e021301 ("ARM: dts: stm32: change CAN RAM mapping on stm32mp157c") Signed-off-by: Christophe Roullier <christophe.roullier@st.com> Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
2019-11-06ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157Patrice Chotard
Relax qspi pins slew-rate to minimize peak currents. Fixes: 844030057339 ("ARM: dts: stm32: add flash nor support on stm32mp157c eval board") Signed-off-by: Patrice Chotard <patrice.chotard@st.com> Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
2019-11-06scsi: core: Handle drivers which set sg_tablesize to zeroMichael Schmitz
In scsi_mq_setup_tags(), cmd_size is calculated based on zero size for the scatter-gather list in case the low level driver uses SG_NONE in its host template. cmd_size is passed on to the block layer for calculation of the request size, and we've seen NULL pointer dereference errors from the block layer in drivers where SG_NONE is used and a mq IO scheduler is active, apparently as a consequence of this (see commit 68ab2d76e4be ("scsi: cxlflash: Set sg_tablesize to 1 instead of SG_NONE"), and a recent patch by Finn Thain converting the three m68k NFR5380 drivers to avoid setting SG_NONE). Try to avoid these errors by accounting for at least one sg list entry when calculating cmd_size, regardless of whether the low level driver set a zero sg_tablesize. Tested on 030 m68k with the atari_scsi driver - setting sg_tablesize to SG_NONE no longer results in a crash when loading this driver. CC: Finn Thain <fthain@telegraphics.com.au> Link: https://lore.kernel.org/r/1572922150-4358-1-git-send-email-schmitzmic@gmail.com Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-05scsi: qla2xxx: fix NPIV tear down processMartin Wilck
Fix two issues with commit f5187b7d1ac6 ("scsi: qla2xxx: Optimize NPIV tear down process"): a missing negation in a wait_event_timeout() condition, and a missing loop end condition. Fixes: f5187b7d1ac6 ("scsi: qla2xxx: Optimize NPIV tear down process") Link: https://lore.kernel.org/r/20191105145550.10268-1-martin.wilck@suse.com Signed-off-by: Martin Wilck <mwilck@suse.com> Acked-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-05scsi: sd_zbc: Fix sd_zbc_complete()Damien Le Moal
The ILLEGAL REQUEST/INVALID FIELD IN CDB error generated by an attempt to reset a conventional zone does not apply to the reset write pointer command with the ALL bit set, that is, to REQ_OP_ZONE_RESET_ALL requests. Fix sd_zbc_complete() to be quiet only in the case of REQ_OP_ZONE_RESET, excluding REQ_OP_ZONE_RESET_ALL. Since REQ_OP_ZONE_RESET is the only request handled by sd_zbc_complete(), also simplify the code using a simple if statement. [mkp: applied by hand] Fixes: d81e9d494354 ("scsi: implement REQ_OP_ZONE_RESET_ALL") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191027140549.26272-4-damien.lemoal@wdc.com Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-05Documentation: TLS: Add missing counter descriptionTariq Toukan
Add TLS TX counter description for the handshake retransmitted packets that triggers the resync procedure then skip it, going into the regular transmit flow. Fixes: 46a3ea98074e ("net/mlx5e: kTLS, Enhance TX resync flow") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05NFC: fdp: fix incorrect free objectPan Bian
The address of fw_vsc_cfg is on stack. Releasing it with devm_kfree() is incorrect, which may result in a system crash or other security impacts. The expected object to free is *fw_vsc_cfg. Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05net: prevent load/store tearing on sk->sk_stampEric Dumazet
Add a couple of READ_ONCE() and WRITE_ONCE() to prevent load-tearing and store-tearing in sock_read_timestamp() and sock_write_timestamp() This might prevent another KCSAN report. Fixes: 3a0ed3e96197 ("sock: Make sock->sk_stamp thread-safe") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05net: qualcomm: rmnet: Fix potential UAF when unregisteringSean Tranchetti
During the exit/unregistration process of the RmNet driver, the function rmnet_unregister_real_device() is called to handle freeing the driver's internal state and removing the RX handler on the underlying physical device. However, the order of operations this function performs is wrong and can lead to a use after free of the rmnet_port structure. Before calling netdev_rx_handler_unregister(), this port structure is freed with kfree(). If packets are received on any RmNet devices before synchronize_net() completes, they will attempt to use this already-freed port structure when processing the packet. As such, before cleaning up any other internal state, the RX handler must be unregistered in order to guarantee that no further packets will arrive on the device. Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05net/tls: fix sk_msg trim on fallback to copy modeJakub Kicinski
sk_msg_trim() tries to only update curr pointer if it falls into the trimmed region. The logic, however, does not take into the account pointer wrapping that sk_msg_iter_var_prev() does nor (as John points out) the fact that msg->sg is a ring buffer. This means that when the message was trimmed completely, the new curr pointer would have the value of MAX_MSG_FRAGS - 1, which is neither smaller than any other value, nor would it actually be correct. Special case the trimming to 0 length a little bit and rework the comparison between curr and end to take into account wrapping. This bug caused the TLS code to not copy all of the message, if zero copy filled in fewer sg entries than memcopy would need. Big thanks to Alexander Potapenko for the non-KMSAN reproducer. v2: - take into account that msg->sg is a ring buffer (John). Link: https://lore.kernel.org/netdev/20191030160542.30295-1-jakub.kicinski@netronome.com/ (v1) Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Reported-by: syzbot+f8495bff23a879a6d0bd@syzkaller.appspotmail.com Reported-by: syzbot+6f50c99e8f6194bf363f@syzkaller.appspotmail.com Co-developed-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05mlx4_core: fix wrong comment about the reason of subtract one from the max_cqesDotan Barak
The reason for the pre-allocation of one CQE is to enable resizing of the CQ. Fix comment accordingly. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05net: dsa: bcm_sf2: Fix driver removalFlorian Fainelli
With the DSA core doing the call to dsa_port_disable() we do not need to do that within the driver itself. This could cause an use after free since past dsa_unregister_switch() we should not be accessing any dsa_switch internal structures. Fixes: 0394a63acfe2 ("net: dsa: enable and disable all ports") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05net: sched: prevent duplicate flower rules from tcf_proto destroy raceJohn Hurley
When a new filter is added to cls_api, the function tcf_chain_tp_insert_unique() looks up the protocol/priority/chain to determine if the tcf_proto is duplicated in the chain's hashtable. It then creates a new entry or continues with an existing one. In cls_flower, this allows the function fl_ht_insert_unque to determine if a filter is a duplicate and reject appropriately, meaning that the duplicate will not be passed to drivers via the offload hooks. However, when a tcf_proto is destroyed it is removed from its chain before a hardware remove hook is hit. This can lead to a race whereby the driver has not received the remove message but duplicate flows can be accepted. This, in turn, can lead to the offload driver receiving incorrect duplicate flows and out of order add/delete messages. Prevent duplicates by utilising an approach suggested by Vlad Buslov. A hash table per block stores each unique chain/protocol/prio being destroyed. This entry is only removed when the full destroy (and hardware offload) has completed. If a new flow is being added with the same identiers as a tc_proto being detroyed, then the add request is replayed until the destroy is complete. Fixes: 8b64678e0af8 ("net: sched: refactor tp insert/delete for concurrent execution") Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reported-by: Louis Peens <louis.peens@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05net: hns3: Use the correct style for SPDX License IdentifierNishad Kamdar
This patch corrects the SPDX License Identifier style in header files related to Hisilicon network devices. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used) Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05bonding: fix state transition issue in link monitoringJay Vosburgh
Since de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring"), the bonding driver has utilized two separate variables to indicate the next link state a particular slave should transition to. Each is used to communicate to a different portion of the link state change commit logic; one to the bond_miimon_commit function itself, and another to the state transition logic. Unfortunately, the two variables can become unsynchronized, resulting in incorrect link state transitions within bonding. This can cause slaves to become stuck in an incorrect link state until a subsequent carrier state transition. The issue occurs when a special case in bond_slave_netdev_event sets slave->link directly to BOND_LINK_FAIL. On the next pass through bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL case will set the proposed next state (link_new_state) to BOND_LINK_UP, but the new_link to BOND_LINK_DOWN. The setting of the final link state from new_link comes after that from link_new_state, and so the slave will end up incorrectly in _DOWN state. Resolve this by combining the two variables into one. Reported-by: Aleksei Zakharov <zakharov.a.g@yandex.ru> Reported-by: Sha Zhang <zhangsha.zhang@huawei.com> Cc: Mahesh Bandewar <maheshb@google.com> Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2019-11-02 The following pull-request contains BPF updates for your *net* tree. We've added 6 non-merge commits during the last 6 day(s) which contain a total of 8 files changed, 35 insertions(+), 9 deletions(-). The main changes are: 1) Fix ppc BPF JIT's tail call implementation by performing a second pass to gather a stable JIT context before opcode emission, from Eric Dumazet. 2) Fix build of BPF samples sys_perf_event_open() usage to compiled out unavailable test_attr__{enabled,open} checks. Also fix potential overflows in bpf_map_{area_alloc,charge_init} on 32 bit archs, from Björn Töpel. 3) Fix narrow loads of bpf_sysctl context fields with offset > 0 on big endian archs like s390x and also improve the test coverage, from Ilya Leoshkevich. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05Merge branch 'nvme-5.4-rc7' of git://git.infradead.org/nvme into for-linusJens Axboe
Pull NVMe fixes from Keith: "We have a few late nvme fixes for a couple device removal kernel crashes, and a compat fix for a new ioctl introduced during this merge window." * 'nvme-5.4-rc7' of git://git.infradead.org/nvme: nvme: change nvme_passthru_cmd64 to explicitly mark rsvd nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths nvme-rdma: fix a segmentation fault during module unload
2019-11-05taprio: fix panic while hw offload sched list swapIvan Khoronzhuk
Don't swap oper and admin schedules too early, it's not correct and causes crash. Steps to reproduce: 1) tc qdisc replace dev eth0 parent root handle 100 taprio \ num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \ queues 1@0 1@1 1@2 \ base-time $SOME_BASE_TIME \ sched-entry S 01 80000 \ sched-entry S 02 15000 \ sched-entry S 04 40000 \ flags 2 2) tc qdisc replace dev eth0 parent root handle 100 taprio \ base-time $SOME_BASE_TIME \ sched-entry S 01 90000 \ sched-entry S 02 20000 \ sched-entry S 04 40000 \ flags 2 3) tc qdisc replace dev eth0 parent root handle 100 taprio \ base-time $SOME_BASE_TIME \ sched-entry S 01 150000 \ sched-entry S 02 200000 \ sched-entry S 04 40000 \ flags 2 Do 2 3 2 .. steps more times if not happens and observe: [ 305.832319] Unable to handle kernel write to read-only memory at virtual address ffff0000087ce7f0 [ 305.910887] CPU: 0 PID: 0 Comm: swapper/0 Not tainted [ 305.919306] Hardware name: Texas Instruments AM654 Base Board (DT) [...] [ 306.017119] x1 : ffff800848031d88 x0 : ffff800848031d80 [ 306.022422] Call trace: [ 306.024866] taprio_free_sched_cb+0x4c/0x98 [ 306.029040] rcu_process_callbacks+0x25c/0x410 [ 306.033476] __do_softirq+0x10c/0x208 [ 306.037132] irq_exit+0xb8/0xc8 [ 306.040267] __handle_domain_irq+0x64/0xb8 [ 306.044352] gic_handle_irq+0x7c/0x178 [ 306.048092] el1_irq+0xb0/0x128 [ 306.051227] arch_cpu_idle+0x10/0x18 [ 306.054795] do_idle+0x120/0x138 [ 306.058015] cpu_startup_entry+0x20/0x28 [ 306.061931] rest_init+0xcc/0xd8 [ 306.065154] start_kernel+0x3bc/0x3e4 [ 306.068810] Code: f2fbd5b7 f2fbd5b6 d503201f f9400422 (f9000662) [ 306.074900] ---[ end trace 96c8e2284a9d9d6e ]--- [ 306.079507] Kernel panic - not syncing: Fatal exception in interrupt [ 306.085847] SMP: stopping secondary CPUs [ 306.089765] Kernel Offset: disabled Try to explain one of the possible crash cases: The "real" admin list is assigned when admin_sched is set to new_admin, it happens after "swap", that assigns to oper_sched NULL. Thus if call qdisc show it can crash. Farther, next second time, when sched list is updated, the admin_sched is not NULL and becomes the oper_sched, previous oper_sched was NULL so just skipped. But then admin_sched is assigned new_admin, but schedules to free previous assigned admin_sched (that already became oper_sched). Farther, next third time, when sched list is updated, while one more swap, oper_sched is not null, but it was happy to be freed already (while prev. admin update), so while try to free oper_sched the kernel panic happens at taprio_free_sched_cb(). So, move the "swap emulation" where it should be according to function comment from code. Fixes: 9c66d15646760e ("taprio: Add support for hardware offloading") Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05Merge tag 'linux-can-fixes-for-5.4-20191105' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2019-11-05 this is a pull request of 33 patches for net/master. In the first patch Wen Yang's patch adds a missing of_node_put() to CAN device infrastructure. Navid Emamdoost's patch for the gs_usb driver fixes a memory leak in the gs_can_open() error path. Johan Hovold provides two patches, one for the mcba_usb, the other for the usb_8dev driver. Both fix a use-after-free after USB-disconnect. Joakim Zhang's patch improves the flexcan driver, the ECC mechanism is now completely disabled instead of masking the interrupts. The next three patches all target the peak_usb driver. Stephane Grosjean's patch fixes a potential out-of-sync while decoding packets, Johan Hovold's patch fixes a slab info leak, Jeroen Hofstee's patch adds missing reporting of bus off recovery events. Followed by three patches for the c_can driver. Kurt Van Dijck's patch fixes detection of potential missing status IRQs, Jeroen Hofstee's patches add a chip reset on open and add missing reporting of bus off recovery events. Appana Durga Kedareswara rao's patch for the xilinx driver fixes the flags field initialization for axi CAN. The next seven patches target the rx-offload helper, they are by me and Jeroen Hofstee. The error handling in case of a queue overflow is fixed removing a memory leak. Further the error handling in case of queue overflow and skb OOM is cleaned up. The next two patches are by me and target the flexcan and ti_hecc driver. In case of a error during can_rx_offload_queue_sorted() the error counters in the drivers are incremented. Jeroen Hofstee provides 6 patches for the ti_hecc driver, which properly stop the device in ifdown, improve the rx-offload support (which hit mainline in v5.4-rc1), and add missing FIFO overflow and state change reporting. The following four patches target the j1939 protocol. Colin Ian King's patch fixes a memory leak in the j1939_sk_errqueue() handling. Three patches by Oleksij Rempel fix a memory leak on socket release and fix the EOMA packet in the transport protocol. Timo Schlüßler's patch fixes a potential race condition in the mcp251x driver on after suspend. The last patch is by Yegor Yefremov and updates the SPDX-License-Identifier to v3.0. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06nvme: change nvme_passthru_cmd64 to explicitly mark rsvdCharles Machalow
Changing nvme_passthru_cmd64 to add a field: rsvd2. This field is an explicit marker for the padding space added on certain platforms as a result of the enlargement of the result field from 32 bit to 64 bits in size, and fixes differences in struct size when using compat ioctl for 32-bit binaries on 64-bit architecture. Fixes: 65e68edce0db ("nvme: allow 64-bit results in passthru commands") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Charles Machalow <csm10495@gmail.com> [changelog] Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-05drm/i915/gen8+: Add RC6 CTX corruption WAImre Deak
In some circumstances the RC6 context can get corrupted. We can detect this and take the required action, that is disable RC6 and runtime PM. The HW recovers from the corrupted state after a system suspend/resume cycle, so detect the recovery and re-enable RC6 and runtime PM. v2: rebase (Mika) v3: - Move intel_suspend_gt_powersave() to the end of the GEM suspend sequence. - Add commit message. v4: - Rebased on intel_uncore_forcewake_put(i915->uncore, ...) API change. v5: rebased on gem/gt split (Mika) Signed-off-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2019-11-05drm/i915: Lower RM timeout to avoid DSI hard hangsUma Shankar
In BXT/APL, device 2 MMIO reads from MIPI controller requires its PLL to be turned ON. When MIPI PLL is turned off (MIPI Display is not active or connected), and someone (host or GT engine) tries to read MIPI registers, it causes hard hang. This is a hardware restriction or limitation. Driver by itself doesn't read MIPI registers when MIPI display is off. But any userspace application can submit unprivileged batch buffer for execution. In that batch buffer there can be mmio reads. And these reads are allowed even for unprivileged applications. If these register reads are for MIPI DSI controller and MIPI display is not active during that time, then the MMIO read operation causes system hard hang and only way to recover is hard reboot. A genuine process/application won't submit batch buffer like this and doesn't cause any issue. But on a compromised system, a malign userspace process/app can generate such batch buffer and can trigger system hard hang (denial of service attack). The fix is to lower the internal MMIO timeout value to an optimum value of 950us as recommended by hardware team. If the timeout is beyond 1ms (which will hit for any value we choose if MMIO READ on a DSI specific register is performed without PLL ON), it causes the system hang. But if the timeout value is lower than it will be below the threshold (even if timeout happens) and system will not get into a hung state. This will avoid a system hang without losing any programming or GT interrupts, taking the worst case of lowest CDCLK frequency and early DC5 abort into account. Signed-off-by: Uma Shankar <uma.shankar@intel.com> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
2019-11-05drm/i915/cmdparser: Ignore Length operands during command matchingJon Bloomfield
Some of the gen instruction macros (e.g. MI_DISPLAY_FLIP) have the length directly encoded in them. Since these are used directly in the tables, the Length becomes part of the comparison used for matching during parsing. Thus, if the cmd being parsed has a different length to that in the table, it is not matched and the cmd is accepted via the default variable length path. Fix by masking out everything except the Opcode in the cmd tables Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
2019-11-05drm/i915/cmdparser: Add support for backward jumpsJon Bloomfield
To keep things manageable, the pre-gen9 cmdparser does not attempt to track any form of nested BB_START's. This did not prevent usermode from using nested starts, or even chained batches because the cmdparser is not strictly enforced pre gen9. Instead, the existence of a nested BB_START would cause the batch to be emitted in insecure mode, and any privileged capabilities would not be available. For Gen9, the cmdparser becomes mandatory (for BCS at least), and so not providing any form of nested BB_START support becomes overly restrictive. Any such batch will simply not run. We make heavy use of backward jumps in igt, and it is much easier to add support for this restricted subset of nested jumps, than to rewrite the whole of our test suite to avoid them. Add the required logic to support limited backward jumps, to instructions that have already been validated by the parser. Note that it's not sufficient to simply approve any BB_START that jumps backwards in the buffer because this would allow an attacker to embed a rogue instruction sequence within the operand words of a harmless instruction (say LRI) and jump to that. We introduce a bit array to track every instr offset successfully validated, and test the target of BB_START against this. If the target offset hits, it is re-written to the same offset in the shadow buffer and the BB_START cmd is allowed. Note: This patch deliberately ignores checkpatch issues in the cmdtables, in order to match the style of the surrounding code. We'll correct the entire file in one go in a later patch. v2: set dispatch secure late (Mika) v3: rebase (Mika) v4: Clear whitelist on each parse Minor review updates (Chris) v5: Correct backward jump batching v6: fix compilation error due to struct eb shuffle (Mika) Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
2019-11-05drm/i915/cmdparser: Use explicit goto for error pathsJon Bloomfield
In the next patch we will be adding a second valid termination condition which will require a small amount of refactoring to share logic with the BB_END case. Refactor all error conditions to jump to a dedicated exit path, with 'break' reserved only for a successful parse. Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
2019-11-05drm/i915: Add gen9 BCS cmdparsingJon Bloomfield
For gen9 we enable cmdparsing on the BCS ring, specifically to catch inadvertent accesses to sensitive registers Unlike gen7/hsw, we use the parser only to block certain registers. We can rely on h/w to block restricted commands, so the command tables only provide enough info to allow the parser to delineate each command, and identify commands that access registers. Note: This patch deliberately ignores checkpatch issues in favour of matching the style of the surrounding code. We'll correct the entire file in one go in a later patch. v3: rebase (Mika) v4: Add RING_TIMESTAMP registers to whitelist (Jon) Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
2019-11-05drm/i915: Allow parsing of unsized batchesJon Bloomfield
In "drm/i915: Add support for mandatory cmdparsing" we introduced the concept of mandatory parsing. This allows the cmdparser to be invoked even when user passes batch_len=0 to the execbuf ioctl's. However, the cmdparser needs to know the extents of the buffer being scanned. Refactor the code to ensure the cmdparser uses the actual object size, instead of the incoming length, if user passes 0. Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
2019-11-05drm/i915: Support ro ppgtt mapped cmdparser shadow buffersJon Bloomfield
For Gen7, the original cmdparser motive was to permit limited use of register read/write instructions in unprivileged BB's. This worked by copying the user supplied bb to a kmd owned bb, and running it in secure mode, from the ggtt, only if the scanner finds no unsafe commands or registers. For Gen8+ we can't use this same technique because running bb's from the ggtt also disables access to ppgtt space. But we also do not actually require 'secure' execution since we are only trying to reduce the available command/register set. Instead we will copy the user buffer to a kmd owned read-only bb in ppgtt, and run in the usual non-secure mode. Note that ro pages are only supported by ppgtt (not ggtt), but luckily that's exactly what we need. Add the required paths to map the shadow buffer to ppgtt ro for Gen8+ v2: IS_GEN7/IS_GEN (Mika) v3: rebase v4: rebase v5: rebase Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
2019-11-05drm/i915: Add support for mandatory cmdparsingJon Bloomfield
The existing cmdparser for gen7 can be bypassed by specifying batch_len=0 in the execbuf call. This is safe because bypassing simply reduces the cmd-set available. In a later patch we will introduce cmdparsing for gen9, as a security measure, which must be strictly enforced since without it we are vulnerable to DoS attacks. Introduce the concept of 'required' cmd parsing that cannot be bypassed by submitting zero-length bb's. v2: rebase (Mika) v2: rebase (Mika) v3: fix conflict on engine flags (Mika) Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
2019-11-05drm/i915: Remove Master tables from cmdparserJon Bloomfield
The previous patch has killed support for secure batches on gen6+, and hence the cmdparsers master tables are now dead code. Remove them. Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
2019-11-05drm/i915: Disable Secure Batches for gen6+Jon Bloomfield
Retroactively stop reporting support for secure batches through the api for gen6+ so that older binaries trigger the fallback path instead. Older binaries use secure batches pre gen6 to access resources that are not available to normal usermode processes. However, all known userspace explicitly checks for HAS_SECURE_BATCHES before relying on the secure batch feature. Since there are no known binaries relying on this for newer gens we can kill secure batches from gen6, via I915_PARAM_HAS_SECURE_BATCHES. v2: rebase (Mika) v3: rebase (Mika) Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
2019-11-05drm/i915: Rename gen7 cmdparser tablesJon Bloomfield
We're about to introduce some new tables for later gens, and the current naming for the gen7 tables will no longer make sense. v2: rebase Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>
2019-11-05ALSA: hda: hdmi - add Tigerlake supportKai Vehmanen
Add Tigerlake HDMI codec support. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205379 BugLink: https://bugs.freedesktop.org/show_bug.cgi?id=112171 Cc: Pan Xiuli <xiuli.pan@linux.intel.com> Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Link: https://lore.kernel.org/r/20191105161053.22958-1-kai.vehmanen@linux.intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-11-05ASoC: max98373: replace gpio_request with devm_gpio_requestYong Zhi
Use devm_gpio_request() to automatic unroll when fails and avoid resource leaks at error paths. Signed-off-by: Yong Zhi <yong.zhi@intel.com> Link: https://lore.kernel.org/r/1572905399-22402-1-git-send-email-yong.zhi@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-11-05ASoC: stm32: sai: add restriction on mmap supportOlivier Moysan
Do not support mmap in S/PDIF mode. In S/PDIF mode the buffer has to be copied, to allow the channel status bits insertion. Signed-off-by: Olivier Moysan <olivier.moysan@st.com> Link: https://lore.kernel.org/r/20191104133654.28750-1-olivier.moysan@st.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-11-05Merge tag 'for-linus-2019-11-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull clone3 stack argument update from Christian Brauner: "This changes clone3() to do basic stack validation and to set up the stack depending on whether or not it is growing up or down. With clone3() the expectation is now very simply that the .stack argument points to the lowest address of the stack and that .stack_size specifies the initial stack size. This is diferent from legacy clone() where the "stack" argument had to point to the lowest or highest address of the stack depending on the architecture. clone3() was released with 5.3. Currently, it is not documented and very unclear to userspace how the stack and stack_size argument have to be passed. After talking to glibc folks we concluded that changing clone3() to determine stack direction and doing basic validation is the right course of action. Note, this is a potentially user visible change. In the very unlikely case, that it breaks someone's use-case we will revert. (And then e.g. place the new behavior under an appropriate flag.) Note that passing an empty stack will continue working just as before. Breaking someone's use-case is very unlikely. Neither glibc nor musl currently expose a wrapper for clone3(). There is currently also no real motivation for anyone to use clone3() directly. First, because using clone{3}() with stacks requires some assembly (see glibc and musl). Second, because it does not provide features that legacy clone() doesn't. New features for clone3() will first happen in v5.5 which is why v5.4 is still a good time to try and make that change now and backport it to v5.3. I did a codesearch on https://codesearch.debian.net, github, and gitlab and could not find any software currently relying directly on clone3(). I expect this to change once we land CLONE_CLEAR_SIGHAND which was a request coming from glibc at which point they'll likely start using it" * tag 'for-linus-2019-11-05' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: clone3: validate stack arguments
2019-11-05Merge tag 'gpio-v5.4-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "More GPIO fixes! We found a late regression in the Intel Merrifield driver. Oh well. We fixed it up. - Fix a build error in the tools used for kselftest - A series of reverts to bring the Intel Merrifield back to working. We will likely unrevert the reverts for v5.5 but we can't have v5.4 broken" * tag 'gpio-v5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: Revert "gpio: merrifield: Pass irqchip when adding gpiochip" Revert "gpio: merrifield: Restore use of irq_base" Revert "gpio: merrifield: Move hardware initialization to callback" tools: gpio: Use !building_out_of_srctree to determine srctree
2019-11-05watchdog: bd70528: Add MODULE_ALIAS to allow module auto loadingMatti Vaittinen
The bd70528 watchdog driver is probed by MFD driver. Add MODULE_ALIAS in order to allow udev to load the module when MFD sub-device cell for watchdog is added. Fixes: bbc88a0ec9f37 ("watchdog: bd70528: Initial support for ROHM BD70528 watchdog block") Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2019-11-05watchdog: imx_sc_wdt: Pretimeout should follow SCU firmware formatAnson Huang
SCU firmware calculates pretimeout based on current time stamp instead of watchdog timeout stamp, need to convert the pretimeout to SCU firmware's timeout value. Fixes: 15f7d7fc5542 ("watchdog: imx_sc: Add pretimeout support") Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2019-11-05watchdog: meson: Fix the wrong value of left timeXingyu Chen
The left time value is wrong when we get it by sysfs. The left time value should be equal to preset timeout value minus elapsed time value. According to the Meson-GXB/GXL datasheets which can be found at [0], the timeout value is saved to BIT[0-15] of the WATCHDOG_TCNT, and elapsed time value is saved to BIT[16-31] of the WATCHDOG_TCNT. [0]: http://linux-meson.com Fixes: 683fa50f0e18 ("watchdog: Add Meson GXBB Watchdog Driver") Signed-off-by: Xingyu Chen <xingyu.chen@amlogic.com> Acked-by: Neil Armstrong <narmstrong@baylibre.com> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2019-11-05watchdog: pm8916_wdt: fix pretimeout registration flowJorge Ramirez-Ortiz
When an IRQ is present in the dts, the probe function shall fail if the interrupt can not be registered. The probe function shall also be retried if getting the irq is being deferred. Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2019-11-05watchdog: cpwd: fix build regressionArnd Bergmann
The compat_ptr_ioctl() infrastructure did not make it into linux-5.4, so cpwd now fails to build. Fix it by using an open-coded version. Fixes: 68f28b01fb9e ("watchdog: cpwd: use generic compat_ptr_ioctl") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2019-11-06nvme-multipath: fix crash in nvme_mpath_clear_ctrl_pathsAnton Eidelman
nvme_mpath_clear_ctrl_paths() iterates through the ctrl->namespaces list while holding ctrl->scan_lock. This does not seem to be the correct way of protecting from concurrent list modification. Specifically, nvme_scan_work() sorts ctrl->namespaces AFTER unlocking scan_lock. This may result in the following (rare) crash in ctrl disconnect during scan_work: BUG: kernel NULL pointer dereference, address: 0000000000000050 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 3995 Comm: nvme 5.3.5-050305-generic RIP: 0010:nvme_mpath_clear_current_path+0xe/0x90 [nvme_core] ... Call Trace: nvme_mpath_clear_ctrl_paths+0x3c/0x70 [nvme_core] nvme_remove_namespaces+0x35/0xe0 [nvme_core] nvme_do_delete_ctrl+0x47/0x90 [nvme_core] nvme_sysfs_delete+0x49/0x60 [nvme_core] dev_attr_store+0x17/0x30 sysfs_kf_write+0x3e/0x50 kernfs_fop_write+0x11e/0x1a0 __vfs_write+0x1b/0x40 vfs_write+0xb9/0x1a0 ksys_write+0x67/0xe0 __x64_sys_write+0x1a/0x20 do_syscall_64+0x5a/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f8d02bfb154 Fix: After taking scan_lock in nvme_mpath_clear_ctrl_paths() down_read(&ctrl->namespaces_rwsem) as well to make list traversal safe. This will not cause deadlocks because taking scan_lock never happens while holding the namespaces_rwsem. Moreover, scan work downs namespaces_rwsem in the same order. Alternative: sort ctrl->namespaces in nvme_scan_work() while still holding the scan_lock. This would leave nvme_mpath_clear_ctrl_paths() without correct protection against ctrl->namespaces modification by anyone other than scan_work. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-06nvme-rdma: fix a segmentation fault during module unloadMax Gurtovoy
In case there are controllers that are not associated with any RDMA device (e.g. during unsuccessful reconnection) and the user will unload the module, these controllers will not be freed and will access already freed memory. The same logic appears in other fabric drivers as well. Fixes: 87fd125344d6 ("nvme-rdma: remove redundant reference between ib_device and tagset") Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-05clone3: validate stack argumentsChristian Brauner
Validate the stack arguments and setup the stack depening on whether or not it is growing down or up. Legacy clone() required userspace to know in which direction the stack is growing and pass down the stack pointer appropriately. To make things more confusing microblaze uses a variant of the clone() syscall selected by CONFIG_CLONE_BACKWARDS3 that takes an additional stack_size argument. IA64 has a separate clone2() syscall which also takes an additional stack_size argument. Finally, parisc has a stack that is growing upwards. Userspace therefore has a lot nasty code like the following: #define __STACK_SIZE (8 * 1024 * 1024) pid_t sys_clone(int (*fn)(void *), void *arg, int flags, int *pidfd) { pid_t ret; void *stack; stack = malloc(__STACK_SIZE); if (!stack) return -ENOMEM; #ifdef __ia64__ ret = __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, pidfd); #elif defined(__parisc__) /* stack grows up */ ret = clone(fn, stack, flags | SIGCHLD, arg, pidfd); #else ret = clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, pidfd); #endif return ret; } or even crazier variants such as [3]. With clone3() we have the ability to validate the stack. We can check that when stack_size is passed, the stack pointer is valid and the other way around. We can also check that the memory area userspace gave us is fine to use via access_ok(). Furthermore, we probably should not require userspace to know in which direction the stack is growing. It is easy for us to do this in the kernel and I couldn't find the original reasoning behind exposing this detail to userspace. /* Intentional user visible API change */ clone3() was released with 5.3. Currently, it is not documented and very unclear to userspace how the stack and stack_size argument have to be passed. After talking to glibc folks we concluded that trying to change clone3() to setup the stack instead of requiring userspace to do this is the right course of action. Note, that this is an explicit change in user visible behavior we introduce with this patch. If it breaks someone's use-case we will revert! (And then e.g. place the new behavior under an appropriate flag.) Breaking someone's use-case is very unlikely though. First, neither glibc nor musl currently expose a wrapper for clone3(). Second, there is no real motivation for anyone to use clone3() directly since it does not provide features that legacy clone doesn't. New features for clone3() will first happen in v5.5 which is why v5.4 is still a good time to try and make that change now and backport it to v5.3. Searches on [4] did not reveal any packages calling clone3(). [1]: https://lore.kernel.org/r/CAG48ez3q=BeNcuVTKBN79kJui4vC6nw0Bfq6xc-i0neheT17TA@mail.gmail.com [2]: https://lore.kernel.org/r/20191028172143.4vnnjpdljfnexaq5@wittgenstein [3]: https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa3b338/src/basic/raw-clone.h#L31 [4]: https://codesearch.debian.net Fixes: 7f192e3cd316 ("fork: add clone3") Cc: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-api@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> # 5.3 Cc: GNU C Library <libc-alpha@sourceware.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/r/20191031113608.20713-1-christian.brauner@ubuntu.com
2019-11-05ceph: don't allow copy_file_range when stripe_count != 1Luis Henriques
copy_file_range tries to use the OSD 'copy-from' operation, which simply performs a full object copy. Unfortunately, the implementation of this system call assumes that stripe_count is always set to 1 and doesn't take into account that the data may be striped across an object set. If the file layout has stripe_count different from 1, then the destination file data will be corrupted. For example: Consider a 8 MiB file with 4 MiB object size, stripe_count of 2 and stripe_size of 2 MiB; the first half of the file will be filled with 'A's and the second half will be filled with 'B's: 0 4M 8M Obj1 Obj2 +------+------+ +----+ +----+ file: | AAAA | BBBB | | AA | | AA | +------+------+ |----| |----| | BB | | BB | +----+ +----+ If we copy_file_range this file into a new file (which needs to have the same file layout!), then it will start by copying the object starting at file offset 0 (Obj1). And then it will copy the object starting at file offset 4M -- which is Obj1 again. Unfortunately, the solution for this is to not allow remote object copies to be performed when the file layout stripe_count is not 1 and simply fallback to the default (VFS) copy_file_range implementation. Cc: stable@vger.kernel.org Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-11-05ceph: don't try to handle hashed dentries in non-O_CREAT atomic_openJeff Layton
If ceph_atomic_open is handed a !d_in_lookup dentry, then that means that it already passed d_revalidate so we *know* that it's negative (or at least was very recently). Just return -ENOENT in that case. This also addresses a subtle bug in dentry handling. Non-O_CREAT opens call atomic_open with the parent's i_rwsem shared, but calling d_splice_alias on a hashed dentry requires the exclusive lock. If ceph_atomic_open receives a hashed, negative dentry on a non-O_CREAT open, and another client were to race in and create the file before we issue our OPEN, ceph_fill_trace could end up calling d_splice_alias on the dentry with the new inode with insufficient locks. Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>