summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-06-19KVM: x86: Modify struct kvm_nested_state to have explicit fields for dataLiran Alon
Improve the KVM_{GET,SET}_NESTED_STATE structs by detailing the format of VMX nested state data in a struct. In order to avoid changing the ioctl values of KVM_{GET,SET}_NESTED_STATE, there is a need to preserve sizeof(struct kvm_nested_state). This is done by defining the data struct as "data.vmx[0]". It was the most elegant way I found to preserve struct size while still keeping struct readable and easy to maintain. It does have a misfortunate side-effect that now it has to be accessed as "data.vmx[0]" rather than just "data.vmx". Because we are already modifying these structs, I also modified the following: * Define the "format" field values as macros. * Rename vmcs_pa to vmcs12_pa for better readability. Signed-off-by: Liran Alon <liran.alon@oracle.com> [Remove SVM stubs, add KVM_STATE_NESTED_VMX_VMCS12_SIZE. - Paolo] Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-19fanotify: update connector fsid cache on add markAmir Goldstein
When implementing connector fsid cache, we only initialized the cache when the first mark added to object was added by FAN_REPORT_FID group. We forgot to update conn->fsid when the second mark is added by FAN_REPORT_FID group to an already attached connector without fsid cache. Reported-and-tested-by: syzbot+c277e8e2f46414645508@syzkaller.appspotmail.com Fixes: 77115225acc6 ("fanotify: cache fsid in fsnotify_mark_connector") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-19quota: fix a problem about transfer quotayangerkun
Run below script as root, dquot_add_space will return -EDQUOT since __dquot_transfer call dquot_add_space with flags=0, and dquot_add_space think it's a preallocation. Fix it by set flags as DQUOT_SPACE_WARN. mkfs.ext4 -O quota,project /dev/vdb mount -o prjquota /dev/vdb /mnt setquota -P 23 1 1 0 0 /dev/vdb dd if=/dev/zero of=/mnt/test-file bs=4K count=1 chattr -p 23 test-file Fixes: 7b9ca4c61bc2 ("quota: Reduce contention on dq_data_lock") Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-19drm/i915: Don't clobber M/N values during fastset checkVille Syrjälä
We're now calling intel_pipe_config_compare(..., true) uncoditionally which means we're always going clobber the calculated M/N values with the old values if the fuzzy M/N check passes. That causes problems because the fuzzy check allows for a huge difference in the values. I'm actually tempted to just make the M/N checks exact, but that might prevent fastboot from kicking in when people want it. So for now let's overwrite the computed values with the old values only if decide to skip the modeset. v2: Copy has_drrs along with M/N M2/N2 values Cc: stable@vger.kernel.org Cc: Blubberbub@protonmail.com Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Hans de Goede <hdegoede@redhat.com> Tested-by: Blubberbub@protonmail.com Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110782 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110675 Fixes: d19f958db23c ("drm/i915: Enable fastset for non-boot modesets.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190612172423.25231-1-ville.syrjala@linux.intel.com Reviewed-by: Imre Deak <imre.deak@intel.com> (cherry picked from commit f0521558a2a89d58a08745e225025d338572e60a) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190619120929.4057-1-ville.syrjala@linux.intel.com
2019-06-19powerpc: enable a 30-bit ZONE_DMA for 32-bit pmacChristoph Hellwig
With the strict dma mask checking introduced with the switch to the generic DMA direct code common wifi chips on 32-bit powerbooks stopped working. Add a 30-bit ZONE_DMA to the 32-bit pmac builds to allow them to reliably allocate dma coherent memory. Fixes: 65a21b71f948 ("powerpc/dma: remove dma_nommu_dma_supported") Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19ovl: make i_ino consistent with st_ino in more casesAmir Goldstein
Relax the condition that overlayfs supports nfs export, to require that i_ino is consistent with st_ino/d_ino. It is enough to require that st_ino and d_ino are consistent. This fixes the failure of xfstest generic/504, due to mismatch of st_ino to inode number in the output of /proc/locks. Fixes: 12574a9f4c9c ("ovl: consistent i_ino for non-samefs with xino") Cc: <stable@vger.kernel.org> # v4.19 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-06-19Merge tag 'gvt-fixes-2019-06-19' of https://github.com/intel/gvt-linux into ↵Jani Nikula
drm-intel-fixes gvt-fixes-2019-06-19 - Fix reserved PVINFO register write (Weinan) Signed-off-by: Jani Nikula <jani.nikula@intel.com> From: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190619062240.GM9684@zhen-hp.sh.intel.com
2019-06-18scsi: qla2xxx: Fix hardlockup in abort command during driver removeArun Easi
[436194.555537] NMI watchdog: Watchdog detected hard LOCKUP on cpu 5 [436194.555558] RIP: 0010:native_queued_spin_lock_slowpath+0x63/0x1e0 [436194.555563] Call Trace: [436194.555564] _raw_spin_lock_irqsave+0x30/0x40 [436194.555564] qla24xx_async_abort_command+0x29/0xd0 [qla2xxx] [436194.555565] qla24xx_abort_command+0x208/0x2d0 [qla2xxx] [436194.555565] __qla2x00_abort_all_cmds+0x16b/0x290 [qla2xxx] [436194.555565] qla2x00_abort_all_cmds+0x42/0x60 [qla2xxx] [436194.555566] qla2x00_abort_isp_cleanup+0x2bd/0x3a0 [qla2xxx] [436194.555566] qla2x00_remove_one+0x1ad/0x360 [qla2xxx] [436194.555566] pci_device_remove+0x3b/0xb0 Fixes: 219d27d7147e (scsi: qla2xxx: Fix race conditions in the code for aborting SCSI commands) Cc: stable@vger.kernel.org # 5.2 Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18scsi: ufs: Avoid runtime suspend possibly being blocked foreverStanley Chu
UFS runtime suspend can be triggered after pm_runtime_enable() is invoked in ufshcd_pltfrm_init(). However if the first runtime suspend is triggered before binding ufs_hba structure to ufs device structure via platform_set_drvdata(), then UFS runtime suspend will be no longer triggered in the future because its dev->power.runtime_error was set in the first triggering and does not have any chance to be cleared. To be more clear, dev->power.runtime_error is set if hba is NULL in ufshcd_runtime_suspend() which returns -EINVAL to rpm_callback() where dev->power.runtime_error is set as -EINVAL. In this case, any future rpm_suspend() for UFS device fails because rpm_check_suspend_allowed() fails due to non-zero dev->power.runtime_error. To resolve this issue, make sure the first UFS runtime suspend get valid "hba" in ufshcd_runtime_suspend(): Enable UFS runtime PM only after hba is successfully bound to UFS device structure. Fixes: 62694735ca95 ([SCSI] ufs: Add runtime PM support for UFS host controller driver) Cc: stable@vger.kernel.org Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18scsi: qedi: update driver version to 8.37.0.20Nilesh Javali
Update qedi driver version to 8.37.0.20 Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18scsi: qedi: Check targetname while finding boot target informationNilesh Javali
The kernel panic was observed during iSCSI discovery via offload with below call trace, [ 2115.646901] BUG: unable to handle kernel NULL pointer dereference at (null) [ 2115.646909] IP: [<ffffffffacf7f0cc>] strncmp+0xc/0x60 [ 2115.646927] PGD 0 [ 2115.646932] Oops: 0000 [#1] SMP [ 2115.647107] CPU: 24 PID: 264 Comm: kworker/24:1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1 [ 2115.647133] Workqueue: slowpath-13:00. qed_slowpath_task [qed] [ 2115.647135] task: ffff8d66af80b0c0 ti: ffff8d66afb80000 task.ti: ffff8d66afb80000 [ 2115.647136] RIP: 0010:[<ffffffffacf7f0cc>] [<ffffffffacf7f0cc>] strncmp+0xc/0x60 [ 2115.647141] RSP: 0018:ffff8d66afb83c68 EFLAGS: 00010206 [ 2115.647143] RAX: 0000000000000001 RBX: 0000000000000007 RCX: 000000000000000a [ 2115.647144] RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffff8d632b3ba040 [ 2115.647145] RBP: ffff8d66afb83c68 R08: 0000000000000000 R09: 000000000000ffff [ 2115.647147] R10: 0000000000000007 R11: 0000000000000800 R12: ffff8d66a30007a0 [ 2115.647148] R13: ffff8d66747a3c10 R14: ffff8d632b3ba000 R15: ffff8d66747a32f8 [ 2115.647149] FS: 0000000000000000(0000) GS:ffff8d66aff00000(0000) knlGS:0000000000000000 [ 2115.647151] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2115.647152] CR2: 0000000000000000 CR3: 0000000509610000 CR4: 00000000007607e0 [ 2115.647153] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2115.647154] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2115.647155] PKRU: 00000000 [ 2115.647157] Call Trace: [ 2115.647165] [<ffffffffc0634cc5>] qedi_get_protocol_tlv_data+0x2c5/0x510 [qedi] [ 2115.647184] [<ffffffffc05968f5>] ? qed_mfw_process_tlv_req+0x245/0xbe0 [qed] [ 2115.647195] [<ffffffffc05496cb>] qed_mfw_fill_tlv_data+0x4b/0xb0 [qed] [ 2115.647206] [<ffffffffc0596911>] qed_mfw_process_tlv_req+0x261/0xbe0 [qed] [ 2115.647215] [<ffffffffacce0e8e>] ? dequeue_task_fair+0x41e/0x660 [ 2115.647221] [<ffffffffacc2a59e>] ? __switch_to+0xce/0x580 [ 2115.647230] [<ffffffffc0546013>] qed_slowpath_task+0xa3/0x160 [qed] [ 2115.647278] RIP [<ffffffffacf7f0cc>] strncmp+0xc/0x60 Fix kernel panic by validating the session targetname before providing TLV data and confirming the presence of boot targets. Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18RDMA/mlx5: Enable decap and packet reformat on FDBMaor Gottlieb
If FDB flow tables support decap operation, enable it on creation, This allows to perform decapsulation of tunnelled packets by steering rules. If FDB flow tables support reformat operation, enable it on creation as well. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18RDMA/mlx5: Consider eswitch encap modeMaor Gottlieb
When flow steering is created, then the encap support should consider the eswitch encap mode. If the eswitch flow table (FDB) supports encap then it shouldn't be supported on NIC RX flow tables. Fixes: 4adda1122c490 ('RDMA/mlx5: Enable decap and packet reformat on flow tables') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18Merge remote-tracking branch 'mlx5-next/mlx5-next' into HEADDoug Ledford
Take mlx5-next so we can take a dependent two patch series next. Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18RDMA/odp: Fix missed unlock in non-blocking invalidate_startJason Gunthorpe
If invalidate_start returns with EAGAIN then the umem_rwsem needs to be unlocked as no invalidate_end will be called. Cc: <stable@vger.kernel.org> Fixes: ca748c39ea3f ("RDMA/umem: Get rid of per_mm->notifier_count") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18IB/hfi1: Spelling s/statisfied/satisfied/Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18RDMA: Report available cdevs through RDMA_NLDEV_CMD_GET_CHARDEVJason Gunthorpe
Update the struct ib_client for all modules exporting cdevs related to the ibdevice to also implement RDMA_NLDEV_CMD_GET_CHARDEV. All cdevs are now autoloadable and discoverable by userspace over netlink instead of relying on sysfs. uverbs also exposes the DRIVER_ID for drivers that are able to support driver id binding in rdma-core. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoloadJason Gunthorpe
Allow userspace to issue a netlink query against the ib_device for something like "uverbs" and get back the char dev name, inode major/minor, and interface ABI information for "uverbs0". Since we are now in netlink this can also trigger a module autoload to make the uverbs device come into existence. Largely this will let us replace searching and reading inside sysfs to setup devices, and provides an alternative (using driver_id) to device name based provider binding for things like rxe. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Module autoload for masquerade and redirection does not work. 2) Leak in unqueued packets in nf_ct_frag6_queue(). Ignore duplicated fragments, pretend they are placed into the queue. Patches from Guillaume Nault. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18hvsock: fix epollout hang from race conditionSunil Muthuswamy
Currently, hvsock can enter into a state where epoll_wait on EPOLLOUT will not return even when the hvsock socket is writable, under some race condition. This can happen under the following sequence: - fd = socket(hvsocket) - fd_out = dup(fd) - fd_in = dup(fd) - start a writer thread that writes data to fd_out with a combination of epoll_wait(fd_out, EPOLLOUT) and - start a reader thread that reads data from fd_in with a combination of epoll_wait(fd_in, EPOLLIN) - On the host, there are two threads that are reading/writing data to the hvsocket stack: hvs_stream_has_space hvs_notify_poll_out vsock_poll sock_poll ep_poll Race condition: check for epollout from ep_poll(): assume no writable space in the socket hvs_stream_has_space() returns 0 check for epollin from ep_poll(): assume socket has some free space < HVS_PKT_LEN(HVS_SEND_BUF_SIZE) hvs_stream_has_space() will clear the channel pending send size host will not notify the guest because the pending send size has been cleared and so the hvsocket will never mark the socket writable Now, the EPOLLOUT will never return even if the socket write buffer is empty. The fix is to set the pending size to the default size and never change it. This way the host will always notify the guest whenever the writable space is bigger than the pending size. The host is already optimized to *only* notify the guest when the pending size threshold boundary is crossed and not everytime. This change also reduces the cpu usage somewhat since hv_stream_has_space() is in the hotpath of send: vsock_stream_sendmsg()->hv_stream_has_space() Earlier hv_stream_has_space was setting/clearing the pending size on every call. Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18net/udp_gso: Allow TX timestamp with UDP GSOFred Klassen
Fixes an issue where TX Timestamps are not arriving on the error queue when UDP_SEGMENT CMSG type is combined with CMSG type SO_TIMESTAMPING. This can be illustrated with an updated updgso_bench_tx program which includes the '-T' option to test for this condition. It also introduces the '-P' option which will call poll() before reading the error queue. ./udpgso_bench_tx -4ucTPv -S 1472 -l2 -D 172.16.120.18 poll timeout udp tx: 0 MB/s 1 calls/s 1 msg/s The "poll timeout" message above indicates that TX timestamp never arrived. This patch preserves tx_flags for the first UDP GSO segment. Only the first segment is timestamped, even though in some cases there may be benefital in timestamping both the first and last segment. Factors in deciding on first segment timestamp only: - Timestamping both first and last segmented is not feasible. Hardware can only have one outstanding TS request at a time. - Timestamping last segment may under report network latency of the previous segments. Even though the doorbell is suppressed, the ring producer counter has been incremented. - Timestamping the first segment has the upside in that it reports timestamps from the application's view, e.g. RTT. - Timestamping the first segment has the downside that it may underreport tx host network latency. It appears that we have to pick one or the other. And possibly follow-up with a config flag to choose behavior. v2: Remove tests as noted by Willem de Bruijn <willemb@google.com> Moving tests from net to net-next v3: Update only relevant tx_flag bits as per Willem de Bruijn <willemb@google.com> v4: Update comments and commit message as per Willem de Bruijn <willemb@google.com> Fixes: ee80d1ebe5ba ("udp: add udp gso") Signed-off-by: Fred Klassen <fklassen@appneta.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18Merge branch 'net-netem-fix-issues-with-corrupting-GSO-frames'David S. Miller
Jakub Kicinski says: ==================== net: netem: fix issues with corrupting GSO frames Corrupting GSO frames currently leads to crashes, due to skb use after free. These stem from the skb list handling - the segmented skbs come back on a list, and this list is not properly unlinked before enqueuing the segments. Turns out this condition is made very likely to occur because of another bug - in backlog accounting. Segments are counted twice, which means qdisc's limit gets reached leading to drops and making the use after free very likely to happen. The bugs are fixed in order in which they were added to the tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18net: netem: fix use after free and double free with packet corruptionJakub Kicinski
Brendan reports that the use of netem's packet corruption capability leads to strange crashes. This seems to be caused by commit d66280b12bd7 ("net: netem: use a list in addition to rbtree") which uses skb->next pointer to construct a fast-path queue of in-order skbs. Packet corruption code has to invoke skb_gso_segment() in case of skbs in need of GSO. skb_gso_segment() returns a list of skbs. If next pointers of the skbs on that list do not get cleared fast path list may point to freed skbs or skbs which are also on the RB tree. Let's say skb gets segmented into 3 frames: A -> B -> C A gets hooked to the t_head t_tail list by tfifo_enqueue(), but it's next pointer didn't get cleared so we have: h t |/ A -> B -> C Now if B and C get also get enqueued successfully all is fine, because tfifo_enqueue() will overwrite the list in order. IOW: Enqueue B: h t | | A -> B C Enqueue C: h t | | A -> B -> C But if B and C get reordered we may end up with: h t RB tree |/ | A -> B -> C B \ C Or if they get dropped just: h t |/ A -> B -> C where A and B are already freed. To reproduce either limit has to be set low to cause freeing of segs or reorders have to happen (due to delay jitter). Note that we only have to mark the first segment as not on the list, "finish_segs" handling of other frags already does that. Another caveat is that qdisc_drop_all() still has to free all segments correctly in case of drop of first segment, therefore we re-link segs before calling it. v2: - re-link before drop, v1 was leaking non-first segs if limit was hit at the first seg - better commit message which lead to discovering the above :) Reported-by: Brendan Galloway <brendan.galloway@netronome.com> Fixes: d66280b12bd7 ("net: netem: use a list in addition to rbtree") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18net: netem: fix backlog accounting for corrupted GSO framesJakub Kicinski
When GSO frame has to be corrupted netem uses skb_gso_segment() to produce the list of frames, and re-enqueues the segments one by one. The backlog length has to be adjusted to account for new frames. The current calculation is incorrect, leading to wrong backlog lengths in the parent qdisc (both bytes and packets), and incorrect packet backlog count in netem itself. Parent backlog goes negative, netem's packet backlog counts all non-first segments twice (thus remaining non-zero even after qdisc is emptied). Move the variables used to count the adjustment into local scope to make 100% sure they aren't used at any stage in backports. Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18net: lio_core: fix potential sign-extension overflow on large shiftColin Ian King
Left shifting the signed int value 1 by 31 bits has undefined behaviour and the shift amount oq_no can be as much as 63. Fix this by using BIT_ULL(oq_no) instead. Addresses-Coverity: ("Bad shift operation") Fixes: f21fb3ed364b ("Add support of Cavium Liquidio ethernet adapters") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18Merge branch 'net-fix-quite-a-few-dst_cache-crashes-reported-by-syzbot'David S. Miller
Xin Long says: ==================== net: fix quite a few dst_cache crashes reported by syzbot There are two kinds of crashes reported many times by syzbot with no reproducer. Call Traces are like: BUG: KASAN: slab-out-of-bounds in rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 __mkroute_output net/ipv4/route.c:2332 [inline] ip_route_output_key_hash_rcu+0x819/0x2d50 net/ipv4/route.c:2564 ip_route_output_key_hash+0x1ef/0x360 net/ipv4/route.c:2393 __ip_route_output_key include/net/route.h:125 [inline] ip_route_output_flow+0x28/0xc0 net/ipv4/route.c:2651 ip_route_output_key include/net/route.h:135 [inline] ... or: kasan: GPF could be caused by NULL-ptr deref or user memory access RIP: 0010:dst_dev_put+0x24/0x290 net/core/dst.c:168 <IRQ> rt_fibinfo_free_cpus net/ipv4/fib_semantics.c:200 [inline] free_fib_info_rcu+0x2e1/0x490 net/ipv4/fib_semantics.c:217 __rcu_reclaim kernel/rcu/rcu.h:240 [inline] rcu_do_batch kernel/rcu/tree.c:2437 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2716 [inline] rcu_process_callbacks+0x100a/0x1ac0 kernel/rcu/tree.c:2697 ... They were caused by the fib_nh_common percpu member 'nhc_pcpu_rth_output' overwritten by another percpu variable 'dev->tstats' access overflow in tipc udp media xmit path when counting packets on a non tunnel device. The fix is to make udp tunnel work with no tunnel device by allowing not to count packets on the tstats when the tunnel dev is NULL in Patches 1/3 and 2/3, then pass a NULL tunnel dev in tipc_udp_tunnel() in Patch 3/3. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skbXin Long
udp_tunnel(6)_xmit_skb() called by tipc_udp_xmit() expects a tunnel device to count packets on dev->tstats, a perpcu variable. However, TIPC is using udp tunnel with no tunnel device, and pass the lower dev, like veth device that only initializes dev->lstats(a perpcu variable) when creating it. Later iptunnel_xmit_stats() called by ip(6)tunnel_xmit() thinks the dev as a tunnel device, and uses dev->tstats instead of dev->lstats. tstats' each pointer points to a bigger struct than lstats, so when tstats->tx_bytes is increased, other percpu variable's members could be overwritten. syzbot has reported quite a few crashes due to fib_nh_common percpu member 'nhc_pcpu_rth_output' overwritten, call traces are like: BUG: KASAN: slab-out-of-bounds in rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 __mkroute_output net/ipv4/route.c:2332 [inline] ip_route_output_key_hash_rcu+0x819/0x2d50 net/ipv4/route.c:2564 ip_route_output_key_hash+0x1ef/0x360 net/ipv4/route.c:2393 __ip_route_output_key include/net/route.h:125 [inline] ip_route_output_flow+0x28/0xc0 net/ipv4/route.c:2651 ip_route_output_key include/net/route.h:135 [inline] ... or: kasan: GPF could be caused by NULL-ptr deref or user memory access RIP: 0010:dst_dev_put+0x24/0x290 net/core/dst.c:168 <IRQ> rt_fibinfo_free_cpus net/ipv4/fib_semantics.c:200 [inline] free_fib_info_rcu+0x2e1/0x490 net/ipv4/fib_semantics.c:217 __rcu_reclaim kernel/rcu/rcu.h:240 [inline] rcu_do_batch kernel/rcu/tree.c:2437 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2716 [inline] rcu_process_callbacks+0x100a/0x1ac0 kernel/rcu/tree.c:2697 ... The issue exists since tunnel stats update is moved to iptunnel_xmit by Commit 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()"), and here to fix it by passing a NULL tunnel dev to udp_tunnel(6)_xmit_skb so that the packets counting won't happen on dev->tstats. Reported-by: syzbot+9d4c12bfd45a58738d0a@syzkaller.appspotmail.com Reported-by: syzbot+a9e23ea2aa21044c2798@syzkaller.appspotmail.com Reported-by: syzbot+c4c4b2bb358bb936ad7e@syzkaller.appspotmail.com Reported-by: syzbot+0290d2290a607e035ba1@syzkaller.appspotmail.com Reported-by: syzbot+a43d8d4e7e8a7a9e149e@syzkaller.appspotmail.com Reported-by: syzbot+a47c5f4c6c00fc1ed16e@syzkaller.appspotmail.com Fixes: 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18ip6_tunnel: allow not to count pkts on tstats by passing dev as NULLXin Long
A similar fix to Patch "ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL" is also needed by ip6_tunnel. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULLXin Long
iptunnel_xmit() works as a common function, also used by a udp tunnel which doesn't have to have a tunnel device, like how TIPC works with udp media. In these cases, we should allow not to count pkts on dev's tstats, so that udp tunnel can work with no tunnel device safely. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18apparmor: reset pos on failure to unpack for various functionsMike Salvatore
Each function that manipulates the aa_ext struct should reset it's "pos" member on failure. This ensures that, on failure, no changes are made to the state of the aa_ext struct. There are paths were elements are optional and the error path is used to indicate the optional element is not present. This means instead of just aborting on error the unpack stream can become unsynchronized on optional elements, if using one of the affected functions. Cc: stable@vger.kernel.org Fixes: 736ec752d95e ("AppArmor: policy routines for loading and unpacking policy") Signed-off-by: Mike Salvatore <mike.salvatore@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
2019-06-18apparmor: enforce nullbyte at end of tag stringJann Horn
A packed AppArmor policy contains null-terminated tag strings that are read by unpack_nameX(). However, unpack_nameX() uses string functions on them without ensuring that they are actually null-terminated, potentially leading to out-of-bounds accesses. Make sure that the tag string is null-terminated before passing it to strcmp(). Cc: stable@vger.kernel.org Fixes: 736ec752d95e ("AppArmor: policy routines for loading and unpacking policy") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
2019-06-18apparmor: fix PROFILE_MEDIATES for untrusted inputJohn Johansen
While commit 11c236b89d7c2 ("apparmor: add a default null dfa") ensure every profile has a policy.dfa it does not resize the policy.start[] to have entries for every possible start value. Which means PROFILE_MEDIATES is not safe to use on untrusted input. Unforunately commit b9590ad4c4f2 ("apparmor: remove POLICY_MEDIATES_SAFE") did not take into account the start value usage. The input string in profile_query_cb() is user controlled and is not properly checked to be within the limited start[] entries, even worse it can't be as userspace policy is allowed to make us of entries types the kernel does not know about. This mean usespace can currently cause the kernel to access memory up to 240 entries beyond the start array bounds. Cc: stable@vger.kernel.org Fixes: b9590ad4c4f2 ("apparmor: remove POLICY_MEDIATES_SAFE") Signed-off-by: John Johansen <john.johansen@canonical.com>
2019-06-18RDMA/efa: Handle mmap insertions overflowGal Pressman
When inserting a new mmap entry to the xarray we should check for 'mmap_page' overflow as it is limited to 32 bits. Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation") Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18Merge branch 'md-fixes' of https://github.com/liu-song-6/linux into for-linusJens Axboe
Pull MD fix from Song. * 'md-fixes' of https://github.com/liu-song-6/linux: md: fix for divide error in status_resync
2019-06-18Merge tag 'for-5.2-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - regression where properties stored as xattrs are not properly persisted - a small readahead fix (the fstests testcase for that fix hangs on unpatched kernel, so we'd like get it merged to ease future testing) - fix a race during block group creation and deletion * tag 'for-5.2-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix failure to persist compression property xattr deletion on fsync btrfs: start readahead also in seed devices Btrfs: fix race between block group removal and block group allocation
2019-06-18Merge tag 'armsoc-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Olof Johansson: "I've been bad at collecting fixes this release cycle, so this is a fairly large batch that's been trickling in for a while. It's the usual mix, more or less. Some of the bigger things fixed: - Voltage fix for MMC on TI DRA7 that sometimes would overvoltage cards - Regression fixes for D_CAN on am355x - i.MX6SX cpuidle fix to deal with wakeup latency (dropped uart chars) - DT fixes for some DRA7 variants that don't share the superset of blocks on the chip plus the usual mix of stuff: minor build/warning fixes, Kconfig dependencies, and some DT fixlets" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (28 commits) soc: ixp4xx: npe: Fix an IS_ERR() vs NULL check in probe ARM: ixp4xx: include irqs.h where needed ARM: ixp4xx: mark ixp4xx_irq_setup as __init ARM: ixp4xx: don't select SERIAL_OF_PLATFORM firmware: trusted_foundations: add ARMv7 dependency MAINTAINERS: Change QCOM repo location ARM: davinci: da8xx: specify dma_coherent_mask for lcdc ARM: davinci: da850-evm: call regulator_has_full_constraints() ARM: mvebu_v7_defconfig: fix Ethernet on Clearfog ARM: dts: am335x phytec boards: Fix cd-gpios active level ARM: dts: dra72x: Disable usb4_tm target module arm64: arch_k3: Fix kconfig dependency warning ARM: dts: Drop bogus CLKSEL for timer12 on dra7 MAINTAINERS: Update Stefan Wahren email address ARM: dts: bcm: Add missing device_type = "memory" property soc: bcm: brcmstb: biuctrl: Register writes require a barrier soc: brcmstb: Fix error path for unsupported CPUs ARM: dts: dra71x: Disable usb4_tm target module ARM: dts: dra71x: Disable rtc target module ARM: dts: dra76x: Disable usb4_tm target module ...
2019-06-18tun: wake up waitqueues after IFF_UP is setFei Li
Currently after setting tap0 link up, the tun code wakes tx/rx waited queues up in tun_net_open() when .ndo_open() is called, however the IFF_UP flag has not been set yet. If there's already a wait queue, it would fail to transmit when checking the IFF_UP flag in tun_sendmsg(). Then the saving vhost_poll_start() will add the wq into wqh until it is waken up again. Although this works when IFF_UP flag has been set when tun_chr_poll detects; this is not true if IFF_UP flag has not been set at that time. Sadly the latter case is a fatal error, as the wq will never be waken up in future unless later manually setting link up on purpose. Fix this by moving the wakeup process into the NETDEV_UP event notifying process, this makes sure IFF_UP has been set before all waited queues been waken up. Signed-off-by: Fei Li <lifei.shirley@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18Merge tag 'meminit-v5.2-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull stack init fix from Kees Cook: "This is a small update to the stack auto-initialization self-test code to deal with the Clang initialization pattern. It's been in linux-next for a couple weeks; I had waited a bit wondering if anything more substantial was going to show up, but nothing has, so I'm sending this now before it gets too late" * tag 'meminit-v5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lib/test_stackinit: Handle Clang auto-initialization pattern
2019-06-18drm: return -EFAULT if copy_to_user() failsDan Carpenter
The copy_from_user() function returns the number of bytes remaining to be copied but we want to return a negative error code. Otherwise the callers treat it as a successful copy. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190618131843.GA29463@mwanda
2019-06-18net: remove duplicate fetch in sock_getsockoptJingYi Hou
In sock_getsockopt(), 'optlen' is fetched the first time from userspace. 'len < 0' is then checked. Then in condition 'SO_MEMINFO', 'optlen' is fetched the second time from userspace. If change it between two fetches may cause security problems or unexpected behaivor, and there is no reason to fetch it a second time. To fix this, we need to remove the second fetch. Signed-off-by: JingYi Hou <houjingyi647@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18tipc: fix issues with early FAILOVER_MSG from peerTuong Lien
It appears that a FAILOVER_MSG can come from peer even when the failure link is resetting (i.e. just after the 'node_write_unlock()'...). This means the failover procedure on the node has not been started yet. The situation is as follows: node1 node2 linkb linka linka linkb | | | | | | x failure | | | RESETTING | | | | | | x failure RESET | | RESETTING FAILINGOVER | | | (FAILOVER_MSG) | | |<-------------------------------------------------| | *FAILINGOVER | | | | | (dummy FAILOVER_MSG) | | |------------------------------------------------->| | RESET | | FAILOVER_END | FAILINGOVER RESET | . . . . . . . . . . . . Once this happens, the link failover procedure will be triggered wrongly on the receiving node since the node isn't in FAILINGOVER state but then another link failover will be carried out. The consequences are: 1) A peer might get stuck in FAILINGOVER state because the 'sync_point' was set, reset and set incorrectly, the criteria to end the failover would not be met, it could keep waiting for a message that has already received. 2) The early FAILOVER_MSG(s) could be queued in the link failover deferdq but would be purged or not pulled out because the 'drop_point' was not set correctly. 3) The early FAILOVER_MSG(s) could be dropped too. 4) The dummy FAILOVER_MSG could make the peer leaving FAILINGOVER state shortly, but later on it would be restarted. The same situation can also happen when the link is in PEER_RESET state and a FAILOVER_MSG arrives. The commit resolves the issues by forcing the link down immediately, so the failover procedure will be started normally (which is the same as when receiving a FAILOVER_MSG and the link is in up state). Also, the function "tipc_node_link_failover()" is toughen to avoid such a situation from happening. Acked-by: Jon Maloy <jon.maloy@ericsson.se> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18bnx2x: Check if transceiver implements DDM before accessMauro S. M. Rodrigues
Some transceivers may comply with SFF-8472 even though they do not implement the Digital Diagnostic Monitoring (DDM) interface described in the spec. The existence of such area is specified by the 6th bit of byte 92, set to 1 if implemented. Currently, without checking this bit, bnx2x fails trying to read sfp module's EEPROM with the follow message: ethtool -m enP5p1s0f1 Cannot get Module EEPROM data: Input/output error Because it fails to read the additional 256 bytes in which it is assumed to exist the DDM data. This issue was noticed using a Mellanox Passive DAC PN 01FT738. The EEPROM data was confirmed by Mellanox as correct and similar to other Passive DACs from other manufacturers. Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com> Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18xhci: detect USB 3.2 capable host controllers correctlyMathias Nyman
USB 3.2 capability in a host can be detected from the xHCI Supported Protocol Capability major and minor revision fields. If major is 0x3 and minor 0x20 then the host is USB 3.2 capable. For USB 3.2 capable hosts set the root hub lane count to 2. The Major Revision and Minor Revision fields contain a BCD version number. The value of the Major Revision field is JJh and the value of the Minor Revision field is MNh for version JJ.M.N, where JJ = major revision number, M - minor version number, N = sub-minor version number, e.g. version 3.1 is represented with a value of 0310h. Also fix the extra whitespace printed out when announcing regular SuperSpeed hosts. Cc: <stable@vger.kernel.org> # v4.18+ Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-18usb: xhci: Don't try to recover an endpoint if port is in error state.Mathias Nyman
A USB3 device needs to be reset and re-enumarated if the port it connects to goes to a error state, with link state inactive. There is no use in trying to recover failed transactions by resetting endpoints at this stage. Tests show that in rare cases, after multiple endpoint resets of a roothub port the whole host controller might stop completely. Several retries to recover from transaction error can happen as it can take a long time before the hub thread discovers the USB3 port error and inactive link. We can't reliably detect the port error from slot or endpoint context due to a limitation in xhci, see xhci specs section 4.8.3: "There are several cases where the EP State field in the Output Endpoint Context may not reflect the current state of an endpoint" and "Software should maintain an accurate value for EP State, by tracking it with an internal variable that is driven by Events and Doorbell accesses" Same appears to be true for slot state. set a flag to the corresponding slot if a USB3 roothub port link goes inactive to prevent both queueing new URBs and resetting endpoints. Reported-by: Rapolu Chiranjeevi <chiranjeevi.rapolu@intel.com> Tested-by: Rapolu Chiranjeevi <chiranjeevi.rapolu@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-18KVM: fix typo in documentationDennis Restle
The documentation mentions a non-existing capability KVM_CAP_USER_MEM.s The right name is KVM_CAP_USER_MEMORY. Signed-off-by: Dennis Restle <derestle@htwg-konstanz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-18drm/panfrost: Make sure a BO is only unmapped when appropriateBoris Brezillon
mmu_ops->unmap() will fail when called on a BO that has not been previously mapped, and the error path in panfrost_ioctl_create_bo() can call drm_gem_object_put_unlocked() (which in turn calls panfrost_mmu_unmap()) on a BO that has not been mapped yet. Keep track of the mapped/unmapped state to avoid such issues. Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver") Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190618081343.16927-1-boris.brezillon@collabora.com
2019-06-18md: fix for divide error in status_resyncMariusz Tkaczyk
Stopping external metadata arrays during resync/recovery causes retries, loop of interrupting and starting reconstruction, until it hit at good moment to stop completely. While these retries curr_mark_cnt can be small- especially on HDD drives, so subtraction result can be smaller than 0. However it is casted to uint without checking. As a result of it the status bar in /proc/mdstat while stopping is strange (it jumps between 0% and 99%). The real problem occurs here after commit 72deb455b5ec ("block: remove CONFIG_LBDAF"). Sector_div() macro has been changed, now the divisor is casted to uint32. For db = -8 the divisior(db/32-1) becomes 0. Check if db value can be really counted and replace these macro by div64_u64() inline. Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-06-18soc: ixp4xx: npe: Fix an IS_ERR() vs NULL check in probeDan Carpenter
The devm_ioremap_resource() function doesn't return NULL, it returns error pointers. Fixes: 0b458d7b10f8 ("soc: ixp4xx: npe: Pass addresses as resources") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Olof Johansson <olof@lixom.net>
2019-06-18arm64/mm: don't initialize pgd_cache twiceMike Rapoport
When PGD_SIZE != PAGE_SIZE, arm64 uses kmem_cache for allocation of PGD memory. That cache was initialized twice: first through pgtable_cache_init() alias and then as an override for weak pgd_cache_init(). Remove the alias from pgtable_cache_init() and keep the only pgd_cache initialization in pgd_cache_init(). Fixes: caa841360134 ("x86/mm: Initialize PGD cache during mm initialization") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-18MAINTAINERS: Update my email addressHanjun Guo
The @linaro.org address is not working and bonucing, so update the references. Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>