summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-10-05btrfs: prevent to set invalid default subvolidsatoru takeuchi
commit 6d6d282932d1a609e60dc4467677e0e863682f57 upstream. `btrfs sub set-default` succeeds to set an ID which isn't corresponding to any fs/file tree. If such the bad ID is set to a filesystem, we can't mount this filesystem without specifying `subvol` or `subvolid` mount options. Fixes: 6ef5ed0d386b ("Btrfs: add ioctl and incompat flag to set the default mount subvol") Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com> Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05vfs: Return -ENXIO for negative SEEK_HOLE / SEEK_DATA offsetsAndreas Gruenbacher
commit fc46820b27a2d9a46f7e90c9ceb4a64a1bc5fab8 upstream. In generic_file_llseek_size, return -ENXIO for negative offsets as well as offsets beyond EOF. This affects filesystems which don't implement SEEK_HOLE / SEEK_DATA internally, possibly because they don't support holes. Fixes xfstest generic/448. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flagsSteve French
commit 1013e760d10e614dc10b5624ce9fc41563ba2e65 upstream. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05SMB: Validate negotiate (to protect against downgrade) even if signing offSteve French
commit 0603c96f3af50e2f9299fa410c224ab1d465e0f9 upstream. As long as signing is supported (ie not a guest user connection) and connection is SMB3 or SMB3.02, then validate negotiate (protect against man in the middle downgrade attacks). We had been doing this only when signing was required, not when signing was just enabled, but this more closely matches recommended SMB3 behavior and is better security. Suggested by Metze. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Jeremy Allison <jra@samba.org> Acked-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05cifs: release auth_key.response for reconnect.Shu Wang
commit f5c4ba816315d3b813af16f5571f86c8d4e897bd upstream. There is a race that cause cifs reconnect in cifs_mount, - cifs_mount - cifs_get_tcp_session - [ start thread cifs_demultiplex_thread - cifs_read_from_socket: -ECONNABORTED - DELAY_WORK smb2_reconnect_server ] - cifs_setup_session - [ smb2_reconnect_server ] auth_key.response was allocated in cifs_setup_session, and will release when the session destoried. So when session re- connect, auth_key.response should be check and released. Tested with my system: CIFS VFS: Free previous auth_key.response = ffff8800320bbf80 A simple auth_key.response allocation call trace: - cifs_setup_session - SMB2_sess_setup - SMB2_sess_auth_rawntlmssp_authenticate - build_ntlmssp_auth_blob - setup_ntlmv2_rsp Signed-off-by: Shu Wang <shuwang@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05cifs: release cifs root_cred after exit_cifsShu Wang
commit 94183331e815617246b1baa97e0916f358c794bb upstream. memory leak was found by kmemleak. exit_cifs_spnego should be called before cifs module removed, or cifs root_cred will not be released. kmemleak report: unreferenced object 0xffff880070a3ce40 (size 192): backtrace: kmemleak_alloc+0x4a/0xa0 kmem_cache_alloc+0xc7/0x1d0 prepare_kernel_cred+0x20/0x120 init_cifs_spnego+0x2d/0x170 [cifs] 0xffffffffc07801f3 do_one_initcall+0x51/0x1b0 do_init_module+0x60/0x1fd load_module+0x161e/0x1b60 SYSC_finit_module+0xa9/0x100 SyS_finit_module+0xe/0x10 Signed-off-by: Shu Wang <shuwang@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27ext4: fix incorrect quotaoff if the quota feature is enabledzhangyi (F)
commit b0a5a9589decd07db755d6a8d9c0910d96ff7992 upstream. Current ext4 quota should always "usage enabled" if the quota feautre is enabled. But in ext4_orphan_cleanup(), it turn quotas off directly (used for the older journaled quota), so we cannot turn it on again via "quotaon" unless umount and remount ext4. Simple reproduce: mkfs.ext4 -O project,quota /dev/vdb1 mount -o prjquota /dev/vdb1 /mnt chattr -p 123 /mnt chattr +P /mnt touch /mnt/aa /mnt/bb exec 100<>/mnt/aa rm -f /mnt/aa sync echo c > /proc/sysrq-trigger #reboot and mount mount -o prjquota /dev/vdb1 /mnt #query status quotaon -Ppv /dev/vdb1 #output quotaon: Cannot find mountpoint for device /dev/vdb1 quotaon: No correct mountpoint specified. This patch add check for journaled quotas to avoid incorrect quotaoff when ext4 has quota feautre. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27f2fs: check hot_data for roll-forward recoveryJaegeuk Kim
commit 125c9fb1ccb53eb2ea9380df40f3c743f3fb2fed upstream. We need to check HOT_DATA to truncate any previous data block when doing roll-forward recovery. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13xfs: XFS_IS_REALTIME_INODE() should be false if no rt device presentRichard Wareing
commit b31ff3cdf540110da4572e3e29bd172087af65cc upstream. If using a kernel with CONFIG_XFS_RT=y and we set the RHINHERIT flag on a directory in a filesystem that does not have a realtime device and create a new file in that directory, it gets marked as a real time file. When data is written and a fsync is issued, the filesystem attempts to flush a non-existent rt device during the fsync process. This results in a crash dereferencing a null buftarg pointer in xfs_blkdev_issue_flush(): BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: xfs_blkdev_issue_flush+0xd/0x20 ..... Call Trace: xfs_file_fsync+0x188/0x1c0 vfs_fsync_range+0x3b/0xa0 do_fsync+0x3d/0x70 SyS_fsync+0x10/0x20 do_syscall_64+0x4d/0xb0 entry_SYSCALL64_slow_path+0x25/0x25 Setting RT inode flags does not require special privileges so any unprivileged user can cause this oops to occur. To reproduce, confirm kernel is compiled with CONFIG_XFS_RT=y and run: # mkfs.xfs -f /dev/pmem0 # mount /dev/pmem0 /mnt/test # mkdir /mnt/test/foo # xfs_io -c 'chattr +t' /mnt/test/foo # xfs_io -f -c 'pwrite 0 5m' -c fsync /mnt/test/foo/bar Or just run xfstests with MKFS_OPTIONS="-d rtinherit=1" and wait. Kernels built with CONFIG_XFS_RT=n are not exposed to this bug. Fixes: f538d4da8d52 ("[XFS] write barrier support") Signed-off-by: Richard Wareing <rwareing@fb.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13btrfs: resume qgroup rescan on rw remountAleksa Sarai
commit 6c6b5a39c4bf3dbd8cf629c9f5450e983c19dbb9 upstream. Several distributions mount the "proper root" as ro during initrd and then remount it as rw before pivot_root(2). Thus, if a rescan had been aborted by a previous shutdown, the rescan would never be resumed. This issue would manifest itself as several btrfs ioctl(2)s causing the entire machine to hang when btrfs_qgroup_wait_for_completion was hit (due to the fs_info->qgroup_rescan_running flag being set but the rescan itself not being resumed). Notably, Docker's btrfs storage driver makes regular use of BTRFS_QUOTA_CTL_DISABLE and BTRFS_IOC_QUOTA_RESCAN_WAIT (causing this problem to be manifested on boot for some machines). Cc: Jeff Mahoney <jeffm@suse.com> Fixes: b382a324b60f ("Btrfs: fix qgroup rescan resume on mount") Signed-off-by: Aleksa Sarai <asarai@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-13dlm: avoid double-free on error path in dlm_device_{register,unregister}Edwin Török
commit 55acdd926f6b21a5cdba23da98a48aedf19ac9c3 upstream. Can be reproduced when running dlm_controld (tested on 4.4.x, 4.12.4): # seq 1 100 | xargs -P0 -n1 dlm_tool join # seq 1 100 | xargs -P0 -n1 dlm_tool leave misc_register fails due to duplicate sysfs entry, which causes dlm_device_register to free ls->ls_device.name. In dlm_device_deregister the name was freed again, causing memory corruption. According to the comment in dlm_device_deregister the name should've been set to NULL when registration fails, so this patch does that. sysfs: cannot create duplicate filename '/dev/char/10:1' ------------[ cut here ]------------ warning: cpu: 1 pid: 4450 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x56/0x70 modules linked in: msr rfcomm dlm ccm bnep dm_crypt uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev btusb media btrtl btbcm btintel bluetooth ecdh_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_hda_codec_hdmi irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel thinkpad_acpi pcbc nvram snd_seq_midi snd_seq_midi_event aesni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_rawmidi aes_x86_64 crypto_simd glue_helper snd_hda_intel snd_hda_codec cryptd intel_cstate arc4 snd_hda_core snd_seq snd_seq_device snd_hwdep iwldvm intel_rapl_perf mac80211 joydev input_leds iwlwifi serio_raw cfg80211 snd_pcm shpchp snd_timer snd mac_hid mei_me lpc_ich mei soundcore sunrpc parport_pc ppdev lp parport autofs4 i915 psmouse e1000e ahci libahci i2c_algo_bit sdhci_pci ptp drm_kms_helper sdhci pps_core syscopyarea sysfillrect sysimgblt fb_sys_fops drm wmi video cpu: 1 pid: 4450 comm: dlm_test.exe not tainted 4.12.4-041204-generic hardware name: lenovo 232425u/232425u, bios g2et82ww (2.02 ) 09/11/2012 task: ffff96b0cbabe140 task.stack: ffffb199027d0000 rip: 0010:sysfs_warn_dup+0x56/0x70 rsp: 0018:ffffb199027d3c58 eflags: 00010282 rax: 0000000000000038 rbx: ffff96b0e2c49158 rcx: 0000000000000006 rdx: 0000000000000000 rsi: 0000000000000086 rdi: ffff96b15e24dcc0 rbp: ffffb199027d3c70 r08: 0000000000000001 r09: 0000000000000721 r10: ffffb199027d3c00 r11: 0000000000000721 r12: ffffb199027d3cd1 r13: ffff96b1592088f0 r14: 0000000000000001 r15: ffffffffffffffef fs: 00007f78069c0700(0000) gs:ffff96b15e240000(0000) knlgs:0000000000000000 cs: 0010 ds: 0000 es: 0000 cr0: 0000000080050033 cr2: 000000178625ed28 cr3: 0000000091d3e000 cr4: 00000000001406e0 call trace: sysfs_do_create_link_sd.isra.2+0x9e/0xb0 sysfs_create_link+0x25/0x40 device_add+0x5a9/0x640 device_create_groups_vargs+0xe0/0xf0 device_create_with_groups+0x3f/0x60 ? snprintf+0x45/0x70 misc_register+0x140/0x180 device_write+0x6a8/0x790 [dlm] __vfs_write+0x37/0x160 ? apparmor_file_permission+0x1a/0x20 ? security_file_permission+0x3b/0xc0 vfs_write+0xb5/0x1a0 sys_write+0x55/0xc0 ? sys_fcntl+0x5d/0xb0 entry_syscall_64_fastpath+0x1e/0xa9 rip: 0033:0x7f78083454bd rsp: 002b:00007f78069bbd30 eflags: 00000293 orig_rax: 0000000000000001 rax: ffffffffffffffda rbx: 0000000000000006 rcx: 00007f78083454bd rdx: 000000000000009c rsi: 00007f78069bee00 rdi: 0000000000000005 rbp: 00007f77f8000a20 r08: 000000000000fcf0 r09: 0000000000000032 r10: 0000000000000024 r11: 0000000000000293 r12: 00007f78069bde00 r13: 00007f78069bee00 r14: 000000000000000a r15: 00007f78069bbd70 code: 85 c0 48 89 c3 74 12 b9 00 10 00 00 48 89 c2 31 f6 4c 89 ef e8 2c c8 ff ff 4c 89 e2 48 89 de 48 c7 c7 b0 8e 0c a8 e8 41 e8 ed ff <0f> ff 48 89 df e8 00 d5 f4 ff 5b 41 5c 41 5d 5d c3 66 0f 1f 84 ---[ end trace 40412246357cc9e0 ]--- dlm: 59f24629-ae39-44e2-9030-397ebc2eda26: leaving the lockspace group... bug: unable to handle kernel null pointer dereference at 0000000000000001 ip: [<ffffffff811a3b4a>] kmem_cache_alloc+0x7a/0x140 pgd 0 oops: 0000 [#1] smp modules linked in: dlm 8021q garp mrp stp llc openvswitch nf_defrag_ipv6 nf_conntrack libcrc32c iptable_filter dm_multipath crc32_pclmul dm_mod aesni_intel psmouse aes_x86_64 sg ablk_helper cryptd lrw gf128mul glue_helper i2c_piix4 nls_utf8 tpm_tis tpm isofs nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc xen_wdt ip_tables x_tables autofs4 hid_generic usbhid hid sr_mod cdrom sd_mod ata_generic pata_acpi 8139too serio_raw ata_piix 8139cp mii uhci_hcd ehci_pci ehci_hcd libata scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod ipv6 cpu: 0 pid: 394 comm: systemd-udevd tainted: g w 4.4.0+0 #1 hardware name: xen hvm domu, bios 4.7.2-2.2 05/11/2017 task: ffff880002410000 ti: ffff88000243c000 task.ti: ffff88000243c000 rip: e030:[<ffffffff811a3b4a>] [<ffffffff811a3b4a>] kmem_cache_alloc+0x7a/0x140 rsp: e02b:ffff88000243fd90 eflags: 00010202 rax: 0000000000000000 rbx: ffff8800029864d0 rcx: 000000000007b36c rdx: 000000000007b36b rsi: 00000000024000c0 rdi: ffff880036801c00 rbp: ffff88000243fdc0 r08: 0000000000018880 r09: 0000000000000054 r10: 000000000000004a r11: ffff880034ace6c0 r12: 00000000024000c0 r13: ffff880036801c00 r14: 0000000000000001 r15: ffffffff8118dcc2 fs: 00007f0ab77548c0(0000) gs:ffff880036e00000(0000) knlgs:0000000000000000 cs: e033 ds: 0000 es: 0000 cr0: 0000000080050033 cr2: 0000000000000001 cr3: 000000000332d000 cr4: 0000000000040660 stack: ffffffff8118dc90 ffff8800029864d0 0000000000000000 ffff88003430b0b0 ffff880034b78320 ffff88003430b0b0 ffff88000243fdf8 ffffffff8118dcc2 ffff8800349c6700 ffff8800029864d0 000000000000000b 00007f0ab7754b90 call trace: [<ffffffff8118dc90>] ? anon_vma_fork+0x60/0x140 [<ffffffff8118dcc2>] anon_vma_fork+0x92/0x140 [<ffffffff8107033e>] copy_process+0xcae/0x1a80 [<ffffffff8107128b>] _do_fork+0x8b/0x2d0 [<ffffffff81071579>] sys_clone+0x19/0x20 [<ffffffff815a30ae>] entry_syscall_64_fastpath+0x12/0x71 ] code: f6 75 1c 4c 89 fa 44 89 e6 4c 89 ef e8 a7 e4 00 00 41 f7 c4 00 80 00 00 49 89 c6 74 47 eb 32 49 63 45 20 48 8d 4a 01 4d 8b 45 00 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 ac 49 63 rip [<ffffffff811a3b4a>] kmem_cache_alloc+0x7a/0x140 rsp <ffff88000243fd90> cr2: 0000000000000001 --[ end trace 70cb9fd1b164a0e8 ]-- Signed-off-by: Edwin Török <edvin.torok@citrix.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-07epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()Oleg Nesterov
commit 138e4ad67afd5c6c318b056b4d17c17f2c0ca5c0 upstream. The race was introduced by me in commit 971316f0503a ("epoll: ep_unregister_pollwait() can use the freed pwq->whead"). I did not realize that nothing can protect eventpoll after ep_poll_callback() sets ->whead = NULL, only whead->lock can save us from the race with ep_free() or ep_remove(). Move ->whead = NULL to the end of ep_poll_callback() and add the necessary barriers. TODO: cleanup the ewake/EPOLLEXCLUSIVE logic, it was confusing even before this patch. Hopefully this explains use-after-free reported by syzcaller: BUG: KASAN: use-after-free in debug_spin_lock_before ... _raw_spin_lock_irqsave+0x4a/0x60 kernel/locking/spinlock.c:159 ep_poll_callback+0x29f/0xff0 fs/eventpoll.c:1148 this is spin_lock(eventpoll->lock), ... Freed by task 17774: ... kfree+0xe8/0x2c0 mm/slub.c:3883 ep_free+0x22c/0x2a0 fs/eventpoll.c:865 Fixes: 971316f0503a ("epoll: ep_unregister_pollwait() can use the freed pwq->whead") Reported-by: 范龙飞 <long7573@126.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-07CIFS: remove endian related sparse warningSteve French
commit 6e3c1529c39e92ed64ca41d53abadabbaa1d5393 upstream. Recent patch had an endian warning ie cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup() Signed-off-by: Steve French <smfrench@gmail.com> CC: Ronnie Sahlberg <lsahlber@redhat.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-07CIFS: Fix maximum SMB2 header sizePavel Shilovsky
commit 9e37b1784f2be9397a903307574ee565bbadfd75 upstream. Currently the maximum size of SMB2/3 header is set incorrectly which leads to hanging of directory listing operations on encrypted SMB3 connections. Fix this by setting the maximum size to 170 bytes that is calculated as RFC1002 length field size (4) + transform header size (52) + SMB2 header size (64) + create response size (56). Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Acked-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-30nfsd: Limit end of page list when decoding NFSv4 WRITEChuck Lever
commit fc788f64f1f3eb31e87d4f53bcf1ab76590d5838 upstream. When processing an NFSv4 WRITE operation, argp->end should never point past the end of the data in the final page of the page list. Otherwise, nfsd4_decode_compound can walk into uninitialized memory. More critical, nfsd4_decode_write is failing to increment argp->pagelen when it increments argp->pagelist. This can cause later xdr decoders to assume more data is available than really is, which can cause server crashes on malformed requests. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-30cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()Ronnie Sahlberg
commit d3edede29f74d335f81d95a4588f5f136a9f7dcf upstream. Add checking for the path component length and verify it is <= the maximum that the server advertizes via FileFsAttributeInformation. With this patch cifs.ko will now return ENAMETOOLONG instead of ENOENT when users to access an overlong path. To test this, try to cd into a (non-existing) directory on a CIFS share that has a too long name: cd /mnt/aaaaaaaaaaaaaaa... and it now should show a good error message from the shell: bash: cd: /mnt/aaaaaaaaaaaaaaaa...aaaaaa: File name too long rh bz 1153996 Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-30cifs: Fix df output for users with quota limitsSachin Prabhu
commit 42bec214d8bd432be6d32a1acb0a9079ecd4d142 upstream. The df for a SMB2 share triggers a GetInfo call for FS_FULL_SIZE_INFORMATION. The values returned are used to populate struct statfs. The problem is that none of the information returned by the call contains the total blocks available on the filesystem. Instead we use the blocks available to the user ie. quota limitation when filling out statfs.f_blocks. The information returned does contain Actual free units on the filesystem and is used to populate statfs.f_bfree. For users with quota enabled, it can lead to situations where the total free space reported is more than the total blocks on the system ending up with df reports like the following # df -h /mnt/a Filesystem Size Used Avail Use% Mounted on //192.168.22.10/a 2.5G -2.3G 2.5G - /mnt/a To fix this problem, we instead populate both statfs.f_bfree with the same value as statfs.f_bavail ie. CallerAvailableAllocationUnits. This is similar to what is done already in the code for cifs and df now reports the quota information for the user used to mount the share. # df --si /mnt/a Filesystem Size Used Avail Use% Mounted on //192.168.22.10/a 2.7G 101M 2.6G 4% /mnt/a Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Pierguido Lambri <plambri@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-16pnfs/blocklayout: require 64-bit sector_tChristoph Hellwig
commit 8a9d6e964d318533ba3d2901ce153ba317c99a89 upstream. The blocklayout code does not compile cleanly for a 32-bit sector_t, and also has no reliable checks for devices sizes, which makes it unsafe to use with a kernel that doesn't support large block devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Arnd Bergmann <arnd@arndb.de> Fixes: 5c83746a0cf2 ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-16fuse: initialize the flock flag in fuse_file on allocationMateusz Jurczyk
commit 68227c03cba84a24faf8a7277d2b1a03c8959c2c upstream. Before the patch, the flock flag could remain uninitialized for the lifespan of the fuse_file allocation. Unless set to true in fuse_file_flock(), it would remain in an indeterminate state until read in an if statement in fuse_release_common(). This could consequently lead to taking an unexpected branch in the code. The bug was discovered by a runtime instrumentation designed to detect use of uninitialized memory in the kernel. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Fixes: 37fb3a30b462 ("fuse: fix flock") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11f2fs: sanity check checkpoint segno and blkoffJin Qian
commit 15d3042a937c13f5d9244241c7a9c8416ff6e82a upstream. Make sure segno and blkoff read from raw image are valid. Cc: stable@vger.kernel.org Signed-off-by: Jin Qian <jinqian@google.com> [Jaegeuk Kim: adjust minor coding style] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> [AmitP: Found in Android Security bulletin for Aug'17, fixes CVE-2017-10663] Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11ext4: fix overflow caused by missing cast in ext4_resize_fs()Jerry Lee
commit aec51758ce10a9c847a62a48a168f8c804c6e053 upstream. On a 32-bit platform, the value of n_blcoks_count may be wrong during the file system is resized to size larger than 2^32 blocks. This may caused the superblock being corrupted with zero blocks count. Fixes: 1c6bd7173d66 Signed-off-by: Jerry Lee <jerrylee@qnap.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesizeJan Kara
commit fcf5ea10992fbac3c7473a1db33d56a139333cd1 upstream. ext4_find_unwritten_pgoff() does not properly handle a situation when starting index is in the middle of a page and blocksize < pagesize. The following command shows the bug on filesystem with 1k blocksize: xfs_io -f -c "falloc 0 4k" \ -c "pwrite 1k 1k" \ -c "pwrite 3k 1k" \ -c "seek -a -r 0" foo In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048, SEEK_DATA) will return the correct result. Fix the problem by neglecting buffers in a page before starting offset. Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11pstore: Use dynamic spinlock initializerKees Cook
commit e9a330c4289f2ba1ca4bf98c2b430ab165a8931b upstream. The per-prz spinlock should be using the dynamic initializer so that lockdep can correctly track it. Without this, under lockdep, we get a warning at boot that the lock is in non-static memory. Fixes: 109704492ef6 ("pstore: Make spinlock per zone instead of global") Fixes: 76d5692a5803 ("pstore: Correctly initialize spinlock and flags") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11pstore: Correctly initialize spinlock and flagsKees Cook
commit 76d5692a58031696e282384cbd893832bc92bd76 upstream. The ram backend wasn't always initializing its spinlock correctly. Since it was coming from kzalloc memory, though, it was harmless on architectures that initialize unlocked spinlocks to 0 (at least x86 and ARM). This also fixes a possibly ignored flag setting too. When running under CONFIG_DEBUG_SPINLOCK, the following Oops was visible: [ 0.760836] persistent_ram: found existing buffer, size 29988, start 29988 [ 0.765112] persistent_ram: found existing buffer, size 30105, start 30105 [ 0.769435] persistent_ram: found existing buffer, size 118542, start 118542 [ 0.785960] persistent_ram: found existing buffer, size 0, start 0 [ 0.786098] persistent_ram: found existing buffer, size 0, start 0 [ 0.786131] pstore: using zlib compression [ 0.790716] BUG: spinlock bad magic on CPU#0, swapper/0/1 [ 0.790729] lock: 0xffffffc0d1ca9bb0, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 [ 0.790742] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc2+ #913 [ 0.790747] Hardware name: Google Kevin (DT) [ 0.790750] Call trace: [ 0.790768] [<ffffff900808ae88>] dump_backtrace+0x0/0x2bc [ 0.790780] [<ffffff900808b164>] show_stack+0x20/0x28 [ 0.790794] [<ffffff9008460ee0>] dump_stack+0xa4/0xcc [ 0.790809] [<ffffff9008113cfc>] spin_dump+0xe0/0xf0 [ 0.790821] [<ffffff9008113d3c>] spin_bug+0x30/0x3c [ 0.790834] [<ffffff9008113e28>] do_raw_spin_lock+0x50/0x1b8 [ 0.790846] [<ffffff9008a2d2ec>] _raw_spin_lock_irqsave+0x54/0x6c [ 0.790862] [<ffffff90083ac3b4>] buffer_size_add+0x48/0xcc [ 0.790875] [<ffffff90083acb34>] persistent_ram_write+0x60/0x11c [ 0.790888] [<ffffff90083aab1c>] ramoops_pstore_write_buf+0xd4/0x2a4 [ 0.790900] [<ffffff90083a9d3c>] pstore_console_write+0xf0/0x134 [ 0.790912] [<ffffff900811c304>] console_unlock+0x48c/0x5e8 [ 0.790923] [<ffffff900811da18>] register_console+0x3b0/0x4d4 [ 0.790935] [<ffffff90083aa7d0>] pstore_register+0x1a8/0x234 [ 0.790947] [<ffffff90083ac250>] ramoops_probe+0x6b8/0x7d4 [ 0.790961] [<ffffff90085ca548>] platform_drv_probe+0x7c/0xd0 [ 0.790972] [<ffffff90085c76ac>] driver_probe_device+0x1b4/0x3bc [ 0.790982] [<ffffff90085c7ac8>] __device_attach_driver+0xc8/0xf4 [ 0.790996] [<ffffff90085c4bfc>] bus_for_each_drv+0xb4/0xe4 [ 0.791006] [<ffffff90085c7414>] __device_attach+0xd0/0x158 [ 0.791016] [<ffffff90085c7b18>] device_initial_probe+0x24/0x30 [ 0.791026] [<ffffff90085c648c>] bus_probe_device+0x50/0xe4 [ 0.791038] [<ffffff90085c35b8>] device_add+0x3a4/0x76c [ 0.791051] [<ffffff90087d0e84>] of_device_add+0x74/0x84 [ 0.791062] [<ffffff90087d19b8>] of_platform_device_create_pdata+0xc0/0x100 [ 0.791073] [<ffffff90087d1a2c>] of_platform_device_create+0x34/0x40 [ 0.791086] [<ffffff900903c910>] of_platform_default_populate_init+0x58/0x78 [ 0.791097] [<ffffff90080831fc>] do_one_initcall+0x88/0x160 [ 0.791109] [<ffffff90090010ac>] kernel_init_freeable+0x264/0x31c [ 0.791123] [<ffffff9008a25bd0>] kernel_init+0x18/0x11c [ 0.791133] [<ffffff9008082ec0>] ret_from_fork+0x10/0x50 [ 0.793717] console [pstore-1] enabled [ 0.797845] pstore: Registered ramoops as persistent store backend [ 0.804647] ramoops: attached 0x100000@0xf7edc000, ecc: 0/0 Fixes: 663deb47880f ("pstore: Allow prz to control need for locking") Fixes: 109704492ef6 ("pstore: Make spinlock per zone instead of global") Reported-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11pstore: Allow prz to control need for lockingJoel Fernandes
commit 663deb47880f2283809669563c5a52ac7c6aef1a upstream. In preparation of not locking at all for certain buffers depending on if there's contention, make locking optional depending on the initialization of the prz. Signed-off-by: Joel Fernandes <joelaf@google.com> [kees: moved locking flag into prz instead of via caller arguments] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11dentry name snapshotsAl Viro
commit 49d31c2f389acfe83417083e1208422b4091cd9e upstream. take_dentry_name_snapshot() takes a safe snapshot of dentry name; if the name is a short one, it gets copied into caller-supplied structure, otherwise an extra reference to external name is grabbed (those are never modified). In either case the pointer to stable string is stored into the same structure. dentry must be held by the caller of take_dentry_name_snapshot(), but may be freely dropped afterwards - the snapshot will stay until destroyed by release_dentry_name_snapshot(). Intended use: struct name_snapshot s; take_dentry_name_snapshot(&s, dentry); ... access s.name ... release_dentry_name_snapshot(&s); Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name to pass down with event. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11pstore: Make spinlock per zone instead of globalJoel Fernandes
commit 109704492ef637956265ec2eb72ae7b3b39eb6f4 upstream. Currently pstore has a global spinlock for all zones. Since the zones are independent and modify different areas of memory, there's no need to have a global lock, so we should use a per-zone lock as introduced here. Also, when ramoops's ftrace use-case has a FTRACE_PER_CPU flag introduced later, which splits the ftrace memory area into a single zone per CPU, it will eliminate the need for locking. In preparation for this, make the locking optional. Signed-off-by: Joel Fernandes <joelaf@google.com> [kees: updated commit message] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27udf: Fix deadlock between writeback and udf_setsize()Jan Kara
commit f2e95355891153f66d4156bf3a142c6489cd78c6 upstream. udf_setsize() called truncate_setsize() with i_data_sem held. Thus truncate_pagecache() called from truncate_setsize() could lock a page under i_data_sem which can deadlock as page lock ranks below i_data_sem - e. g. writeback can hold page lock and try to acquire i_data_sem to map a block. Fix the problem by moving truncate_setsize() calls from under i_data_sem. It is safe for us to change i_size without holding i_data_sem as all the places that depend on i_size being stable already hold inode_lock. Fixes: 7e49b6f2480cb9a9e7322a91592e56a5c85361f5 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27NFS: only invalidate dentrys that are clearly invalid.NeilBrown
commit cc89684c9a265828ce061037f1f79f4a68ccd3f7 upstream. Since commit bafc9b754f75 ("vfs: More precise tests in d_invalidate") in v3.18, a return of '0' from ->d_revalidate() will cause the dentry to be invalidated even if it has filesystems mounted on or it or on a descendant. The mounted filesystem is unmounted. This means we need to be careful not to return 0 unless the directory referred to truly is invalid. So -ESTALE or -ENOENT should invalidate the directory. Other errors such a -EPERM or -ERESTARTSYS should be returned from ->d_revalidate() so they are propagated to the caller. A particular problem can be demonstrated by: 1/ mount an NFS filesystem using NFSv3 on /mnt 2/ mount any other filesystem on /mnt/foo 3/ ls /mnt/foo 4/ turn off network, or otherwise make the server unable to respond 5/ ls /mnt/foo & 6/ cat /proc/$!/stack # note that nfs_lookup_revalidate is in the call stack 7/ kill -9 $! # this results in -ERESTARTSYS being returned 8/ observe that /mnt/foo has been unmounted. This patch changes nfs_lookup_revalidate() to only treat -ESTALE from nfs_lookup_verify_inode() and -ESTALE or -ENOENT from ->lookup() as indicating an invalid inode. Other errors are returned. Also nfs_check_inode_attributes() is changed to return -ESTALE rather than -EIO. This is consistent with the error returned in similar circumstances from nfs_update_inode(). As this bug allows any user to unmount a filesystem mounted on an NFS filesystem, this fix is suitable for stable kernels. Fixes: bafc9b754f75 ("vfs: More precise tests in d_invalidate") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27f2fs: Don't clear SGID when inheriting ACLsJaegeuk Kim
commit c925dc162f770578ff4a65ec9b08270382dba9e6 upstream. This patch copies commit b7f8a09f80: "btrfs: Don't clear SGID when inheriting ACLs" written by Jan. Fixes: 073931017b49d9458aa351605b43a7e34598caef Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27CIFS: Fix handle_cancelled_mid callback initializationPavel Shilovsky
Commit 99e214e006cf ("Handle mismatched open calls") was applied with errors that result in initializing handle_cancelled_mid callback twice in smb21_operations and smb30_operations structures but not initializing it in smb20_operations structure. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21exec: Limit arg stack to at most 75% of _STK_LIMKees Cook
commit da029c11e6b12f321f36dac8771e833b65cec962 upstream. To avoid pathological stack usage or the need to special-case setuid execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21fs/dcache.c: fix spin lockup issue on nlru->lockSahitya Tummala
commit b17c070fb624cf10162cf92ea5e1ec25cd8ac176 upstream. __list_lru_walk_one() acquires nlru spin lock (nlru->lock) for longer duration if there are more number of items in the lru list. As per the current code, it can hold the spin lock for upto maximum UINT_MAX entries at a time. So if there are more number of items in the lru list, then "BUG: spinlock lockup suspected" is observed in the below path: spin_bug+0x90 do_raw_spin_lock+0xfc _raw_spin_lock+0x28 list_lru_add+0x28 dput+0x1c8 path_put+0x20 terminate_walk+0x3c path_lookupat+0x100 filename_lookup+0x6c user_path_at_empty+0x54 SyS_faccessat+0xd0 el0_svc_naked+0x24 This nlru->lock is acquired by another CPU in this path - d_lru_shrink_move+0x34 dentry_lru_isolate_shrink+0x48 __list_lru_walk_one.isra.10+0x94 list_lru_walk_node+0x40 shrink_dcache_sb+0x60 do_remount_sb+0xbc do_emergency_remount+0xb0 process_one_work+0x228 worker_thread+0x2e0 kthread+0xf4 ret_from_fork+0x10 Fix this lockup by reducing the number of entries to be shrinked from the lru list to 1024 at once. Also, add cond_resched() before processing the lru list again. Link: http://marc.info/?t=149722864900001&r=1&w=2 Link: http://lkml.kernel.org/r/1498707575-2472-1-git-send-email-stummala@codeaurora.org Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Alexander Polakov <apolyakov@beget.ru> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29CIFS: Improve readdir verbosityPavel Shilovsky
commit dcd87838c06f05ab7650b249ebf0d5b57ae63e1e upstream. Downgrade the loglevel for SMB2 to prevent filling the log with messages if e.g. readdir was interrupted. Also make SMB2 and SMB1 codepaths do the same logging during readdir. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAILNeilBrown
commit 9fa4eb8e490a28de40964b1b0e583d8db4c7e57c upstream. If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl, autofs4_d_automount() will return ERR_PTR(status) with that status to follow_automount(), which will then dereference an invalid pointer. So treat a positive status the same as zero, and map to ENOENT. See comment in systemd src/core/automount.c::automount_send_ready(). Link: http://lkml.kernel.org/r/871sqwczx5.fsf@notabene.neil.brown.name Signed-off-by: NeilBrown <neilb@suse.com> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29fs/exec.c: account for argv/envp pointersKees Cook
commit 98da7d08850fb8bdeb395d6368ed15753304aa0c upstream. When limiting the argv/envp strings during exec to 1/4 of the stack limit, the storage of the pointers to the strings was not included. This means that an exec with huge numbers of tiny strings could eat 1/4 of the stack limit in strings and then additional space would be later used by the pointers to the strings. For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721 single-byte strings would consume less than 2MB of stack, the max (8MB / 4) amount allowed, but the pointers to the strings would consume the remaining additional stack space (1677721 * 4 == 6710884). The result (1677721 + 6710884 == 8388605) would exhaust stack space entirely. Controlling this stack exhaustion could result in pathological behavior in setuid binaries (CVE-2017-1000365). [akpm@linux-foundation.org: additional commenting from Kees] Fixes: b6a2fea39318 ("mm: variable length argument support") Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qualys Security Advisory <qsa@qualys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26mm: larger stack guard gap, between vmasHugh Dickins
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream. Stack guard page is a useful feature to reduce a risk of stack smashing into a different mapping. We have been using a single page gap which is sufficient to prevent having stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses as large as 64kB alloca() in many commonly used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX] which is 256kB or stack strings with MAX_ARG_STRLEN. This will become especially dangerous for suid binaries and the default no limit for the stack size limit because those applications can be tricked to consume a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunatelly. Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size because systems with larger base pages might cap stack allocations in the PAGE_SIZE units) which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot. One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units). Implementation wise, first delete all the old code for stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode. Instead of keeping gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and and the vma tree's subtree_gap support for that. Original-patch-by: Oleg Nesterov <oleg@redhat.com> Original-patch-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wt: backport to 4.11: adjust context] [wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide] [wt: backport to 4.4: adjust context ; drop ppc hugetlb_radix changes] [wt: backport to 3.18: adjust context ; no FOLL_POPULATE ; s390 uses generic arch_get_unmapped_area()] Signed-off-by: Willy Tarreau <w@1wt.eu> [gkh: minor build fixes for 3.18] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26configfs: Fix race between create_link and configfs_rmdirNicholas Bellinger
commit ba80aa909c99802c428682c352b0ee0baac0acd3 upstream. This patch closes a long standing race in configfs between the creation of a new symlink in create_link(), while the symlink target's config_item is being concurrently removed via configfs_rmdir(). This can happen because the symlink target's reference is obtained by config_item_get() in create_link() before the CONFIGFS_USET_DROPPING bit set by configfs_detach_prep() during configfs_rmdir() shutdown is actually checked.. This originally manifested itself on ppc64 on v4.8.y under heavy load using ibmvscsi target ports with Novalink API: [ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added [ 7879.893760] ------------[ cut here ]------------ [ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs] [ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G O 4.8.17-customv2.22 #12 [ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000 [ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870 [ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700 Tainted: G O (4.8.17-customv2.22) [ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28222242 XER: 00000000 [ 7879.893820] CFAR: d000000002c664bc SOFTE: 1 GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820 GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000 GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80 GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40 GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940 GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000 GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490 GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940 [ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs] [ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs] [ 7879.893842] Call Trace: [ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs] [ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460 [ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490 [ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170 [ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390 [ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec [ 7879.893856] Instruction dump: [ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000 [ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe00000> 4bffffd0 60000000 60000000 [ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]--- To close this race, go ahead and obtain the symlink's target config_item reference only after the existing CONFIGFS_USET_DROPPING check succeeds. This way, if configfs_rmdir() wins create_link() will return -ENONET, and if create_link() wins configfs_rmdir() will return -EBUSY. Reported-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26Call echo service immediately after socket reconnectSachin Prabhu
commit b8c600120fc87d53642476f48c8055b38d6e14c7 upstream. Commit 4fcd1813e640 ("Fix reconnect to not defer smb3 session reconnect long after socket reconnect") changes the behaviour of the SMB2 echo service and causes it to renegotiate after a socket reconnect. However under default settings, the echo service could take up to 120 seconds to be scheduled. The patch forces the echo service to be called immediately resulting a negotiate call being made immediately on reconnect. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Acked-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14btrfs: fix memory leak in update_space_info failure pathJeff Mahoney
commit 896533a7da929136d0432713f02a3edffece2826 upstream. If we fail to add the space_info kobject, we'll leak the memory for the percpu counter. Fixes: 6ab0a2029c (btrfs: publish allocation data in sysfs) Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14btrfs: use correct types for page indices in btrfs_page_exists_in_rangeDavid Sterba
commit cc2b702c52094b637a351d7491ac5200331d0445 upstream. Variables start_idx and end_idx are supposed to hold a page index derived from the file offsets. The int type is not the right one though, offsets larger than 1 << 44 will get silently trimmed off the high bits. (1 << 44 is 16TiB) What can go wrong, if start is below the boundary and end gets trimmed: - if there's a page after start, we'll find it (radix_tree_gang_lookup_slot) - the final check "if (page->index <= end_idx)" will unexpectedly fail The function will return false, ie. "there's no page in the range", although there is at least one. btrfs_page_exists_in_range is used to prevent races in: * in hole punching, where we make sure there are not pages in the truncated range, otherwise we'll wait for them to finish and redo truncation, but we're going to replace the pages with holes anyway so the only problem is the intermediate state * lock_extent_direct: we want to make sure there are no pages before we lock and start DIO, to prevent stale data reads For practical occurence of the bug, there are several constaints. The file must be quite large, the affected range must cross the 16TiB boundary and the internal state of the file pages and pending operations must match. Also, we must not have started any ordered data in the range, otherwise we don't even reach the buggy function check. DIO locking tries hard in several places to avoid deadlocks with buffered IO and avoids waiting for ranges. The worst consequence seems to be stale data read. CC: Liu Bo <bo.li.liu@oracle.com> Fixes: fc4adbff823f7 ("btrfs: Drop EXTENT_UPTODATE check in hole punching and direct locking") Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14ext4: keep existing extra fields when inode expandsKonstantin Khlebnikov
commit 887a9730614727c4fff7cb756711b190593fc1df upstream. ext4_expand_extra_isize() should clear only space between old and new size. Fixes: 6dd4ee7cab7e # v2.6.23 Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14ext4: fix SEEK_HOLEJan Kara
commit 7d95eddf313c88b24f99d4ca9c2411a4b82fef33 upstream. Currently, SEEK_HOLE implementation in ext4 may both return that there's a hole at some offset although that offset already has data and skip some holes during a search for the next hole. The first problem is demostrated by: xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "seek -h 0" file wrote 57344/57344 bytes at offset 0 56 KiB, 14 ops; 0.0000 sec (2.054 GiB/sec and 538461.5385 ops/sec) Whence Result HOLE 0 Where we can see that SEEK_HOLE wrongly returned offset 0 as containing a hole although we have written data there. The second problem can be demonstrated by: xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k" -c "seek -h 0" file wrote 57344/57344 bytes at offset 0 56 KiB, 14 ops; 0.0000 sec (1.978 GiB/sec and 518518.5185 ops/sec) wrote 8192/8192 bytes at offset 131072 8 KiB, 2 ops; 0.0000 sec (2 GiB/sec and 500000.0000 ops/sec) Whence Result HOLE 139264 Where we can see that hole at offsets 56k..128k has been ignored by the SEEK_HOLE call. The underlying problem is in the ext4_find_unwritten_pgoff() which is just buggy. In some cases it fails to update returned offset when it finds a hole (when no pages are found or when the first found page has higher index than expected), in some cases conditions for detecting hole are just missing (we fail to detect a situation where indices of returned pages are not contiguous). Fix ext4_find_unwritten_pgoff() to properly detect non-contiguous page indices and also handle all cases where we got less pages then expected in one place and handle it properly there. Fixes: c8c0df241cc2719b1262e627f999638411934f60 CC: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14nfsd4: fix null dereference on replayJ. Bruce Fields
commit 9a307403d374b993061f5992a6e260c944920d0b upstream. if we receive a compound such that: - the sessionid, slot, and sequence number in the SEQUENCE op match a cached succesful reply with N ops, and - the Nth operation of the compound is a PUTFH, PUTPUBFH, PUTROOTFH, or RESTOREFH, then nfsd4_sequence will return 0 and set cstate->status to nfserr_replay_cache. The current filehandle will not be set. This will cause us to call check_nfsd_access with first argument NULL. To nfsd4_compound it looks like we just succesfully executed an operation that set a filehandle, but the current filehandle is not set. Fix this by moving the nfserr_replay_cache earlier. There was never any reason to have it after the encode_op label, since the only case where he hit that is when opdesc->op_func sets it. Note that there are two ways we could hit this case: - a client is resending a previously sent compound that ended with one of the four PUTFH-like operations, or - a client is sending a *new* compound that (incorrectly) shares sessionid, slot, and sequence number with a previously sent compound, and the length of the previously sent compound happens to match the position of a PUTFH-like operation in the new compound. The second is obviously incorrect client behavior. The first is also very strange--the only purpose of a PUTFH-like operation is to set the current filehandle to be used by the following operation, so there's no point in having it as the last in a compound. So it's likely this requires a buggy or malicious client to reproduce. Reported-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07xfs: fix unaligned access in xfs_btree_visit_blocksEric Sandeen
commit a4d768e702de224cc85e0c8eac9311763403b368 upstream. This structure copy was throwing unaligned access warnings on sparc64: Kernel unaligned access at TPC[1043c088] xfs_btree_visit_blocks+0x88/0xe0 [xfs] xfs_btree_copy_ptrs does a memcpy, which avoids it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07xfs: bad assertion for delalloc an extent that start at i_sizeZorro Lang
commit 892d2a5f705723b2cb488bfb38bcbdcf83273184 upstream. By run fsstress long enough time enough in RHEL-7, I find an assertion failure (harder to reproduce on linux-4.11, but problem is still there): XFS: Assertion failed: (iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c The assertion is in xfs_getbmap() funciton: if (map[i].br_startblock == DELAYSTARTBLOCK && --> map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip))) ASSERT((iflags & BMV_IF_DELALLOC) != 0); When map[i].br_startoff == XFS_B_TO_FSB(mp, XFS_ISIZE(ip)), the startoff is just at EOF. But we only need to make sure delalloc extents that are within EOF, not include EOF. Signed-off-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07xfs: fix indlen accounting error on partial delalloc conversionBrian Foster
commit 0daaecacb83bc6b656a56393ab77a31c28139bc7 upstream. The delalloc -> real block conversion path uses an incorrect calculation in the case where the middle part of a delalloc extent is being converted. This is documented as a rare situation because XFS generally attempts to maximize contiguity by converting as much of a delalloc extent as possible. If this situation does occur, the indlen reservation for the two new delalloc extents left behind by the conversion of the middle range is calculated and compared with the original reservation. If more blocks are required, the delta is allocated from the global block pool. This delta value can be characterized as the difference between the new total requirement (temp + temp2) and the currently available reservation minus those blocks that have already been allocated (startblockval(PREV.br_startblock) - allocated). The problem is that the current code does not account for previously allocated blocks correctly. It subtracts the current allocation count from the (new - old) delta rather than the old indlen reservation. This means that more indlen blocks than have been allocated end up stashed in the remaining extents and free space accounting is broken as a result. Fix up the calculation to subtract the allocated block count from the original extent indlen and thus correctly allocate the reservation delta based on the difference between the new total requirement and the unused blocks from the original reservation. Also remove a bogus assert that contradicts the fact that the new indlen reservation can be larger than the original indlen reservation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07xfs: fix up quotacheck buffer list error handlingBrian Foster
commit 20e8a063786050083fe05b4f45be338c60b49126 upstream. The quotacheck error handling of the delwri buffer list assumes the resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag on the buffers that are dequeued. This can lead to assert failures on buffer release and possibly other locking problems. Move this code to a delwri queue cancel helper function to encapsulate the logic required to properly release buffers from a delwri queue. Update the helper to clear the delwri queue flag and call it from quotacheck. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07xfs: prevent multi-fsb dir readahead from reading random blocksBrian Foster
commit cb52ee334a45ae6c78a3999e4b473c43ddc528f4 upstream. Directory block readahead uses a complex iteration mechanism to map between high-level directory blocks and underlying physical extents. This mechanism attempts to traverse the higher-level dir blocks in a manner that handles multi-fsb directory blocks and simultaneously maintains a reference to the corresponding physical blocks. This logic doesn't handle certain (discontiguous) physical extent layouts correctly with multi-fsb directory blocks. For example, consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory block size and a directory with the following extent layout: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL 0: [0..7]: 88..95 0 (88..95) 8 1: [8..15]: 80..87 0 (80..87) 8 2: [16..39]: 168..191 0 (168..191) 24 3: [40..63]: 5242952..5242975 1 (72..95) 24 Directory block 0 spans physical extents 0 and 1, dirblk 1 lies entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because extent 2 is larger than the directory block size, the readahead code erroneously assumes the block is contiguous and issues a readahead based on the physical mapping of the first fsb of the dirblk. This results in read verifier failure and a spurious corruption or crc failure, depending on the filesystem format. Further, the subsequent readahead code responsible for walking through the physical table doesn't correctly advance the physical block reference for dirblk 2. Instead of advancing two physical filesystem blocks, the first iteration of the loop advances 1 block (correctly), but the subsequent iteration advances 2 more physical blocks because the next physical extent (extent 3, above) happens to cover more than dirblk 2. At this point, the higher-level directory block walking is completely off the rails of the actual physical layout of the directory for the respective mapping table. Update the contiguous dirblock logic to consider the current offset in the physical extent to avoid issuing directory readahead to unrelated blocks. Also, update the mapping table advancing code to consider the current offset within the current dirblock to avoid advancing the mapping reference too far beyond the dirblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07xfs: handle array index overrun in xfs_dir2_leaf_readbuf()Eric Sandeen
commit 023cc840b40fad95c6fe26fff1d380a8c9d45939 upstream. Carlos had a case where "find" seemed to start spinning forever and never return. This was on a filesystem with non-default multi-fsb (8k) directory blocks, and a fragmented directory with extents like this: 0:[0,133646,2,0] 1:[2,195888,1,0] 2:[3,195890,1,0] 3:[4,195892,1,0] 4:[5,195894,1,0] 5:[6,195896,1,0] 6:[7,195898,1,0] 7:[8,195900,1,0] 8:[9,195902,1,0] 9:[10,195908,1,0] 10:[11,195910,1,0] 11:[12,195912,1,0] 12:[13,195914,1,0] ... i.e. the first extent is a contiguous 2-fsb dir block, but after that it is fragmented into 1 block extents. At the top of the readdir path, we allocate a mapping array which (for this filesystem geometry) can hold 10 extents; see the assignment to map_info->map_size. During readdir, we are therefore able to map extents 0 through 9 above into the array for readahead purposes. If we count by 2, we see that the last mapped index (9) is the first block of a 2-fsb directory block. At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill more readahead; the outer loop assumes one full dir block is processed each loop iteration, and an inner loop that ensures that this is so by advancing to the next extent until a full directory block is mapped. The problem is that this inner loop may step past the last extent in the mapping array as it tries to reach the end of the directory block. This will read garbage for the extent length, and as a result the loop control variable 'j' may become corrupted and never fail the loop conditional. The number of valid mappings we have in our array is stored in map->map_valid, so stop this inner loop based on that limit. There is an ASSERT at the top of the outer loop for this same condition, but we never made it out of the inner loop, so the ASSERT never fired. Huge appreciation for Carlos for debugging and isolating the problem. Debugged-and-analyzed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Tested-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>