summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-11-08kvm: x86: ensure pv_cpuid.features is initialized when enabling capOliver Upton
Make the paravirtual cpuid enforcement mechanism idempotent to ioctl() ordering by updating pv_cpuid.features whenever userspace requests the capability. Extract this update out of kvm_update_cpuid_runtime() into a new helper function and move its other call site into kvm_vcpu_after_set_cpuid() where it more likely belongs. Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Message-Id: <20201027231044.655110-5-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08kvm: x86: reads of restricted pv msrs should also result in #GPOliver Upton
commit 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") only protects against disallowed guest writes to KVM paravirtual msrs, leaving msr reads unchecked. Fix this by enforcing KVM_CPUID_FEATURES for msr reads as well. Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Message-Id: <20201027231044.655110-4-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08KVM: x86: use positive error values for msr emulation that causes #GPMaxim Levitsky
Recent introduction of the userspace msr filtering added code that uses negative error codes for cases that result in either #GP delivery to the guest, or handled by the userspace msr filtering. This breaks an assumption that a negative error code returned from the msr emulation code is a semi-fatal error which should be returned to userspace via KVM_RUN ioctl and usually kill the guest. Fix this by reusing the already existing KVM_MSR_RET_INVALID error code, and by adding a new KVM_MSR_RET_FILTERED error code for the userspace filtered msrs. Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr emulation to userspace") Reported-by: Qian Cai <cai@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201101115523.115780-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08KVM: Documentation: Update entry for KVM_CAP_ENFORCE_PV_CPUIDPeter Xu
Should be squashed into 66570e966dd9cb4f. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201023183358.50607-3-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08KVM: Documentation: Update entry for KVM_X86_SET_MSR_FILTERPeter Xu
It should be an accident when rebase, since we've already have section 8.25 (which is KVM_CAP_S390_DIAG318). Fix the number. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201001012044.5151-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08KVM: x86/mmu: fix counting of rmap entries in pte_list_addLi RongQing
Fix an off-by-one style bug in pte_list_add() where it failed to account the last full set of SPTEs, i.e. when desc->sptes is full and desc->more is NULL. Merge the two "PTE_LIST_EXT-1" checks as part of the fix to avoid an extra comparison. Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <1601196297-24104-1-git-send-email-lirongqing@baidu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08Merge tag 'kvmarm-fixes-5.10-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for v5.10, take #2 - Fix compilation error when PMD and PUD are folded - Fix regresssion of the RAZ behaviour of ID_AA64ZFR0_EL1
2020-11-07Merge tag 'block-5.10-2020-11-07' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request from Christoph: - revert a nvme_queue size optimization (Keith Bush) - fabrics timeout races fixes (Chao Leng and Sagi Grimberg)" - null_blk zone locking fix (Damien) * tag 'block-5.10-2020-11-07' of git://git.kernel.dk/linux-block: null_blk: Fix scheduling in atomic with zoned mode nvme-tcp: avoid repeated request completion nvme-rdma: avoid repeated request completion nvme-tcp: avoid race between time out and tear down nvme-rdma: avoid race between time out and tear down nvme: introduce nvme_sync_io_queues Revert "nvme-pci: remove last_sq_tail"
2020-11-07Merge tag 'io_uring-5.10-2020-11-07' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "A set of fixes for io_uring: - SQPOLL cancelation fixes - Two fixes for the io_identity COW - Cancelation overflow fix (Pavel) - Drain request cancelation fix (Pavel) - Link timeout race fix (Pavel)" * tag 'io_uring-5.10-2020-11-07' of git://git.kernel.dk/linux-block: io_uring: fix link lookup racing with link timeout io_uring: use correct pointer for io_uring_show_cred() io_uring: don't forget to task-cancel drained reqs io_uring: fix overflowed cancel w/ linked ->files io_uring: drop req/tctx io_identity separately io_uring: ensure consistent view of original task ->mm from SQPOLL io_uring: properly handle SQPOLL request cancelations io-wq: cancel request if it's asking for files and we don't have them
2020-11-07futex: Handle transient "ownerless" rtmutex state correctlyMike Galbraith
Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner(). This is one possible chain of events leading to this: Task Prio Operation T1 120 lock(F) T2 120 lock(F) -> blocks (top waiter) T3 50 (RT) lock(F) -> boosts T1 and blocks (new top waiter) XX timeout/ -> wakes T2 signal T1 50 unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set) T2 120 cleanup -> try_to_take_mutex() fails because T3 is the top waiter and the lower priority T2 cannot steal the lock. -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON() The comment states that this is invalid and rt_mutex_real_owner() must return a non NULL owner when the trylock failed, but in case of a queued and woken up waiter rt_mutex_real_owner() == NULL is a valid transient state. The higher priority waiter has simply not yet managed to take over the rtmutex. The BUG_ON() is therefore wrong and this is just another retry condition in fixup_pi_state_owner(). Drop the locks, so that T3 can make progress, and then try the fixup again. Gratian provided a great analysis, traces and a reproducer. The analysis is to the point, but it confused the hell out of that tglx dude who had to page in all the futex horrors again. Condensed version is above. [ tglx: Wrote comment and changelog ] Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") Reported-by: Gratian Crisan <gratian.crisan@ni.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de
2020-11-07net: marvell: prestera: fix compilation with CONFIG_BRIDGE=mVadym Kochan
With CONFIG_BRIDGE=m the compilation fails: ld: drivers/net/ethernet/marvell/prestera/prestera_switchdev.o: in function `prestera_bridge_port_event': prestera_switchdev.c:(.text+0x2ebd): undefined reference to `br_vlan_enabled' in case the driver is statically enabled. Fix it by adding 'BRIDGE || BRIDGE=n' dependency. Fixes: e1189d9a5fbe ("net: marvell: prestera: Add Switchdev driver implementation") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20201106161128.24069-1-vadym.kochan@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-07Merge tag 'mlx5-fixes-2020-11-03' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2020-11-03 v1->v2: - Fix fixes line tag in patch #1 - Toss ktls refcount leak fix, Maxim will look further into the root cause. - Toss eswitch chain 0 prio patch, until we determine if it is needed for -rc and net. * tag 'mlx5-fixes-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Fix incorrect access of RCU-protected xdp_prog net/mlx5e: Fix VXLAN synchronization after function reload net/mlx5: E-switch, Avoid extack error log for disabled vport net/mlx5: Fix deletion of duplicate rules net/mlx5e: Use spin_lock_bh for async_icosq_lock net/mlx5e: Protect encap route dev from concurrent release net/mlx5e: Fix modify header actions memory leak ==================== Link: https://lore.kernel.org/r/20201105202129.23644-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-07r8169: disable hw csum for short packets on all chip versionsHeiner Kallweit
RTL8125B has same or similar short packet hw padding bug as RTL8168evl. The main workaround has been extended accordingly, however we have to disable also hw checksumming for short packets on affected new chip versions. Instead of checking for an affected chip version let's simply disable hw checksumming for short packets in general. v2: - remove the version checks and disable short packet hw csum in general - reflect this in commit title and message Fixes: 0439297be951 ("r8169: add support for RTL8125B") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/7fbb35f0-e244-ef65-aa55-3872d7d38698@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-07r8169: fix potential skb double free in an error pathHeiner Kallweit
The caller of rtl8169_tso_csum_v2() frees the skb if false is returned. eth_skb_pad() internally frees the skb on error what would result in a double free. Therefore use __skb_put_padto() directly and instruct it to not free the skb on error. Fixes: b423e9ae49d7 ("r8169: fix offloaded tx checksum for small packets.") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/f7e68191-acff-9ded-4263-c016428a8762@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-07Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Driver bugfixes for I2C. Most of them are for the new mlxbf driver which got more exposure after rc1. The sh_mobile patch should already have reached you during the merge window, but I accidently dropped it. However, since it fixes a problem with rebooting, it is still fine for rc3" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: designware: slave should do WRITE_REQUESTED before WRITE_RECEIVED i2c: designware: call i2c_dw_read_clear_intrbits_slave() once i2c: mlxbf: I2C_MLXBF should depend on MELLANOX_PLATFORM i2c: mlxbf: Update author and maintainer email info i2c: mlxbf: Update reference clock frequency i2c: mlxbf: Remove unecessary wrapper functions i2c: mlxbf: Fix resrticted cast warning of sparse i2c: mlxbf: Add CONFIG_ACPI to guard ACPI function call i2c: sh_mobile: implement atomic transfers i2c: mediatek: move dma reset before i2c reset
2020-11-07Merge tag 'riscv-for-linus-5.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - SPDX comment style fix - ignore memory that is unusable - avoid setting a kernel text offset for the !MMU kernels, where skipping the first page of memory is both unnecessary and costly - avoid passing the flag bits in satp to pfn_to_virt() - fix __put_kernel_nofault, where we had the arguments to __put_user_nocheck reversed - workaround for a bug in the FU540 to avoid triggering PMP issues during early boot - change to how we pull symbols out of the vDSO. The old mechanism was removed from binutils-2.35 (and has been backported to Debian's 2.34) * tag 'riscv-for-linus-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Fix the VDSO symbol generaton for binutils-2.35+ RISC-V: Use non-PGD mappings for early DTB access riscv: uaccess: fix __put_kernel_nofault() riscv: fix pfn_to_virt err in do_page_fault(). riscv: Set text_offset correctly for M-Mode RISC-V: Remove any memblock representing unusable memory area risc-v: kernel: ftrace: Fixes improper SPDX comment style
2020-11-07Merge tag 'usb-serial-5.10-rc3' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus Johan writes: USB-serial fixes for 5.10-rc3 Here's a fix for a long-standing issue with the cyberjack driver and some new device ids. All have been in linux-next with no reported issues. * tag 'usb-serial-5.10-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: USB: serial: option: add Telit FN980 composition 0x1055 USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231 USB: serial: cyberjack: fix write-URB completion race USB: serial: option: add Quectel EC200T module support
2020-11-07perf/core: Fix a memory leak in perf_event_parse_addr_filter()kiyin(尹亮)
As shown through runtime testing, the "filename" allocation is not always freed in perf_event_parse_addr_filter(). There are three possible ways that this could happen: - It could be allocated twice on subsequent iterations through the loop, - or leaked on the success path, - or on the failure path. Clean up the code flow to make it obvious that 'filename' is always freed in the reallocation path and in the two return paths as well. We rely on the fact that kfree(NULL) is NOP and filename is initialized with NULL. This fixes the leak. No other side effects expected. [ Dan Carpenter: cleaned up the code flow & added a changelog. ] [ Ingo Molnar: updated the changelog some more. ] Fixes: 375637bc5249 ("perf/core: Introduce address range filtering") Signed-off-by: "kiyin(尹亮)" <kiyin@tencent.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "Srivatsa S. Bhat" <srivatsa@csail.mit.edu> Cc: Anthony Liguori <aliguori@amazon.com> -- kernel/events/core.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
2020-11-07x86/platform/uv: Recognize UV5 hubless system identifierMike Travis
Testing shows a problem in that UV5 hubless systems were not being recognized. Add them to the list of OEM IDs checked. Fixes: 6c7794423a998 ("Add UV5 direct references") Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201105222741.157029-4-mike.travis@hpe.com
2020-11-07x86/platform/uv: Remove spaces from OEM IDsMike Travis
Testing shows that trailing spaces caused problems with the OEM_ID and the OEM_TABLE_ID. One being that the OEM_ID would not string compare correctly. Another the OEM_ID and OEM_TABLE_ID would be concatenated in the printout. Remove any trailing spaces. Fixes: 1e61f5a95f191 ("Add and decode Arch Type in UVsystab") Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201105222741.157029-3-mike.travis@hpe.com
2020-11-07x86/platform/uv: Fix missing OEM_TABLE_IDMike Travis
Testing shows a problem in that the OEM_TABLE_ID was missing for hubless systems. This is used to determine the APIC type (legacy or extended). Add the OEM_TABLE_ID to the early hubless processing. Fixes: 1e61f5a95f191 ("Add and decode Arch Type in UVsystab") Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201105222741.157029-2-mike.travis@hpe.com
2020-11-07jbd2: fix up sparse warnings in checkpoint codeTheodore Ts'o
Add missing __acquires() and __releases() annotations. Also, in an "this should never happen" WARN_ON check, if it *does* actually happen, we need to release j_state_lock since this function is always supposed to release that lock. Otherwise, things will quickly grind to a halt after the WARN_ON trips. Fixes: 96f1e0974575 ("jbd2: avoid long hold times of j_state_lock...") Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-07ext4: fix sparse warnings in fast_commit codeTheodore Ts'o
Add missing __acquire() and __releases() annotations, and make fc_ineligible_reasons[] static, as it is not used outside of fs/ext4/fast_commit.c. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: cleanup fast commit mount optionsHarshad Shirwadkar
Drop no_fc mount option that disable fast commit even if it was enabled at mkfs time. Move fc_debug_force mount option under ifdef EXT4_DEBUG to annotate that this is strictly for debugging and testing purposes and should not be used in production. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-23-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06jbd2: don't start fast commit on aborted journalHarshad Shirwadkar
Fast commit should not be started if the journal is aborted. Signed-off-by: Harshad Shirwadkar<harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-22-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: make s_mount_flags modifications atomicHarshad Shirwadkar
Fast commit file system states are recorded in sbi->s_mount_flags. Fast commit expects these bit manipulations to be atomic. This patch adds helpers to make those modifications atomic. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-21-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: issue fsdev cache flush before starting fast commitHarshad Shirwadkar
If the journal dev is different from fsdev, issue a cache flush before committing fast commit blocks to disk. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201106035911.1942128-20-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: disable fast commit with data journallingHarshad Shirwadkar
Fast commits don't work with data journalling. This patch disables the fast commit support when data journalling is turned on. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201106035911.1942128-19-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: fix inode dirty check in case of fast commitsHarshad Shirwadkar
In case of fast commits, determine if the inode is dirty by checking if the inode is on fast commit list. This also helps us get rid of ext4_inode_info.i_fc_committed_subtid field. Reported-by: Andrea Righi <andrea.righi@canonical.com> Tested-by: Andrea Righi <andrea.righi@canonical.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-18-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: remove unnecessary fast commit calls from ext4_file_mmapHarshad Shirwadkar
Remove unnecessary calls to ext4_fc_start_update() and ext4_fc_stop_update() from ext4_file_mmap(). Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-17-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: mark buf dirty before submitting fast commit bufferHarshad Shirwadkar
Mark the fast commit buffer as dirty before submission. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-16-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: fix code documentatioonHarshad Shirwadkar
Add a TODO to remember fixing REQ_FUA | REQ_PREFLUSH for fast commit buffers. Also, fix a typo in top level comment in fast_commit.c Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-15-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: dedpulicate the code to wait on inode that's being committedHarshad Shirwadkar
This patch removes the deduplicates the code that implements waiting on inode that's being committed. That code is moved into a new function. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-14-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06jbd2: don't read journal->j_commit_sequence without taking a lockHarshad Shirwadkar
Take journal state lock before reading journal->j_commit_sequence. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-13-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06jbd2: don't touch buffer state until it is filledHarshad Shirwadkar
Fast commit buffers should be filled in before toucing their state. Remove code that sets buffer state as dirty before the buffer is passed to the file system. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-12-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06jbd2: add todo for a fast commit performance optimizationHarshad Shirwadkar
Fast commit performance can be optimized if commit thread doesn't wait for ongoing fast commits to complete until the transaction enters T_FLUSH state. Document this optimization. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-11-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06jbd2: don't pass tid to jbd2_fc_end_commit_fallback()Harshad Shirwadkar
In jbd2_fc_end_commit_fallback(), we know which tid to commit. There's no need for caller to pass it. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-10-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06jbd2: don't use state lock during commit pathHarshad Shirwadkar
Variables journal->j_fc_off, journal->j_fc_wbuf are accessed during commit path. Since today we allow only one process to perform a fast commit, there is no need take state lock before accessing these variables. This patch removes these locks and adds comments to describe this. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-9-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06jbd2: drop jbd2_fc_init documentationHarshad Shirwadkar
Now that jbd2_fc_init is dropped, drop its docs too. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-8-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: clean up the JBD2 API that initializes fast commitsHarshad Shirwadkar
This patch removes jbd2_fc_init() API and its related functions to simplify enabling fast commits. With this change, the number of fast commit blocks to use is solely determined by the JBD2 layer. So, we move the default value for minimum number of fast commit blocks from ext4/fast_commit.h to include/linux/jbd2.h. However, whether or not to use fast commits is determined by the file system. The file system just sets the fast commit feature using jbd2_journal_set_features(). JBD2 layer then determines how many blocks to use for fast commits (based on the value found in the JBD2 superblock). Note that the JBD2 feature flag of fast commits is just an indication that there are fast commit blocks present on disk. It doesn't tell JBD2 layer about the intent of the file system of whether to it wants to use fast commit or not. That's why, we blindly clear the fast commit flag in journal_reset() after the recovery is done. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-7-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufsHarshad Shirwadkar
The on-disk superblock field sb->s_maxlen represents the total size of the journal including the fast commit area and is no more the max number of blocks available for a transaction. The maximum number of blocks available to a transaction is reduced by the number of fast commit blocks. So, this patch renames j_maxlen to j_total_len to better represent its intent. Also, it adds a function to calculate max number of bufs available for a transaction. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-6-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: fixup ext4_fc_track_* functions' signatureHarshad Shirwadkar
Firstly, pass handle to all ext4_fc_track_* functions and use transaction id found in handle->h_transaction->h_tid for tracking fast commit updates. Secondly, don't pass inode to ext4_fc_track_link/create/unlink functions. inode can be found inside these functions as d_inode(dentry). However, rename path is an exeception. That's because in that case, we need inode that's not same as d_inode(dentry). To handle that, add a couple of low-level wrapper functions that take inode and dentry as arguments. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-5-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: drop redundant calls ext4_fc_track_rangeHarshad Shirwadkar
ext4_fc_track_range() should only be called when blocks are added or removed from an inode. So, the only places from where we need to call this function are ext4_map_blocks(), punch hole, collapse / zero range, truncate. Remove all the other redundant calls to ths function. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-4-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: mark fc ineligible if inode gets evictied due to mem pressureHarshad Shirwadkar
If inode gets evicted due to memory pressure, we have to remove it from the fast commit list. However, that inode may have uncommitted changes that fast commits will lose. So, just fall back to full commits in this case. Also, rename the fast commit ineligiblity reason from "EXT4_FC_REASON_MEM" to "EXT4_FC_REASON_MEM_NOMEM" for better expression. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-3-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: describe fast_commit feature flagsHarshad Shirwadkar
Fast commit feature has flags in the file system as well in JBD2. The meaning of fast commit feature flags can get confusing. Update docs and code to add more documentation about it. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201106035911.1942128-2-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: unlock xattr_sem properly in ext4_inline_data_truncate()Joseph Qi
It takes xattr_sem to check inline data again but without unlock it in case not have. So unlock it before return. Fixes: aef1c8513c1f ("ext4: let ext4_truncate handle inline data correctly") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/1604370542-124630-1-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2020-11-06ext4: silence an uninitialized variable warningDan Carpenter
Smatch complains that "i" can be uninitialized if we don't enter the loop. I don't know if it's possible but we may as well silence this warning. [ Initialize i to sb->s_blocksize instead of 0. The only way the for loop could be skipped entirely is the in-memory data structures, in particular the bh->b_data for the on-disk superblock has gotten corrupted enough that calculated value of group is >= to ext4_get_groups_count(sb). In that case, we want to exit immediately without allocating a block. -- TYT ] Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20201030114620.GB3251003@mwanda Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2020-11-06MAINTAINERS: add missing file in ext4 entryChao Yu
include/trace/events/ext4.h belongs to ext4 module, add the file path into ext4 entry in MAINTAINERS. Signed-off-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20201030022435.1136-1-yuchao0@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-11-06ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTAKaixu Xia
The macro MOPT_Q is used to indicates the mount option is related to quota stuff and is defined to be MOPT_NOSUPPORT when CONFIG_QUOTA is disabled. Normally the quota options are handled explicitly, so it didn't matter that the MOPT_STRING flag was missing, even though the usrjquota and grpjquota mount options take a string argument. It's important that's present in the !CONFIG_QUOTA case, since without MOPT_STRING, the mount option matcher will match usrjquota= followed by an integer, and will otherwise skip the table entry, and so "mount option not supported" error message is never reported. [ Fixed up the commit description to better explain why the fix works. --TYT ] Fixes: 26092bf52478 ("ext4: use a table-driven handler for mount options") Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Link: https://lore.kernel.org/r/1603986396-28917-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2020-11-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Alexei Starovoitov says: ==================== pull-request: bpf 2020-11-06 1) Pre-allocated per-cpu hashmap needs to zero-fill reused element, from David. 2) Tighten bpf_lsm function check, from KP. 3) Fix bpftool attaching to flow dissector, from Lorenz. 4) Use -fno-gcse for the whole kernel/bpf/core.c instead of function attribute, from Ard. * git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Update verification logic for LSM programs bpf: Zero-fill re-used per-cpu map element bpf: BPF_PRELOAD depends on BPF_SYSCALL tools/bpftool: Fix attaching flow dissector libbpf: Fix possible use after free in xsk_socket__delete libbpf: Fix null dereference in xsk_socket__delete libbpf, hashmap: Fix undefined behavior in hash_bits bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE tools, bpftool: Remove two unused variables. tools, bpftool: Avoid array index warnings. xsk: Fix possible memory leak at socket close bpf: Add struct bpf_redir_neigh forward declaration to BPF helper defs samples/bpf: Set rlimit for memlock to infinity in all samples bpf: Fix -Wshadow warnings selftest/bpf: Fix profiler test using CO-RE relocation for enums ==================== Link: https://lore.kernel.org/r/20201106221759.24143-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>