summaryrefslogtreecommitdiff
path: root/fs/iomap/buffered-io.c
AgeCommit message (Collapse)Author
2021-08-05iomap: Add another assertion to inline data handlingMatthew Wilcox (Oracle)
Check that the file tail does not cross a page boundary. Requested by Andreas. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-05iomap: Use kmap_local_page instead of kmap_atomicMatthew Wilcox (Oracle)
kmap_atomic() has the side-effect of disabling pagefaults and preemption. kmap_local_page() does not do this and is preferred. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-03iomap: Fix some typos and bad grammarAndreas Gruenbacher
Fix some typos and bad grammar in buffered-io.c to make the comments easier to read. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-03iomap: Support inline data with block size < page sizeMatthew Wilcox (Oracle)
Remove the restriction that inline data must start on a page boundary in a file. This allows, for example, the first 2KiB to be stored out of line and the trailing 30 bytes to be stored inline. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-03iomap: support reading inline data from non-zero posGao Xiang
The existing inline data support only works for cases where the entire file is stored as inline data. For larger files, EROFS stores the initial blocks separately and the remainder of the file ("file tail") adjacent to the inode. Generalise inline data to allow reading the inline file tail. Tails may not cross a page boundary in memory. We currently have no filesystems that support tails and writing, so that case is currently disabled (see iomap_write_begin_inline). Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-03iomap: simplify iomap_add_to_ioendChristoph Hellwig
Now that the outstanding writes are counted in bytes, there is no need to use the low-level __bio_try_merge_page API, we can switch back to always using bio_add_page and simply iomap_add_to_ioend again. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-03iomap: simplify iomap_readpage_actorChristoph Hellwig
Now that the outstanding reads are counted in bytes, there is no need to use the low-level __bio_try_merge_page API, we can switch back to always using bio_add_page and simplify iomap_readpage_actor again. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-07-15iomap: Don't create iomap_page objects in iomap_page_mkwrite_actorAndreas Gruenbacher
Now that we create those objects in iomap_writepage_map when needed, there's no need to pre-create them in iomap_page_mkwrite_actor anymore. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-07-15iomap: Don't create iomap_page objects for inline filesAndreas Gruenbacher
In iomap_readpage_actor, don't create iop objects for inline inodes. Otherwise, iomap_read_inline_data will set PageUptodate without setting iop->uptodate, and iomap_page_release will eventually complain. To prevent this kind of bug from occurring in the future, make sure the page doesn't have private data attached in iomap_read_inline_data. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-07-15iomap: Permit pages without an iop to enter writebackAndreas Gruenbacher
Create an iop in the writeback path if one doesn't exist. This allows us to avoid creating the iop in some cases. We'll initially do that for pages with inline data, but it can be extended to pages which are entirely within an extent. It also allows for an iop to be removed from pages in the future (eg page split). Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-07-03Merge branch 'work.iov_iter' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter updates from Al Viro: "iov_iter cleanups and fixes. There are followups, but this is what had sat in -next this cycle. IMO the macro forest in there became much thinner and easier to follow..." * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits) csum_and_copy_to_pipe_iter(): leave handling of csum_state to caller clean up copy_mc_pipe_to_iter() pipe_zero(): we don't need no stinkin' kmap_atomic()... iov_iter: clean csum_and_copy_...() primitives up a bit copy_page_from_iter(): don't need kmap_atomic() for kvec/bvec cases copy_page_to_iter(): don't bother with kmap_atomic() for bvec/kvec cases iterate_xarray(): only of the first iteration we might get offset != 0 pull handling of ->iov_offset into iterate_{iovec,bvec,xarray} iov_iter: make iterator callbacks use base and len instead of iovec iov_iter: make the amount already copied available to iterator callbacks iov_iter: get rid of separate bvec and xarray callbacks iov_iter: teach iterate_{bvec,xarray}() about possible short copies iterate_bvec(): expand bvec.h macro forest, massage a bit iov_iter: unify iterate_iovec and iterate_kvec iov_iter: massage iterate_iovec and iterate_kvec to logics similar to iterate_bvec iterate_and_advance(): get rid of magic in case when n is 0 csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter() iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant [xarray] iov_iter_npages(): just use DIV_ROUND_UP() iov_iter_npages(): don't bother with iterate_all_kinds() ...
2021-06-29iomap: use __set_page_dirty_nobuffersMatthew Wilcox (Oracle)
The only difference between iomap_set_page_dirty() and __set_page_dirty_nobuffers() is that the latter includes a debugging check that a !Uptodate page has private data. Link: https://lkml.kernel.org/r/20210615162342.1669332-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-10iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing ↵Al Viro
variant Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the callers do *not* need to do iov_iter_advance() after it. In case when they end up consuming less than they'd been given they need to do iov_iter_revert() on everything they had not consumed. That, however, needs to be done only on slow paths. All in-tree callers converted. And that kills the last user of iterate_all_kinds() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-02generic_perform_write()/iomap_write_actor(): saner logics for short copyAl Viro
if we run into a short copy and ->write_end() refuses to advance at all, use the amount we'd managed to copy for the next iteration to handle. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-05-14mm/filemap: fix readahead return typesMatthew Wilcox (Oracle)
A readahead request will not allocate more memory than can be represented by a size_t, even on systems that have HIGHMEM available. Change the length functions from returning an loff_t to a size_t. Link: https://lkml.kernel.org/r/20210510201201.1558972-1-willy@infradead.org Fixes: 32c0a6bcaa1f57 ("btrfs: add and use readahead_batch_length") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-04iomap: remove unused private field from ioendBrian Foster
The only remaining user of ->io_private is the generic ioend merging infrastructure. The only user of that is XFS, which no longer sets ->io_private or passes an associated merge callback. Remove the unused parameter and the ->io_private field. CC: linux-fsdevel@vger.kernel.org Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-08treewide: Change list_sort to use const pointersSami Tolvanen
list_sort() internally casts the comparison function passed to it to a different type with constant struct list_head pointers, and uses this pointer to call the functions, which trips indirect call Control-Flow Integrity (CFI) checking. Instead of removing the consts, this change defines the list_cmp_func_t type and changes the comparison function types of all list_sort() callers to use const pointers, thus avoiding type mismatches. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
2021-03-11block: rename BIO_MAX_PAGES to BIO_MAX_VECSChristoph Hellwig
Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been horribly confusingly misnamed. Rename it to BIO_MAX_VECS to stop confusing users of the bio API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-28Merge tag 'xfs-5.12-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull more xfs updates from Darrick Wong: "The most notable fix here prevents premature reuse of freed metadata blocks, and adding the ability to detect accidental nested transactions, which are not allowed here. - Restore a disused sysctl control knob that was inadvertently dropped during the merge window to avoid fstests regressions. - Don't speculatively release freed blocks from the busy list until we're actually allocating them, which fixes a rare log recovery regression. - Don't nest transactions when scanning for free space. - Add an idiot^Wmaintainer light to detect nested transactions. ;)" * tag 'xfs-5.12-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: use current->journal_info for detecting transaction recursion xfs: don't nest transactions when scanning for eofblocks xfs: don't reuse busy extents on extent trim xfs: restore speculative_cow_prealloc_lifetime sysctl
2021-02-26block: Add bio_max_segsMatthew Wilcox (Oracle)
It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the sign to be the same. Introduce bio_max_segs() and change BIO_MAX_PAGES to be unsigned to make it easier for the users. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-25xfs: use current->journal_info for detecting transaction recursionDave Chinner
Because the iomap code using PF_MEMALLOC_NOFS to detect transaction recursion in XFS is just wrong. Remove it from the iomap code and replace it with XFS specific internal checks using current->journal_info instead. [djwong: This change also realigns the lifetime of NOFS flag changes to match the incore transaction, instead of the inconsistent scheme we have now.] Fixes: 9070733b4efa ("xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-12-02mm: memcontrol: Use helpers to read page's memcg dataRoman Gushchin
Patch series "mm: allow mapping accounted kernel pages to userspace", v6. Currently a non-slab kernel page which has been charged to a memory cgroup can't be mapped to userspace. The underlying reason is simple: PageKmemcg flag is defined as a page type (like buddy, offline, etc), so it takes a bit from a page->mapped counter. Pages with a type set can't be mapped to userspace. But in general the kmemcg flag has nothing to do with mapping to userspace. It only means that the page has been accounted by the page allocator, so it has to be properly uncharged on release. Some bpf maps are mapping the vmalloc-based memory to userspace, and their memory can't be accounted because of this implementation detail. This patchset removes this limitation by moving the PageKmemcg flag into one of the free bits of the page->mem_cgroup pointer. Also it formalizes accesses to the page->mem_cgroup and page->obj_cgroups using new helpers, adds several checks and removes a couple of obsolete functions. As the result the code became more robust with fewer open-coded bit tricks. This patch (of 4): Currently there are many open-coded reads of the page->mem_cgroup pointer, as well as a couple of read helpers, which are barely used. It creates an obstacle on a way to reuse some bits of the pointer for storing additional bits of information. In fact, we already do this for slab pages, where the last bit indicates that a pointer has an attached vector of objcg pointers instead of a regular memcg pointer. This commits uses 2 existing helpers and introduces a new helper to converts all read sides to calls of these helpers: struct mem_cgroup *page_memcg(struct page *page); struct mem_cgroup *page_memcg_rcu(struct page *page); struct mem_cgroup *page_memcg_check(struct page *page); page_memcg_check() is intended to be used in cases when the page can be a slab page and have a memcg pointer pointing at objcg vector. It does check the lowest bit, and if set, returns NULL. page_memcg() contains a VM_BUG_ON_PAGE() check for the page not being a slab page. To make sure nobody uses a direct access, struct page's mem_cgroup/obj_cgroups is converted to unsigned long memcg_data. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
2020-11-04iomap: clean up writeback state logic on writepage errorBrian Foster
The iomap writepage error handling logic is a mash of old and slightly broken XFS writepage logic. When keepwrite writeback state tracking was introduced in XFS in commit 0d085a529b42 ("xfs: ensure WB_SYNC_ALL writeback handles partial pages correctly"), XFS had an additional cluster writeback context that scanned ahead of ->writepage() to process dirty pages over the current ->writepage() extent mapping. This context expected a dirty page and required retention of the TOWRITE tag on partial page processing so the higher level writeback context would revisit the page (in contrast to ->writepage(), which passes a page with the dirty bit already cleared). The cluster writeback mechanism was eventually removed and some of the error handling logic folded into the primary writeback path in commit 150d5be09ce4 ("xfs: remove xfs_cancel_ioend"). This patch accidentally conflated the two contexts by using the keepwrite logic in ->writepage() without accounting for the fact that the page is not dirty. Further, the keepwrite logic has no practical effect on the core ->writepage() caller (write_cache_pages()) because it never revisits a page in the current function invocation. Technically, the page should be redirtied for the keepwrite logic to have any effect. Otherwise, write_cache_pages() may find the tagged page but will skip it since it is clean. Even if the page was redirtied, however, there is still no practical effect to keepwrite since write_cache_pages() does not wrap around within a single invocation of the function. Therefore, the dirty page would simply end up retagged on the next writeback sequence over the associated range. All that being said, none of this really matters because redirtying a partially processed page introduces a potential infinite redirty -> writeback failure loop that deviates from the current design principle of clearing the dirty state on writepage failure to avoid building up too much dirty, unreclaimable memory on the system. Therefore, drop the spurious keepwrite usage and dirty state clearing logic from iomap_writepage_map(), treat the partially processed page the same as a fully processed page, and let the imminent ioend failure clean up the writeback state. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-11-04iomap: support partial page discard on writeback block mapping failureBrian Foster
iomap writeback mapping failure only calls into ->discard_page() if the current page has not been added to the ioend. Accordingly, the XFS callback assumes a full page discard and invalidation. This is problematic for sub-page block size filesystems where some portion of a page might have been mapped successfully before a failure to map a delalloc block occurs. ->discard_page() is not called in that error scenario and the bio is explicitly failed by iomap via the error return from ->prepare_ioend(). As a result, the filesystem leaks delalloc blocks and corrupts the filesystem block counters. Since XFS is the only user of ->discard_page(), tweak the semantics to invoke the callback unconditionally on mapping errors and provide the file offset that failed to map. Update xfs_discard_page() to discard the corresponding portion of the file and pass the range along to iomap_invalidatepage(). The latter already properly handles both full and sub-page scenarios by not changing any iomap or page state on sub-page invalidations. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-28iomap: Set all uptodate bits for an Uptodate pageMatthew Wilcox (Oracle)
For filesystems with block size < page size, we need to set all the per-block uptodate bits if the page was already uptodate at the time we create the per-block metadata. This can happen if the page is invalidated (eg by a write to drop_caches) but ultimately not removed from the page cache. This is a data corruption issue as page writeback skips blocks which are marked !uptodate. Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Qian Cai <cai@redhat.com> Cc: Brian Foster <bfoster@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-21iomap: Change calling convention for zeroingMatthew Wilcox (Oracle)
Pass the full length to iomap_zero() and dax_iomap_zero(), and have them return how many bytes they actually handled. This is preparatory work for handling THP, although it looks like DAX could actually take advantage of it if there's a larger contiguous area. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-21iomap: Convert iomap_write_end typesMatthew Wilcox (Oracle)
iomap_write_end cannot return an error, so switch it to return size_t instead of int and remove the error checking from the callers. Also convert the arguments to size_t from unsigned int, in case anyone ever wants to support a page size larger than 2GB. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21iomap: Convert write_count to write_bytes_pendingMatthew Wilcox (Oracle)
Instead of counting bio segments, count the number of bytes submitted. This insulates us from the block layer's definition of what a 'same page' is, which is not necessarily clear once THPs are involved. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21iomap: Convert read_count to read_bytes_pendingMatthew Wilcox (Oracle)
Instead of counting bio segments, count the number of bytes submitted. This insulates us from the block layer's definition of what a 'same page' is, which is not necessarily clear once THPs are involved. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-21iomap: Support arbitrarily many blocks per pageMatthew Wilcox (Oracle)
Size the uptodate array dynamically to support larger pages in the page cache. With a 64kB page, we're only saving 8 bytes per page today, but with a 2MB maximum page size, we'd have to allocate more than 4kB per page. Add a few debugging assertions. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-21iomap: Use bitmap ops to set uptodate bitsMatthew Wilcox (Oracle)
Now that the bitmap is protected by a spinlock, we can use the more efficient bitmap ops instead of individual test/set bit ops. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21iomap: Use kzalloc to allocate iomap_pageMatthew Wilcox (Oracle)
We can skip most of the initialisation, although spinlocks still need explicit initialisation as architectures may use a non-zero value to indicate unlocked. The comment is no longer useful as attach_page_private() handles the refcount now. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21fs: Introduce i_blocks_per_pageMatthew Wilcox (Oracle)
This helper is useful for both THPs and for supporting block size larger than page size. Convert all users that I could find (we have a few different ways of writing this idiom, and I may have missed some). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2020-09-21iomap: Fix misplaced page flushingMatthew Wilcox (Oracle)
If iomap_unshare_actor() unshares to an inline iomap, the page was not being flushed. block_write_end() and __iomap_write_end() already contain flushes, so adding it to iomap_write_end_inline() seems like the best place. That means we can remove it from iomap_write_actor(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-21iomap: Use round_down/round_up macros in __iomap_write_beginNikolay Borisov
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-10iomap: Mark read blocks uptodate in write_beginMatthew Wilcox (Oracle)
When bringing (portions of) a page uptodate, we were marking blocks that were zeroed as being uptodate, but not blocks that were read from storage. Like the previous commit, this problem was found with generic/127 and a kernel which failed readahead I/Os. This bug causes writes to be silently lost when working with flaky storage. Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-10iomap: Clear page error before beginning a writeMatthew Wilcox (Oracle)
If we find a page in write_begin which is !Uptodate, we need to clear any error on the page before starting to read data into it. This matches how filemap_fault(), do_read_cache_page() and generic_file_buffered_read() handle PageError on !Uptodate pages. When calling iomap_set_range_uptodate() in __iomap_write_begin(), blocks were not being marked as uptodate. This was found with generic/127 and a specially modified kernel which would fail (some) readahead I/Os. The test read some bytes in a prior page which caused readahead to extend into page 0x34. There was a subsequent write to page 0x34, followed by a read to page 0x34. Because the blocks were still marked as !Uptodate, the read caused all blocks to be re-read, overwriting the write. With this change, and the next one, the bytes which were written are marked as being Uptodate, so even though the page is still marked as !Uptodate, the blocks containing the written data are not re-read from storage. Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-06-13Merge tag 'iomap-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull iomap fix from Darrick Wong: "A single iomap bug fix for a variable type mistake on 32-bit architectures, fixing an integer overflow problem in the unshare actor" * tag 'iomap-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: Fix unsharing of an extent >2GB on a 32-bit machine
2020-06-08iomap: Fix unsharing of an extent >2GB on a 32-bit machineMatthew Wilcox (Oracle)
Widen the type used for counting the number of bytes unshared. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-06-02iomap: use attach/detach_page_privateGuoqing Jiang
Since the new pair function is introduced, we can call them to clean the code in iomap. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Link: http://lkml.kernel.org/r/20200517214718.468-7-guoqing.jiang@cloud.ionos.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02iomap: convert from readpages to readaheadMatthew Wilcox (Oracle)
Use the new readahead operation in iomap. Convert XFS and ZoneFS to use it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-26-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02fs: convert mpage_readpages to mpage_readaheadMatthew Wilcox (Oracle)
Implement the new readahead aop and convert all callers (block_dev, exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6, reiserfs & udf). The callers are all trivial except for GFS2 & OCFS2. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> # ocfs2 Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> # ocfs2 Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-17-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-08Merge tag 'iomap-5.7-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull iomap fix from Darrick Wong: "Fix a problem in readahead where we can crash if we can't allocate a full bio due to GFP_NORETRY" * tag 'iomap-5.7-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: Handle memory allocation failure in readahead
2020-04-08Merge tag 'libnvdimm-for-5.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and dax updates from Dan Williams: "There were multiple touches outside of drivers/nvdimm/ this round to add cross arch compatibility to the devm_memremap_pages() interface, enhance numa information for persistent memory ranges, and add a zero_page_range() dax operation. This cycle I switched from the patchwork api to Konstantin's b4 script for collecting tags (from x86, PowerPC, filesystem, and device-mapper folks), and everything looks to have gone ok there. This has all appeared in -next with no reported issues. Summary: - Add support for region alignment configuration and enforcement to fix compatibility across architectures and PowerPC page size configurations. - Introduce 'zero_page_range' as a dax operation. This facilitates filesystem-dax operation without a block-device. - Introduce phys_to_target_node() to facilitate drivers that want to know resulting numa node if a given reserved address range was onlined. - Advertise a persistence-domain for of_pmem and papr_scm. The persistence domain indicates where cpu-store cycles need to reach in the platform-memory subsystem before the platform will consider them power-fail protected. - Promote numa_map_to_online_node() to a cross-kernel generic facility. - Save x86 numa information to allow for node-id lookups for reserved memory ranges, deploy that capability for the e820-pmem driver. - Pick up some miscellaneous minor fixes, that missed v5.6-final, including a some smatch reports in the ioctl path and some unit test compilation fixups. - Fixup some flexible-array declarations" * tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits) dax: Move mandatory ->zero_page_range() check in alloc_dax() dax,iomap: Add helper dax_iomap_zero() to zero a range dax: Use new dax zero page method for zeroing a page dm,dax: Add dax zero_page_range operation s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver dax, pmem: Add a dax operation zero_page_range pmem: Add functions for reading/writing page to/from pmem libnvdimm: Update persistence domain value for of_pmem and papr_scm device tools/test/nvdimm: Fix out of tree build libnvdimm/region: Fix build error libnvdimm/region: Replace zero-length array with flexible-array member libnvdimm/label: Replace zero-length array with flexible-array member ACPI: NFIT: Replace zero-length array with flexible-array member libnvdimm/region: Introduce an 'align' attribute libnvdimm/region: Introduce NDD_LABELING libnvdimm/namespace: Enforce memremap_compat_align() libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid libnvdimm: Out of bounds read in __nd_ioctl() acpi/nfit: improve bounds checking for 'func' mm/memremap_pages: Introduce memremap_compat_align() ...
2020-04-02dax,iomap: Add helper dax_iomap_zero() to zero a rangeVivek Goyal
Add a helper dax_ioamp_zero() to zero a range. This patch basically merges __dax_zero_page_range() and iomap_dax_zero(). Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20200228163456.1587-7-vgoyal@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-04-02iomap: Handle memory allocation failure in readaheadMatthew Wilcox (Oracle)
bio_alloc() can fail when we use GFP_NORETRY. If it does, allocate a bio large enough for a single page like mpage_readpages() does. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-05iomap: Remove pgoff from tracepointsMatthew Wilcox (Oracle)
The 'pgoff' displayed by the tracepoints wasn't a pgoff at all; it was a byte offset from the start of the file. We already emit that in the form of the 'offset', so we can just remove pgoff. That means we can remove 'page' as an argument to the tracepoint, and rename this type of tracepoint from being a page class to being a range class. Fixes: 0b1b213fcf3a ("xfs: event tracing support") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-01-06fs: Fix page_mkwrite off-by-one errorsAndreas Gruenbacher
The check in block_page_mkwrite that is meant to determine whether an offset is within the inode size is off by one. This bug has been copied into iomap_page_mkwrite and several filesystems (ubifs, ext4, f2fs, ceph). Fix that by introducing a new page_mkwrite_check_truncate helper that checks for truncate and computes the bytes in the page up to EOF. Use the helper in iomap. NOTE from Darrick: The original patch fixed a number of filesystems, but then there were merge conflicts with the f2fs for-next tree; a subsequent re-submission of the patch had different btrfs changes with no explanation; and Christoph complained that each per-fs fix should be a separate patch. In my view that's too much risk to take on, so I decided to drop all the hunks except for iomap, since I've actually QA'd XFS. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: drop everything but the iomap parts] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-05iomap: stop using ioend after it's been freed in iomap_finish_ioend()Zorro Lang
This patch fixes the following KASAN report. The @ioend has been freed by dio_put(), but the iomap_finish_ioend() still trys to access its data. [20563.631624] BUG: KASAN: use-after-free in iomap_finish_ioend+0x58c/0x5c0 [20563.638319] Read of size 8 at addr fffffc0c54a36928 by task kworker/123:2/22184 [20563.647107] CPU: 123 PID: 22184 Comm: kworker/123:2 Not tainted 5.4.0+ #1 [20563.653887] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019 [20563.664499] Workqueue: xfs-conv/sda5 xfs_end_io [xfs] [20563.669547] Call trace: [20563.671993] dump_backtrace+0x0/0x370 [20563.675648] show_stack+0x1c/0x28 [20563.678958] dump_stack+0x138/0x1b0 [20563.682455] print_address_description.isra.9+0x60/0x378 [20563.687759] __kasan_report+0x1a4/0x2a8 [20563.691587] kasan_report+0xc/0x18 [20563.694985] __asan_report_load8_noabort+0x18/0x20 [20563.699769] iomap_finish_ioend+0x58c/0x5c0 [20563.703944] iomap_finish_ioends+0x110/0x270 [20563.708396] xfs_end_ioend+0x168/0x598 [xfs] [20563.712823] xfs_end_io+0x1e0/0x2d0 [xfs] [20563.716834] process_one_work+0x7f0/0x1ac8 [20563.720922] worker_thread+0x334/0xae0 [20563.724664] kthread+0x2c4/0x348 [20563.727889] ret_from_fork+0x10/0x18 [20563.732941] Allocated by task 83403: [20563.736512] save_stack+0x24/0xb0 [20563.739820] __kasan_kmalloc.isra.9+0xc4/0xe0 [20563.744169] kasan_slab_alloc+0x14/0x20 [20563.747998] slab_post_alloc_hook+0x50/0xa8 [20563.752173] kmem_cache_alloc+0x154/0x330 [20563.756185] mempool_alloc_slab+0x20/0x28 [20563.760186] mempool_alloc+0xf4/0x2a8 [20563.763845] bio_alloc_bioset+0x2d0/0x448 [20563.767849] iomap_writepage_map+0x4b8/0x1740 [20563.772198] iomap_do_writepage+0x200/0x8d0 [20563.776380] write_cache_pages+0x8a4/0xed8 [20563.780469] iomap_writepages+0x4c/0xb0 [20563.784463] xfs_vm_writepages+0xf8/0x148 [xfs] [20563.788989] do_writepages+0xc8/0x218 [20563.792658] __writeback_single_inode+0x168/0x18f8 [20563.797441] writeback_sb_inodes+0x370/0xd30 [20563.801703] wb_writeback+0x2d4/0x1270 [20563.805446] wb_workfn+0x344/0x1178 [20563.808928] process_one_work+0x7f0/0x1ac8 [20563.813016] worker_thread+0x334/0xae0 [20563.816757] kthread+0x2c4/0x348 [20563.819979] ret_from_fork+0x10/0x18 [20563.825028] Freed by task 22184: [20563.828251] save_stack+0x24/0xb0 [20563.831559] __kasan_slab_free+0x10c/0x180 [20563.835648] kasan_slab_free+0x10/0x18 [20563.839389] slab_free_freelist_hook+0xb4/0x1c0 [20563.843912] kmem_cache_free+0x8c/0x3e8 [20563.847745] mempool_free_slab+0x20/0x28 [20563.851660] mempool_free+0xd4/0x2f8 [20563.855231] bio_free+0x33c/0x518 [20563.858537] bio_put+0xb8/0x100 [20563.861672] iomap_finish_ioend+0x168/0x5c0 [20563.865847] iomap_finish_ioends+0x110/0x270 [20563.870328] xfs_end_ioend+0x168/0x598 [xfs] [20563.874751] xfs_end_io+0x1e0/0x2d0 [xfs] [20563.878755] process_one_work+0x7f0/0x1ac8 [20563.882844] worker_thread+0x334/0xae0 [20563.886584] kthread+0x2c4/0x348 [20563.889804] ret_from_fork+0x10/0x18 [20563.894855] The buggy address belongs to the object at fffffc0c54a36900 which belongs to the cache bio-1 of size 248 [20563.906844] The buggy address is located 40 bytes inside of 248-byte region [fffffc0c54a36900, fffffc0c54a369f8) [20563.918485] The buggy address belongs to the page: [20563.923269] page:ffffffff82f528c0 refcount:1 mapcount:0 mapping:fffffc8e4ba31900 index:0xfffffc0c54a33300 [20563.932832] raw: 17ffff8000000200 ffffffffa3060100 0000000700000007 fffffc8e4ba31900 [20563.940567] raw: fffffc0c54a33300 0000000080aa0042 00000001ffffffff 0000000000000000 [20563.948300] page dumped because: kasan: bad access detected [20563.955345] Memory state around the buggy address: [20563.960129] fffffc0c54a36800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc [20563.967342] fffffc0c54a36880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [20563.974554] >fffffc0c54a36900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [20563.981766] ^ [20563.986288] fffffc0c54a36980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc [20563.993501] fffffc0c54a36a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [20564.000713] ================================================================== Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205703 Signed-off-by: Zorro Lang <zlang@redhat.com> Fixes: 9cd0ed63ca514 ("iomap: enhance writeback error message") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-12-04iomap: fix sub-page uptodate handlingChristoph Hellwig
bio completions can race when a page spans more than one file system block. Add a spinlock to synchronize marking the page uptodate. Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads") Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>