Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull overlayfs updates from Christian Brauner:
"This contains overlayfs updates for this cycle.
The changes for overlayfs in here are primarily focussed on preparing
for some proposed changes to directory locking.
Overlayfs currently will sometimes lock a directory on the upper
filesystem and do a few different things while holding the lock. This
is incompatible with the new potential scheme.
This series narrows the region of code protected by the directory
lock, taking it multiple times when necessary. This theoretically
opens up the possibilty of other changes happening on the upper
filesytem between the unlock and the lock. To some extent the patches
guard against that by checking the dentries still have the expect
parent after retaking the lock. In general, concurrent changes to the
upper and lower filesystems aren't supported properly anyway"
* tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
ovl: properly print correct variable
ovl: rename ovl_cleanup_unlocked() to ovl_cleanup()
ovl: change ovl_create_real() to receive dentry parent
ovl: narrow locking in ovl_check_rename_whiteout()
ovl: narrow locking in ovl_whiteout()
ovl: change ovl_cleanup_and_whiteout() to take rename lock as needed
ovl: narrow locking on ovl_remove_and_whiteout()
ovl: change ovl_workdir_cleanup() to take dir lock as needed.
ovl: narrow locking in ovl_workdir_cleanup_recurse()
ovl: narrow locking in ovl_indexdir_cleanup()
ovl: narrow locking in ovl_workdir_create()
ovl: narrow locking in ovl_cleanup_index()
ovl: narrow locking in ovl_cleanup_whiteouts()
ovl: narrow locking in ovl_rename()
ovl: simplify gotos in ovl_rename()
ovl: narrow locking in ovl_create_over_whiteout()
ovl: narrow locking in ovl_clear_empty()
ovl: narrow locking in ovl_create_upper()
ovl: narrow the locked region in ovl_copy_up_workdir()
ovl: Call ovl_create_temp() without lock held.
...
|
|
Case folding is often applied to subtrees and not on an entire
filesystem.
Disallowing layers from filesystems that support case folding is over
limiting.
Replace the rule that case-folding capable are not allowed as layers
with a rule that case folded directories are not allowed in a merged
directory stack.
Should case folding be enabled on an underlying directory while
overlayfs is mounted the outcome is generally undefined.
Specifically in ovl_lookup(), we check the base underlying directory
and fail with -ESTALE and write a warning to kmsg if an underlying
directory case folding is enabled.
Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/linux-fsdevel/20250520051600.1903319-1-kent.overstreet@linux.dev/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/20250602171702.1941891-1-amir73il@gmail.com
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix a regression in overlayfs caused by reworking the lookup_one*()
set of helpers
- Make sure that the name of the dentry is printed in overlayfs'
mkdir() helper
- Add missing iocb values to TRACE_IOCB_STRINGS define
- Unlock the superblock during iterate_supers_type(). This was an
accidental internal api change
- Drop a misleading assert in file_seek_cur_needs_f_lock() helper
- Never refuse to return PIDFD_GET_INGO when parent pid is zero
That can trivially happen in container scenarios where the parent
process might be located in an ancestor pid namespace
- Don't revalidate in try_lookup_noperm() as that causes regression for
filesystems such as cifs
- Fix simple_xattr_list() and reset the err variable after
security_inode_listsecurity() got called so as not to confuse
userspace about the length of the xattr
* tag 'vfs-6.16-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: drop assert in file_seek_cur_needs_f_lock
fs: unlock the superblock during iterate_supers_type
ovl: fix debug print in case of mkdir error
VFS: change try_lookup_noperm() to skip revalidation
fs: add missing values to TRACE_IOCB_STRINGS
fs/xattr.c: fix simple_xattr_list()
ovl: fix regression caused by lookup helpers API changes
pidfs: never refuse ppid == 0 in PIDFD_GET_INFO
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs update from Miklos Szeredi:
- Fix a regression in getting the path of an open file (e.g. in
/proc/PID/maps) for a nested overlayfs setup (André Almeida)
- Support data-only layers and verity in a user namespace (unprivileged
composefs use case)
- Fix a gcc warning (Kees)
- Cleanups
* tag 'ovl-update-v2-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: Annotate struct ovl_entry with __counted_by()
ovl: Replace offsetof() with struct_size() in ovl_stack_free()
ovl: Replace offsetof() with struct_size() in ovl_cache_entry_new()
ovl: Check for NULL d_inode() in ovl_dentry_upper()
ovl: Use str_on_off() helper in ovl_show_options()
ovl: don't require "metacopy=on" for "verity"
ovl: relax redirect/metacopy requirements for lower -> data redirect
ovl: make redirect/metacopy rejection consistent
ovl: Fix nested backing file paths
|
|
The lookup helpers API was changed by merge of vfs-6.16-rc1.async.dir to
pass a non-const qstr pointer argument to lookup_one*() helpers.
All of the callers of this API were changed to pass a pointer to temp
copy of qstr, except overlays that was passing a const pointer to
dentry->d_name that was changed to pass a non-const copy instead
when doing a lookup in lower layer which is not the fs of said dentry.
This wrong use of the API caused a regression in fstest overlay/012.
Fix the regression by making a non-const copy of dentry->d_name prior
to calling the lookup API, but the API should be fixed to not allow this
class of bugs.
Cc: NeilBrown <neilb@suse.de>
Fixes: 5741909697a3 ("VFS: improve interface for lookup_one functions")
Fixes: 390e34bc1490 ("VFS: change lookup_one_common and lookup_noperm_common to take a qstr")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/20250605101530.2336320-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Allow the special case of a redirect from a lower layer to a data layer
without having to turn on metacopy. This makes the feature work with
userxattr, which in turn allows data layers to be usable in user
namespaces.
Minimize the risk by only enabling redirect from a single lower layer to a
data layer iff a data layer is specified. The only way to access a data
layer is to enable this, so there's really no reason not to enable this.
This can be used safely if the lower layer is read-only and the
user.overlay.redirect xattr cannot be modified.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
When overlayfs finds a file with metacopy and/or redirect attributes and
the metacopy and/or redirect features are not enabled, then it refuses to
act on those attributes while also issuing a warning.
There was an inconsistency in not checking metacopy found from the index.
And also only warning on an upper metacopy if it found the next file on the
lower layer, while always warning for metacopy found on a lower layer.
Fix these inconsistencies and make the logic more straightforward, paving
the way for following patches to change when data redirects are allowed.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The lookup_one_len family of functions is (now) only used internally by
a filesystem on itself either
- in a context where permission checking is irrelevant such as by a
virtual filesystem populating itself, or xfs accessing its ORPHANAGE
or dquota accessing the quota file; or
- in a context where a permission check (MAY_EXEC on the parent) has just
been performed such as a network filesystem finding in "silly-rename"
file in the same directory. This is also the context after the
_parentat() functions where currently lookup_one_qstr_excl() is used.
So the permission check is pointless.
The name "one_len" is unhelpful in understanding the purpose of these
functions and should be changed. Most of the callers pass the len as
"strlen()" so using a qstr and QSTR() can simplify the code.
This patch renames these functions (include lookup_positive_unlocked()
which is part of the family despite the name) to have a name based on
"lookup_noperm". They are changed to receive a 'struct qstr' instead
of separate name and len. In a few cases the use of QSTR() results in a
new call to strlen().
try_lookup_noperm() takes a pointer to a qstr instead of the whole
qstr. This is consistent with d_hash_and_lookup() (which is nearly
identical) and useful for lookup_noperm_unlocked().
The new lookup_noperm_common() doesn't take a qstr yet. That will be
tidied up in a subsequent patch.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-5-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The family of functions:
lookup_one()
lookup_one_unlocked()
lookup_one_positive_unlocked()
appear designed to be used by external clients of the filesystem rather
than by filesystems acting on themselves as the lookup_one_len family
are used.
They are used by:
btrfs/ioctl - which is a user-space interface rather than an internal
activity
exportfs - i.e. from nfsd or the open_by_handle_at interface
overlayfs - at access the underlying filesystems
smb/server - for file service
They should be used by nfsd (more than just the exportfs path) and
cachefs but aren't.
It would help if the documentation didn't claim they should "not be
called by generic code".
Also the path component name is passed as "name" and "len" which are
(confusingly?) separate by the "base". In some cases the len in simply
"strlen" and so passing a qstr using QSTR() would make the calling
clearer.
Other callers do pass separate name and len which are stored in a
struct. Sometimes these are already stored in a qstr, other times it
easily could be.
So this patch changes these three functions to receive a 'struct qstr *',
and improves the documentation.
QSTR_LEN() is added to make it easy to pass a QSTR containing a known
len.
[brauner@kernel.org: take a struct qstr pointer]
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-2-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Pull misc vfs cleanups from Al Viro:
"Two unrelated patches - one is a removal of long-obsolete include in
overlayfs (it used to need fs/internal.h, but the extern it wanted has
been moved back to include/linux/namei.h) and another introduces
convenience helper constructing struct qstr by a NUL-terminated
string"
* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
add a string-to-qstr constructor
fs/overlayfs/namei.c: get rid of include ../internal.h
|
|
Added for the sake of vfs_path_lookup(), which is in linux/namei.h
these days.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We want to be able to encode an fid from an inode with no alias.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20250105162404.357058-2-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Introduce ovl_revert_creds() wrapper of revert_creds() to
match callers of ovl_override_creds().
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
An opaque directory cannot have xwhiteouts, so instead of marking an
xwhiteouts directory with a new xattr, overload overlay.opaque xattr
for marking both opaque dir ('y') and xwhiteouts dir ('x').
This is more efficient as the overlay.opaque xattr is checked during
lookup of directory anyway.
This also prevents unnecessary checking the xattr when reading a
directory without xwhiteouts, i.e. most of the time.
Note that the xwhiteouts marker is not checked on the upper layer and
on the last layer in lowerstack, where xwhiteouts are not expected.
Fixes: bc8df7a3dc03 ("ovl: Add an alternative type of whiteout")
Cc: <stable@vger.kernel.org> # v6.7
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Tested-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
When the index feature is disabled, ofs->indexdir is NULL.
When the index feature is enabled, ofs->indexdir has the same value as
ofs->workdir and takes an extra reference.
This makes the code harder to understand when it is not always clear
that ofs->indexdir in one function is the same dentry as ofs->workdir
in another function.
Remove this redundancy, by referencing ofs->workdir directly in index
helpers and by using the ovl_indexdir() accessor in generic code.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
An xattr whiteout (called "xwhiteout" in the code) is a reguar file of
zero size with the "overlay.whiteout" xattr set. A file like this in a
directory with the "overlay.whiteouts" xattrs set will be treated the
same way as a regular whiteout.
The "overlay.whiteouts" directory xattr is used in order to
efficiently handle overlay checks in readdir(), as we only need to
checks xattrs in affected directories.
The advantage of this kind of whiteout is that they can be escaped
using the standard overlay xattr escaping mechanism. So, a file with a
"overlay.overlay.whiteout" xattr would be unescaped to
"overlay.whiteout", which could then be consumed by another overlayfs
as a whiteout.
Overlayfs itself doesn't create whiteouts like this, but a userspace
mechanism could use this alternative mechanism to convert images that
may contain whiteouts to be used with overlayfs.
To work as a whiteout for both regular overlayfs mounts as well as
userxattr mounts both the "user.overlay.whiteout*" and the
"trusted.overlay.whiteout*" xattrs will need to be created.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
When lower fs is a nested overlayfs, calling encode_fh() on a lower
directory dentry may trigger copy up and take sb_writers on the upper fs
of the lower nested overlayfs.
The lower nested overlayfs may have the same upper fs as this overlayfs,
so nested sb_writers lock is illegal.
Move all the callers that encode lower fh to before ovl_want_write().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Always use OVL_FS() to retrieve the corresponding struct ovl_fs from a
struct super_block.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
The legacy behavior of ovl_statfs() reports the f_fsid filled by
underlying upper fs. This fsid is not unique among overlayfs instances
on the same upper fs.
With mount option uuid=on, generate a non-persistent uuid per overlayfs
instance and use it as the seed for f_fsid, similar to tmpfs.
This is useful for reporting fanotify events with fid info from different
instances of overlayfs over the same upper fs.
The old behavior of null uuid and upper fs fsid is retained with the
mount option uuid=null, which is the default.
The mount option uuid=off that disables uuid checks in underlying layers
also retains the legacy behavior.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
The new digest field in the metacopy xattr is used during lookup to
record whether the header contained a digest in the OVL_HAS_DIGEST
flags.
When accessing file data the first time, if OVL_HAS_DIGEST is set, we
reload the metadata and check that the source lowerdata inode matches
the specified digest in it (according to the enabled verity
options). If the verity check passes we store this info in the inode
flags as OVL_VERIFIED_DIGEST, so that we can avoid doing it again if
the inode remains in memory.
The verification is done in ovl_maybe_validate_verity() which needs to
be called in the same places as ovl_maybe_lookup_lowerdata(), so there
is a new ovl_verify_lowerdata() helper that calls these in the right
order, and all current callers of ovl_maybe_lookup_lowerdata() are
changed to call it instead.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Historically overlay.metacopy was a zero-size xattr, and it's
existence marked a metacopy file. This change adds a versioned header
with a flag field, a length and a digest. The initial use-case of this
will be for validating a fs-verity digest, but the flags field could
also be used later for other new features.
ovl_check_metacopy_xattr() now returns the size of the xattr,
emulating a size of OVL_METACOPY_MIN_SIZE for empty xattrs to
distinguish it from the no-xattr case.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Do all the logic to set the mode during mount options parsing and
do not keep the option string around.
Use a constant_table to translate from enum redirect mode to string
in preperation for new mount api option parsing.
The mount option "off" is translated to either "follow" or "nofollow",
depending on the "redirect_always_follow" build/module config, so
in effect, there are only three possible redirect modes.
This results in a minor change to the string that is displayed
in show_options() - when redirect_dir is enabled by default and the user
mounts with the option "redirect_dir=off", instead of displaying the mode
"redirect_dir=off" in show_options(), the displayed mode will be either
"redirect_dir=follow" or "redirect_dir=nofollow", depending on the value
of "redirect_always_follow" build/module config.
The displayed mode reflects the effective mode, so mounting overlayfs
again with the dispalyed redirect_dir option will result with the same
effective and displayed mode.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Defer lookup of lowerdata in the data-only layers to first data access
or before copy up.
We perform lowerdata lookup before copy up even if copy up is metadata
only copy up. We can further optimize this lookup later if needed.
We do best effort lazy lookup of lowerdata for d_real_inode(), because
this interface does not expect errors. The only current in-tree caller
of d_real_inode() is trace_uprobe and this caller is likely going to be
followed reading from the file, before placing uprobes on offset within
the file, so lowerdata should be available when setting the uprobe.
Tested-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Prepare to allow ovl_lookup() to leave the last entry in a non-dir
lowerstack empty to signify lazy lowerdata lookup.
In this case, ovl_lookup() stores the redirect path from metacopy to
lowerdata in ovl_inode, which is going to be used later to perform the
lazy lowerdata lookup.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Lookup in data-only layers only for a lower metacopy with an absolute
redirect xattr.
The metacopy xattr is not checked on files found in the data-only layers
and redirect xattr are not followed in the data-only layers.
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Introduce the format lowerdir=lower1:lower2::lowerdata1::lowerdata2
where the lower layers on the right of the :: separators are not merged
into the overlayfs merge dirs.
Data-only lower layers are only allowed at the bottom of the stack.
The files in those layers are only meant to be accessible via absolute
redirect from metacopy files in lower layers. Following changes will
implement lookup in the data layers.
This feature was requested for composefs ostree use case, where the
lower data layer should only be accessiable via absolute redirects
from metacopy inodes.
The lower data layers are not required to a have a unique uuid or any
uuid at all, because they are never used to compose the overlayfs inode
st_ino/st_dev.
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The ovl_inode contains a copy of lowerdata in lowerstack[], so the
lowerdata inode member can be removed.
Use accessors ovl_lowerdata*() to get the lowerdata whereever the member
was accessed directly.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The ovl_inode contains a copy of lowerpath in lowerstack[0], so the
lowerpath member can be removed.
Use accessor ovl_lowerpath() to get the lowerpath whereever the member
was accessed directly.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The lower stacks of all the ovl inode aliases should be identical
and there is redundant information in ovl_entry and ovl_inode.
Move lowerstack into ovl_inode and keep only the OVL_E_FLAGS
per overlay dentry.
Following patches will deduplicate redundant ovl_inode fields.
Note that for pure upper and negative dentries, OVL_E(dentry) may be
NULL now, so it is imporatnt to use the ovl_numlower() accessor.
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
In preparation for moving lowerstack into ovl_inode.
Note that in ovl_lookup() the temp stack dentry refs are now cloned
into the final ovl_lowerstack instead of being transferred, so cleanup
always needs to call ovl_stack_free(stack).
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This helps fortify against dereferencing a NULL ovl_entry,
before we move the ovl_entry reference into ovl_inode.
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Instead of open coded instances, because we are about to split
the two apart.
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
After copy up, we may need to update d_flags if upper dentry is on a
remote fs and lower dentries are not.
Add helpers to allow incremental update of the revalidate flags.
Fixes: bccece1ead36 ("ovl: allow remote upper")
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
If memory for uperredirect was allocated with kstrdup() in upperdir != NULL
and d.redirect != NULL path, it may seem that it can be lost when
upperredirect is reassigned later, but it's not possible.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 0a2d0d3f2f291 ("ovl: Check redirect on index as well")
Signed-off-by: Stanislav Goriainov <goriainov@ispras.ru>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
ovl_indexdir_cleanup() is called on mount of overayfs with nfs_export
feature to cleanup stale index records for lower and upper files that have
been deleted while overlayfs was offline.
This has the side effect (good or bad) of pre populating inode cache with
all the copied up upper inodes, while verifying the index entries.
For copied up directories, the upper file handles are decoded to conncted
upper dentries. This has the even bigger side effect of reading the
content of all the parent upper directories which may take significantly
more time and IO than just reading the upper inodes.
Do not request connceted upper dentries for verifying upper directory index
entries, because we have no use for the connected dentry.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
fix follow spelling misktakes:
decendant ==> descendant
indentify ==> identify
underlaying ==> underlying
Reported-by: Hacash Robot <hacashRobot@santino.com>
Signed-off-by: William Dean <williamsukatube@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Make the two places where lookup helpers can be called either on lower
or upper layers take the mount's idmapping into account. To this end we
pass down the mount in struct ovl_lookup_data. It can later also be used
to construct struct path for various other helpers. This is needed to
support idmapped base layers with overlay.
Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Add a helper that allows to retrieve ovl xattrs from either lower or
upper layers. To stop passing mnt and dentry separately everywhere use
struct path which more accurately reflects the tight coupling between
mount and dentry in this helper. Swich over all places to pass a path
argument that can operate on either upper or lower layers. This is
needed to support idmapped base layers with overlayfs.
Some helpers are always called with an upper dentry, which is now utilized
by these helpers to create the path. Make this usage explicit by renaming
the argument to "upperdentry" and by renaming the function as well in some
cases. Also add a check in ovl_do_getxattr() to catch misuse of these
functions.
Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Use helpers ovl_*xattr() to access user/trusted.overlay.* xattrs
and use helpers ovl_do_*xattr() to access generic xattrs. This is a
preparatory patch for using idmapped base layers with overlay.
Note that a few of those places called vfs_*xattr() calls directly to
reduce the amount of debug output. But as Miklos pointed out since
overlayfs has been stable for quite some time the debug output isn't all
that relevant anymore and the additional debug in all locations was
actually quite helpful when developing this patch series.
Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
We get occasional reports of lookup errors due to mismatched
origin ftype from users that re-format a lower squashfs image.
Commit 13c6ad0f45fd ("ovl: document lower modification caveats")
tries to discourage the practice of re-formating lower layers and
describes the expected behavior as undefined.
Commit b0e0f69731cd ("ovl: restrict lower null uuid for "xino=auto"")
limits the configurations in which origin file handles are followed.
In addition to these measures, change the behavior in case of detecting
a mismatch origin ftype in lookup to issue a warning, not follow origin,
but not fail the lookup operation either.
That should make overall more users happy without any big consequences.
Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxgPq9E9xxwU2CDyHy-_yCZZeymg+3n+-6AqkGGE1YtwvQ@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Instead of passing the overlay dentry.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
- Fix a regression introduced in 5.2 that resulted in valid overlayfs
mounts being rejected with ELOOP (Too many levels of symbolic links)
- Fix bugs found by various tools
- Miscellaneous improvements and cleanups
* tag 'ovl-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: add debug print to ovl_do_getxattr()
ovl: invalidate readdir cache on changes to dir with origin
ovl: allow upperdir inside lowerdir
ovl: show "userxattr" in the mount data
ovl: trivial typo fixes in the file inode.c
ovl: fix misspellings using codespell tool
ovl: do not copy attr several times
ovl: remove ovl_map_dev_ino() return value
ovl: fix error for ovl_fill_super()
ovl: fix missing revert_creds() on error path
ovl: fix leaked dentry
ovl: restrict lower null uuid for "xino=auto"
ovl: check that upperdir path is not on a read-only mount
ovl: plumb through flush method
|
|
Since commit 6815f479ca90 ("ovl: use only uppermetacopy state in
ovl_lookup()"), overlayfs doesn't put temporary dentry when there is a
metacopy error, which leads to dentry leaks when shutting down the related
superblock:
overlayfs: refusing to follow metacopy origin for (/file0)
...
BUG: Dentry (____ptrval____){i=3f33,n=file3} still in use (1) [unmount of overlay overlay]
...
WARNING: CPU: 1 PID: 432 at umount_check.cold+0x107/0x14d
CPU: 1 PID: 432 Comm: unmount-overlay Not tainted 5.12.0-rc5 #1
...
RIP: 0010:umount_check.cold+0x107/0x14d
...
Call Trace:
d_walk+0x28c/0x950
? dentry_lru_isolate+0x2b0/0x2b0
? __kasan_slab_free+0x12/0x20
do_one_tree+0x33/0x60
shrink_dcache_for_umount+0x78/0x1d0
generic_shutdown_super+0x70/0x440
kill_anon_super+0x3e/0x70
deactivate_locked_super+0xc4/0x160
deactivate_super+0xfa/0x140
cleanup_mnt+0x22e/0x370
__cleanup_mnt+0x1a/0x30
task_work_run+0x139/0x210
do_exit+0xb0c/0x2820
? __kasan_check_read+0x1d/0x30
? find_held_lock+0x35/0x160
? lock_release+0x1b6/0x660
? mm_update_next_owner+0xa20/0xa20
? reacquire_held_locks+0x3f0/0x3f0
? __sanitizer_cov_trace_const_cmp4+0x22/0x30
do_group_exit+0x135/0x380
__do_sys_exit_group.isra.0+0x20/0x20
__x64_sys_exit_group+0x3c/0x50
do_syscall_64+0x45/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xae
...
VFS: Busy inodes after unmount of overlay. Self-destruct in 5 seconds. Have a nice day...
This fix has been tested with a syzkaller reproducer.
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 6815f479ca90 ("ovl: use only uppermetacopy state in ovl_lookup()")
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210329164907.2133175-1-mic@digikod.net
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
inode_wrong_type(inode, mode) returns true if setting inode->i_mode
to given value would've changed the inode type. We have enough of
those checks open-coded to make a helper worthwhile.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
CAP_DAC_READ_SEARCH is required by open_by_handle_at(2) so check it in
ovl_decode_real_fh() as well to prevent privilege escalation for
unprivileged overlay mounts.
[Amir] If the mounter is not capable in init ns, ovl_check_origin() and
ovl_verify_index() will not function as expected and this will break index
and nfs export features. So check capability in ovl_can_decode_fh(), to
auto disable those features.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
When the lower file of a metacopy is inaccessible, -EIO is returned. For
users not familiar with overlayfs internals, such as myself, the meaning of
this error may not be apparent or easy to determine, since the (metacopy)
file is present and open/stat succeed when accessed outside of the overlay.
Add a rate-limited warning for orphan metacopy to give users a hint when
investigating such errors.
Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxi23Zsmfb4rCed1n=On0NNA5KZD74jjjeyz+et32sk-gg@mail.gmail.com/
Signed-off-by: Kevin Locke <kevin@kevinlocke.name>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This replaces uuid with null in overlayfs file handles and thus relaxes
uuid checks for overlay index feature. It is only possible in case there is
only one filesystem for all the work/upper/lower directories and bare file
handles from this backing filesystem are unique. In other case when we have
multiple filesystems lets just fallback to "uuid=on" which is and
equivalent of how it worked before with all uuid checks.
This is needed when overlayfs is/was mounted in a container with index
enabled (e.g.: to be able to resolve inotify watch file handles on it to
paths in CRIU), and this container is copied and started alongside with the
original one. This way the "copy" container can't have the same uuid on the
superblock and mounting the overlayfs from it later would fail.
That is an example of the problem on top of loop+ext4:
dd if=/dev/zero of=loopbackfile.img bs=100M count=10
losetup -fP loopbackfile.img
losetup -a
#/dev/loop0: [64768]:35 (/loop-test/loopbackfile.img)
mkfs.ext4 loopbackfile.img
mkdir loop-mp
mount -o loop /dev/loop0 loop-mp
mkdir loop-mp/{lower,upper,work,merged}
mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\
upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged
umount loop-mp/merged
umount loop-mp
e2fsck -f /dev/loop0
tune2fs -U random /dev/loop0
mount -o loop /dev/loop0 loop-mp
mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\
upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged
#mount: /loop-test/loop-mp/merged:
#mount(2) system call failed: Stale file handle.
If you just change the uuid of the backing filesystem, overlay is not
mounting any more. In Virtuozzo we copy container disks (ploops) when
create the copy of container and we require fs uuid to be unique for a new
container.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This will be used in next patch to be able to change uuid checks and add
uuid nullification based on ofs->config.index for a new "uuid=off" mode.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|