Age | Commit message (Collapse) | Author |
|
[ Upstream commit 5a90f8d499225512a385585ffe3e28f687263d47 ]
Commit 8d391972ae2d ("gfs2: Remove __gfs2_writepage()") changed the log
flush code in gfs2_ail1_start_one() to call aops->writepages() instead
of aops->writepage(). For jdata inodes, this means that we will now try
to reserve log space and start a transaction before we can determine
that the pages in question have already been journaled. When this
happens in the context of gfs2_logd(), it can now appear that not enough
log space is available for freeing up log space, and we will lock up.
Fix that by issuing journal writes directly instead of going through
aops->writepages() in the log flush code.
Fixes: 8d391972ae2d ("gfs2: Remove __gfs2_writepage()")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d50a64e3c55e59e45e415c65531b0d76ad4cea36 ]
Move gfs2_trans_add_databufs() to trans.c. Pass in a glock instead of
a gfs2_inode.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: 5a90f8d49922 ("gfs2: Don't start unnecessary transactions during log flush")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2c63986dd35fa9eb0d7d1530b5eb2244b7296e22 ]
When creating and destroying inodes, we are relying on the inode hash
table to make sure that for a given inode number, only a single inode
will exist. We then link that inode to its inode and iopen glock and
let those glocks point back at the inode. However, when iget_failed()
is called, the inode is removed from the inode hash table before
gfs_evict_inode() is called, and uniqueness is no longer guaranteed.
Commit f1046a472b70 ("gfs2: gl_object races fix") was trying to work
around that problem by detaching the inode glock from the inode before
calling iget_failed(), but that broke the inode deallocation code in
gfs_evict_inode().
To fix that, deallocate partially created inodes in gfs2_create_inode()
instead of relying on gfs_evict_inode() for doing that.
This means that gfs2_evict_inode() and its helper functions will no
longer see partially created inodes, and so some simplifications are
possible there.
Fixes: 9ffa18884cce ("gfs2: gl_object races fix")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0cc617a54dfe6b44624c9a03e2e11a24eb9bc720 ]
Don't check for the GIF_ALLOC_FAILED flag in gfs2_ea_dealloc() and pass
that information explicitly instead. This allows for a cleaner
follow-up patch.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: 2c63986dd35f ("gfs2: deallocate inodes in gfs2_create_inode")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bcd18105fb34e27c097f222733dba9a3e79f191c ]
Move gfs2_dinode_dealloc() and its helper gfs2_final_release_pages()
from super.c to inode.c.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: 2c63986dd35f ("gfs2: deallocate inodes in gfs2_create_inode")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3774f53d7f0b30a996eab4a1264611489b48f14c ]
Having this flag attached to the iopen glock instead of the inode is
much simpler; it eliminates a protential weird race in gfs2_try_evict().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: 2c63986dd35f ("gfs2: deallocate inodes in gfs2_create_inode")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8bbfde0875590b71f012bd8b0c9cb988c9a873b9 ]
Introduce a new GLF_PENDING_REPLY flag to indicate that a reply from DLM
is expected. Include that flag in glock dumps to show more clearly
what's going on. (When the GLF_PENDING_REPLY flag is set, the GLF_LOCK
flag will also be set but the GLF_LOCK flag alone isn't sufficient to
tell that we are waiting for a DLM reply.)
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: 2c63986dd35f ("gfs2: deallocate inodes in gfs2_create_inode")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 57882533923ce7842a21b8f5be14de861403dd26 ]
Add a number of glock flags are currently not shown in the text form of
glock tracepoints.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: 2c63986dd35f ("gfs2: deallocate inodes in gfs2_create_inode")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ffd1cf0443a208b80e40100ed02892d2ec74c7e9 ]
When a request to evict an inode comes in over the network, we are
trying to grab an inode reference via the iopen glock's gl_object
pointer. There is a very small probability that by the time such a
request comes in, inode creation hasn't completed and the I_NEW flag is
still set. To deal with that, wait for the inode and then check if
inode creation was successful.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: 2c63986dd35f ("gfs2: deallocate inodes in gfs2_create_inode")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c79ba4be351a06e0ac4c51143a83023bb37888d6 ]
Rename enum dinode_demise to evict_behavior and its items
SHOULD_DELETE_DINODE to EVICT_SHOULD_DELETE,
SHOULD_NOT_DELETE_DINODE to EVICT_SHOULD_SKIP_DELETE, and
SHOULD_DEFER_EVICTION to EVICT_SHOULD_DEFER_DELETE.
In gfs2_evict_inode(), add a separate variable of type enum
evict_behavior instead of implicitly casting to int.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: 2c63986dd35f ("gfs2: deallocate inodes in gfs2_create_inode")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9fb794aac6ddd08a9c4982372250f06137696e90 ]
The GIF_DEFERRED_DELETE flag indicates an action that gfs2_evict_inode()
should take, so rename the flag to GIF_DEFER_DELETE to clarify.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: 2c63986dd35f ("gfs2: deallocate inodes in gfs2_create_inode")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1072b3aa6863bc4d91006038b032bfb4dcc98dec ]
Set gl_no_formal_ino of the iopen glock to the generation of the
associated inode (ip->i_no_formal_ino) as soon as that value is known.
This saves us from setting it later, possibly repeatedly, when queuing
GLF_VERIFY_DELETE work.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: 2c63986dd35f ("gfs2: deallocate inodes in gfs2_create_inode")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit ac5ee087d31ed93b6e45d2968a66828c6f621d8c upstream.
This patch moves the msleep_interruptible() out of the non-sleepable
context by moving the ls->ls_recover_spin spinlock around so
msleep_interruptible() will be called in a sleepable context.
Cc: stable@vger.kernel.org
Fixes: 4a7727725dc7 ("GFS2: Fix recovery issues for spectators")
Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9126d2754c5e5d1818765811a10af0a14cf1fa0a upstream.
When gfs2_sys_fs_add() fails, it sets sb->s_fs_info to NULL on its error
path (see commit 0d515210b696 ("GFS2: Add kobject release method")).
The intention seems to be to prevent dereferencing sb->s_fs_info once
the object pointed to has been deallocated, but that would be better
achieved by setting the pointer to NULL in free_sbd().
As a consequence, when the call to gfs2_sys_fs_add() fails in
gfs2_fill_super(), sdp = GFS2_SB(inode) will evaluate to NULL in iput()
-> gfs2_drop_inode(), and accessing sdp->sd_flags will be a NULL pointer
dereference.
Fix that by only setting sb->s_fs_info to NULL when actually freeing the
object pointed to in free_sbd().
Fixes: ae9f3bd8259a ("gfs2: replace sd_aspace with sd_inode")
Reported-by: syzbot+b12826218502df019f9d@syzkaller.appspotmail.com
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 62a2175ddf7e72941868f164b7c1f92e00f213bd ]
The filesystem's freeze/thaw functions can be called from contexts where
the holder isn't userspace but the kernel, e.g., during systemd
suspend/hibernate. So pass through the freeze/thaw flags from the VFS
instead of hard-coding them.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit af4044fd0b77e915736527dd83011e46e6415f01 ]
When gfs2_create_inode() finds a directory, make sure to return -EISDIR.
Fixes: 571a4b57975a ("GFS2: bugger off early if O_CREAT open finds a directory")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ae9f3bd8259a0a8f67be2420e66bb05fbb95af48 ]
Currently, sdp->sd_aspace and the per-inode metadata address spaces use
sb->s_bdev->bd_mapping->host as their ->host; folios in those address
spaces will thus appear to be on bdev rather than on gfs2 filesystems.
This is a problem because gfs2 doesn't support cgroup writeback
(SB_I_CGROUPWB), but bdev does.
Fix that by using a "dummy" gfs2 inode as ->host in those address
spaces. When coming from a folio, folio->mapping->host->i_sb will then
be a gfs2 super block and the SB_I_CGROUPWB flag will not be set in
sb->s_iflags.
Based on a previous version from Bob Peterson from several years ago.
Thanks to Tetsuo Handa, Jan Kara, and Rafael Aquini for helping figure
this out.
Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d838605fea6eabae3746a276fd448f6719eb3926 ]
In run_queue(), check if the queue of pending requests is empty instead
of blindly assuming that it won't be.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 7c9d9223802fbed4dee1ae301661bf346964c9d2 upstream.
Truncate an inode's address space when flipping the GFS2_DIF_JDATA flag:
depending on that flag, the pages in the address space will either use
buffer heads or iomap_folio_state structs, and we cannot mix the two.
Reported-by: Kun Hu <huk23@m.fudan.edu.cn>, Jiaji Qin <jjtan24@m.fudan.edu.cn>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f9417fcfca3c5e30a0b961e7250fab92cfa5d123 ]
When mounting of a corrupted disk image fails, the error message printed
can reference uninitialized inode fields. To prevent that from happening,
always initialize those fields.
Reported-by: syzbot+aa0730b0a42646eb1359@syzkaller.appspotmail.com
Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7c6f714d88475ceae5342264858a641eafa19632 ]
Before commit f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete"
work"), function delete_work_func() was used to trigger the eviction of
in-memory inodes from remote as well as deleting unlinked inodes at a
later point. These two kinds of work were then split into two kinds of
work, and the two places in the code were deferred deletion of inodes is
required accidentally ended up queuing the wrong kind of work. This
caused unlinked inodes to be left behind, which could in the worst case
fill up filesystems and require a filesystem check to recover.
Fix that by queuing the right kind of work in try_rgrp_unlink() and
gfs2_drop_inode().
Fixes: f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete" work")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 160bc9555d8654464cbbd7bb1f6687048471d2f6 ]
Add an argument to gfs2_queue_verify_delete() that allows it to queue
GLF_VERIFY_DELETE work for immediate execution. This is used in the
next patch.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: 7c6f714d8847 ("gfs2: Fix unlinked inode cleanup")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 820ce8ed53ce2111aa5171f7349f289d7e9d0693 ]
Rename the GLF_VERIFY_EVICT flag to GLF_VERIFY_DELETE: that flag
indicates that we want to delete an inode / verify that it has been
deleted.
To match, rename gfs2_queue_verify_evict() to
gfs2_queue_verify_delete().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Stable-dep-of: 7c6f714d8847 ("gfs2: Fix unlinked inode cleanup")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 update from Andreas Gruenbacher:
- Convert the writepage address space operation to writepages (Matthew
Wilcox)
- A syzkaller fix (by Julian Sun) and a minor cleanup (Andreas
Gruenbacher)
* tag 'gfs2-v6.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Remove gfs2_aspace_writepage()
gfs2: Remove gfs2_jdata_writepage()
gfs2: Remove __gfs2_writepage()
gfs2: Add gfs2_aspace_writepages()
gfs2: fix double destroy_workqueue error
gfs2: Minor gfs2_glock_cb cleanup
|
|
In order to switch fuse over to using iomap for buffered writes we need
to be able to have the struct file for the original write, in case we
have to read in the page to make it uptodate. Handle this by using the
existing private field in the iomap_iter, and add the argument to
iomap_file_buffered_write. This will allow us to pass the file in
through the iomap buffered write path, and is flexible for any other
file systems needs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/7f55c7c32275004ba00cddf862d970e6e633f750.1724755651.git.josef@toxicpanda.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
There are no remaining callers of gfs2_aspace_writepage() other than
vmscan, which is known to do more harm than good.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
There are no remaining callers of gfs2_jdata_writepage() other than
vmscan, which is known to do more harm than good.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Call aops->writepages() instead of using write_cache_pages() to call
aops->writepage. Change the handling of -ENODATA to not set the
persistent error on the block device.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
This saves one indirect function call per folio and gets us closer to
removing aops->writepage.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When gfs2_fill_super() fails, destroy_workqueue() is called within
gfs2_gl_hash_clear(), and the subsequent code path calls
destroy_workqueue() on the same work queue again.
This issue can be fixed by setting the work queue pointer to NULL after
the first destroy_workqueue() call and checking for a NULL pointer
before attempting to destroy the work queue again.
Reported-by: syzbot+d34c2a269ed512c531b0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d34c2a269ed512c531b0
Fixes: 30e388d57367 ("gfs2: Switch to a per-filesystem glock workqueue")
Cc: stable@vger.kernel.org
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In gfs2_glock_cb(), we only need to calculate the glock hold time for
inode glocks; the value is unused otherwise.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The logic for determining when to demote a glock in glock_work_func(),
introduced in commit 7cf8dcd3b68a ("GFS2: Automatically adjust glock min
hold time"), doesn't make sense: inode glocks have a minimum hold time
that delays demotion, while all other glocks are expected to be demoted
immediately. Instead of demoting non-inode glocks immediately,
glock_work_func() schedules glock work for them to be demoted, however.
Get rid of that unnecessary indirection.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Since the previous commit, function gfs2_quota_sync() will not cause the
sync generation to creep forward by one every time the function is
called; this helps keep things a but more tidy. We also don't care that
this function allocates a page of memory every time it is called, so no
good reason for keeping qd_changed() anymore, which just duplicates
qd_grab_sync().
This reverts commit 06aa6fd31a5f402b055e12ea53bb7b086359d3c8.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The quota sync generation is only ever updated under sd_quota_sync_mutex
by gfs2_quota_sync(), but its current value is fetched ouside of that
mutex, so use WRITE_ONCE() and READ_ONCE() when accessing it without
holding that mutex.
Pass the current sync generation to do_sync() from its callers to ensure
that we're not recording the wrong generation when the syncing is
done. Also, make sure that qd->qd_sync_gen only ever moves forward.
In gfs2_quota_sync(), only write the new sync generation when we know
that there are changes. This eliminates the need for function
sd_changed(), which we will remove in the next commit.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
With the locking the previous patch has introduced for each struct
gfs2_quota_data object, sd_quota_mutex has become largely irrelevant.
By waiting on the buffer head instead of waiting on the mutex in
get_bh(), it becomes completely irrelevant and can be removed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The quota code is missing some locking between local quota changes and
syncing those quota changes to the global quota file (gfs2_quota_sync);
in particular, qd->qd_change needs to be kept in sync with the
QDF_CHANGE change flag and the number of references held. Use the
qd->qd_lockref.lock spinlock for that.
With the qd->qd_lockref.lock spinlock held, we can no longer call
lockref_get(), so turn qd_hold() into a variant that assumes that the
lock is held. This function is really supposed to take an additional
reference when one or more references are already held, so check for
that instead of checking if the lockref is dead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The split between qd_fish() and gfs2_quota_sync() is rather unfortunate
as qd_fish() is repeatedly called to scan sdp->sd_quota_list only to
find the next object to that needs syncing; if there are multiple
objects on the list that need syncing, it makes more sense to grab them
all in one go. This is relatively easy to do when qd_fish() is folded
into gfs2_quota_sync().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Rename variable 'value' to 'change' as it stores a change in value.
Add new 'value' and 'limit' variables for the current value and limit.
Only fetch the tuning parameters when we need them.
Get rid of unnecessary nesting.
No change in functionality.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Function do_qc() is supposed to be conceptually simple: it alters the
current in-memory and on-disk quota change values for a given uid/gid by
a given delta. If the on-disk record isn't defined yet, a new record is
created. If the on-disk record exists and the resulting change value is
zero, there no longer is a need for that record and so the record is
deleted. On top of that, some reference counting is involved when
creating and deleting records.
Currently, instead of doing the above, do_qc() alters the on-disk value
and then it sets the in-memory value to the on-disk value. This is
incorrect when the on-disk value differs from the in-memory value. The
two values are allowed to differ when quota changes are synced to the
global quota file. Fix by changing both values by the same amount.
In addition, do_qc() currently gets confused when the delta value is 0.
It isn't supposed to be called that way, but that assumption isn't
mentioned and it makes the code harder to read. Make the code more
explicit.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Commit 432928c93779 ("gfs2: Add quota_change type") makes the incorrect
assertion that function do_qc() should behave differently in the two
contexts it is used in, but that isn't actually true. In all cases,
do_qc() grabs a "reference" when it starts using a slot in the per-node
quota changes file, and it releases that "reference" when no more
residual changes remain. Revert that broken commit.
There are some remaining issues with function do_qc() which are
addressed in the next commit.
This reverts commit 432928c9377959684c748a9bc6553ed2d3c2ea4f.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Commit 4c6a08125f22 ("gfs2: ignore negated quota changes") skips quota
changes with qd_change == 0 instead of writing them back, which leaves
behind non-zero qd_change values in the affected slots. The kernel then
assumes that those slots are unused, while the qd_change values on disk
indicate that they are indeed still in use. The next time the
filesystem is mounted, those invalid slots are read in from disk, which
will cause inconsistencies.
Revert that commit to avoid filesystem corruption.
This reverts commit 4c6a08125f2249531ec01783a5f4317d7342add5.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Rename qd_check_sync() to qd_grab_sync() and make it return a bool.
Turn the sync_gen pointer into a regular u64 and pass in U64_MAX instead
of a NULL pointer when sync generation checking isn't needed.
Introduce a new qd_ungrab_sync() helper for undoing the effects of
qd_grab_sync() if the subsequent bh_get() on the qd object fails.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The qd_bh_get_or_undo() helper introduced by that commit doesn't improve
the code much, so revert it and clean things up in a more useful way in
the next commit.
This reverts commit 7dbc6ae60dd7089d8ed42892b6a66c138f0aa7a0.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In gfs2_quota_init(), make sure that the per-node "quota_change%u" file
doesn't contain duplicate uids/gids. Those duplicates would cause us to
acquire the glock corresponding to those ids repeatedly, which the glock
code doesn't allow.
When finding inconsistencies, we wipe them out and ignore them. The
resulting quotas will likely be inconsistent, and running quotacheck(1)
is advised.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Add a fail_brelse label and use it where useful. Move variable bh out
of the loop to extend its visibility to the new label. No functional
change.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The demote_ok glock operation is only still used to prevent the inode
glocks of the "jindex" and "rindex" directories from getting recycled
while they are still referenced by sdp->sd_jindex and sdp->sd_rindex.
However, the LRU walking code will no longer recycle glocks which are
referenced, so the demote_ok glock operation is obsolete and can be
removed.
Each of a glock's holders in the gl_holders list is holding a reference
on the glock, so when the list of holders isn't empty in demote_ok(),
the existing reference count check will already prevent the glock from
getting released. This means that demote_ok() is obsolete as well.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
This reverts commit e7ccaf5fe1590667b3fa2f8df5c5ec9ba0dc5b85.
Before commit e7ccaf5fe159, every time a resource group glock was
dequeued by gfs2_glock_dq(), it was added to the glock LRU list even
though the glock was still referenced by the resource group and could
never be evicted, anyway. Commit e7ccaf5fe159 added a GLOF_LRU hack to
avoid that overhead for resource group glocks, and that hack was since
adopted for some other types of glocks as well.
We now no longer add glocks to the glock LRU list while they are still
referenced. This solves the underlying problem, and obsoletes the
GLOF_LRU hack.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit 3e5257c810cba91e274d07f3db5cf013c7c830be)
|
|
In the current glock reference counting model, a bias of one is added to
a glock's refcount when it is locked (gl->gl_state != LM_ST_UNLOCKED).
A glock is removed from the lru_list when it is enqueued, and added back
when it is dequeued. This isn't a very appropriate model because most
glocks are held for long periods of time (for example, the inode "owns"
references to its inode and iopen glocks as long as the inode is cached
even when the glock state changes to LM_ST_UNLOCKED), and they can only
be freed when they are no longer referenced, anyway.
Fix this by getting rid of the refcount bias for locked glocks. That
way, we can use lockref_put_or_lock() to efficiently drop all but the
last glock reference, and put the glock onto the lru_list when the last
reference is dropped. When find_insert_glock() returns a reference to a
cached glock, it removes the glock from the lru_list.
Dumping the "glocks" and "glstats" debugfs files also takes glock
references, but instead of removing the glocks from the lru_list in that
case as well, we leave them on the list. This ensures that dumping
those files won't perturb the order of the glocks on the lru_list.
In addition, when the last reference to an *unlocked* glock is dropped,
we immediately free it; this preserves the preexisting behavior. If it
later turns out that caching unlocked glocks is useful in some
situations, we can change the caching strategy.
It is currently unclear if a glock that has no active references can
have the GLF_LFLUSH flag set. To make sure that such a glock won't
accidentally be evicted due to memory pressure, we add a GLF_LFLUSH
check to gfs2_dispose_glock_lru().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Switch to a per-filesystem glock workqueue. Additional workqueues are
cheap nowadays, and keeping separate workqueues allows to flush the work
of each filesystem without affecting the others.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When glocks cannot be freed for a long time, avoid the "task blocked for
more than N seconds" messages and report how many glocks are still
outstanding, instead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|