summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2010-08-09default to simple_setattrChristoph Hellwig
With the new truncate sequence every filesystem that wants to support file size changes on disk needs to implement its own ->setattr. So instead of calling inode_setattr which supports size changes call into a simple method that doesn't support this. simple_setattr is almost what we want except that it does not mark the inode dirty after changes. Given that marking the inode dirty is a no-op for the simple in-memory filesystems that use simple_setattr currently just add the mark_inode_dirty call. Also add a WARN_ON for the presence of a truncate method to simple_setattr to catch new instances of it during the transition period. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09rename generic_setattrChristoph Hellwig
Despite its name it's now a generic implementation of ->setattr, but rather a helper to copy attributes from a struct iattr to the inode. Rename it to setattr_copy to reflect this fact. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09add missing setattr methodsChristoph Hellwig
For the new truncate sequence every filesystem that wants to truncate on-disk state needs a seattr method. Convert the remaining filesystems that implement the truncate inode operation to have its own setattr method. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09get rid of block_write_begin_newtruncChristoph Hellwig
Move the call to vmtruncate to get rid of accessive blocks to the callers in preparation of the new truncate sequence and rename the non-truncating version to block_write_begin. While we're at it also remove several unused arguments to block_write_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09introduce __block_write_beginChristoph Hellwig
Split up the block_write_begin implementation - __block_write_begin is a new trivial wrapper for block_prepare_write that always takes an already allocated page and can be either called from block_write_begin or filesystem code that already has a page allocated. Remove the handling of already allocated pages from block_write_begin after switching all callers that do it to __block_write_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09clean up write_begin usage for directories in pagecacheChristoph Hellwig
For filesystem that implement directories in pagecache we call block_write_begin with an already allocated page for this code, while the normal regular file write path uses the default block_write_begin behaviour. Get rid of the __foofs_write_begin helper and opencode the normal write_begin call in foofs_write_begin, while adding a new foofs_prepare_chunk helper for the directory code. The added benefit is that foofs_prepare_chunk has a much saner calling convention. Note that the interruptible flag passed into block_write_begin is always ignored if we already pass in a page (see next patch for details), and we never were doing truncations of exessive blocks for this case either so we can switch directly to block_write_begin_newtrunc. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09get rid of cont_write_begin_newtruncChristoph Hellwig
Move the call to vmtruncate to get rid of accessive blocks to the callers in preparation of the new truncate sequence and rename the non-truncating version to cont_write_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09get rid of nobh_write_begin_newtruncChristoph Hellwig
Move the call to vmtruncate to get rid of accessive blocks to the only remaining caller and rename the non-truncating version to nobh_write_begin. Get rid of the superflous file argument to it while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09sort out blockdev_direct_IO variantsChristoph Hellwig
Move the call to vmtruncate to get rid of accessive blocks to the callers in prepearation of the new truncate calling sequence. This was only done for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant was not needed anyway. Get rid of blockdev_direct_IO_no_locking and its _newtrunc variant while at it as just opencoding the two additional paramters is shorted than the name suffix. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09fix leak in __logfs_create()Al Viro
if kmalloc fails, we still need to drop the inode, as we do on other failure exits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09Fix reiserfs_file_release()Al Viro
a) count file openers correctly; i_count use was completely wrong b) use new mutex for exclusion between final close/open/truncate, to protect tailpacking logics. i_mutex use was wrong and resulted in deadlocks. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09missing include in hppfsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09Deal with missing exports for hostfsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09ecryptfs: dont call lookup_one_len to avoid NULL nameidataLino Sanfilippo
I have encountered the same problem that Eric Sandeen described in this post http://lkml.org/lkml/fancy/2010/4/23/467 while experimenting with stackable filesystems. The reason seems to be that ecryptfs calls lookup_one_len() to get the lower dentry, which in turn calls the lower parent dirs d_revalidate() with a NULL nameidata object. If ecryptfs is the underlaying filesystem, the NULL pointer dereference occurs, since ecryptfs is not prepared to handle a NULL nameidata. I know that this cant happen any more, since it is no longer allowed to mount ecryptfs upon itself. But maybe this patch it useful nevertheless, since the problem would still apply for an underlaying filesystem that implements d_revalidate() and is not prepared to handle a NULL nameidata (I dont know if there actually is such a fs). With this patch (against 2.6.35-rc5) ecryptfs uses the vfs_lookup_path() function instead of lookup_one_len() which ensures that the nameidata passed to the lower filesystems d_revalidate(). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-08-09fs/ecryptfs/file.c: introduce missing freeJulia Lawall
The comments in the code indicate that file_info should be released if the function fails. This releasing is done at the label out_free, not out. The semantic match that finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r exists@ local idexpression x; statement S; expression E; identifier f,f1,l; position p1,p2; expression *ptr != NULL; @@ x@p1 = kmem_cache_zalloc(...); ... if (x == NULL) S <... when != x when != if (...) { <+...x...+> } ( x->f1 = E | (x->f1 == NULL || ...) | f(...,x->f1,...) ) ...> ( return <+...x...+>; | return@p2 ...; ) @script:python@ p1 << r.p1; p2 << r.p2; @@ print "* file: %s kmem_cache_zalloc %s" % (p1[0].file,p1[0].line) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-08-09ecryptfs: release reference to lower mount if interpose failsLino Sanfilippo
In ecryptfs_lookup_and_interpose_lower() the lower mount is not decremented if allocation of a dentry info struct failed. As a result the lower filesystem cant be unmounted any more (since it is considered busy). This patch corrects the reference counting. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-08-09eCryptfs: Handle ioctl calls with unlocked and compat functionsTyler Hicks
Lower filesystems that only implemented unlocked_ioctl weren't being passed ioctl calls because eCryptfs only checked for lower_file->f_op->ioctl and returned -ENOTTY if it was NULL. eCryptfs shouldn't implement ioctl(), since it doesn't require the BKL. This patch introduces ecryptfs_unlocked_ioctl() and ecryptfs_compat_ioctl(), which passes the calls on to the lower file system. https://bugs.launchpad.net/ecryptfs/+bug/469664 Reported-by: James Dupin <james.dupin@gmail.com> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-08-09ecryptfs: Fix warning in ecryptfs_process_response()Prarit Bhargava
Fix warning seen with "make -j24 CONFIG_DEBUG_SECTION_MISMATCH=y V=1": fs/ecryptfs/messaging.c: In function 'ecryptfs_process_response': fs/ecryptfs/messaging.c:276: warning: 'daemon' may be used uninitialized in this function Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-08-09fix printk typo 'faild'Paul Bolle
Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-08-09Merge commit 'linus/master' into bkl/coreFrederic Weisbecker
Merge reason: The staging tree has introduced the easycap driver lately. We need the latest updates to pushdown the bkl in its ioctl helper.
2010-08-09autofs/autofs4: Move compat_ioctl handling into fsArnd Bergmann
Handling of autofs ioctl numbers does not need to be generic and can easily be done directly in autofs itself. This also pushes the BKL into autofs and autofs4 ioctl methods. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ian Kent <raven@themaw.net> Cc: Autofs <autofs@linux.kernel.org> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-08-08mtd: Remove obsolete <mtd/compatmac.h> includeDavid Woodhouse
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-08-08omfs: fix uninitialized variable warningBill Pemberton
quiet the warning: fs/omfs/file.c: In function 'omfs_get_block': fs/omfs/file.c:225: warning: 'new_block' may be used uninitialized in this function new_block is used properly by the call to omfs_grow_extent() Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Bob Copeland <me@bobcopeland.com>
2010-08-08jffs2: Update copyright noticesDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2010-08-07Merge branch 'bkl/core' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: do_coredump: Do not take BKL init: Remove the BKL from startup code
2010-08-07Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux: (34 commits) nfsd4: fix file open accounting for RDWR opens nfsd: don't allow setting maxblksize after svc created nfsd: initialize nfsd versions before creating svc net: sunrpc: removed duplicated #include nfsd41: Fix a crash when a callback is retried nfsd: fix startup/shutdown order bug nfsd: minor nfsd read api cleanup gcc-4.6: nfsd: fix initialized but not read warnings nfsd4: share file descriptors between stateid's nfsd4: fix openmode checking on IO using lock stateid nfsd4: miscellaneous process_open2 cleanup nfsd4: don't pretend to support write delegations nfsd: bypass readahead cache when have struct file nfsd: minor nfsd_svc() cleanup nfsd: move more into nfsd_startup() nfsd: just keep single lockd reference for nfsd nfsd: clean up nfsd_create_serv error handling nfsd: fix error handling in __write_ports_addxprt nfsd: fix error handling when starting nfsd with rpcbind down nfsd4: fix v4 state shutdown error paths ...
2010-08-07AFS: Fix the module init error handlingDavid Howells
Fix the module init error handling. There are a bunch of goto labels for aborting the init procedure at different points and just undoing what needs undoing - they aren't all in the right places, however. This can lead to an oops like the following: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffff81042a31>] destroy_workqueue+0x17/0xc0 ... Modules linked in: kafs(+) dns_resolver rxkad af_rxrpc fscache Pid: 2171, comm: insmod Not tainted 2.6.35-cachefs+ #319 DG965RY/ ... Process insmod (pid: 2171, threadinfo ffff88003ca6a000, task ffff88003dcc3050) ... Call Trace: [<ffffffffa0055994>] afs_callback_update_kill+0x10/0x12 [kafs] [<ffffffffa007d1c5>] afs_init+0x190/0x1ce [kafs] [<ffffffffa007d035>] ? afs_init+0x0/0x1ce [kafs] [<ffffffff810001ef>] do_one_initcall+0x59/0x14e [<ffffffff8105f7ee>] sys_init_module+0x9c/0x1de [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-07Merge branch 'nfs-for-2.6.36' of ↵Linus Torvalds
git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.36' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (42 commits) NFS: NFSv4.1 is no longer a "developer only" feature NFS: NFS_V4 is no longer an EXPERIMENTAL feature NFS: Fix /proc/mount for legacy binary interface NFS: Fix the locking in nfs4_callback_getattr SUNRPC: Defer deleting the security context until gss_do_free_ctx() SUNRPC: prevent task_cleanup running on freed xprt SUNRPC: Reduce asynchronous RPC task stack usage SUNRPC: Move the bound cred to struct rpc_rqst SUNRPC: Clean up of rpc_bindcred() SUNRPC: Move remaining RPC client related task initialisation into clnt.c SUNRPC: Ensure that rpc_exit() always wakes up a sleeping task SUNRPC: Make the credential cache hashtable size configurable SUNRPC: Store the hashtable size in struct rpc_cred_cache NFS: Ensure the AUTH_UNIX credcache is allocated dynamically NFS: Fix the NFS users of rpc_restart_call() SUNRPC: The function rpc_restart_call() should return success/failure NFSv4: Get rid of the bogus RPC_ASSASSINATED(task) checks NFSv4: Clean up the process of renewing the NFSv4 lease NFSv4.1: Handle NFS4ERR_DELAY on SEQUENCE correctly NFS: nfs_rename() should not have to flush out writebacks ...
2010-08-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add retrieve request fuse: add store request fuse: don't use atomic kmap
2010-08-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (45 commits) nilfs2: reject filesystem with unsupported block size nilfs2: avoid rec_len overflow with 64KB block size nilfs2: simplify nilfs_get_page function nilfs2: reject incompatible filesystem nilfs2: add feature set fields to super block nilfs2: clarify byte offset in super block format nilfs2: apply read-ahead for nilfs_btree_lookup_contig nilfs2: introduce check flag to btree node buffer nilfs2: add btree get block function with readahead option nilfs2: add read ahead mode to nilfs_btnode_submit_block nilfs2: fix buffer head leak in nilfs_btnode_submit_block nilfs2: eliminate inline keywords in btree implementation nilfs2: get maximum number of child nodes from bmap object nilfs2: reduce repetitive calculation of max number of child nodes nilfs2: optimize calculation of min/max number of btree node children nilfs2: remove redundant pointer checks in bmap lookup functions nilfs2: get rid of nilfs_bmap_union nilfs2: unify bmap set_target_v operations nilfs2: get rid of nilfs_btree uses nilfs2: get rid of nilfs_direct uses ...
2010-08-07Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: Adding error check after calling ext4_mb_regular_allocator() ext4: Fix dirtying of journalled buffers in data=journal mode ext4: re-inline ext4_rec_len_(to|from)_disk functions jbd2: Remove t_handle_lock from start_this_handle() jbd2: Change j_state_lock to be a rwlock_t jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop ext4: Add mount options in superblock ext4: force block allocation on quota_off ext4: fix freeze deadlock under IO ext4: drop inode from orphan list if ext4_delete_inode() fails ext4: check to make make sure bd_dev is set before dereferencing it jbd2: Make barrier messages less scary ext4: don't print scary messages for allocation failures post-abort ext4: fix EFBIG edge case when writing to large non-extent file ext4: fix ext4_get_blocks references ext4: Always journal quota file modifications ext4: Fix potential memory leak in ext4_fill_super ext4: Don't error out the fs if the user tries to make a file too big ext4: allocate stripe-multiple IOs on stripe boundaries ext4: move aio completion after unwritten extent conversion ... Fix up conflicts in fs/ext4/inode.c as per Ted. Fix up xfs conflicts as per earlier xfs merge.
2010-08-07Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: ext3: Fix dirtying of journalled buffers in data=journal mode ext3: default to ordered mode quota: Use mark_inode_dirty_sync instead of mark_inode_dirty quota: Change quota error message to print out disk and function name MAINTAINERS: Update entries of ext2 and ext3 MAINTAINERS: Update address of Andreas Dilger ext3: Avoid filesystem corruption after a crash under heavy delete load ext3: remove vestiges of nobh support ext3: Fix set but unused variables quota: clean up quota active checks quota: Clean up the namespace in dqblk_xfs.h quota: check quota reservation on remove_dquot_ref
2010-08-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: fs/dlm: Drop unnecessary null test dlm: use genl_register_family_with_ops()
2010-08-07Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: super.c Fix warning: variable 'sbi' set but not used udf: remove duplicated #include
2010-08-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [DNS RESOLVER] Minor typo correction DNS: Fixes for the DNS query module cifs: Include linux/err.h for IS_ERR and PTR_ERR DNS: Make AFS go to the DNS for AFSDB records for unknown cells DNS: Separate out CIFS DNS Resolver code cifs: account for new creduid=0x%x parameter in spnego upcall string cifs: reduce false positives with inode aliasing serverino autodisable CIFS: Make cifs_convert_address() take a const src pointer and a length cifs: show features compiled in as part of DebugData cifs: update README Fix up trivial conflicts in fs/cifs/cifsfs.c due to workqueue changes
2010-08-07Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits) workqueue: mark init_workqueues() as early_initcall() workqueue: explain for_each_*cwq_cpu() iterators fscache: fix build on !CONFIG_SYSCTL slow-work: kill it gfs2: use workqueue instead of slow-work drm: use workqueue instead of slow-work cifs: use workqueue instead of slow-work fscache: drop references to slow-work fscache: convert operation to use workqueue instead of slow-work fscache: convert object to use workqueue instead of slow-work workqueue: fix how cpu number is stored in work->data workqueue: fix mayday_mask handling on UP workqueue: fix build problem on !CONFIG_SMP workqueue: fix locking in retry path of maybe_create_worker() async: use workqueue for worker pool workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead workqueue: implement unbound workqueue workqueue: prepare for WQ_UNBOUND implementation libata: take advantage of cmwq and remove concurrency limitations workqueue: fix worker management invocation without pending works ... Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
2010-08-07writeback: optimize periodic bdi thread wakeupsArtem Bityutskiy
Whe the first inode for a bdi is marked dirty, we wake up the bdi thread which should take care of the periodic background write-out. However, the write-out will actually start only 'dirty_writeback_interval' centisecs later, so we can delay the wake-up. This change was requested by Nick Piggin who pointed out that if we delay the wake-up, we weed out 2 unnecessary contex switches, which matters because '__mark_inode_dirty()' is a hot-path function. This patch introduces a new function - 'bdi_wakeup_thread_delayed()', which sets up a timer to wake-up the bdi thread and returns. So the wake-up is delayed. We also delete the timer in bdi threads just before writing-back. And synchronously delete it when unregistering bdi. At the unregister point the bdi does not have any users, so no one can arm it again. Since now we take 'bdi->wb_lock' in the timer, which can execute in softirq context, we have to use 'spin_lock_bh()' for 'bdi->wb_lock'. This patch makes this change as well. This patch also moves the 'bdi_wb_init()' function down in the file to avoid forward-declaration of 'bdi_wakeup_thread_delayed()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07writeback: prevent unnecessary bdi threads wakeupsArtem Bityutskiy
Finally, we can get rid of unnecessary wake-ups in bdi threads, which are very bad for battery-driven devices. There are two types of activities bdi threads do: 1. process bdi works from the 'bdi->work_list' 2. periodic write-back So there are 2 sources of wake-up events for bdi threads: 1. 'bdi_queue_work()' - submits bdi works 2. '__mark_inode_dirty()' - adds dirty I/O to bdi's The former already has bdi wake-up code. The latter does not, and this patch adds it. '__mark_inode_dirty()' is hot-path function, but this patch adds another 'spin_lock(&bdi->wb_lock)' there. However, it is taken only in rare cases when the bdi has no dirty inodes. So adding this spinlock should be fine and should not affect performance. This patch makes sure bdi threads and the forker thread do not wake-up if there is nothing to do. The forker thread will nevertheless wake up at least every 5 min. to check whether it has to kill a bdi thread. This can also be optimized, but is not worth it. This patch also tidies up the warning about unregistered bid, and turns it from an ugly crocodile to a simple 'WARN()' statement. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07writeback: move bdi threads exiting logic to the forker threadArtem Bityutskiy
Currently, bdi threads can decide to exit if there were no useful activities for 5 minutes. However, this causes nasty races: we can easily oops in the 'bdi_queue_work()' if the bdi thread decides to exit while we are waking it up. And even if we do not oops, but the bdi tread exits immediately after we wake it up, we'd lose the wake-up event and have an unnecessary delay (up to 5 secs) in the bdi work processing. This patch makes the forker thread to be the central place which not only creates bdi threads, but also kills them if they were inactive long enough. This better design-wise. Another reason why this change was done is to prepare for the further changes which will prevent the bdi threads from waking up every 5 sec and wasting power. Indeed, when the task does not wake up periodically anymore, it won't be able to exit either. This patch also moves the the 'wake_up_bit()' call from the bdi thread to the forker thread as well. So now the forker thread sets the BDI_pending bit, then forks the task or kills it, then clears the bit and wakes up the waiting process. The only process which may wain on the bit is 'bdi_wb_shutdown()'. This function was changed as well - now it first removes the bdi from the 'bdi_list', then waits on the 'BDI_pending' bit. Once it wakes up, it is guaranteed that the forker thread won't race with it, because the bdi is not visible. Note, the forker thread sets the 'BDI_pending' bit under the 'bdi->wb_lock' which is essential for proper serialization. And additionally, when we change 'bdi->wb.task', we now take the 'bdi->work_lock', to make sure that we do not lose wake-ups which we otherwise would when raced with, say, 'bdi_queue_work()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07writeback: move last_active to bdiArtem Bityutskiy
Currently bdi threads use local variable 'last_active' which stores last time when the bdi thread did some useful work. Move this local variable to 'struct bdi_writeback'. This is just a preparation for the further patches which will make the forker thread decide when bdi threads should be killed. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07writeback: do not remove bdi from bdi_listArtem Bityutskiy
The forker thread removes bdis from 'bdi_list' before forking the bdi thread. But this is wrong for at least 2 reasons. Reason #1: if we temporary remove a bdi from the list, we may miss works which would otherwise be given to us. Reason #2: this is racy; indeed, 'bdi_wb_shutdown()' expects that bdis are always in the 'bdi_list' (see 'bdi_remove_from_list()'), and when it races with the forker thread, it can shut down the bdi thread at the same time as the forker creates it. This patch makes sure the forker thread never removes bdis from 'bdi_list' (which was suggested by Christoph Hellwig). In order to make sure that we do not race with 'bdi_wb_shutdown()', we have to hold the 'bdi_lock' while walking the 'bdi_list' and setting the 'BDI_pending' flag. NOTE! The error path is interesting. Currently, when we fail to create a bdi thread, we move the bdi to the tail of 'bdi_list'. But if we never remove the bdi from the list, we cannot move it to the tail either, because then we can mess up the RCU readers which walk the list. And also, we'll have the race described above in "Reason #2". But I not think that adding to the tail is any important so I just do not do that. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07writeback: do not lose wake-ups in bdi threadsArtem Bityutskiy
Currently, bdi threads ('bdi_writeback_thread()') can lose wake-ups. For example, if 'bdi_queue_work()' is executed after the bdi thread have had finished 'wb_do_writeback()' but before it called 'schedule_timeout_interruptible()'. To fix this issue, we have to check whether we have works to process after we have changed the task state to 'TASK_INTERRUPTIBLE'. This patch also clean-ups handling of the cases when 'dirty_writeback_interval' is zero or non-zero. Additionally, this patch also removes unneeded 'list_empty_careful()' call. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07writeback: harmonize writeback threads namingArtem Bityutskiy
The write-back code mixes words "thread" and "task" for the same things. This is not a big deal, but still an inconsistency. hch: a convention I tend to use and I've seen in various places is to always use _task for the storage of the task_struct pointer, and thread everywhere else. This especially helps with having foo_thread for the actual thread and foo_task for a global variable keeping the task_struct pointer This patch renames: * 'bdi_add_default_flusher_task()' -> 'bdi_add_default_flusher_thread()' * 'bdi_forker_task()' -> 'bdi_forker_thread()' because bdi threads are 'bdi_writeback_thread()', so these names are more consistent. This patch also amends commentaries and makes them refer the forker and bdi threads as "thread", not "task". Also, while on it, make 'bdi_add_default_flusher_thread()' declaration use 'static void' instead of 'void static' and make checkpatch.pl happy. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07coda: fixup clash with block layer REQ_* definesJens Axboe
CODA should not be using defines in the global name space of that nature, prefix them with CODA_. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07writeback: remove wb in get_next_work_itemMinchan Kim
83ba7b07 cleans up the writeback. So we don't use wb any more in get_next_work_item. Let's remove unnecessary argument. CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07splice: fix misuse of SPLICE_F_NONBLOCKMiklos Szeredi
SPLICE_F_NONBLOCK is clearly documented to only affect blocking on the pipe. In __generic_file_splice_read(), however, it causes an EAGAIN if the page is currently being read. This makes it impossible to write an application that only wants failure if the pipe is full. For example if the same process is handling both ends of a pipe and isn't otherwise able to determine whether a splice to the pipe will fill it or not. We could make the read non-blocking on O_NONBLOCK or some other splice flag, but for now this is the simplest fix. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07block: push down BKL into .open and .releaseArnd Bergmann
The open and release block_device_operations are currently called with the BKL held. In order to change that, we must first make sure that all drivers that currently rely on this have no regressions. This blindly pushes the BKL into all .open and .release operations for all block drivers to prepare for the next step. The drivers can subsequently replace the BKL with their own locks or remove it completely when it can be shown that it is not needed. The functions blkdev_get and blkdev_put are the only remaining users of the big kernel lock in the block layer, besides a few uses in the ioctl code, none of which need to serialize with blkdev_{get,put}. Most of these two functions is also under the protection of bdev->bd_mutex, including the actual calls to ->open and ->release, and the common code does not access any global data structures that need the BKL. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07writeback: Add tracing to balance_dirty_pagesDave Chinner
Tracing high level background writeback events is good, but it doesn't give the entire picture. Add visibility into write throttling to catch IO dispatched by foreground throttling of processing dirtying lots of pages. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07writeback: Initial tracing supportDave Chinner
Trace queue/sched/exec parts of the writeback loop. This provides insight into when and why flusher threads are scheduled to run. e.g a sync invocation leaves traces like: sync-[...]: writeback_queue: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0 flush-8:0-[...]: writeback_exec: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0 This also lays the foundation for adding more writeback tracing to provide deeper insight into the whole writeback path. The original tracing code is from Jens Axboe, though this version is a rewrite as a result of the code being traced changing significantly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07gcc-4.6: fs: fix unused but set warningsAndi Kleen
No real bugs I believe, just some dead code, and some shut up code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>