2021-04-11  io_uring: refactor io_close  (Pavel Begunkov)
A small refactoring, shrinking it and making it easier to read. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/19b24eed7cd491a0243b50366dd2a23b558e2665.1618101759.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: unify files and task cancel  (Pavel Begunkov)
Now __io_uring_cancel() and __io_uring_files_cancel() are very similar and mostly differ by how we count requests; merge them and let tctx_inflight() handle the counting. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1a5986a97df4dc1378f3fe0ca1eb483dbcf42112.1618101759.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: track inflight requests through counter  (Pavel Begunkov)
Instead of keeping requests in an inflight_list, just track them with a per-tctx atomic counter. Apart from being much easier and more consistent with task cancel, it frees ->inflight_entry from being shared between iopoll and cancel-track, so less headache for us. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3c2ee0863cd7eeefa605f3eaff4c1c461a6f1157.1618101759.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
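A minimal sketch of the counting approach described above (field and helper names per this series, as best recalled):

    /* sketch: mark a request tracked and bump the per-tctx counter */
    static void io_req_track_inflight(struct io_kiocb *req)
    {
            if (!(req->flags & REQ_F_INFLIGHT)) {
                    req->flags |= REQ_F_INFLIGHT;
                    atomic_inc(&current->io_uring->inflight_tracked);
            }
    }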
2021-04-11  io_uring: unify task and files cancel loops  (Pavel Begunkov)
Move the tracked inflight number check up the stack into __io_uring_files_cancel() so it's similar to task cancel. Will be used for further cleanups. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dca5a395efebd1e3e0f3bbc6b9640c5e8aa7e468.1618101759.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: simplify apoll hash removal  (Pavel Begunkov)
hash_del() works well with non-hashed nodes, so there's no need to check whether a node is hashed first. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
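For context, hash_del() resolves to hlist_del_init(), which already bails out on unhashed nodes, so the guarded form collapses to a plain call; a sketch:

    /* before */
    if (hash_hashed(&req->hash_node))
            hash_del(&req->hash_node);

    /* after: hash_del() is safe on non-hashed nodes */
    hash_del(&req->hash_node);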
2021-04-11  io_uring: refactor io_poll_complete()  (Pavel Begunkov)
Remove the error parameter from io_poll_complete(), as 0 is always passed, and do a bit of cleaning on top. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: clean up io_poll_task_func()  (Pavel Begunkov)
io_poll_complete() always fills an event (even an overflowed one), so we should always do io_cqring_ev_posted() afterwards. And that's what currently happens, because the second EPOLLONESHOT check is always true; it can't return !done for oneshots. Remove that branching; the result is much easier to read. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io-wq: Fix io_wq_worker_affinity()  (Peter Zijlstra)
Do not include private headers and do not frob in internals. On top of that, while the previous code restores the affinity, it doesn't ensure the task actually moves there if it was running, leading to the fun situation that it can be observed running outside of its allowed mask for potentially significant time. Use the proper API instead. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/YG7QkiUzlEbW85TU@hirez.programming.kicks-ass.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
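The proper API here is set_cpus_allowed_ptr(), which also migrates the task if it is currently running outside the mask; the fix is roughly:

    static bool io_wq_worker_affinity(struct io_worker *worker, void *data)
    {
            struct io_wq *wq = data;

            /* migrates the task now if it is running off-mask */
            set_cpus_allowed_ptr(worker->task, cpumask_of_node(wq->node));
            return false;
    }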
2021-04-11  io_uring: don't attempt re-add of multishot poll request if racing  (Jens Axboe)
We currently allow racy updates to multishot requests, but we can end up double adding the poll request if both completion and update do it. Ensure that we skip the re-add on the update side if someone else is completing it. Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests") Reported-by: Joakim Hassila <joj@mac.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io-wq: simplify code in __io_worker_busy()  (Hao Xu)
Leverage XOR to simplify the code in __io_worker_busy. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/1617678525-3129-1-git-send-email-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
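The simplification amounts to XOR-ing two booleans rather than comparing them; a sketch:

    bool worker_bound = (worker->flags & IO_WORKER_F_BOUND) != 0;
    bool work_bound = !(work->flags & IO_WQ_WORK_UNBOUND);

    /* before */
    if (worker_bound != work_bound) { /* ... flip bound state ... */ }
    /* after */
    if (worker_bound ^ work_bound) { /* ... flip bound state ... */ }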
2021-04-11  io_uring: kill outdated comment about splice punt  (Pavel Begunkov)
The splice/tee comment in io_prep_async_work() isn't relevant since the section was moved; delete it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/892a549c89c3d422b679677b8e68ffd3fcb736b6.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: encapsulate fixed files into struct  (Pavel Begunkov)
Add struct io_fixed_file representing a single registered file, first to hide the ugly struct file **, which may be misleading, and second to retype it to unsigned long, as conversions to and from file * for handling and masking FFS_* flags are getting nasty. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/78669731a605a7614c577c3de552631cfaf0869a.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
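The wrapper itself is tiny; per the description above it is essentially:

    struct io_fixed_file {
            /* file * with FFS_* flags stashed into the low bits */
            unsigned long file_ptr;
    };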
2021-04-11  io_uring: refactor file tables alloc/free  (Pavel Begunkov)
Introduce a helper io_free_file_tables() doing all the cleaning; it's currently hand-coded in several places. Also move all allocations into io_sqe_alloc_file_tables() and rename it, so all of it is in one place. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/502a84ebf41ff119b095e59661e678eacb752bf8.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: don't quiesce initial files register  (Pavel Begunkov)
There is no reason why we would want to fully quiesce the ring on IORING_REGISTER_FILES; if files are already registered, we fail. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/563bb8060bb2d3efbc32fce6101678281c574d2a.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: set proper FFS* flags on reg file update  (Pavel Begunkov)
Set FFS_* flags (e.g. FFS_ASYNC_READ) not only on initial registration but also on registered file updates. Not a bug, but we may miss out on the benefits of the feature. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/df29a841a2d3d3695b509cdffce5070777d9d942.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: deduplicate NOSIGNAL setting  (Pavel Begunkov)
Set MSG_NOSIGNAL and REQ_F_NOWAIT in send/recv prep routines and don't duplicate it in all four send/recv handlers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e1133a3ed1c0e192975b7341ea4b0bf91f63b132.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: put link timeout req consistently  (Pavel Begunkov)
Don't put the linked timeout req in io_async_find_and_cancel() but do it in io_link_timeout_fn(), so we have only one point for that and won't have to do it differently as is the case now (put vs put_deferred). Also improve io_async_find_and_cancel()'s locking a bit. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d75b70957f245275ab7cba83e0ac9c1b86aae78a.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: simplify overflow handling  (Pavel Begunkov)
Overflowed CQEs don't lock requests anymore, so we don't care so much about cancelling them; kill cq_overflow_flushed and simplify the code. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5799867aeba9e713c32f49aef78e5e1aef9fbc43.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: lock annotate timeouts and poll  (Pavel Begunkov)
Add timeout and poll ->completion_lock annotations for Sparse; makes life easier while looking at the functions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2345325643093d41543383ba985a735aeb899eac.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: kill unused forward decls  (Pavel Begunkov)
Kill unused forward declarations for io_ring_file_put() and io_queue_next(). While at it, rename the first one. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/64aa27c3f9662e14615cc119189f5eaf12989671.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: store reg buffer end instead of length  (Pavel Begunkov)
It's a bit more convenient for us to store a registered buffer's end address instead of its length, see struct io_mapped_ubuf, as it allows us to avoid recomputing it every time. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/39164403fe92f1dc437af134adeec2423cdf9395.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: improve import_fixed overflow checks  (Pavel Begunkov)
Replace a hand-coded overflow check with a specialised function. Even though compilers are smart enough to generate identical binaries (i.e. check the carry bit), it's more foolproof and conveys the intention better. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e437dcdc929bacbb6f11a4824ecbbf17225cb82a.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
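The specialised function is check_add_overflow(); the change in io_import_fixed() is roughly:

    u64 buf_end;

    /* before: hand-coded wrap check */
    if (buf_addr + len < buf_addr)
            return -EFAULT;

    /* after: same carry check, but explicit about intent */
    if (unlikely(check_add_overflow(buf_addr, (u64)len, &buf_end)))
            return -EFAULT;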
2021-04-11  io_uring: refactor io_async_cancel()  (Pavel Begunkov)
Remove extra tctx==NULL checks that are already done by io_async_cancel_one(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/70c2a8b958d942e86958a28af0452966ce1095b0.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: remove unused hash_wait  (Pavel Begunkov)
No users of io_uring_ctx::hash_wait left, kill it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e25cb83c233a5f75f15275596b49fbafbea606fa.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: better ref handling in poll_remove_one  (Pavel Begunkov)
Instead of using io_put_req() to drop a non-final ref, use req_ref_put(), which is slimmer and will also check the invariant. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/85b5774ce13ae55cc2e705abdc8cbafe1212f1bd.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: combine lock/unlock sections on exit  (Pavel Begunkov)
io_ring_exit_work() already does uring_lock lock/unlock; no need to repeat it for the lock-waiting trick in io_ring_ctx_free(). Move the waiting, with its comments and spinlocking, into io_ring_exit_work(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a8ae0589b0ea64ad4791e2c282e4e9b713dd7024.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: remove useless is_dying check on quiesce  (Pavel Begunkov)
rsrc_data refs should always be valid for potential submitters; io_rsrc_ref_quiesce() restores them before unlocking, so the percpu_ref_is_dying() check in io_sqe_files_unregister() does nothing and is misleading. Concurrent quiesce is prevented with struct io_rsrc_data::quiesce. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bf97055e1748ee3a382e66daf384a469eb90b931.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: reuse io_rsrc_node_destroy()  (Pavel Begunkov)
Reuse io_rsrc_node_destroy() in __io_rsrc_put_work(). Also move it to a more appropriate place -- to the other node routines, and remove forward declaration. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cccafba41aee1e5bb59988704885b1340aef3a27.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: ctx-wide rsrc nodes  (Pavel Begunkov)
If we're ever going to support multiple types of resources, we need shared rsrc nodes to not bloat requests; that is implemented in this patch. It also gives a nicer API and saves one pointer dereference in io_req_set_rsrc_node(). We may say that all requests bound to a resource belong to one and only one rsrc node, and considering that nodes are removed and recycled strictly in order, this separates requests into generations, where the generation changes on each node switch (i.e. io_rsrc_node_switch()). The API is simple: io_rsrc_node_switch() switches to a new generation if needed, and also optionally kills a passed-in io_rsrc_data. Each call to io_rsrc_node_switch() has to be preceded by io_rsrc_node_switch_start(). The start function is idempotent and does not necessarily have to be followed by a switch. One difference is that once a node has been set it will always retain a valid rsrc node, even on unregister. It may be a nuisance at the moment, but makes much sense for multiple types of resources. Another thing changed is that nodes are bound to/associated with an io_rsrc_data later, just before killing (i.e. switching). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7e9c693b4b9a2f47aa784b616ce29843021bb65a.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
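A sketch of the calling convention on a registration path (abridged; exact call sites vary):

    /* idempotent, may preallocate a node; must precede any switch */
    ret = io_rsrc_node_switch_start(ctx);
    if (ret)
            return ret;

    /* ... update/unregister the resources ... */

    /* move to a new generation, optionally killing 'data' */
    io_rsrc_node_switch(ctx, data);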
2021-04-11  io_uring: refactor io_queue_rsrc_removal()  (Pavel Begunkov)
Pass rsrc_node into io_queue_rsrc_removal() explicitly. Just a simple preparation patch; it makes the following changes nicer. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/002889ce4de7baf287f2b010eef86ffe889174c6.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: move rsrc_put callback into io_rsrc_data  (Pavel Begunkov)
io_rsrc_node's callback operates only on a single io_rsrc_data and only with its resources, so the rsrc_put() callback is actually a property of io_rsrc_data. Move it there; it makes the code much nicer. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9417c2fba3c09e8668f05747006a603d416d34b4.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: encapsulate rsrc node manipulations  (Pavel Begunkov)
io_rsrc_node_get() and io_rsrc_node_set() are always used together; merge them into one so most users don't even see io_rsrc_node and don't need to care about it. It helped catch io_sqe_files_register() inferring the rsrc data argument differently for get and set; not a problem, but a good sign. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0827b080b2e61b3dec795380f7e1a1995595d41f.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: use rsrc prealloc infra for files reg  (Pavel Begunkov)
Keep it consistent with update and use io_rsrc_node_prealloc() + io_rsrc_node_get() in io_sqe_files_register() as well. That will be used in future patches, is less error prone, and allows us to deduplicate rsrc_node init. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cf87321e6be5e38f4dc7fe5079d2aa6945b1ace0.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: simplify io_rsrc_node_ref_zero  (Pavel Begunkov)
Replace queue_delayed_work() with mod_delayed_work() in io_rsrc_node_ref_zero(), as the latter can schedule a new work, and clean it up further for better readability. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3b2b23e3a1ea4bbf789cd61815d33e05d9ff945e.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: name rsrc bits consistently  (Pavel Begunkov)
Keep resource related structs' and functions' naming consistent, in particular use "io_rsrc" prefix for everything. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/962f5acdf810f3a62831e65da3932cde24f6d9df.1617287883.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io-wq: cancel task_work on exit only targeting the current 'wq'  (Jens Axboe)
With using task_work_cancel(), we're potentially canceling task_work that isn't related to this specific io_wq. Use the newly added task_work_cancel_match() to ensure that we only remove and cancel work items that are specific to this io_wq. Fixes: 685fe7feedb9 ("io-wq: eliminate the need for a manager thread") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  task_work: add helper for more targeted task_work canceling  (Jens Axboe)
The only exported helper we have right now is task_work_cancel(), which cancels any task_work from a given task where func matches the queued work item. This is a bit too coarse for some use cases. Add a task_work_cancel_match() that allows more specific targeting of individual work items beyond just the callback function used. task_work_cancel() can be trivially implemented on top of that, hence do so. Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
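With the new helper in place, task_work_cancel() reduces to a func-matching wrapper, roughly:

    static bool task_work_func_match(struct callback_head *cb, void *data)
    {
            return cb->func == data;
    }

    struct callback_head *
    task_work_cancel(struct task_struct *task, task_work_func_t func)
    {
            return task_work_cancel_match(task, task_work_func_match, func);
    }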
2021-04-11  io_uring: fix race around poll update and poll triggering  (Jens Axboe)
Joakim reports that in some conditions he sees a multishot poll request being canceled, and that it coincides with getting -EALREADY on modification. As part of the poll update procedure, there's a small window where the request is marked as canceled, and if this coincides with the event actually triggering, then we can get a spurious -ECANCELED and termination of the multishot request. Don't mark the poll request as being canceled for update. We also don't care if we race on removal unless it's a one-shot request; we can safely update in either case. Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests") Reported-by: Joakim Hassila <joj@mac.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: reg buffer overflow checks hardening  (Pavel Begunkov)
We are safe from overflows in io_sqe_buffer_register() because they would just yield an allocation failure, but it's nicer to check explicitly. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2b0625551be3d97b80a5fd21c8cd79dc1c91f0b5.1616624589.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: allow SQPOLL without CAP_SYS_ADMIN or CAP_SYS_NICE  (Jens Axboe)
Now that all workers are attached to the original task as threads, accounting of CPU time is directly attributed to the original task as well. This means that we no longer have to restrict SQPOLL to needing elevated privileges, as it's really no different from just having the task spawn a busy-looping thread in userspace. Reported-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io-wq: eliminate the need for a manager thread  (Jens Axboe)
io-wq relies on a manager thread to create/fork new workers, as needed. But there's really no strong need for it anymore. We have the following cases that fork a new worker: 1) Work queueing. This is always done from the task itself, and it's trivial to create a worker off that path, if needed. 2) All workers have gone to sleep, and we have more work. This is called off the sched-out path. For this case, use a task_work item to queue a fork-worker operation. 3) Hashed work completion. Don't think we need to do anything off this case. If need be, it could just use approach 2 as well. Part of this change is incrementing the running worker count before the fork, to avoid cases where we observe we need a worker, queue creation of one, then new work comes in and we fork yet another one. That last queue operation should have waited for the previous worker to come up; it's quite possible we don't even need it. Hence move the running-worker accounting to before the fork, to handle that case more efficiently. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  kernel: allow fork with TIF_NOTIFY_SIGNAL pending  (Jens Axboe)
fork() fails if signal_pending() is true, but there are two conditions that can lead to that: 1) An actual signal is pending. We want fork to fail for that one, like we always have. 2) TIF_NOTIFY_SIGNAL is pending, because the task has pending task_work. We don't need to make it fail for that case. Allow fork() to proceed if just task_work is pending, by changing the signal_pending() check to task_sigpending(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
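The copy_process() change is essentially a one-liner:

    -	if (signal_pending(current)) {
    +	if (task_sigpending(current)) {
    		retval = -ERESTARTNOINTR;
    		goto fork_out;
    	}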
2021-04-11  io_uring: allow events and user_data update of running poll requests  (Jens Axboe)
This adds two new POLL_ADD flags, IORING_POLL_UPDATE_EVENTS and IORING_POLL_UPDATE_USER_DATA. As with the other POLL_ADD flag, these are masked into sqe->len. If set, the POLL_ADD will have the following behavior: - sqe->addr must contain the user_data of the poll request that needs to be modified. This field is otherwise invalid for a POLL_ADD command. - If IORING_POLL_UPDATE_EVENTS is set, sqe->poll_events must contain the new mask for the existing poll request. There are no checks for whether these are identical or not; if a matching poll request is found, then it is re-armed with the new mask. - If IORING_POLL_UPDATE_USER_DATA is set, sqe->off must contain the new user_data for the existing poll request. A POLL_ADD with any of these flags set may complete with any of the following results: 1) 0, which means that we successfully found the existing poll request specified, and performed the re-arm procedure. Any error from that re-arm will be exposed as a completion event for that original poll request, not for the update request. 2) -ENOENT, if no existing poll request was found with the given user_data. 3) -EALREADY, if the existing poll request was already in the process of being removed/canceled/completing. 4) -EACCES, if an attempt was made to modify an internal poll request (e.g. not one originally issued as IORING_OP_POLL_ADD). The usual -EINVAL cases apply as well, if any invalid fields are set in the sqe for this command type. Signed-off-by: Jens Axboe <axboe@kernel.dk>
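A hedged userspace sketch of issuing an update via raw SQE fields (liburing gained dedicated prep helpers only later; old_user_data and new_user_data are placeholders):

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = IORING_OP_POLL_ADD;
    sqe->len = IORING_POLL_UPDATE_EVENTS | IORING_POLL_UPDATE_USER_DATA;
    sqe->addr = old_user_data;      /* which poll request to modify */
    sqe->poll_events = POLLIN;      /* new event mask */
    sqe->off = new_user_data;       /* new user_data for that request */
    sqe->user_data = 0xcafe;        /* identifies this update's own CQE */
    io_uring_submit(&ring);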
2021-04-11  io_uring: abstract out an io_poll_find_helper()  (Jens Axboe)
We'll need this helper for another purpose; for now, just abstract it out and have io_poll_cancel() use it for lookups. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: terminate multishot poll for CQ ring overflow  (Jens Axboe)
If we hit overflow and fail to allocate an overflow entry for the completion, terminate the multishot poll mode. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: abstract out helper for removing poll waitqs/hashes  (Jens Axboe)
No functional changes in this patch, just preparation for killing multishot poll on CQ overflow. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: add multishot mode for IORING_OP_POLL_ADD  (Jens Axboe)
The default io_uring poll mode is one-shot, where once the event triggers, the poll command is completed and won't trigger any further events. If we're doing repeated polling on the same file or socket, then it can be more efficient to do multishot, where we keep triggering whenever the event becomes true. This deviates from the usual norm of having one CQE per SQE submitted. Add a CQE flag, IORING_CQE_F_MORE, which tells the application to expect further completion events from the submitted SQE. Right now the only user of this is POLL_ADD in multishot mode. Since sqe->poll_events is using the space that we normally use for adding flags to commands, use sqe->len for the flag space for POLL_ADD. Multishot mode is selected by setting IORING_POLL_ADD_MULTI in sqe->len. An application should expect more CQEs for the specified SQE if the CQE is flagged with IORING_CQE_F_MORE. In multishot mode, only cancelation or an error will terminate the poll request, in which case the flag will be cleared. Signed-off-by: Jens Axboe <axboe@kernel.dk>
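A hedged userspace sketch: arm a multishot poll and keep reaping CQEs while IORING_CQE_F_MORE is set (ring/socket setup and handle_event() assumed):

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = IORING_OP_POLL_ADD;
    sqe->fd = sockfd;
    sqe->poll_events = POLLIN;
    sqe->len = IORING_POLL_ADD_MULTI;       /* flag space is sqe->len */
    io_uring_submit(&ring);

    for (;;) {
            struct io_uring_cqe *cqe;
            unsigned more;

            io_uring_wait_cqe(&ring, &cqe);
            more = cqe->flags & IORING_CQE_F_MORE;
            handle_event(cqe->res);
            io_uring_cqe_seen(&ring, cqe);
            if (!more)
                    break;  /* poll terminated; re-arm if still interested */
    }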
2021-04-11  io_uring: include cflags in completion trace event  (Jens Axboe)
We should be including the completion flags for better introspection on exactly what completion event was logged. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11  io_uring: allocate memory for overflowed CQEs  (Pavel Begunkov)
Instead of using a request itself for overflowed-CQE stashing, allocate a separate entry. The disadvantage is that the allocation may fail, and it will then be accounted as lost (see rings->cq_overflow), so we lose reliability under memory pressure if the application is driving the CQ ring into overflow. However, it opens a way for multiple CQEs per SQE and even generating SQE-less CQEs. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> [axboe: use GFP_ATOMIC | __GFP_ACCOUNT] Signed-off-by: Jens Axboe <axboe@kernel.dk>
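The separate entry is small; a sketch per the description above:

    struct io_overflow_cqe {
            struct io_uring_cqe cqe;
            struct list_head list;
    };

    /* under ->completion_lock, hence GFP_ATOMIC; charged to the memcg */
    ocqe = kmalloc(sizeof(*ocqe), GFP_ATOMIC | __GFP_ACCOUNT);
    if (!ocqe)
            return false;   /* CQE lost, accounted in rings->cq_overflow */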
2021-04-11  io_uring: mask in error/nval/hangup consistently for poll  (Jens Axboe)
Instead of masking these in as part of regular POLL_ADD prep, do it in io_init_poll_iocb(), and include NVAL as that's generally unmaskable, and RDHUP alongside the HUP that is already set. Signed-off-by: Jens Axboe <axboe@kernel.dk>
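Roughly, assuming the mask macro is named IO_POLL_UNMASK:

    /* these conditions fire regardless of the requested events */
    #define IO_POLL_UNMASK  (EPOLLERR | EPOLLHUP | EPOLLNVAL | EPOLLRDHUP)

    /* in io_init_poll_iocb(): mask in events that we always want/need */
    poll->events = events | IO_POLL_UNMASK;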