author     Richard Braun <rbraun@sceen.net>  2013-03-17 21:06:31 +0100
committer  Richard Braun <rbraun@sceen.net>  2013-03-17 21:06:31 +0100
commit     784211ffc825480fcd3469b53b68404ab5b84c07 (patch)
tree       ed440158950f3a6f17f3400334f2a242de2dd641
parent     ca2f92953a4000544479b81412031f1c63b0f74e (diff)
kern/thread: document scheduling
-rw-r--r--  kern/thread.c  69
-rw-r--r--  kern/thread.h  18
2 files changed, 82 insertions, 5 deletions
diff --git a/kern/thread.c b/kern/thread.c
index 6e0df4ca..34d6db50 100644
--- a/kern/thread.c
+++ b/kern/thread.c
@@ -15,11 +15,70 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*
*
- * By convention, the name of a kernel thread is built by prefixing the
- * kernel name and adding the name of the start function, without the module
- * name ("thread"). Threads that are bound to a processor also include the
- * "/cpu_id" suffix. For example, "x15_balancer/1" is the name of the
- * inter-processor balancing thread of the second processor.
+ * The scheduling algorithm implemented by this module, named Distributed
+ * Group Ratio Round-Robin (DGR3), is based on the following papers:
+ * - "Group Ratio Round-Robin: O(1) Proportional Share Scheduling for
+ * Uniprocessor and Multiprocessor Systems" by Bogdan Caprita, Wong Chun
+ * Chan, Jason Nieh, Clifford Stein and Haoqiang Zheng.
+ * - "Efficient and Scalable Multiprocessor Fair Scheduling Using Distributed
+ *   Weighted Round-Robin" by Tong Li, Dan Baumberger and Scott Hahn.
+ *
+ * Note that the Group Ratio Round-Robin (GR3) paper offers a multiprocessor
+ * extension, but based on a single global queue, which strongly limits its
+ * scalability on systems with many processors. That extension isn't used in
+ * this implementation.
+ *
+ * The basic idea is to use GR3 for processor-local scheduling, and Distributed
+ * Weighted Round-Robin (DWRR) for inter-processor load balancing. These
+ * algorithms were chosen for several reasons. To begin with, they provide
+ * fair scheduling, a very desirable property for a modern scheduler. Next,
+ * being based on round-robin, their algorithmic complexity is very low (GR3
+ * has O(1) scheduling complexity, and O(g) complexity on thread addition
+ * or removal, g being the number of groups, with one group per priority, a
+ * low number in practice). Finally, they're very simple to implement, making
+ * them easy to adjust and maintain.
+ *
+ * Both algorithms are actually slightly modified for efficiency. First, this
+ * version of GR3 is simplified by mapping exactly one group to one priority,
+ * and in turn, one weight. This is possible because priorities are intended
+ * to match Unix nice values, and systems commonly provide a constant, small
+ * set of nice values. This removes the need for deficit accounting. Next,
+ * round tracking is used to improve the handling of dynamic events: work
+ * scaling is performed only on thread addition, and not when a thread that
+ * was removed is added again during the same round. In addition, since GR3
+ * is itself a round-robin algorithm, it already provides the feature required
+ * from local scheduling by DWRR, namely round slicing. Consequently, DWRR
+ * doesn't sit "on top" of GR3, but is actually merged with it. The result is
+ * an algorithm that shares the same data for both local scheduling and load
+ * balancing.
+ *
+ * A few words are used by both papers with slightly different meanings. Here
+ * are the definitions used in this implementation (see the sketch below):
+ * - The time unit is the system timer tick
+ * - Work is the amount of execution time units consumed
+ * - Weight is the amount of execution time units allocated
+ * - A round is the shortest period during which all threads in a run queue
+ * consume their allocated time (i.e. their work reaches their weight)
+ *
+ * TODO Sub-tick accounting.
+ *
+ * TODO CPU affinity.
+ *
+ * TODO Take into account the underlying CPU topology (and adjust load
+ * balancing to access the global highest round less frequently on large
+ * processor groups, perhaps by applying the load balancing algorithm in a
+ * bottom-up fashion with one highest round per processor group).
+ *
+ * TODO Interactivity cannot be experimented with yet. The current strategy
+ * is to always add threads in front of their group queue and track rounds
+ * so that they don't get more time than they should. A direct consequence
+ * is that continually spawning threads at short intervals is likely to cause
+ * starvation. This could be fixed by adding newly created threads at the back
+ * of their group queue. For now, don't overengineer, and wait until all this
+ * can actually be tested.
+ *
+ * TODO Review weight computation (it may be more appropriate to determine
+ * weights in a smoother way than a raw scaling).
*/
#include <kern/assert.h>
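
To make the group-per-priority simplification and the work/weight/round
definitions above concrete, here is a minimal user-space sketch. It is not
part of the commit: all identifiers are hypothetical, weights are derived
from priorities by a raw increment, and the linear selection scan is only a
crude stand-in for GR3's actual O(1) group interleaving.

#include <stdio.h>

#define SK_NR_THREADS 3

/* Hypothetical, drastically simplified stand-in for a run queue entry. */
struct sk_thread {
    const char *name;
    unsigned int prio;  /* one group per priority, hence one weight */
    unsigned int work;  /* ticks consumed during the current round */
};

/* One group maps to one priority, which maps to one weight, removing
 * the need for deficit accounting; a raw increment stands in for the
 * real weight computation (itself listed as a TODO above). */
static unsigned int
sk_weight(const struct sk_thread *thread)
{
    return thread->prio + 1;
}

/* Return a thread with allocation left, or NULL when every thread has
 * consumed its weight. A crude stand-in for GR3's group interleaving. */
static struct sk_thread *
sk_select(struct sk_thread *threads)
{
    struct sk_thread *next = NULL;
    unsigned int i, left, best = 0;

    for (i = 0; i < SK_NR_THREADS; i++) {
        left = sk_weight(&threads[i]) - threads[i].work;

        if (left > best) {
            best = left;
            next = &threads[i];
        }
    }

    return next;
}

int
main(void)
{
    struct sk_thread threads[SK_NR_THREADS] = {
        { "low", 0, 0 },
        { "mid", 1, 0 },
        { "high", 2, 0 },
    };
    unsigned int i, tick, round = 1;

    for (tick = 0; tick < 12; tick++) {
        struct sk_thread *thread = sk_select(threads);

        if (thread == NULL) {
            /* Work reached weight for all threads: the round ends. */
            round++;

            for (i = 0; i < SK_NR_THREADS; i++)
                threads[i].work = 0;

            thread = sk_select(threads);
        }

        thread->work++;
        printf("tick %2u round %u: %s (work %u/%u)\n", tick, round,
               thread->name, thread->work, sk_weight(thread));
    }

    return 0;
}

With weights 1, 2 and 3, a round lasts 6 ticks here: once every thread's
work reaches its weight, the work counters are reset and a new round begins,
which is exactly the definition of a round given in the comment.
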
diff --git a/kern/thread.h b/kern/thread.h
index bc28eebd..388b3ef5 100644
--- a/kern/thread.h
+++ b/kern/thread.h
@@ -13,6 +13,22 @@
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ *
+ * The thread module aims to provide an interface suitable for implementing
+ * POSIX scheduling policies. As such, it provides scheduling classes and
+ * policies that closely match the standard ones. The "real-time" policies
+ * (FIFO and RR) directly map to the POSIX first-in first-out (SCHED_FIFO)
+ * and round-robin (SCHED_RR) policies, while the "time-sharing" policy (TS,
+ * named that way because, unlike for real-time threads, time sharing is
+ * enforced) can be used for the normal SCHED_OTHER policy. The idle policy
+ * is reserved for idling kernel threads (see the sketch after this hunk).
+ *
+ * By convention, the name of a kernel thread is built from the kernel name
+ * followed by the name of the start function, without the module name
+ * ("thread"). Threads that are bound to a processor also carry a "/cpu_id"
+ * suffix. For example, "x15_balancer/1" is the name of the inter-processor
+ * balancing thread of the second processor (a sketch at the end of this
+ * page illustrates the convention).
*/
#ifndef _KERN_THREAD_H
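
As a companion to the scheduling class description above, the following
sketch shows one plausible layout of the policy identifiers. The names are
illustrative assumptions only, not the actual kern/thread.h declarations.

/* Hypothetical policy identifiers; the real declarations may differ. */
enum sketch_sched_policy {
    SKETCH_SCHED_FIFO,  /* real-time, maps to POSIX SCHED_FIFO */
    SKETCH_SCHED_RR,    /* real-time, maps to POSIX SCHED_RR */
    SKETCH_SCHED_TS,    /* time-sharing, usable for POSIX SCHED_OTHER */
    SKETCH_SCHED_IDLE   /* reserved for idling kernel threads */
};

Keeping the real-time policies distinct from TS mirrors the POSIX split
between fixed-priority scheduling and enforced time sharing.
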
@@ -109,6 +125,8 @@ struct thread_ts_ctx {
/*
* Thread structure.
+ *
+ * TODO Document access synchronization.
*/
struct thread {
struct tcb tcb;
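
Finally, a sketch of the thread naming convention documented in the
kern/thread.h comment above. The helper is hypothetical (the commit doesn't
show how names are actually built); it merely reproduces the "kernel name +
start function name [+ /cpu_id]" format.

#include <stdio.h>

/* Hypothetical helper demonstrating the naming convention. A negative
 * cpu value means the thread isn't bound to a processor. */
static void
sketch_thread_name(char *buffer, size_t size, const char *fn_name, int cpu)
{
    if (cpu < 0)
        snprintf(buffer, size, "x15_%s", fn_name);
    else
        snprintf(buffer, size, "x15_%s/%d", fn_name, cpu);
}

int
main(void)
{
    char name[32];

    /* Balancer thread bound to the second processor (CPU ID 1). */
    sketch_thread_name(name, sizeof(name), "balancer", 1);
    printf("%s\n", name);   /* prints "x15_balancer/1" */
    return 0;
}
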