\part{Viengoos} \chapter{Designation} \begin{quotation} ``The name of the song is called `HADDOCKS' EYES.'\,'' ``Oh, that's the name of the song, is it?'' Alice said, trying to feel interested. ``No, you don't understand,'' the Knight said, looking a little vexed. ``That's what the name is CALLED. The name really IS `THE AGED AGED MAN.'\,'' ``Then I ought to have said `That's what the SONG is called'?'' Alice corrected herself. ``No, you oughtn't: that's quite another thing! The SONG is called `WAYS AND MEANS': but that's only what it's CALLED, you know!'' ``Well, what IS the song, then?'' said Alice, who was by this time completely bewildered. ``I was coming to that,'' the Knight said. ``The song really IS `A-SITTING ON A GATE': and the tune's my own invention.'' \begin{flushright} \emph{Through the Looking Glass}\\ Lewis Carroll \end{flushright} \end{quotation} Viengoos is an object-capability system. Objects are designated exclusively by way of capabilities, which are kernel-protected, unforgeable references. Capabilities are in turn designated by indexing an address space. Each thread object has a capability slot that identifies the root of its address space. When a thread invokes an object, it specifies an index. Viengoos finds the capability corresponding to this index in its address space and then dereferences the capability to obtain the object. This chapter first describes how capabilities work, their format, and the kernel supported methods for manipulating capabilities. We then discuss addressing. Namely, how addresses are encoded, address space construction, and address resolution. \section{Capabilities} A capability both \emph{designates} an object and \emph{authorizes} access to it. (The importance of this is best illustrated by the Confused Deputy problem \cite{hardy88confused-deputy}.) Capabilities are unforgeable in that they are kernel protected---their bit representation is never exposed---and thus can only be transferred via authorized channels. 
To sense or modify an object, a thread may \emph{invoke} it. Invocation causes a message to be sent to the object. The exact semantics of an invocation depend on the invoked object's implementation.

A capability may be delegated by transferring it in an object invocation. When a capability is transferred in this way, it is copied to the recipient's message buffer. Because the receive buffer is allocated beforehand, copying does not require that the kernel allocate memory.

In Viengoos, the only way to revoke access to an object is to destroy the object.\footnote{Revocation can be implemented by way of Redell's Caretaker, but so far this mechanism has not been required.} Destroying the object invalidates all capabilities designating it; they then act as if they designated the VOID object.

Viengoos allows user-object implementations. A user object is implemented by a process. The process allocates an end point and delegates it to clients. To use the object, a client invokes the end point. The server process is then notified that there is a message and may act on it as it sees fit.

As user objects are accessed in the same way as kernel objects, it is possible to interpose on specific objects or to fully or partially emulate the kernel from a user-space process.

\subsection{Format}

A capability is 128 bits wide and consists of the following fields:

\begin{itemize}
\item an object identifier (OID),
\item a version,
\item a weak predicate (W),
\item address translation directives,
  \begin{itemize}
  \item a guard, and
  \item a sub-page descriptor
  \end{itemize}
\item an object memory policy,
  \begin{itemize}
  \item a discardability predicate (D), and
  \item a priority
  \end{itemize}
\end{itemize}

\subsubsection{Object Identification}

The OID field is used to locate an object. The OID corresponds to a block of storage on backing store. Backing store is managed by so-called backing store managers.
When an object is referenced and the object is not in memory, Viengoos submits a request to the appropriate backing store manager to page the object in. Similarly, when Viengoos decides that the object should be flushed to persistent store, it sends a request to the backing store manager.

When an object is destroyed, all references to it must be invalidated. Invalidating references is difficult as it requires finding all of the references. Maintaining a linked list of capabilities referencing an object requires two additional pointers per capability. But this only suffices for in-memory objects: if a cappage is paged out and the object is destroyed, the capabilities it contains must be invalidated as well.

To work around this problem, each object also has a version number. When a capability to an object is created, the object's version number is copied into the capability. Then, when dereferencing a capability, the capability is only considered valid if the version numbers match. If they do not match, the reference is known to be invalid and the VOID object is returned instead of the object instance.

The use of the version field raises another problem: it is limited in size. To avoid overflowing it and having to do a disk scavenge before being able to reuse the storage, it is imperative to control its growth. The solution EROS has used is to only bump the field if a capability designating the object goes to disk, a relatively rare occurrence, they observe, and to rate-limit that to once every few minutes \cite{citation-needed}.

\subsubsection{Weak Capabilities}

The data, cappage, endpoint, and activity objects implement two interfaces (facets): a so-called strong facet and a weak facet. The weak facet allows access to a subset of the functionality that the strong facet allows. A capability designating the weak facet of a data-page provides read-only access to the object.
The same applies to a cappage; however, write access is removed transitively: strong capabilities fetched via a weak capability are downgraded by the kernel to designate the object's weak facet. A capability designating the weak facet of an end-point only allows enqueuing messages. Finally, a capability designating the weak facet of an activity does not allow changing the activity's policy.

\subsubsection{Address Translation}

In Viengoos, address spaces are composed through the arrangement of cappages; cappages act as page-tables. A thread object contains a capability slot, which is filled with the root capability. Some object methods also take a capability designating the root.

Viengoos uses a guarded page table scheme \cite{liedtke94page-table-structures-for-fine-grain-vm}. To support this, capabilities contain two fields: a guard and a subpage descriptor.

The guard consists of a value and a length.

A subpage descriptor allows the use of only part of a capability page in address translation. It consists of a subpage count and an offset. The count indicates the number of subpages in the cappage. This value must be between 1 and 256 inclusive and be a power of 2. For example, a count of 2 means to divide the cappage into two subpages, each consisting of $256 / 2 = 128$ capabilities. The offset is then used to select the subpage to index.

Address translation is discussed in section~\ref{address-translation}.

\subsubsection{Object Memory Policy}

To allow principals to control how memory is managed, each capability contains two fields that describe the discardability and the priority of the designated object. Resource management is described in chapter~\ref{chapter:resource-management}.

\section{Addressing}

Capabilities are designated using thread-local addresses. Each thread object contains a capability slot that identifies the root of its address space. To designate a capability, a thread specifies the index of the capability in this address space.
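
The subpage arithmetic described under Address Translation above can be sketched in C as follows. This is only an illustration: the type \texttt{struct subpage\_desc} and the three helper functions are hypothetical names introduced here, not part of the Viengoos sources, and the descriptor is shown in decoded form rather than in its packed bit layout.

```c
#include <stdbool.h>

/* A cappage holds 256 capability slots (see the text above). */
#define CAPPAGE_SLOTS 256u

/* Hypothetical, decoded subpage descriptor; not the kernel's layout. */
struct subpage_desc {
  unsigned count;  /* number of subpages: a power of 2 in [1, 256] */
  unsigned offset; /* selected subpage: 0 <= offset < count */
};

/* True if the descriptor satisfies the constraints stated in the text. */
static bool subpage_desc_valid(struct subpage_desc d)
{
  bool power_of_two = d.count != 0 && (d.count & (d.count - 1)) == 0;
  return power_of_two && d.count <= CAPPAGE_SLOTS && d.offset < d.count;
}

/* Number of capability slots in each subpage. */
static unsigned subpage_size(struct subpage_desc d)
{
  return CAPPAGE_SLOTS / d.count;
}

/* Index of the first slot of the selected subpage within the cappage. */
static unsigned subpage_first_slot(struct subpage_desc d)
{
  return d.offset * subpage_size(d);
}
```

For a count of 2 and an offset of 1, each subpage holds 128 slots and the selected subpage starts at slot 128. A count of 3 is rejected because it is not a power of 2.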
\subsection{Address Encoding}

On Viengoos, all addresses are 64 bits wide. This is true even on 32-bit platforms. On these platforms, hardware addresses are automatically extended.

A Viengoos address consists of a {\bf prefix} and a {\bf depth}. The depth specifies the length of the prefix. This type of addressing allows addressing not only leaf objects but also internal nodes. (The intuition behind an address's depth is how far into the tree to search.)

The address prefix is encoded in the most significant bits of the address. This is followed by a bit with the value of 1, and then $63 - \mathit{depth}$ (\var{idepth}), which is encoded in unary.

\begin{center}
  \begin{bytefield}{32}
    \tiny{63}\hspace{\stretch{1}}\tiny{0}\\
    \bitsl{20}{depth}{prefix} & \bit{1} & \bitsl{11}{63 - depth}{idepth}
  \end{bytefield}
\end{center}

Observe that the value of \var{idepth} is the position of the least significant bit that is on.

The address with all zeros is the NULL address. The NULL address is sometimes used to denote some default action. When returned, it typically means failure.

By convention, addresses are written \emph{prefix/depth}.

Viengoos automatically translates machine addresses to the above form. The prefix is set to the machine address zero-extended to 63 bits and the depth is set to 63. For machines with 64-bit addresses, addresses with the most significant bit set are illegal.

The root capability slot is identified by the address 0/0. Its encoding is:

\begin{center}
  \begin{bytefield}{32}
    \tiny{63}\hspace{\stretch{1}}\tiny{0}\\
    \bit{1} & \bitsl{31}{63}{0}
  \end{bytefield}
\end{center}

The address 0x804b2c0 is encoded:

\begin{center}
  \begin{bytefield}{32}
    \tiny{63}\hspace{\stretch{1}}\tiny{0}\\
    \bitsl{31}{63}{0x804b2c0} & \bit{1}
  \end{bytefield}
\end{center}

The address of the data object that contains the above byte would be the address rounded down to the nearest page boundary, with a depth of 63 minus the base-2 logarithm of the page size.
If the underlying hardware has base pages with a size of 4kb, then the address would be 0x804b000/51. \subsection{Address Translation} \label{address-translation} \begin{algorithm} \begin{algorithmic}[1] \Function{Thread $\rightarrow$ CapabilitySlotLookup}{$\mathit{address}$} \State $C \gets \mathit{thread.root}$ \Comment{The root of the address space.} \State $P \gets \mathit{prefix}(\mathit{address})$ \Comment{The bits to translate.} \State $R \gets \mathit{depth}(\mathit{address})$ \Comment{The number of bits remaining.} \Statex \Loop \If {$R = 0$} \label{alg:before-guard-check-start} \State \Return $\&C$ \Comment{C is the designated capability.} \EndIf \label{alg:before-guard-check-end} \Statex \BigComment{Check the guard.} \If{$R < \mathit{guard\_length}(C)$} \label{alg:guard-compare-start} \State \Return failure \Comment{Not enough bits to translate guard.} \EndIf \If {$\mathit{guard}(C) \not= P_{R..R-\mathit{guard\_length}(C) + 1}$} \State \Return failure \Comment{The guard does not match.} \EndIf \State $R \gets R - \mathit{guard\_length}(C)$ \label{alg:guard-compare-end} \Statex \If {$R = 0$} \label{alg:after-guard-check-start} \State \Return $\&C$ \Comment{C is the designated capability.} \EndIf \label{alg:after-guard-check-end} \Statex \State $\triangleright$ Look up the object designated by the PTE. \State $O \gets \mathit{cap\_to\_object}(C)$ \label{alg:object-lookup} \If{$\neg O \mathit{or} \mathit{typeof} (O) \not= cappage$} \State \Return failure \Comment{Type mismatch.} \EndIf \Statex \State $\triangleright$ Index the capability page getting the next page table entry. 
      \State $S \gets 256/\mathit{subpages}(C)$ \Comment{The subpage size.} \label{alg:subpage-index-start}
      \If {$R < \log_2(S)$}
        \State \Return failure \Comment{Not enough bits to index the cappage.}
      \EndIf
      \State $C \gets O.\mathit{caps}\left[\mathit{offset}(C) \cdot S + P_{R..R-\log_2(S)+1}\right]$
      \State $R \gets R - \log_2(S)$ \label{alg:subpage-index-end}
    \EndLoop
    \EndFunction
  \end{algorithmic}
  \caption{Capability slot lookup.}
  \label{alg:capability-lookup}
\end{algorithm}

\begin{figure}
  \begin{center}
    \begin{tikzpicture}
      % Draw a capability page on the right.
      \begin{scope}[shift={(6.5,0)}]
        \draw[dashed,black!80] (-0.5,0) -- +(3,0)
          (-0.5,3) node (subpage1) {} -- +(3,0)
          (-0.5,6) node (subpage0) {} -- +(3,0);
        \draw (-0.025,-0.025) rectangle +(2.05,6.05);
        % Each slot is 2x0.5. We leave 0.025 white space around the
        % outline shape thus imply 0.05 white space between objects.
        \foreach \s in {2,3,...,5,8,9,...,11}
          \draw (0, \s/2) +(1,0.25) node (slot\s) {}
            ++(0.025,0.025) rectangle +(1.95,0.45);
        \path (2,6) node[anchor=north west] {\small{0}}
          ++(0,-3) node[anchor=south west] {\small{127}}
          ++(0,0) node[anchor=north west] {\small{128}}
          ++(0,-3) node[anchor=south west] {\small{255}};
        \node[anchor=south] at (slot11.north) {Cappage};
      \end{scope}

      % Draw the address.

      % The address.
      \path[inner sep=0] node (apre) [anchor=west] at (0,1.5) {\ldots0110}
        (apre.east) node (ag) [anchor=west] {10000}
        (ag.east) node (ai) [anchor=west] {0000011}
        (ai.east) node (apost) [anchor=west] {1011\ldots};
      % The bounding box.
      \path (apre.north west) +(-0.1,0.1) node (a_tl) {};
      \path (apost.south east) +(0.1,-0.1) node (a_br) {};
      \draw (a_tl) rectangle (a_br);
      % Vertical separators.
      \foreach \h in {0.8}
      {
        \draw (apre.east) -- +(0, \h / 2) -- +(0, -\h / 2);
        \draw (ag.east) -- +(0, \h / 2) -- +(0, -\h / 2);
        \draw (ai.east) -- +(0, \h / 2) -- +(0, -\h / 2);
      }
      % The labels.
      \path (ag.south) +(0,-0.5) node[anchor=base] {\small guard}
        (ai.south) +(0,-0.5) node[anchor=base] {\small index};

      % Draw the capability.
      \draw (1,6) node[draw] (guard) {10000/5}
        (guard.east) +(0.05,0) node [draw,anchor=west] (subpage) {0/2}
        (subpage.east) +(0.05,0) node [draw,anchor=west] (oid) {0xF4D6};
      \path (guard.north) +(0,0.1) node[anchor=base] {\small guard}
        (subpage.north) +(0,0.1) node[anchor=base] {\small subpage}
        (oid.north) +(0,0.1) node[anchor=base] {\small OID};

      % Connect the dots.
      \draw[->] (oid.east) -- node [near start, above] {\small 2.} (subpage0);
      \draw[black!80] (subpage0) -- node (subpage0mid) {} (subpage1);
      \draw[->] (subpage.south) |- node [near start, left] {\small 3.} (subpage0mid);
      \path (guard) -- node (compare) {=?} (ag);
      \draw[->] (guard) -- (compare);
      \draw[->] (ag.north) +(0,0.1) -- (compare);
      \node at (compare.north west) {\small 1.};
      \draw[->] (ai.north) +(0,0) node (ain) {}
        (slot8.west) ++(-0.8,0) node (slot8w) {}
        (ain) |- node [very near start, left] {\small 4.} (slot8w);
      \node[anchor=east] at (guard.west) {GPT:};
      \node[anchor=east] at (apre.west) {Address:};
    \end{tikzpicture}
  \end{center}
  \caption[Address translation using guarded page tables]{Translating part of an address using a GPT entry. The capability containing the GPT entry is at the top left in the figure, to the right is the referenced capability page, and bottom left is the address. First, the guard is compared to the address. If they match, the designated object is looked up. The subpage descriptor selects a part of the capability page, which is then indexed using the next portion of the address.}
  \label{fig:address-translation}
\end{figure}

\index{address translation!algorithm|(}
Address translation proceeds according to algorithm~\ref{alg:capability-lookup}. Given an address, translation starts with the capability in the thread's address space capability slot.
First, the most significant bits of the address are compared with the guard in the capability (lines \ref{alg:guard-compare-start}--\ref{alg:guard-compare-end}). If these match, those address bits are consumed. If there are no address bits left, then the designated capability slot has been located and is returned. Otherwise, the object designated by the capability is found (line \ref{alg:object-lookup}), divided according to the subpage descriptor in the capability and indexed using the most significant remaining bits of the address (lines \ref{alg:subpage-index-start}--\ref{alg:subpage-index-end}). Again, the number of bits used to index the subpage are consumed. If all the bits are consumed, the capability slot has been located and is returned. Otherwise, the process is repeated with the new capability and the remaining address bits. An iteration of this process is illustrated in figure~\ref{fig:address-translation}. Note that a capability slot can be identified by two different names: either with or without the guard specified in the slot. This is a matter of convenience: it is useful to be able to modify the capability that designates the object at a particular address by designating the object. If this functionality were not provided, doing this would require finding the guard, which is possible but cumbersome. Moreover, the extension is quite simple. When looking up objects, the same principle applies, however, the check if the address has been fully translated at lines~\ref{alg:before-guard-check-start}--\ref{alg:before-guard-check-end} is removed. That is, it is not sufficient to specify the capability slot that designates the object, the guard must also match. \index{address translation!algorithm|)} \section{Data Structures} \subsection{\type{addr}} The format of an address is: \begin{center} \begin{bytefield}{32} \tiny{63}\hspace{\stretch{1}}\tiny{0}\\ \bit{1} & \bitsl{31}{63}{0} \end{bytefield} \end{center} \var{idepth} is stored in unary. 
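
The address format can be illustrated with a small C sketch. The function names (\texttt{addr\_encode}, \texttt{addr\_idepth}, \texttt{addr\_depth}, \texttt{addr\_prefix}) are hypothetical and chosen only for this example. The sketch assumes that a prefix is passed the way it is written in the \emph{prefix/depth} notation, that is, as a 63-bit value of which only the most significant \var{depth} bits are meaningful.

```c
#include <stdint.h>

#define ADDR_BITS 63u /* the maximum depth */

/* Encode prefix/depth.  Assumes prefix < 2^63 and depth <= 63.  Only the
   top depth bits of the 63-bit prefix are kept, as in 0x804b000/51. */
static uint64_t addr_encode(uint64_t prefix, unsigned depth)
{
  unsigned idepth = ADDR_BITS - depth;          /* 0..63 */
  uint64_t kept = (prefix >> idepth) << idepth; /* drop insignificant bits */
  return (kept << 1) | (UINT64_C(1) << idepth); /* prefix, then the 1 bit */
}

/* idepth is the position of the least significant set bit.  The address
   must not be the all-zeros NULL address, which has no such bit. */
static unsigned addr_idepth(uint64_t addr)
{
  unsigned n = 0;
  while (((addr >> n) & 1) == 0)
    n++;
  return n;
}

/* The depth is 63 minus idepth. */
static unsigned addr_depth(uint64_t addr)
{
  return ADDR_BITS - addr_idepth(addr);
}

/* Recover the prefix in the same 63-bit form that addr_encode accepts.
   The shift is split in two so that it never reaches 64 bits. */
static uint64_t addr_prefix(uint64_t addr)
{
  unsigned idepth = addr_idepth(addr);
  return ((addr >> idepth) >> 1) << idepth;
}
```

Under these assumptions, the root slot 0/0 encodes to a word with only bit 63 set, the machine address 0x804b2c0 (depth 63) encodes to 0x10096581, and 0x804b000/51 encodes to 0x10097000.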
The depth is 63 minus \var{idepth}.

\subsection{\type{addr_trans}}

The \type{addr_trans} structure has the following layout:

\begin{struct}{32}
  \bitsl{14}{22-lsp}{guard} & \bitsl{8}{(lsp)}{subpage} & \bits{4}{$\log_2$ sps} & \bits{6}{g\_depth}
\end{struct}

\var{$\log_2$ sps} is the base-2 logarithm of the number of subpages. \var{subpage} is the subpage to select. It has a width of \var{lsp}. \var{g\_depth} is the length of the guard. \var{guard} is the value of the guard and is zero-extended to \var{g\_depth}. Its width is also not fixed.

\subsection{\type{object_policy}}

The \type{object_policy} structure has the following layout:

\begin{struct}{8}
  \bit{D} & \bits{7}{priority}
\end{struct}

\var{D} is the discardability predicate.

\subsection{\type{cap_properties}}

The \type{cap_properties} structure has the following layout:

\begin{struct}{32}
  \bits{24}{\dontcare} & \bits{8}{object\_policy} \\
  \bits{32}{addr\_trans}
\end{struct}

\subsection{\type{cap}}

The following is the internal representation of a capability. Only the discardability predicate, the priority and the address translator are exposed to the user.

\begin{struct}{32}
  \bits{23}{version} & \bit{W} & \bit{D} & \bits{7}{priority} \\
  \bits{32}{address translator} \\
  \wordbox{2}{OID}
\end{struct}

\var{D} is the discardability predicate. \var{W} is the weak predicate.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Threads and Activations}

A thread encapsulates an execution context. This consists of a register file, a name space, and a resource principal.

Viengoos does not implement a first-class task abstraction, which encapsulates multiple threads running in a single address space. It is possible to achieve this on Viengoos by specifying the same address space root for multiple thread objects.

In Viengoos, unlike in traditional kernels, threads are not blocking entities.
Blocking is instead done by messengers, which hold and transfer messages (see chapter~\ref{messengers} for a description of messengers). By separating messengers from threads, it is possible to reliably wait for multiple events while the thread continues to execute and to do so in a manner that does not require the kernel or a server to block on the thread.

\section{Thread State}

A thread consists of four user-accessible capability slots: the address space root, the current activity, the exception messenger, and the user-thread control block (UTCB). A thread object also contains space to save its CPU state, which is stored there when the thread is not executing on a CPU.

\subsection{Address Space Root}

The address space root capability slot determines the root of a thread's address space. This address space is used in two situations. First, when a thread performs an IPC, the arguments are identified by addresses, which are resolved in this address space. Second, this naming context determines the hardware address space. That is, it is the context in which the memory addresses of all hardware load and store instructions are resolved.

\subsection{Activity}

A thread's activity slot determines the thread's current activity. This is used to schedule the thread and to account resources that are allocated or consumed when resolving a page fault. If the activity slot does not contain a capability designating an activity, the thread is not scheduled.

\subsection{Exception Messenger}

When a thread generates an exception, for instance, when it attempts to access a memory location for which there is no valid translation, the thread is suspended, and an exception message is generated and delivered to the thread's exception messenger. The kernel delivers exceptions in non-blocking mode, meaning that if the exception messenger is not ready to receive a message, the message is dropped.
\subsection{User-Thread Control Block}

\begin{lstlisting}[float,caption=The user-thread control block structure.,frame=single,captionpos=b]
struct utcb
{
  union
  {
    struct
    {
      uintptr_t activated_mode : 1;
      uintptr_t pending_message : 1;
      uintptr_t interrupt_in_transition : 1;
    };
    uint64_t status;
  };

  uintptr_t saved_ip;
  uintptr_t saved_sp;

  uintptr_t activation_handler_sp;
  uintptr_t activation_handler_ip;
  uintptr_t activation_handler_end;

  uint64_t protected_payload;
  uint64_t messenger_id;

  /* Inline data.  */
  struct
  {
    uintptr_t inline_word_count : 2;
    uintptr_t inline_cap_count : 1;
  };
  uintptr_t inline_words[2];
  addr_t inline_caps[1];
};
\end{lstlisting}

The user-thread control block (UTCB) is a normal data page. It provides a conduit for the user and kernel to communicate and coordinate action. This is primarily used for managing activations and in message delivery.

\begin{description}
\item[activated mode] The \var{activated mode} bit indicates whether the thread is in activated mode. An activation is only delivered if this bit is clear.

  Before a thread is activated, this bit is checked. If it is set, the activation is either delayed or dropped. In the former case, the messenger causing the activation blocks on the thread. A blocked messenger may be unblocked either by a future activation or if \func{thread\_activation\_collect} is called. If the messenger is destroyed or the delivery aborted, the activation will not be delivered.

\item[pending message] The kernel sets the \var{pending message} bit when a messenger attempts to deliver a message to the thread but the thread is in activated mode.

\item[interrupt in transition] The \var{interrupt in transition} bit is set by the kernel when activating a thread to indicate whether the thread was activated while the instruction pointer was in the so-called \emph{activation transition range} (see \var{activation handler ip} and \var{activation handler end} below).
\item[saved ip \textmd{and} saved sp] When delivering an activation, the kernel interrupts the thread, sets the \var{saved ip} and \var{saved sp} variables to the instruction pointer and the stack pointer, sets the thread's SP and IP to \var{activation handler sp} and \var{activation handler ip}, respectively, and then resumes the thread.

\item[activation handler sp \textmd{and} activation handler ip] When the kernel activates a thread, it sets its SP and IP to these values, respectively.

\item[activation handler ip \textmd{and} activation handler end] These variables determine the so-called \emph{activation transition range}. If the kernel activates a thread and it is in the activation transition range (between \var{activation handler ip} inclusive and \var{activation handler end} exclusive), the kernel does not save the current IP and SP in \var{saved ip} and \var{saved sp} but sets the \var{interrupt in transition} bit.

  Using this mechanism, on many architectures, it is possible to atomically, with respect to activation delivery, clear \var{activated mode} and restore the interrupted stack pointer and instruction pointer without entering the kernel.

\item[protected payload] When delivering a messenger's message, the \var{protected payload} variable is set to the protected payload of the capability that was used to invoke the messenger to send the message.

\item[messenger id] When delivering a messenger's message, the \var{messenger id} is set to the messenger's \var{messenger id}.

\item[inline word count, inline cap count, inline words \textmd{and} inline caps] When a messenger delivers a message inline, \var{inline word count} is set to the number of words (not bytes) that were transferred and \var{inline cap count} to the number of capabilities transferred. The \var{inline words} variable is filled with the transferred data.
  If the message includes capabilities, they are saved sequentially in the slots specified at invocation time and the address of each slot is copied to \var{inline caps}. If an error occurs while transferring a capability, the corresponding element in \var{inline caps} is set to \const{ADDR\_VOID}.
\end{description}

\section{Activations}

A thread may be activated if a messenger associated with the thread transfers or receives a message. Threads are only activated by messengers and scheduling events. In the case of scheduling events, a kernel-provided messenger whose \var{messenger id} variable is set to \const{\textasciitilde{}0ULL} is used and delivery is done in a non-blocking manner.

Activating a thread proceeds as follows:

\begin{itemize}
\item Atomically, with respect to the thread's execution:
  \begin{itemize}
  \item If the thread is activated (\var{activated mode} is non-zero):
    \begin{itemize}
    \item If delivery is non-blocking, return.
    \item Otherwise:
      \begin{itemize}
      \item Block the messenger on the thread, and
      \item Set \var{pending message} to $1$.
      \end{itemize}
    \end{itemize}
  \end{itemize}
\item Suspend the thread's execution.
\item Set \var{protected payload} to that saved in the messenger.
\item Set \var{messenger id} to the messenger's messenger id.
\item If the activation is the result of a message receipt:
  \begin{itemize}
  \item If the message is inline:
    \begin{itemize}
    \item Copy the messenger's data to \var{inline words} and \var{inline caps}.
    \item Update \var{inline word count} and \var{inline cap count}.
    \end{itemize}
  \end{itemize}
\item Set \var{activated mode} to $1$.
\item If the thread's IP is between \var{activation handler ip} (inclusive) and \var{activation handler end} (exclusive):
  \begin{itemize}
  \item Set \var{interrupt in transition} to $1$.
  \end{itemize}
\item Otherwise:
  \begin{itemize}
  \item Set \var{interrupt in transition} to $0$,
  \item Set \var{saved sp} to the thread's stack pointer, and
  \item Set \var{saved ip} to the thread's instruction pointer.
  \end{itemize}
\item Set the thread's stack pointer to \var{activation handler sp}.
\item Set the thread's instruction pointer to \var{activation handler ip}.
\item Resume the thread.
\end{itemize}

\section{Exceptions}

Exceptions are synthesized by the kernel in response to a thread action. There is one type of exception, a fault exception. The following actions result in the generation of a fault exception:

\begin{description}
\item[page fault] A hardware load or store attempts to dereference an address for which there is no valid translation.
\item[access fault] A hardware load or store attempts to access an object in an unauthorized manner.
\item[discarded] A hardware load or store attempts to access an object that has been discarded.
\end{description}

When a thread generates an exception, the thread is suspended and a fault message is delivered to its exception messenger. If this would block, the message is discarded.

A fault message takes four parameters: the address of the fault (using Viengoos address encoding), the value of the stack pointer, the value of the instruction pointer and a fault information structure, which includes the type of access and whether the object has been discarded:

\begin{center}
  \begin{bytefield}{32}
    \tiny{63}\hspace{\stretch{1}}\tiny{0}\\
    \bitsx{28}{58}{$\sim$} & \bits{3}{rwx} & \bit{d}
  \end{bytefield}
\end{center}

\noindent
The fault message does not include a reply messenger.
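
The activation procedure listed in the Activations section can be modelled in C roughly as follows. This is a user-level sketch and not kernel code: \texttt{struct utcb\_model}, \texttt{struct thread\_model}, \texttt{struct messenger\_model} and \texttt{activate} are hypothetical names, only the UTCB fields needed here are reproduced, and suspending and resuming the thread, atomicity, and the copying of inline data are left out. The sketch also assumes that the procedure stops once the messenger has been blocked on the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the UTCB, a thread and a messenger.  These are
   illustrative models only, not the kernel's data structures. */
struct utcb_model {
  bool activated_mode, pending_message, interrupt_in_transition;
  uintptr_t saved_ip, saved_sp;
  uintptr_t activation_handler_sp;
  uintptr_t activation_handler_ip, activation_handler_end;
  uint64_t protected_payload, messenger_id;
};

struct thread_model {
  uintptr_t ip, sp;
  struct utcb_model utcb;
};

struct messenger_model {
  uint64_t protected_payload, id;
  bool blocked_on_thread;
};

enum activate_result { ACTIVATED, DROPPED, BLOCKED };

static enum activate_result
activate(struct thread_model *t, struct messenger_model *m, bool non_blocking)
{
  struct utcb_model *u = &t->utcb;

  if (u->activated_mode) {
    /* The thread is already handling an activation. */
    if (non_blocking)
      return DROPPED;
    m->blocked_on_thread = true;
    u->pending_message = true;
    return BLOCKED; /* assumption: delivery is retried later */
  }

  u->protected_payload = m->protected_payload;
  u->messenger_id = m->id;
  u->activated_mode = true;

  if (t->ip >= u->activation_handler_ip
      && t->ip < u->activation_handler_end) {
    /* Interrupted inside the transition range: keep the old saved state. */
    u->interrupt_in_transition = true;
  } else {
    u->interrupt_in_transition = false;
    u->saved_sp = t->sp;
    u->saved_ip = t->ip;
  }

  /* Redirect the thread to its activation handler. */
  t->sp = u->activation_handler_sp;
  t->ip = u->activation_handler_ip;
  return ACTIVATED;
}
```

A second activation attempted while \var{activated mode} is still set is either dropped (non-blocking delivery) or leaves the messenger blocked with \var{pending message} set, which mirrors the first step of the list.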
\section{Methods}

\begin{lstlisting}
thread_exregs(cap_t activity, cap_t thread, uintptr_t flags,
              in out cap_t aspace, in out cap_t activity,
              in out cap_t utcb, in out cap_t exception_messenger,
              in out uintptr_t sp, in out uintptr_t ip)
\end{lstlisting}

\var{flags} is a bitwise OR of the following flags:

\begin{lstlisting}
THREAD_EXREGS_SET_UTCB = 64
THREAD_EXREGS_SET_EXCEPTION_MESSENGER = 32
THREAD_EXREGS_SET_ASPACE = 16
THREAD_EXREGS_SET_ACTIVITY = 8
THREAD_EXREGS_SET_SP = 4
THREAD_EXREGS_SET_IP = 2
THREAD_EXREGS_GET_REGS = 1
\end{lstlisting}

If \const{THREAD\_EXREGS\_GET\_REGS} is set, the current values of the address space root, activity, UTCB, exception messenger, SP and IP are returned. Otherwise, the values are undefined.

If \const{THREAD\_EXREGS\_SET\_IP} is set, the thread's instruction pointer is set according to \var{ip}.

If \const{THREAD\_EXREGS\_SET\_SP} is set, the thread's stack pointer is set according to \var{sp}.

If \const{THREAD\_EXREGS\_SET\_ACTIVITY} is set, the thread's activity is set according to \var{activity}.

If \const{THREAD\_EXREGS\_SET\_ASPACE} is set, the thread's address space root is set according to \var{aspace}.

If \const{THREAD\_EXREGS\_SET\_EXCEPTION\_MESSENGER} is set, the thread's exception messenger is set according to \var{exception\_messenger}.

If \const{THREAD\_EXREGS\_SET\_UTCB} is set, the thread's UTCB is set according to \var{utcb}.

\begin{lstlisting}
thread_id (cap_t activity, cap_t thread, out uint64_t id)
\end{lstlisting}

Return the thread's unique identifier.

\begin{lstlisting}
thread_activation_collect (cap_t activity, cap_t thread)
\end{lstlisting}

Cause a blocked messenger, if any, to attempt to send an activation.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Messengers and IPC}

IPC in Viengoos is asynchronous with respect to thread execution. This is achieved by separating the messaging functionality from threads.
To send an IPC, a program allocates a so-called messenger, loads a payload and then enqueues the messenger on a receiving messenger. When the receiving messenger accepts the message from the sending messenger, the message is copied and the threads associated with the two messengers are optionally notified of what has occurred by way of an activation.

As messengers must be explicitly allocated like any other kernel object, the required storage can be correctly accounted.

\section{Messages}

Messages can carry both data and capabilities. A message is stored in a normal data page. To send a message, the page containing the message is associated with a messenger, which is then enqueued on the target object. When the target accepts the message, the source messenger's message is copied to the target messenger's message buffer.

\subsection{Format}

The kernel interprets a message buffer according to the following format. The first 32 bits of the buffer contain the message header. This consists of a 16-bit capability address count followed by a 16-bit data count. Immediately following the header is the array of capability addresses, followed by the array of data bytes.

\begin{lstlisting}[float,caption=Message format.,frame=single,captionpos=b]
struct message
{
  uint16_t cap_count;
  uint16_t data_count;
  addr_t caps[cap_count];
  char data[data_count];
};
\end{lstlisting}

When used to send a message, the capability addresses are interpreted as the location of the capabilities to send. When used to receive a message, the capability addresses are interpreted as the slots in which the received capabilities should be stored.

\subsection{Canonical Form}

Kernel objects interpret and format messages according to the following convention.

For a received message, the first word of the message is interpreted as the method to invoke on the object. The remaining bytes are the method's arguments. The last capability is interpreted as the messenger to reply to.
When sending a reply, the first data word is the negation of the method identifier. The second word contains the error code. If no error occurred, the error code is set to 0.

When marshalling and unmarshalling arguments, the size of each argument is rounded up to be a multiple of the word size and zero filled; the sign bit is not extended.

\section{Messengers}

Messengers are first-class kernel objects, which are responsible for receiving and transferring messages. A messenger references a message buffer and a thread. It can either transfer its contents to another messenger or it can wait for another messenger to send it a message. After sending or receiving a message, a messenger optionally notifies its associated thread by way of an activation. This is depicted in figure~\ref{fig:sending-a-message}.

\begin{figure}
  \centering
  \begin{tikzpicture}[text width=2cm, text centered,font=\small]
    \node (thread1) at (0, 0) {Sending Thread};
    \node (messenger1) at (3.5, 0) {Sending Messenger};
    \node (messenger2) at (7, 0) {Receiving Messenger};
    \node (thread2) at (10.5, 0) {Receiving Thread};
    \node (buffer1) at (1.75, -2) {Buffer};
    \node (buffer2) at (8.75, -2) {Buffer};
    \draw[<->] (thread1) -- (messenger1);
    \draw[->] (messenger1) -- (messenger2);
    \draw[<->] (messenger2) -- (thread2);
    \draw[->] (thread1) -- (buffer1);
    \draw[->] (messenger1) -- (buffer1);
    \draw[->] (thread2) -- (buffer2);
    \draw[->] (messenger2) -- (buffer2);
    \draw[->,dashed] (buffer1) -- node[anchor=north] {\footnotesize{Copy}} (buffer2);
  \end{tikzpicture}
  \caption{To send or receive a message, a thread associates itself and a buffer with a messenger. To send a message, it enqueues the prepared messenger on some other messenger. When the latter messenger accepts the former's message, the contents of the sending messenger's message buffer are copied to the receiving messenger's message buffer.
The associated threads are then optionally activated.}
\label{fig:sending-a-message}
\end{figure}

To send a message, a messenger is enqueued on another messenger. The messenger's payload is only transferred to the target messenger once the target messenger is unblocked. To prevent unprocessed messages from being overwritten, messengers are blocked on message delivery. A further message is only delivered once the messenger is explicitly unblocked.

Message payload may be stored either in a message buffer or inline. This is specified in the IPC interface.

\subsection{State}

A messenger has four capability slots: a thread slot, an address space root slot, a message buffer slot and an activity slot. A messenger also contains a so-called \emph{messenger id} field and a blocking status.

\subsubsection{Thread}

The thread slot specifies the thread to optionally activate when the messenger transfers its message or receives a message. This is controlled via the IPC system call.

\subsubsection{Address Space Root}

The address space root specifies the address space in which to interpret the capability addresses in the message buffer.

\subsubsection{Message Buffer}

The message buffer slot identifies the message buffer.

\subsubsection{Activity}

The activity slot identifies the activity used to send the message.

\subsubsection{Messenger ID}

The messenger ID is a 64-bit user-settable variable that is delivered to the thread (in its UTCB) on activation. This can be used to identify a user buffer associated with a messenger. This variable can only be read or modified by way of a strong capability.

\subsubsection{Blocking Status}

A messenger is either blocked, in which case any attempt to deliver a message to it will block, or it is unblocked, in which case an attempt to deliver a message to it will succeed immediately.

\subsection{Message Transfer}
\label{message-transfer}

Messages are transferred between two messengers, a source messenger and a target messenger.
A message transfer only occurs when the target messenger is not blocked. If a message transfer is attempted and the target messenger is blocked, then the transfer is either aborted (if the transfer is executed in non-blocking mode), or the source messenger is enqueued on the target messenger (otherwise).

Message transfer proceeds as follows:

\begin{itemize}
\item The target messenger is blocked.
\item The capabilities in the source message are matched with the capability slots in the target message. For each pair of capability address and capability slot address, the capability and slot are looked up relative to their respective messenger's address space root. If there is a valid, writable capability slot, the source capability is copied to it. If there is no source capability, a void capability is used. The target capability slot's address translator and policy are preserved. If the capability slot address does not resolve to a capability slot, or the capability slot is not writable, the capability address in the target message is overwritten with \const{ADDR\_VOID}. If there are more capability slot addresses than there are capability addresses, each of the remaining capability slot addresses is overwritten with \const{ADDR\_VOID}.
\item The data is byte-copied from the source to the destination.
\item If the target messenger is set to activate its associated thread on receive, this is scheduled. Likewise, if the source messenger is set to activate its associated thread on delivery, this is scheduled.
\end{itemize}

\section{IPC}

IPC consists of three phases: the receive phase, the send phase and the return phase. All three phases are optional. Each phase is executed after the previous phase has completed. If a phase does not complete successfully, the phase is aborted and the remaining phases are not executed.
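The phase sequencing just described can be sketched as a loop over optional steps that stops at the first failure. This is only an illustration of the control flow, not the kernel's implementation: the names \texttt{struct ipc\_phase}, \texttt{run\_phases}, \texttt{struct trace} and \texttt{demo\_phase}, and the convention that a phase returns zero on success, are all assumptions introduced for this example.

\begin{lstlisting}
#include <stddef.h>

/* Hypothetical phase callback: returns 0 on success and a
   non-zero error code on failure.  */
typedef int (*phase_fn) (void *ctx);

/* One optional phase.  ENABLED says whether the IPC includes it.  */
struct ipc_phase
{
  int enabled;
  phase_fn run;
};

/* Run the enabled phases in order.  If a phase fails, its error is
   returned and the remaining phases are not executed.  */
static int
run_phases (const struct ipc_phase *phases, size_t n, void *ctx)
{
  for (size_t i = 0; i < n; i++)
    {
      if (!phases[i].enabled)
        continue;

      int err = phases[i].run (ctx);
      if (err != 0)
        return err;
    }
  return 0;
}

/* Demonstration state: counts how many phases ran and makes the
   FAIL_AT-th executed phase fail (0 means never fail).  */
struct trace
{
  int calls;
  int fail_at;
};

/* Demonstration phase used to observe the sequencing.  */
static int
demo_phase (void *ctx)
{
  struct trace *t = ctx;
  t->calls++;
  return t->calls == t->fail_at ? -1 : 0;
}
\end{lstlisting}

For example, with three enabled phases and \texttt{fail\_at} set to 2, \texttt{run\_phases} returns the error from the second phase and the third phase never runs.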
The IPC interface has the following signature:

\begin{lstlisting}
error_t ipc (uintptr_t flags,
             cap_t recv_activity, cap_t recv_messenger,
             cap_t recv_buf, cap_t recv_inline_cap,
             cap_t send_activity, cap_t target_messenger,
             cap_t send_messenger, cap_t send_buf,
             uintptr_t send_inline_word1,
             uintptr_t send_inline_word2,
             cap_t send_inline_cap)
\end{lstlisting}

The flags parameter selects which phases are executed and controls their execution. It has the following format:

\begin{struct}{32}
\bits{15}{$\sim$} & \bit{R} & \bit{N} & \bit{A} & \bit{T} & \bit{S} & \bit{I} & \bit{C} & \bit{S} & \bit{n} & \bit{a} & \bit{t} & \bit{s} & \bit{i} & \bits{2}{W} & \bit{c} & \bit{r}
\end{struct}

The receive flags are:

\begin{description}
\item[R - receive phase] The IPC includes a receive phase.
\item[N - non-blocking] The receive phase is non-blocking.
\item[A - activate] On message receipt, the receiving messenger activates its associated thread.
\item[T - set thread] Associate the receiving messenger with the calling thread.
\item[S - set address space root] Set the receiving messenger's address space root to the caller's address space root.
\item[I - receive inline] The receiving messenger should receive the message inline.
\item[C - inline capability] Ignored if \textbf{I} is not set. The inline message includes a capability slot at \var{recv inline cap}.
\end{description}

The send flags are:

\begin{description}
\item[S - send phase] The IPC includes a send phase.
\item[n - non-blocking] The send phase is non-blocking.
\item[a - activate] On message delivery, the sending messenger activates its associated thread.
\item[t - set thread] Associate the sending messenger with the calling thread.
\item[s - set address space root] Set the sending messenger's address space root to the caller's address space root.
\item[i - send inline] The sending messenger should send the message inline.
\item[W - inline words] Ignored if \textbf{i} is not set.
The number of inline words to transfer. Valid values are 0, 1 and 2.
\item[c - inline capability] Ignored if \textbf{i} is not set. The number of inline capabilities to transfer. Valid values are 0 and 1.
\end{description}

The return flags are:

\begin{description}
\item[r - return phase] The IPC includes a return phase.
\end{description}

The remaining parameters are described below. The capability addresses are resolved in the context of the caller's address space.

\subsection{Receive Phase}

The receive phase proceeds as follows:

\begin{itemize}
\item \var{recv messenger} is looked up. If it does not designate a messenger or the designation is not strong, the IPC is aborted and \const{EINVAL} is returned.
\item If the message is not inline and \var{recv buf} is not \const{ADDR\_VOID}, the messenger's message buffer capability slot is set to the capability designated by \var{recv buf}.
\item If the set associated thread flag is set, the messenger's thread capability slot is set to a capability designating the caller's thread object.
\item If the set address space root flag is set, the messenger's address space root capability slot is set to the calling thread's address space root. The thread's address space root's address translator and policy are copied.
\item If one or more messengers are blocked on \var{recv messenger} trying to deliver a message, the messenger that has been blocked longest is selected and its payload is transferred to \var{recv messenger}. See section~\ref{message-transfer} for details.
\item If there are no messengers blocked on \var{recv messenger} trying to deliver a message and the non-blocking flag is set, \const{EWOULDBLOCK} is returned. Otherwise, \var{recv messenger} is unblocked.
\end{itemize}

\subsection{Send Phase}

The send phase proceeds as follows:

\begin{itemize}
\item \var{send messenger} is looked up. If it does not designate a messenger or the designation is not strong, the IPC is aborted and \const{EINVAL} is returned.
\item If the message is not inline and \var{send buf} is not \const{ADDR\_VOID}, the messenger's message buffer capability slot is set to the capability designated by \var{send buf}.
\item If the set associated thread flag is set, the messenger's thread capability slot is set to a capability designating the caller's thread object.
\item If the set address space root flag is set, the messenger's address space root capability slot is set to the calling thread's address space root. The thread's address space root's address translator and policy are copied.
\item \var{target messenger} is looked up. If it does not designate a messenger, the IPC is aborted and \const{EINVAL} is returned.
\item An attempt to deliver \var{send messenger}'s message to \var{target messenger} is made. If \var{target messenger} is blocked and delivery is non-blocking, delivery is aborted and \const{ETIMEDOUT} is returned. Otherwise, if \var{target messenger} is blocked, \var{send messenger} is blocked on it. Otherwise the message is delivered. See section~\ref{message-transfer} for message delivery details.
\end{itemize}

\subsection{Return Phase}

The return phase proceeds as follows:

\begin{itemize}
\item Control is returned to the calling thread at the point just after the IPC call. Note that control is not returned by way of an activation.
\end{itemize}

If the IPC does not include a return phase, then the thread blocks until it is next activated by an event other than a CPU available event.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Resource Management}
\label{chapter:resource-management}

\section{Object Policy}

When an object is accessed, if the object is claimed,\footnote{Claiming is discussed in \ref{object-claiming}.} the policy in the designating object is applied to the object.

The discardability property is a hint that Viengoos may, instead of flushing changes to disk, simply discard a frame's content. If a capability has the weak predicate set, this hint is ignored.
If content is discarded, the next access to the object will raise a discarded event. If an activity is discarded, all objects allocated against the activity are destroyed.

The priority property allows an activity to control the order in which the frames that it has claimed are released. If the content is dirty and has not been marked as discardable, the content is written to backing store. Otherwise, the frame is made eligible for immediate reuse. The lower the numeric value of the priority field, the lower the frame's priority. Frames are released in priority order. If multiple frames have the same priority, they are released in a random order unless the priority is 0, in which case the frames are released in approximately LRU order.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Primordial Objects}

\begin{quotation}
\noindent
1. The world is everything that is the case.\\
1.1 The world is the totality of facts, not of things.\\
1.11 The world is determined by the facts, and by these being \emph{all} the facts.\\
1.12 For the totality of facts determines both what is the case, and also all that is not the case.
\begin{flushright}
\emph{Tractatus Logico-Philosophicus} by Ludwig Wittgenstein
\end{flushright}
\end{quotation}

This chapter describes the primordial objects implemented by the microkernel. They include folios (the unit of storage allocation), data and capability pages, threads, message buffers, endpoints, and activities. These objects represent the fundamental building blocks of the system; all other objects are built from compositions of these objects.

\clearpage
\section{Objects}

All objects are derived from the generic base object \type{object}. Each object has a number (possibly zero) of user-accessible capability slots.
\begin{lstlisting}
cap_copy (addr_t principal, addr_t object, addr_t target,
          addr_t source_address_space, addr_t source,
          uint32_t flags, struct cap_properties properties)
\end{lstlisting}

Copies the capability in the capability slot \var{source} in the address space rooted at \var{source\_address\_space} to \var{object}'s slot at address \var{target}.

By default, \var{source}'s subpage specification and \var{target}'s guard are preserved. If \const{CAP\_COPY\_COPY\_SUBPAGE} is set, the subpage specification in \var{properties} is used instead.

If \const{CAP\_COPY\_COPY\_ADDR\_TRANS\_GUARD} is set, the guard description in \var{properties} is used. If \const{CAP\_COPY\_COPY\_SOURCE\_GUARD} is set, the guard description in \var{source} is used. Otherwise, the guard in \var{target} is preserved.

If \const{CAP\_COPY\_WEAKEN} is set, a weakened version of \var{source} is saved in \var{target} (e.g., if \var{source}'s type is \type{cap\_page}, \var{target}'s type is set to \type{cap\_rpage}).

If \const{CAP\_COPY\_DISCARDABLE\_SET} is set, the discardable bit is set based on the value in \var{properties}. Otherwise, \var{source}'s value is copied.

If \const{CAP\_COPY\_PRIORITY\_SET} is set, the priority is set based on the value in \var{properties}. Otherwise, \var{source}'s value is copied.

\begin{lstlisting}
cap_read (addr_t principal, addr_t address_space, addr_t cap,
          l4_word_t type, struct cap_properties properties)
\end{lstlisting}

Returns the public bits of the capability \var{cap} in \var{type} and \var{properties}.

\clearpage
\section{Folios}

A folio is the unit of backing store allocation. A folio consists of 129 4k pages. 128 may be used to allocate objects and the remainder is a header that describes the folio itself and the individual objects. The header holds a

\subsection{Data Structures}

\subsubsection{folio\_priority}

\begin{struct}{32}
\bit{\dontcare} & \bits{15}{priority} & \bits{15}{group} & \bit{D}
\end{struct}

\var{D} is the discardability predicate.
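A minimal sketch of packing and unpacking this word follows. It assumes that the fields in the diagram are listed from the most significant bit to the least significant bit, so that \var{D} occupies bit 0, \var{group} bits 1 to 15 and \var{priority} bits 16 to 30; the text does not state the bit order, and the helper names are hypothetical.

\begin{lstlisting}
#include <stdint.h>

/* Assumed layout (most significant bit first, as drawn):
   bit 31      unused
   bits 30-16  priority (15 bits)
   bits 15-1   group (15 bits)
   bit 0       D, the discardability predicate  */
#define FOLIO_FIELD_MASK 0x7fffu
#define FOLIO_PRIORITY_SHIFT 16
#define FOLIO_GROUP_SHIFT 1

/* Hypothetical helper: build a folio_priority word.  Values wider
   than 15 bits are truncated.  */
static inline uint32_t
folio_priority_pack (uint32_t priority, uint32_t group,
                     int discardable)
{
  return ((priority & FOLIO_FIELD_MASK) << FOLIO_PRIORITY_SHIFT)
    | ((group & FOLIO_FIELD_MASK) << FOLIO_GROUP_SHIFT)
    | (discardable ? 1u : 0u);
}

/* Hypothetical accessors for the individual fields.  */
static inline uint32_t
folio_priority_get_priority (uint32_t word)
{
  return (word >> FOLIO_PRIORITY_SHIFT) & FOLIO_FIELD_MASK;
}

static inline uint32_t
folio_priority_get_group (uint32_t word)
{
  return (word >> FOLIO_GROUP_SHIFT) & FOLIO_FIELD_MASK;
}

static inline int
folio_priority_get_discardable (uint32_t word)
{
  return (int) (word & 1u);
}
\end{lstlisting}

Explicit shifts and masks are used here rather than C bit-fields because the order in which a compiler allocates bit-fields within a word is implementation-defined.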
\begin{struct}{32}
\bits{5}{\dontcare} & \bit{C} & \bits{6}{type} & \bits{20}{version} \\
\wordbox{2}{wait\_queue\_next}
\wordbox{2}{wait\_queue\_prev}
\end{struct}

\subsection{Methods}

\subsection{Convenience Functions}

\clearpage
\section{Pages}

Data pages and capability pages.

\subsection{Methods}

\subsection{Convenience Functions}

\clearpage
\section{Threads}

\subsection{Methods}

\subsection{Convenience Functions}

\clearpage
\section{Messengers}

\subsection{Methods}

\subsection{Convenience Functions}

\clearpage
\section{Endpoints}

\subsection{Methods}

\subsection{Convenience Functions}

\clearpage
\section{Activities}

An activity is a resource principal.

\subsection{Methods}

\subsection{Convenience Functions}

\chapter{Exceptions}

Exception handling mechanism.

\chapter{Resource Management}