This currently contains a single RPC to get Linux-compatible hwcaps,
as well as the values of MIDR_EL1 and REVIDR_EL1 system registers.
In the future, this is expected to host the APIs to manage PAC keys,
and possibly some sort of AArch64-specific APIs for userland IRQ
handlers.
Message-ID: <20240415090149.38358-6-bugaevc@gmail.com>
|
|
And make it so that the generic vm_param.h doesn't require the machine-
specific one to define PAGE_SIZE etc. We *don't* want a PAGE_SIZE
constant to be statically exported to userland; instead userland should
initialize vm_page_size by querying vm_statistics(), and then use
vm_page_size.
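As a sketch, the intended userland pattern could look like this (using
the standard vm_statistics() RPC; the init_page_size helper name is
made up for illustration):

    #include <mach.h>

    static vm_size_t vm_page_size;

    static void
    init_page_size (void)
    {
      vm_statistics_data_t stats;

      /* The page size is reported in the pagesize field of the
         statistics, rather than read from a static constant.  */
      if (vm_statistics (mach_task_self (), &stats) == KERN_SUCCESS)
        vm_page_size = stats.pagesize;
    }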
We'd also like to eventually avoid exporting VM_MAX_ADDRESS, but this is
not feasible at the moment. To make it feasible in the future, userland
should try to avoid relying on the definition where possible.
Message-ID: <20240415090149.38358-5-bugaevc@gmail.com>
|
|
We use largely the same ABI as Linux: a syscall is invoked with the
"svc #0" instruction, passing arguments the same way as for a regular
function call. Specifically, up to 8 arguments are passed in the x0-x7
registers, and the rest are placed on the stack (this is only necessary
for the vm_map() syscall). w8 should contain the (negative) Mach trap
number. A syscall preserves all registers except for x0, which upon
returning contains the return value.
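For illustration, a userland trap invocation under this ABI could look
like the following sketch (mach_reply_port being trap 26 is an
assumption here, as in the classic Mach trap table; verify against
mach/syscall_sw.h):

    #include <mach/port.h>

    static inline mach_port_t
    my_mach_reply_port (void)
    {
      /* w8 holds the negative trap number; the result comes back
         in x0, and all other registers are preserved.  */
      register long x0 asm ("x0");
      register int w8 asm ("w8") = -26;

      asm volatile ("svc #0"
                    : "=r" (x0)
                    : "r" (w8)
                    : "memory");
      return (mach_port_t) x0;
    }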
Message-ID: <20240415090149.38358-4-bugaevc@gmail.com>
|
|
This adds "aarch64" host support to the build system, along with some
uninteresting installed headers. The empty aarch64/aarch64/ast.h header
is also added to create the aarch64/aarch64/ directory (due to a Git
peculiarity).
With this, it should be possible to run 'configure --host=aarch64-gnu'
and 'make install-data' successfully.
Message-ID: <20240415090149.38358-3-bugaevc@gmail.com>
|
|
This is distinct from CPU_TYPE_ARM, since we're going to exclusively use
AArch64 / A64, which CPU_TYPE_ARM was never meant to support, and to
match EM_AARCH64, which is also separate from EM_ARM. CPU_TYPE_X86_64
was similarly made distinct from CPU_TYPE_I386.
This is named CPU_TYPE_ARM64 rather than CPU_TYPE_AARCH64, since AArch64
is an "execution state" (analogous to long mode on x86_64) rather than a
CPU type. "ARM64" here is not a name of the architecture, but simply
means an ARM CPU that is capable of (and in our case, will only really
be) running in the 64-bit mode (AArch64).
There are no subtypes defined, and none are expected to be defined in
the future. Support for individual features/extensions should be
discovered by other means, i.e. the aarch64_get_hwcaps() RPC.
Message-ID: <20240415090149.38358-2-bugaevc@gmail.com>
|
|
If the host is loaded, it may take some time to boot.
|
|
So it does not have to time out.
|
|
The makefile pieces are not ready for this.
|
|
For thread_wakeup.
|
|
qemu-system-i386 says at most 2047 MB RAM can be simulated
|
|
We need it to properly drive interrupts etc. of the APs
|
|
When operating on the kernel map, vm_map_pageable_scan() does what
the code itself describes as "HACK HACK HACK HACK": it unlocks the map,
and calls vm_fault_wire() with the map unlocked. This hack is required
to avoid a deadlock in case vm_fault or one of its callees (perhaps, a
pager) needs to allocate memory in the kernel map. The hack relies on
other kernel code being "well-behaved", in particular on nothing
making any serious changes to this region of memory while the map is
unlocked, since this region of memory is "owned" by the caller.
Even if the kernel code is "well-behaved" and doesn't alter VM regions
that it doesn't "own", it can still access adjacent regions. While this
doesn't affect the region being wired down as such, it can still end up
causing trouble due to extension & coalescence (merging) of VM entries.
VM entry coalescence is an optimization where two adjacent VM entries
with identical properties are merged into a single one that spans the
combined region of the two original entries. VM entry extension is a
similar optimization where an existing VM entry is extended to cover
an adjacent region, instead of a new VM entry being created to describe
the region.
These optimizations are a private implementation detail of vm_map, and
(while they can be observed through e.g. vm_region) they are not
supposed to cause any visible effects to how the described regions of
memory behave; coalescence/extension and clipping happen automatically
as needed when adding or removing mappings, or changing their
properties. This is why it's fine for "well-behaved" kernel code to
unknowingly cause extension or coalescence of VM entries describing a
region by operating on adjacent VM regions.
The "HACK HACK HACK HACK" code path relies on the VM entries in the
region staying intact while it keeps the map unlocked, as it passes
direct pointers to the entries into vm_fault_wire(), and also walks the
list of entries in the region by following the vme_next pointers in the
entries. Yet, this assumption is violated by the entries getting
concurrently modified by other kernel code operating on adjacent VM
regions, as described above. This is not only undefined behavior in the
sense of the C language standard, but can also cause very real issues.
Specifically, we've been seeing the VM subsystem deadlock when building
Mach with SMP support and running a test program that calls
mach_port_names() concurrently and repeatedly. The mach_port_names()
implementation allocates and wires down memory, and when called from
multiple threads, it was likely to allocate, and wire, several adjacent
regions of memory, which would then cause entry coalescence/extension
and clipping to kick in. The specific sequence of events that led to a
deadlock appears to have been:
1. Multiple threads execute mach_port_names() concurrently.
2. One of the threads is wiring down a memory region, another is
unwiring an adjacent memory region.
3. The wiring thread has unlocked the ipc_kernel_map, and called into
vm_fault_wire().
4. Due to entry coalescence/extension, the entry the wiring thread was
going to wire down now describes a broader region of memory, namely
it includes an adjacent region of memory that has previously been
wired down by the other thread that is about to unwire it.
5. The wiring thread sets the busy bit on a wired-down page that the
unwiring thread is about to unwire, and is waiting to take the map
lock for reading in vm_map_verify().
6. The unwiring thread holds the map lock for writing, and is waiting
for the page to lose its busy bit.
7. Deadlock!
To prevent this from happening, we have to ensure that the VM entries,
at least as passed into vm_fault_wire() and as used for walking the list
of such entries, stay intact while we have the map unlocked. One simple
way to achieve that, which I have proposed previously, is to make a
temporary copy of the VM entries in the region, and pass the copies into
vm_fault_wire(). The entry copies would not be affected by coalescence/
extension, even if the original entries in the map are. This is however
only straightforward to do when there's just a single entry describing
the whole region, and there are further concerns with e.g. whether the
underlying memory objects could, too, get coalesced.
Arguably, making copies of the memory entries is making the hack even
bigger. This patch instead implements a relatively clean solution that,
arguably, makes the whole thing less of a hack: namely, making use of
the in-transition bit on VM entries to prevent coalescence and any other
unwanted effects. The entry in-transition bit was introduced for a very
similar use case: the VM map copyout logic has to temporarily unlock the
map to run its continuation, so it marks the VM entries it copied out
into the map up to that point as being "in transition", asking other
code to hold off making any serious changes to those entries. There's a
companion "needs wakeup" bit that other code can set to block on the VM
entry exiting this in-transition state; the code that puts an entry into
the in-transition state is expected to, when clearing the in-transition
bit, check whether needs_wakeup is set, and wake any waiters up in
that case, so they can retry whatever operation they wanted to do. There
is no need to check for needs_wakeup in the case of vm_map_pageable_scan(),
however, exactly because we expect kernel code to be "well-behaved" and
not make any attempts to modify the VM region.
This relies on the in-transition bit inhibiting coalescence/extension,
as implemented in the previous commit.
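A simplified sketch of the resulting shape of vm_map_pageable_scan()
(field and function names follow the existing vm_map code; this is an
illustration, not the verbatim patch):

    /* Mark the entries in the region as in transition, so that
       coalescence/extension will leave them alone.  */
    for (entry = start_entry; entry != end_entry;
         entry = entry->vme_next)
      entry->in_transition = TRUE;

    /* HACK HACK HACK HACK: unlock the map while wiring, so that
       vm_fault can allocate kernel memory if it needs to.  */
    vm_map_unlock (map);
    for (entry = start_entry; entry != end_entry;
         entry = entry->vme_next)
      vm_fault_wire (map, entry);
    vm_map_lock (map);

    /* Leave the in-transition state; since kernel code is
       expected to be well-behaved, no one should be blocked on
       needs_wakeup here.  */
    for (entry = start_entry; entry != end_entry;
         entry = entry->vme_next)
      entry->in_transition = FALSE;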
Also, fix a tiny sad misaligned comment line.
Reported-by: Damien Zammit <damien@zamaudio.com>
Helped-by: Damien Zammit <damien@zamaudio.com>
Message-ID: <20240405151850.41633-3-bugaevc@gmail.com>
|
|
The in-transition mechanism exists to make it possible to unlock a map
while still making sure some VM entries won't disappear from under you.
This is currently used by the VM copyin mechanics.
Entries in this state are better left alone, and extending/coalescing is
only an optimization, so it makes sense to skip it if the entry to be
extended is in transition. vm_map_coalesce_entry() already checks for
this; check for it in other similar places too.
This is in preparation for using the in-transition mechanism for wiring,
where it's much more important that the entries are not extended while
in transition.
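The check involved is along these lines (an illustrative helper; in the
actual code the condition is part of the extension/coalescence tests in
vm_map_enter() and vm_map_coalesce_entry()):

    #include <vm/vm_map.h>

    /* Whether it is safe to extend or coalesce this entry.  */
    static boolean_t
    vm_map_entry_mergeable (const struct vm_map_entry *entry)
    {
      /* An entry in transition is better left alone: its owner
         may have the map unlocked and relies on the entry
         staying intact.  Merging is only an optimization.  */
      return !entry->in_transition;
    }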
Message-ID: <20240405151850.41633-2-bugaevc@gmail.com>
|
|
When operating on the kernel map, vm_map_pageable_scan() does what
the code itself describes as "HACK HACK HACK HACK": it unlocks the map,
and calls vm_fault_wire() with the map unlocked. This hack is required
to avoid a deadlock in case vm_fault or one of its callees (perhaps, a
pager) needs to allocate memory in the kernel map. The hack relies on
other kernel code being "well-behaved", in particular on nothing
making any serious changes to this region of memory while the map is
unlocked, since this region of memory is "owned" by the caller.
This reasoning doesn't apply to the validity of the 'end' entry (the
first entry after the region to be wired), since it's not a part of the
region, and is "owned" by someone else. Once the map is unlocked, the
'end' entry could get deallocated. Alternatively, a different entry
could get inserted after the VM region in front of 'end', which would
break the 'for (entry = start; entry != end; entry = entry->vme_next)'
loop condition.
This was not an issue in the original Mach 3 kernel, since it used an
address range check for the loop condition, but got broken in commit
023401c5b97023670a44059a60eb2a3a11c8a929 "VM: rework map entry wiring".
Fix this by switching the iteration back to use an address check.
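A sketch of the address-based iteration (in the spirit of the original
Mach 3 code; names are illustrative):

    /* Walk the wired region by address rather than by comparing
       against the 'end' entry pointer, which may be freed or
       replaced while the map is unlocked.  */
    for (entry = start_entry;
         (entry != vm_map_to_entry (map))
           && (entry->vme_start < end_address);
         entry = entry->vme_next)
      vm_fault_wire (map, entry);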
This partly fixes a deadlock with concurrent mach_port_names() calls on
SMP, which was
Reported-by: Damien Zammit <damien@zamaudio.com>
Message-ID: <20240405151850.41633-1-bugaevc@gmail.com>
|
|
If a bootstrap ELF contains a PT_GNU_STACK phdr, take stack protection
from there. Otherwise, default to VM_PROT_ALL.
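A minimal sketch of the phdr scan (assuming the standard ELF
definitions; Elf32 is shown for brevity, and the helper name is made
up):

    #include <elf.h>
    #include <mach/vm_prot.h>

    static vm_prot_t
    stack_prot_from_phdrs (const Elf32_Phdr *phdr, unsigned int phnum)
    {
      /* Default to VM_PROT_ALL when no PT_GNU_STACK is found.  */
      vm_prot_t prot = VM_PROT_ALL;
      unsigned int i;

      for (i = 0; i < phnum; i++)
        if (phdr[i].p_type == PT_GNU_STACK)
          {
            prot = VM_PROT_NONE;
            if (phdr[i].p_flags & PF_R) prot |= VM_PROT_READ;
            if (phdr[i].p_flags & PF_W) prot |= VM_PROT_WRITE;
            if (phdr[i].p_flags & PF_X) prot |= VM_PROT_EXECUTE;
          }
      return prot;
    }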
|
|
to get unmask_irq declaration
|
|
Message-ID: <20240327161841.95685-18-bugaevc@gmail.com>
|
|
Message-ID: <20240327161841.95685-17-bugaevc@gmail.com>
|
|
Message-ID: <20240327161841.95685-16-bugaevc@gmail.com>
|
|
Message-ID: <20240327161841.95685-15-bugaevc@gmail.com>
|
|
Message-ID: <20240327161841.95685-14-bugaevc@gmail.com>
|
|
Message-ID: <20240327161841.95685-13-bugaevc@gmail.com>
|
|
Mark it as noreturn, and make sure to halt, not reboot.
Message-ID: <20240327161841.95685-12-bugaevc@gmail.com>
|
|
There might be good reasons why Mach on x86 shouldn't be built as PIC/
PIE, but there are also very good reasons to support PIE on other
architectures. Potentially implementing KASLR is one such reason; but
also the Linux AArch64 boot protocol (that the AArch64 port will use for
booting) lets the bootloader load the kernel image at any address,
which makes PIC pretty much required.
Message-ID: <20240327161841.95685-11-bugaevc@gmail.com>
|
|
ipc_entry_lookup_failed() is used with both mach_msg_user_header_t and
mach_msg_header_t arguments, which are different types. Make it into a
macro, so it works with both.
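The shape of the change, as a sketch (the body shown is a placeholder;
the point is that a macro, unlike a function, accepts either header
type):

    /* Usable with both mach_msg_user_header_t and
       mach_msg_header_t arguments.  */
    #define ipc_entry_lookup_failed(msg, name)                    \
      printf ("ipc_entry_lookup failed: msgh_id %d, name %lu\n",  \
              (msg)->msgh_id, (unsigned long) (name))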
Message-ID: <20240327161841.95685-9-bugaevc@gmail.com>
|
|
Initializing a variable with itself is undefined, and GCC 14 rightfully
produces a warning about the variable being used (to initialize itself)
prior to initialization. X15 sets the variables to 0 instead, so do the
same in Mach.
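The pattern in question, for illustration:

    unsigned int var = var;  /* undefined: read before initialization */
    unsigned int var2 = 0;   /* what the code does now, following X15 */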
Message-ID: <20240327161841.95685-8-bugaevc@gmail.com>
|
|
Depending on the architecture and setup, it may not be possible to
access user memory directly, for example, due to user mode mappings not
being accessible from kernel mode (x86 SMAP, AArch64 PAN). There are
dedicated machine-specific copyin()/copyout() routines that know how to
access user memory from the kernel; use them.
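The shape of the change, sketched (copyin() returns nonzero when the
user address faults; the helper here is made up for illustration):

    static kern_return_t
    read_user_word (vm_offset_t user_addr, vm_size_t *out)
    {
      /* Not: *out = *(vm_size_t *) user_addr; -- a direct
         dereference faults under x86 SMAP or AArch64 PAN.  */
      if (copyin ((const void *) user_addr, out, sizeof *out))
        return KERN_INVALID_ADDRESS;
      return KERN_SUCCESS;
    }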
Message-ID: <20240327161841.95685-6-bugaevc@gmail.com>
|
|
Not only on x86_64.
Message-ID: <20240327161841.95685-5-bugaevc@gmail.com>
|
|
Message-ID: <20240327161841.95685-4-bugaevc@gmail.com>
|
|
It's not only x86_64: none of the new architectures are going to have it.
Message-ID: <20240327161841.95685-3-bugaevc@gmail.com>
|
|
The _IO{,R,W,WR} macros conflict with the glibc-provided macros and
bring confusion as to which is supposed to be the right definition.
There is currently no user of them anyway; the Hurd console driver has
its own copy.
|
|
faster RPCs.
This is a follow-up to
https://git.savannah.gnu.org/cgit/hurd/gnumach.git/commit/?id=69620634858b2992e1a362e33c95d9a8ee57bce7
where we made inlined ports 8 bytes long to avoid resizing.
The last thing that copy{in,out}msg were doing was just updating the
msgt_size field, since that's required for kernel stub code and
implicitly assumed by the IPC code. This was moved into
ipc_kmsg_copy{in,out}_body.
For a 32-bit userland, the code also stops updating msgt_size for
out-of-line ports, the same as for the 64-bit userland.
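A hedged sketch of the idea (the descriptor handling in
ipc_kmsg_copyin_body() is more involved than shown; the field access
here is illustrative):

    /* In ipc_kmsg_copyin_body(), after a port name has been
       translated into a port pointer: record the kernel-side
       size, in bits, since the kernel stub code assumes it.  */
    type->msgt_size = sizeof (ipc_port_t) * 8;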
Message-ID: <ZdQxWNSieTHcpM1b@jupiter.tail36e24.ts.net>
|
|
Message-ID: <20240309140244.347835-3-luca@orpolo.org>
|
|
Message-ID: <20240309140244.347835-2-luca@orpolo.org>
|
|
This allows 32on64 to work again. Also, it's a clearer indication of a
missing part.
Message-ID: <20240309140244.347835-1-luca@orpolo.org>
|
|
Otherwise the types in linux/dev/include/linux/skbuff.h are unknown.
|
|
non-pageable
Otherwise, if the allocated memory is passed over for returning data,
such as in device_read, we end up with:
../vm/vm_map.c:4245: vm_map_copyin_page_list: Assertion `src_entry->wired_count > 0' failed.
Debugger invoked: assertion failure
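The distinction, as a sketch using the standard kmem interfaces:

    vm_offset_t addr;

    /* Pageable: pages are not wired, so handing this memory back
       through device_read trips the wired_count assertion in
       vm_map_copyin_page_list().  */
    (void) kmem_alloc_pageable (kernel_map, &addr, vm_page_size);

    /* Non-pageable: the memory is wired on allocation.  */
    (void) kmem_alloc (kernel_map, &addr, vm_page_size);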
|
|
x86_64 ignores the segmentation limit, so we have to check it by hand
when accessing userland pointers.
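The check by hand could look like this sketch (using
VM_MAX_USER_ADDRESS as the userland ceiling is an assumption here):

    /* Segmentation no longer bounds user pointers on x86_64;
       reject kernel-space ranges explicitly.  */
    static int
    user_range_ok (vm_offset_t addr, vm_size_t len)
    {
      return addr + len >= addr               /* no wraparound */
             && addr + len <= VM_MAX_USER_ADDRESS;
    }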
Reported-by: Sergey Bugaev <bugaevc@gmail.com>
|
|
We should only set USER:
- for user process maps
- for 32bit Xen support
This was not actually posing a problem, since in 32bit segmentation
protects us, and in 64bit the L4 entry for the kernel is already set.
But better safe than sorry.
|
|
If userland passes a kernel pointer, it's not a page fault that we get,
but a general protection fault. We also want to go through the recovery
in that case, to make e.g. copyin/out return an error.
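Sketch of the trap-handler pattern (mirroring the existing page-fault
recovery path, where thread->recover holds the copyin/copyout fixup
address; register and field names are illustrative):

    /* In the kernel trap handler: a kernel pointer from userland
       raises #GP rather than a page fault; take the same recovery
       path so copyin/copyout return an error instead of the
       kernel panicking.  */
    if (type == T_GENERAL_PROTECTION && thread->recover)
      {
        regs->rip = thread->recover;
        thread->recover = 0;
        return;
      }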
|
|
Otherwise, it is easy to crash the kernel if userland passes arbitrary port
names.
Message-ID: <ZdriTgNhPsfu7c2M@jupiter.tail36e24.ts.net>
|
|
This will prevent calling vm_map_delete without the map locked
unless ref_count is zero.
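The invariant can be stated as an assertion at the top of
vm_map_delete() (a sketch; vm_map_lock_held() stands in for whatever
lock-ownership check is available):

    /* Callers must hold the map lock, except when the map is
       already dead (no outstanding references).  */
    assert (map->ref_count == 0 || vm_map_lock_held (map));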
Message-ID: <20240223081505.458240-1-damien@zamaudio.com>
|
|
Suggested-by: Damien Zammit <damien@zamaudio.com>
|
|
so that kern/machine.c can use it
|
|
Fixes assertion errors when LDEBUG is compiled in.
Message-ID: <20240223081404.458062-1-damien@zamaudio.com>
|
|
During quantum adjustment, disable interrupts and take the appropriate lock.
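Shape of the fix, as a sketch using the usual spl and pset locking
idioms:

    spl_t s;

    /* Quantum recomputation touches per-pset scheduling state,
       so block interrupts and take the pset lock around it.  */
    s = splsched ();
    pset_lock (pset);
    /* ... recompute the timeslice quantum ... */
    pset_unlock (pset);
    splx (s);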
Message-ID: <20240223080948.457792-1-damien@zamaudio.com>
|
|
This is not needed, because cpu_up() already does this when the CPU
comes online: it calls pset_add_processor().
Message-ID: <20240223080357.457465-1-damien@zamaudio.com>
|