hurd/gnumach.git - GNU Mach

Age	Commit message	Author
3 days	Implement per-task virtual memory limit [HEAD, master]	Diego Nieto Cid
	* doc/mach.texi: add a "Memory Limitations" section to document the new interfaces.
	* include/mach/gnumach.defs: (vm_set_size_limit) new routine (vm_get_size_limit) likewise
	* kern/task.c: (task_create_kernel) if parent_task is not null copy virtual memory limit
	* tests/test-vm.c: (test_vm_limit) add test for the new routines
	* vm/vm_map.h: (struct vm_map) new fields size_none, size_cur_limit and size_max_limit (vm_map_find_entry) add new parameters cur_protection and max_protection
	* vm/vm_map.c: (vm_map_setup) initialize new fields (vm_map_enforce_limit) new function (vm_map_copy_limits) new function (vm_map_find_entry) add protection and max_protection parameters. call limit enforcer function (vm_map_enter) likewise (vm_map_copyout) likewise (vm_map_copyout_page_list) likewise (vm_map_fork) copy parent limit to the new map and compute and set size_none of the new map
	* vm/vm_user.c: (vm_set_size_limit) new function (vm_get_size_limit) likewise
	* xen/grant.c: update call to vm_map_find_entry to pass protection parameters
	Message-ID: <0b71f4f89b7cc2b159893a805480d7493d522d60.1758485757.git.dnietoc@gmail.com>
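A minimal user-space sketch of how such a limit check might look. Only the field names size_none, size_cur_limit and size_max_limit come from the commit message; the struct, the function names, and the rule that mappings without access rights are not counted against the limit are assumptions made for illustration, not the kernel's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the limit-related part of struct vm_map. */
struct map_limits {
    size_t size;            /* total mapped size */
    size_t size_none;       /* assumed: portion mapped with no access rights */
    size_t size_cur_limit;  /* current limit */
    size_t size_max_limit;  /* maximum limit */
};

/* Returns true if mapping len more bytes stays within the current limit.
   Assumption: inaccessible mappings are never refused by the limit. */
bool map_within_limit(const struct map_limits *m, size_t len, bool accessible)
{
    if (!accessible)
        return true;
    size_t used = m->size - m->size_none;
    /* Compare against the remaining room so that used + len cannot overflow. */
    return used <= m->size_cur_limit && len <= m->size_cur_limit - used;
}

/* Copies both limits from a parent map to a child map (hypothetical helper). */
void map_copy_limits(struct map_limits *child, const struct map_limits *parent)
{
    child->size_cur_limit = parent->size_cur_limit;
    child->size_max_limit = parent->size_max_limit;
}
```

With 3072 accountable bytes already mapped and a current limit of 8192, a 4096-byte request passes while an 8192-byte request is refused.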
2025-06-24	i386 kern: fix overflow in vm_object_print_part call	Milos Nikic
	The call to vm_object_print_part was passing 0ULL and ~0ULL for offset and size, respectively. These values are 64-bit (unsigned long long), which causes compiler warnings when building for 32-bit platforms where vm_offset_t and vm_size_t are typedefs of uintptr_t (i.e., unsigned int). This patch replaces those constants with 0 and UINTPTR_MAX, which match the expected types and avoid implicit conversion or overflow warnings. No functional change. Message-ID: <20250623235844.763-1-nikic.milos@gmail.com>
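A small sketch of the type mismatch described above, assuming typedefs that mirror the 32-bit case (both names below are hypothetical stand-ins, not the kernel's vm_offset_t and vm_size_t). Passing 0 and UINTPTR_MAX gives the callee values of exactly the parameter type, so no narrowing from unsigned long long takes place.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel typedefs, both based on uintptr_t. */
typedef uintptr_t my_vm_offset_t;
typedef uintptr_t my_vm_size_t;

/* Hypothetical callee: returns the end of the range, saturating at UINTPTR_MAX. */
my_vm_size_t range_end(my_vm_offset_t offset, my_vm_size_t size)
{
    return (size > UINTPTR_MAX - offset) ? UINTPTR_MAX : offset + size;
}

/* "Whole address space" request using constants that match the parameter types. */
my_vm_size_t whole_range_end(void)
{
    return range_end(0, UINTPTR_MAX);
}
```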
2025-06-16	vm_page: Make sure to inspect internal page list	Michael Kelly
	notably when the external page list is not empty but vm_page_can_move returns false.
2025-06-09	vm_allocate_contiguous: Avoid 64bit addresses for 32bit userland	Samuel Thibault

2025-06-08	vm_object_pmap_remove: Also remove in the shadow object	Michael Kelly
	Otherwise, when e.g. terminating tasks with shared mappings, some references remain within the pv_head_table to pmap_t objects that had already been destroyed.
2025-05-29	vm_page_seg_evict: Fix creating a pager for internal objects with DMM	Samuel Thibault
	fd63a4bbf6f2 ("vm_page: Avoid creating a pager in DMM when not double-paging") avoided crashes when we do not have a DMM and wish to evict an internal object. But the double_paging condition was too restrictive and prevented normal internal objects from paging out when we do have a DMM. This re-enables creating a pager when we do have a DMM, so internal objects can be paged out.
2025-05-29	vm_page_seg_evict: Fix object pointer on restart	Michael Kelly
	The code further down depends on object being NULL if we goto out.
2025-05-12	vm_page: Explain why we clear modify on migration of a non-dirty page	Samuel Thibault

2025-05-07	vm: Add missing lock	Samuel Thibault

2025-03-25	vm_page_free: Do not check mem->object lock when mem->absent is false	Samuel Thibault
	It may be NULL in that case.
2025-02-26	vm_map_print: show the number of wired pages [v1.8+git20250304]	Samuel Thibault

2025-02-25	vm_object: Drop old now-unused counter	Samuel Thibault

2025-02-12	Use MACRO_BEGIN/END	Samuel Thibault
	This notably fixes at least a SAVE_HINT call.
2025-02-12	vm: Add and use vm_object_lock_taken, vm_object_cache_locked, vm_page_locked_queues to check locking	Samuel Thibault
2025-02-12	vm: Fix checking lock	Samuel Thibault

2025-02-12	vm: Add missing locking	Samuel Thibault

2025-02-12	Revert "vm_page: Keep evicting out pages with external pager"	Samuel Thibault
	This reverts commit 07f78f7000400bd9d2b3a9f665e7cdfea0cd3df9. Using double_paging is not only about insisting on triggering more evictions, but also about page being non-NULL and thus following a different codepath.
2025-02-11	vm_page: Keep evicting out pages with external pager	Samuel Thibault
	`double_paging` used to be used to detect when we have not really flushed a page yet (we have just pushed it to the external pager for now) and we really want to flush something because allocations are paused. But when we do not have a default pager, we are not double-paging but we should still continue evicting pages.
2025-02-09	vm_page: Avoid double-paging when we do not have a DMM	Samuel Thibault

2025-02-09	vm_page: Avoid creating a pager in DMM when not double-paging	Samuel Thibault

2025-02-09	vm_page: Also detect default memory manager being dead	Samuel Thibault

2025-02-05	vm_page_print: Fix typo	Samuel Thibault

2025-02-04	vm_page: Avoid trying to evict internal pages until defpager is up	Samuel Thibault
	Otherwise we will get stuck inside vm_object_pager_create's call to vm_object_enter trying to reference it. This avoids getting stuck when there is no swap and we don't start a defpager.
2024-12-10	Fix various function pointer types	Sergey Bugaev
	Fixes Wincompatible-pointer-types errors on GCC 15. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-ID: <20241210115705.710555-1-bugaevc@gmail.com>
2024-12-02	vm_page.c: fix a deadlock when running with smp enabled	Etienne Brateau
	The call to vm_page_seg_pull_cache_page() returns a vm_page (src) with its object locked. As we don't unlock it before doing the vm_page_insert, it is still locked there, so trying to lock it again causes a deadlock. Replace this lock with an assert. This case was not noticed because locking is a no-op on non-SMP builds. Message-ID: <20241202182721.27920-2-etienne.brateau@gmail.com>
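A toy illustration of the pattern, under the assumption of a simple non-recursive lock (all names below are hypothetical, not kernel identifiers). The callee asserts that the lock is already held instead of taking it a second time, which on a real spinning lock would never return.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock standing in for the object lock. */
struct toy_lock {
    bool held;
};

/* Returns false in the situation where a real spinning lock would deadlock. */
bool toy_try_lock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

void toy_unlock(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Requires the caller to hold the lock: asserts instead of locking again. */
void insert_locked(struct toy_lock *l, int *slot, int value)
{
    assert(l->held);
    *slot = value;
}
```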
2024-10-21	fix a compile warning.	jbranso@dismail.de
	* vm/vm_page.c (vm_page_setup): %lu -> %zu
	vm/vm_page.c: In function 'vm_page_setup':
	vm/vm_page.c:1425:41: warning: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
	vm/vm_page.c:1425:54: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
	 1425 | printf("vm_page: page table size: %lu entries (%luk)\n", nr_pages,
	 1426 |        table_size >> 10);
	Message-ID: <20241020190744.2522-1-jbranso@dismail.de>
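For reference, a short sketch of the portable form: %zu is the standard C conversion for size_t, so the same format string is correct whether size_t is 32 or 64 bits wide. The helper name below is hypothetical and it formats into a buffer rather than calling the kernel's printf.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper producing the same message with size_t-safe conversions. */
int format_table_size(char *buf, size_t buflen, size_t nr_pages, size_t table_size)
{
    /* %zu matches size_t on every platform, unlike %lu or %u. */
    return snprintf(buf, buflen,
                    "vm_page: page table size: %zu entries (%zuk)\n",
                    nr_pages, table_size >> 10);
}
```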
2024-07-09	Fix bogus format	Samuel Thibault

2024-04-05	vm: Mark entries as in-transition while wiring down	Sergey Bugaev
	When operating on the kernel map, vm_map_pageable_scan() does what the code itself describes as "HACK HACK HACK HACK": it unlocks the map, and calls vm_fault_wire() with the map unlocked. This hack is required to avoid a deadlock in case vm_fault or one of its callees (perhaps, a pager) needs to allocate memory in the kernel map. The hack relies on other kernel code being "well-behaved", in particular on that nothing will do any serious changes to this region of memory while the map is unlocked, since this region of memory is "owned" by the caller. Even if the kernel code is "well-behaved" and doesn't alter VM regions that it doesn't "own", it can still access adjacent regions. While this doesn't affect the region being wired down as such, it can still end up causing trouble due to extension & coalescence (merging) of VM entries. VM entry coalescence is an optimization where two adjacent VM entries with identical properties are merged into a single one that spans the combined region of the two original entries. VM entry extension is a similar optimization where an existing VM entry is extended to cover an adjacent region, instead of a new VM entry being created to describe the region. These optimizations are a private implementation detail of vm_map, and (while they can be observed through e.g. vm_region) they are not supposed to cause any visible effects to how the described regions of memory behave; coalescence/extension and clipping happen automatically as needed when adding or removing mappings, or changing their properties. This is why it's fine for "well-behaved" kernel code to unknowingly cause extension or coalescence of VM entries describing a region by operating on adjacent VM regions.
The "HACK HACK HACK HACK" code path relies on the VM entries in the region staying intact while it keeps the map unlocked, as it passes direct pointers to the entries into vm_fault_wire(), and also walks the list of entries in the region by following the vme_next pointers in the entries. Yet, this assumption is violated by the entries getting concurrently modified by other kernel code operating on adjacent VM regions, as described above. This is not only undefined behavior in the sense of the C language standard, but can also cause very real issues. Specifically, we've been seeing the VM subsystem deadlock when building Mach with SMP support and running a test program that calls mach_port_names() concurrently and repeatedly. The mach_port_names() implementation allocates and wires down memory, and when called from multiple threads, it was likely to allocate, and wire, several adjacent regions of memory, which would then cause entry coalescence/extension and clipping to kick in. The specific sequence of events that led to a deadlock appears to have been:
1. Multiple threads execute mach_port_names() concurrently.
2. One of the threads is wiring down a memory region, another is unwiring an adjacent memory region.
3. The wiring thread has unlocked the ipc_kernel_map, and called into vm_fault_wire().
4. Due to entry coalescence/extension, the entry the wiring thread was going to wire down now describes a broader region of memory, namely it includes an adjacent region of memory that has previously been wired down by the other thread that is about to unwire it.
5. The wiring thread sets the busy bit on a wired-down page that the unwiring thread is about to unwire, and is waiting to take the map lock for reading in vm_map_verify().
6. The unwiring thread holds the map lock for writing, and is waiting for the page to lose its busy bit.
7. Deadlock!
To prevent this from happening, we have to ensure that the VM entries, at least as passed into vm_fault_wire() and as used for walking the list of such entries, stay intact while we have the map unlocked. One simple way to achieve that, which I have proposed previously, is to make a temporary copy of the VM entries in the region, and pass the copies into vm_fault_wire(). The entry copies would not be affected by coalescence/extension, even if the original entries in the map are. This is however only straightforward to do when there's just a single entry describing the whole region, and there are further concerns with e.g. whether the underlying memory objects could, too, get coalesced. Arguably, making copies of the memory entries is making the hack even bigger. This patch instead implements a relatively clean solution that, arguably, makes the whole thing less of a hack: namely, making use of the in-transition bit on VM entries to prevent coalescence and any other unwanted effects. The entry in-transition bit was introduced for a very similar use case: the VM map copyout logic has to temporarily unlock the map to run its continuation, so it marks the VM entries it copied out into the map up to that point as being "in transition", asking other code to hold off making any serious changes to those entries. There's a companion "needs wakeup" bit that other code can set to block on the VM entry exiting this in-transition state; the code that puts an entry into the in-transition state is expected to, when unsetting the in-transition bit back, check for needs_wakeup being set, and wake any waiters up in that case, so they can retry whatever operation they wanted to do. There is no need to check for needs_wakeup in case of vm_map_pageable_scan(), however, exactly because we expect kernel code to be "well-behaved" and not make any attempts to modify the VM region. This relies on the in-transition bit inhibiting coalescence/extension, as implemented in the previous commit.
Also, fix a tiny sad misaligned comment line. Reported-by: Damien Zammit <damien@zamaudio.com> Helped-by: Damien Zammit <damien@zamaudio.com> Message-ID: <20240405151850.41633-3-bugaevc@gmail.com>
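The approach described in this commit can be modelled roughly as follows. This is a single-threaded toy, assuming a map with one lock flag and entries reduced to two fields; every name is hypothetical and fake_fault_wire() merely stands in for vm_fault_wire(). It only shows the ordering: mark the entries while the map is locked, unlock, wire, relock, then clear the marks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of a map and its entries. */
struct wire_map {
    bool locked;
};

struct wire_entry {
    bool in_transition;
    int wired_count;
};

/* Stand-in for vm_fault_wire(): runs with the map unlocked. */
static void fake_fault_wire(const struct wire_map *map, struct wire_entry *e)
{
    assert(!map->locked);
    assert(e->in_transition);   /* protected against coalescence/extension */
    e->wired_count++;
}

/* Wires n entries; the caller must hold the map lock on entry and holds it on return. */
void wire_entries(struct wire_map *map, struct wire_entry *entries, size_t n)
{
    assert(map->locked);
    for (size_t i = 0; i < n; i++)
        entries[i].in_transition = true;

    map->locked = false;        /* the "HACK" window with the map unlocked */
    for (size_t i = 0; i < n; i++)
        fake_fault_wire(map, &entries[i]);
    map->locked = true;

    /* No needs_wakeup handling here, mirroring the reasoning in the message above. */
    for (size_t i = 0; i < n; i++)
        entries[i].in_transition = false;
}
```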
2024-04-05	vm: Don't attempt to extend in-transition entries	Sergey Bugaev
	The in-transition mechanism exists to make it possible to unlock a map while still making sure some VM entries won't disappear from under you. This is currently used by the VM copyin mechanics. Entries in this state are better left alone, and extending/coalescing is only an optimization, so it makes sense to skip it if the entry to be extended is in transition. vm_map_coalesce_entry() already checks for this; check for it in other similar places too. This is in preparation for using the in-transition mechanism for wiring, where it's much more important that the entries are not extended while in transition. Message-ID: <20240405151850.41633-2-bugaevc@gmail.com>
2024-04-05	vm: Fix use-after-free in vm_map_pageable_scan()	Sergey Bugaev
	When operating on the kernel map, vm_map_pageable_scan() does what the code itself describes as "HACK HACK HACK HACK": it unlocks the map, and calls vm_fault_wire() with the map unlocked. This hack is required to avoid a deadlock in case vm_fault or one of its callees (perhaps, a pager) needs to allocate memory in the kernel map. The hack relies on other kernel code being "well-behaved", in particular on that nothing will do any serious changes to this region of memory while the map is unlocked, since this region of memory is "owned" by the caller. This reasoning doesn't apply to the validity of the 'end' entry (the first entry after the region to be wired), since it's not a part of the region, and is "owned" by someone else. Once the map is unlocked, the 'end' entry could get deallocated. Alternatively, a different entry could get inserted after the VM region in front of 'end', which would break the 'for (entry = start; entry != end; entry = entry->vme_next)' loop condition. This was not an issue in the original Mach 3 kernel, since it used an address range check for the loop condition, but got broken in commit 023401c5b97023670a44059a60eb2a3a11c8a929 "VM: rework map entry wiring". Fix this by switching the iteration back to use an address check. This partly fixes a deadlock with concurrent mach_port_names() calls on SMP, which was Reported-by: Damien Zammit <damien@zamaudio.com> Message-ID: <20240405151850.41633-1-bugaevc@gmail.com>
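A sketch of the difference between the two loop conditions, assuming a simplified NULL-terminated singly linked list (the real map uses a different list layout; the struct and function names here are hypothetical). Bounding the walk by address stays correct even if another entry is inserted in front of the old 'end' entry while the map is unlocked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced map entry covering [vme_start, vme_end). */
struct range_entry {
    uintptr_t vme_start;
    uintptr_t vme_end;
    struct range_entry *vme_next;
};

/* Counts the entries visited from first while they start below end_addr.
   The bound is an address, not a pointer to a particular 'end' entry. */
size_t count_entries_below(const struct range_entry *first, uintptr_t end_addr)
{
    size_t n = 0;
    for (const struct range_entry *e = first;
         e != NULL && e->vme_start < end_addr;
         e = e->vme_next)
        n++;
    return n;
}
```

In the test scenario, an entry inserted just past the wired range does not change the count, whereas a pointer comparison against the old 'end' entry would have walked into it.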
2024-03-04	vm_allocate_contiguous: Add missing page unwiring after making the area non-pageable	Samuel Thibault
	Otherwise, if the allocated memory is passed over for returning data such as in device_read, we end up with ../vm/vm_map.c:4245: vm_map_copyin_page_list: Assertion `src_entry->wired_count > 0' failed. Debugger invoked: assertion failure
2024-02-23	vm_map: Add comment and assert for vm_map_delete	Damien Zammit
	This will prevent calling vm_map_delete without the map locked unless ref_count is zero. Message-ID: <20240223081505.458240-1-damien@zamaudio.com>
2024-02-22	vm_map_lookup: Add parameter for keeping map locked	Damien Zammit
	This adds a parameter called keep_map_locked to vm_map_lookup() that allows the function to return with the map locked. This is to prepare for fixing a bug with gsync where the map is locked twice by mistake. Co-Authored-By: Sergey Bugaev <bugaevc@gmail.com> Message-ID: <20240222082410.422869-3-damien@zamaudio.com>
2024-02-04	vm_pages_phys: Avoid faults while we keep vm locks	Samuel Thibault
	In principle we are actually writing to the allocated area outside of the vm lock, but better be safe in case somebody changes things.
2024-01-30	Add vm_pages_phys	Samuel Thibault
	For rumpdisk to efficiently determine the physical address, both for checking whether it is below 4GiB, and for giving it to the disk driver, we need a gnumach primitive (and that is not conditioned by MACH_VM_DEBUG like mach_vm_region_info and mach_vm_object_pages_phys are).
2024-01-13	adjust range when changing memory pageability	Luca Dariz
	* vm/vm_map.c: use actual limits instead of min/max boundaries to change pageability of the currently mapped memory. This caused the initial vm_wire_all(host, task, VM_WIRE_ALL) in glibc startup to fail with KERN_NO_SPACE. Message-ID: <20240111210907.419689-5-luca@orpolo.org>
2023-11-27	vm: Coalesce map entries	Sergey Bugaev
	When
	- extending an existing entry,
	- changing protection or inheritance of a range of entries,
	we can get several entries that could be coalesced. Attempt to do that. Message-ID: <20230705141639.85792-4-bugaevc@gmail.com>
2023-11-27	vm: Add vm_map_coalesce_entry	Sergey Bugaev
	This function attempts to coalesce a VM map entry with its preceding entry. It wraps vm_object_coalesce. Message-ID: <20230705141639.85792-3-bugaevc@gmail.com>
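A rough sketch of what coalescing an entry with its predecessor involves, assuming a reduced entry with only address range, protection, inheritance and an in-transition flag. All names are hypothetical, and the object-level merging that vm_object_coalesce performs is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced map entry covering [start, end). */
struct co_entry {
    struct co_entry *prev, *next;
    uintptr_t start, end;
    int protection, inheritance;
    bool in_transition;
};

/* Merges entry with its predecessor when they are adjacent and identical in
   the modelled properties. Returns the unlinked predecessor (for the caller
   to free), or NULL if nothing was merged. */
struct co_entry *coalesce_with_prev(struct co_entry *entry)
{
    struct co_entry *p = entry->prev;

    if (p == NULL || p->end != entry->start)
        return NULL;
    if (p->protection != entry->protection || p->inheritance != entry->inheritance)
        return NULL;
    if (p->in_transition || entry->in_transition)
        return NULL;            /* leave in-transition entries alone */

    entry->start = p->start;    /* entry now spans both ranges */
    entry->prev = p->prev;
    if (p->prev != NULL)
        p->prev->next = entry;
    p->prev = p->next = NULL;
    return p;
}
```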
2023-10-01	ddb: Add whatis command	Samuel Thibault
	This is convenient when tracking buffer overflows
2023-08-29	vm_page_bootalloc: Return a phys_addr_t	Samuel Thibault

2023-08-29	vm_page: Fix setting higher bits in physical addresses	Samuel Thibault

2023-08-28	mach_vm_object_pages: Extend for PAE	Samuel Thibault

2023-08-28	vm_allocate_contiguous: Accept returning end of allowed memory	Samuel Thibault
	*result_paddr + size is exactly past the allocated memory, so it can be equal to the requested bound.
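In other words, the bound check is inclusive. A minimal sketch, assuming 64-bit physical addresses and a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if [paddr, paddr + size) lies entirely at or below pmax.
   paddr + size is the first address past the area, so equality is allowed. */
bool fits_below(uint64_t paddr, uint64_t size, uint64_t pmax)
{
    if (size > UINT64_MAX - paddr)
        return false;           /* paddr + size would overflow */
    return paddr + size <= pmax;
}
```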
2023-08-28	typo	Samuel Thibault

2023-08-21	vm_allocate_contiguous: better handle pmax	Samuel Thibault
	In case pmax is inside a segment, we should avoid using it, and stay with the previous segment, thus being sure to respect the caller's constraints.
2023-08-14	vm: Also check for virtual addresses in vm_map_delete	Samuel Thibault

2023-08-14	vm: Fix ordering of addresses between DMA32 and DIRECTMAP	Samuel Thibault

2023-08-09	Fix missing DMA32 limit	Samuel Thibault
	Rumpdisk needs to allocate dma32 memory areas, so we do always need this limit. The non-Xen x86_64 case had a typo, and the 32bit PAE case didn't have the DMA32 limit. Also, we have to cope with VM_PAGE_DMA32_LIMIT being either above or below VM_PAGE_DIRECTMAP_LIMIT depending on the cases.
2023-07-05	vm: Make vm_object_coalesce return new object and offset	Sergey Bugaev
	vm_object_coalesce() callers used to rely on the fact that it always merged the next_object into prev_object, potentially destroying next_object and leaving prev_object the result of the whole operation. After ee65849bec5da261be90f565bee096abb4117bdd "vm: Allow coalescing null object with an internal object", this is no longer true, since in case of prev_object == VM_OBJECT_NULL and next_object != VM_OBJECT_NULL, the overall result is next_object, not prev_object. The current callers are prepared to deal with this since they handle this case separately anyway, but the following commit will introduce another caller that handles both cases in the same code path. So, declare the way vm_object_coalesce() coalesces the two objects its implementation detail, and make it return the resulting object and the offset into it explicitly. This simplifies the callers, too. Message-Id: <20230705141639.85792-2-bugaevc@gmail.com>
2023-07-03	vm: Eagerly release deallocated pages	Sergey Bugaev
	If a deallocated VM map entry refers to an object that only has a single reference and doesn't have a pager port, we can eagerly release any physical pages that were contained in the deallocated range. This is not a 100% solution: it is still possible to "leak" physical pages that can never appear in virtual memory again by creating several references to a memory object (perhaps by forking a VM map with VM_INHERIT_SHARE) and deallocating the pages from all the maps referring to the object. That being said, it should help to release the pages in the common case sooner. Message-Id: <20230626112656.435622-6-bugaevc@gmail.com>
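The condition stated above can be written as a small predicate. This is a sketch over a hypothetical reduced object with only the two properties the message mentions; the field and function names are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced memory object. */
struct toy_object {
    int ref_count;      /* number of references to the object */
    bool has_pager;     /* whether a pager port is associated with it */
};

/* True when pages in a deallocated range can be released right away:
   nobody else references the object and no pager could supply them again. */
bool can_release_eagerly(const struct toy_object *obj)
{
    return obj != NULL && obj->ref_count == 1 && !obj->has_pager;
}
```

An object shared between several maps fails this test, which corresponds to the remaining "leak" case the message describes.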