author | Dev Jain <dev.jain@arm.com> | 2025-04-16 11:00:48 +0530 |
---|---|---|
committer | Andrew Morton <akpm@linux-foundation.org> | 2025-05-11 17:48:33 -0700 |
commit | 4a34c584d8cd13d2b721d21cf629f77c60bfb4a4 (patch) | |
tree | cdb64a916e3e5a9c9f849494524f0ff29b4139fb /lib/test_vmalloc.c | |
parent | 75404e07663b1622948944cf31531fa87cb1785d (diff) |
mempolicy: optimize queue_folios_pte_range by PTE batching
After the check for queue_folio_required(), the code only cares about the
folio in the for loop, i.e. the remaining PTEs mapping that folio are
redundant. Therefore, optimize this loop by skipping over a PTE batch
mapping the same folio.
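
The batching idea can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel patch itself: `struct toy_pte`, `toy_pte_batch()`, `toy_walk_batched()` and `toy_walk_unbatched()` are hypothetical names invented for illustration, and a "PTE" here is reduced to the id of the folio it maps (0 meaning not present). The helper is only loosely analogous to the kernel's batching helper, whose real signature and checks are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical): a PTE only records which folio it maps.
 * folio_id == 0 stands for "not present". */
struct toy_pte {
	int folio_id;
};

/* Count how many consecutive entries, starting at idx, map the same
 * folio as ptes[idx]. Always returns at least 1. */
static size_t toy_pte_batch(const struct toy_pte *ptes, size_t idx, size_t n)
{
	size_t nr = 1;

	assert(idx < n);
	while (idx + nr < n && ptes[idx + nr].folio_id == ptes[idx].folio_id)
		nr++;
	return nr;
}

/* Batched walk: examine each folio once, then jump over the rest of
 * the batch. Returns the number of folio examinations performed. */
static size_t toy_walk_batched(const struct toy_pte *ptes, size_t n)
{
	size_t visits = 0;
	size_t i = 0;

	while (i < n) {
		if (ptes[i].folio_id == 0) {
			i++;		/* skip a not-present entry */
			continue;
		}
		visits++;		/* per-folio work would happen here */
		i += toy_pte_batch(ptes, i, n);
	}
	return visits;
}

/* Unbatched walk: one folio examination per present PTE. */
static size_t toy_walk_unbatched(const struct toy_pte *ptes, size_t n)
{
	size_t visits = 0;

	for (size_t i = 0; i < n; i++)
		if (ptes[i].folio_id != 0)
			visits++;
	return visits;
}
```

In this model, a range where four PTEs map folio 1, one entry is not present, two map folio 2 and one maps folio 3 needs three folio examinations in the batched walk instead of seven.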
With a test program migrating pages of the calling process, which includes
a mapped VMA of size 4GB with pte-mapped large folios of order-9, and
migrating once back and forth between node-0 and node-1, the average
execution time reduces from 7.5 to 4 seconds, i.e. roughly a 47% reduction
in execution time.
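
The quoted figure can be checked directly from the two timings. The PTE and folio counts below are an illustration only: they assume a 4 KiB base page size and read "4GB" as 4 GiB, neither of which is stated in the commit message.

```latex
% Timings taken from the commit message
\frac{7.5\,\mathrm{s} - 4\,\mathrm{s}}{7.5\,\mathrm{s}} \approx 0.467
\quad (\text{about } 47\% \text{ less time}),
\qquad
\frac{7.5\,\mathrm{s}}{4\,\mathrm{s}} = 1.875\times

% Assumption: 4 KiB base pages, VMA size of 4 GiB
\frac{4\,\mathrm{GiB}}{4\,\mathrm{KiB}} = 2^{20} \text{ PTEs},
\qquad
\frac{2^{20} \text{ PTEs}}{2^{9} \text{ PTEs per order-9 folio}} = 2^{11} = 2048 \text{ folios}
```

Under those assumptions, a fully populated mapping lets the loop handle about 2048 batches instead of about a million individual PTEs.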
Link: https://lkml.kernel.org/r/20250416053048.96479-1-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>