mempolicy: optimize queue_folios_pte_range by PTE batching - linux/linux-stable.git

author	Dev Jain <dev.jain@arm.com>	2025-04-16 11:00:48 +0530
committer	Andrew Morton <akpm@linux-foundation.org>	2025-05-11 17:48:33 -0700
commit	4a34c584d8cd13d2b721d21cf629f77c60bfb4a4 (patch)
tree	cdb64a916e3e5a9c9f849494524f0ff29b4139fb /lib/test_vmalloc.c
parent	75404e07663b1622948944cf31531fa87cb1785d (diff)

mempolicy: optimize queue_folios_pte_range by PTE batching

After the check for queue_folio_required(), the code only cares about the folio in the for loop, i.e. the PTEs are redundant. Therefore, optimize this loop by skipping over a PTE batch mapping the same folio.

With a test program migrating pages of the calling process, which includes a mapped VMA of size 4GB with pte-mapped large folios of order-9, and migrating once back and forth between node-0 and node-1, the average execution time reduces from 7.5 to 4 seconds, giving an approx 47% speedup.

Link: https://lkml.kernel.org/r/20250416053048.96479-1-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
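
The batching idea in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `struct sim_pte`, `sim_map_folio()`, `sim_pte_batch()` and `sim_walk()` are hypothetical names invented for this illustration, and a "batch" is modelled simply as a run of present entries with the same folio id and consecutive page frame numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PTE: records only what the model needs. */
struct sim_pte {
    int present;            /* nonzero if the entry maps a page */
    unsigned long folio_id; /* which (simulated) folio the page belongs to */
    unsigned long pfn;      /* simulated page frame number */
};

/* Fill ptes[start .. start+nr) so they map one folio with consecutive PFNs. */
static void sim_map_folio(struct sim_pte *ptes, size_t start, size_t nr,
                          unsigned long folio_id, unsigned long first_pfn)
{
    for (size_t i = 0; i < nr; i++) {
        ptes[start + i].present = 1;
        ptes[start + i].folio_id = folio_id;
        ptes[start + i].pfn = first_pfn + i;
    }
}

/*
 * Return how many consecutive entries, beginning at 'start' and not going
 * past 'end', map the same folio as ptes[start] with consecutive PFNs.
 */
static size_t sim_pte_batch(const struct sim_pte *ptes, size_t start,
                            size_t end)
{
    size_t nr = 1;

    assert(start < end);
    while (start + nr < end &&
           ptes[start + nr].present &&
           ptes[start + nr].folio_id == ptes[start].folio_id &&
           ptes[start + nr].pfn == ptes[start].pfn + nr)
        nr++;
    return nr;
}

/*
 * Walk n entries and return the number of loop iterations performed.
 * With batched == 0 every entry is visited; with batched != 0 the walk
 * advances past the whole batch once the folio has been handled.
 */
static size_t sim_walk(const struct sim_pte *ptes, size_t n, int batched)
{
    size_t iterations = 0;
    size_t i = 0;

    while (i < n) {
        size_t nr = 1;

        iterations++;
        if (batched && ptes[i].present)
            nr = sim_pte_batch(ptes, i, n);
        /* Per-folio work (e.g. queueing it for migration) would go here. */
        i += nr;
    }
    return iterations;
}
```

In this model, mapping two 512-entry folios into a 1024-entry array with `sim_map_folio()` makes `sim_walk(ptes, 1024, 0)` return 1024 and `sim_walk(ptes, 1024, 1)` return 2. Assuming 4 KiB base pages (an assumption; the commit message does not state the page size), an order-9 folio would span 512 PTEs, which is why skipping the rest of a batch removes most of the per-PTE iterations.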

Diffstat (limited to 'lib/test_vmalloc.c')

0 files changed, 0 insertions, 0 deletions

