author | Gregory Price <gourry@gourry.net> | 2025-05-12 12:21:32 -0400 |
---|---|---|
committer | Dave Jiang <dave.jiang@intel.com> | 2025-05-13 13:07:46 -0700 |
commit | f109e77dde6e52439dce9fca19a0121c7cd04424 (patch) | |
tree | c04f3efcb8ddea3eb772501ca0691b60ccb245f2 | |
parent | 419dc40b82374cc7c417f0af613b9e6ea1d34095 (diff) |
cxl: docs/allocation/reclaim
Document a bit about how reclaim interacts with various CXL
configurations.
Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://patch.msgid.link/20250512162134.3596150-16-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
-rw-r--r-- | Documentation/driver-api/cxl/allocation/reclaim.rst | 51 |
-rw-r--r-- | Documentation/driver-api/cxl/index.rst | 1 |
2 files changed, 52 insertions, 0 deletions
diff --git a/Documentation/driver-api/cxl/allocation/reclaim.rst b/Documentation/driver-api/cxl/allocation/reclaim.rst
new file mode 100644
index 000000000000..f40f1cae391a
--- /dev/null
+++ b/Documentation/driver-api/cxl/allocation/reclaim.rst
@@ -0,0 +1,51 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======
+Reclaim
+=======
+Another way CXL memory can be utilized *indirectly* is via the reclaim system
+in :code:`mm/vmscan.c`. Reclaim is engaged when memory capacity on the system
+becomes pressured based on global and cgroup-local `watermark` settings.
+
+In this section we won't discuss the `watermark` configurations, just how CXL
+memory can be consumed by various pieces of reclaim system.
+
+Demotion
+========
+By default, the reclaim system will prefer swap (or zswap) when reclaiming
+memory. Enabling :code:`kernel/mm/numa/demotion_enabled` will cause vmscan
+to opportunistically prefer distant NUMA nodes to swap or zswap, if capacity
+is available.
+
+Demotion engages the :code:`mm/memory_tier.c` component to determine the
+next demotion node. The next demotion node is based on the :code:`HMAT`
+or :code:`CDAT` performance data.
+
+cpusets.mems_allowed quirk
+--------------------------
+In Linux v6.15 and below, demotion does not respect :code:`cpusets.mems_allowed`
+when migrating pages. As a result, if demotion is enabled, vmscan cannot
+guarantee isolation of a container's memory from nodes not set in mems_allowed.
+
+In Linux v6.XX and up, demotion does attempt to respect
+:code:`cpusets.mems_allowed`; however, certain classes of shared memory
+originally instantiated by another cgroup (such as common libraries - e.g.
+libc) may still be demoted. As a result, the mems_allowed interface still
+cannot provide perfect isolation from the remote nodes.
+
+ZSwap and Node Preference
+=========================
+In Linux v6.15 and below, ZSwap allocates memory from the local node of the
+processor for the new pages being compressed. Since pages being compressed
+are typically cold, the result is a cold page becomes promoted - only to
+be later demoted as it ages off the LRU.
+
+In Linux v6.XX, ZSwap tries to prefer the node of the page being compressed
+as the allocation target for the compression page. This helps prevent
+thrashing.
+
+Demotion with ZSwap
+===================
+When enabling both Demotion and ZSwap, you create a situation where ZSwap
+will prefer the slowest form of CXL memory by default until that tier of
+memory is exhausted.
diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
index 7acab7e7df96..d3ab928d4d7c 100644
--- a/Documentation/driver-api/cxl/index.rst
+++ b/Documentation/driver-api/cxl/index.rst
@@ -46,5 +46,6 @@ that have impacts on each other. The docs here break up configurations steps.
 
    allocation/dax
    allocation/page-allocator
+   allocation/reclaim
 
 .. only:: subproject and html
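
The demotion switch that the new reclaim.rst describes can be inspected from userspace. The following is a minimal sketch, not part of the patch: it assumes the knob is exposed at /sys/kernel/mm/numa/demotion_enabled and that memory tiers appear under /sys/devices/virtual/memory_tiering/ with a nodelist file each. The function names and the sysfs_root parameter are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path


def demotion_enabled(sysfs_root="/sys"):
    """Return True/False for the demotion knob, or None if it is unreadable.

    Assumes the knob lives at <sysfs_root>/kernel/mm/numa/demotion_enabled
    and reads back as "true"/"false" (or "1"/"0").
    """
    path = Path(sysfs_root) / "kernel/mm/numa/demotion_enabled"
    try:
        text = path.read_text().strip().lower()
    except OSError:
        return None  # knob missing (older kernel, non-Linux) or not readable
    return text in ("1", "true")


def memory_tiers(sysfs_root="/sys"):
    """Map each memory tier directory name to its NUMA node list string.

    Assumes tiers are exposed as memory_tierN directories containing a
    'nodelist' file; returns an empty dict if none are found.
    """
    base = Path(sysfs_root) / "devices/virtual/memory_tiering"
    tiers = {}
    if not base.is_dir():
        return tiers
    for entry in sorted(base.glob("memory_tier*")):
        nodelist = entry / "nodelist"
        if nodelist.is_file():
            tiers[entry.name] = nodelist.read_text().strip()
    return tiers


if __name__ == "__main__":
    print("demotion enabled:", demotion_enabled())
    for name, nodes in memory_tiers().items():
        print(f"{name}: nodes {nodes}")
```

Passing a different sysfs_root lets the helpers run against a copied or mocked sysfs tree. On a live system the defaults read the real files, and a None result from demotion_enabled() suggests the kernel does not expose the knob at the assumed path.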