summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu/resctrl/rdtgroup.c
AgeCommit message (Collapse)Author
2024-07-02x86/resctrl: Refactor mkdir_mondata_subdir() with a helper functionTony Luck
In Sub-NUMA Cluster (SNC) mode Linux must create the monitor files in the original "mon_L3_XX" directories and also in each of the "mon_sub_L3_YY" directories. Refactor mkdir_mondata_subdir() to move the creation of monitoring files into a helper function to avoid the need to duplicate code later. No functional change. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-12-tony.luck@intel.com
2024-07-02x86/resctrl: Initialize on-stack struct rmid_read instancesTony Luck
New semantics rely on some struct rmid_read members having NULL values to distinguish between the SNC and non-SNC scenarios. resctrl can thus no longer rely on this struct not being initialized properly. Initialize all on-stack declarations of struct rmid_read: rdtgroup_mondata_show() mbm_update() mkdir_mondata_subdir() to ensure that garbage values from the stack are not passed down to other functions. [ bp: Massage commit message. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-11-tony.luck@intel.com
2024-07-02x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor filesTony Luck
When SNC is enabled, monitoring data is collected at the SNC node granularity, but must be reported at L3-cache granularity for backwards compatibility in addition to reporting at the node level. Add a "ci" field to the rdt_mon_domain structure to save the cache information about the enclosing L3 cache for the domain. This provides: 1) The cache id which is needed to compose the name of the legacy monitoring directory, and to determine which domains should be summed to provide L3-scoped data. 2) The shared_cpu_map which is needed to determine which CPUs can be used to read the RMID counters with the MSR interface. This is the first step to an eventual goal of monitor reporting files like this (for a system with two SNC nodes per L3): $ cd /sys/fs/resctrl/mon_data $ tree mon_L3_00 mon_L3_00 <- 00 here is L3 cache id ├── llc_occupancy \ These files provide legacy support ├── mbm_local_bytes > for non-SNC aware monitor apps ├── mbm_total_bytes / that expect data at L3 cache level ├── mon_sub_L3_00 <- 00 here is SNC node id │   ├── llc_occupancy \ These files are finer grained │   ├── mbm_local_bytes > data from each SNC node │   └── mbm_total_bytes / └── mon_sub_L3_01 ├── llc_occupancy \ ├── mbm_local_bytes > As above, but for node 1. └── mbm_total_bytes / [ bp: Massage commit message. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-9-tony.luck@intel.com
2024-07-02x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster (SNC) ↵Tony Luck
systems When SNC is enabled there is a mismatch between the MBA control function which operates at L3 cache scope and the MBM monitor functions which measure memory bandwidth on each SNC node. Block use of the mba_MBps when scopes for MBA/MBM do not match. Improve user diagnostics by adding invalfc() message when mba_MBps is not supported. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-8-tony.luck@intel.com
2024-07-02x86/resctrl: Split the rdt_domain and rdt_hw_domain structuresTony Luck
The same rdt_domain structure is used for both control and monitor functions. But this results in wasted memory as some of the fields are only used by control functions, while most are only used for monitor functions. Split into separate rdt_ctrl_domain and rdt_mon_domain structures with just the fields required for control and monitoring respectively. Similar split of the rdt_hw_domain structure into rdt_hw_ctrl_domain and rdt_hw_mon_domain. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-5-tony.luck@intel.com
2024-07-02x86/resctrl: Prepare for different scope for control/monitor operationsTony Luck
Resctrl assumes that control and monitor operations on a resource are performed at the same scope. Prepare for systems that use different scope (specifically Intel needs to split the RDT_RESOURCE_L3 resource to use L3 scope for cache control and NODE scope for cache occupancy and memory bandwidth monitoring). Create separate domain lists for control and monitor operations. Note that errors during initialization of either control or monitor functions on a domain would previously result in that domain being excluded from both control and monitor operations. Now the domains are allocated independently it is no longer required to disable both control and monitor operations if either fail. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-4-tony.luck@intel.com
2024-07-02x86/resctrl: Prepare to split rdt_domain structureTony Luck
The rdt_domain structure is used for both control and monitor features. It is about to be split into separate structures for these two usages because the scope for control and monitoring features for a resource will be different for future resources. To allow for common code that scans a list of domains looking for a specific domain id, move all the common fields ("list", "id", "cpu_mask") into their own structure within the rdt_domain structure. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-3-tony.luck@intel.com
2024-07-02x86/resctrl: Prepare for new domain scopeTony Luck
Resctrl resources operate on subsets of CPUs in the system with the defining attribute of each subset being an instance of a particular level of cache. E.g. all CPUs sharing an L3 cache would be part of the same domain. In preparation for features that are scoped at the NUMA node level, change the code from explicit references to "cache_level" to a more generic scope. At this point the only options for this scope are groups of CPUs that share an L2 cache or L3 cache. Clean up the error handling when looking up domains. Report invalid ids before calling rdt_find_domain() in preparation for better messages when scope can be other than cache scope. This means that rdt_find_domain() will never return an error. So remove checks for error from the call sites. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-2-tony.luck@intel.com
2024-06-10x86/resctrl: Replace open coded cacheinfo searchesTony Luck
pseudo_lock_region_init() and rdtgroup_cbm_to_size() open code a search for details of a particular cache level. Replace with get_cpu_cacheinfo_level(). Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20240610003927.341707-5-tony.luck@intel.com
2024-04-24x86/resctrl: Pass domain to target CPUTony Luck
reset_all_ctrls() and resctrl_arch_update_domains() use on_each_cpu_mask() to call rdt_ctrl_update() on potentially one CPU from each domain. But this means rdt_ctrl_update() needs to figure out which domain to apply changes to. Doing so requires a search of all domains in a resource, which can only be done safely if cpus_lock is held. Both callers do hold this lock, but there isn't a way for a function called on another CPU via IPI to verify this. Commit c0d848fcb09d ("x86/resctrl: Remove lockdep annotation that triggers false positive") removed the incorrect assertions. Add the target domain to the msr_param structure and call rdt_ctrl_update() for each domain separately using smp_call_function_single(). This means that rdt_ctrl_update() doesn't need to search for the domain and get_domain_from_cpu() can safely assert that the cpus_lock is held since the remaining callers do not use IPI. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Link: https://lore.kernel.org/r/20240308213846.77075-2-tony.luck@intel.com
2024-02-19x86/resctrl: Separate arch and fs resctrl locksJames Morse
resctrl has one mutex that is taken by the architecture-specific code, and the filesystem parts. The two interact via cpuhp, where the architecture code updates the domain list. Filesystem handlers that walk the domains list should not run concurrently with the cpuhp callback modifying the list. Exposing a lock from the filesystem code means the interface is not cleanly defined, and creates the possibility of cross-architecture lock ordering headaches. The interaction only exists so that certain filesystem paths are serialised against CPU hotplug. The CPU hotplug code already has a mechanism to do this using cpus_read_lock(). MPAM's monitors have an overflow interrupt, so it needs to be possible to walk the domains list in irq context. RCU is ideal for this, but some paths need to be able to sleep to allocate memory. Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part of a cpuhp callback, cpus_read_lock() must always be taken first. rdtgroup_schemata_write() already does this. Most of the filesystem code's domain list walkers are currently protected by the rdtgroup_mutex taken in rdtgroup_kn_lock_live(). The exceptions are rdt_bit_usage_show() and the mon_config helpers which take the lock directly. Make the domain list protected by RCU. An architecture-specific lock prevents concurrent writers. rdt_bit_usage_show() could walk the domain list using RCU, but to keep all the filesystem operations the same, this is changed to call cpus_read_lock(). The mon_config helpers send multiple IPIs, take the cpus_read_lock() in these cases. The other filesystem list walkers need to be able to sleep. Add cpus_read_lock() to rdtgroup_kn_lock_live() so that the cpuhp callbacks can't be invoked when file system operations are occurring. Add lockdep_assert_cpus_held() in the cases where the rdtgroup_kn_lock_live() call isn't obvious. Resctrl's domain online/offline calls now need to take the rdtgroup_mutex themselves. [ bp: Fold in a build fix: https://lore.kernel.org/r/87zfvwieli.ffs@tglx ] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-25-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Move domain helper migration into resctrl_offline_cpu()James Morse
When a CPU is taken offline the resctrl filesystem code needs to check if it was the CPU nominated to perform the periodic overflow and limbo work. If so, another CPU needs to be chosen to do this work. This is currently done in core.c, mixed in with the code that removes the CPU from the domain's mask, and potentially free()s the domain. Move the migration of the overflow and limbo helpers into the filesystem code, into resctrl_offline_cpu(). As resctrl_offline_cpu() runs before the architecture code has removed the CPU from the domain mask, the callers need to be told which CPU is being removed, to avoid picking it as the new CPU. This uses the exclude_cpu feature previously added. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-24-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Add CPU offline callback for resctrl workJames Morse
The resctrl architecture specific code may need to free a domain when a CPU goes offline, it also needs to reset the CPUs PQR_ASSOC register. Amongst other things, the resctrl filesystem code needs to clear this CPU from the cpu_mask of any control and monitor groups. Currently, this is all done in core.c and called from resctrl_offline_cpu(), making the split between architecture and filesystem code unclear. Move the filesystem work to remove the CPU from the control and monitor groups into a filesystem helper called resctrl_offline_cpu(), and rename the one in core.c resctrl_arch_offline_cpu(). Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-23-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but CPUJames Morse
When a CPU is taken offline resctrl may need to move the overflow or limbo handlers to run on a different CPU. Once the offline callbacks have been split, cqm_setup_limbo_handler() will be called while the CPU that is going offline is still present in the CPU mask. Pass the CPU to exclude to cqm_setup_limbo_handler() and mbm_setup_overflow_handler(). These functions can use a variant of cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs need excluding. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-22-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Add CPU online callback for resctrl workJames Morse
The resctrl architecture specific code may need to create a domain when a CPU comes online, it also needs to reset the CPUs PQR_ASSOC register. The resctrl filesystem code needs to update the rdtgroup_default CPU mask when CPUs are brought online. Currently, this is all done in one function, resctrl_online_cpu(). It will need to be split into architecture and filesystem parts before resctrl can be moved to /fs/. Pull the rdtgroup_default update work out as a filesystem specific cpu_online helper. resctrl_online_cpu() is the obvious name for this, which means the version in core.c needs renaming. resctrl_online_cpu() is called by the arch code once it has done the work to add the new CPU to any domains. In future patches, resctrl_online_cpu() will take the rdtgroup_mutex itself. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-21-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Add helpers for system wide mon/alloc capableJames Morse
resctrl reads rdt_alloc_capable or rdt_mon_capable to determine whether any of the resources support the corresponding features. resctrl also uses the static keys that affect the architecture's context-switch code to determine the same thing. This forces another architecture to have the same static keys. As the static key is enabled based on the capable flag, and none of the filesystem uses of these are in the scheduler path, move the capable flags behind helpers, and use these in the filesystem code instead of the static key. After this change, only the architecture code manages and uses the static keys to ensure __resctrl_sched_in() does not need runtime checks. This avoids multiple architectures having to define the same static keys. Cases where the static key implicitly tested if the resctrl filesystem was mounted all have an explicit check now. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-20-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Make rdt_enable_key the arch's decision to switchJames Morse
rdt_enable_key is switched when resctrl is mounted. It was also previously used to prevent a second mount of the filesystem. Any other architecture that wants to support resctrl has to provide identical static keys. Now that there are helpers for enabling and disabling the alloc/mon keys, resctrl doesn't need to switch this extra key, it can be done by the arch code. Use the static-key increment and decrement helpers, and change resctrl to ensure the calls are balanced. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-19-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Move alloc/mon static keys into helpersJames Morse
resctrl enables three static keys depending on the features it has enabled. Another architecture's context switch code may look different, any static keys that control it should be buried behind helpers. Move the alloc/mon logic into arch-specific helpers as a preparatory step for making the rdt_enable_key's status something the arch code decides. This means other architectures don't have to mirror the static keys. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-18-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Make resctrl_mounted checks explicitJames Morse
The rdt_enable_key is switched when resctrl is mounted, and used to prevent a second mount of the filesystem. It also enables the architecture's context switch code. This requires another architecture to have the same set of static keys, as resctrl depends on them too. The existing users of these static keys are implicitly also checking if the filesystem is mounted. Make the resctrl_mounted checks explicit: resctrl can keep track of whether it has been mounted once. This doesn't need to be combined with whether the arch code is context switching the CLOSID. rdt_mon_enable_key is never used just to test that resctrl is mounted, but does also have this implication. Add a resctrl_mounted to all uses of rdt_mon_enable_key. This will allow the static key changing to be moved behind resctrl_arch_ calls. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-17-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Move CLOSID/RMID matching and setting to use helpersJames Morse
When switching tasks, the CLOSID and RMID that the new task should use are stored in struct task_struct. For x86 the CLOSID known by resctrl, the value in task_struct, and the value written to the CPU register are all the same thing. MPAM's CPU interface has two different PARTIDs - one for data accesses the other for instruction fetch. Storing resctrl's CLOSID value in struct task_struct implies the arch code knows whether resctrl is using CDP. Move the matching and setting of the struct task_struct properties to use helpers. This allows arm64 to store the hardware format of the register, instead of having to convert it each time. __rdtgroup_move_task()s use of READ_ONCE()/WRITE_ONCE() ensures torn values aren't seen as another CPU may schedule the task being moved while the value is being changed. MPAM has an additional corner-case here as the PMG bits extend the PARTID space. If the scheduler sees a new-CLOSID but old-RMID, the task will dirty an RMID that the limbo code is not watching causing an inaccurate count. x86's RMID are independent values, so the limbo code will still be watching the old-RMID in this circumstance. To avoid this, arm64 needs both the CLOSID/RMID WRITE_ONCE()d together. Both values must be provided together. Because MPAM's RMID values are not unique, the CLOSID must be provided when matching the RMID. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-12-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmidJames Morse
MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be used for different control groups. This means once a CLOSID is allocated, all its monitoring ids may still be dirty, and held in limbo. Instead of allocating the first free CLOSID, on architectures where CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is enabled, search closid_num_dirty_rmid[] to find the cleanest CLOSID. The CLOSID found is returned to closid_alloc() for the free list to be updated. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-11-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Use __set_bit()/__clear_bit() instead of open codingJames Morse
The resctrl CLOSID allocator uses a single 32bit word to track which CLOSID are free. The setting and clearing of bits is open coded. Convert the existing open coded bit manipulations of closid_free_map to use __set_bit() and friends. These don't need to be atomic as this list is protected by the mutex. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-10-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Allow RMID allocation to be scoped by CLOSIDJames Morse
MPAMs RMID values are not unique unless the CLOSID is considered as well. alloc_rmid() expects the RMID to be an independent number. Pass the CLOSID in to alloc_rmid(). Use this to compare indexes when allocating. If the CLOSID is not relevant to the index, this ends up comparing the free RMID with itself, and the first free entry will be used. With MPAM the CLOSID is included in the index, so this becomes a walk of the free RMID entries, until one that matches the supplied CLOSID is found. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-8-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Access per-rmid structures by indexJames Morse
x86 systems identify traffic using the CLOSID and RMID. The CLOSID is used to lookup the control policy, the RMID is used for monitoring. For x86 these are independent numbers. Arm's MPAM has equivalent features PARTID and PMG, where the PARTID is used to lookup the control policy. The PMG in contrast is a small number of bits that are used to subdivide PARTID when monitoring. The cache-occupancy monitors require the PARTID to be specified when monitoring. This means MPAM's PMG field is not unique. There are multiple PMG-0, one per allocated CLOSID/PARTID. If PMG is treated as equivalent to RMID, it cannot be allocated as an independent number. Bitmaps like rmid_busy_llc need to be sized by the number of unique entries for this resource. Treat the combined CLOSID and RMID as an index, and provide architecture helpers to pack and unpack an index. This makes the MPAM values unique. The domain's rmid_busy_llc and rmid_ptrs[] are then sized by index, as are domain mbm_local[] and mbm_total[]. x86 can ignore the CLOSID field when packing and unpacking an index, and report as many indexes as RMID. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-7-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Track the closid with the rmidJames Morse
x86's RMID are independent of the CLOSID. An RMID can be allocated, used and freed without considering the CLOSID. MPAM's equivalent feature is PMG, which is not an independent number, it extends the CLOSID/PARTID space. For MPAM, only PMG-bits worth of 'RMID' can be allocated for a single CLOSID. i.e. if there is 1 bit of PMG space, then each CLOSID can have two monitor groups. To allow resctrl to disambiguate RMID values for different CLOSID, everything in resctrl that keeps an RMID value needs to know the CLOSID too. This will always be ignored on x86. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Xin Hao <xhao@linux.alibaba.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-6-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Move RMID allocation out of mkdir_rdt_prepare()James Morse
RMIDs are allocated for each monitor or control group directory, because each of these needs its own RMID. For control groups, rdtgroup_mkdir_ctrl_mon() later goes on to allocate the CLOSID. MPAM's equivalent of RMID is not an independent number, so can't be allocated until the CLOSID is known. An RMID allocation for one CLOSID may fail, whereas another may succeed depending on how many monitor groups a control group has. The RMID allocation needs to move to be after the CLOSID has been allocated. Move the RMID allocation out of mkdir_rdt_prepare() to occur in its caller, after the mkdir_rdt_prepare() call. This allows the RMID allocator to know the CLOSID. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-5-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16x86/resctrl: Create helper for RMID allocation and mondata dir creationJames Morse
When monitoring is supported, each monitor and control group is allocated an RMID. For control groups, rdtgroup_mkdir_ctrl_mon() later goes on to allocate the CLOSID. MPAM's equivalent of RMID are not an independent number, so can't be allocated until the CLOSID is known. An RMID allocation for one CLOSID may fail, whereas another may succeed depending on how many monitor groups a control group has. The RMID allocation needs to move to be after the CLOSID has been allocated. Move the RMID allocation and mondata dir creation to a helper. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-4-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-01-25x86/resctrl: Remove redundant variable in mbm_config_write_domain()Babu Moger
The kernel test robot reported the following warning after commit 54e35eb8611c ("x86/resctrl: Read supported bandwidth sources from CPUID"). even though the issue is present even in the original commit 92bd5a139033 ("x86/resctrl: Add interface to write mbm_total_bytes_config") which added this function. The reported warning is: $ make C=1 CHECK=scripts/coccicheck arch/x86/kernel/cpu/resctrl/rdtgroup.o ... arch/x86/kernel/cpu/resctrl/rdtgroup.c:1621:5-8: Unneeded variable: "ret". Return "0" on line 1655 Remove the local variable 'ret'. [ bp: Massage commit message, make mbm_config_write_domain() void. ] Fixes: 92bd5a139033 ("x86/resctrl: Add interface to write mbm_total_bytes_config") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202401241810.jbd8Ipa1-lkp@intel.com/ Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/202401241810.jbd8Ipa1-lkp@intel.com
2024-01-23x86/resctrl: Read supported bandwidth sources from CPUIDBabu Moger
If the BMEC (Bandwidth Monitoring Event Configuration) feature is supported, the bandwidth events can be configured. The maximum supported bandwidth bitmask can be read from CPUID: CPUID_Fn80000020_ECX_x03 [Platform QoS Monitoring Bandwidth Event Configuration] Bits Description 31:7 Reserved 6:0 Identifies the bandwidth sources that can be tracked. While at it, move the mask checking to mon_config_write() before iterating over all the domains. Also, print the valid bitmask when the user tries to configure invalid event configuration value. The CPUID details are documented in the Processor Programming Reference (PPR) Vol 1.1 for AMD Family 19h Model 11h B1 - 55901 Rev 0.25 in the Link tag. Fixes: dc2a3e857981 ("x86/resctrl: Add interface to read mbm_total_bytes_config") Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Link: https://lore.kernel.org/r/669896fa512c7451319fa5ca2fdb6f7e015b5635.1705359148.git.babu.moger@amd.com
2023-10-17x86/resctrl: Display RMID of resource groupBabu Moger
In x86, hardware uses RMID to identify a monitoring group. When a user creates a monitor group these details are not visible. These details can help resctrl debugging. Add RMID(mon_hw_id) to the monitor groups display in the resctrl interface. Users can see these details when resctrl is mounted with "-o debug" option. Add RFTYPE_MON_BASE that complements existing RFTYPE_CTRL_BASE and represents files belonging to monitoring groups. Other architectures do not use "RMID". Use the name mon_hw_id to refer to "RMID" in an effort to keep the naming generic. For example: $cat /sys/fs/resctrl/mon_groups/mon_grp1/mon_hw_id 3 Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-10-babu.moger@amd.com
2023-10-17x86/resctrl: Add support for the files of MON groups onlyBabu Moger
Files unique to monitoring groups have the RFTYPE_MON flag. When a new monitoring group is created the resctrl files with flags RFTYPE_BASE (files common to all resource groups) and RFTYPE_MON (files unique to monitoring groups) are created to support interacting with the new monitoring group. A resource group can support both monitoring and control, also termed a CTRL_MON resource group. CTRL_MON groups should get both monitoring and control resctrl files but that is not the case. Only the RFTYPE_BASE and RFTYPE_CTRL files are created for CTRL_MON groups. Ensure that files with the RFTYPE_MON flag are created for CTRL_MON groups. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-9-babu.moger@amd.com
2023-10-17x86/resctrl: Display CLOSID for resource groupBabu Moger
In x86, hardware uses CLOSID to identify a control group. When a user creates a control group this information is not visible to the user. It can help resctrl debugging. Add CLOSID(ctrl_hw_id) to the control groups display in the resctrl interface. Users can see this detail when resctrl is mounted with the "-o debug" option. Other architectures do not use "CLOSID". Use the names ctrl_hw_id to refer to "CLOSID" in an effort to keep the naming generic. For example: $cat /sys/fs/resctrl/ctrl_grp1/ctrl_hw_id 1 Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-8-babu.moger@amd.com
2023-10-17x86/resctrl: Introduce "-o debug" mount optionBabu Moger
Add "-o debug" option to mount resctrl filesystem in debug mode. When in debug mode resctrl displays files that have the new RFTYPE_DEBUG flag to help resctrl debugging. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-7-babu.moger@amd.com
2023-10-17x86/resctrl: Move default group file creation to mountBabu Moger
The default resource group and its files are created during kernel init time. Upcoming changes will make some resctrl files optional based on a mount parameter. If optional files are to be added to the default group based on the mount option, then each new file needs to be created separately and call kernfs_activate() again. Create all files of the default resource group during resctrl mount, destroyed during unmount, to avoid scattering resctrl file addition across two separate code flows. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-6-babu.moger@amd.com
2023-10-17x86/resctrl: Unwind properly from rdt_enable_ctx()Babu Moger
rdt_enable_ctx() enables the features provided during resctrl mount. Additions to rdt_enable_ctx() are required to also modify error paths of rdt_enable_ctx() callers to ensure correct unwinding if errors are encountered after calling rdt_enable_ctx(). This is error prone. Introduce rdt_disable_ctx() to refactor the error unwinding of rdt_enable_ctx() to simplify future additions. This also simplifies cleanup in rdt_kill_sb(). Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-5-babu.moger@amd.com
2023-10-17x86/resctrl: Rename rftype flags for consistencyBabu Moger
resctrl associates rftype flags with its files so that files can be chosen based on the resource, whether it is info or base, and if it is control or monitor type file. These flags use the RF_ as well as RFTYPE_ prefixes. Change the prefix to RFTYPE_ for all these flags to be consistent. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-4-babu.moger@amd.com
2023-10-17x86/resctrl: Simplify rftype flag definitionsBabu Moger
The rftype flags are bitmaps used for adding files under the resctrl filesystem. Some of these bitmap defines have one extra level of indirection which is not necessary. Drop the RF_* defines and simplify the macros. [ bp: Massage commit message. ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-3-babu.moger@amd.com
2023-10-17x86/resctrl: Add multiple tasks to the resctrl group at onceBabu Moger
The resctrl task assignment for monitor or control group needs to be done one at a time. For example: $mount -t resctrl resctrl /sys/fs/resctrl/ $mkdir /sys/fs/resctrl/ctrl_grp1 $echo 123 > /sys/fs/resctrl/ctrl_grp1/tasks $echo 456 > /sys/fs/resctrl/ctrl_grp1/tasks $echo 789 > /sys/fs/resctrl/ctrl_grp1/tasks This is not user-friendly when dealing with hundreds of tasks. Support multiple task assignment in one command with tasks ids separated by commas. For example: $echo 123,456,789 > /sys/fs/resctrl/ctrl_grp1/tasks Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-2-babu.moger@amd.com
2023-10-11x86/resctrl: Add sparse_masks file in infoFenghua Yu
Add the interface in resctrl FS to show if sparse cache allocation bit masks are supported on the platform. Reading the file returns either a "1" if non-contiguous 1s are supported and "0" otherwise. The file path is /sys/fs/resctrl/info/{resource}/sparse_masks, where {resource} can be either "L2" or "L3". Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Peter Newman <peternewman@google.com> Link: https://lore.kernel.org/r/7300535160beba41fd8aa073749ec1ee29b4621f.1696934091.git.maciej.wieczor-retman@intel.com
2023-10-11x86/resctrl: Fix remaining kernel-doc warningsMaciej Wieczor-Retman
The kernel test robot reported kernel-doc warnings here: arch/x86/kernel/cpu/resctrl/rdtgroup.c:915: warning: Function parameter or member 'of' not described in 'rdt_bit_usage_show' arch/x86/kernel/cpu/resctrl/rdtgroup.c:915: warning: Function parameter or member 'seq' not described in 'rdt_bit_usage_show' arch/x86/kernel/cpu/resctrl/rdtgroup.c:915: warning: Function parameter or member 'v' not described in 'rdt_bit_usage_show' arch/x86/kernel/cpu/resctrl/rdtgroup.c:1144: warning: Function parameter or member 'type' not described in '__rdtgroup_cbm_overlaps' arch/x86/kernel/cpu/resctrl/rdtgroup.c:1224: warning: Function parameter or member 'rdtgrp' not described in 'rdtgroup_mode_test_exclusive' arch/x86/kernel/cpu/resctrl/rdtgroup.c:1261: warning: Function parameter or member 'of' not described in 'rdtgroup_mode_write' arch/x86/kernel/cpu/resctrl/rdtgroup.c:1261: warning: Function parameter or member 'buf' not described in 'rdtgroup_mode_write' arch/x86/kernel/cpu/resctrl/rdtgroup.c:1261: warning: Function parameter or member 'nbytes' not described in 'rdtgroup_mode_write' arch/x86/kernel/cpu/resctrl/rdtgroup.c:1261: warning: Function parameter or member 'off' not described in 'rdtgroup_mode_write' arch/x86/kernel/cpu/resctrl/rdtgroup.c:1370: warning: Function parameter or member 'of' not described in 'rdtgroup_size_show' arch/x86/kernel/cpu/resctrl/rdtgroup.c:1370: warning: Function parameter or member 's' not described in 'rdtgroup_size_show' arch/x86/kernel/cpu/resctrl/rdtgroup.c:1370: warning: Function parameter or member 'v' not described in 'rdtgroup_size_show' The first two functions are missing an argument description while the other three are file callbacks and don't require a kernel-doc comment. Closes: https://lore.kernel.org/oe-kbuild-all/202310070434.mD8eRNAz-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Newman <peternewman@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20231011064843.246592-1-maciej.wieczor-retman@intel.com
2023-06-07x86/resctrl: Implement rename op for mon groupsPeter Newman
To change the resources allocated to a large group of tasks, such as an application container, a container manager must write all of the tasks' IDs into the tasks file interface of the new control group. This is challenging when the container's task list is always changing. In addition, if the container manager is using monitoring groups to separately track the bandwidth of containers assigned to the same control group, when moving a container, it must first move the container's tasks to the default monitoring group of the new control group before it can move these tasks into the container's replacement monitoring group under the destination control group. This is undesirable because it makes bandwidth usage during the move unattributable to the correct tasks and resets monitoring event counters and cache usage information for the group. Implement the rename operation only for resctrlfs monitor groups to enable users to move a monitoring group from one control group to another. This effects a change in resources allocated to all the tasks in the monitoring group while otherwise leaving the monitoring data intact. Signed-off-by: Peter Newman <peternewman@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20230419125015.693566-3-peternewman@google.com
2023-06-07x86/resctrl: Factor rdtgroup lock for multi-file opsPeter Newman
rdtgroup_kn_lock_live() can only release a kernfs reference for a single file before waiting on the rdtgroup_mutex, limiting its usefulness for operations on multiple files, such as rename. Factor the work needed to respectively break and unbreak active protection on an individual file into rdtgroup_kn_{get,put}(). No functional change. Signed-off-by: Peter Newman <peternewman@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20230419125015.693566-2-peternewman@google.com
2023-05-30x86/resctrl: Only show tasks' pid in current pid namespaceShawn Wang
When writing a task id to the "tasks" file in an rdtgroup, rdtgroup_tasks_write() treats the pid as a number in the current pid namespace. But when reading the "tasks" file, rdtgroup_tasks_show() shows the list of global pids from the init namespace, which is confusing and incorrect. To be more robust, let the "tasks" file only show pids in the current pid namespace. Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files") Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/all/20230116071246.97717-1-shawnwang@linux.alibaba.com/
2023-03-15x86/resctrl: Clear staged_config[] before and after it is usedShawn Wang
As a temporary storage, staged_config[] in rdt_domain should be cleared before and after it is used. The stale value in staged_config[] could cause an MSR access error. Here is a reproducer on a system with 16 usable CLOSIDs for a 15-way L3 Cache (MBA should be disabled if the number of CLOSIDs for MB is less than 16.) : mount -t resctrl resctrl -o cdp /sys/fs/resctrl mkdir /sys/fs/resctrl/p{1..7} umount /sys/fs/resctrl/ mount -t resctrl resctrl /sys/fs/resctrl mkdir /sys/fs/resctrl/p{1..8} An error occurs when creating resource group named p8: unchecked MSR access error: WRMSR to 0xca0 (tried to write 0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60) Call Trace: <IRQ> __flush_smp_call_function_queue+0x11d/0x170 __sysvec_call_function+0x24/0xd0 sysvec_call_function+0x89/0xc0 </IRQ> <TASK> asm_sysvec_call_function+0x16/0x20 When creating a new resource control group, hardware will be configured by the following process: rdtgroup_mkdir() rdtgroup_mkdir_ctrl_mon() rdtgroup_init_alloc() resctrl_arch_update_domains() resctrl_arch_update_domains() iterates and updates all resctrl_conf_type whose have_new_ctrl is true. Since staged_config[] holds the same values as when CDP was enabled, it will continue to update the CDP_CODE and CDP_DATA configurations. When group p8 is created, get_config_index() called in resctrl_arch_update_domains() will return 16 and 17 as the CLOSIDs for CDP_CODE and CDP_DATA, which will be translated to an invalid register - 0xca0 in this scenario. Fix it by clearing staged_config[] before and after it is used. [reinette: re-order commit tags] Fixes: 75408e43509e ("x86/resctrl: Allow different CODE/DATA configurations to be staged") Suggested-by: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/2fad13f49fbe89687fc40e9a5a61f23a28d1507a.1673988935.git.reinette.chatre%40intel.com
2023-03-08x86/resctl: fix scheduler confusion with 'current'Linus Torvalds
The implementation of 'current' on x86 is very intentionally special: it is a very common thing to look up, and it uses 'this_cpu_read_stable()' to get the current thread pointer efficiently from per-cpu storage. And the keyword in there is 'stable': the current thread pointer never changes as far as a single thread is concerned. Even if when a thread is preempted, or moved to another CPU, or even across an explicit call 'schedule()' that thread will still have the same value for 'current'. It is, after all, the kernel base pointer to thread-local storage. That's why it's stable to begin with, but it's also why it's important enough that we have that special 'this_cpu_read_stable()' access for it. So this is all done very intentionally to allow the compiler to treat 'current' as a value that never visibly changes, so that the compiler can do CSE and combine multiple different 'current' accesses into one. However, there is obviously one very special situation when the currently running thread does actually change: inside the scheduler itself. So the scheduler code paths are special, and do not have a 'current' thread at all. Instead there are _two_ threads: the previous and the next thread - typically called 'prev' and 'next' (or prev_p/next_p) internally. So this is all actually quite straightforward and simple, and not all that complicated. Except for when you then have special code that is run in scheduler context, that code then has to be aware that 'current' isn't really a valid thing. Did you mean 'prev'? Did you mean 'next'? In fact, even if then look at the code, and you use 'current' after the new value has been assigned to the percpu variable, we have explicitly told the compiler that 'current' is magical and always stable. So the compiler is quite free to use an older (or newer) value of 'current', and the actual assignment to the percpu storage is not relevant even if it might look that way. Which is exactly what happened in the resctl code, that blithely used 'current' in '__resctrl_sched_in()' when it really wanted the new process state (as implied by the name: we're scheduling 'into' that new resctl state). And clang would end up just using the old thread pointer value at least in some configurations. This could have happened with gcc too, and purely depends on random compiler details. Clang just seems to have been more aggressive about moving the read of the per-cpu current_task pointer around. The fix is trivial: just make the resctl code adhere to the scheduler rules of using the prev/next thread pointer explicitly, instead of using 'current' in a situation where it just wasn't valid. That same code is then also used outside of the scheduler context (when a thread resctl state is explicitly changed), and then we will just pass in 'current' as that pointer, of course. There is no ambiguity in that case. The fix may be trivial, but noticing and figuring out what went wrong was not. The credit for that goes to Stephane Eranian. Reported-by: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/lkml/20230303231133.1486085-1-eranian@google.com/ Link: https://lore.kernel.org/lkml/alpine.LFD.2.01.0908011214330.3304@localhost.localdomain/ Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Stephane Eranian <eranian@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-01-26x86/resctrl: Fix a silly -Wunused-but-set-variable warningBorislav Petkov (AMD)
clang correctly complains arch/x86/kernel/cpu/resctrl/rdtgroup.c:1456:6: warning: variable \ 'h' set but not used [-Wunused-but-set-variable] u32 h; ^ but it can't know whether this use is innocuous or really a problem. There's a reason why those warning switches are behind a W=1 and not enabled by default - yes, one needs to do: make W=1 CC=clang HOSTCC=clang arch/x86/kernel/cpu/resctrl/ with clang 14 in order to trigger it. I would normally not take a silly fix like that but this one is simple and doesn't make the code uglier so... Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/202301242015.kbzkVteJ-lkp@intel.com
2023-01-23x86/resctrl: Add interface to write mbm_local_bytes_configBabu Moger
The event configuration for mbm_local_bytes can be changed by the user by writing to the configuration file /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config. The event configuration settings are domain specific and will affect all the CPUs in the domain. Following are the types of events supported: ==== =========================================================== Bits Description ==== =========================================================== 6 Dirty Victims from the QOS domain to all types of memory 5 Reads to slow memory in the non-local NUMA domain 4 Reads to slow memory in the local NUMA domain 3 Non-temporal writes to non-local NUMA domain 2 Non-temporal writes to local NUMA domain 1 Reads to memory in the non-local NUMA domain 0 Reads to memory in the local NUMA domain ==== =========================================================== For example, to change the mbm_local_bytes_config to count all the non-temporal writes on domain 0, the bits 2 and 3 needs to be set which is 1100b (in hex 0xc). Run the command: $echo 0=0xc > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config To change the mbm_local_bytes to count only reads to local NUMA domain 1, the bit 0 needs to be set which 1b (in hex 0x1). Run the command: $echo 1=0x1 > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20230113152039.770054-13-babu.moger@amd.com
2023-01-23x86/resctrl: Add interface to write mbm_total_bytes_configBabu Moger
The event configuration for mbm_total_bytes can be changed by the user by writing to the file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config. The event configuration settings are domain specific and affect all the CPUs in the domain. Following are the types of events supported: ==== =========================================================== Bits Description ==== =========================================================== 6 Dirty Victims from the QOS domain to all types of memory 5 Reads to slow memory in the non-local NUMA domain 4 Reads to slow memory in the local NUMA domain 3 Non-temporal writes to non-local NUMA domain 2 Non-temporal writes to local NUMA domain 1 Reads to memory in the non-local NUMA domain 0 Reads to memory in the local NUMA domain ==== =========================================================== For example: To change the mbm_total_bytes to count only reads on domain 0, the bits 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33). Run the command: $echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config To change the mbm_total_bytes to count all the slow memory reads on domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30). Run the command: $echo 1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20230113152039.770054-12-babu.moger@amd.com
2023-01-23x86/resctrl: Add interface to read mbm_local_bytes_configBabu Moger
The event configuration can be viewed by the user by reading the configuration file /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config. The event configuration settings are domain specific and will affect all the CPUs in the domain. Following are the types of events supported: ==== =========================================================== Bits Description ==== =========================================================== 6 Dirty Victims from the QOS domain to all types of memory 5 Reads to slow memory in the non-local NUMA domain 4 Reads to slow memory in the local NUMA domain 3 Non-temporal writes to non-local NUMA domain 2 Non-temporal writes to local NUMA domain 1 Reads to memory in the non-local NUMA domain 0 Reads to memory in the local NUMA domain ==== =========================================================== By default, the mbm_local_bytes_config is set to 0x15 to count all the local event types. For example: $cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config 0=0x15;1=0x15;2=0x15;3=0x15 In this case, the event mbm_local_bytes is configured with 0x15 on domains 0 to 3. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20230113152039.770054-11-babu.moger@amd.com
2023-01-23x86/resctrl: Add interface to read mbm_total_bytes_configBabu Moger
The event configuration can be viewed by the user by reading the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config. The event configuration settings are domain specific and will affect all the CPUs in the domain. Following are the types of events supported: ==== =========================================================== Bits Description ==== =========================================================== 6 Dirty Victims from the QOS domain to all types of memory 5 Reads to slow memory in the non-local NUMA domain 4 Reads to slow memory in the local NUMA domain 3 Non-temporal writes to non-local NUMA domain 2 Non-temporal writes to local NUMA domain 1 Reads to memory in the non-local NUMA domain 0 Reads to memory in the local NUMA domain ==== =========================================================== By default, the mbm_total_bytes_config is set to 0x7f to count all the event types. For example: $cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 0=0x7f;1=0x7f;2=0x7f;3=0x7f In this case, the event mbm_total_bytes is configured with 0x7f on domains 0 to 3. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20230113152039.770054-10-babu.moger@amd.com