Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the small set of driver core and kernfs changes for 6.10-rc1.
Nothing major here at all, just a small set of changes for some driver
core apis, and minor fixups. Included in here are:
- sysfs_bin_attr_simple_read() helper added and used
- device_show_string() helper added and used
All usages of these were acked by the various maintainers. Also in
here are:
- kernfs minor cleanup
- removed unused functions
- typo fix in documentation
- pay attention to sysfs_create_link() failures in module.c finally
All of these have been in linux-next for a very long time with no
reported problems"
* tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
device property: Fix a typo in the description of device_get_child_node_count()
kernfs: mount: Remove unnecessary ‘NULL’ values from knparent
scsi: Use device_show_string() helper for sysfs attributes
platform/x86: Use device_show_string() helper for sysfs attributes
perf: Use device_show_string() helper for sysfs attributes
IB/qib: Use device_show_string() helper for sysfs attributes
hwmon: Use device_show_string() helper for sysfs attributes
driver core: Add device_show_string() helper for sysfs attributes
treewide: Use sysfs_bin_attr_simple_read() helper
sysfs: Add sysfs_bin_attr_simple_read() helper
module: don't ignore sysfs_create_link() failures
driver core: Remove unused platform_notify, platform_notify_remove
|
|
Patch series "Memory allocation profiling", v6.
Overview:
Low overhead [1] per-callsite memory allocation profiling. Not just for
debug kernels, overhead low enough to be deployed in production.
Example output:
root@moria-kvm:~# sort -rn /proc/allocinfo
127664128 31168 mm/page_ext.c:270 func:alloc_page_ext
56373248 4737 mm/slub.c:2259 func:alloc_slab_page
14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded
14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash
13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio
9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node
4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
3940352 962 mm/memory.c:4214 func:alloc_anon_folio
2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
...
Usage:
kconfig options:
- CONFIG_MEM_ALLOC_PROFILING
- CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
- CONFIG_MEM_ALLOC_PROFILING_DEBUG
adds warnings for allocations that weren't accounted because of a
missing annotation
sysctl:
/proc/sys/vm/mem_profiling
Runtime info:
/proc/allocinfo
Notes:
[1]: Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline with CONFIG_MEMCG_KMEM=n
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1)
(5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT
(6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y
(7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y
Performance overhead:
To evaluate performance we implemented an in-kernel test executing
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
affinity set to a specific CPU to minimize the noise. Below are results
from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
56 core Intel Xeon:
kmalloc pgalloc
(1 baseline) 6.764s 16.902s
(2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%)
(3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%)
(4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%)
(5 memcg) 13.388s (+97.94%) 48.460s (+186.71%)
(6 def disabled+memcg) 13.332s (+97.10%) 48.105s (+184.61%)
(7 def enabled+memcg) 13.446s (+98.78%) 54.963s (+225.18%)
Memory overhead:
Kernel size:
text data bss dec diff
(1) 26515311 18890222 17018880 62424413
(2) 26524728 19423818 16740352 62688898 264485
(3) 26524724 19423818 16740352 62688894 264481
(4) 26524728 19423818 16740352 62688898 264485
(5) 26541782 18964374 16957440 62463596 39183
Memory consumption on a 56 core Intel CPU with 125GB of memory:
Code tags: 192 kB
PageExts: 262144 kB (256MB)
SlabExts: 9876 kB (9.6MB)
PcpuExts: 512 kB (0.5MB)
Total overhead is 0.2% of total memory.
Benchmarks:
Hackbench tests run 100 times:
hackbench -s 512 -l 200 -g 15 -f 25 -P
baseline disabled profiling enabled profiling
avg 0.3543 0.3559 (+0.0016) 0.3566 (+0.0023)
stdev 0.0137 0.0188 0.0077
hackbench -l 10000
baseline disabled profiling enabled profiling
avg 6.4218 6.4306 (+0.0088) 6.5077 (+0.0859)
stdev 0.0933 0.0286 0.0489
stress-ng tests:
stress-ng --class memory --seq 4 -t 60
stress-ng --class cpu --seq 4 -t 60
Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/
[2] https://lore.kernel.org/all/20240306182440.2003814-1-surenb@google.com/
This patch (of 37):
The next patch drops vmalloc.h from a system header in order to fix a
circular dependency; this adds it to all the files that were pulling it in
implicitly.
[kent.overstreet@linux.dev: fix arch/alpha/lib/memcpy.c]
Link: https://lkml.kernel.org/r/20240327002152.3339937-1-kent.overstreet@linux.dev
[surenb@google.com: fix arch/x86/mm/numa_32.c]
Link: https://lkml.kernel.org/r/20240402180933.1663992-1-surenb@google.com
[kent.overstreet@linux.dev: a few places were depending on sizes.h]
Link: https://lkml.kernel.org/r/20240404034744.1664840-1-kent.overstreet@linux.dev
[arnd@arndb.de: fix mm/kasan/hw_tags.c]
Link: https://lkml.kernel.org/r/20240404124435.3121534-1-arnd@kernel.org
[surenb@google.com: fix arc build]
Link: https://lkml.kernel.org/r/20240405225115.431056-1-surenb@google.com
Link: https://lkml.kernel.org/r/20240321163705.3067592-1-surenb@google.com
Link: https://lkml.kernel.org/r/20240321163705.3067592-2-surenb@google.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Deduplicate ->read() callbacks of bin_attributes which are backed by a
simple buffer in memory:
Use the newly introduced sysfs_bin_attr_simple_read() helper instead,
either by referencing it directly or by declaring such bin_attributes
with BIN_ATTR_SIMPLE_RO() or BIN_ATTR_SIMPLE_ADMIN_RO().
Aside from a reduction of LoC, this shaves off a few bytes from vmlinux
(304 bytes on an x86_64 allyesconfig).
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Zhi Wang <zhiwang@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/92ee0a0e83a5a3f3474845db6c8575297698933a.1712410202.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull drm fixes from Dave Airlie:
"fbdev:
- fix uninit var in error path
shmem:
- revert unGPLing an export
i915:
- Don't use stolen memory or BAR mappings for ring buffers with LLC
- Add inverted backlight quirk for HP 14-r206nv
- Fix GSI offset for MCR lookups
- GVT fixes (memleak, debugfs attributes, kconfig, typos)
amdgpu:
- SMU 13 fixes
- Enable TMZ for GC 10.3.6
- Misc display fixes
- Buddy allocator fixes
- GC 11 fixes
- S0ix fix
- INFO IOCTL queries for GC 11
- VCN harvest fixes for SR-IOV
- UMC 8.10 RAS fixes
- Don't restrict bpc to 8
- NBIO 7.5 fix
- Allow freesync on PCon for more devices
amdkfd:
- SDMA fix
- Illegal memory access fix"
* tag 'drm-next-2023-03-03-1' of git://anongit.freedesktop.org/drm/drm: (45 commits)
drm/amdgpu/vcn: fix compilation issue with legacy gcc
drm/amd/display: Extend Freesync over PCon support for more devices
Revert "drm/amd/display: Do not set DRR on pipe commit"
drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes
drm/amd/display: Ext displays with dock can't recognized after resume
drm/amdgpu: fix ttm_bo calltrace warning in psp_hw_fini
drm/amdgpu: remove unused variable ring
drm/amd/display: fix dm irq error message in gpu recover
drm/amd: Fix initialization for nbio 7.5.1
drm/amd/display: Don't restrict bpc to 8 bpc
drm/amdgpu: Make umc_v8_10_convert_error_address static and remove unused variable
drm/radeon: Fix eDP for single-display iMac11,2
drm/shmem-helper: Revert accidental non-GPL export
drm: omapdrm: Do not use helper unininitialized in omap_fbdev_init()
drm/amd/pm: downgrade log level upon SMU IF version mismatch
drm/amdgpu: Add ecc info query interface for umc v8_10
drm/amdgpu: Add convert_error_address function for umc v8_10
drm/amdgpu: add bad_page_threshold check in ras_eeprom_check_err
drm/amdgpu: change default behavior of bad_page_threshold parameter
drm/amdgpu: exclude duplicate pages from UMC RAS UE count
...
|
|
There is a spelling mistake in a literal string. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20230202125018.285523-1-colin.i.king@gmail.com
|
|
One-element arrays are deprecated, and we are replacing them with
flexible array members instead. So, replace one-element array with
flexible-array member in struct gvt_firmware_header and refactor the
rest of the code accordingly.
Additionally, previous implementation was allocating 8 bytes more than
required to represent firmware_header + cfg_space data + mmio data.
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].
To make reviewing this patch easier, I'm pasting before/after struct
sizes.
pahole -C gvt_firmware_header before/drivers/gpu/drm/i915/gvt/firmware.o
struct gvt_firmware_header {
u64 magic; /* 0 8 */
u32 crc32; /* 8 4 */
u32 version; /* 12 4 */
u64 cfg_space_size; /* 16 8 */
u64 cfg_space_offset; /* 24 8 */
u64 mmio_size; /* 32 8 */
u64 mmio_offset; /* 40 8 */
unsigned char data[1]; /* 48 1 */
/* size: 56, cachelines: 1, members: 8 */
/* padding: 7 */
/* last cacheline: 56 bytes */
};
pahole -C gvt_firmware_header after/drivers/gpu/drm/i915/gvt/firmware.o
struct gvt_firmware_header {
u64 magic; /* 0 8 */
u32 crc32; /* 8 4 */
u32 version; /* 12 4 */
u64 cfg_space_size; /* 16 8 */
u64 cfg_space_offset; /* 24 8 */
u64 mmio_size; /* 32 8 */
u64 mmio_offset; /* 40 8 */
unsigned char data[]; /* 48 0 */
/* size: 48, cachelines: 1, members: 8 */
/* last cacheline: 48 bytes */
};
As you can see the additional byte of the fake-flexible array (data[1])
forced the compiler to pad the struct but those bytes aren't actually used
as first & last bytes (of both cfg_space and mmio) are controlled by the
<>_size and <>_offset members present in the gvt_firmware_header struct.
Link: https://github.com/KSPP/linux/issues/79
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836 [1]
Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/Y6Eu2604cqtryP4g@mail.google.com
|
|
struct gvt_firmware_header has a crc32 member in which all members that
come after the that field are used to calculate it. The previous
implementation added the value '4' (crc32's u32 size) to calculate the
crc32_start offset which came across as a bit cryptic until you take a
deeper look at the struct.
This patch changes crc32_start offset to the 'version' member which is
the first member of the struct gvt_firmware_header after crc32.
It's worth mentioning that doing a build before/after this patch results
in no binary output differences.
Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20221030033628.GA279284@mail.google.com
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
The code of saving initial HW state snapshot has been moved into i915.
Let the GVT-g core logic use that snapshot.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Vivi Rodrigo <rodrigo.vivi@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20220407071945.72148-4-zhi.a.wang@intel.com
|
|
Using struct drm_device.pdev is deprecated. Convert i915 to struct
drm_device.dev. No functional changes.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210128133127.2311-4-tzimmermann@suse.de
|
|
drm-intel-next-queued
gvt-next-2020-03-10
- Fix CFL dmabuf display after vfio edid enabling (Tina)
- Clean up scan non-priv batch debugfs entry (Chris)
- Use intel engines initialized in gvt, cleanup previous ring id (Chris)
- Use intel_gt instead (Chris)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
From: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200310081928.GG28483@zhen-hp.sh.intel.com
|
|
Teach gvt to use intel_gt directly as it currently assumes direct HW
access.
[Zhenyu: rebase, fix compiling]
Cc: Ding Zhuocheng <zhuocheng.ding@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200304032307.2983-3-zhenyuw@linux.intel.com
|
|
If the module happens to be loaded later at runtime there is a chance
memory is already fragmented enough to fail allocation of firmware
blob storage and consequently GVT init. Since it doesn't seem to be
necessary to have the blob contiguous, use vmalloc() instead to avoid
the issue.
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1579723824-25711-1-git-send-email-igor.druzhinin@citrix.com
|
|
Only a few call sites remain which have been converted to uncore mmio
accessors and so the macro can be removed.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190611104548.30545-5-tvrtko.ursulin@linux.intel.com
|
|
Compute the offset of the end of the crc32 field using offsetofend()
rather than open-coding.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
This patch add a function intel_gvt_for_each_tracked_mmio() to
iterate each tracked mmio. The caller don't be aware of how the
tracked mmios are presented internally.
v2: remove snapshot_hw_mmio_registers().
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
MMIO block with tracked mmio, is introduced for the sake of performance
of searching tracked mmio. All the tracked mmio needs to get the initial
value from the HW state during vGPU being created. This patch is to
initialize the tracked registers in MMIO block with the HW state.
v2: Add "Fixes:" line for this patch (Zhenyu)
Fixes: 65f9f6febf12 ("drm/i915/gvt: Optimize MMIO register handling for some large MMIO blocks")
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
The size, length, addr_mask fields actually are not necessary. Every
tracked mmio has DWORD size, and addr_mask is a legacy field.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
Firmware loading interface for GVT-g golden HW state has been broken
before. This patch fixes GVT-g firmware loading interface. A user should
apply this patch if he wants to load GVT-g golden HW state from firmware
interface.
Fixes: 579cea5 ("drm/i915/gvt: golden virtual HW state management")
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
for a handled mmio reg, its default value is read from hardware, while
for an unhandled mmio regs, its default value would be random if not
explicitly set to 0
Signed-off-by: Zhao Yan <yan.y.zhao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
As now gvt init is late after MMIO initialization, use normal MMIO
read function for initial firmware exposure if no available firmware
loaded.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
Add proper __iomem annotation for pointers obtained via ioremap().
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
i915 core should only call functions and structures exposed through
intel_gvt.h. Remove internal gvt.h and i915_pvinfo.h.
Change for internal intel_gvt structure as private handler which
not requires to expose gvt internal structure for i915 core.
v2: Fix per Chris's comment
- carefully handle dev_priv->gvt assignment
- add necessary bracket for macro helper
- forward declartion struct intel_gvt
- keep free operation within same file handling alloc
v3: fix use after free and remove intel_gvt.initialized
v4: change to_gvt() to an inline
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
Each vGPU expects a golden virtual HW state, which is just the state after
system is freshly powered on. GVT-g will try to load the golden virtual HW
state via kernel firmware interface.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|