Age | Commit message (Collapse) | Author |
|
If post action is not supported, eg. ignore_flow_level is not
supported, don't offload post action rule. Otherwise, will hit
panic [1].
Fix it by checking if post action table is valid or not.
[1]
[445537.863880] BUG: unable to handle page fault for address: ffffffffffffffb1
[445537.864617] #PF: supervisor read access in kernel mode
[445537.865244] #PF: error_code(0x0000) - not-present page
[445537.865860] PGD 70683a067 P4D 70683a067 PUD 70683c067 PMD 0
[445537.866497] Oops: 0000 [#1] PREEMPT SMP NOPTI
[445537.867077] CPU: 19 PID: 248742 Comm: tc Kdump: loaded Tainted: G O 6.5.0+ #1
[445537.867888] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[445537.868834] RIP: 0010:mlx5e_tc_post_act_add+0x51/0x130 [mlx5_core]
[445537.869635] Code: c0 0d 00 00 e8 20 96 c6 d3 48 85 c0 0f 84 e5 00 00 00 c7 83 b0 01 00 00 00 00 00 00 49 89 c5 31 c0 31 d2 66 89 83 b4 01 00 00 <49> 8b 44 24 10 83 23 df 83 8b d8 01 00 00 04 48 89 83 c0 01 00 00
[445537.871318] RSP: 0018:ffffb98741cef428 EFLAGS: 00010246
[445537.871962] RAX: 0000000000000000 RBX: ffff8df341167000 RCX: 0000000000000001
[445537.872704] RDX: 0000000000000000 RSI: ffffffff954844e1 RDI: ffffffff9546e9cb
[445537.873430] RBP: ffffb98741cef448 R08: 0000000000000020 R09: 0000000000000246
[445537.874160] R10: 0000000000000000 R11: ffffffff943f73ff R12: ffffffffffffffa1
[445537.874893] R13: ffff8df36d336c20 R14: ffffffffffffffa1 R15: ffff8df341167000
[445537.875628] FS: 00007fcd6564f800(0000) GS:ffff8dfa9ea00000(0000) knlGS:0000000000000000
[445537.876425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[445537.877090] CR2: ffffffffffffffb1 CR3: 00000003b5884001 CR4: 0000000000770ee0
[445537.877832] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[445537.878564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[445537.879300] PKRU: 55555554
[445537.879797] Call Trace:
[445537.880263] <TASK>
[445537.880713] ? show_regs+0x6e/0x80
[445537.881232] ? __die+0x29/0x70
[445537.881731] ? page_fault_oops+0x85/0x160
[445537.882276] ? search_exception_tables+0x65/0x70
[445537.882852] ? kernelmode_fixup_or_oops+0xa2/0x120
[445537.883432] ? __bad_area_nosemaphore+0x18b/0x250
[445537.884019] ? bad_area_nosemaphore+0x16/0x20
[445537.884566] ? do_kern_addr_fault+0x8b/0xa0
[445537.885105] ? exc_page_fault+0xf5/0x1c0
[445537.885623] ? asm_exc_page_fault+0x2b/0x30
[445537.886149] ? __kmem_cache_alloc_node+0x1df/0x2a0
[445537.886717] ? mlx5e_tc_post_act_add+0x51/0x130 [mlx5_core]
[445537.887431] ? mlx5e_tc_post_act_add+0x30/0x130 [mlx5_core]
[445537.888172] alloc_flow_post_acts+0xfb/0x1c0 [mlx5_core]
[445537.888849] parse_tc_actions+0x582/0x5c0 [mlx5_core]
[445537.889505] parse_tc_fdb_actions+0xd7/0x1f0 [mlx5_core]
[445537.890175] __mlx5e_add_fdb_flow+0x1ab/0x2b0 [mlx5_core]
[445537.890843] mlx5e_add_fdb_flow+0x56/0x120 [mlx5_core]
[445537.891491] ? debug_smp_processor_id+0x1b/0x30
[445537.892037] mlx5e_tc_add_flow+0x79/0x90 [mlx5_core]
[445537.892676] mlx5e_configure_flower+0x305/0x450 [mlx5_core]
[445537.893341] mlx5e_rep_setup_tc_cls_flower+0x3d/0x80 [mlx5_core]
[445537.894037] mlx5e_rep_setup_tc_cb+0x5c/0xa0 [mlx5_core]
[445537.894693] tc_setup_cb_add+0xdc/0x220
[445537.895177] fl_hw_replace_filter+0x15f/0x220 [cls_flower]
[445537.895767] fl_change+0xe87/0x1190 [cls_flower]
[445537.896302] tc_new_tfilter+0x484/0xa50
Fixes: f0da4daa3413 ("net/mlx5e: Refactor ct to use post action infrastructure")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Automatic Verification <verifier@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shachar Kagan <skagan@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
|
|
Due to the cited patch, devlink health commands take devlink lock and
this may result in deadlock for mlx5e_tx_reporter as it takes local
state_lock before calling devlink health report and on the other hand
devlink health commands such as diagnose for same reporter take local
state_lock after taking devlink lock (see kernel log below).
To fix it, remove local state_lock from mlx5e_tx_timeout_work() before
calling devlink_health_report() and take care to cancel the work before
any call to close channels, which may free the SQs that should be
handled by the work. Before cancel_work_sync(), use current_work() to
check we are not calling it from within the work, as
mlx5e_tx_timeout_work() itself may close the channels and reopen as part
of recovery flow.
While removing state_lock from mlx5e_tx_timeout_work() keep rtnl_lock to
ensure no change in netdev->real_num_tx_queues, but use rtnl_trylock()
and a flag to avoid deadlock by calling cancel_work_sync() before
closing the channels while holding rtnl_lock too.
Kernel log:
======================================================
WARNING: possible circular locking dependency detected
6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Not tainted
------------------------------------------------------
kworker/u16:2/65 is trying to acquire lock:
ffff888122f6c2f8 (&devlink->lock_key#2){+.+.}-{3:3}, at: devlink_health_report+0x2f1/0x7e0
but task is already holding lock:
ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&priv->state_lock){+.+.}-{3:3}:
__mutex_lock+0x12c/0x14b0
mlx5e_rx_reporter_diagnose+0x71/0x700 [mlx5_core]
devlink_nl_cmd_health_reporter_diagnose_doit+0x212/0xa50
genl_family_rcv_msg_doit+0x1e9/0x2f0
genl_rcv_msg+0x2e9/0x530
netlink_rcv_skb+0x11d/0x340
genl_rcv+0x24/0x40
netlink_unicast+0x438/0x710
netlink_sendmsg+0x788/0xc40
sock_sendmsg+0xb0/0xe0
__sys_sendto+0x1c1/0x290
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
-> #0 (&devlink->lock_key#2){+.+.}-{3:3}:
__lock_acquire+0x2c8a/0x6200
lock_acquire+0x1c1/0x550
__mutex_lock+0x12c/0x14b0
devlink_health_report+0x2f1/0x7e0
mlx5e_health_report+0xc9/0xd7 [mlx5_core]
mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core]
mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core]
process_one_work+0x7c2/0x1340
worker_thread+0x59d/0xec0
kthread+0x28f/0x330
ret_from_fork+0x1f/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&priv->state_lock);
lock(&devlink->lock_key#2);
lock(&priv->state_lock);
lock(&devlink->lock_key#2);
*** DEADLOCK ***
4 locks held by kworker/u16:2/65:
#0: ffff88811a55b138 ((wq_completion)mlx5e#2){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340
#1: ffff888101de7db8 ((work_completion)(&priv->tx_timeout_work)){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340
#2: ffffffff84ce8328 (rtnl_mutex){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x53/0x280 [mlx5_core]
#3: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core]
stack backtrace:
CPU: 1 PID: 65 Comm: kworker/u16:2 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x7d
check_noncircular+0x278/0x300
? print_circular_bug+0x460/0x460
? find_held_lock+0x2d/0x110
? __stack_depot_save+0x24c/0x520
? alloc_chain_hlocks+0x228/0x700
__lock_acquire+0x2c8a/0x6200
? register_lock_class+0x1860/0x1860
? kasan_save_stack+0x1e/0x40
? kasan_set_free_info+0x20/0x30
? ____kasan_slab_free+0x11d/0x1b0
? kfree+0x1ba/0x520
? devlink_health_do_dump.part.0+0x171/0x3a0
? devlink_health_report+0x3d5/0x7e0
lock_acquire+0x1c1/0x550
? devlink_health_report+0x2f1/0x7e0
? lockdep_hardirqs_on_prepare+0x400/0x400
? find_held_lock+0x2d/0x110
__mutex_lock+0x12c/0x14b0
? devlink_health_report+0x2f1/0x7e0
? devlink_health_report+0x2f1/0x7e0
? mutex_lock_io_nested+0x1320/0x1320
? trace_hardirqs_on+0x2d/0x100
? bit_wait_io_timeout+0x170/0x170
? devlink_health_do_dump.part.0+0x171/0x3a0
? kfree+0x1ba/0x520
? devlink_health_do_dump.part.0+0x171/0x3a0
devlink_health_report+0x2f1/0x7e0
mlx5e_health_report+0xc9/0xd7 [mlx5_core]
mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core]
? lockdep_hardirqs_on_prepare+0x400/0x400
? mlx5e_reporter_tx_err_cqe+0x1b0/0x1b0 [mlx5_core]
? mlx5e_tx_reporter_timeout_dump+0x70/0x70 [mlx5_core]
? mlx5e_tx_reporter_dump_sq+0x320/0x320 [mlx5_core]
? mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core]
? mutex_lock_io_nested+0x1320/0x1320
? process_one_work+0x70f/0x1340
? lockdep_hardirqs_on_prepare+0x400/0x400
? lock_downgrade+0x6e0/0x6e0
mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core]
process_one_work+0x7c2/0x1340
? lockdep_hardirqs_on_prepare+0x400/0x400
? pwq_dec_nr_in_flight+0x230/0x230
? rwlock_bug.part.0+0x90/0x90
worker_thread+0x59d/0xec0
? process_one_work+0x1340/0x1340
kthread+0x28f/0x330
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Fixes: c90005b5f75c ("devlink: Hold the instance lock in health callbacks")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
IPsec FDB offload can only work with FW steering as of now,
disable the cap upon non FW steering.
And since the IPSec cap is dynamic now based on steering mode.
Cleanup the resources if they exist instead of checking the
IPsec cap again.
Fixes: edd8b295f9e2 ("Merge branch 'mlx5-ipsec-packet-offload-support-in-eswitch-mode'")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
After IPSec TX tables are destroyed, the flow rules in TC rhashtable,
which have the destination to IPSec, are restored to the original
one, the uplink.
However, when the device is in switchdev mode and unload driver with
IPSec rules configured, TC rhashtable cleanup is done before IPSec
cleanup, which means tc_ht->tbl is already freed when walking TC
rhashtable, in order to restore the destination. So add the checking
before walking to avoid unexpected behavior.
Fixes: d1569537a837 ("net/mlx5e: Modify and restore TC rules for IPSec TX rules")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Currently eswitch mode_lock is so heavy, for example, it's locked
during the whole process of the mode change, which may need to hold
other locks. As the mode_lock is also used by IPSec to block mode and
encap change now, it is easy to cause lock dependency.
Since some of protections are also done by devlink lock, the eswitch
mode_lock is not needed at those places, and thus the possibility of
lockdep issue is reduced.
Fixes: c8e350e62fc5 ("net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
IPsec NAT-T packets are UDP encapsulated packets over ESP normal ones.
In case they arrive to RX, the SPI and ESP are located in inner header,
while the check was performed on outer header instead.
That wrong check caused to the situation where received rekeying request
was missed and caused to rekey timeout, which "compensated" this failure
by completing rekeying.
Fixes: d65954934937 ("net/mlx5e: Support IPsec NAT-T functionality")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
After IPsec decryption it isn't enough to only check the IPsec syndrome
but need to also check the ASO syndrome in order to verify that the
operation was actually successful.
Verify that both syndromes are actually zero and in case not drop the
packet and increment the appropriate flow counter for the drop reason.
Fixes: 6b5c45e16e43 ("net/mlx5e: Configure IPsec packet offload flow steering")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
After previous commit, which unified various IPsec creation modes,
there is no need to have struct mlx5e_ipsec_rx exposed in global
IPsec header. Move it to ipsec_fs.c to be placed together with
already existing struct mlx5e_ipsec_tx.
Fixes: 1762f132d542 ("net/mlx5e: Support IPsec packet offload for RX in switchdev mode")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Change normal IPsec flow to use the same creation/destruction functions
for status flow table as that of ESW, which first of all refines the
code to have less code duplication.
And more importantly, the ESW status table handles IPsec syndrome
checks at steering by HW, which is more efficient than the previous
behaviour we had where it was copied to WQE meta data and checked
by the driver.
Fixes: 1762f132d542 ("net/mlx5e: Support IPsec packet offload for RX in switchdev mode")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
According to RFC4303, section "3.3.3. Sequence Number Generation",
the first packet sent using a given SA will contain a sequence
number of 1.
However if user didn't set seq/oseq, the HW used zero as first sequence
packet number. Such misconfiguration causes to drop of first packet
if replay window protection was enabled in SA.
To fix it, set sequence number to be at least 1.
Fixes: 7db21ef4566e ("net/mlx5e: Set IPsec replay sequence numbers")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Users can configure IPsec replay window size, but mlx5 driver didn't
honor their choice and set always 32bits. Fix assignment logic to
configure right size from the beginning.
Fixes: 7db21ef4566e ("net/mlx5e: Set IPsec replay sequence numbers")
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
The status bits of register MAC_FPE_CTRL_STS are clear on read. Using
32-bit read for MAC_FPE_CTRL_STS in dwmac5_fpe_configure() and
dwmac5_fpe_send_mpacket() clear the status bits. Then the stmmac interrupt
handler missing FPE event status and leads to FPE handshaking failure and
retries.
To avoid clear status bits of MAC_FPE_CTRL_STS in dwmac5_fpe_configure()
and dwmac5_fpe_send_mpacket(), add fpe_csr to stmmac_fpe_cfg structure to
cache the control bits of MAC_FPE_CTRL_STS and to avoid reading
MAC_FPE_CTRL_STS in those methods.
Fixes: 5a5586112b92 ("net: stmmac: support FPE link partner hand-shaking procedure")
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Jianheng Zhang <Jianheng.Zhang@synopsys.com>
Link: https://lore.kernel.org/r/CY5PR12MB637225A7CF529D5BE0FBE59CBF81A@CY5PR12MB6372.namprd12.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
coalescing
The current adaptive interrupt coalescing code updates only rx
packet stats for dim algorithm. This patch also updates tx packet
stats which will be useful when there is only tx traffic.
Also moved configuring hardware adaptive interrupt setting to
driver dim callback.
Fixes: 6e144b47f560 ("octeontx2-pf: Add support for adaptive interrupt coalescing")
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://lore.kernel.org/r/20231201053330.3903694-1-sumang@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Probe of Sohard Arcnet cards fails,
if 2 or more cards are installed in a system.
See kernel log:
[ 2.759203] arcnet: arcnet loaded
[ 2.763648] arcnet:com20020: COM20020 chipset support (by David Woodhouse et al.)
[ 2.770585] arcnet:com20020_pci: COM20020 PCI support
[ 2.772295] com20020 0000:02:00.0: enabling device (0000 -> 0003)
[ 2.772354] (unnamed net_device) (uninitialized): PLX-PCI Controls
...
[ 3.071301] com20020 0000:02:00.0 arc0-0 (uninitialized): PCI COM20020: station FFh found at F080h, IRQ 101.
[ 3.071305] com20020 0000:02:00.0 arc0-0 (uninitialized): Using CKP 64 - data rate 2.5 Mb/s
[ 3.071534] com20020 0000:07:00.0: enabling device (0000 -> 0003)
[ 3.071581] (unnamed net_device) (uninitialized): PLX-PCI Controls
...
[ 3.369501] com20020 0000:07:00.0: Led pci:green:tx:0-0 renamed to pci:green:tx:0-0_1 due to name collision
[ 3.369535] com20020 0000:07:00.0: Led pci:red:recon:0-0 renamed to pci:red:recon:0-0_1 due to name collision
[ 3.370586] com20020 0000:07:00.0 arc0-0 (uninitialized): PCI COM20020: station E1h found at C000h, IRQ 35.
[ 3.370589] com20020 0000:07:00.0 arc0-0 (uninitialized): Using CKP 64 - data rate 2.5 Mb/s
[ 3.370608] com20020: probe of 0000:07:00.0 failed with error -5
commit 5ef216c1f848 ("arcnet: com20020-pci: add rotary index support")
changes the device name of all COM20020 based PCI cards,
even if only some cards support this:
snprintf(dev->name, sizeof(dev->name), "arc%d-%d", dev->dev_id, i);
The error happens because all Sohard Arcnet cards would be called arc0-0,
since the Sohard Arcnet cards don't have a PLX rotary coder.
I.e. EAE Arcnet cards have a PLX rotary coder,
which sets the first decimal, ensuring unique devices names.
This patch adds two new card feature flags to indicate
which cards support LEDs and the PLX rotary coder.
For EAE based cards the names still depend on the PLX rotary coder
(untested, since missing EAE hardware).
For Sohard based cards, this patch will result in devices
being called arc0, arc1, ... (tested).
Signed-off-by: Thomas Reichinger <thomas.reichinger@sohard.de>
Fixes: 5ef216c1f848 ("arcnet: com20020-pci: add rotary index support")
Link: https://lore.kernel.org/r/20231130113503.6812-1-thomas.reichinger@sohard.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In some potential instances the reference count on struct packet_sock
could be saturated and cause overflows which gets the kernel a bit
confused. To prevent this, move to a 64-bit atomic reference count on
64-bit architectures to prevent the possibility of this type to overflow.
Because we can not handle saturation, using refcount_t is not possible
in this place. Maybe someday in the future if it changes it could be
used. Also, instead of using plain atomic64_t, use atomic_long_t instead.
32-bit machines tend to be memory-limited (i.e. anything that increases
a reference uses so much memory that you can't actually get to 2**32
references). 32-bit architectures also tend to have serious problems
with 64-bit atomics. Hence, atomic_long_t is the more natural solution.
Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk>
Co-developed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231201131021.19999-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd fixes from Jason Gunthorpe:
- A small fix for the dirty tracking self test to fail correctly if the
code is buggy
- Fix a tricky syzkaller race UAF with object reference counting
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Do not UAF during iommufd_put_object()
iommufd: Add iommufd_ctx to iommufd_put_object()
iommufd/selftest: Fix _test_mock_dirty_bitmaps()
|
|
Pull vdpa fixes from Michael Tsirkin:
"Fixes in mlx5 and pds drivers"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
pds_vdpa: set features order
pds_vdpa: clear config callback when status goes to 0
pds_vdpa: fix up format-truncation complaint
vdpa/mlx5: preserve CVQ vringh index
|
|
devm_hwmon_device_register_with_groups() returns an error pointer upon
failure. Check its return value for errors.
Compile-tested only.
Fixes: 1a218d312e65 ("platform/mellanox: mlxbf-pmc: Add Mellanox BlueField PMC driver")
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Suggested-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20231201055447.2356001-1-chentao@kylinos.cn
[ij: split the change into two]
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
devm_kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.
Compile-tested only.
Fixes: 1a218d312e65 ("platform/mellanox: mlxbf-pmc: Add Mellanox BlueField PMC driver")
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Suggested-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20231201055447.2356001-1-chentao@kylinos.cn
[ij: split the change into two]
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
The secure boot state of the BlueField SoC is represented by two bits:
0 = production state
1 = secure boot enabled
2 = non-secure (secure boot disabled)
3 = RMA state
There is also a single bit to indicate whether production keys or
development keys are being used when secure boot is enabled.
This single bit (specified by MLXBF_BOOTCTL_SB_DEV_MASK) only has
meaning if secure boot state equals 1 (secure boot enabled).
The secure boot states are as follows:
- “GA secured” is when secure boot is enabled with official production keys.
- “Secured (development)” is when secure boot is enabled with development keys.
Without this fix “GA Secured” is displayed on development cards which is
misleading. This patch updates the logic in "lifecycle_state_show()" to
handle the case where the SoC is configured for secure boot and is using
development keys.
Fixes: 79e29cb8fbc5c ("platform/mellanox: Add bootctl driver for Mellanox BlueField Soc")
Reviewed-by: Khalil Blaiech <kblaiech@nvidia.com>
Signed-off-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/20231130183515.17214-1-davthompson@nvidia.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Since commit 0ec7731655de ("regmap: Ensure range selector registers
are updated after cache sync") opening pcm512x based soundcards fail
with EINVAL and dmesg shows sync cache and pm_runtime_get errors:
[ 228.794676] pcm512x 1-004c: Failed to sync cache: -22
[ 228.794740] pcm512x 1-004c: ASoC: error at snd_soc_pcm_component_pm_runtime_get on pcm512x.1-004c: -22
This is caused by the cache check result leaking out into the
regcache_sync return value.
Fix this by making the check local-only, as the comment above the
regcache_read call states a non-zero return value means there's
nothing to do so the return value should not be altered.
Fixes: 0ec7731655de ("regmap: Ensure range selector registers are updated after cache sync")
Cc: stable@vger.kernel.org
Signed-off-by: Matthias Reichl <hias@horus.com>
Link: https://lore.kernel.org/r/20231203222216.96547-1-hias@horus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Delay loops in r8152 should break out if RTL8152_INACCESSIBLE is set
so that they don't delay too long if the device becomes
inaccessible. Add the break to the loop in r8153_aldps_en().
Fixes: 4214cc550bf9 ("r8152: check if disabling ALDPS is finished")
Reviewed-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Delay loops in r8152 should break out if RTL8152_INACCESSIBLE is set
so that they don't delay too long if the device becomes
inaccessible. Add the break to the loop in r8153_pre_firmware_1().
Fixes: 9370f2d05a2a ("r8152: support request_firmware for RTL8153")
Reviewed-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Delay loops in r8152 should break out if RTL8152_INACCESSIBLE is set
so that they don't delay too long if the device becomes
inaccessible. Add the break to the loop in
r8156b_wait_loading_flash().
Fixes: 195aae321c82 ("r8152: support new chips")
Reviewed-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previous commits added checks for RTL8152_INACCESSIBLE in the loops in
the driver. There are still a few more that keep tripping the driver
up in error cases and make things take longer than they should. Add
those in.
All the loops that are part of this commit existed in some form or
another since the r8152 driver was first introduced, though
RTL8152_INACCESSIBLE was known as RTL8152_UNPLUG before commit
715f67f33af4 ("r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE")
Fixes: ac718b69301c ("net/usb: new driver for RTL8152")
Reviewed-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As of commit d9962b0d4202 ("r8152: Block future register access if
register access fails") there is a race condition that can happen
between the USB device reset thread and napi_enable() (not) getting
called during rtl8152_open(). Specifically:
* While rtl8152_open() is running we get a register access error
that's _not_ -ENODEV and queue up a USB reset.
* rtl8152_open() exits before calling napi_enable() due to any reason
(including usb_submit_urb() returning an error).
In that case:
* Since the USB reset is perform in a separate thread asynchronously,
it can run at anytime USB device lock is not held - even before
rtl8152_open() has exited with an error and caused __dev_open() to
clear the __LINK_STATE_START bit.
* The rtl8152_pre_reset() will notice that the netif_running() returns
true (since __LINK_STATE_START wasn't cleared) so it won't exit
early.
* rtl8152_pre_reset() will then hang in napi_disable() because
napi_enable() was never called.
We can fix the race by making sure that the r8152 reset routines don't
run at the same time as we're opening the device. Specifically we need
the reset routines in their entirety rely on the return value of
netif_running(). The only way to reliably depend on that is for them
to hold the rntl_lock() mutex for the duration of reset.
Grabbing the rntl_lock() mutex for the duration of reset seems like a
long time, but reset is not expected to be common and the rtnl_lock()
mutex is already held for long durations since the core grabs it
around the open/close calls.
Fixes: d9962b0d4202 ("r8152: Block future register access if register access fails")
Reviewed-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
Pull smb client fixes from Steve French:
- Two fallocate fixes
- Fix warnings from new gcc
- Two symlink fixes
* tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client, common: fix fortify warnings
cifs: Fix FALLOC_FL_INSERT_RANGE by setting i_size after EOF moved
cifs: Fix FALLOC_FL_ZERO_RANGE by setting i_size if EOF moved
smb: client: report correct st_size for SMB and NFS symlinks
smb: client: fix missing mode bits for SMB symlinks
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fix from Takashi Sakamoto:
"A single patch to fix long-standing issue of memory leak at failure of
device registration for fw_unit. We rarely encounter the issue, but it
should be applied to stable releases, since it fixes inappropriate API
usage"
* tag 'firewire-fixes-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: fix possible memory leak in create_units()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix corruption of f0/vs0 during FP/Vector save, seen as userspace
crashes when using io-uring workers (in particular with MariaDB)
- Fix KVM_RUN potentially clobbering all host userspace FP/Vector
registers
Thanks to Timothy Pearson, Jens Axboe, and Nicholas Piggin.
* tag 'powerpc-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Fix KVM_RUN clobbering FP/VEC user registers
powerpc: Don't clobber f0/vs0 during fp|altivec register save
|
|
Pull vfio fixes from Alex Williamson:
- Fix the lifecycle of a mutex in the pds variant driver such that a
reset prior to opening the device won't find it uninitialized.
Implement the release path to symmetrically destroy the mutex. Also
switch a different lock from spinlock to mutex as the code path has
the potential to sleep and doesn't need the spinlock context
otherwise (Brett Creeley)
- Fix an issue detected via randconfig where KVM tries to symbol_get an
undeclared function. The symbol is temporarily declared
unconditionally here, which resolves the problem and avoids churn
relative to a series pending for the next merge window which resolves
some of this symbol ugliness, but also fixes Kconfig dependencies
(Sean Christopherson)
* tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfio:
vfio: Drop vfio_file_iommu_group() stub to fudge around a KVM wart
vfio/pds: Fix possible sleep while in atomic context
vfio/pds: Fix mutex lock->magic != lock warning
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- A fix for the Xen event driver setting the correct return value when
experiencing an allocation failure
- A fix for allocating space for a struct in the percpu area to not
cross page boundaries (this one is for x86, a similar one for Arm was
already in the pull request for rc3)
* tag 'for-linus-6.7a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: fix error code in xen_bind_pirq_msi_to_irq()
x86/xen: fix percpu vcpu_info allocation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- objpool: Fix objpool overrun case on memory/cache access delay
especially on the big.LITTLE SoC. The objpool uses a copy of object
slot index internal loop, but the slot index can be changed on
another processor in parallel. In that case, the difference of 'head'
local copy and the 'slot->last' index will be bigger than local slot
size. In that case, we need to re-read the slot::head to update it.
- kretprobe: Fix to use appropriate rcu API for kretprobe holder. Since
kretprobe_holder::rp is RCU managed, it should use
rcu_assign_pointer() and rcu_dereference_check() correctly. Also
adding __rcu tag for finding wrong usage by sparse.
- rethook: Fix to use appropriate rcu API for rethook::handler. The
same as kretprobe, rethook::handler is RCU managed and it should use
rcu_assign_pointer() and rcu_dereference_check(). This also adds
__rcu tag for finding wrong usage by sparse.
* tag 'probes-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rethook: Use __rcu pointer for rethook::handler
kprobes: consistent rcu api usage for kretprobe holder
lib: objpool: fix head overrun on RK3588 SBC
|
|
When FIFO reaches near full state, device will issue pause frame.
If pause slot is enabled(set to 1), in this time, device will issue
pause frame only once. But if pause slot is disabled(set to 0), device
will keep sending pause frames until FIFO reaches near empty state.
When pause slot is disabled, if there is no one to handle receive
packets, device FIFO will reach near full state and keep sending
pause frames. That will impact entire local area network.
This issue can be reproduced in Chromebox (not Chromebook) in
developer mode running a test image (and v5.10 kernel):
1) ping -f $CHROMEBOX (from workstation on same local network)
2) run "powerd_dbus_suspend" from command line on the $CHROMEBOX
3) ping $ROUTER (wait until ping fails from workstation)
Takes about ~20-30 seconds after step 2 for the local network to
stop working.
Fix this issue by enabling pause slot to only send pause frame once
when FIFO reaches near full state.
Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125")
Reported-by: Grant Grundler <grundler@chromium.org>
Tested-by: Grant Grundler <grundler@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20231129155350.5843-1-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
rndis_filter uses utf8s_to_utf16s() which is provided by setting
NLS, so select NLS to fix the build error:
ERROR: modpost: "utf8s_to_utf16s" [drivers/net/hyperv/hv_netvsc.ko] undefined!
Fixes: 1ce09e899d28 ("hyperv: Add support for setting MAC from within guests")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20231130055853.19069-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When an EEH error is encountered by a PCI adapter, the EEH driver
modifies the PCI channel's state as shown below:
enum {
/* I/O channel is in normal state */
pci_channel_io_normal = (__force pci_channel_state_t) 1,
/* I/O to channel is blocked */
pci_channel_io_frozen = (__force pci_channel_state_t) 2,
/* PCI card is dead */
pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
};
If the same EEH error then causes the tg3 driver's transmit timeout
logic to execute, the tg3_tx_timeout() function schedules a reset
task via tg3_reset_task_schedule(), which may cause a race condition
between the tg3 and EEH driver as both attempt to recover the HW via
a reset action.
EEH driver gets error event
--> eeh_set_channel_state()
and set device to one of
error state above scheduler: tg3_reset_task() get
returned error from tg3_init_hw()
--> dev_close() shuts down the interface
tg3_io_slot_reset() and
tg3_io_resume() fail to
reset/resume the device
To resolve this issue, we avoid the race condition by checking the PCI
channel state in the tg3_reset_task() function and skip the tg3 driver
initiated reset when the PCI channel is not in the normal state. (The
driver has no access to tg3 device registers at this point and cannot
even complete the reset task successfully without external assistance.)
We'll leave the reset procedure to be managed by the EEH driver which
calls the tg3_io_error_detected(), tg3_io_slot_reset() and
tg3_io_resume() functions as appropriate.
Adding the same checking in tg3_dump_state() to avoid dumping all
device registers when the PCI channel is not in the normal state.
Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231201001911.656-1-thinhtr@linux.vnet.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix issues in two cpufreq drivers, in the AMD P-state driver and
in the power-capping DTPM framework.
Specifics:
- Fix the AMD P-state driver's EPP sysfs interface in the cases when
the performance governor is in use (Ayush Jain)
- Make the ->fast_switch() callback in the AMD P-state driver return
the target frequency as expected (Gautham R. Shenoy)
- Allow user space to control the range of frequencies to use via
scaling_min_freq and scaling_max_freq when AMD P-state driver is in
use (Wyes Karny)
- Prevent power domains needed for wakeup signaling from being turned
off during system suspend on Qualcomm systems and prevent
performance states votes from runtime-suspended devices from being
lost across a system suspend-resume cycle in qcom-cpufreq-nvmem
(Stephan Gerhold)
- Fix disabling the 792 Mhz OPP in the imx6q cpufreq driver for the
i.MX6ULL types that can run at that frequency (Christoph
Niedermaier)
- Eliminate unnecessary and harmful conversions to uW from the DTPM
(dynamic thermal and power management) framework (Lukasz Luba)"
* tag 'pm-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/amd-pstate: Only print supported EPP values for performance governor
cpufreq/amd-pstate: Fix scaling_min_freq and scaling_max_freq update
powercap: DTPM: Fix unneeded conversions to micro-Watts
cpufreq/amd-pstate: Fix the return value of amd_pstate_fast_switch()
pmdomain: qcom: rpmpd: Set GENPD_FLAG_ACTIVE_WAKEUP
cpufreq: qcom-nvmem: Preserve PM domain votes in system suspend
cpufreq: qcom-nvmem: Enable virtual power domain devices
cpufreq: imx6q: Don't disable 792 Mhz OPP unnecessarily
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"This fixes a recently introduced build issue on ARM32 and a NULL
pointer dereference in the ACPI backlight driver due to a design issue
exposed by a recent change in the ACPI bus type code.
Specifics:
- Fix a recently introduced build issue on ARM32 platforms caused by
an inadvertent header file breakage (Dave Jiang)
- Eliminate questionable usage of acpi_driver_data() in the ACPI
backlight cooling device code that leads to NULL pointer
dereferences after recent ACPI core changes (Hans de Goede)"
* tag 'acpi-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Use acpi_video_device for cooling-dev driver data
ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Fix a regression where the arm64 KPTI ends up enabled even on systems
that don't need it"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Avoid enabling KPTI unnecessarily
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Fix race conditions in device probe path
- Handle ERR_PTR() returns in __iommu_domain_alloc() path
- Update MAINTAINERS entry for Qualcom IOMMUs
- Printk argument fix in device tree specific code
- Several Intel VT-d fixes from Lu Baolu:
- Do not support enforcing cache coherency for non-empty domains
- Avoid devTLB invalidation if iommu is off
- Disable PCI ATS in legacy passthrough mode
- Support non-PCI devices when clearing context
- Fix incorrect cache invalidation for mm notification
- Add MTL to quirk list to skip TE disabling
- Set variable intel_dirty_ops to static
* tag 'iommu-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Fix printk arg in of_iommu_get_resv_regions()
iommu/vt-d: Set variable intel_dirty_ops to static
iommu/vt-d: Fix incorrect cache invalidation for mm notification
iommu/vt-d: Add MTL to quirk list to skip TE disabling
iommu/vt-d: Make context clearing consistent with context mapping
iommu/vt-d: Disable PCI ATS in legacy passthrough mode
iommu/vt-d: Omit devTLB invalidation requests when TES=0
iommu/vt-d: Support enforce_cache_coherency only for empty domains
iommu: Avoid more races around device probe
MAINTAINERS: list all Qualcomm IOMMU drivers in the QUALCOMM IOMMU entry
iommu: Flow ERR_PTR out from __iommu_domain_alloc()
|
|
Bpf cpu=v4 support is introduced in [1] and Commit 4cd58e9af8b9
("bpf: Support new 32bit offset jmp instruction") added support for new
32bit offset jmp instruction. Unfortunately, in function
bpf_adj_delta_to_off(), for new branch insn with 32bit offset, the offset
(plus/minor a small delta) compares to 16-bit offset bound
[S16_MIN, S16_MAX], which caused the following verification failure:
$ ./test_progs-cpuv4 -t verif_scale_pyperf180
...
insn 10 cannot be patched due to 16-bit range
...
libbpf: failed to load object 'pyperf180.bpf.o'
scale_test:FAIL:expect_success unexpected error: -12 (errno 12)
#405 verif_scale_pyperf180:FAIL
Note that due to recent llvm18 development, the patch [2] (already applied
in bpf-next) needs to be applied to bpf tree for testing purpose.
The fix is rather simple. For 32bit offset branch insn, the adjusted
offset compares to [S32_MIN, S32_MAX] and then verification succeeded.
[1] https://lore.kernel.org/all/20230728011143.3710005-1-yonghong.song@linux.dev
[2] https://lore.kernel.org/bpf/20231110193644.3130906-1-yonghong.song@linux.dev
Fixes: 4cd58e9af8b9 ("bpf: Support new 32bit offset jmp instruction")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231201024640.3417057-1-yonghong.song@linux.dev
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"No surprise here, including only a collection of HD-audio
device-specific small fixes"
* tag 'sound-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda: Disable power-save on KONTRON SinglePC
ALSA: hda/realtek: Add supported ALC257 for ChromeOS
ALSA: hda/realtek: Headset Mic VREF to 100%
ALSA: hda: intel-nhlt: Ignore vbps when looking for DMIC 32 bps format
ALSA: hda: cs35l56: Enable low-power hibernation mode on SPI
ALSA: cs35l41: Fix for old systems which do not support command
ALSA: hda: cs35l41: Remove unnecessary boolean state variable firmware_running
ALSA: hda - Fix speaker and headset mic pin config for CHUWI CoreBook XPro
|
|
Pull drm fixes from Dave Airlie:
"Weekly fixes, mostly amdgpu fixes with a scattering of nouveau, i915,
and a couple of reverts. Hopefully it will quieten down in coming
weeks.
drm:
- Revert unexport of prime helpers for fd/handle conversion
dma_resv:
- Do not double add fences in dma_resv_add_fence.
gpuvm:
- Fix GPUVM license identifier.
i915:
- Mark internal GSC engine with reserved uabi class
- Take VGA converters into account in eDP probe
- Fix intel_pre_plane_updates() call to ensure workarounds get applied
panel:
- Revert panel fixes as they require exporting device_is_dependent.
nouveau:
- fix oversized allocations in new vm path
- fix zero-length array
- remove a stray lock
nt36523:
- Fix error check for nt36523.
amdgpu:
- DMUB fix
- DCN 3.5 fixes
- XGMI fix
- DCN 3.2 fixes
- Vangogh suspend fix
- NBIO 7.9 fix
- GFX11 golden register fix
- Backlight fix
- NBIO 7.11 fix
- IB test overflow fix
- DCN 3.1.4 fixes
- fix a runtime pm ref count
- Retimer fix
- ABM fix
- DCN 3.1.5 fix
- Fix AGP addressing
- Fix possible memory leak in SMU error path
- Make sure PME is enabled in D3
- Fix possible NULL pointer dereference in debugfs
- EEPROM fix
- GC 9.4.3 fix
amdkfd:
- IP version check fix
- Fix memory leak in pqm_uninit()"
* tag 'drm-fixes-2023-12-01' of git://anongit.freedesktop.org/drm/drm: (53 commits)
Revert "drm/prime: Unexport helpers for fd/handle conversion"
drm/amdgpu: Use another offset for GC 9.4.3 remap
drm/amd/display: Fix some HostVM parameters in DML
drm/amdkfd: Free gang_ctx_bo and wptr_bo in pqm_uninit
drm/amdgpu: Update EEPROM I2C address for smu v13_0_0
drm/amd/display: Allow DTBCLK disable for DCN35
drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer
drm/amd: Enable PCIe PME from D3
drm/amd/pm: fix a memleak in aldebaran_tables_init
drm/amdgpu: fix AGP addressing when GART is not at 0
drm/amd/display: update dcn315 lpddr pstate latency
drm/amd/display: fix ABM disablement
drm/amd/display: Fix black screen on video playback with embedded panel
drm/amd/display: Fix conversions between bytes and KB
drm/amdkfd: Use common function for IP version check
drm/amd/display: Remove config update
drm/amd/display: Update DCN35 clock table policy
drm/amd/display: force toggle rate wa for first link training for a retimer
drm/amdgpu: correct the amdgpu runtime dereference usage count
drm/amd/display: Update min Z8 residency time to 2100 for DCN314
...
|
|
Pull io_uring fixes from Jens Axboe:
- Fix an issue with discontig page checking for IORING_SETUP_NO_MMAP
- Fix an issue with not allowing IORING_SETUP_NO_MMAP also disallowing
mmap'ed buffer rings
- Fix an issue with deferred release of memory mapped pages
- Fix a lockdep issue with IORING_SETUP_NO_MMAP
- Use fget/fput consistently, even from our sync system calls. No real
issue here, but if we were ever to allow closing io_uring descriptors
it would be required. Let's play it safe and just use the full ref
counted versions upfront. Most uses of io_uring are threaded anyway,
and hence already doing the full version underneath.
* tag 'io_uring-6.7-2023-11-30' of git://git.kernel.dk/linux:
io_uring: use fget/fput consistently
io_uring: free io_buffer_list entries via RCU
io_uring/kbuf: prune deferred locked cache when tearing down
io_uring/kbuf: recycle freed mapped buffer ring entries
io_uring/kbuf: defer release of mapped buffer rings
io_uring: enable io_mem_alloc/free to be used in other parts
io_uring: don't guard IORING_OFF_PBUF_RING with SETUP_NO_MMAP
io_uring: don't allow discontig pages for IORING_SETUP_NO_MMAP
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Invalid namespace identification error handling (Marizio Ewan,
Keith)
- Fabrics keep-alive tuning (Mark)
- Fix for a bad error check regression in bcache (Markus)
- Fix for a performance regression with O_DIRECT (Ming)
- Fix for a flush related deadlock (Ming)
- Make the read-only warn on per-partition (Yu)
* tag 'block-6.7-2023-12-01' of git://git.kernel.dk/linux:
nvme-core: check for too small lba shift
blk-mq: don't count completed flush data request as inflight in case of quiesce
block: Document the role of the two attribute groups
block: warn once for each partition in bio_check_ro()
block: move .bd_inode into 1st cacheline of block_device
nvme: check for valid nvme_identify_ns() before using it
nvme-core: fix a memory leak in nvme_ns_info_from_identify()
nvme: fine-tune sending of first keep-alive
bcache: revert replacing IS_ERR_OR_NULL with IS_ERR
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM verity target's FEC support to always initialize IO before it
frees it. Also fix alignment of struct dm_verity_fec_io within the
per-bio-data
- Fix DM verity target to not FEC failed readahead IO
- Update DM flakey target to use MAX_ORDER rather than MAX_ORDER - 1
* tag 'dm-6.7/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-flakey: start allocating with MAX_ORDER
dm-verity: align struct dm_verity_fec_io properly
dm verity: don't perform FEC for failed readahead IO
dm verity: initialize fec io before freeing it
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three small fixes, one in drivers.
The core changes are to the internal representation of flags in
scsi_devices which removes space wasting bools in favour of single bit
flags and to add a flag to force a runtime resume which is used by ATA
devices"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Fix system start for ATA devices
scsi: Change SCSI device boolean fields to single bit flags
scsi: ufs: core: Clear cmd if abort succeeds in MCQ mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2 fix from Jan Kara:
"Fix an ext2 bug introduced by changes in ext2 & iomap stepping on each
other toes (apparently ext2 driver does not get much testing in
linux-next)"
* tag 'fs_for_v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: Fix ki_pos update for DIO buffered-io fallback case
|
|
Pull more bcachefs bugfixes from Kent Overstreet:
- bcache & bcachefs were broken with CFI enabled; patch for closures to
fix type punning
- mark erasure coding as extra-experimental; there are incompatible
disk space accounting changes coming for erasure coding, and I'm
still seeing checksum errors in some tests
- several fixes for durability-related issues (durability is a device
specific setting where we can tell bcachefs that data on a given
device should be counted as replicated x times)
- a fix for a rare livelock when a btree node merge then updates a
parent node that is almost full
- fix a race in the device removal path, where dropping a pointer in a
btree node to a device would be clobbered by an in flight btree write
updating the btree node key on completion
- fix one SRCU lock hold time warning in the btree gc code - ther's
still a bunch more of these to fix
- fix a rare race where we'd start copygc before initializing the "are
we rw" percpu refcount; copygc would think we were already ro and die
immediately
* tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs: (23 commits)
bcachefs: Extra kthread_should_stop() calls for copygc
bcachefs: Convert gc_alloc_start() to for_each_btree_key2()
bcachefs: Fix race between btree writes and metadata drop
bcachefs: move journal seq assertion
bcachefs: -EROFS doesn't count as move_extent_start_fail
bcachefs: trace_move_extent_start_fail() now includes errcode
bcachefs: Fix split_race livelock
bcachefs: Fix bucket data type for stripe buckets
bcachefs: Add missing validation for jset_entry_data_usage
bcachefs: Fix zstd compress workspace size
bcachefs: bpos is misaligned on big endian
bcachefs: Fix ec + durability calculation
bcachefs: Data update path won't accidentaly grow replicas
bcachefs: deallocate_extra_replicas()
bcachefs: Proper refcounting for journal_keys
bcachefs: preserve device path as device name
bcachefs: Fix an endianness conversion
bcachefs: Start gc, copygc, rebalance threads after initing writes ref
bcachefs: Don't stop copygc thread on device resize
bcachefs: Make sure bch2_move_ratelimit() also waits for move_ops
...
|
|
Merge a fix for a recently introduced build issue on ARM32 platforms
caused by an inadvertent header file breakage (Dave Jiang).
* acpi-tables:
ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes
|