Age | Commit message (Collapse) | Author |
|
[ Upstream commit e575d3a6dd22123888defb622b1742aa2d45b942 ]
TSAR is the correct spelling (Transmit Scheduling ARbiter).
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240613210036.1125203-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 861cd9b9cb62 ("net/mlx5: Verify support for scheduling element and TSAR type")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 452ef7f86036392005940de54228d42ca0044192 ]
Add the missing masks for supported element types and Transmit
Scheduling Arbiter (TSAR) types in scheduling elements.
Also, add the corresponding bit masks for these types in the QoS
capabilities of a NIC scheduler.
Fixes: 214baf22870c ("net/mlx5e: Support HTB offload")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b28ad32442bec2f0d9cb660d7d698a1a53c13d08 ]
Rename 400G_8X speed to comply to naming convention.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/ac98447cac8379a43fbdb36d56e5fb2b741a97ff.1695204156.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Stable-dep-of: 80bf474242b2 ("net/mlx5e: Add missing link mode to ptys2ext_ethtool_map")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 6513eb3d3191574b58859ef2d6dc26c0277c6f81 upstream.
The referenced commit drops bad input, but has false positives.
Tighten the check to avoid these.
The check detects illegal checksum offload requests, which produce
csum_start/csum_off beyond end of packet after segmentation.
But it is based on two incorrect assumptions:
1. virtio_net_hdr_to_skb with VIRTIO_NET_HDR_GSO_TCP[46] implies GSO.
True in callers that inject into the tx path, such as tap.
But false in callers that inject into rx, like virtio-net.
Here, the flags indicate GRO, and CHECKSUM_UNNECESSARY or
CHECKSUM_NONE without VIRTIO_NET_HDR_F_NEEDS_CSUM is normal.
2. TSO requires checksum offload, i.e., ip_summed == CHECKSUM_PARTIAL.
False, as tcp[46]_gso_segment will fix up csum_start and offset for
all other ip_summed by calling __tcp_v4_send_check.
Because of 2, we can limit the scope of the fix to virtio_net_hdr
that do try to set these fields, with a bogus value.
Link: https://lore.kernel.org/netdev/20240909094527.GA3048202@port70.net/
Fixes: 89add40066f9 ("net: drop bad gso csum_start and offset in virtio_net_hdr")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20240910213553.839926-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 33cf42e68efc8ff529a7eee08a4f0ba8c8d0a207 ]
This is required by layouts that need to read whole NVMEM content. It's
especially useful for NVMEM devices without hardcoded layout (like
U-Boot environment data block).
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20231221173421.13737-2-zajec5@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: 8679e8b4a1eb ("nvmem: u-boot-env: error if NVMEM device is too small")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 365130fd47af6d4317aa16a407874b699ab8d8cb ]
Similar to recently propose for_each_child_of_node_scoped() this
new version of the loop macro instantiates a new local
struct fwnode_handle * that uses the __free(fwnode_handle) auto
cleanup handling so that if a reference to a node is held on early
exit from the loop the reference will be released. If the loop
runs to completion, the child pointer will be NULL and no action will
be taken.
The reason this is useful is that it removes the need for
fwnode_handle_put() on early loop exits. If there is a need
to retain the reference, then return_ptr(child) or no_free_ptr(child)
may be used to safely disable the auto cleanup.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Link: https://lore.kernel.org/r/20240217164249.921878-5-jic23@kernel.org
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Stable-dep-of: 61cbfb5368dd ("iio: adc: ad7124: fix DT configuration parsing")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 59ed5e2d505bf5f9b4af64d0021cd0c96aec1f7c ]
Useful where the fwnode_handle was obtained from a call such as
fwnode_find_reference() as it will safely do nothing if IS_ERR() is true
and will automatically release the reference on the variable leaving
scope.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Link: https://lore.kernel.org/r/20240217164249.921878-3-jic23@kernel.org
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Stable-dep-of: 61cbfb5368dd ("iio: adc: ad7124: fix DT configuration parsing")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8dd1426e2c80e32ac1995007330c8f95ffa28ebb ]
It verified GFX9-11 swizzle modes on GFX12, which has undefined behavior.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7ceb94e87bffff7c12b61eb29749e1d8ac976896 ]
Add GFX12 swizzle mode definitions for use with DCN401
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit f3a052391822b772b4e27f2594526cf1eb103cab upstream.
amd-pstate driver utilizes the functions and data structures
provided by the ITMT architecture to enable the scheduler to
favor scheduling on cores which can be get a higher frequency
with lower voltage. We call it amd-pstate preferrred core.
Here sched_set_itmt_core_prio() is called to set priorities and
sched_set_itmt_support() is called to enable ITMT feature.
amd-pstate driver uses the highest performance value to indicate
the priority of CPU. The higher value has a higher priority.
The initial core rankings are set up by amd-pstate when the
system boots.
Add a variable hw_prefcore in cpudata structure. It will check
if the processor and power firmware support preferred core
feature.
Add one new early parameter `disable` to allow user to disable
the preferred core.
Only when hardware supports preferred core and user set `enabled`
in early parameter, amd pstate driver supports preferred core featue.
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Co-developed-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Meng Li <li.meng@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 12753d71e8c5c3e716cedba23ddeed508da0bdc4 upstream.
Add support for getting the highest performance to the
generic CPPC driver. This enables downstream drivers
such as amd-pstate to discover and use these values.
Refer to Chapter 8.4.6.1.1.1. Highest Performance of ACPI
Specification 6.5 for details on continuous performance control
of CPPC (linked below).
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Meng Li <li.meng@amd.com>
Link: https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?highlight=cppc#highest-performance
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit eba2591d99d1f14a04c8a8a845ab0795b93f5646 upstream.
Instead of directly dereferencing page tables entries, which can cause
issues (see commit 20a004e7b017 ("arm64: mm: Use READ_ONCE/WRITE_ONCE when
accessing page tables"), let's introduce new functions to get the
pud/p4d/pgd entries (the pte and pmd versions already exist).
Note that arm pgd_t is actually an array so pgdp_get() is defined as a
macro to avoid a build error.
Those new functions will be used in subsequent commits by the riscv
architecture.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20231213203001.179237-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 33f339a1ba54e56bba57ee9a77c71e385ab4825c ]
There's a potential race when `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` is
false during the execution of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN`, but
becomes true when `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is called.
This inconsistency can lead to `BPF_CGROUP_RUN_PROG_GETSOCKOPT` receiving
an "-EFAULT" from `__cgroup_bpf_run_filter_getsockopt(max_optlen=0)`.
Scenario shown as below:
`process A` `process B`
----------- ------------
BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN
enable CGROUP_GETSOCKOPT
BPF_CGROUP_RUN_PROG_GETSOCKOPT (-EFAULT)
To resolve this, remove the `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` macro and
directly uses `copy_from_sockptr` to ensure that `max_optlen` is always
set before `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is invoked.
Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Co-developed-by: Yanghui Li <yanghui.li@mediatek.com>
Signed-off-by: Yanghui Li <yanghui.li@mediatek.com>
Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://patch.msgid.link/20240830082518.23243-1-Tze-nan.Wu@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0b05b0cd78c92371fdde6333d006f39eaf9e0860 ]
Split __sys_getsockopt() into two functions by removing the core
logic into a sub-function (do_sock_getsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.
do_sock_getsockopt() will be called by io_uring getsockopt() command
operation in the following patch.
The same was done for the setsockopt pair.
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-5-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 33f339a1ba54 ("bpf, net: Fix a potential race in do_sock_getsockopt()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1406245c29454ff84919736be83e14cdaba7fec1 ]
Split __sys_setsockopt() into two functions by removing the core
logic into a sub-function (do_sock_setsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.
do_sock_setsockopt() will be called by io_uring setsockopt() command
operation in the following patch.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-4-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 33f339a1ba54 ("bpf, net: Fix a potential race in do_sock_getsockopt()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3f31e0d14d44ad491a81b7c1f83f32fbc300a867 ]
The whole network stack uses sockptr, and while it doesn't move to
something more modern, let's use sockptr in setsockptr BPF hooks, so, it
could be used by other callers.
The main motivation for this change is to use it in the io_uring
{g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
kernel value for optlen.
Link: https://lore.kernel.org/all/ZSArfLaaGcfd8LH8@gmail.com/
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-3-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 33f339a1ba54 ("bpf, net: Fix a potential race in do_sock_getsockopt()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a615f67e1a426f35366b8398c11f31c148e7df48 ]
The whole network stack uses sockptr, and while it doesn't move to
something more modern, let's use sockptr in getsockptr BPF hooks, so, it
could be used by other callers.
The main motivation for this change is to use it in the io_uring
{g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
kernel value for optlen.
Link: https://lore.kernel.org/all/ZSArfLaaGcfd8LH8@gmail.com/
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231016134750.1381153-2-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 33f339a1ba54 ("bpf, net: Fix a potential race in do_sock_getsockopt()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c898f6d7b093bd71e66569cd6797c87d4056f44b ]
This introduces hci_cmd_sync_run/hci_cmd_sync_run_once which acts like
hci_cmd_sync_queue/hci_cmd_sync_queue_once but runs immediately when
already on hdev->cmd_sync_work context.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 227a0cdf4a02 ("Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 881559af5f5c545f6828e7c74d79813eb886d523 ]
If connection is still queued/pending in the cmd_sync queue it means no
command has been generated and it should be safe to just dequeue the
callback when it is being aborted.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 227a0cdf4a02 ("Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 505ea2b295929e7be2b4e1bc86ee31cb7862fb01 ]
This adds functions to queue, dequeue and lookup into the cmd_sync
list.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 227a0cdf4a02 ("Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5f641f03abccddd1a37233ff1b8e774b9ff1f4e8 ]
This fixes the UAF on __hci_acl_create_connection_sync caused by
connection abortion, it uses the same logic as to LE_LINK which uses
hci_cmd_sync_cancel to prevent the callback to run if the connection is
abort prematurely.
Reported-by: syzbot+3f0a39be7a2035700868@syzkaller.appspotmail.com
Fixes: 45340097ce6e ("Bluetooth: hci_conn: Only do ACL connections sequentially")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 227a0cdf4a02 ("Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4aa42119d971603dc9e4d8cf4f53d5fcf082ea7d ]
With the last commit we moved to using the hci_sync queue for "Create
Connection" requests, removing the need for retrying the paging after
finished/failed "Create Connection" requests and after the end of
inquiries.
hci_conn_check_pending() was used to trigger this retry, we can remove it
now.
Note that we can also remove the special handling for COMMAND_DISALLOWED
errors in the completion handler of "Create Connection", because "Create
Connection" requests are now always serialized.
This is somewhat reverting commit 4c67bc74f016 ("[Bluetooth] Support
concurrent connect requests").
With this, the BT_CONNECT2 state of ACL hci_conn objects should now be
back to meaning only one thing: That we received a "Connection Request"
from another device (see hci_conn_request_evt), but the response to that
is going to be deferred.
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 227a0cdf4a02 ("Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 45340097ce6ea7e875674a5a7d24c95ecbc93ef9 ]
Pretty much all bluetooth chipsets only support paging a single device at
a time, and if they don't reject a secondary "Create Connection" request
while another is still ongoing, they'll most likely serialize those
requests in the firware.
With commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect
requests") we started adding some serialization of our own in case the
adapter returns "Command Disallowed" HCI error.
This commit was using the BT_CONNECT2 state for the serialization, this
state is also used for a few more things (most notably to indicate we're
waiting for an inquiry to cancel) and therefore a bit unreliable. Also
not all BT firwares would respond with "Command Disallowed" on too many
connection requests, some will also respond with "Hardware Failure"
(BCM4378), and others will error out later and send a "Connect Complete"
event with error "Rejected Limited Resources" (Marvell 88W8897).
We can clean things up a bit and also make the serialization more reliable
by using our hci_sync machinery to always do "Create Connection" requests
in a sequential manner.
This is very similar to what we're already doing for establishing LE
connections, and it works well there.
Note that this causes a test failure in mgmt-tester (test "Pair Device
- Power off 1") because the hci_abort_conn_sync() changes the error we
return on timeout of the "Create Connection". We'll fix this on the
mgmt-tester side by adjusting the expected error for the test.
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 227a0cdf4a02 ("Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 79c0868ad65a8fc7cdfaa5f2b77a4b70d0b0ea16 ]
We have error defines already, so let's use them.
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 227a0cdf4a02 ("Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1a5caec7f80ca2e659c03f45378ee26915f4eda2 ]
When adding devm_regulator_bulk_get_const() I missed adding a stub for
when CONFIG_REGULATOR is not enabled. Under certain conditions (like
randconfig testing) this can cause the compiler to reports errors
like:
error: implicit declaration of function 'devm_regulator_bulk_get_const';
did you mean 'devm_regulator_bulk_get_enable'?
Add the stub.
Fixes: 1de452a0edda ("regulator: core: Allow drivers to define their init data as const")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408301813.TesFuSbh-lkp@intel.com/
Cc: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patch.msgid.link/20240830073511.1.Ib733229a8a19fad8179213c05e1af01b51e42328@changeid
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit b6ecc662037694488bfff7c9fd21c405df8411f2 upstream.
Currently napi_disable() gets called during rxq and txq cleanup,
even before napi is enabled and hrtimer is initialized. It causes
kernel panic.
? page_fault_oops+0x136/0x2b0
? page_counter_cancel+0x2e/0x80
? do_user_addr_fault+0x2f2/0x640
? refill_obj_stock+0xc4/0x110
? exc_page_fault+0x71/0x160
? asm_exc_page_fault+0x27/0x30
? __mmdrop+0x10/0x180
? __mmdrop+0xec/0x180
? hrtimer_active+0xd/0x50
hrtimer_try_to_cancel+0x2c/0xf0
hrtimer_cancel+0x15/0x30
napi_disable+0x65/0x90
mana_destroy_rxq+0x4c/0x2f0
mana_create_rxq.isra.0+0x56c/0x6d0
? mana_uncfg_vport+0x50/0x50
mana_alloc_queues+0x21b/0x320
? skb_dequeue+0x5f/0x80
Cc: stable@vger.kernel.org
Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 532f8bcd1c2c4e8112f62e1922fd1703bc0ffce0 upstream.
This reverts commit 59b047bc98084f8af2c41483e4d68a5adf2fa7f7 which
breaks compatibility with commands like:
bluetoothd[46328]: @ MGMT Command: Load.. (0x0013) plen 74 {0x0001} [hci0]
Keys: 2
BR/EDR Address: C0:DC:DA:A5:E5:47 (Samsung Electronics Co.,Ltd)
Key type: Authenticated key from P-256 (0x03)
Central: 0x00
Encryption size: 16
Diversifier[2]: 0000
Randomizer[8]: 0000000000000000
Key[16]: 6ed96089bd9765be2f2c971b0b95f624
LE Address: D7:2A:DE:1E:73:A2 (Static)
Key type: Unauthenticated key from P-256 (0x02)
Central: 0x00
Encryption size: 16
Diversifier[2]: 0000
Randomizer[8]: 0000000000000000
Key[16]: 87dd2546ededda380ffcdc0a8faa4597
@ MGMT Event: Command Status (0x0002) plen 3 {0x0001} [hci0]
Load Long Term Keys (0x0013)
Status: Invalid Parameters (0x0d)
Cc: stable@vger.kernel.org
Link: https://github.com/bluez/bluez/issues/875
Fixes: 59b047bc9808 ("Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ea72ce5da22806d5713f3ffb39a6d5ae73841f93 upstream.
iounmap() on x86 occasionally fails to unmap because the provided valid
ioremap address is not below high_memory. It turned out that this
happens due to KASLR.
KASLR uses the full address space between PAGE_OFFSET and vaddr_end to
randomize the starting points of the direct map, vmalloc and vmemmap
regions. It thereby limits the size of the direct map by using the
installed memory size plus an extra configurable margin for hot-plug
memory. This limitation is done to gain more randomization space
because otherwise only the holes between the direct map, vmalloc,
vmemmap and vaddr_end would be usable for randomizing.
The limited direct map size is not exposed to the rest of the kernel, so
the memory hot-plug and resource management related code paths still
operate under the assumption that the available address space can be
determined with MAX_PHYSMEM_BITS.
request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
downwards. That means the first allocation happens past the end of the
direct map and if unlucky this address is in the vmalloc space, which
causes high_memory to become greater than VMALLOC_START and consequently
causes iounmap() to fail for valid ioremap addresses.
MAX_PHYSMEM_BITS cannot be changed for that because the randomization
does not align with address bit boundaries and there are other places
which actually require to know the maximum number of address bits. All
remaining usage sites of MAX_PHYSMEM_BITS have been analyzed and found
to be correct.
Cure this by exposing the end of the direct map via PHYSMEM_END and use
that for the memory hot-plug and resource management related places
instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END
maps to a variable which is initialized by the KASLR initialization and
otherwise it is based on MAX_PHYSMEM_BITS as before.
To prevent future hickups add a check into add_pages() to catch callers
trying to add memory above PHYSMEM_END.
Fixes: 0483e1fa6e09 ("x86/mm: Implement ASLR for kernel memory regions")
Reported-by: Max Ramanouski <max8rr8@gmail.com>
Reported-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-By: Max Ramanouski <max8rr8@gmail.com>
Tested-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/87ed6soy3z.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 71833e79a42178d8a50b5081c98c78ace9325628 upstream.
Replace IS_ENABLED() with IS_REACHABLE() to substitute empty stubs for:
i2c_acpi_get_i2c_resource()
i2c_acpi_client_count()
i2c_acpi_find_bus_speed()
i2c_acpi_new_device_by_fwnode()
i2c_adapter *i2c_acpi_find_adapter_by_handle()
i2c_acpi_waive_d0_probe()
commit f17c06c6608a ("i2c: Fix conditional for substituting empty ACPI
functions") partially fixed this conditional to depend on CONFIG_I2C,
but used IS_ENABLED(), which is wrong since CONFIG_I2C is tristate.
CONFIG_ACPI is boolean but let's also change it to use IS_REACHABLE()
to future-proof it against becoming tristate.
Somehow despite testing various combinations of CONFIG_I2C and CONFIG_ACPI
we missed the combination CONFIG_I2C=m, CONFIG_ACPI=y.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: f17c06c6608a ("i2c: Fix conditional for substituting empty ACPI functions")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408141333.gYnaitcV-lkp@intel.com/
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 61e2bbafb00e4b9a5de45e6448a7b6b818658576 ]
When I was doing some experiments, I found that when using the first
parameter, namely, struct net, in ip_metrics_convert() always triggers NULL
pointer crash. Then I digged into this part, realizing that we can remove
this one due to its uselessness.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 172e422ffea20a89bfdc672741c1aad6fbb5044e ]
In some setups directories can have many (usually negative) dentries.
Hence __fsnotify_update_child_dentry_flags() function can take a
significant amount of time. Since the bulk of this function happens
under inode->i_lock this causes a significant contention on the lock
when we remove the watch from the directory as the
__fsnotify_update_child_dentry_flags() call from fsnotify_recalc_mask()
races with __fsnotify_update_child_dentry_flags() calls from
__fsnotify_parent() happening on children. This can lead upto softlockup
reports reported by users.
Fix the problem by calling fsnotify_update_children_dentry_flags() to
set PARENT_WATCHED flags only when parent starts watching children.
When parent stops watching children, clear false positive PARENT_WATCHED
flags lazily in __fsnotify_parent() for each accessed child.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 56f45266df67aa0f5b2a6881c8c4d16dbfff6b7d ]
This timer HW supports 8, 16 and 32-bit timer widths. This
driver currently uses a u32 to store the max possible value
of the timer. However, statements perform addition of 2 in
xilinx_pwm_apply() when calculating the period_cycles and
duty_cycles values. Since priv->max is a u32, this will
result in an overflow to 1 which will not only be incorrect
but fail on range comparison. This results in making it
impossible to set the PWM in this timer mode.
There are two obvious solutions to the current problem:
1. Cast each instance where overflow occurs to u64.
2. Change priv->max from a u32 to a u64.
Solution #1 requires more code modifications, and leaves
opportunity to introduce similar overflows if other math
statements are added in the future. These may also go
undetected if running in non 32-bit timer modes.
Solution #2 is the much smaller and cleaner approach and
thus the chosen method in this patch.
This was tested on a Zynq UltraScale+ with multiple
instances of the PWM IP.
Signed-off-by: Ken Sloat <ksloat@designlinxhs.com>
Reviewed-by: Michal Simek <michal.simek@amd.com>
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Link: https://lore.kernel.org/r/SJ0P222MB0107490C5371B848EF04351CA1E19@SJ0P222MB0107.NAMP222.PROD.OUTLOOK.COM
Signed-off-by: Michal Simek <michal.simek@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2e3f0d693875db698891ffe89a18121bda5b95b8 ]
Add qcom_smem_bust_hwspin_lock_by_host to enable remoteproc to bust the
hwspin_lock owned by smem. In the event the remoteproc crashes
unexpectedly, the remoteproc driver can invoke this API to try and bust
the hwspin_lock and release the lock if still held by the remoteproc
device.
Signed-off-by: Chris Lew <quic_clew@quicinc.com>
Reviewed-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Link: https://lore.kernel.org/r/20240529-hwspinlock-bust-v3-3-c8b924ffa5a2@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7c327d56597d8de1680cf24e956b704270d3d84a ]
When a remoteproc crashes or goes down unexpectedly this can result in
a state where locks held by the remoteproc will remain locked possibly
resulting in deadlock. This new API hwspin_lock_bust() allows
hwspinlock implementers to define a bust operation for freeing previously
acquired hwspinlocks after verifying ownership of the acquired lock.
Signed-off-by: Richard Maina <quic_rmaina@quicinc.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Chris Lew <quic_clew@quicinc.com>
Link: https://lore.kernel.org/r/20240529-hwspinlock-bust-v3-1-c8b924ffa5a2@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f17c06c6608ad4ecd2ccf321753fb511812d821b ]
Add IS_ENABLED(CONFIG_I2C) to the conditional around a bunch of ACPI
functions.
The conditional around these functions depended only on CONFIG_ACPI.
But the functions are implemented in I2C core, so are only present if
CONFIG_I2C is enabled.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e6ce8a28c768dbbad3f818db286cd0f4c7a921a8 ]
The UMP 1.1 spec says that an RPN/NRPN should be sent when one of the
following occurs:
* a CC 38 is received
* a subsequent CC 6 is received
* a CC 98, 99, 100, and 101 is received, indicating the last RPN/NRPN
message has ended and a new one has started
That said, we should send a partial data even if it's not fully
filled. Let's change the UMP conversion helper code to follow that
rule.
Link: https://patch.msgid.link/20240731130528.12600-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0c60eb0cc320fffbb8b10329d276af14f6f5e6bf ]
If the user sets use_mcq_mode to 0, the host will try to activate the LSDB
mode unconditionally even when the LSDBS of device HCI cap is 1. This makes
commands time out and causes device probing to fail.
To prevent that problem, check the LSDBS cap when MCQ is not supported.
Signed-off-by: Kyoungrul Kim <k831.kim@samsung.com>
Link: https://lore.kernel.org/r/20240709232520epcms2p8ebdb5c4fccc30a6221390566589bf122@epcms2p8
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 3568affcddd68743e25aa3ec1647d9b82797757b upstream.
As pointed out by Stephen Boyd it is possible that during initialization
of the pmic_glink child drivers, the protection-domain notifiers fires,
and the associated work is scheduled, before the client registration
returns and as a result the local "client" pointer has been initialized.
The outcome of this is a NULL pointer dereference as the "client"
pointer is blindly dereferenced.
Timeline provided by Stephen:
CPU0 CPU1
---- ----
ucsi->client = NULL;
devm_pmic_glink_register_client()
client->pdr_notify(client->priv, pg->client_state)
pmic_glink_ucsi_pdr_notify()
schedule_work(&ucsi->register_work)
<schedule away>
pmic_glink_ucsi_register()
ucsi_register()
pmic_glink_ucsi_read_version()
pmic_glink_ucsi_read()
pmic_glink_ucsi_read()
pmic_glink_send(ucsi->client)
<client is NULL BAD>
ucsi->client = client // Too late!
This code is identical across the altmode, battery manager and usci
child drivers.
Resolve this by splitting the allocation of the "client" object and the
registration thereof into two operations.
This only happens if the protection domain registry is populated at the
time of registration, which by the introduction of commit '1ebcde047c54
("soc: qcom: add pd-mapper implementation")' became much more likely.
Reported-by: Amit Pundir <amit.pundir@linaro.org>
Closes: https://lore.kernel.org/all/CAMi1Hd2_a7TjA7J9ShrAbNOd_CoZ3D87twmO5t+nZxC9sX18tA@mail.gmail.com/
Reported-by: Johan Hovold <johan@kernel.org>
Closes: https://lore.kernel.org/all/ZqiyLvP0gkBnuekL@hovoldconsulting.com/
Reported-by: Stephen Boyd <swboyd@chromium.org>
Closes: https://lore.kernel.org/all/CAE-0n52JgfCBWiFQyQWPji8cq_rCsviBpW-m72YitgNfdaEhQg@mail.gmail.com/
Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver")
Cc: stable@vger.kernel.org
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Tested-by: Amit Pundir <amit.pundir@linaro.org>
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Link: https://lore.kernel.org/r/20240820-pmic-glink-v6-11-races-v3-1-eec53c750a04@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0870b0d8b393dde53106678a1e2cec9dfa52f9b7 ]
Typically, busy-polling durations are below 100 usec.
When/if the busy-poller thread migrates to another cpu,
local_clock() can be off by +/-2msec or more for small
values of HZ, depending on the platform.
Use ktimer_get_ns() to ensure deterministic behavior,
which is the whole point of busy-polling.
Fixes: 060212928670 ("net: add low latency socket poll")
Fixes: 9a3c71aa8024 ("net: convert low latency sockets to sched_clock()")
Fixes: 37089834528b ("sched, net: Fixup busy_loop_us_clock()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240827114916.223377-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2aeeef906d5a526dc60cf4af92eda69836c39b1f ]
In the cited commit, bond->ipsec_lock is added to protect ipsec_list,
hence xdo_dev_state_add and xdo_dev_state_delete are called inside
this lock. As ipsec_lock is a spin lock and such xfrmdev ops may sleep,
"scheduling while atomic" will be triggered when changing bond's
active slave.
[ 101.055189] BUG: scheduling while atomic: bash/902/0x00000200
[ 101.055726] Modules linked in:
[ 101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1
[ 101.058760] Hardware name:
[ 101.059434] Call Trace:
[ 101.059436] <TASK>
[ 101.060873] dump_stack_lvl+0x51/0x60
[ 101.061275] __schedule_bug+0x4e/0x60
[ 101.061682] __schedule+0x612/0x7c0
[ 101.062078] ? __mod_timer+0x25c/0x370
[ 101.062486] schedule+0x25/0xd0
[ 101.062845] schedule_timeout+0x77/0xf0
[ 101.063265] ? asm_common_interrupt+0x22/0x40
[ 101.063724] ? __bpf_trace_itimer_state+0x10/0x10
[ 101.064215] __wait_for_common+0x87/0x190
[ 101.064648] ? usleep_range_state+0x90/0x90
[ 101.065091] cmd_exec+0x437/0xb20 [mlx5_core]
[ 101.065569] mlx5_cmd_do+0x1e/0x40 [mlx5_core]
[ 101.066051] mlx5_cmd_exec+0x18/0x30 [mlx5_core]
[ 101.066552] mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core]
[ 101.067163] ? bonding_sysfs_store_option+0x4d/0x80 [bonding]
[ 101.067738] ? kmalloc_trace+0x4d/0x350
[ 101.068156] mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core]
[ 101.068747] mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core]
[ 101.069312] bond_change_active_slave+0x392/0x900 [bonding]
[ 101.069868] bond_option_active_slave_set+0x1c2/0x240 [bonding]
[ 101.070454] __bond_opt_set+0xa6/0x430 [bonding]
[ 101.070935] __bond_opt_set_notify+0x2f/0x90 [bonding]
[ 101.071453] bond_opt_tryset_rtnl+0x72/0xb0 [bonding]
[ 101.071965] bonding_sysfs_store_option+0x4d/0x80 [bonding]
[ 101.072567] kernfs_fop_write_iter+0x10c/0x1a0
[ 101.073033] vfs_write+0x2d8/0x400
[ 101.073416] ? alloc_fd+0x48/0x180
[ 101.073798] ksys_write+0x5f/0xe0
[ 101.074175] do_syscall_64+0x52/0x110
[ 101.074576] entry_SYSCALL_64_after_hwframe+0x4b/0x53
As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called
from bond_change_active_slave, which requires holding the RTNL lock.
And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state
xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user
context. So ipsec_lock doesn't have to be spin lock, change it to
mutex, and thus the above issue can be resolved.
Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20240823031056.110999-4-jianbol@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 70c261d500951cf3ea0fcf32651aab9a65a91471 ]
From netdev/egress, skb->len can include the ethernet header, therefore,
subtract network offset from skb->len when validating IPv6 packet length.
Fixes: 42df6e1d221d ("netfilter: Introduce egress hook")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5fd0628918977a0afdc2e6bc562d8751b5d3b8c5 ]
Subtract network offset to skb->len before performing IPv4 header sanity
checks, then adjust transport offset from offset from mac header.
Jorge Ortiz says:
When small UDP packets (< 4 bytes payload) are sent from eth0,
`meta l4proto udp` condition is not met because `NFT_PKTINFO_L4PROTO` is
not set. This happens because there is a comparison that checks if the
transport header offset exceeds the total length. This comparison does
not take into account the fact that the skb network offset might be
non-zero in egress mode (e.g., 14 bytes for Ethernet header).
Fixes: 0ae8e4cca787 ("netfilter: nf_tables: set transport offset from mac header for netdev/egress")
Reported-by: Jorge Ortiz <jorge.ortiz.escribano@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
This reverts commit 983e6b2636f0099dbac1874c9e885bbe1cf2df05 which is
commit 8a2f11878771da65b8ac135c73b47dae13afbd62 upstream.
It wasn't needed and caused a build break on s390, so just revert it
entirely.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240830221217.GA3837758@thelio-3990X
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
handling
[ Upstream commit 34af4554fb0ce164e2c4876683619eb1e23848d4 ]
To avoid issues with out of order cleanup, or ambiguity about when the
auto freed data is first instantiated, do it within the for loop definition.
The disadvantage is that the struct device_node *child variable creation
is not immediately obvious where this is used.
However, in many cases, if there is another definition of
struct device_node *child; the compiler / static analysers will notify us
that it is unused, or uninitialized.
Note that, in the vast majority of cases, the _available_ form should be
used and as code is converted to these scoped handers, we should confirm
that any cases that do not check for available have a good reason not
to.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240225142714.286440-3-jic23@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Stable-dep-of: afc954fd223d ("thermal: of: Fix OF node leak in thermal_of_trips_init() error path")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 9448e55d032d99af8e23487f51a542d51b2f1a48 upstream.
The recent addition of scope based cleanup support to the kernel
provides a convenient tool to reduce the chances of leaking reference
counts where of_node_put() should have been called in an error path.
This enables
struct device_node *child __free(device_node) = NULL;
for_each_child_of_node(np, child) {
if (test)
return test;
}
with no need for a manual call of of_node_put().
A following patch will reduce the scope of the child variable to the
for loop, to avoid an issues with ordering of autocleanup, and make it
obvious when this assigned a non NULL value.
In this simple example the gains are small but there are some very
complex error handling cases buried in these loops that will be
greatly simplified by enabling early returns with out the need
for this manual of_node_put() call.
Note that there are coccinelle checks in
scripts/coccinelle/iterators/for_each_child.cocci to detect a failure
to call of_node_put(). This new approach does not cause false positives.
Longer term we may want to add scripting to check this new approach is
done correctly with no double of_node_put() calls being introduced due
to the auto cleanup. It may also be useful to script finding places
this new approach is useful.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240225142714.286440-2-jic23@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f03e94f23b04c2b71c0044c1534921b3975ef10c upstream.
scsi_logical_block_count() should return the block count of a given SCSI
command. The original implementation ended up shifting twice, leading to an
incorrect count being returned. Fix the conversion between bytes and
logical blocks.
Cc: stable@vger.kernel.org
Fixes: 6a20e21ae1e2 ("scsi: core: Add helper to return number of logical blocks in a request")
Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
Link: https://lore.kernel.org/r/20240813053534.7720-1-chaotian.jing@mediatek.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 807067bf014d4a3ae2cc55bd3de16f22a01eb580 ]
syzkaller reported UAF in kcm_release(). [0]
The scenario is
1. Thread A builds a skb with MSG_MORE and sets kcm->seq_skb.
2. Thread A resumes building skb from kcm->seq_skb but is blocked
by sk_stream_wait_memory()
3. Thread B calls sendmsg() concurrently, finishes building kcm->seq_skb
and puts the skb to the write queue
4. Thread A faces an error and finally frees skb that is already in the
write queue
5. kcm_release() does double-free the skb in the write queue
When a thread is building a MSG_MORE skb, another thread must not touch it.
Let's add a per-sk mutex and serialise kcm_sendmsg().
[0]:
BUG: KASAN: slab-use-after-free in __skb_unlink include/linux/skbuff.h:2366 [inline]
BUG: KASAN: slab-use-after-free in __skb_dequeue include/linux/skbuff.h:2385 [inline]
BUG: KASAN: slab-use-after-free in __skb_queue_purge_reason include/linux/skbuff.h:3175 [inline]
BUG: KASAN: slab-use-after-free in __skb_queue_purge include/linux/skbuff.h:3181 [inline]
BUG: KASAN: slab-use-after-free in kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691
Read of size 8 at addr ffff0000ced0fc80 by task syz-executor329/6167
CPU: 1 PID: 6167 Comm: syz-executor329 Tainted: G B 6.8.0-rc5-syzkaller-g9abbc24128bc #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Call trace:
dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:291
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:298
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x178/0x518 mm/kasan/report.c:488
kasan_report+0xd8/0x138 mm/kasan/report.c:601
__asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381
__skb_unlink include/linux/skbuff.h:2366 [inline]
__skb_dequeue include/linux/skbuff.h:2385 [inline]
__skb_queue_purge_reason include/linux/skbuff.h:3175 [inline]
__skb_queue_purge include/linux/skbuff.h:3181 [inline]
kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691
__sock_release net/socket.c:659 [inline]
sock_close+0xa4/0x1e8 net/socket.c:1421
__fput+0x30c/0x738 fs/file_table.c:376
____fput+0x20/0x30 fs/file_table.c:404
task_work_run+0x230/0x2e0 kernel/task_work.c:180
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0x618/0x1f64 kernel/exit.c:871
do_group_exit+0x194/0x22c kernel/exit.c:1020
get_signal+0x1500/0x15ec kernel/signal.c:2893
do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249
do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148
exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Allocated by task 6166:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x40/0x78 mm/kasan/common.c:68
kasan_save_alloc_info+0x70/0x84 mm/kasan/generic.c:626
unpoison_slab_object mm/kasan/common.c:314 [inline]
__kasan_slab_alloc+0x74/0x8c mm/kasan/common.c:340
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3813 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
kmem_cache_alloc_node+0x204/0x4c0 mm/slub.c:3903
__alloc_skb+0x19c/0x3d8 net/core/skbuff.c:641
alloc_skb include/linux/skbuff.h:1296 [inline]
kcm_sendmsg+0x1d3c/0x2124 net/kcm/kcmsock.c:783
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
sock_sendmsg+0x220/0x2c0 net/socket.c:768
splice_to_socket+0x7cc/0xd58 fs/splice.c:889
do_splice_from fs/splice.c:941 [inline]
direct_splice_actor+0xec/0x1d8 fs/splice.c:1164
splice_direct_to_actor+0x438/0xa0c fs/splice.c:1108
do_splice_direct_actor fs/splice.c:1207 [inline]
do_splice_direct+0x1e4/0x304 fs/splice.c:1233
do_sendfile+0x460/0xb3c fs/read_write.c:1295
__do_sys_sendfile64 fs/read_write.c:1362 [inline]
__se_sys_sendfile64 fs/read_write.c:1348 [inline]
__arm64_sys_sendfile64+0x160/0x3b4 fs/read_write.c:1348
__invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Freed by task 6167:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x40/0x78 mm/kasan/common.c:68
kasan_save_free_info+0x5c/0x74 mm/kasan/generic.c:640
poison_slab_object+0x124/0x18c mm/kasan/common.c:241
__kasan_slab_free+0x3c/0x78 mm/kasan/common.c:257
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2121 [inline]
slab_free mm/slub.c:4299 [inline]
kmem_cache_free+0x15c/0x3d4 mm/slub.c:4363
kfree_skbmem+0x10c/0x19c
__kfree_skb net/core/skbuff.c:1109 [inline]
kfree_skb_reason+0x240/0x6f4 net/core/skbuff.c:1144
kfree_skb include/linux/skbuff.h:1244 [inline]
kcm_release+0x104/0x4c8 net/kcm/kcmsock.c:1685
__sock_release net/socket.c:659 [inline]
sock_close+0xa4/0x1e8 net/socket.c:1421
__fput+0x30c/0x738 fs/file_table.c:376
____fput+0x20/0x30 fs/file_table.c:404
task_work_run+0x230/0x2e0 kernel/task_work.c:180
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0x618/0x1f64 kernel/exit.c:871
do_group_exit+0x194/0x22c kernel/exit.c:1020
get_signal+0x1500/0x15ec kernel/signal.c:2893
do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249
do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148
exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
The buggy address belongs to the object at ffff0000ced0fc80
which belongs to the cache skbuff_head_cache of size 240
The buggy address is located 0 bytes inside of
freed 240-byte region [ffff0000ced0fc80, ffff0000ced0fd70)
The buggy address belongs to the physical page:
page:00000000d35f4ae4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10ed0f
flags: 0x5ffc00000000800(slab|node=0|zone=2|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 05ffc00000000800 ffff0000c1cbf640 fffffdffc3423100 dead000000000004
raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff0000ced0fb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff0000ced0fc00: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>ffff0000ced0fc80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff0000ced0fd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
ffff0000ced0fd80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b72d86aa5df17ce74c60
Tested-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240815220437.69511-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1eeb5043573981f3a1278876515851b7f6b1df1b ]
We lost ability to unload ipv6 module a long time ago.
Instead of calling expensive inet_twsk_purge() twice,
we can handle all families in one round.
Also remove an extra line added in my prior patch,
per Kuniyuki Iwashima feedback.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/netdev/20240327192934.6843-1-kuniyu@amazon.com/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240329153203.345203-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 565d121b6998 ("tcp: prevent concurrent execution of tcp_sk_exit_batch")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c5e12ac3beb0dd3a718296b2d8af5528e9ab728e ]
As explained by Horatiu Vultur in commit 603ead96582d ("net: sparx5: Add
spinlock for frame transmission from CPU") which is for a similar
hardware design, multiple CPUs can simultaneously perform injection
or extraction. There are only 2 register groups for injection and 2
for extraction, and the driver only uses one of each. So we'd better
serialize access using spin locks, otherwise frame corruption is
possible.
Note that unlike in sparx5, FDMA in ocelot does not have this issue
because struct ocelot_fdma_tx_ring already contains an xmit_lock.
I guess this is mostly a problem for NXP LS1028A, as that is dual core.
I don't think VSC7514 is. So I'm blaming the commit where LS1028A (aka
the felix DSA driver) started using register-based packet injection and
extraction.
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
register injection
[ Upstream commit 67c3ca2c5cfe6a50772514e3349b5e7b3b0fac03 ]
Problem description
-------------------
On an NXP LS1028A (felix DSA driver) with the following configuration:
- ocelot-8021q tagging protocol
- VLAN-aware bridge (with STP) spanning at least swp0 and swp1
- 8021q VLAN upper interfaces on swp0 and swp1: swp0.700, swp1.700
- ptp4l on swp0.700 and swp1.700
we see that the ptp4l instances do not see each other's traffic,
and they all go to the grand master state due to the
ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES condition.
Jumping to the conclusion for the impatient
-------------------------------------------
There is a zero-day bug in the ocelot switchdev driver in the way it
handles VLAN-tagged packet injection. The correct logic already exists in
the source code, in function ocelot_xmit_get_vlan_info() added by commit
5ca721c54d86 ("net: dsa: tag_ocelot: set the classified VLAN during xmit").
But it is used only for normal NPI-based injection with the DSA "ocelot"
tagging protocol. The other injection code paths (register-based and
FDMA-based) roll their own wrong logic. This affects and was noticed on
the DSA "ocelot-8021q" protocol because it uses register-based injection.
By moving ocelot_xmit_get_vlan_info() to a place that's common for both
the DSA tagger and the ocelot switch library, it can also be called from
ocelot_port_inject_frame() in ocelot.c.
We need to touch the lines with ocelot_ifh_port_set()'s prototype
anyway, so let's rename it to something clearer regarding what it does,
and add a kernel-doc. ocelot_ifh_set_basic() should do.
Investigation notes
-------------------
Debugging reveals that PTP event (aka those carrying timestamps, like
Sync) frames injected into swp0.700 (but also swp1.700) hit the wire
with two VLAN tags:
00000000: 01 1b 19 00 00 00 00 01 02 03 04 05 81 00 02 bc
~~~~~~~~~~~
00000010: 81 00 02 bc 88 f7 00 12 00 2c 00 00 02 00 00 00
~~~~~~~~~~~
00000020: 00 00 00 00 00 00 00 00 00 00 00 01 02 ff fe 03
00000030: 04 05 00 01 00 04 00 00 00 00 00 00 00 00 00 00
00000040: 00 00
The second (unexpected) VLAN tag makes felix_check_xtr_pkt() ->
ptp_classify_raw() fail to see these as PTP packets at the link
partner's receiving end, and return PTP_CLASS_NONE (because the BPF
classifier is not written to expect 2 VLAN tags).
The reason why packets have 2 VLAN tags is because the transmission
code treats VLAN incorrectly.
Neither ocelot switchdev, nor felix DSA, declare the NETIF_F_HW_VLAN_CTAG_TX
feature. Therefore, at xmit time, all VLANs should be in the skb head,
and none should be in the hwaccel area. This is done by:
static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb,
netdev_features_t features)
{
if (skb_vlan_tag_present(skb) &&
!vlan_hw_offload_capable(features, skb->vlan_proto))
skb = __vlan_hwaccel_push_inside(skb);
return skb;
}
But ocelot_port_inject_frame() handles things incorrectly:
ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb));
void ocelot_ifh_port_set(struct sk_buff *skb, void *ifh, int port, u32 rew_op)
{
(...)
if (vlan_tag)
ocelot_ifh_set_vlan_tci(ifh, vlan_tag);
(...)
}
The way __vlan_hwaccel_push_inside() pushes the tag inside the skb head
is by calling:
static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb)
{
skb->vlan_present = 0;
}
which does _not_ zero out skb->vlan_tci as seen by skb_vlan_tag_get().
This means that ocelot, when it calls skb_vlan_tag_get(), sees
(and uses) a residual skb->vlan_tci, while the same VLAN tag is
_already_ in the skb head.
The trivial fix for double VLAN headers is to replace the content of
ocelot_ifh_port_set() with:
if (skb_vlan_tag_present(skb))
ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb));
but this would not be correct either, because, as mentioned,
vlan_hw_offload_capable() is false for us, so we'd be inserting dead
code and we'd always transmit packets with VID=0 in the injection frame
header.
I can't actually test the ocelot switchdev driver and rely exclusively
on code inspection, but I don't think traffic from 8021q uppers has ever
been injected properly, and not double-tagged. Thus I'm blaming the
introduction of VLAN fields in the injection header - early driver code.
As hinted at in the early conclusion, what we _want_ to happen for
VLAN transmission was already described once in commit 5ca721c54d86
("net: dsa: tag_ocelot: set the classified VLAN during xmit").
ocelot_xmit_get_vlan_info() intends to ensure that if the port through
which we're transmitting is under a VLAN-aware bridge, the outer VLAN
tag from the skb head is stripped from there and inserted into the
injection frame header (so that the packet is processed in hardware
through that actual VLAN). And in all other cases, the packet is sent
with VID=0 in the injection frame header, since the port is VLAN-unaware
and has logic to strip this VID on egress (making it invisible to the
wire).
Fixes: 08d02364b12f ("net: mscc: fix the injection header")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|