git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A fix for a crash scenario that has been present since the initial
merge, a minor regression in sysfs attribute visibility, and a fix for
some flexible array warnings.
The bulk of this pull is an update to the libnvdimm unit test
infrastructure to test non-ACPI platforms. Given there is zero
regression risk for test updates, and the tests enable validation of
bits headed towards the next merge window, I saw no reason to hold the
new tests back. Santosh originally submitted this before the v5.11
window opened.
Summary:
- Fix a crash when sysfs accesses race 'dimm' driver probe/remove.
- Fix a regression in 'resource' attribute visibility necessary for
mapping badblocks and other physical address interrogations.
- Fix some flexible array warnings
- Expand the unit test infrastructure for non-ACPI platforms"
* tag 'libnvdimm-fixes-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm/dimm: Avoid race between probe and available_slots_show()
ndtest: Add papr health related flags
ndtest: Add nvdimm control functions
ndtest: Add regions and mappings to the test buses
ndtest: Add dimm attributes
ndtest: Add dimms to the two buses
ndtest: Add compatability string to treat it as PAPR family
testing/nvdimm: Add test module for non-nfit platforms
libnvdimm/namespace: Fix visibility of namespace resource attribute
libnvdimm/pmem: Remove unused header
ACPI: NFIT: Fix flexible_array.cocci warnings
|
|
Pull dma-mapping fix from Christoph Hellwig:
"Fix a 32 vs 64-bit padding issue in the new benchmark code (Barry
Song)"
* tag 'dma-mapping-5.11-2' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: benchmark: use u8 for reserved field in uAPI structure
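To illustrate the kind of 32-bit versus 64-bit padding problem behind this fix, here is a minimal sketch. The structures and field names below are hypothetical and are not copied from the kernel's benchmark header. The point is only that a trailing 64-bit reserved array is placed at different offsets depending on how the ABI aligns 64-bit integers, whereas a byte array is not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uAPI-style structures; field names are illustrative only. */
struct bench_u64_reserved {
	uint64_t result;
	uint32_t threads;
	uint64_t reserved[2];	/* offset depends on the ABI's u64 alignment */
};

struct bench_u8_reserved {
	uint64_t result;
	uint32_t threads;
	uint8_t reserved[20];	/* byte array: no alignment padding before it */
};

/* The byte-array layout is fixed: 8 + 4 + 20 bytes, reserved at offset 12. */
static_assert(sizeof(struct bench_u8_reserved) == 32, "unexpected size");
static_assert(offsetof(struct bench_u8_reserved, reserved) == 12,
	      "unexpected offset");

/* Reports where the 64-bit reserved array landed on the current ABI. */
size_t bench_u64_reserved_offset(void)
{
	return offsetof(struct bench_u64_reserved, reserved);
}
```

On an ABI that aligns 64-bit integers to 8 bytes, bench_u64_reserved_offset() reports 16. Where they are aligned to 4 bytes, it reports 12, so 32-bit user space and a 64-bit kernel would disagree about the layout.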
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Prevent device managed IRQ allocation helpers from returning IRQ 0
- A fix for MSI activation of PCI endpoints with multiple MSIs
* tag 'irq_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Prevent [devm_]irq_alloc_desc from returning irq 0
genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull syscall entry fixes from Borislav Petkov:
- For syscall user dispatch, separate prctl operation from syscall
redirection range specification before the API has been made official
in 5.11.
- Ensure tasks using the generic syscall code do trap after returning
from a syscall when single-stepping is requested.
* tag 'core_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Use different define for selector variable in SUD
entry: Ensure trap after single-step on system call return
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
"Revert an attempt to not spread IRQ threads on isolated CPUs which has
a bunch of problems"
* tag 'sched_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "lib: Restrict cpumask_local_spread to houskeeping CPUs"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Borislav Petkov:
"Two more timers-related fixes for v5.11:
- Use a freezable workqueue for RTC sync because the sync can happen
at any time and trigger suspend assertion checks in the i2c
subsystem.
- Correct a previous RTC validation change to check only bit 6 in
register D because some Intel machines use bits 0-5"
* tag 'timers_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ntp: Use freezable workqueue for RTC synchronization
rtc: mc146818: Dont test for bit 0-5 in Register D
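The difference between the two register D checks can be sketched as follows. This is an illustration only: the function and macro names are hypothetical, and it assumes that the earlier validation rejected any set bit in the range 0-6 and that a set bit 6 marks the RTC as unusable.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_REG_D_LOW_BITS	0x3fu	/* bits 0-5: used by some Intel machines */
#define SKETCH_REG_D_BIT6	0x40u	/* the only bit the relaxed check inspects */

/* Overly strict variant (assumed): rejects the RTC if any of bits 0-6 is set. */
bool sketch_rtc_valid_strict(uint8_t reg_d)
{
	return (reg_d & (SKETCH_REG_D_LOW_BITS | SKETCH_REG_D_BIT6)) == 0;
}

/* Relaxed variant: bits 0-5 are ignored, only bit 6 must be clear. */
bool sketch_rtc_valid_relaxed(uint8_t reg_d)
{
	return (reg_d & SKETCH_REG_D_BIT6) == 0;
}
```

A register value with some of bits 0-5 set fails the strict variant but passes the relaxed one, which is the behaviour the summary describes for the affected Intel machines.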
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"I hope this is the last batch of x86/urgent updates for this round:
- Remove superfluous EFI PGD range checks which lead to those
assertions failing with certain kernel configs and LLVM.
- Disable setting breakpoints on facilities involved in #DB exception
handling to avoid infinite loops.
- Add extra serialization to non-serializing MSRs (IA32_TSC_DEADLINE
and x2 APIC MSRs) to adhere to SDM's recommendation and avoid any
theoretical issues.
- Re-add the EPB MSR reading on turbostat so that it works on older
kernels which don't have the corresponding EPB sysfs file.
- Add Alder Lake to the list of CPUs which support split lock.
- Fix %dr6 register handling in order to be able to set watchpoints
with gdb again.
- Disable CET instrumentation in the kernel so that gcc doesn't add
ENDBR64 to kernel code and thus confuse tracing"
* tag 'x86_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Remove EFI PGD build time checks
x86/debug: Prevent data breakpoints on cpu_dr7
x86/debug: Prevent data breakpoints on __per_cpu_offset
x86/apic: Add extra serialization for non-serializing MSRs
tools/power/turbostat: Fallback to an MSR read for EPB
x86/split_lock: Enable the split lock feature on another Alder Lake CPU
x86/debug: Fix DR6 handling
x86/build: Disable CET instrumentation in the kernel
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Use the 'python3' command to invoke python scripts because some
distributions do not provide the 'python' command any more.
- Clean-up and update documents
- Use pkg-config to search libcrypto
- Fix duplicated debug flags
- Ignore some more stubs in scripts/kallsyms.c
* tag 'kbuild-fixes-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kallsyms: fix nonconverging kallsyms table with lld
kbuild: fix duplicated flags in DEBUG_CFLAGS
scripts/clang-tools: switch explicitly to Python 3
kbuild: remove PYTHON variable
Documentation/llvm: Add a section about supported architectures
Revert "checkpatch: add check for keyword 'boolean' in Kconfig definitions"
scripts: use pkg-config to locate libcrypto
kconfig: mconf: fix HOSTCC call
doc: gcc-plugins: update gcc-plugins.rst
kbuild: simplify GCC_PLUGINS enablement in dummy-tools/gcc
Documentation/Kbuild: Remove references to gcc-plugin.sh
scripts: switch explicitly to Python 3
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-02-05
This series contains updates to ice driver only.
Jake adds reporting of timeout length during devlink flash and
implements support to report devlink info regarding the version of
firmware that is stored (downloaded) to the device, but is not yet active.
ice_devlink_info_get will report "stored" versions when there is no
pending flash update. Version info includes the UNDI Option ROM, the
Netlist module, and the fw.bundle_id.
Gustavo A. R. Silva replaces a one-element array with a flexible-array
member.
Bruce utilizes flex_array_size() helper and removes dead code on a check
for a condition that can't occur.
v2:
* removed security revision implementation, and re-ordered patches to
account for this removal
* squashed patches implementing ice_read_flash_module to avoid patches
refactoring the implementation of a previous patch in the series
* modify ice_devlink_info_get to always report "stored" versions instead
of only reporting them when a pending flash update is ready.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: remove dead code
ice: use flex_array_size where possible
ice: Replace one-element array with flexible-array member
ice: display stored UNDI firmware version via devlink info
ice: display stored netlist versions via devlink info
ice: display some stored NVM versions via devlink info
ice: introduce function for reading from flash modules
ice: cache NVM module bank information
ice: introduce context struct for info report
ice: create flash_info structure and separate NVM version
ice: report timeout length for erasing during devlink flash
====================
Link: https://lore.kernel.org/r/20210206044101.636242-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
1) Remove indirection and use nf_ct_get() directly in nfnetlink_log
and nfnetlink_queue, from Florian Westphal.
2) Add weighted random twos choice least-connection scheduling for IPVS,
from Darby Payne.
3) Add a __hash placeholder in the flow tuple structure to identify
the field to be included in the rhashtable key hash calculation.
4) Add a new nft_parse_register_load() and nft_parse_register_store()
to consolidate register load and store in the core.
5) Statify nft_parse_register() since it has no more module clients.
6) Remove redundant assignment in nft_cmp, from Colin Ian King.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
netfilter: nftables: remove redundant assignment of variable err
netfilter: nftables: statify nft_parse_register()
netfilter: nftables: add nft_parse_register_store() and use it
netfilter: nftables: add nft_parse_register_load() and use it
netfilter: flowtable: add hash offset field to tuple
ipvs: add weighted random twos choice algorithm
netfilter: ctnetlink: remove get_ct indirection
====================
Link: https://lore.kernel.org/r/20210206015005.23037-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull cifs fixes from Steve French:
"Three small smb3 fixes for stable"
* tag '5.11-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: report error instead of invalid when revalidating a dentry fails
smb3: fix crediting for compounding when only one request in flight
smb3: Fix out-of-bounds bug in SMB2_negotiate()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"A handful of fixes for this week:
- A fix to avoid evaluating the VA twice in virt_addr_valid, which
fixes some WARNs under DEBUG_VIRTUAL.
- Two fixes related to STRICT_KERNEL_RWX: one that fixes some
permissions when strict is disabled, and one to fix some alignment
issues when strict is enabled.
- A fix to disallow the selection of MAXPHYSMEM_2GB on RV32, which
isn't valid any more but may still show up in some oldconfigs.
We still have the HiFive Unleashed ethernet phy reset regression, so
there will likely be something coming next week"
* tag 'riscv-for-linus-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Define MAXPHYSMEM_1GB only for RV32
riscv: Align on L1_CACHE_BYTES when STRICT_KERNEL_RWX
RISC-V: Fix .init section permission update
riscv: virt_addr_valid must check the address belongs to linear mapping
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- A fix for a change we made to __kernel_sigtramp_rt64() which confused
glibc's backtrace logic, and also changed the semantics of that
symbol, which was arguably an ABI break.
- A fix for a stack overwrite in our VSX instruction emulation.
- A couple of fixes for the Makefile logic in the new C VDSO.
Thanks to Masahiro Yamada, Naveen N. Rao, Raoni Fassina Firmino, and
Ravi Bangoria.
* tag 'powerpc-5.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64/signal: Fix regression in __kernel_sigtramp_rt64() semantics
powerpc/vdso64: remove meaningless vgettimeofday.o build rule
powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o
powerpc/sstep: Fix array out of bound warning
|
|
There's no benefit in trying to disable interrupts if NAPI is
scheduled already. This allows us to save a PCI write in this case.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/78c7f2fb-9772-1015-8c1d-632cbdff253f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull ARM fixes from Russell King:
- Fix latent bug with DC21285 (Footbridge PCI bridge) configuration
accessors that affects GCC >= 4.9.2
- Fix misplaced tegra_uart_config in decompressor
- Ensure signal page contents are initialised
- Fix kexec oops
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: kexec: fix oops after TLB are invalidated
ARM: ensure the signal page contains defined contents
ARM: 9043/1: tegra: Fix misplaced tegra_uart_config in decompressor
ARM: footbridge: fix dc21285 PCI configuration accessors
|
|
The verdict returned from ena_xdp_execute() is used to determine the
fate of the RX buffer's page. In case of XDP Redirect/TX error the
verdict should be set to XDP_ABORTED, otherwise the page won't be freed.
Fixes: a318c70ad152 ("net: ena: introduce XDP redirect implementation")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
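The verdict handling described above can be sketched roughly as below. The enum and function names are hypothetical stand-ins, not the driver's actual code. The sketch only shows a TX or redirect verdict being downgraded to an aborted verdict when the corresponding action fails, so that a caller keying off the verdict knows the page still has to be freed.

```c
/* Hypothetical stand-ins for the XDP verdict values used by a driver. */
enum sketch_xdp_verdict {
	SKETCH_XDP_ABORTED,
	SKETCH_XDP_DROP,
	SKETCH_XDP_PASS,
	SKETCH_XDP_TX,
	SKETCH_XDP_REDIRECT,
};

/*
 * Returns the verdict the caller should act on. A failed TX or redirect
 * (err != 0) is reported as ABORTED; every other case is left unchanged.
 */
enum sketch_xdp_verdict
sketch_xdp_final_verdict(enum sketch_xdp_verdict verdict, int err)
{
	if (err && (verdict == SKETCH_XDP_TX || verdict == SKETCH_XDP_REDIRECT))
		return SKETCH_XDP_ABORTED;

	return verdict;
}
```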
|
|
A possible locking issue in vsock_connect_timeout() was recognized by
Eric Dumazet which might cause a null pointer dereference in
vsock_transport_cancel_pkt(). This patch ensures that
vsock_transport_cancel_pkt() is called with the lock held, so a race
condition that could set vsk->transport to NULL cannot occur.
Fixes: 380feae0def7 ("vsock: cancel packets when failing to connect")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Norbert Slusarek <nslusarek@gmx.net>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/trinity-f8e0937a-cf0e-4d80-a76e-d9a958ba3ef1-1612535522360@3c-app-gmx-bap12
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In vsock_stream_connect(), a thread will enter schedule_timeout().
While being scheduled out, another thread can enter vsock_stream_connect()
as well and set vsk->transport to NULL. In case a signal was sent, the
first thread can leave schedule_timeout() and vsock_transport_cancel_pkt()
will be called right after. Inside vsock_transport_cancel_pkt(), a null
dereference will happen on transport->cancel_pkt.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Signed-off-by: Norbert Slusarek <nslusarek@gmx.net>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/trinity-c2d6cede-bfb1-44e2-85af-1fbc7f541715-1612535117028@3c-app-gmx-bap12
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The "dev_has_header" function, recently added in
commit d549699048b4 ("net/packet: fix packet receive on L3 devices
without visible hard header"),
is a more accurate criterion for determining whether a device exposes
the LL header to upper layers, because in addition to dev->header_ops,
it also checks for dev->header_ops->create.
When transmitting an skb on a device, dev_hard_header can be called to
generate an LL header. dev_hard_header will only generate a header if
dev->header_ops->create is present.
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20210205224124.21345-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
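The check described in this commit message can be sketched with simplified stand-in types. The structures below are hypothetical reductions that keep only the two members mentioned in the text, and the sketch_ prefix marks everything that is not kernel code. The idea is that a device counts as exposing an LL header only when it has header_ops and those ops can actually create a header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins: only the members discussed in the text are kept. */
struct sketch_header_ops {
	int (*create)(void *skb, void *dev);
};

struct sketch_net_device {
	const struct sketch_header_ops *header_ops;
};

/* Stub used only to populate the create hook in examples. */
int sketch_create_stub(void *skb, void *dev)
{
	(void)skb;
	(void)dev;
	return 0;
}

/* True only if the device has header_ops and they can build a header. */
bool sketch_dev_has_header(const struct sketch_net_device *dev)
{
	assert(dev != NULL);
	return dev->header_ops && dev->header_ops->create;
}
```

A device whose header_ops exist but lack a create hook is treated the same as one with no header_ops at all, which matches the observation that dev_hard_header only generates a header when create is present.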
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small, last-minute, USB driver fixes for 5.11-rc7
They all resolve issues reported, or are a few new device ids for some
drivers. They include:
- new device ids for some usb-serial drivers
- xhci fixes for a variety of reported problems
- dwc3 driver bugfixes
- dwc2 driver bugfixes
- usblp driver bugfix
- thunderbolt bugfix
- a few other tiny fixes
All have been in linux-next with no reported issues"
* tag 'usb-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: dwc2: Fix endpoint direction check in ep_from_windex
usb: dwc3: fix clock issue during resume in OTG mode
xhci: fix bounce buffer usage for non-sg list case
usb: host: xhci: mvebu: make USB 3.0 PHY optional for Armada 3720
usb: xhci-mtk: break loop when find the endpoint to drop
usb: xhci-mtk: skip dropping bandwidth of unchecked endpoints
usb: renesas_usbhs: Clear pipe running flag in usbhs_pkt_pop()
USB: gadget: legacy: fix an error code in eth_bind()
thunderbolt: Fix possible NULL pointer dereference in tb_acpi_add_link()
USB: serial: option: Adding support for Cinterion MV31
usb: xhci-mtk: fix unreleased bandwidth data
usb: gadget: aspeed: add missing of_node_put
USB: usblp: don't call usb_set_interface if there's a single alt
USB: serial: cp210x: add pid/vid for WSDA-200-USB
USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"Nothing terribly interesting, just a few fixups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xpad - sync supported devices with fork on GitHub
Input: ariel-pwrbutton - remove unused variable ariel_pwrbutton_id_table
Input: goodix - add support for Goodix GT9286 chip
dt-bindings: input: touchscreen: goodix: Add binding for GT9286 IC
dt-bindings: input: adc-keys: clarify description
Input: ili210x - implement pressure reporting for ILI251x
Input: i8042 - unbreak Pegatron C15B
Input: st1232 - wait until device is ready before reading resolution
Input: st1232 - do not read more bytes than needed
Input: st1232 - fix off-by-one error in resolution handling
|
|
Alex Elder says:
====================
net: ipa: a mix of small improvements
Version 2 of this series restructures a couple of the changed
functions (in patches 1 and 2) to avoid blocks of indented code
by returning early when possible, as suggested by Jakub. The
description of the first patch was changed as a result, to better
reflect what the updated patch does. It also fixes one spot I
identified when updating the code, where gsi_channel_stop() was
doing the wrong thing on error.
The original description for this series is below.
This series contains a sort of unrelated set of code cleanups.
The first two are things I wanted to do in a series that updated
some NAPI code recently. I didn't want to change things in a way
that affected existing testing so I set these aside for later
(i.e., now).
The third makes a change to event ring handling that's similar to
what was done a while back for channels. There's little benefit to
caching the current state of an event ring, so with this we'll just
fetch the state from hardware whenever we need it.
The fourth patch removes the definitions of two unused symbols.
The fifth replaces a count that is always 0 or 1 with a Boolean.
The sixth removes a build-time validation check that doesn't really
provide benefit.
And the last one fixes a problem (in two spots) that could cause a
build-time check to fail "bogusly".
====================
Link: https://lore.kernel.org/r/20210205221100.1738-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It's possible that the length passed to ipa_header_size_encoded()
is larger than what can be represented by the HDR_LEN field alone
(starting with IPA v4.5). If we attempted that, u32_encode_bits()
would trigger a build-time error.
Avoid this problem by masking off high-order bits of the value
encoded as the lower portion of the header length.
The same sort of problem exists in ipa_metadata_offset_encoded(),
so implement the same fix there.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
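As a rough illustration of the masking described above, the sketch below splits a length across a low-order field and a separate high-order field. The register layout, the mask values, and the function name are all hypothetical and chosen only for the example; they are not the IPA register definitions.

```c
#include <stdint.h>

/*
 * Hypothetical layout: the low 6 bits of the length live in bits 0-5,
 * and two additional high-order bits live in bits 28-29.
 */
#define SKETCH_LEN_LO_MASK	0x0000003fu
#define SKETCH_LEN_LO_BITS	6u
#define SKETCH_LEN_HI_MASK	0x30000000u
#define SKETCH_LEN_HI_SHIFT	28u

uint32_t sketch_header_len_encode(uint32_t len)
{
	/* Mask off high-order bits so the low field can never overflow. */
	uint32_t val = len & SKETCH_LEN_LO_MASK;

	/* Place the remaining bits in the separate high-order field. */
	val |= ((len >> SKETCH_LEN_LO_BITS) << SKETCH_LEN_HI_SHIFT) &
	       SKETCH_LEN_HI_MASK;

	return val;
}
```

With this layout a length of 0x40 leaves the low field at zero and sets only the lowest bit of the high field, instead of attempting to store a 7-bit value in a 6-bit field.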
|
|
There is a build-time check that the packet status structure is a
multiple of 4 bytes in size. It's not clear where that constraint
comes from, but the structure defines what hardware provides so its
definition won't change. Get rid of the check; it adds no value.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The count argument to ipa_endpoint_replenish() is only ever 0 or 1,
and always will be (because we always handle each receive buffer in
a single transaction). Rename the argument to be add_one and change
it to be Boolean.
Update the function description to reflect the current code.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We do not support inter-EE channel or event ring commands. Inter-EE
interrupts are disabled (and never re-enabled) for all channels and
event rings, so we have no need for the GSI registers that clear
those interrupt conditions. So remove their definitions.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
An event ring's state only needs to be known when it is allocated,
reset, or deallocated. We check an event ring's state both before
and after performing an event ring control command that changes
its state. These are only issued at startup and shutdown, so there
is very little value in caching the state.
Stop recording a copy of the event ring's last known state, and instead
fetch the true state from hardware whenever it's needed. In such
cases, *do* record the state in a local variable, in case an error
message reports it (so the value reported is the value seen).
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When stopping a channel, gsi_channel_stop() will ensure NAPI
polling is complete when it calls napi_disable(). So there is no
need to call napi_synchronize() in that case.
Move the call to napi_synchronize() out of __gsi_channel_stop()
and into gsi_channel_suspend(), so it's only used where needed.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the mutex calls out of gsi_channel_stop_retry() and into
__gsi_channel_stop(), to make the latter more semantically similar
to __gsi_channel_start().
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Vladimir Oltean says:
====================
LAG offload for Ocelot DSA switches
This patch series reworks the ocelot switchdev driver such that it could
share the same implementation for LAG offload as the felix DSA driver.
Testing has been done in the following topology:
+----------------------------------+
| Board 1         br0              |
|             +---------+          |
|            /           \         |
|            |           |         |
|            |         bond0       |
|            |        +-----+      |
|            |       /       \     |
|  eno0     swp0    swp1    swp2   |
+---|--------|-------|-------|-----+
    |        |       |       |
    +--------+       |       |
      Cable          |       |
                Cable|       |Cable
      Cable          |       |
    +--------+       |       |
    |        |       |       |
+---|--------|-------|-------|-----+
|  eno0     swp0    swp1    swp2   |
|            |       \       /     |
|            |        +-----+      |
|            |         bond0       |
|            |           |         |
|            \           /         |
|             +---------+          |
| Board 2         br0              |
+----------------------------------+
The same script can be run on both Board 1 and Board 2 to set this up:
ip link del bond0
ip link add bond0 type bond mode balance-xor miimon 1
OR
ip link add bond0 type bond mode 802.3ad
ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
ip link del br0
ip link add br0 type bridge
ip link set bond0 master br0
ip link set swp0 master br0
Then traffic can be tested between eno0 of Board 1 and eno0 of Board 2.
====================
Link: https://lore.kernel.org/r/20210205220221.255646-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The ocelot switch has been supporting LAG offload since its initial
commit, however felix could not make use of that, due to lack of a LAG
abstraction in DSA. Now that we have that, let's forward DSA's calls
towards the ocelot library, which will deal with setting up the bonding.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Given the following topology, and focusing only on Box A:
        Box A
+----------------------------------+
| Board 1         br0              |
|             +---------+          |
|            /           \         |
|            |           |         |
|            |         bond0       |
|            |        +-----+      |
|192.168.1.1 |       /       \     |
|  eno0     swp0    swp1    swp2   |
+---|--------|-------|-------|-----+
    |        |       |       |
    +--------+       |       |
      Cable          |       |
                Cable|       |Cable
      Cable          |       |
    +--------+       |       |
    |        |       |       |
+---|--------|-------|-------|-----+
|  eno0     swp0    swp1    swp2   |
|192.168.1.2 |       \       /     |
|            |        +-----+      |
|            |         bond0       |
|            |           |         |
|            \           /         |
|             +---------+          |
| Board 2         br0              |
+----------------------------------+
        Box B
The assisted_learning_on_cpu_port logic will see that swp0 is bridged
with a "foreign interface" (bond0) and will therefore install all
addresses learnt by the software bridge towards bond0 (including the
address of eno0 on Box B) as static addresses towards the CPU port.
But that's not what we want - bond0 is not really a "foreign interface"
but one we can offload including L2 forwarding from/towards it. So we
need to refine our logic for assisted learning such that, whenever we
see an address learnt on a non-DSA interface, we search through the tree
for any port that offloads that non-DSA interface.
Some confusion might arise as to why we search through the whole tree
instead of just the local switch returned by dsa_slave_dev_lower_find.
Or a different angle of the same confusion: why does
dsa_slave_dev_lower_find(br_dev) return a single dp that's under br_dev
instead of the whole list of bridged DSA ports?
To answer the second question, it should be enough to install the static
FDB entry on the CPU port of a single switch in the tree, because
dsa_port_fdb_add uses DSA_NOTIFIER_FDB_ADD which ensures that all other
switches in the tree get notified of that address, and add the entry
themselves using dsa_towards_port().
This should help understand the answer to the first question: the port
returned by dsa_slave_dev_lower_find may not be on the same switch as
the ports that offload the LAG. Nonetheless, if the driver implements
.crosschip_lag_join and .crosschip_bridge_join as mv88e6xxx does, there
still isn't any reason for trapping addresses learnt on the remote LAG
towards the CPU, and we should prevent that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
At present there is an issue when ocelot is offloading a bonding
interface, but one of the links of the physical ports goes down. Traffic
keeps being hashed towards that destination, and of course gets dropped
on egress.
Monitor the netdev notifier events emitted by the bonding driver for
changes in the physical state of lower interfaces, to determine which
ports are active and which ones are no longer.
Then extend ocelot_get_bond_mask to return either the configured bonding
interfaces, or the active ones, depending on a boolean argument. The
code that does rebalancing only needs to do so among the active ports,
whereas the bridge forwarding mask and the logical port IDs still need
to look at the permanently bonded ports.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
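The rebalancing among active ports can be sketched as below. This is an assumption-laden illustration rather than the driver's code: the function name and the output array are hypothetical, and a plain round-robin distribution of the 16 aggregation codes is assumed. Each code is pointed at one of the LAG members that is currently active, so traffic is no longer hashed towards a port whose link is down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_AGGR_CODES	16
#define SKETCH_MAX_PORTS	32

/*
 * Maps every aggregation code to one active LAG member, round-robin.
 * active_mask has one bit per port. Returns the number of active ports;
 * if that is 0, code_to_port[] is left untouched.
 */
int sketch_balance_aggr_codes(uint32_t active_mask,
			      int code_to_port[SKETCH_NUM_AGGR_CODES])
{
	int active[SKETCH_MAX_PORTS];
	int count = 0;

	assert(code_to_port != NULL);

	/* Collect the indices of the ports that are currently active. */
	for (int port = 0; port < SKETCH_MAX_PORTS; port++)
		if (active_mask & (UINT32_C(1) << port))
			active[count++] = port;

	if (count == 0)
		return 0;

	for (int code = 0; code < SKETCH_NUM_AGGR_CODES; code++)
		code_to_port[code] = active[code % count];

	return count;
}
```

Calling the function again with a smaller mask after a link-down event redistributes all 16 codes over the remaining ports, while the configured (permanent) bond mask can still be used for the forwarding mask and logical port IDs.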
|
|
It makes it a bit easier to read and understand the code that deals with
balancing the 16 aggregation codes among the ports in a certain LAG.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We can now simplify the implementation by always using ocelot_get_bond_mask
to look up the other ports that are offloading the same bonding interface
as us.
In ocelot_set_aggr_pgids, the code had a way to uniquely iterate through
LAGs. We need to achieve the same behavior by marking each LAG as visited,
which we do now by using a temporary 32-bit "visited" bitmask. This is
ok and we do not need dynamic memory allocation, because we know that
this switch architecture will not have more than 32 ports (the PGID port
masks are 32-bit anyway).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
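The "visited" bitmask technique mentioned above can be sketched like this. The port structure and function name are hypothetical simplifications, and the sketch assumes at most 32 ports, as the commit message does. Each LAG is processed once, when its first member port is encountered, and all of its members are then marked as visited.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port state: only the bonding upper pointer is kept. */
struct sketch_lag_port {
	const void *bond;	/* NULL if the port is not part of a LAG */
};

/* Counts distinct LAGs, visiting each exactly once via a 32-bit mask. */
int sketch_count_lags(const struct sketch_lag_port *ports, int num_ports)
{
	uint32_t visited = 0;
	int lags = 0;

	assert(num_ports >= 0 && num_ports <= 32);

	for (int port = 0; port < num_ports; port++) {
		const void *bond = ports[port].bond;

		if (!bond || (visited & (UINT32_C(1) << port)))
			continue;

		/* Mark every member of this LAG so it is handled only once. */
		for (int other = port; other < num_ports; other++)
			if (ports[other].bond == bond)
				visited |= UINT32_C(1) << other;

		lags++;	/* per-LAG work would go here */
	}

	return lags;
}
```

Because the mask fits in a single 32-bit word, no dynamic allocation is needed to track which LAGs were already handled.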
|
|
The setup of logical port IDs is done in two places: from the inconclusively
named ocelot_setup_lag and from ocelot_port_lag_leave, a function that
also calls ocelot_setup_lag (which apparently does an incomplete setup
of the LAG).
To improve this situation, we can rename ocelot_setup_lag into
ocelot_setup_logical_port_ids, and drop the "lag" argument. It will now
set up the logical port IDs of all switch ports, which may be
slightly less efficient but is more maintainable.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The index of the LAG is equal to the logical port ID that all the
physical port members have, which is further equal to the index of the
first physical port that is a member of the LAG.
The code gets a bit carried away with logic like this:
	if (a == b)
		c = a;
	else
		c = b;
which can be simplified, of course, into:
	c = b;
(with a being port, b being lp, c being lag)
This further makes the "lp" variable redundant, since we can use "lag"
everywhere where "lp" (logical port) was used. So instead of a "c = b"
assignment, we can do a complete deletion of b. Only one comment here:
	if (bond_mask) {
		lp = __ffs(bond_mask);
		ocelot->lags[lp] = 0;
	}
lp was clobbered before, because it was used as a temporary variable to
hold the new smallest port ID from the bond. Now that we don't have "lp"
any longer, we'll just avoid the temporary variable and zeroize the
bonding mask directly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
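The relationship stated at the top of this commit message, that the LAG index equals the index of the first physical port in the LAG, can be sketched as follows. The helper names are hypothetical, and sketch_ffs() is only a portable stand-in for the kernel's __ffs(), which likewise must not be called with a zero argument.

```c
#include <stdint.h>

/* Portable stand-in for __ffs(): index of the lowest set bit. */
static unsigned int sketch_ffs(uint32_t mask)
{
	unsigned int idx = 0;

	/* The caller guarantees mask != 0, so this loop terminates. */
	while (!(mask & 1u)) {
		mask >>= 1;
		idx++;
	}

	return idx;
}

/*
 * Returns the LAG index (equal to the logical port ID) for a bond mask
 * with one bit per member port, or -1 if the mask is empty.
 */
int sketch_lag_index(uint32_t bond_mask)
{
	if (!bond_mask)
		return -1;

	return (int)sketch_ffs(bond_mask);
}
```

For a bond made of ports 2 and 3 the mask is 0x0c and the LAG index is 2, so a separate "lp" variable carrying the same value is redundant.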
|
|
Since this code should be called from pure switchdev as well as from
DSA, we must find a way to determine the bonding mask not by looking
directly at the net_device lowers of the bonding interface, since those
could have different private structures.
We keep a pointer to the bonding upper interface, if present, in struct
ocelot_port. Then the bonding mask becomes the bitwise OR of all ports
that have the same bonding upper interface. This adds a duplication of
functionality with the current "lags" array, but the duplication will be
short-lived, since further patches will remove the latter completely.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
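Computing the bonding mask from per-port bonding pointers, as described above, might look like the following sketch. The structure and function names are hypothetical simplifications and not the driver's definitions. The mask is the bitwise OR of one bit for every port whose bonding upper interface matches the one being queried.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port state: pointer to the bonding upper, if any. */
struct sketch_bond_port {
	const void *bond;
};

/* Returns a bitmask of all ports that share the given bonding upper. */
uint32_t sketch_get_bond_mask(const struct sketch_bond_port *ports,
			      int num_ports, const void *bond)
{
	uint32_t mask = 0;

	assert(num_ports >= 0 && num_ports <= 32);

	if (!bond)
		return 0;

	for (int port = 0; port < num_ports; port++)
		if (ports[port].bond == bond)
			mask |= UINT32_C(1) << port;

	return mask;
}
```

Since only pointer equality on the bonding upper is used, the same helper works regardless of which private structure sits behind each lower net_device.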
|
|
IPv6 header information is not currently part of the entropy source for
the 4-bit aggregation code used for LAG offload, even though it could be.
The hardware reference manual says about these fields:
ANA::AGGR_CFG.AC_IP6_TCPUDP_PORT_ENA
Use IPv6 TCP/UDP port when calculating aggregation code. Configure
identically for all ports. Recommended value is 1.
ANA::AGGR_CFG.AC_IP6_FLOW_LBL_ENA
Use IPv6 flow label when calculating AC. Configure identically for all
ports. Recommended value is 1.
Integration with the xmit_hash_policy of the bonding interface is TBD.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since switchdev/DSA exposes network interfaces that fulfill many of the
same user space expectations that dedicated NICs do, it makes sense to
not deny bonding interfaces with a bonding policy that we cannot offload,
but instead allow the bonding driver to select the egress interface in
software.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Make ocelot's net device event handler more streamlined by structuring
it similarly to other such handlers. The inspiration here was
dsa_slave_netdevice_event.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ocelot_netdevice_changeupper
ocelot_netdevice_port_event treats a single event, NETDEV_CHANGEUPPER.
So we can remove the check for the type of event, and rename the
function to be more suggestive, since there already is a function with a
very similar name of ocelot_netdevice_event.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Vladimir Oltean says:
====================
Automatically manage DSA master interface state
This patch series adds code that makes DSA open the master interface
automatically whenever one user interface gets opened, either by the
user, or by various networking subsystems: netconsole, nfsroot.
With that in place, we can remove some of the places in the network
stack where DSA-specific code was sprinkled.
====================
Link: https://lore.kernel.org/r/20210205133713.4172846-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit 728c02089a0e3eefb02e9927bfae50490f40e72e.
Since 2015, DSA has gained more integration with the network stack, so
we can now have the same functionality without explicitly open-coding
for it:
- It now opens the DSA master netdevice automatically whenever a user
netdevice is opened.
- The master and switch interfaces are coupled in an upper/lower
hierarchy using the netdev adjacency lists.
In the nfsroot example below, the interface chosen by autoconfig was
swp3, and every interface except that and the DSA master, eth1, was
brought down afterwards:
[ 8.714215] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[ 8.978041] mscc_felix 0000:00:00.5 swp1 (uninitialized): PHY [0000:00:00.3:11] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[ 9.246134] mscc_felix 0000:00:00.5 swp2 (uninitialized): PHY [0000:00:00.3:12] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[ 9.486203] mscc_felix 0000:00:00.5 swp3 (uninitialized): PHY [0000:00:00.3:13] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[ 9.512827] mscc_felix 0000:00:00.5: configuring for fixed/internal link mode
[ 9.521047] mscc_felix 0000:00:00.5: Link is Up - 2.5Gbps/Full - flow control off
[ 9.530382] device eth1 entered promiscuous mode
[ 9.535452] DSA: tree 0 setup
[ 9.539777] printk: console [netcon0] enabled
[ 9.544504] netconsole: network logging started
[ 9.555047] fsl_enetc 0000:00:00.2 eth1: configuring for fixed/internal link mode
[ 9.562790] fsl_enetc 0000:00:00.2 eth1: Link is Up - 1Gbps/Full - flow control off
[ 9.564661] 8021q: adding VLAN 0 to HW filter on device bond0
[ 9.637681] fsl_enetc 0000:00:00.0 eth0: PHY [0000:00:00.0:02] driver [Qualcomm Atheros AR8031/AR8033] (irq=POLL)
[ 9.655679] fsl_enetc 0000:00:00.0 eth0: configuring for inband/sgmii link mode
[ 9.666611] mscc_felix 0000:00:00.5 swp0: configuring for inband/qsgmii link mode
[ 9.676216] 8021q: adding VLAN 0 to HW filter on device swp0
[ 9.682086] mscc_felix 0000:00:00.5 swp1: configuring for inband/qsgmii link mode
[ 9.690700] 8021q: adding VLAN 0 to HW filter on device swp1
[ 9.696538] mscc_felix 0000:00:00.5 swp2: configuring for inband/qsgmii link mode
[ 9.705131] 8021q: adding VLAN 0 to HW filter on device swp2
[ 9.710964] mscc_felix 0000:00:00.5 swp3: configuring for inband/qsgmii link mode
[ 9.719548] 8021q: adding VLAN 0 to HW filter on device swp3
[ 9.747811] Sending DHCP requests ..
[ 12.742899] mscc_felix 0000:00:00.5 swp1: Link is Up - 1Gbps/Full - flow control rx/tx
[ 12.743828] mscc_felix 0000:00:00.5 swp0: Link is Up - 1Gbps/Full - flow control off
[ 12.747062] IPv6: ADDRCONF(NETDEV_CHANGE): swp1: link becomes ready
[ 12.755216] fsl_enetc 0000:00:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 12.766603] IPv6: ADDRCONF(NETDEV_CHANGE): swp0: link becomes ready
[ 12.783188] mscc_felix 0000:00:00.5 swp2: Link is Up - 1Gbps/Full - flow control rx/tx
[ 12.785354] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 12.799535] IPv6: ADDRCONF(NETDEV_CHANGE): swp2: link becomes ready
[ 13.803141] mscc_felix 0000:00:00.5 swp3: Link is Up - 1Gbps/Full - flow control rx/tx
[ 13.811646] IPv6: ADDRCONF(NETDEV_CHANGE): swp3: link becomes ready
[ 15.452018] ., OK
[ 15.470336] IP-Config: Got DHCP answer from 10.0.0.1, my address is 10.0.0.39
[ 15.477887] IP-Config: Complete:
[ 15.481330] device=swp3, hwaddr=00:04:9f:05:de:0a, ipaddr=10.0.0.39, mask=255.255.255.0, gw=10.0.0.1
[ 15.491846] host=10.0.0.39, domain=(none), nis-domain=(none)
[ 15.498429] bootserver=10.0.0.1, rootserver=10.0.0.1, rootpath=
[ 15.498481] nameserver0=8.8.8.8
[ 15.627542] fsl_enetc 0000:00:00.0 eth0: Link is Down
[ 15.690903] mscc_felix 0000:00:00.5 swp0: Link is Down
[ 15.745216] mscc_felix 0000:00:00.5 swp1: Link is Down
[ 15.800498] mscc_felix 0000:00:00.5 swp2: Link is Down
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit 1532b9778478577152201adbafa7738b1e844868.
The above commit is good and it works, however it was meant as a bugfix
for stable kernels and now we have more self-contained ways in DSA to
handle the situation where the DSA master must be brought up.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is not fixing any actual bug that I know of, but having a DSA
interface that is up even when its lower (master) interface is down is
one of those things that just do not sound right.
Yes, DSA checks if the master is up before actually bringing the
user interface up, but nobody prevents bringing the master interface
down immediately afterwards... Then the user ports would attempt
dev_queue_xmit on an interface that is down, and wonder what's wrong.
This patch prevents that from happening. NETDEV_GOING_DOWN is the
notification emitted _before_ the master actually goes down, and we are
protected by the rtnl_mutex, so all is well.
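A minimal user-space model of this behaviour, assuming a simplified
device structure: sim_netdev and on_master_going_down are hypothetical
names invented for the sketch and are not the functions added by this
patch. Locking is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a network device. */
struct sim_netdev {
	const char *name;
	bool up;
	struct sim_netdev *master;	/* NULL for the master itself */
};

/*
 * Models the handling of the "going down" notification: while the
 * master is still up, close every user port that sits on top of it.
 * Returns the number of ports that were closed.
 */
static size_t on_master_going_down(struct sim_netdev *master,
				   struct sim_netdev *users, size_t n)
{
	size_t closed = 0;

	for (size_t i = 0; i < n; i++) {
		if (users[i].master == master && users[i].up) {
			users[i].up = false;
			closed++;
		}
	}

	return closed;
}
```

The caller marks the master itself as down only after this returns,
which mirrors the ordering described above.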
For those of you reading this because you were doing switch testing
such as latency measurements for autonomously forwarded traffic, and you
needed a controlled environment with no extra packets sent by the
network stack, this patch breaks that, because now the user ports go
down too, which may shut down the PHY etc. But please don't do it like
that, just do instead:
tc qdisc add dev eno2 clsact
tc filter add dev eno2 egress flower action drop
Tested with two cascaded DSA switches:
$ ip link set eno2 down
sja1105 spi2.0 sw0p2: Link is Down
mscc_felix 0000:00:00.5 swp0: Link is Down
fsl_enetc 0000:00:00.2 eno2: Link is Down
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For historical reasons, DSA wants the master interface to be open before
the user port is opened. The promiscuity of interfaces that are down used
to have issues, as referenced by Lennert Buytenhek in commit df02c6ff2e39
("dsa: fix master interface allmulti/promisc handling").
The bugfix mentioned there, commit b6c40d68ff64 ("net: only invoke
dev->change_rx_flags when device is UP"), was basically a "don't do
that" approach to working around the promiscuity while down issue.
Further work done by Vlad Yasevich in commit d2615bf45069 ("net: core:
Always propagate flag changes to interfaces") has resolved the
underlying issue, so it is now strictly up to the DSA and 8021q
drivers: the networking core no longer mandates that the master
interface must be up when changing its promiscuity.
From DSA's point of view, deciding to error out in dsa_slave_open
because the master isn't up is
(a) a bad user experience and
(b) knocking at an open door.
Even if there still was an issue with promiscuity while down, DSA could
still just open the master and avoid it.
Doing it this way has the additional benefit that user space can now
remove DSA-specific workarounds, like systemd-networkd with BindCarrier:
https://github.com/systemd/systemd/issues/7478
And we can finally remove one of the 2 bullets in the "Common pitfalls
using DSA setups" chapter.
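The idea of opening the master on demand can be modelled in user space
as below. This is only a sketch: sim_dev and sim_open are hypothetical
names, and the recursion is an assumption chosen so that cascaded
switches behave as in the test log that follows.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a network device. */
struct sim_dev {
	const char *name;
	bool up;
	struct sim_dev *master;	/* lower device, NULL if none */
};

/*
 * Open a device, first opening its master if that is still down.
 * With cascaded switches the master is itself a user port, so the
 * call recurses until it reaches a device without a master.
 */
static int sim_open(struct sim_dev *dev)
{
	if (dev->up)
		return 0;

	if (dev->master) {
		int err = sim_open(dev->master);

		if (err)
			return err;
	}

	dev->up = true;
	return 0;
}
```

Opening the outermost user port in this model brings up the whole chain
below it, instead of returning an error because the master is down.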
Tested with two cascaded DSA switches:
$ ip link set sw0p2 up
fsl_enetc 0000:00:00.2 eno2: configuring for fixed/internal link mode
fsl_enetc 0000:00:00.2 eno2: Link is Up - 1Gbps/Full - flow control rx/tx
mscc_felix 0000:00:00.5 swp0: configuring for fixed/sgmii link mode
mscc_felix 0000:00:00.5 swp0: Link is Up - 1Gbps/Full - flow control off
8021q: adding VLAN 0 to HW filter on device swp0
sja1105 spi2.0 sw0p2: configuring for phy/rgmii-id link mode
IPv6: ADDRCONF(NETDEV_CHANGE): eno2: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): swp0: link becomes ready
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"One fix in drivers (lpfc) that stops an oops on resource exhaustion"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Fix EEH encountering oops with NVMe traffic
|
|
Pull block fixes from Jens Axboe:
"A few small regression fixes:
- NVMe pull request from Christoph:
- more quirks for buggy devices (Thorsten Leemhuis, Claus Stovgaard)
- update the email address for Keith (Keith Busch)
- fix an out of bounds access in nvmet-tcp (Sagi Grimberg)
- Regression fix for BFQ shallow depth calculations introduced in
this merge window (Lin)"
* tag 'block-5.11-2021-02-05' of git://git.kernel.dk/linux-block:
nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUs
bfq-iosched: Revert "bfq: Fix computation of shallow depth"
update the email address for Keith Bush
nvme-pci: ignore the subsysem NQN on Phison E16
nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
|
|
Huazhong Tan says:
====================
net: hns3: updates for -next
This series adds some code optimizations and compatibility
handling for the HNS3 ethernet driver.
change log:
V2: refactor #2 as Jakub Kicinski reported and remove the part
about RSS size, which will not differ between hw versions.
update netdev->max_mtu as well in #4, as reported by Jakub Kicinski.
previous version:
V1: https://patchwork.kernel.org/project/netdevbpf/cover/1612269593-18691-1-git-send-email-tanhuazhong@huawei.com/
====================
Link: https://lore.kernel.org/r/1612513969-9278-1-git-send-email-tanhuazhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|