|
Tariq Toukan says:
====================
mlx5e Order-0 pages for Striding RQ
In this series, we refactor our Striding RQ receive-flow to always use
fragmented WQEs (Work Queue Elements) using order-0 pages, omitting the
flow that allocates and splits high-order pages which would fragment
and deplete high-order pages in the system.
The first patch gives a slight degradation, but opens the opportunity
to use a simple page-cache mechanism of a fair size.
The page-cache, implemented in patch 3, not only closes the performance
gap but even gives a gain.
In patch 2 we re-organize the code to better manage the calls for
alloc/de-alloc pages in the RX flow.
Series generated against net-next commit:
bed806cb266e "Merge branch 'mlxsw-ethtool'"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of reallocating and mapping pages for RX data-path,
recycle already used pages in a per ring cache.
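As an illustration only, the recycling idea can be modelled as a small bounded queue per ring. This is a toy sketch, not the driver code: the class name, the capacity value and the use of a bytearray to stand in for an allocated and mapped page are all assumptions.

```python
from collections import deque


class RingPageCache:
    """Toy model of a bounded, per-ring page recycle cache (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumed size limit of the cache
        self.pages = deque()          # released pages awaiting reuse
        self.fresh_allocs = 0         # pages that had to be allocated anew

    def get_page(self):
        # Prefer a recycled page; fall back to a fresh allocation.
        if self.pages:
            return self.pages.popleft()
        self.fresh_allocs += 1
        return bytearray(4096)        # stands in for alloc + DMA mapping

    def put_page(self, page):
        # Keep the page for reuse if there is room; otherwise reject it.
        if len(self.pages) < self.capacity:
            self.pages.append(page)
            return True
        return False                  # caller would unmap and free the page


cache = RingPageCache(capacity=2)
first = cache.get_page()      # cache empty: fresh allocation
cache.put_page(first)         # page is recycled into the cache
second = cache.get_page()     # served from the cache, no new allocation
```

In this model the second request is satisfied without touching the allocator, which is the effect the patch is after.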
Performance tests:
The following results were measured on a freshly booted system,
giving optimal baseline performance, as high-order pages are yet to
be fragmented and depleted.
We ran pktgen single-stream benchmarks, with iptables-raw-drop:
Single stride, 64 bytes:
* 4,739,057 - baseline
* 4,749,550 - order0 no cache
* 4,786,899 - order0 with cache
1% gain
Larger packets, no page cross, 1024 bytes:
* 3,982,361 - baseline
* 3,845,682 - order0 no cache
* 4,127,852 - order0 with cache
3.7% gain
Larger packets, every 3rd packet crosses a page, 1500 bytes:
* 3,731,189 - baseline
* 3,579,414 - order0 no cache
* 3,931,708 - order0 with cache
5.4% gain
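The quoted gains can be reproduced from the figures above as the relative change of "order0 with cache" over baseline. A short check (the helper name is made up for this sketch):

```python
def gain_percent(baseline, measured):
    """Relative change of measured over baseline, in percent."""
    return (measured - baseline) / baseline * 100.0


# (baseline, order0 with cache) pairs copied from the results above.
results = {
    "64 bytes": (4_739_057, 4_786_899),
    "1024 bytes": (3_982_361, 4_127_852),
    "1500 bytes": (3_731_189, 3_931_708),
}

gains = {size: round(gain_percent(base, cached), 1)
         for size, (base, cached) in results.items()}
```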
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Manage the allocation and deallocation of mapped RX pages only
through dedicated API functions.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To improve the memory consumption scheme, we omit the flow that
demands and splits high-order pages in Striding RQ, and stay
with a single Striding RQ flow that uses order-0 pages.
Moving to fragmented memory allows the use of larger MPWQEs,
which reduces the number of UMR posts and filler CQEs.
Moving to a single flow allows several optimizations that improve
performance, especially in production servers where we would
fall back to order-0 allocations anyway:
- inline functions that were called via function pointers.
- improve the UMR post process.
This patch alone is expected to give a slight performance reduction.
However, the new memory scheme makes it possible to use a page-cache
of a fair size that doesn't inflate the memory footprint, which will
recover the reduction and even give a performance gain.
Performance tests:
The following results were measured on a freshly booted system,
giving optimal baseline performance, as high-order pages are yet to
be fragmented and depleted.
We ran pktgen single-stream benchmarks, with iptables-raw-drop:
Single stride, 64 bytes:
* 4,739,057 - baseline
* 4,749,550 - this patch
no reduction
Larger packets, no page cross, 1024 bytes:
* 3,982,361 - baseline
* 3,845,682 - this patch
3.5% reduction
Larger packets, every 3rd packet crosses a page, 1500 bytes:
* 3,731,189 - baseline
* 3,579,414 - this patch
4% reduction
Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Fixes: bc77b240b3c5 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a configuration option to inject packet loss by discarding
approximately every 8th packet received and approximately every 8th DATA
packet transmitted.
Note that no locking is used, but it shouldn't really matter.
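A minimal sketch of how such an injector might look, assuming a plain shared counter whose low three bits select the victim. The class and method names are hypothetical. Without locking, concurrent updates could skew the count slightly, which would be consistent with the "approximately" above.

```python
class LossInjector:
    """Drops roughly one packet in eight using a single unlocked counter."""

    def __init__(self):
        self.counter = 0

    def should_drop(self):
        # Drop whenever the low three bits of the running count are all set.
        drop = (self.counter & 7) == 7
        self.counter += 1
        return drop


injector = LossInjector()
# Indices of the packets that would be discarded out of the first 32.
dropped = [n for n in range(32) if injector.should_drop()]
```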
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Improve sk_buff tracing within AF_RXRPC by the following means:
(1) Use an enum to note the event type rather than plain integers and use
an array of event names rather than a big multi ?: list.
(2) Distinguish Rx from Tx packets and account them separately. This
requires the call phase to be tracked so that we know what we might
find in rxtx_buffer[].
(3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
event type.
(4) A pair of 'rotate' events are added to indicate packets that are about
to be rotated out of the Rx and Tx windows.
(5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
packet loss injection recording.
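Points (1) to (3) can be pictured with a small model: an enum for the event, a name table indexed by it, and an event parameter on the accounting call. Everything named here (the event set, the name strings, the low-bit Tx convention) is an assumption made for the sketch, not the kernel's definitions.

```python
from enum import IntEnum


class SkbEvent(IntEnum):
    # Hypothetical event set; Rx and Tx variants are distinct values.
    RX_GOT = 0
    TX_GOT = 1
    RX_FREED = 2
    TX_FREED = 3
    RX_ROTATED = 4
    TX_ROTATED = 5
    RX_LOST = 6
    TX_LOST = 7


# One display name per event, indexed by the enum value, instead of a
# long chain of conditional expressions.
SKB_EVENT_NAMES = ["Rx GOT", "Tx GOT", "Rx FREE", "Tx FREE",
                   "Rx ROT", "Tx ROT", "Rx LOST", "Tx LOST"]


class SkbCounters:
    def __init__(self):
        self.rx = 0          # events seen on Rx packets
        self.tx = 0          # events seen on Tx packets
        self.trace = []

    def record(self, event):
        # In this layout the low bit of the value marks a Tx event.
        if event & 1:
            self.tx += 1
        else:
            self.rx += 1
        self.trace.append(SKB_EVENT_NAMES[event])


counters = SkbCounters()
counters.record(SkbEvent.RX_GOT)
counters.record(SkbEvent.TX_LOST)
```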
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Remove _enter/_debug/_leave calls from rxrpc_recvmsg_data(), one of which
uses an uninitialised variable.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add a tracepoint to follow what recvmsg does within AF_RXRPC.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add a tracepoint to follow the life of packets that get added to a call's
receive buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add a tracepoint to log information about ACK transmission.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add a tracepoint to log information from received ACK packets.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add a tracepoint to follow the insertion of a packet into the transmit
buffer, its transmission and its rotation out of the buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add a pair of tracepoints, one to track rxrpc_connection struct ref
counting and the other to track the client connection cache state.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add additional call tracepoint points for noting call-connected,
call-released and connection-failed events.
Also fix one tracepoint that was using an integer instead of the
corresponding enum value as the point type.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Print a symbolic packet type name for each valid received packet in the
trace output, not just a number.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Fix the basic transmit DATA packet content size at 1412 bytes so that they
can be arbitrarily assembled into jumbo packets.
In the future, I'm thinking of moving to keeping a jumbo packet header at
the beginning of each packet in the Tx queue and creating the packet header
on the spot when kernel_sendmsg() is invoked. That way, jumbo packets can
be assembled on the spur of the moment for (re-)transmission.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
rxrpc_send_call_packet() should use type in both its switch-statements
rather than using pkt->whdr.type. This might give the compiler an easier
job of uninitialised variable checking.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Don't transmit an ACK if call->ackr_reason is unset. There's the
possibility of a race between recvmsg() sending an ACK and the background
processing thread trying to send the same one.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Make the retransmission algorithm use for-loops instead of do-loops and
move the counter increments into the for-statement increment slots.
Though the do-loops are slightly more efficient since there will be at least
one pass through each loop, the counter increments are harder to get
right as the continue-statements skip them.
Without this, if there are any positive acks within the loop, the do-loop
will cycle forever because the counter increment is never done.
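The failure mode is easy to reproduce in miniature. The sketch below uses a Python while-loop in place of the C do-loop (Python has no do-while), with a step limit added only so the demonstration terminates. The function names and the boolean ack list are invented for the example.

```python
def count_unacked_buggy(acks, max_steps=100):
    """While-loop form: 'continue' skips the increment at the bottom."""
    ix, unacked, steps = 0, 0, 0
    while ix < len(acks):
        steps += 1
        if steps > max_steps:       # guard so the demo terminates
            return None             # signals "would have spun forever"
        if acks[ix]:                # positive ack: nothing to retransmit
            continue                # BUG: ix is never advanced
        unacked += 1
        ix += 1
    return unacked


def count_unacked_fixed(acks):
    """For-loop form: the iteration step runs even after 'continue'."""
    unacked = 0
    for ix in range(len(acks)):
        if acks[ix]:
            continue
        unacked += 1
    return unacked
```

With no positive acks both versions agree. As soon as one appears, the first version stops making progress.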
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
The soft-ACK parser doesn't increment the pointer into the soft-ACK list,
resulting in the first ACK/NACK value being applied to all the relevant
packets in the Tx queue. This has the potential to miss retransmissions
and cause excessive retransmissions.
Fix this by incrementing the pointer.
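A toy model of the bug and the fix, assuming the soft-ACK list holds one entry per packet in the Tx queue. The ACK/NACK encodings and function names are made up for the sketch.

```python
ACK, NACK = 1, 0   # hypothetical encodings


def apply_soft_acks_buggy(soft_acks, tx_queue_len):
    ptr = 0
    applied = []
    for _ in range(tx_queue_len):
        applied.append(soft_acks[ptr])   # BUG: ptr never moves
    return applied


def apply_soft_acks_fixed(soft_acks, tx_queue_len):
    ptr = 0
    applied = []
    for _ in range(tx_queue_len):
        applied.append(soft_acks[ptr])
        ptr += 1                         # advance to the next ACK/NACK
    return applied


soft_acks = [ACK, NACK, ACK]
buggy = apply_soft_acks_buggy(soft_acks, 3)
fixed = apply_soft_acks_fixed(soft_acks, 3)
```

In the buggy result the NACK for the second packet is read as an ACK, so its retransmission would be missed. Had the first entry been a NACK, every packet would have been retransmitted.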
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
If the last call on a client connection is released after the connection has
had a bunch of calls allocated but before any DATA packets are sent (so
that it's not yet marked RXRPC_CONN_EXPOSED), an assertion will fail in
rxrpc_disconnect_client_call().
af_rxrpc: Assertion failed - 1(0x1) >= 2(0x2) is false
------------[ cut here ]------------
kernel BUG at ../net/rxrpc/conn_client.c:753!
This is because it's expecting the conn to have been exposed and to have 2
or more refs - but this isn't necessarily the case.
Simply remove the assertion. This allows the conn to be moved into the
inactive state and deleted if it isn't resurrected before the final put is
called.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Call rxrpc_release_call() on getting an error in rxrpc_new_client_call()
rather than trying to do the cleanup ourselves. This isn't a problem,
provided we set RXRPC_CALL_HAS_USERID only if we actually add the call to
the calls tree as cleanup code fragments that would otherwise cause
problems are conditional.
Without this, we miss some of the cleanup.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
In rxrpc_put_one_client_conn(), if a connection has RXRPC_CONN_COUNTED set
on it, then it's accounted for in rxrpc_nr_client_conns and may be on
various lists - and this is cleaned up correctly.
However, if the connection doesn't have RXRPC_CONN_COUNTED set on it, then
the put routine returns rather than just skipping the extra bit of cleanup.
Fix this by making the extra bit of clean up conditional instead and always
killing off the connection.
This manifests itself as connections with a zero usage count hanging around
in /proc/net/rxrpc_conns. Such a connection was allocated, but then
discarded due to a race with another process that set up a parallel
connection, which was then shared instead.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Purge the queue of to_be_accepted calls on socket release. Note that
purging sock_calls doesn't release the ref owned by to_be_accepted.
Probably the sock_calls list is redundant given the purges of the recvmsg_q,
the to_be_accepted queue and the calls tree.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Record calls that need to be accepted using sk_acceptq_added() otherwise
the backlog counter goes negative because sk_acceptq_removed() is called.
This causes the preallocator to malfunction.
Calls that are preaccepted by AFS within the kernel aren't affected by
this.
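The imbalance can be shown with a counter model. The class below is a stand-in for the socket, with methods named after sk_acceptq_added() and sk_acceptq_removed(). The accept_call() helper and its flag are hypothetical.

```python
class ListenSock:
    """Stand-in for a listening socket with an accept backlog counter."""

    def __init__(self):
        self.ack_backlog = 0

    def acceptq_added(self):
        self.ack_backlog += 1

    def acceptq_removed(self):
        self.ack_backlog -= 1


def accept_call(sock, recorded_on_arrival):
    # Arrival of a call that needs accepting.
    if recorded_on_arrival:
        sock.acceptq_added()
    # Acceptance always removes one entry from the backlog.
    sock.acceptq_removed()
    return sock.ack_backlog


before_fix = accept_call(ListenSock(), recorded_on_arrival=False)
after_fix = accept_call(ListenSock(), recorded_on_arrival=True)
```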
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
The code for determining the last packet in rxrpc_recvmsg_data() has been
using the RXRPC_CALL_RX_LAST flag to determine if the rx_top pointer points
to the last packet or not. This isn't a good idea, however, as the input
code may be running simultaneously on another CPU and that sets the flag
*before* updating the top pointer.
Fix this by the following means:
(1) Restrict the use of RXRPC_CALL_RX_LAST to the input routines only.
There's otherwise a synchronisation problem between detecting the flag
and checking rx_top. This could probably be dealt with by appropriate
application of memory barriers, but there's a simpler way.
(2) Set RXRPC_CALL_RX_LAST after setting rx_top.
(3) Make rxrpc_rotate_rx_window() consult the flags header field of the
DATA packet it's about to discard to see if that was the last packet.
Use this as the basis for ending the Rx phase. This shouldn't be a
problem because the recvmsg side of things is guaranteed to see the
packets in order.
(4) Make rxrpc_recvmsg_data() return 1 to indicate the end of the data if:
(a) the packet it has just processed is marked as RXRPC_LAST_PACKET
(b) the call's Rx phase has been ended.
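Points (3) and (4) can be sketched as follows. This is a simplified model, not the kernel code: the Call class, the list of per-packet header flags and the numeric value given to RXRPC_LAST_PACKET are assumptions for illustration.

```python
RXRPC_LAST_PACKET = 0x04   # assumed value of the "last packet" header flag


class Call:
    def __init__(self):
        self.rx_phase_ended = False
        self.hard_ack = 0


def rotate_rx_window(call, header_flags):
    """Discard one consumed DATA packet; end the Rx phase if it was last."""
    call.hard_ack += 1
    if header_flags & RXRPC_LAST_PACKET:
        call.rx_phase_ended = True


def recvmsg_data(call, queued_flags):
    """Consume packets in order; return 1 at the end of the data, else 0."""
    if call.rx_phase_ended:
        return 1
    for header_flags in queued_flags:
        rotate_rx_window(call, header_flags)
        if header_flags & RXRPC_LAST_PACKET:
            return 1
    return 0


call = Call()
first_read = recvmsg_data(call, [0, 0])                    # more data to come
second_read = recvmsg_data(call, [0, RXRPC_LAST_PACKET])   # hits last packet
third_read = recvmsg_data(call, [])                        # Rx phase ended
```

Because the decision is taken from the packet being discarded, the reader does not depend on when the input side sets its own flag.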
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Check the return value of rxrpc_locate_data() in rxrpc_recvmsg_data().
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Move the check of rx_pkt_offset from rxrpc_locate_data() to the caller,
rxrpc_recvmsg_data(), so that it's more clear what's going on there.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Remove a tab that's on a line that should otherwise be blank.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it.
This is then made conditional on CONFIG_IPV6.
Without this, the following can be seen:
net/built-in.o: In function `rxrpc_init_peer':
>> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull cifs fixes from Steve French:
"Small set of cifs fixes"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
Move check for prefix path to within cifs_get_root()
Compare prepaths when comparing superblocks
Fix memory leaks in cifs_do_mount()
|
|
Pull nfsd bugfix from Bruce Fields:
"Fix a memory corruption bug that I introduced in 4.7"
* tag 'nfsd-4.8-2' of git://linux-nfs.org/~bfields/linux:
svcauth_gss: Revert 64c59a3726f2 ("Remove unnecessary allocation")
|
|
Pull drm fixes from Dave Airlie:
"Two sets of i915 fixes, one set of vc4 crasher fixes, and a couple of
atmel fixes.
Nothing too out there at this stage, though I think some people are
holidaying so it's been quiet enough"
* tag 'drm-fixes-for-4.8-rc6' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Ignore OpRegion panel type except on select machines
Revert "drm/i915/psr: Make idle_frames sensible again"
drm/i915: Restore lost "Initialized i915" welcome message
drm/vc4: mark vc4_bo_cache_purge() static
drm/i915: Add GEN7_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE to SNB
drm/i915: disable 48bit full PPGTT when vGPU is active
drm/i915: enable vGPU detection for all
drm/atmel-hlcdc: Make ->reset() implementation static
drm: atmel-hlcdc: Fix vertical scaling
drm/vc4: Allow some more signals to be packed with uniform resets.
drm/i915/dvo: Remove dangling call to drm_encoder_cleanup()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"More annotations of tracepoints in the runtime PM framework to prevent
RCU from complaining when that code is invoked from the idle path
(Paul McKenney)"
* tag 'pm-4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / runtime: Use _rcuidle for runtime suspend tracepoints
|
|
drm-fixes
This pull request brings in a fix for crashes in X on VC4.
* tag 'drm-vc4-fixes-2016-09-14' of https://github.com/anholt/linux:
drm/vc4: mark vc4_bo_cache_purge() static
drm/vc4: Allow some more signals to be packed with uniform resets.
|
|
git://anongit.freedesktop.org/drm-intel into drm-fixes
i915 fixes from Jani.
* tag 'drm-intel-fixes-2016-09-15' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Ignore OpRegion panel type except on select machines
Revert "drm/i915/psr: Make idle_frames sensible again"
drm/i915: Restore lost "Initialized i915" welcome message
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Round three of 4.8 rc fixes.
This is likely the last rdma pull request this cycle. The new rxe
driver had a few issues (you probably saw the boot bot bug report) and
they should be addressed now. There are a couple other fixes here,
mainly mlx4. There are still two outstanding issues that need
resolved but I don't think their fix will make this kernel cycle.
Summary:
- Various fixes to rdmavt, ipoib, mlx5, mlx4, rxe"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/rdmavt: Don't vfree a kzalloc'ed memory region
IB/rxe: Fix kmem_cache leak
IB/rxe: Fix race condition between requester and completer
IB/rxe: Fix duplicate atomic request handling
IB/rxe: Fix kernel panic in udp_setup_tunnel
IB/mlx5: Set source mac address in FTE
IB/mlx5: Enable MAD_IFC commands for IB ports only
IB/mlx4: Diagnostic HW counters are not supported in slave mode
IB/mlx4: Use correct subnet-prefix in QP1 mads under SR-IOV
IB/mlx4: Fix code indentation in QP1 MAD flow
IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV
IB/ipoib: Don't allow MC joins during light MC flush
IB/rxe: fix GFP_KERNEL in spinlock context
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Here are a couple of bugfixes for v4.8-rc.
Most of them have actually been around for a while this time but for
some reason didn't get applied early on. The shmobile regulator fix
is the only one that isn't completely obvious.
Device tree changes:
- archtimer interrupts must be level triggered (multiple platforms)
- fix for USB and MMC clocks on STiH410
- fix split DT repository in case of raspberry-pi 3
- a new use of skeleton.dtsi on arm64 has crept in after that was
removed.
defconfig updates:
- xilinx vdma has a new Kconfig symbol name
- keystone requires CONFIG_NOP_USB_XCEIV since v4.8-rc1
Code fixes:
- fix regulator quirk on shmobile
- suspend-to-ram regression on EXYNOS
Maintainer updates:
- Javier Martinez Canillas is now a reviewer for Samsung EXYNOS"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: keystone: defconfig: Fix USB configuration
arm64: dts: Fix broken architected timer interrupt trigger
ARM: multi_v7_defconfig: update XILINX_VDMA
ARM64: dts: bcm: Use a symlink to R-Pi dtsi files from arch=arm
ARM: dts: Remove use of skeleton.dtsi from bcm283x.dtsi
ARM: dts: STiH407-family: Provide interconnect clock for consumption in ST SDHCI
ARM: dts: STiH410: Handle interconnect clock required by EHCI/OHCI (USB)
ARM: shmobile: fix regulator quirk for Gen2
ARM: EXYNOS: Clear OF_POPULATED flag from PMU node in IRQ init callback
MAINTAINERS: Add myself as reviewer for Samsung Exynos support
|
|
Pull ARM fixes from Russell King:
"Most of this update are fixes primarily discovered from testing on the
older StrongARM 1110 and PXA systems, as a result of recent interest
from several people in these platforms:
- Locomo interrupt handling incorrectly stores the handler data in
the chip's private data slot: when Locomo is combined with an
interrupt controller whose chip uses the chip private data, this
leads to an oops.
- SA1111 was missing a call to clk_disable() to clean up after a
failed probe.
- SA1111 and PCMCIA suspend/resume was broken:
The PCMCIA "ds" layer was using the legacy bus suspend/resume
methods, which the core PM code is no longer calling as a result of
device_pm_check_callbacks() introduced in commit aa8e54b559479
("PM / sleep: Go direct_complete if driver has no callbacks").
SA1111 was broken due to changes to PCMCIA which makes PCMCIA
suspend itself later than the SA1111 code expects, and resume
before the SA1111 code has initialised access to the pcmcia
sub-device.
- the default SA1111 interrupt mask polarity got messed up when it
was converted to use a dynamic interrupt base number for its
interrupts.
- fix platform_get_irq() error code propagation, which was causing
problems on platforms where the interrupt may not be available at
probe time in DT setups.
- fix the lack of clock to PCMCIA code on PXA platforms, which was
omitted in conversions of PXA to CCF.
- fix an oops in the PXA PCMCIA code caused by a previous commit not
realising that Lubbock is different from the rest of the PXA PCMCIA
drivers.
- ensure that SA1111 low-level PCMCIA drivers propagate their error
codes to the main probe function, rather than the driver silently
accepting a failure.
- fix the sa11xx debugfs reporting of timing information, which
always indicated zero due to the clock being a factor of 1000 out.
- fix the polarity of the status change signal reported from the
sockets.
Lastly, one ARM specific commit from Stefan Agner fixing the LPAE
cache attributes"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: pxa/lubbock: add pcmcia clock
ARM: locomo: fix locomo irq handling
ARM: 8612/1: LPAE: initialize cache policy correctly
ARM: sa1111: fix missing clk_disable()
ARM: sa1111: fix pcmcia suspend/resume
ARM: sa1111: fix pcmcia interrupt mask polarity
ARM: sa1111: fix error code propagation in sa1111_probe()
pcmcia: lubbock: fix sockets configuration
pcmcia: sa1111: fix propagation of lowlevel board init return code
pcmcia: soc_common: fix SS_STSCHG polarity
pcmcia: sa11xx_base: add units to the timing information
pcmcia: sa11xx_base: fix reporting of timing information
pcmcia: ds: fix suspend/resume
|
|
The userspace memory region 'mr' is allocated with kzalloc in
__rvt_alloc_mr however it is incorrectly being freed with vfree in
__rvt_free_mr. Fix this by using kfree to free it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Decrement qp reference when handling error path
in completer to prevent kmem_cache leak.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
rxe_requester() sends a pkt with rxe_xmit_packet() and
then calls rxe_update() to update the wqe and qp's psn values.
But sometimes the response is received before the requester
has had time to update the wqe, in which case the completer
acts on erroneous wqe values.
This fix updates the wqe and qp before actually sending
the request and rolls back when xmit fails.
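The update-then-send-with-rollback ordering might be modelled like this. The Requester class, its fields and the 24-bit PSN mask are assumptions for the sketch, and xmit is any callable that reports success.

```python
PSN_MASK = 0xFFFFFF   # assumed 24-bit packet sequence number space


class Requester:
    def __init__(self):
        self.psn = 0
        self.wqe_state = "posted"

    def send(self, xmit):
        # Snapshot so the update can be undone if transmission fails.
        saved = (self.psn, self.wqe_state)
        # Update first, so a fast response always sees consistent values.
        self.psn = (self.psn + 1) & PSN_MASK
        self.wqe_state = "pending"
        if not xmit():
            self.psn, self.wqe_state = saved   # roll back
            return False
        return True


req = Requester()
seen_during_xmit = []
# The transmit callback observes the already-updated psn.
ok = req.send(lambda: seen_during_xmit.append(req.psn) or True)
failed = req.send(lambda: False)   # failure leaves the state as it was
```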
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When handling an ack for atomic opcodes like "fetch&add"
or "cmp&swp", send_atomic_ack() saves the ack
before sending it, in case it gets lost and never reaches the
requester. In that case, duplicate_request()
will need to find it using the duplicated request.psn.
But send_atomic_ack() used a wrong psn value, and thus
the saved ack was never found.
This fix uses the ack.psn to locate the ack in case
it's needed.
This fix also copies the ack packet to the skb's control buffer,
since duplicate_request() will need it when calling rxe_xmit_packet().
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Disable creation of a UDP socket for ipv6 when
CONFIG_IPV6 is not enabled, since udp_sock_create6()
returns 0 when CONFIG_IPV6 is not set. Otherwise the following oops is seen:
[ 46.888632] IP: [<c220705a>] setup_udp_tunnel_sock+0x6/0x4f
[ 46.891355] *pdpt = 0000000000000000 *pde = f000ff53f000ff53
[ 46.893918] Oops: 0002 [#1] PREEMPT
[ 46.896014] CPU: 0 PID: 1 Comm: swapper Not tainted 4.7.0-rc4-00001-g8700e3e #1
[ 46.900280] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 46.904905] task: cf06c040 ti: cf05e000 task.ti: cf05e000
[ 46.907854] EIP: 0060:[<c220705a>] EFLAGS: 00210246 CPU: 0
[ 46.911137] EIP is at setup_udp_tunnel_sock+0x6/0x4f
[ 46.914070] EAX: 00000044 EBX: 00000001 ECX: cf05fef0 EDX: ca8142e0
[ 46.917236] ESI: c2c4505b EDI: cf05fef0 EBP: cf05fed0 ESP: cf05fed0
[ 46.919836] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[ 46.922046] CR0: 80050033 CR2: 000001fc CR3: 02cec000 CR4: 000006b0
[ 46.924550] Stack:
[ 46.926014] cf05ff10 c1fd4657 ca8142e0 0000000a 00000000 00000000 0000b712 00000008
[ 46.931274] 00000000 6bb5bd01 c1fd48de 00000000 00000000 cf05ff1c 00000000 00000000
[ 46.936122] cf05ff1c c1fd4bdf 00000000 cf05ff28 c2c4507b ffffffff cf05ff88 c2bf1c74
[ 46.942350] Call Trace:
[ 46.944403] [<c1fd4657>] rxe_setup_udp_tunnel+0x8f/0x99
[ 46.947689] [<c1fd48de>] ? net_to_rxe+0x4e/0x4e
[ 46.950567] [<c1fd4bdf>] rxe_net_init+0xe/0xa4
[ 46.953147] [<c2c4507b>] rxe_module_init+0x20/0x4c
[ 46.955448] [<c2bf1c74>] do_one_initcall+0x89/0x113
[ 46.957797] [<c2bf15eb>] ? set_debug_rodata+0xf/0xf
[ 46.959966] [<c2bf1dbc>] ? kernel_init_freeable+0xbe/0x15b
[ 46.962262] [<c2bf1ddc>] kernel_init_freeable+0xde/0x15b
[ 46.964418] [<c232eb54>] kernel_init+0x8/0xd0
[ 46.966618] [<c2333122>] ret_from_kernel_thread+0xe/0x24
[ 46.969592] [<c232eb4c>] ? rest_init+0x6f/0x6f
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Set the source mac address in the FTE when L2 specification
is provided.
Fixes: 038d2ef87572 ('IB/mlx5: Add flow steering support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The MAD_IFC command is supported only for physical functions (PFs)
and only when the physical port is IB. This fix enforces that.
Fixes: d603c809ef91 ("IB/mlx5: Fix decision on using MAD_IFC")
Reported-by: David Chang <dchang@suse.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Modify the mlx4_ib_diag_counters() to avoid the following error in the
hypervisor when the slave tries to query the hardware counters in SR-IOV
mode.
mlx4_core 0000:81:00.0: Unknown command:0x30 accepted from slave:1
Fixes: 3f85f2aaabf7 ("IB/mlx4: Add diagnostic hardware counters")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When sending QP1 MAD packets which use a GRH, the source GID
(which consists of the 64-bit subnet prefix and the 64-bit port GUID)
must be included in the packet GRH.
For SR-IOV, a GID cache is used, since the source GID needs to be the
slave's source GID, and not the Hypervisor's GID. This cache also
included a subnet_prefix. Unfortunately, the subnet_prefix field in
the cache was never initialized (to the default subnet prefix 0xfe80::0).
As a result, this field remained all zeroes. Therefore, when SR-IOV
was active, all QP1 packets which included a GRH had a source GID
subnet prefix of all-zeroes.
However, the subnet-prefix should initially be 0xfe80::0 (the default
subnet prefix). In addition, if OpenSM modifies a port's subnet prefix,
the new subnet prefix must be used in the GRH when sending QP1 packets.
To fix this we now initialize the subnet prefix in the SR-IOV GID cache
to the default subnet prefix. We update the cached value if/when OpenSM
modifies the port's subnet prefix. We take this cached value when sending
QP1 packets when SR-IOV is active.
Note that the value is stored as an atomic64. This eliminates any need
for locking when the subnet prefix is being updated.
Note also that we depend on the FW generating the "port management change"
event for tracking subnet-prefix changes performed by OpenSM. If running
early FW (before 2.9.4630), subnet prefix changes will not be tracked (but
the default subnet prefix still will be stored in the cache; therefore
users who do not modify the subnet prefix will not have a problem).
If there is a need for such tracking also for early FW, we will add that
capability in a subsequent patch.
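To make the symptom concrete, a GID can be modelled as the 64-bit subnet prefix followed by the 64-bit port GUID. The sketch below compares an uninitialised (all-zero) cached prefix with the default one. The helper name and the port GUID value are hypothetical.

```python
import ipaddress

DEFAULT_SUBNET_PREFIX = 0xFE80_0000_0000_0000   # 0xfe80::0


def make_gid(subnet_prefix, port_guid):
    """Concatenate a 64-bit subnet prefix and a 64-bit port GUID."""
    raw = subnet_prefix.to_bytes(8, "big") + port_guid.to_bytes(8, "big")
    return ipaddress.IPv6Address(raw)


port_guid = 0x1                      # made-up GUID for the example
broken_gid = make_gid(0, port_guid)  # cache field never initialised
fixed_gid = make_gid(DEFAULT_SUBNET_PREFIX, port_guid)
```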
Fixes: 1ffeb2eb8be9 ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The indentation in the QP1 GRH flow in procedure build_mlx_header is
really confusing. Fix it, in preparation for a commit which touches
this code.
Fixes: 1ffeb2eb8be9 ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Because of an incorrect bit-masking done on the join state bits, when
handling a join request we failed to detect a difference between the
group join state and the request join state when joining as send only
full member (0x8). This caused the MC join request not to be sent.
This issue is relevant only when SRIOV is enabled and SM supports
send only full member.
This fix separates the scope bits and the join state bits into a nibble each.
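The effect of a too-narrow mask can be shown with plain bit arithmetic. The sketch assumes a single byte carrying the scope in the high nibble and the join state in the low nibble, and that the faulty mask covered only three bits. Both are assumptions for illustration, as are the names.

```python
SEND_ONLY_FULL_MEMBER = 0x8

NARROW_JOIN_MASK = 0x7   # assumed faulty mask: drops bit 3
JOIN_STATE_MASK = 0xF    # full low nibble
SCOPE_SHIFT = 4          # scope assumed to sit in the high nibble


def split_scope_join(byte, join_mask):
    """Return (scope, join_state) extracted from one packed byte."""
    return byte >> SCOPE_SHIFT, byte & join_mask


request = (0x2 << SCOPE_SHIFT) | SEND_ONLY_FULL_MEMBER   # scope 2, state 0x8
_, join_narrow = split_scope_join(request, NARROW_JOIN_MASK)
scope, join_full = split_scope_join(request, JOIN_STATE_MASK)
```

With the narrow mask the requested state reads as zero, so it looks no different from a group with no join state and the request is not forwarded.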
Fixes: b9c5d6a64358 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|