Age | Commit message (Collapse) | Author |
|
I noticed that apic test from kvm-unit-tests always hangs on my EPYC 7401P,
the hanging test nmi-after-sti is trying to deliver 30000 NMIs and tracing
shows that we're sometimes able to deliver a few but never all.
When we're trying to inject an NMI we may fail to do so immediately for
various reasons, however, we still need to inject it so enable_nmi_window()
arms nmi_singlestep mode. #DB occurs as expected, but we're not checking
for pending NMIs before entering the guest and unless there's a different
event to process, the NMI will never get delivered.
Make KVM_REQ_EVENT request on the vCPU from db_interception() to make sure
pending NMIs are checked and possibly injected.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Only clear the valid bit when invalidate logical APIC id entry.
The current logic clear the valid bit, but also set the rest of
the bits (including reserved bits) to 1.
Fixes: 98d90582be2e ('svm: Fix AVIC DFR and LDR handling')
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This reverts commit bb218fbcfaaa3b115d4cd7a43c0ca164f3a96e57.
As Oren Twaig pointed out the old discussion:
https://patchwork.kernel.org/patch/8292231/
that the change coud potentially cause an extra IPI to be sent to
the destination vcpu because the AVIC hardware already set the IRR bit
before the incomplete IPI #VMEXIT with id=1 (target vcpu is not running).
Since writting to ICR and ICR2 will also set the IRR. If something triggers
the destination vcpu to get scheduled before the emulation finishes, then
this could result in an additional IPI.
Also, the issue mentioned in the commit bb218fbcfaaa was misdiagnosed.
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Oren Twaig <oren@scalemp.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM bases its memory usage limits on the total number of guest pages
across all memslots. However, those limits, and the calculations to
produce them, use 32 bit unsigned integers. This can result in overflow
if a VM has more guest pages that can be represented by a u32. As a
result of this overflow, KVM can use a low limit on the number of MMU
pages it will allocate. This makes KVM unable to map all of guest memory
at once, prompting spurious faults.
Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
introduced no new failures.
Signed-off-by: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The remaining failures of vmx.flat when EPT is disabled are caused by
incorrectly reflecting VMfails to the L1 hypervisor. What happens is
that nested_vmx_restore_host_state corrupts the guest CR3, reloading it
with the host's shadow CR3 instead, because it blindly loads GUEST_CR3
from the vmcs01.
For simplicity let's just always use hardware VMCS checks when EPT is
disabled. This way, nested_vmx_restore_host_state is not reached at
all (or at least shouldn't be reached).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As mentioned in the comment, there are some special cases where we can simply
clear the TPR shadow bit from the CPU-based execution controls in the vmcs02.
Handle them so that we can remove some XFAILs from vmx.flat.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
commit f1a2e44a3aec ("bpf: add queue and stack maps") introduced new BPF
helper functions:
- BPF_FUNC_map_push_elem
- BPF_FUNC_map_pop_elem
- BPF_FUNC_map_peek_elem
but they were made available only for network BPF programs. This patch
makes them available for tracepoint, cgroup and lirc programs.
Signed-off-by: Alban Crequy <alban@kinvolk.io>
Cc: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Rewrite selftest to iterate over an array with input packet and
expected flow_keys. This should make it easier to extend this test
with additional cases without too much boilerplate.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Add two tests to check that sequence of 1024 jumps is verifiable.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
avoids outputting a series of
value:
No space left on device
The value itself is not wrong but bpf_fd_reuseport_array_lookup_elem() can
only return it if the map was created with value_size = 8. There's nothing
bpftool can do about it. Instead of repeating this error for every key in
the map, print an explanatory warning and a specialized error.
example before:
key: 00 00 00 00
value:
No space left on device
key: 01 00 00 00
value:
No space left on device
key: 02 00 00 00
value:
No space left on device
Found 0 elements
example after:
Warning: cannot read values from reuseport_sockarray map with value_size != 8
key: 00 00 00 00 value: <cannot read>
key: 01 00 00 00 value: <cannot read>
key: 02 00 00 00 value: <cannot read>
Found 0 elements
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Commit bf598a8f0f77 ("bpftool: Improve handling of ENOENT on map dumps")
used print_entry_plain() in case of ENOENT. However, that commit introduces
dead code. Per-cpu maps are zero-filled. When reading them, it's all or
nothing. There will never be a case where some cpus have an entry and
others don't.
The truth is that ENOENT is an error case. Use print_entry_error() to
output the desired message. That function's "value" parameter is also
renamed to indicate that we never use it for an actual map value.
The output format is unchanged.
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Linux kernel now supports statistics for BPF programs, and bpftool is
able to dump them. However, these statistics are not enabled by default,
and administrators may not know how to access them.
Add a paragraph in bpftool documentation, under the description of the
"bpftool prog show" command, to explain that such statistics are
available and that their collection is controlled via a dedicated sysctl
knob.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Manual pages would tell that option "-v" (lower case) would print the
version number for bpftool. This is wrong: the short name of the option
is "-V" (upper case). Fix the documentation accordingly.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The "pinmaps" keyword is present in the man page, in the verbose
description of the "bpftool prog load" command. However, it is missing
from the summary of available commands at the beginning of the file. Add
it there as well.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
When trying to dump the tree of all cgroups under a given root node,
bpftool attempts to query programs of all available attach types. Some
of those attach types do not support queries, therefore several of the
calls are actually expected to fail.
Those calls set errno to EINVAL, which has no consequence for dumping
the rest of the tree. It does have consequences however if errno is
inspected at a later time. For example, bpftool batch mode relies on
errno to determine whether a command has succeeded, and whether it
should carry on with the next command. Setting errno to EINVAL when
everything worked as expected would therefore make such command fail:
# echo 'cgroup tree \n net show' | \
bpftool batch file -
To improve this, reset errno when its value is EINVAL after attempting
to show programs for all existing attach types in do_show_tree_fn().
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Commit 569b0c77735d ("tools/bpftool: show btf id in program information")
made bpftool print an empty line after each program entry when listing
the BPF programs loaded on the system (plain output). This is especially
confusing when some programs have an associated BTF id, and others
don't. Let's remove the blank line.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The ENCAP flags in bpf_skb_adjust_room are ignored on decap with
bpf_skb_net_shrink. Reserve these bits for future use.
Fixes: 868d523535c2d ("bpf: add bpf_skb_adjust_room encap flags")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
replace tab after #define with space in line with rest of definitions
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
It was removed in commit 166b5a7f2ca3 ("selftests_bpf: extend
test_tc_tunnel for UDP encap") without any explanation.
Otherwise I see:
progs/test_tc_tunnel.c:160:17: warning: taking address of packed member 'ip' of class or structure
'v4hdr' may result in an unaligned pointer value [-Waddress-of-packed-member]
set_ipv4_csum(&h_outer.ip);
^~~~~~~~~~
1 warning generated.
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Fixes: 166b5a7f2ca3 ("selftests_bpf: extend test_tc_tunnel for UDP encap")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Add test case verifying that dedup happens (INTs are deduped in this
case) and VAR/DATASEC types are not deduped, but have their referenced
type IDs adjusted correctly.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Yonghong Song <yhs@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch adds support for VAR and DATASEC in btf_dedup(). VAR/DATASEC
are never deduplicated, but they need to be processed anyway as types
they refer to might need to be remapped due to deduplication and
compaction.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Yonghong Song <yhs@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
When CONFIG_DEBUG_INFO_BTF is enabled but available version of pahole is too
old to support BTF generation, build script is supposed to emit warning and
proceed with the build. Due to using exit instead of return from BASH function,
existing handling code prematurely exits exit code 0, not completing some of
the build steps. This patch fixes issue by correctly returning just from
gen_btf() function only.
Fixes: e83b9f55448a ("kbuild: add ability to generate BTF type info for vmlinux")
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
There is a spelling mistake in a BNX2X_ERR message, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Murali Karicheri says:
====================
net: hsr: updates from internal tree
This series picks commit from our internal kernel tree.
Patch 1/3 fixes a file name issue introduced in my previous
series.
History:
v2: fixed patch 3/3 by moving stats update to inside hsr_forward_skb()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add tx stats to hsr interface. Without this
ifconfig for hsr interface doesn't show tx packet stats.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the path of hsr debugfs root directory to use the net device
name so that it can work with multiple interfaces. While at it,
also fix some typos.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the file name and functions to match with existing implementation.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recently genphy_read_abilities() has been added that dynamically detects
clause 22 PHY abilities. I *think* this detection should work with all
supported PHY's, at least for the ones with basic features sets, i.e.
PHY_BASIC_FEATURES and PHY_GBIT_FEATURES. So let's remove setting these
features explicitly and rely on phylib feature detection.
I don't have access to most of these PHY's, therefore I'd appreciate
regression testing.
v2:
- make the feature constant a comment so that readers know which
features are supported by the respective PHY
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"I debated holding this back for the v5.2 merge window due to the size
of the "zero-key" changes, but affected users would benefit from
having the fixes sooner. It did not make sense to change the zero-key
semantic in isolation for the "secure-erase" command, but instead
include it for all security commands.
The short background on the need for these changes is that some NVDIMM
platforms enable security with a default zero-key rather than let the
OS specify the initial key. This makes the security enabling that
landed in v5.0 unusable for some users.
Summary:
- Compatibility fix for nvdimm-security implementations with a
default zero-key.
- Miscellaneous small fixes for out-of-bound accesses, cleanup after
initialization failures, and missing debug messages"
* tag 'libnvdimm-fixes-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
tools/testing/nvdimm: Retain security state after overwrite
libnvdimm/pmem: fix a possible OOB access when read and write pmem
libnvdimm/security, acpi/nfit: unify zero-key for all security commands
libnvdimm/security: provide fix for secure-erase to use zero-key
libnvdimm/btt: Fix a kmemdup failure check
libnvdimm/namespace: Fix a potential NULL pointer dereference
acpi/nfit: Always dump _DSM output payload
|
|
Simon Horman says:
====================
nfp: Flower flow merging
John Hurley says,
These patches deal with 'implicit recirculation' on the NFP. This is a
firmware feature whereby a packet egresses to an 'internal' port meaning
that it will recirculate back to the header extract phase with the
'internal' port now marked as its ingress port. This internal port can
then be matched on by another rule. This process simulates how OvS
datapath outputs to an internal port. The FW traces the packet's
recirculation route and sends a 'merge hint' to the driver telling it
which flows it matched against. The driver can then decide if these flows
can be merged to a single rule and offloaded.
The patches deal with the following issues:
- assigning/freeing IDs to/from each of these new internal ports
- offloading rules that match on internal ports
- offloading neighbour table entries whose egress port is internal
- handling fallback traffic with an internal port as ingress
- using merge hints to create 'faster path' flows and tracking stats etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A merge flow is formed from 2 sub flows. The match fields of the merge are
the same as the first sub flow that has formed it, with the actions being
a combination of the first and second sub flow. Therefore, a merge flow
should replace sub flow 1 when offloaded.
Offload valid merge flows by using a new 'flow mod' message type to
replace an existing offloaded rule. Track the deletion of sub flows that
are linked to a merge flow and revert offloaded merge rules if required.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With the merging of 2 sub flows, a new 'merge' flow will be created and
written to FW. The TC layer is unaware that the merge flow exists and will
request stats from the sub flows. Conversely, the FW treats a merge rule
the same as any other rule and sends stats updates to the NFP driver.
Add links between merge flows and their sub flows. Use these links to pass
merge flow stats updates from FW to the underlying sub flows, ensuring TC
stats requests are handled correctly. The updating of sub flow stats is
done on (the less time critcal) TC stats requests rather than on FW stats
update.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When combining 2 sub_flows to a single 'merge flow' (assuming the merge is
valid), the merge flow should contain the same match fields as sub_flow 1
with actions derived from a combination of sub_flows 1 and 2. This action
list should have all actions from sub_flow 1 with the exception of the
output action that triggered the 'implicit recirculation' by sending to
an internal port, followed by all actions of sub_flow 2. Any pre-actions
in either sub_flow should feature at the start of the action list.
Add code to generate a new merge flow and populate the match and actions
fields based on the sub_flows. The offloading of the flow is left to
future patches.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Two flows can be merged if the second flow (after recirculation) matches
on bits that are either matched on or explicitly set by the first flow.
This means that if a packet hits flow 1 and recirculates then it is
guaranteed to hit flow 2.
Add a 'can_merge' function that determines if 2 sub_flows in a merge hint
can be validly merged to a single flow.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If a merge hint is received containing 2 flows that are matched via an
implicit recirculation (sending to and matching on an internal port), fw
reports that the flows (called sub_flows) may be able to be combined to a
single flow.
Add infastructure to accept and process merge hint messages. The actual
merging of the flows is left as a stub call.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Each flow is given a context ID that the fw uses (along with its cookie)
to identity the flow. The flows stats are updated by the fw via this ID
which is a reference to a pre-allocated array entry.
In preparation for flow merge code, enable the nfp_fl_payload structure to
be accessed via this stats context ID. Rather than increasing the memory
requirements of the pre-allocated array, add a new rhashtable to associate
each active stats context ID with its rule payload.
While adding new code to the compile metadata functions, slightly
restructure the existing function to allow for cleaner, easier to read
error handling.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The neighbour table in the FW only accepts next hop entries if the egress
port is an nfp repr. Modify this to allow the next hop to be an internal
port. This means that if a packet is to egress to that port, it will
recirculate back into the system with the internal port becoming its
ingress port.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
FW may receive a packet with its ingress port marked as an internal port.
If a rule does not exist to match on this port, the packet will be sent to
the NFP driver. Modify the flower app to detect packets from such internal
ports and convert the ingress port to the correct kernel space netdev.
At this point, it is assumed that fallback packets from internal ports are
to be sent out said port. Therefore, set the redir_egress bool to true on
detection of these ports.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, it is assumed that fallback packets will be from reprs. Modify
this to allow an app to receive non-repr ports from the fallback channel -
e.g. from an internal port. If such a packet is received, do not update
repr stats.
Change the naming function calls so as not to imply it will always be a
repr netdev returned. Add the option to set a bool value to redirect a
fallback packet out the returned port rather than RXing it. Setting of
this bool in subsequent patches allows the handling of packets falling
back when they are due to egress an internal port.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recent FW modifications allow the offloading of non repr ports. These
ports exist internally on the NFP. So if a rule outputs to an 'internal'
port, then the packet will recirculate back into the system but will now
have this internal port as it's incoming port. These ports are indicated
by a specific type field combined with an 8 bit port id.
Add private app data to assign additional port ids for use in offloads.
Provide functions to lookup or create new ids when a rule attempts to
match on an internal netdev - the only internal netdevs currently
supported are of type openvswitch. Have a netdev notifier to release
port ids on netdev unregister.
OvS offloads rules that match on internal ports as TC egress filters.
Ensure that such rules are accepted by the driver.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Write to a FW symbol to indicate that the driver supports flow merging. If
this symbol does not exist then flow merging and recirculation is not
supported on the FW. If support is available, add a stub to deal with FW
to kernel merge hint messages.
Full flow merging requires the firmware to support of flow mods. If it
does not, then do not attempt to 'turn on' flow merging.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull fsdax fix from Dan Williams:
"A single filesystem-dax fix. It has been lingering in -next for a long
while and there are no other fsdax fixes on the horizon:
- Avoid a crash scenario with architectures like powerpc that require
'pgtable_deposit' for the zero page"
* tag 'fsdax-fix-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
fs/dax: Deposit pagetable even when installing zero page
|
|
Huazhong Tan says:
====================
net: hns3: fixes sparse: warning and type error
This patchset fixes a sparse warning and a overflow problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When setting vport->bw_limit to hdev->tm_info.pg_info[0].bw_limit
in hclge_tm_vport_tc_info_update, vport->bw_limit can be as big as
HCLGE_ETHER_MAX_RATE (100000), which can not fit into u16 (65535).
So this patch fixes it by using u32 for vport->bw_limit.
Fixes: 848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The input parameter "proto" in function hclge_set_vlan_filter_hw()
is asked to be __be16, but got u16 when calling it in function
hclge_update_port_base_vlan_cfg().
This patch fixes it by converting it with htons().
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 21e043cd8124 ("net: hns3: fix set port based VLAN for PF")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Xin Long says:
====================
sctp: fully support memory accounting
sctp memory accounting is added in this patchset by using
these kernel APIs on send side:
- sk_mem_charge()
- sk_mem_uncharge()
- sk_wmem_schedule()
- sk_under_memory_pressure()
- sk_mem_reclaim()
and these on receive side:
- sk_mem_charge()
- sk_mem_uncharge()
- sk_rmem_schedule()
- sk_under_memory_pressure()
- sk_mem_reclaim()
With sctp memory accounting, we can limit the memory allocation by
either sysctl:
# sysctl -w net.sctp.sctp_mem="10 20 50"
or cgroup:
# echo $((8<<14)) > \
/sys/fs/cgroup/memory/sctp_mem/memory.kmem.tcp.limit_in_bytes
When the socket is under memory pressure, the send side will block
and wait, while the receive side will renege or drop.
v1->v2:
- add the missing Reported/Tested/Acked/-bys.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sk_forward_alloc's updating is also done on rx path, but to be consistent
we change to use sk_mem_charge() in sctp_skb_set_owner_r().
In sctp_eat_data(), it's not enough to check sctp_memory_pressure only,
which doesn't work for mem_cgroup_sockets_enabled, so we change to use
sk_under_memory_pressure().
When it's under memory pressure, sk_mem_reclaim() and sk_rmem_schedule()
should be called on both RENEGE or CHUNK DELIVERY path exit the memory
pressure status as soon as possible.
Note that sk_rmem_schedule() is using datalen to make things easy there.
Reported-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now when sending packets, sk_mem_charge() and sk_mem_uncharge() have been
used to set sk_forward_alloc. We just need to call sk_wmem_schedule() to
check if the allocated should be raised, and call sk_mem_reclaim() to
check if the allocated should be reduced when it's under memory pressure.
If sk_wmem_schedule() returns false, which means no memory is allowed to
allocate, it will block and wait for memory to become available.
Note different from tcp, sctp wait_for_buf happens before allocating any
skb, so memory accounting check is done with the whole msg_len before it
too.
Reported-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When __ip6_rt_update_pmtu() is called, rt->from is RCU dereferenced, but is
never checked for null - rt6_flush_exceptions() may have removed the entry.
[ 1913.989004] RIP: 0010:ip6_rt_cache_alloc+0x13/0x170
[ 1914.209410] Call Trace:
[ 1914.214798] <IRQ>
[ 1914.219226] __ip6_rt_update_pmtu+0xb0/0x190
[ 1914.228649] ip6_tnl_xmit+0x2c2/0x970 [ip6_tunnel]
[ 1914.239223] ? ip6_tnl_parse_tlv_enc_lim+0x32/0x1a0 [ip6_tunnel]
[ 1914.252489] ? __gre6_xmit+0x148/0x530 [ip6_gre]
[ 1914.262678] ip6gre_tunnel_xmit+0x17e/0x3c7 [ip6_gre]
[ 1914.273831] dev_hard_start_xmit+0x8d/0x1f0
[ 1914.283061] sch_direct_xmit+0xfa/0x230
[ 1914.291521] __qdisc_run+0x154/0x4b0
[ 1914.299407] net_tx_action+0x10e/0x1f0
[ 1914.307678] __do_softirq+0xca/0x297
[ 1914.315567] irq_exit+0x96/0xa0
[ 1914.322494] smp_apic_timer_interrupt+0x68/0x130
[ 1914.332683] apic_timer_interrupt+0xf/0x20
[ 1914.341721] </IRQ>
Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ido Schimmel says:
====================
mlxsw: Add neighbour offload indication
Neighbour entries are programmed to the device's table so that the
correct destination MAC will be specified in a packet after it was
routed.
Despite being programmed to the device and unlike routes and FDB
entries, neighbour entries are currently not marked as offloaded. This
patchset changes that.
Patch #1 is a preparatory patch to make sure we only mark a neighbour as
offloaded in case it was successfully programmed to the device.
Patch #2 sets the offload indication on neighbours.
Patch #3 adds a test to verify above mentioned functionality.
Patched iproute2 version that prints the offload indication is available
here [1].
[1] https://github.com/idosch/iproute2/tree/idosch-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|