Age | Commit message (Collapse) | Author |
|
Pull block fixes from Jens Axboe:
"It's been a few weeks, so here's a small collection of fixes that
should go into the current series.
This contains:
- NVMe pull request from Christoph, with a few important fixes.
- kyber hang fix from Omar.
- A blk-throttl fix from Shaohua, fixing a case where we double
charge a bio.
- Two call_single_data alignment fixes from me, fixing up some
unfortunate changes that went into 4.14 without being properly
reviewed on the block side (since nobody was CC'ed on the
patch...).
- A bounce buffer fix in two parts, one from me and one from Ming.
- Revert bdi debug error handling patch. It's causing boot issues for
some folks, and a week down the line, we're still no closer to a
fix. Revert this patch for now until it's figured out, then we can
retry for 4.16"
* 'for-linus' of git://git.kernel.dk/linux-block:
Revert "bdi: add error handle for bdi_debug_register"
null_blk: unalign call_single_data
block: unalign call_single_data in struct request
block-throttle: avoid double charge
block: fix blk_rq_append_bio
block: don't let passthrough IO go into .make_request_fn()
nvme: setup streams after initializing namespace head
nvme: check hw sectors before setting chunk sectors
nvme: call blk_integrity_unregister after queue is cleaned up
nvme-fc: remove double put reference if admin connect fails
nvme: set discard_alignment to zero
kyber: fix another domain token wait queue hang
|
|
Pull KVM fixes from Paolo Bonzini:
"ARM fixes:
- A bug in handling of SPE state for non-vhe systems
- A fix for a crash on system shutdown
- Three timer fixes, introduced by the timer optimizations for v4.15
x86 fixes:
- fix for a WARN that was introduced in 4.15
- fix for SMM when guest uses PCID
- fixes for several bugs found by syzkaller
... and a dozen papercut fixes for the kvm_stat tool"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
tools/kvm_stat: sort '-f help' output
kvm: x86: fix RSM when PCID is non-zero
KVM: Fix stack-out-of-bounds read in write_mmio
KVM: arm/arm64: Fix timer enable flow
KVM: arm/arm64: Properly handle arch-timer IRQs after vtimer_save_state
KVM: arm/arm64: timer: Don't set irq as forwarded if no usable GIC
KVM: arm/arm64: Fix HYP unmapping going off limits
arm64: kvm: Prevent restoring stale PMSCR_EL1 for vcpu
KVM/x86: Check input paging mode when cs.l is set
tools/kvm_stat: add line for totals
tools/kvm_stat: stop ignoring unhandled arguments
tools/kvm_stat: suppress usage information on command line errors
tools/kvm_stat: handle invalid regular expressions
tools/kvm_stat: add hint on '-f help' to man page
tools/kvm_stat: fix child trace events accounting
tools/kvm_stat: fix extra handling of 'help' with fields filter
tools/kvm_stat: fix missing field update after filter change
tools/kvm_stat: fix drilldown in events-by-guests mode
tools/kvm_stat: fix command line option '-g'
kvm: x86: fix WARN due to uninitialized guest FPU state
...
|
|
The RGMII spec allows compliance for devices that implement an internal
delay on TXC and/or RXC inside the transmitter. This patch adds the
necessary RGMII_[RX|TX]ID mode code to handle such PHYs with the
emac driver.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ibm_emac driver predates the PHY_INTERFACE_MODE_*
enums by a few years.
And while the driver has been retrofitted to use the PHYLIB,
the old definitions have stuck around to this day.
This patch replaces all occurences of PHY_MODE_* with
the respective equivalent PHY_INTERFACE_MODE_* enum.
And finally, it purges the old macros for good.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
phy_modes() in the common phy.h already defines the same phy mode
names in lower case. The deleted rgmii_mode_name() is used only
in one place and for a "notice-level" printk. Hence, it will not
be missed.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2.
If sockopt doesn't set autoflowlabel, outcome packets from the hosts are
supposed to not include flowlabel. This is true for normal packet, but
not for reset packet.
The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if
we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't
changed, so the sock will keep the old behavior in terms of auto
flowlabel. Reset packet is suffering from this problem, because reset
packet is sent from a special control socket, which is created at boot
time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
socket will always have its ipv6_pinfo.autoflowlabel set, even after
user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always
have flowlabel. Normal sock created before sysctl setting suffers from
the same issue. We can't even turn off autoflowlabel unless we kill all
socks in the hosts.
To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
autoflowlabel setting from user, otherwise we always call
ip6_default_np_autolabel() which has the new settings of sysctl.
Note, this changes behavior a little bit. Before commit 42240901f7c4
(ipv6: Implement different admin modes for automatic flow labels), the
autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes,
existing connection will change autoflowlabel behavior. After that
commit, autoflowlabel behavior is sticky in the whole life of the sock.
With this patch, the behavior isn't sticky again.
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <tom@quantonium.net>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb_vlan_pop() expects skb->protocol to be a valid TPID for double
tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
shift the true ethertype into position for us.
Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
Signed-off-by: Eric Garver <e@erig.me>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current code advertises (on the modifiers blob property) support for CCS
modifier for pipe C on GLK, only to reject it later when validating the
request before the atomic commit.
This fixes the tests igt@kms_ccs@pipe-c-*, which should skip on GLK for
pipe C (see bug 104096).
A relevant discussion is archived at:
https://lists.freedesktop.org/archives/intel-gfx/2017-December/150646.html
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104096
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171220002410.5604-1-krisman@collabora.co.uk
(cherry picked from commit f0cbd8bd877f3d8c5b80a6b1add9ca9010d7f9d8)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Alexander Aring says:
====================
net: sched: sch: introduce extack support
this patch series basically add support for extack in common qdisc handling.
Additional it adds extack pointer to common qdisc callback handling this
offers per qdisc implementation to setting the extack message for each
failure over netlink.
The extack message will be set deeper in qdisc functions but going not
deeper as net core api. For qdisc module callback handling, the extack
will not be set. This will be part of per qdisc extack handling.
I also want to prepare patches to handle extack per qdisc module...
so there will come a lot of more patches, just cut them down to make
it reviewable.
There are some above 80-chars width warnings, which I ignore because
it looks more ugly otherwise.
This patch-series based on patches by David Ahern which gave me some
hints how to deal with extack support.
Cc: David Ahern <dsahern@gmail.com>
changes since v4:
- rebase on current net-next/master
- fix several typos (also David Ahren to Ahern, I am sorry)
- Add acked by Jamal
changes since v3:
- remove patch 2/2 lib: nlattr: set extack msg if validate_nla fails since
David Ahern has a better solution
- Remove check on net admin permission since -EPERM indicates it already
- Change rtab to "rate table" - this is what it's stands for
- Fix cbs *not* support messages
- Fix tcf block error message for allocation, allocation will be still there
because there are multiple places which returns -ENOMEM
- Finnally also took care about sch_atm, sorry somehow I forgot this one and
I hope I didn't forgot any sch implementation to add new callback parameters
changes since v2:
- add fix coding style patch to catch all checkpatch warnings
- add patch for setting netlink extack msg if validate_nla fails
- changes in handle generic qdisc errors
- remove NL_SET_ERR_MSG from memory allocation errors
- remove NL_SET_ERR_MSG from device not found
- change STAB to table size
- add various new patches to add extack support for common
TC functions like qdisc_get_rtab, tcf_block_get, qdisc_alloc
and qdisc_create_dflt - users which are interessted in the
detailed error messages can assign extack, otherwise NULL.
- Add sch_cbq as example for qdisc_ops callback: init,
qdisc_class_ops callbacks: change and graft
- Add sch_cbs as example for qdisc_ops callback: change
- Add sch_drr as example for qdisc_class ops callbacks: tcf_block
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for the drr qdisc implementation by
adding NL_SET_ERR_MSG in validation of user input.
Also it serves to illustrate a use case of how the infrastructure ops
api changes are to be used by individual qdiscs.
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for the cbs qdisc implementation by
adding NL_SET_ERR_MSG in validation of user input.
Also it serves to illustrate a use case of how the infrastructure ops
api changes are to be used by individual qdiscs.
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for the cbq qdisc implementation by
adding NL_SET_ERR_MSG in validation of user input.
Also it serves to illustrate a use case of how the infrastructure ops
api changes are to be used by individual qdiscs.
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for the function qdisc_create_dflt which is
a common used function in the tc subsystem. Callers which are interested
in the receiving error can assign extack to get a more detailed
information why qdisc_create_dflt failed. The function qdisc_create_dflt
will also call an init callback which can fail by any per-qdisc specific
handling.
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for the function qdisc_alloc which is
a common used function in the tc subsystem. Callers which are interested
in the receiving error can assign extack to get a more detailed
information why qdisc_alloc failed.
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for the function tcf_block_get which is
a common used function in the tc subsystem. Callers which are interested
in the receiving error can assign extack to get a more detailed
information why tcf_block_get failed.
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for the function qdisc_get_rtab which is
a common used function in the tc subsystem. Callers which are interested
in the receiving error can assign extack to get a more detailed
information why qdisc_get_rtab failed.
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for graft callback to prepare per-qdisc
specific changes for extack.
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for block callback to prepare per-qdisc
specific changes for extack.
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for class change callback api. This prepares
to handle extack support inside each specific class implementation.
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for change callback for qdisc ops
structtur to prepare per-qdisc specific changes for extack.
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for init callback to prepare per-qdisc
specific changes for extack.
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds extack support for generic qdisc handling. The extack
will be set deeper to each called function which is not part of netdev
core api.
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fix checkpatch issues for upcomming patches according to the
sched api file. It changes mostly how to check on null pointer.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit a0747a859ef6d3cc5b6cd50eb694499b78dd0025.
It breaks some booting for some users, and more than a week
into this, there's still no good fix. Revert this commit
for now until a solution has been found.
Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently, parameters such as oif and source address are not taken into
account during fibmatch lookup. Example (IPv4 for reference) before
patch:
$ ip -4 route show
192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1
$ ip -6 route show
2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium
fe80::/64 dev dummy0 proto kernel metric 256 pref medium
fe80::/64 dev dummy1 proto kernel metric 256 pref medium
$ ip -4 route get fibmatch 192.0.2.2 oif dummy0
192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
$ ip -4 route get fibmatch 192.0.2.2 oif dummy1
RTNETLINK answers: No route to host
$ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
$ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
After:
$ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
$ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
RTNETLINK answers: Network is unreachable
The problem stems from the fact that the necessary route lookup flags
are not set based on these parameters.
Instead of duplicating the same logic for fibmatch, we can simply
resolve the original route from its copy and dump it instead.
Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib result when requested")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For rmap removal, refactor the rmap owner checks into a separate
function, then skip the checks if we are performing an unknown-owner
removal.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Calling xfs_rmap_free with an unknown owner is supposed to remove any
rmaps covering that range regardless of owner. This is used by the EFI
recovery code to say "we're freeing this, it mustn't be owned by
anything anymore", but for whatever reason xfs_free_ag_extent filters
them out.
Therefore, remove the filter and make xfs_rmap_unmap actually treat it
as a wildcard owner -- free anything that's already there, and if
there's no owner at all then that's fine too.
There are two existing callers of bmap_add_free that take care the rmap
deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
cleanup; convert these to use OWN_NULL (via helpers), and now we really
require that an RUI (if any) gets added to the defer ops before any EFI.
Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
growfs will have to consult directly with the rmap to ensure that there
aren't any rmaps in the grown region.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
order
Under the deferred rmap operation scheme, there's a certain order in
which the rmap deferred ops have to be queued to maintain integrity
during log replay. For alloc/map operations that order is cui -> rui;
for free/unmap operations that order is cui -> rui -> efi. However, the
initial refcount code got the ordering wrong in the free side of things
because it queued refcount free op and an EFI and the refcount free op
queued a rmap free op, resulting in the order cui -> efi -> rui.
If we fail before the efd finishes, the efi recovery will try to do a
wildcard rmap removal and the subsequent rui will fail to find the rmap
and blow up. This didn't ever happen due to other screws up in handling
unknown owner rmap removals, but those other screw ups broke recovery in
other ways, so fix the ordering to follow the intended rules.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
If a user performs a direct CoW write, we end up loading the CoW fork
with preallocated extents. Therefore, we must set the cowblocks tag so
that they can be cleared out if we run low on space.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
When we're remounting the filesystem readonly, remove all CoW
preallocations prior to going ro. If the fs goes down after the ro
remount, we never clean up the staging extents, which means xfs_check
will trip over them on a subsequent run. Practically speaking, the next
mount will clean them up too, so this is unlikely to be seen. Since we
shut down the cowblocks cleaner on remount-ro, we also have to make sure
we start it back up if/when we remount-rw.
Found by adding clonerange to fsstress and running xfs/017.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Currently, xfs_itruncate_extents clears the cowblocks tag if i_cnextents
is zero. This is wrong, since i_cnextents only tracks real extents in
the CoW fork, which means that we could have some delayed CoW
reservations still in there that will now never get cleaned.
Fix a further bug where we /don't/ clear the reflink iflag if there are
any attribute blocks -- really, it's only safe to clear the reflink flag
if there are no data fork extents and no cow fork extents.
Found by adding clonerange to fsstress in xfs/017.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Sort the fields returned by specifying '-f help' on the command line.
While at it, simplify the code a bit, indent the output and eliminate an
extra blank line at the beginning.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
rsm_load_state_64() and rsm_enter_protected_mode() load CR3, then
CR4 & ~PCIDE, then CR0, then CR4.
However, setting CR4.PCIDE fails if CR3[11:0] != 0. It's probably easier
in the long run to replace rsm_enter_protected_mode() with an emulator
callback that sets all the special registers (like KVM_SET_SREGS would
do). For now, set the PCID field of CR3 only after CR4.PCIDE is 1.
Reported-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Fixes: 660a5d517aaab9187f93854425c4c63f4a09195c
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
drm-intel-fixes
gvt-fixes-2017-12-21:
- default pipe enable fix for virtual display (Xiaolin)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171221032500.xjofb4xyoihw3wo5@zhen-hp.sh.intel.com
|
|
Patch bd36d3bab2e3d08f80766c86487090dbceed4651 fixed a deadlock in the
failure path of drm_lease_create. This made the partially initialized
lease object visible for a short window of time.
To avoid having the lessee state appear transiently, I've rearranged
the code so that the lessor fields are not filled in until the
parameters are all validated and the function will succeed.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171221065424.1304-1-keithp@keithp.com
|
|
There's no reason to define netdev->xfrmdev_ops if
the offload facility is not CONFIG'd in.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
This adds a check for the required add and delete functions up front
at registration time to be sure both are defined.
Since both the features check and the registration check are looking
at the same things, break out the check for both to call.
Lastly, for some reason the feature check was setting xfrmdev_ops to
NULL if the NETIF_F_HW_ESP bit was missing, which would probably
surprise the driver later if the driver turned its NETIF_F_HW_ESP bit
back on. We shouldn't be messing with the driver's callback list, so
we stop doing that with this patch.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
The current XFRM code assumes that we've implemented the
xdo_dev_state_free() callback, even if it is meaningless to the driver.
This patch adds a check for it before calling, as done in other APIs,
to prevent a NULL function pointer kernel crash.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2017-12-21
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix multiple security issues in the BPF verifier mostly related
to the value and min/max bounds tracking rework in 4.14. Issues
range from incorrect bounds calculation in some BPF_RSH cases,
to improper sign extension and reg size handling on 32 bit
ALU ops, missing strict alignment checks on stack pointers, and
several others that got fixed, from Jann, Alexei and Edward.
2) Fix various build failures in BPF selftests on sparc64. More
specifically, librt needed to be added to the libs to link
against and few format string fixups for sizeof, from David.
3) Fix one last remaining issue from BPF selftest build that was
still occuring on s390x from the asm/bpf_perf_event.h include
which could not find the asm/ptrace.h copy, from Hendrik.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The default rlimit RLIMIT_MEMLOCK is 64KB. In certain cases,
e.g. in a test machine mimicking our production system, this test may
fail due to unable to charge the required memory for prog load:
$ ./test_dev_cgroup
libbpf: load bpf program failed: Operation not permitted
libbpf: failed to load program 'cgroup/dev'
libbpf: failed to load object './dev_cgroup.o'
Failed to load DEV_CGROUP program
...
Changing the default rlimit RLIMIT_MEMLOCK to unlimited
makes the test pass.
This patch also fixed a problem where when bpf_prog_load fails,
cleanup_cgroup_environment() should not be called since
setup_cgroup_environment() has not been invoked. Otherwise,
the following confusing message will appear:
...
(/home/yhs/local/linux/tools/testing/selftests/bpf/cgroup_helpers.c:95:
errno: No such file or directory) Opening Cgroup Procs: /mnt/cgroup.procs
...
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Daniel Borkmann says:
====================
This work adds correlation of maps and calls into the bpftool
xlated dump in order to help debugging and introspection of
loaded BPF progs. First patch makes kallsyms work on subprogs
with bpf calls, and second implements the actual correlation.
Details and example output can be found in the 2nd patch.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently a dump of an xlated prog (post verifier stage) doesn't
correlate used helpers as well as maps. The prog info lists
involved map ids, however there's no correlation of where in the
program they are used as of today. Likewise, bpftool does not
correlate helper calls with the target functions.
The latter can be done w/o any kernel changes through kallsyms,
and also has the advantage that this works with inlined helpers
and BPF calls.
Example, via interpreter:
# tc filter show dev foo ingress
filter protocol all pref 49152 bpf chain 0
filter protocol all pref 49152 bpf chain 0 handle 0x1 foo.o:[ingress] \
direct-action not_in_hw id 1 tag c74773051b364165 <-- prog id:1
* Output before patch (calls/maps remain unclear):
# bpftool prog dump xlated id 1 <-- dump prog id:1
0: (b7) r1 = 2
1: (63) *(u32 *)(r10 -4) = r1
2: (bf) r2 = r10
3: (07) r2 += -4
4: (18) r1 = 0xffff95c47a8d4800
6: (85) call unknown#73040
7: (15) if r0 == 0x0 goto pc+18
8: (bf) r2 = r10
9: (07) r2 += -4
10: (bf) r1 = r0
11: (85) call unknown#73040
12: (15) if r0 == 0x0 goto pc+23
[...]
* Output after patch:
# bpftool prog dump xlated id 1
0: (b7) r1 = 2
1: (63) *(u32 *)(r10 -4) = r1
2: (bf) r2 = r10
3: (07) r2 += -4
4: (18) r1 = map[id:2] <-- map id:2
6: (85) call bpf_map_lookup_elem#73424 <-- helper call
7: (15) if r0 == 0x0 goto pc+18
8: (bf) r2 = r10
9: (07) r2 += -4
10: (bf) r1 = r0
11: (85) call bpf_map_lookup_elem#73424
12: (15) if r0 == 0x0 goto pc+23
[...]
# bpftool map show id 2 <-- show/dump/etc map id:2
2: hash_of_maps flags 0x0
key 4B value 4B max_entries 3 memlock 4096B
Example, JITed, same prog:
# tc filter show dev foo ingress
filter protocol all pref 49152 bpf chain 0
filter protocol all pref 49152 bpf chain 0 handle 0x1 foo.o:[ingress] \
direct-action not_in_hw id 3 tag c74773051b364165 jited
# bpftool prog show id 3
3: sched_cls tag c74773051b364165
loaded_at Dec 19/13:48 uid 0
xlated 384B jited 257B memlock 4096B map_ids 2
# bpftool prog dump xlated id 3
0: (b7) r1 = 2
1: (63) *(u32 *)(r10 -4) = r1
2: (bf) r2 = r10
3: (07) r2 += -4
4: (18) r1 = map[id:2] <-- map id:2
6: (85) call __htab_map_lookup_elem#77408 <-+ inlined rewrite
7: (15) if r0 == 0x0 goto pc+2 |
8: (07) r0 += 56 |
9: (79) r0 = *(u64 *)(r0 +0) <-+
10: (15) if r0 == 0x0 goto pc+24
11: (bf) r2 = r10
12: (07) r2 += -4
[...]
Example, same prog, but kallsyms disabled (in that case we are
also not allowed to pass any relative offsets, etc, so prog
becomes pointer sanitized on dump):
# sysctl kernel.kptr_restrict=2
kernel.kptr_restrict = 2
# bpftool prog dump xlated id 3
0: (b7) r1 = 2
1: (63) *(u32 *)(r10 -4) = r1
2: (bf) r2 = r10
3: (07) r2 += -4
4: (18) r1 = map[id:2]
6: (85) call bpf_unspec#0
7: (15) if r0 == 0x0 goto pc+2
[...]
Example, BPF calls via interpreter:
# bpftool prog dump xlated id 1
0: (85) call pc+2#__bpf_prog_run_args32
1: (b7) r0 = 1
2: (95) exit
3: (b7) r0 = 2
4: (95) exit
Example, BPF calls via JIT:
# sysctl net.core.bpf_jit_enable=1
net.core.bpf_jit_enable = 1
# sysctl net.core.bpf_jit_kallsyms=1
net.core.bpf_jit_kallsyms = 1
# bpftool prog dump xlated id 1
0: (85) call pc+2#bpf_prog_3b185187f1855c4c_F
1: (b7) r0 = 1
2: (95) exit
3: (b7) r0 = 2
4: (95) exit
And finally, an example for tail calls that is now working
as well wrt correlation:
# bpftool prog dump xlated id 2
[...]
10: (b7) r2 = 8
11: (85) call bpf_trace_printk#-41312
12: (bf) r1 = r6
13: (18) r2 = map[id:1]
15: (b7) r3 = 0
16: (85) call bpf_tail_call#12
17: (b7) r1 = 42
18: (6b) *(u16 *)(r6 +46) = r1
19: (b7) r0 = 0
20: (95) exit
# bpftool map show id 1
1: prog_array flags 0x0
key 4B value 4B max_entries 1 memlock 4096B
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Right now kallsyms handling is not working with JITed subprogs.
The reason is that when in 1c2a088a6626 ("bpf: x64: add JIT support
for multi-function programs") in jit_subprogs() they are passed
to bpf_prog_kallsyms_add(), then their prog type is 0, which BPF
core will think it's a cBPF program as only cBPF programs have a
0 type. Thus, they need to inherit the type from the main prog.
Once that is fixed, they are indeed added to the BPF kallsyms
infra, but their tag is 0. Therefore, since intention is to add
them as bpf_prog_F_<tag>, we need to pass them to bpf_prog_calc_tag()
first. And once this is resolved, there is a use-after-free on
prog cleanup: we remove the kallsyms entry from the main prog,
later walk all subprogs and call bpf_jit_free() on them. However,
the kallsyms linkage was never released on them. Thus, do that
for all subprogs right in __bpf_prog_put() when refcount hits 0.
Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Do not allow root to convert valid pointers into unknown scalars.
In particular disallow:
ptr &= reg
ptr <<= reg
ptr += ptr
and explicitly allow:
ptr -= ptr
since pkt_end - pkt == length
1.
This minimizes amount of address leaks root can do.
In the future may need to further tighten the leaks with kptr_restrict.
2.
If program has such pointer math it's likely a user mistake and
when verifier complains about it right away instead of many instructions
later on invalid memory access it's easier for users to fix their progs.
3.
when register holding a pointer cannot change to scalar it allows JITs to
optimize better. Like 32-bit archs could use single register for pointers
instead of a pair required to hold 64-bit scalars.
4.
reduces architecture dependent behavior. Since code:
r1 = r10;
r1 &= 0xff;
if (r1 ...)
will behave differently arm64 vs x64 and offloaded vs native.
A significant chunk of ptr mangling was allowed by
commit f1174f77b50c ("bpf/verifier: rework value tracking")
yet some of it was allowed even earlier.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Alexei Starovoitov says:
====================
This patch set addresses a set of security vulnerabilities
in bpf verifier logic discovered by Jann Horn.
All of the patches are candidates for 4.14 stable.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
These tests should cover the following cases:
- MOV with both zero-extended and sign-extended immediates
- implicit truncation of register contents via ALU32/MOV32
- implicit 32-bit truncation of ALU32 output
- oversized register source operand for ALU32 shift
- right-shift of a number that could be positive or negative
- map access where adding the operation size to the offset causes signed
32-bit overflow
- direct stack access at a ~4GiB offset
Also remove the F_LOAD_WITH_STRICT_ALIGNMENT flag from a bunch of tests
that should fail independent of what flags userspace passes.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
There were various issues related to the limited size of integers used in
the verifier:
- `off + size` overflow in __check_map_access()
- `off + reg->off` overflow in check_mem_access()
- `off + reg->var_off.value` overflow or 32-bit truncation of
`reg->var_off.value` in check_mem_access()
- 32-bit truncation in check_stack_boundary()
Make sure that any integer math cannot overflow by not allowing
pointer math with large values.
Also reduce the scope of "scalar op scalar" tracking.
Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This could be made safe by passing through a reference to env and checking
for env->allow_ptr_leaks, but it would only work one way and is probably
not worth the hassle - not doing it will not directly lead to program
rejection.
Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Force strict alignment checks for stack pointers because the tracking of
stack spills relies on it; unaligned stack accesses can lead to corruption
of spilled registers, which is exploitable.
Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Prevent indirect stack accesses at non-constant addresses, which would
permit reading and corrupting spilled pointers.
Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|