summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-08-09net-next: tag_mtk: add flow_dissect callback to the ops structJohn Crispin
The MT7530 inserts the 4 magic header in between the 802.3 address and protocol field. The patch implements the callback that can be called by the flow dissector to figure out the real protocol and offset of the network header. With this patch applied we can properly parse the packet and thus make hashing function properly. Signed-off-by: Muciri Gatimu <muciri@openmesh.com> Signed-off-by: Shashidhar Lakkavalli <shashidhar.lakkavalli@openmesh.com> Signed-off-by: John Crispin <john@phrozen.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net-next: dsa: add flow_dissect callback to struct dsa_device_opsJohn Crispin
When the flow dissector first sees packets coming in on a DSA devices the 802.3 header wont be located where the code expects it to be as the tag is still present. Adding this new callback allows a DSA device to provide a new function that the flow_dissector can use to get the correct protocol and offset of the network header. Signed-off-by: Muciri Gatimu <muciri@openmesh.com> Signed-off-by: Shashidhar Lakkavalli <shashidhar.lakkavalli@openmesh.com> Signed-off-by: John Crispin <john@phrozen.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net-next: dsa: move struct dsa_device_ops to the global header fileJohn Crispin
We need to access this struct from within the flow_dissector to fix dissection for packets coming in on DSA devices. Signed-off-by: Muciri Gatimu <muciri@openmesh.com> Signed-off-by: Shashidhar Lakkavalli <shashidhar.lakkavalli@openmesh.com> Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_targetXin Long
Commit 55917a21d0cc ("netfilter: x_tables: add context to know if extension runs from nft_compat") introduced a member nft_compat to xt_tgchk_param structure. But it didn't set it's value for ipt_init_target. With unexpected value in par.nft_compat, it may return unexpected result in some target's checkentry. This patch is to set all it's fields as 0 and only initialize the non-zero fields in ipt_init_target. v1->v2: As Wang Cong's suggestion, fix it by setting all it's fields as 0 and only initializing the non-zero fields. Fixes: 55917a21d0cc ("netfilter: x_tables: add context to know if extension runs from nft_compat") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09igmp: Fix regression caused by igmp sysctl namespace code.Nikolay Borisov
Commit dcd87999d415 ("igmp: net: Move igmp namespace init to correct file") moved the igmp sysctls initialization from tcp_sk_init to igmp_net_init. This function is only called as part of per-namespace initialization, only if CONFIG_IP_MULTICAST is defined, otherwise igmp_mc_init() call in ip_init is compiled out, casuing the igmp pernet ops to not be registerd and those sysctl being left initialized with 0. However, there are certain functions, such as ip_mc_join_group which are always compiled and make use of some of those sysctls. Let's do a partial revert of the aforementioned commit and move the sysctl initialization into inet_init_net, that way they will always have sane values. Fixes: dcd87999d415 ("igmp: net: Move igmp namespace init to correct file") Link: https://bugzilla.kernel.org/show_bug.cgi?id=196595 Reported-by: Gerardo Exequiel Pozzi <vmlinuz386@gmail.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09Merge branch 'mediatek-bring-up-QDMA-RX-ring-0'David S. Miller
John Crispin says: ==================== net-next: mediatek: bring up QDMA RX ring 0 The MT7623 has several DMA rings. Inside the SW path, the core will use the PDMA when receiving traffic. While bringing up the HW path we noticed that the PPE requires the QDMA RX to also be brought up as it uses this ring internally for its flow scheduling. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net-next: mediatek: bring up QDMA RX ring 0John Crispin
This patch is in preparation for adding HW flow and QoS offloading. For those features to work, the driver needs to bring up the first QDMA RX ring. This ring is used by the PPE offloading HW. Signed-off-by: John Crisp in <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net-next: mediatek: fix typos inside the header fileJohn Crispin
Trivial patch fixing 2 typos. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net: atm: make atmdev_ops constBhumika Goyal
Make these const as they are only stored in the ops field of a atm_dev structure, which is const. Done using Coccinelle. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09atm: make atmdev_ops constBhumika Goyal
Make these structures const as they are either passed to the function atm_dev_register having the corresponding argument as const or stored in the ops field of a atm_dev structure, which is also const. Done using Coccinelle. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net: dsa: make dsa_switch_ops constBhumika Goyal
Make these structures const as they are only stored in the ops field of a dsa_switch structure, which is const. Done using Coccinelle. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09liquidio: napi cleanupIntiyaz Basha
Disable napi when interface is going down. Delete napi when destroying the interface. Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09geneve: maximum value of VNI cannot be usedGirish Moodalbail
Geneve's Virtual Network Identifier (VNI) is 24 bit long, so the range of values for it would be from 0 to 16777215 (2^24 -1). However, one cannot create a geneve device with VNI set to 16777215. This patch fixes this issue. Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net: ipv6: lower ndisc notifier priority below addrconfDavid Ahern
ndisc_notify is used to send unsolicited neighbor advertisements (e.g., on a link up). Currently, the ndisc notifier is run before the addrconf notifer which means NA's are not sent for link-local addresses which are added by the addrconf notifier. Fix by lowering the priority of the ndisc notifier. Setting the priority to ADDRCONF_NOTIFY_PRIORITY - 5 means it runs after addrconf and before the route notifier which is ADDRCONF_NOTIFY_PRIORITY - 10. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net: systemport: Fix software statistics for SYSTEMPORT LiteFlorian Fainelli
With SYSTEMPORT Lite we have holes in our statistics layout that make us skip over the hardware MIB counters, bcm_sysport_get_stats() was not taking that into account resulting in reporting 0 for all SW-maintained statistics, fix this by skipping accordingly. Fixes: 44a4524c54af ("net: systemport: Add support for SYSTEMPORT Lite") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09tipc: remove premature ESTABLISH FSM event at link synchronizationJon Paul Maloy
When a link between two nodes come up, both endpoints will initially send out a STATE message to the peer, to increase the probability that the peer endpoint also is up when the first traffic message arrives. Thereafter, if the establishing link is the second link between two nodes, this first "traffic" message is a TUNNEL_PROTOCOL/SYNCH message, helping the peer to perform initial synchronization between the two links. However, the initial STATE message may be lost, in which case the SYNCH message will be the first one arriving at the peer. This should also work, as the SYNCH message itself will be used to take up the link endpoint before initializing synchronization. Unfortunately the code for this case is broken. Currently, the link is brought up through a tipc_link_fsm_evt(ESTABLISHED) when a SYNCH arrives, whereupon __tipc_node_link_up() is called to distribute the link slots and take the link into traffic. But, __tipc_node_link_up() is itself starting with a test for whether the link is up, and if true, returns without action. Clearly, the tipc_link_fsm_evt(ESTABLISHED) call is unnecessary, since tipc_node_link_up() is itself issuing such an event, but also harmful, since it inhibits tipc_node_link_up() to perform the test of its tasks, and the link endpoint in question hence is never taken into traffic. This problem has been exposed when we set up dual links between pre- and post-4.4 kernels, because the former ones don't send out the initial STATE message described above. We fix this by removing the unnecessary event call. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09ibmvnic: Correct 'unused variable' warning in build.Nathan Fontenot
Commit a248878d7a1d ("ibmvnic: Check for transport event on driver resume") removed the loop to kick irqs on driver resume but didn't remove the now unused loop variable 'i'. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09ibmvnic: Add netdev_dbg output for debuggingNathan Fontenot
To ease debugging of the ibmvnic driver add a series of netdev_dbg() statements to track driver status, especially during initialization, removal, and resetting of the driver. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09ibmvnic: Clean up resources on probe failureNathan Fontenot
Ensure that any resources allocated during probe are released if the probe of the driver fails. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09sunvdc: prevent sunvdc panic when mpgroup disk added to guest domainJim Quigley
Using mpgroup to define multiple paths for a virtual disk causes multiple virtual-device-port ports to be created for that virtual device. Each virtual-device-port port then gets a vdisk created for it by the Linux sunvdc driver. As mpgroup is not supported by the Linux sunvdc driver it cannot handle multiple ports for a single vdisk, leading to a kernel panic at startup. This fix prevents more than one vdisk per virtual-device-port being created until full virtual disk multipathing (mpgroup) support is implemented. Signed-off-by: Jim Quigley <Jim.Quigley@oracle.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Aaron Young <aaron.young@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09Merge branch 'rtnetlink-allow-selected-handlers-to-run-without-rtnl'David S. Miller
Florian Westphal says: ==================== rtnetlink: allow selected handlers to run without rtnl Changes since v1: In patch 6, don't make ipv6 route handlers lockless, they all have assumptions on rtnl being held. Other patches are unchanged. The RTNL mutex is used to serialize both rtnetlink calls and dump requests. Its also used to protect other things such as the list of current net namespaces. Unfortunately RTNL mutex is a performance issue, e.g. a cpu adding an ip address prevents other cpus from seemingly unrelated tasks such as dumping tc classifiers or doing rtnetlink route lookups. This patch set adds basic infrastructure to start pushing the rtnl lock down to those places that need it, or even elide it entirely in some cases. Subsystems can now indicate that their doit() callback can run without RTNL mutex, such callbacks can then run in parallel. This will obviously need a lot of followup work; all current users need to be audited/changed to benefit from this. Initial no-rtnl spot is netns new/getid. We have various 'get' handlers that are also a tempting target, however, several of these depend on rtnl mutex to prevent information from changing while objects are being read by rtnl handlers; however, it doesn't appear impossible to change this. Dumps are another problem entirely, see commit 2907c35ff64708065 ("net: hold rtnl again in dump callbacks"), this patchset doesn't touch dump requests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09net: call newid/getid without rtnl mutex heldFlorian Westphal
Both functions take nsid_lock and don't rely on rtnl lock. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09rtnetlink: add RTNL_FLAG_DOIT_UNLOCKEDFlorian Westphal
Allow callers to tell rtnetlink core that its doit callback should be invoked without holding rtnl mutex. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09rtnetlink: protect handler table with rcuFlorian Westphal
Note that netlink dumps still acquire rtnl mutex via the netlink dump infrastructure. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09rtnetlink: small rtnl lock pushdownFlorian Westphal
instead of rtnl lock/unload at the top level, push it down to the called function. This is just an intermediate step, next commit switches protection of the rtnl_link ops table to rcu, in which case (for dumps) the rtnl lock is acquired only by the netlink dumper infrastructure (current lock/unlock/dump/lock/unlock rtnl sequence becomes rcu lock/rcu unlock/dump). Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09rtnetlink: add reference counting to prevent module unload while dump is in ↵Florian Westphal
progress I don't see what prevents rmmod (unregister_all is called) while a dump is active. Even if we'd add rtnl lock/unlock pair to unregister_all (as done here), thats not enough either as rtnl_lock is released right before the dump process starts. So this adds a refcount: * acquire rtnl mutex * bump refcount * release mutex * start the dump ... and make unregister_all remove the callbacks (no new dumps possible) and then wait until refcount is 0. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09rtnetlink: make rtnl_register accept a flags parameterFlorian Westphal
This change allows us to later indicate to rtnetlink core that certain doit functions should be called without acquiring rtnl_mutex. This change should have no effect, we simply replace the last (now unused) calcit argument with the new flag. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09rtnetlink: call rtnl_calcit directlyFlorian Westphal
There is only a single place in the kernel that regisers the "calcit" callback (to determine min allocation for dumps). This is in rtnetlink.c for PF_UNSPEC RTM_GETLINK. The function that checks for calcit presence at run time will first check the requested family (which will always fail for !PF_UNSPEC as no callsite registers this), then falls back to checking PF_UNSPEC. Therefore we can just check if type is RTM_GETLINK and then do a direct call. Because of fallback to PF_UNSPEC all RTM_GETLINK types used this regardless of family. This has the advantage that we don't need to allocate space for the function pointer for all the other families. A followup patch will drop the calcit function pointer from the rtnl_link callback structure. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09Merge branch 'bpf-new-branches'David S. Miller
Daniel Borkmann says: ==================== bpf: Add BPF_J{LT,LE,SLT,SLE} instructions This set adds BPF_J{LT,LE,SLT,SLE} instructions to the BPF insn set, interpreter, JIT hardening code and all JITs are also updated to support the new instructions. Basic idea is to reduce register pressure by avoiding BPF_J{GT,GE,SGT,SGE} rewrites. Removing the workaround for the rewrites in LLVM, this can result in shorter BPF programs, less stack usage and less verification complexity. First patch provides some more details on rationale and integration. Thanks a lot! v1 -> v2: - Reworded commit msg in patch 1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09bpf: add test cases for new BPF_J{LT, LE, SLT, SLE} instructionsDaniel Borkmann
Add test cases to the verifier selftest suite in order to verify that i) direct packet access, and ii) dynamic map value access is working with the changes related to the new instructions. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09bpf: enable BPF_J{LT, LE, SLT, SLE} opcodes in verifierDaniel Borkmann
Enable the newly added jump opcodes, main parts are in two different areas, namely direct packet access and dynamic map value access. For the direct packet access, we now allow for the following two new patterns to match in order to trigger markings with find_good_pkt_pointers(): Variant 1 (access ok when taking the branch): 0: (61) r2 = *(u32 *)(r1 +76) 1: (61) r3 = *(u32 *)(r1 +80) 2: (bf) r0 = r2 3: (07) r0 += 8 4: (ad) if r0 < r3 goto pc+2 R0=pkt(id=0,off=8,r=0) R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R10=fp 5: (b7) r0 = 0 6: (95) exit from 4 to 7: R0=pkt(id=0,off=8,r=8) R1=ctx R2=pkt(id=0,off=0,r=8) R3=pkt_end R10=fp 7: (71) r0 = *(u8 *)(r2 +0) 8: (05) goto pc-4 5: (b7) r0 = 0 6: (95) exit processed 11 insns, stack depth 0 Variant 2 (access ok on fall-through): 0: (61) r2 = *(u32 *)(r1 +76) 1: (61) r3 = *(u32 *)(r1 +80) 2: (bf) r0 = r2 3: (07) r0 += 8 4: (bd) if r3 <= r0 goto pc+1 R0=pkt(id=0,off=8,r=8) R1=ctx R2=pkt(id=0,off=0,r=8) R3=pkt_end R10=fp 5: (71) r0 = *(u8 *)(r2 +0) 6: (b7) r0 = 1 7: (95) exit from 4 to 6: R0=pkt(id=0,off=8,r=0) R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R10=fp 6: (b7) r0 = 1 7: (95) exit processed 10 insns, stack depth 0 The above two basically just swap the branches where we need to handle an exception and allow packet access compared to the two already existing variants for find_good_pkt_pointers(). For the dynamic map value access, we add the new instructions to reg_set_min_max() and reg_set_min_max_inv() in order to learn bounds. Verifier test cases for both are added in a follow-up patch. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09bpf, nfp: implement jiting of BPF_J{LT,LE}Daniel Borkmann
This work implements jiting of BPF_J{LT,LE} instructions with BPF_X/BPF_K variants for the nfp eBPF JIT. The two BPF_J{SLT,SLE} instructions have not been added yet given BPF_J{SGT,SGE} are not supported yet either. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09bpf, ppc64: implement jiting of BPF_J{LT, LE, SLT, SLE}Daniel Borkmann
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions with BPF_X/BPF_K variants for the ppc64 eBPF JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09bpf, s390x: implement jiting of BPF_J{LT, LE, SLT, SLE}Daniel Borkmann
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions with BPF_X/BPF_K variants for the s390x eBPF JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09bpf, sparc64: implement jiting of BPF_J{LT, LE, SLT, SLE}Daniel Borkmann
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions with BPF_X/BPF_K variants for the sparc64 eBPF JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09bpf, arm64: implement jiting of BPF_J{LT, LE, SLT, SLE}Daniel Borkmann
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions with BPF_X/BPF_K variants for the arm64 eBPF JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09bpf, x86: implement jiting of BPF_J{LT,LE,SLT,SLE}Daniel Borkmann
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions with BPF_X/BPF_K variants for the x86_64 eBPF JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09bpf: add BPF_J{LT,LE,SLT,SLE} instructionsDaniel Borkmann
Currently, eBPF only understands BPF_JGT (>), BPF_JGE (>=), BPF_JSGT (s>), BPF_JSGE (s>=) instructions, this means that particularly *JLT/*JLE counterparts involving immediates need to be rewritten from e.g. X < [IMM] by swapping arguments into [IMM] > X, meaning the immediate first is required to be loaded into a register Y := [IMM], such that then we can compare with Y > X. Note that the destination operand is always required to be a register. This has the downside of having unnecessarily increased register pressure, meaning complex program would need to spill other registers temporarily to stack in order to obtain an unused register for the [IMM]. Loading to registers will thus also affect state pruning since we need to account for that register use and potentially those registers that had to be spilled/filled again. As a consequence slightly more stack space might have been used due to spilling, and BPF programs are a bit longer due to extra code involving the register load and potentially required spill/fills. Thus, add BPF_JLT (<), BPF_JLE (<=), BPF_JSLT (s<), BPF_JSLE (s<=) counterparts to the eBPF instruction set. Modifying LLVM to remove the NegateCC() workaround in a PoC patch at [1] and allowing it to also emit the new instructions resulted in cilium's BPF programs that are injected into the fast-path to have a reduced program length in the range of 2-3% (e.g. accumulated main and tail call sections from one of the object file reduced from 4864 to 4729 insns), reduced complexity in the range of 10-30% (e.g. accumulated sections reduced in one of the cases from 116432 to 88428 insns), and reduced stack usage in the range of 1-5% (e.g. accumulated sections from one of the object files reduced from 824 to 784b). The modification for LLVM will be incorporated in a backwards compatible way. Plan is for LLVM to have i) a target specific option to offer a possibility to explicitly enable the extension by the user (as we have with -m target specific extensions today for various CPU insns), and ii) have the kernel checked for presence of the extensions and enable them transparently when the user is selecting more aggressive options such as -march=native in a bpf target context. (Other frontends generating BPF byte code, e.g. ply can probe the kernel directly for its code generation.) [1] https://github.com/borkmann/llvm/tree/bpf-insns Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09Merge branch 'net-zerocopy-fixes'David S. Miller
Willem de Bruijn says: ==================== net: zerocopy fixes Fix two issues introduced in the msg_zerocopy patchset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09sock: fix zerocopy_success regression with msg_zerocopyWillem de Bruijn
Do not use uarg->zerocopy outside msg_zerocopy. In other paths the field is not explicitly initialized and aliases another field. Those paths have only one reference so do not need this intermediate variable. Call uarg->callback directly. Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09sock: fix zerocopy panic in mem accountingWillem de Bruijn
Only call mm_unaccount_pinned_pages when releasing a struct ubuf_info that has initialized its field uarg->mmp. Before this patch, a vhost-net with experimental_zcopytx can crash in mm_unaccount_pinned_pages sock_zerocopy_put skb_zcopy_clear skb_release_data Only sock_zerocopy_alloc initializes this field. Move the unaccount call from generic sock_zerocopy_put to its specific callback sock_zerocopy_callback. Fixes: a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages") Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09Merge branch '1GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 1GbE Intel Wired LAN Driver Updates 2017-08-08 This series contains updates to e1000e and igb/igbvf. Gangfeng Huang fixes an issue with receive network flow classification, where igb_nfc_filter_exit() was not being called in igb_down() which would cause the filter tables to "fill up" if a user where to change the adapter settings (such as speed) which requires a reset of the adapter. Cliff Spradlin fixes a timestamping issue, where igb was allowing requests for hardware timestamping even if it was not configured for hardware transmit timestamping. Corinna Vinschen removes the error message that there was an "unexpected SYS WRAP", when it is actually expected. So remove the message to not confuse users. Greg Edwards provides several patches for the mailbox interface between the PF and VF drivers. Added a mailbox unlock method to be used to unlock the PF/VF mailbox by the PF. Added a lock around the VF mailbox ops to prevent the VF from sending another message while the PF is still processing the previous message. Fixed a "scheduling while atomic" issue by changing msleep() to mdelay(). Sasha adds support for the next LOM generations i219 (v8 & v9) which will be available in the next Intel client platform IceLake. John Linville adds support for a Broadcom PHY to the igb driver, since there are designs out in the world which use the igb MAC and a third party PHY. This allows the driver to load and function as expected on these designs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The UDP offload conflict is dealt with by simply taking what is in net-next where we have removed all of the UFO handling code entirely. The TCP conflict was a case of local variables in a function being removed from both net and net-next. In netvsc we had an assignment right next to where a missing set of u64 stats sync object inits were added. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09Merge tag 'pinctrl-v4.13-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "These are the pin control fixes I have gathered since the return from my vacation. They boiled in -next a while so let's get them in. Apart from the documentation build it is purely driver fixes. Which is nice. The Intel fixes seem kind of important. - Fix the documentation build as the docs were moved - Correct the UART pin list on the Intel Merrifield - Fix pin assignment and number of pins on the Marvell Armada 37xx pin controller - Cover the Setzer models in the Chromebook DMI quirk in the Intel cheryview driver so they start working - Add the missing "sim" function to the sunxi driver - Fix USB pin definitions on Uniphier Pro4 - Smatch fix for invalid reference in the zx pin control driver" * tag 'pinctrl-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: generic: update references to Documentation/pinctrl.txt pinctrl: intel: merrifield: Correct UART pin lists pinctrl: armada-37xx: Fix number of pin in south bridge pinctrl: armada-37xx: Fix the pin 23 on south bridge pinctrl: cherryview: Add Setzer models to the Chromebook DMI quirk pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver pinctrl: uniphier: fix USB3 pin assignment for Pro4 pinctrl: zte: fix dereference of 'data' in zx_set_mux()
2017-08-09futex: Remove unnecessary warning from get_futex_keyMel Gorman
Commit 65d8fc777f6d ("futex: Remove requirement for lock_page() in get_futex_key()") removed an unnecessary lock_page() with the side-effect that page->mapping needed to be treated very carefully. Two defensive warnings were added in case any assumption was missed and the first warning assumed a correct application would not alter a mapping backing a futex key. Since merging, it has not triggered for any unexpected case but Mark Rutland reported the following bug triggering due to the first warning. kernel BUG at kernel/futex.c:679! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3 Hardware name: linux,dummy-virt (DT) task: ffff80001e271780 task.stack: ffff000010908000 PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679 LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679 pc : [<ffff00000821ac14>] lr : [<ffff00000821ac14>] pstate: 80000145 The fact that it's a bug instead of a warning was due to an unrelated arm64 problem, but the warning itself triggered because the underlying mapping changed. This is an application issue but from a kernel perspective it's a recoverable situation and the warning is unnecessary so this patch removes the warning. The warning may potentially be triggered with the following test program from Mark although it may be necessary to adjust NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the system. #include <linux/futex.h> #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #include <sys/syscall.h> #include <sys/time.h> #include <unistd.h> #define NR_FUTEX_THREADS 16 pthread_t threads[NR_FUTEX_THREADS]; void *mem; #define MEM_PROT (PROT_READ | PROT_WRITE) #define MEM_SIZE 65536 static int futex_wrapper(int *uaddr, int op, int val, const struct timespec *timeout, int *uaddr2, int val3) { syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3); } void *poll_futex(void *unused) { for (;;) { futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1); } } int main(int argc, char *argv[]) { int i; mem = mmap(NULL, MEM_SIZE, MEM_PROT, MAP_SHARED | MAP_ANONYMOUS, -1, 0); printf("Mapping @ %p\n", mem); printf("Creating futex threads...\n"); for (i = 0; i < NR_FUTEX_THREADS; i++) pthread_create(&threads[i], NULL, poll_futex, NULL); printf("Flipping mapping...\n"); for (;;) { mmap(mem, MEM_SIZE, MEM_PROT, MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0); } return 0; } Reported-and-tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-09Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "The main thing is to allow empty id_tables for ACPI to make some drivers get probed again. It looks a bit bigger than usual because it needs some internal renaming, too. Other than that, there is a fix for broken DSTDs, a super simple enablement for ARM MPS, and two documentation fixes which I'd like to see in v4.13 already" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: rephrase explanation of I2C_CLASS_DEPRECATED i2c: allow i2c-versatile for ARM MPS platforms i2c: designware: Some broken DSTDs use 1MiHz instead of 1MHz i2c: designware: Print clock freq on invalid clock freq error i2c: core: Allow empty id_table in ACPI case as well i2c: mux: pinctrl: mention correct module name in Kconfig help text
2017-08-09Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Three patches that should go into this release. Two of them are from Paolo and fix up some corner cases with BFQ, and the last patch is from Ming and fixes up a potential usage count imbalance regression due to the recent NOWAIT work" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: don't leak preempt counter/q_usage_counter when allocating rq failed block, bfq: consider also in_service_entity to state whether an entity is active block, bfq: reset in_service_entity if it becomes idle
2017-08-09Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "Fix two regressions in the inside-secure driver with respect to hmac(sha1)" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: inside-secure - fix the sha state length in hmac_sha1_setkey crypto: inside-secure - fix invalidation check in hmac_sha1_setkey
2017-08-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "The pull requests are getting smaller, that's progress I suppose :-) 1) Fix infinite loop in CIPSO option parsing, from Yujuan Qi. 2) Fix remote checksum handling in VXLAN and GUE tunneling drivers, from Koichiro Den. 3) Missing u64_stats_init() calls in several drivers, from Florian Fainelli. 4) TCP can set the congestion window to an invalid ssthresh value after congestion window reductions, from Yuchung Cheng. 5) Fix BPF jit branch generation on s390, from Daniel Borkmann. 6) Correct MIPS ebpf JIT merge, from David Daney. 7) Correct byte order test in BPF test_verifier.c, from Daniel Borkmann. 8) Fix various crashes and leaks in ASIX driver, from Dean Jenkins. 9) Handle SCTP checksums properly in mlx4 driver, from Davide Caratti. 10) We can potentially enter tcp_connect() with a cached route already, due to fastopen, so we have to explicitly invalidate it. 11) skb_warn_bad_offload() can bark in legitimate situations, fix from Willem de Bruijn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits) net: avoid skb_warn_bad_offload false positives on UFO qmi_wwan: fix NULL deref on disconnect ppp: fix xmit recursion detection on ppp channels rds: Reintroduce statistics counting tcp: fastopen: tcp_connect() must refresh the route net: sched: set xt_tgchk_param par.net properly in ipt_init_target net: dsa: mediatek: add adjust link support for user ports net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets qed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()' hysdn: fix to a race condition in put_log_buffer s390/qeth: fix L3 next-hop in xmit qeth hdr asix: Fix small memory leak in ax88772_unbind() asix: Ensure asix_rx_fixup_info members are all reset asix: Add rx->ax_skb = NULL after usbnet_skb_return() bpf: fix selftest/bpf/test_pkt_md_access on s390x netvsc: fix race on sub channel creation bpf: fix byte order test in test_verifier xgene: Always get clk source, but ignore if it's missing for SGMII ports MIPS: Add missing file for eBPF JIT. bpf, s390: fix build for libbpf and selftest suite ...
2017-08-08net: ipv6: avoid overhead when no custom FIB rules are installedVincent Bernat
If the user hasn't installed any custom rules, don't go through the whole FIB rules layer. This is pretty similar to f4530fa574df (ipv4: Avoid overhead when no custom FIB rules are installed). Using a micro-benchmark module [1], timing ip6_route_output() with get_cycles(), with 40,000 routes in the main routing table, before this patch: min=606 max=12911 count=627 average=1959 95th=4903 90th=3747 50th=1602 mad=821 table=254 avgdepth=21.8 maxdepth=39 value │ ┊ count 600 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 199 880 │▒▒▒░░░░░░░░░░░░░░░░ 43 1160 │▒▒▒░░░░░░░░░░░░░░░░░░░░ 48 1440 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░ 43 1720 │▒▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░ 59 2000 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 50 2280 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 26 2560 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 31 2840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 28 3120 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 17 3400 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 17 3680 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 3960 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 11 4240 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 6 4520 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 6 4800 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 9 After: min=544 max=11687 count=627 average=1776 95th=4546 90th=3585 50th=1227 mad=565 table=254 avgdepth=21.8 maxdepth=39 value │ ┊ count 540 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 201 800 │▒▒▒▒▒░░░░░░░░░░░░░░░░ 63 1060 │▒▒▒▒▒░░░░░░░░░░░░░░░░░░░░░ 68 1320 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░ 39 1580 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 32 1840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 32 2100 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 34 2360 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 33 2620 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 26 2880 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 22 3140 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 9 3400 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 3660 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 9 3920 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 4180 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 4440 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8 At the frequency of the host during the bench (~ 3.7 GHz), this is about a 100 ns difference on the median value. A next step would be to collapse local and main tables, as in 0ddcf43d5d4a (ipv4: FIB Local/MAIN table collapse). [1]: https://github.com/vincentbernat/network-lab/blob/master/lab-routes-ipv6/kbench_mod.c Signed-off-by: Vincent Bernat <vincent@bernat.im> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>