summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-01-10cxgb4: Fixes static checker warning in mps_tcam_show()Hariprasad Shenai
The commit 115b56af88b5 ("cxgb4: Update mps_tcam output to include T6 fields") from Dec 23, 2015, leads to the following static checker warning: drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c:1735 mps_tcam_show() warn: we tested 'lookup_type' before and it was 'true' Fixing it. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10Merge branch 'emac-RK3036'David S. Miller
Xing Zheng says: ==================== Add support emac for the RK3036 SoC platform We have supported the emac for RK3066/RK3188, but the RK3036 have some configuration different with them. We should let the driver of emac_rockchip compatible with other Rockchip SoCs. Changes in v2: - Separate DTS from patch series. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10net: ethernet: arc: Add support emac for RK3036Xing Zheng
The RK3036's GRFs offset are different with RK3066/RK3188, and need to set mac TX/RX clock before probe emac. Signed-off-by: Xing Zheng <zhengxing@rock-chips.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10net: ethernet: arc: Keep emac compatibility for more Rockchip SoCsXing Zheng
On the RK3066/RK3188, there was fixed GRF offset configuration to set emac and fixed DIV2 mac TX/RX clock. So, we need to easily set and fit to other SoCs (RK3036) which maybe have different GRF offset, and need adjust mac TX/RX clock. Signed-off-by: Xing Zheng <zhengxing@rock-chips.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10net: ethernet: arc: Probe emac after set RMII clockXing Zheng
After enter arc_emac_probe, emac will get_phy_id, phy_poll_reset and other connecting PHY via mdiobus_read, so we need to set correct ref clock rate for emac before probe emac. Signed-off-by: Xing Zheng <zhengxing@rock-chips.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10Merge branch 'bnxt_en-zeropad-fw-and-reset'David S. Miller
Michael Chan says: ==================== bnxt_en: Zero pad fw messages and add fw reset. 2 patches related to firmware for net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10bnxt_en: Reset embedded processor after applying firmware upgradeRob Swindell
Use HWRM_FW_RESET command to request a self-reset of the embedded processor(s) after successfully applying a firmware update. For boot processor, the self-reset is currently deferred until the next PCIe reset. Signed-off-by: Rob Swindell <swindell@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10bnxt_en: Zero pad firmware messages to 128 bytes.Michael Chan
For future compatibility, zero pad all messages that the driver sends to the firmware to 128 bytes. If these messages are extended in the future with new byte enables, zero padding these messages now will guarantee future compatibility. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10net, sched: add clsact qdiscDaniel Borkmann
This work adds a generalization of the ingress qdisc as a qdisc holding only classifiers. The clsact qdisc works on ingress, but also on egress. In both cases, it's execution happens without taking the qdisc lock, and the main difference for the egress part compared to prior version of [1] is that this can be applied with _any_ underlying real egress qdisc (also classless ones). Besides solving the use-case of [1], that is, allowing for more programmability on assigning skb->priority for the mqprio case that is supported by most popular 10G+ NICs, it also opens up a lot more flexibility for other tc applications. The main work on classification can already be done at clsact egress time if the use-case allows and state stored for later retrieval f.e. again in skb->priority with major/minors (which is checked by most classful qdiscs before consulting tc_classify()) and/or in other skb fields like skb->tc_index for some light-weight post-processing to get to the eventual classid in case of a classful qdisc. Another use case is that the clsact egress part allows to have a central egress counterpart to the ingress classifiers, so that classifiers can easily share state (e.g. in cls_bpf via eBPF maps) for ingress and egress. Currently, default setups like mq + pfifo_fast would require for this to use, for example, prio qdisc instead (to get a tc_classify() run) and to duplicate the egress classifier for each queue. With clsact, it allows for leaving the setup as is, it can additionally assign skb->priority to put the skb in one of pfifo_fast's bands and it can share state with maps. Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid) w/o the need to perform a skb_dst_force() to hold on to it any longer. In lwt case, we can also use this facility to setup dst metadata via cls_bpf (bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for that (case of IFF_NO_QUEUE devices, for example). The realization can be done without any changes to the scheduler core framework. All it takes is that we have two a-priori defined minors/child classes, where we can mux between ingress and egress classifier list (dev->ingress_cl_list and dev->egress_cl_list, latter stored close to dev->_tx to avoid extra cacheline miss for moderate loads). The egress part is a bit similar modelled to handle_ing() and patched to a noop in case the functionality is not used. Both handlers are now called sch_handle_ingress() and sch_handle_egress(), code sharing among the two doesn't seem practical as there are various minor differences in both paths, so that making them conditional in a single handler would rather slow things down. Full compatibility to ingress qdisc is provided as well. Since both piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist per netdevice, and thus ingress qdisc specific behaviour can be retained for user space. This means, either a user does 'tc qdisc add dev foo ingress' and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact' alternative, where both, ingress and egress classifier can be configured as in the below example. ingress qdisc supports attaching classifier to any minor number whereas clsact has two fixed minors for muxing between the lists, therefore to not break user space setups, they are better done as two separate qdiscs. I decided to extend the sch_ingress module with clsact functionality so that commonly used code can be reused, the module is being aliased with sch_clsact so that it can be auto-loaded properly. Alternative would have been to add a flag when initializing ingress to alter its behaviour plus aliasing to a different name (as it's more than just ingress). However, the first would end up, based on the flag, choosing the new/old behaviour by calling different function implementations to handle each anyway, the latter would require to register ingress qdisc once again under different alias. So, this really begs to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops by its own that share callbacks used by both. Example, adding qdisc: # tc qdisc add dev foo clsact # tc qdisc show dev foo qdisc mq 0: root qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc clsact ffff: parent ffff:fff1 Adding filters (deleting, etc works analogous by specifying ingress/egress): # tc filter add dev foo ingress bpf da obj bar.o sec ingress # tc filter add dev foo egress bpf da obj bar.o sec egress # tc filter show dev foo ingress filter protocol all pref 49152 bpf filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action # tc filter show dev foo egress filter protocol all pref 49152 bpf filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will show an empty list for clsact. Either using the parent names (ingress/egress) or specifying the full major/minor will then show the related filter lists. Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend. [1] http://patchwork.ozlabs.org/patch/512949/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10ethernet: amd: au1000: Remove pointless warningAndrew Lunn
The warning about being able to read any MDIO device, not just the attached ethernet devices PHY applies to all MDIO drivers. So remove it. This also removes a reference to a member in phy_device which has moved. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10staging: netlogic: Fix build error due to missed API changeAndrew Lunn
Fix a number of build errors due to moving the phy_map and centralizing interrupt allocation. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10net: ethernet: faraday: Use phy_find_first() instead of open coding itGuenter Roeck
Use phy_find_first() to find the first phy device instead of open coding it. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10net: ethernet: broadcom: Fix build errorsGuenter Roeck
Commit 7f854420fbfe ("phy: Add API for {un}registering an mdio device to a bus") introduces an API to access mii_bus structures, but missed to update the sb1250 driver. This results in the following build error. drivers/net/ethernet/broadcom/sb1250-mac.c: In function 'sbmac_mii_probe': drivers/net/ethernet/broadcom/sb1250-mac.c:2360:24: error: 'struct mii_bus' has no member named 'phy_map' Use phy_find_first() instead of open coding it. Commit 2220943a21e2 ("phy: Centralise print about attached phy") introduces the following build error. drivers/net/ethernet/broadcom/sb1250-mac.c: In function 'sbmac_mii_probe': drivers/net/ethernet/broadcom/sb1250-mac.c:2383:20: error: 'phydev' undeclared Fixes: 7f854420fbfe ("phy: Add API for {un}registering an mdio device to a bus") Fixes: 2220943a21e2 ("phy: Centralise print about attached phy") Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10Merge branch 'mdio-device-fixes'David S. Miller
Andrew Lunn says: ==================== Fix breakage from mdio device These two patches fix MIPS platforms which got broken by the recent mdio device patchset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10net: ethernet-rgmii.c: Fix breakage from moving phdev busAndrew Lunn
The mdio device patches moved the bus member in phy_device into a substructure. This driver got missed. Fix it. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10net: lantiq_etop.c: Use helper to find first phyAndrew Lunn
Make use of the helper to find the first phy device. This also fixes the compile breakage. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10stmmac: Don't exit mdio registration when mdio subnode is not found in the DTSRomain Perier
Originally, most of the platforms using this driver did not define an mdio subnode in the devicetree. Commit e34d65 ("stmmac: create of compatible mdio bus for stmmac driver") introduced a backward compatibily issue by using of_mdiobus_register explicitly with an mdio subnode. This patch fixes the issue by calling the function mdiobus_register, when mdio subnode is not found. The driver is now compatible with both modes. Fixes: e34d65696d2e ("stmmac: create of compatible mdio bus for stmmac driver") Signed-off-by: Romain Perier <romain.perier@gmail.com> Tested-by: Phil Reid <preid@electromag.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10Linux 4.4v4.4Linus Torvalds
2016-01-10net: sctp: prevent writes to cookie_hmac_alg from accessing invalid memorySasha Levin
proc_dostring() needs an initialized destination string, while the one provided in proc_sctp_do_hmac_alg() contains stack garbage. Thus, writing to cookie_hmac_alg would strlen() that garbage and end up accessing invalid memory. Fixes: 3c68198e7 ("sctp: Make hmac algorithm selection for cookie generation dynamic") Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10net: qmi_wwan: Add SIMCom 7230EKristian Evensen
SIMCom 7230E is a QMI LTE module with support for most "normal" bands. Manual testing has showed that only interface five works. Cc: Bjørn Mork <bjorn@mork.no> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10Merge branch 'bpf-next'David S. Miller
Daniel Borkmann says: ==================== BPF update Fixes a csum issue on ingress. As mentioned previously, net-next seems just fine imho. Later on, will follow up with couple of replacements like ovs_skb_postpush_rcsum() etc. Thanks! v1 -> v2: - Added patch 1 with helper - Implemented Hannes' idea to just use csum_partial, thanks! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10bpf: add skb_postpush_rcsum and fix dev_forward_skb occasionsDaniel Borkmann
Add a small helper skb_postpush_rcsum() and fix up redirect locations that need CHECKSUM_COMPLETE fixups on ingress. dev_forward_skb() expects a proper csum that covers also Ethernet header, f.e. since 2c26d34bbcc0 ("net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding"), we also do skb_postpull_rcsum() after pulling Ethernet header off via eth_type_trans(). When using eBPF in a netns setup f.e. with vxlan in collect metadata mode, I can trigger the following csum issue with an IPv6 setup: [ 505.144065] dummy1: hw csum failure [...] [ 505.144108] Call Trace: [ 505.144112] <IRQ> [<ffffffff81372f08>] dump_stack+0x44/0x5c [ 505.144134] [<ffffffff81607cea>] netdev_rx_csum_fault+0x3a/0x40 [ 505.144142] [<ffffffff815fee3f>] __skb_checksum_complete+0xcf/0xe0 [ 505.144149] [<ffffffff816f0902>] nf_ip6_checksum+0xb2/0x120 [ 505.144161] [<ffffffffa08c0e0e>] icmpv6_error+0x17e/0x328 [nf_conntrack_ipv6] [ 505.144170] [<ffffffffa0898eca>] ? ip6t_do_table+0x2fa/0x645 [ip6_tables] [ 505.144177] [<ffffffffa08c0725>] ? ipv6_get_l4proto+0x65/0xd0 [nf_conntrack_ipv6] [ 505.144189] [<ffffffffa06c9a12>] nf_conntrack_in+0xc2/0x5a0 [nf_conntrack] [ 505.144196] [<ffffffffa08c039c>] ipv6_conntrack_in+0x1c/0x20 [nf_conntrack_ipv6] [ 505.144204] [<ffffffff8164385d>] nf_iterate+0x5d/0x70 [ 505.144210] [<ffffffff816438d6>] nf_hook_slow+0x66/0xc0 [ 505.144218] [<ffffffff816bd302>] ipv6_rcv+0x3f2/0x4f0 [ 505.144225] [<ffffffff816bca40>] ? ip6_make_skb+0x1b0/0x1b0 [ 505.144232] [<ffffffff8160b77b>] __netif_receive_skb_core+0x36b/0x9a0 [ 505.144239] [<ffffffff8160bdc8>] ? __netif_receive_skb+0x18/0x60 [ 505.144245] [<ffffffff8160bdc8>] __netif_receive_skb+0x18/0x60 [ 505.144252] [<ffffffff8160ccff>] process_backlog+0x9f/0x140 [ 505.144259] [<ffffffff8160c4a5>] net_rx_action+0x145/0x320 [...] What happens is that on ingress, we push Ethernet header back in, either from cls_bpf or right before skb_do_redirect(), but without updating csum. The "hw csum failure" can be fixed by using the new skb_postpush_rcsum() helper for the dev_forward_skb() case to correct the csum diff again. Thanks to Hannes Frederic Sowa for the csum_partial() idea! Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper") Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10net, sched: add skb_at_tc_ingress helperDaniel Borkmann
Add a skb_at_tc_ingress() as this will be needed elsewhere as well and can hide the ugly ifdef. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10Merge branch 'tcp-keepalive-namespaceify'David S. Miller
Nikolay Borisov says: ==================== Namespaceify tcp keepalive machinery The following patch series enables the tcp keepalive mechanism to be configured per net namespace. This is especially useful if you have multiple containers hosted on one node and one of them is under DoS- in such situations one thing which could be done is to configure the tcp keepalive settings such that connections for that particular container are being reset faster. Another scenario where not being able to control those knob comes per container is problematic is occurs the value of net.netfilter.nf_conntrack_tcp_timeout_established is set below the keepalive interval, in such situations the server won't send an RST packet resulting in applications not trying to reconnect and stale connection waiting. Changing the global keepalive value is a possible solution but it might interfere with other containers. The three patches gradually convert each of the affected knobs to be per netns. I thought it would be easier for review than put everything in one patch. If people deem it more appropriate to squash everything in one patch (maybe after review) I'd be more than happy to do it. The patches have been compile-tested on 4.4 and functionally tested on 3.12 and they work as expected. These are based off 4.4-rc8 ==================== Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10ipv4: Namespecify the tcp_keepalive_intvl sysctl knobNikolay Borisov
This is the final part required to namespaceify the tcp keep alive mechanism. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10ipv4: Namespecify tcp_keepalive_probes sysctl knobNikolay Borisov
This is required to have full tcp keepalive mechanism namespace support. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10ipv4: Namespaceify tcp_keepalive_time sysctl knobNikolay Borisov
Different net namespaces might have different requirements as to the keepalive time of tcp sockets. This might be required in cases where different firewall rules are in place which require tcp timeout sockets to be increased/decreased independently of the host. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10udp: restrict offloads to one namespaceHannes Frederic Sowa
udp tunnel offloads tend to aggregate datagrams based on inner headers. gro engine gets notified by tunnel implementations about possible offloads. The match is solely based on the port number. Imagine a tunnel bound to port 53, the offloading will look into all DNS packets and tries to aggregate them based on the inner data found within. This could lead to data corruption and malformed DNS packets. While this patch minimizes the problem and helps an administrator to find the issue by querying ip tunnel/fou, a better way would be to match on the specific destination ip address so if a user space socket is bound to the same address it will conflict. Cc: Tom Herbert <tom@herbertland.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10Merge branch 'mlxsw-layer2-multicast'David S. Miller
Jiri Pirko says: ==================== mlxsw: Adding layer 2 multicast Elad says: This patchset add Linux hardware reflection for L2 multicast offload and add MC support in mlxsw. For every bridge MDB entry insertion, either by IGMP snooping or by static insertion/removal, a switchdev ops is been called. In mlxsw, a new multicast group (MID) is been created and ports are assigned. When all ports are removed, the multicast group is been deleted. --- v1->v2: - GFP_ATOMIC->GFP_KERNEL change in patch 7/8 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10switchdev: Adding IGMP snooping documentationElad Raz
Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10mlxsw: Adding layer 2 multicast supportElad Raz
Add SWITCHDEV_OBJ_ID_PORT_MDB switchdev ops support. On first MDB insertion creates a new multicast group (MID) and add members port to the MID. Also add new MDB entry for the flooding-domain (fid-vid) and link the MDB entry to the newly constructed MC group. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10mlxsw: Adding VID to FID translatationElad Raz
Adding a generic function that translate VID to FID. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10mlxsw: Changing the maximum number of multicast group to a defineElad Raz
Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10mlxsw: reg: Adding SMID registerElad Raz
Adding back SMID register definition and packing. For each MC group a new SMID entry will be generated. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10mlxsw: reg: Add definition of multicast record for SFD registerElad Raz
Multicast-related records have specific format in SFD register. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10bridge: Reflect MDB entries to hardwareElad Raz
Offload MDB changes per port to hardware Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10switchdev: Adding MDB entry offloadElad Raz
Define HW multicast entry: MAC and VID. Using a MAC address simplifies support for both IPV4 and IPv6. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10um: Use race-free temporary file creationMickaël Salaün
Open the memory mapped file with the O_TMPFILE flag when available. Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Acked-by: Tristan Schmelcher <tschmelcher@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10um: Do not set unsecure permission for temporary fileMickaël Salaün
Remove the insecure 0777 mode for temporary file to prohibit other users to change the executable mapped code. An attacker could gain access to the mapped file descriptor from the temporary file (before it is unlinked) in a read-only mode but it should not be accessible in write mode to avoid arbitrary code execution. To not change the hostfs behavior, the temporary file creation permission now depends on the current umask(2) and the implementation of mkstemp(3). Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Acked-by: Tristan Schmelcher <tschmelcher@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10um: Fix build error and kconfig for i386Mickaël Salaün
Fix build error by generating elfcore.o only when ELF_CORE (depending on COREDUMP) is selected: arch/x86/um/built-in.o: In function `elf_core_write_extra_phdrs': (.text+0x3e62): undefined reference to `dump_emit' arch/x86/um/built-in.o: In function `elf_core_write_extra_data': (.text+0x3eef): undefined reference to `dump_emit' Fixes: 5d2acfc7b974 ("kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT") Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2016-01-10um: Add seccomp supportMickaël Salaün
This brings SECCOMP_MODE_STRICT and SECCOMP_MODE_FILTER support through prctl(2) and seccomp(2) to User-mode Linux for i386 and x86_64 subarchitectures. secure_computing() is called first in handle_syscall() so that the syscall emulation will be aborted quickly if matching a seccomp rule. This is inspired from Meredydd Luff's patch (https://gerrit.chromium.org/gerrit/21425). Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: James Hogan <james.hogan@imgtec.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: David Drysdale <drysdale@google.com> Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10um: Add full asm/syscall.h supportMickaël Salaün
Add subarchitecture-independent implementation of asm-generic/syscall.h allowing access to user system call parameters and results: * syscall_get_nr() * syscall_rollback() * syscall_get_error() * syscall_get_return_value() * syscall_set_return_value() * syscall_get_arguments() * syscall_set_arguments() * syscall_get_arch() provided by arch/x86/um/asm/syscall.h This provides the necessary syscall helpers needed by HAVE_ARCH_SECCOMP_FILTER plus syscall_get_error(). This is inspired from Meredydd Luff's patch (https://gerrit.chromium.org/gerrit/21425). Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: David Drysdale <drysdale@google.com> Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10selftests/seccomp: Remove the need for HAVE_ARCH_TRACEHOOKMickaël Salaün
Some architectures do not implement PTRACE_GETREGSET nor PTRACE_SETREGSET (required by HAVE_ARCH_TRACEHOOK) but only implement PTRACE_GETREGS and PTRACE_SETREGS (e.g. User-mode Linux). This improve seccomp selftest portability for architectures without HAVE_ARCH_TRACEHOOK support by defining a new trigger HAVE_GETREGS. For now, this is only enabled for i386 and x86_64 architectures. This is required to be able to run this tests on User-mode Linux. Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: David Drysdale <drysdale@google.com> Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10um: Fix ptrace GETREGS/SETREGS bugsMickaël Salaün
This fix two related bugs: * PTRACE_GETREGS doesn't get the right orig_ax (syscall) value * PTRACE_SETREGS can't set the orig_ax value (erased by initial value) Get rid of the now useless and error-prone get_syscall(). Fix inconsistent behavior in the ptrace implementation for i386 when updating orig_eax automatically update the syscall number as well. This is now updated in handle_syscall(). Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Thomas Meyer <thomas@m3y3r.de> Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Anton Ivanov <aivanov@brocade.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: David Drysdale <drysdale@google.com> Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Kees Cook <keescook@chromium.org>
2016-01-10um: link with -lpthreadVegard Nossum
Similarly to commit fb1770aa78a43530940d0c2dd161e77bc705bdac, with gcc 5 on Ubuntu and CONFIG_STATIC_LINK=y I was seeing these linker errors: /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/librt.a(timer_create.o): In function `__timer_create_new': (.text+0xcd): undefined reference to `pthread_once' /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/librt.a(timer_create.o): In function `__timer_create_new': (.text+0x126): undefined reference to `pthread_attr_init' /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/librt.a(timer_create.o): In function `__timer_create_new': (.text+0x168): undefined reference to `pthread_attr_setdetachstate' [...] Obviously we also need -lpthread for librt.a. Cc: stable@vger.kernel.org # 4.4 Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10um: Update UBD to use pread/pwrite family of functionsAnton Ivanov
This decreases the number of syscalls per read/write by half. Signed-off-by: Anton Ivanov <aivanov@brocade.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10um: Do not change hard IRQ flags in soft IRQ processingAnton Ivanov
Software IRQ processing in generic architectures assumes that the exit out of hard IRQ may have re-enabled interrupts (some architectures may have an implicit EOI). It presumes them enabled and toggles the flags once more just in case unless this is turned off in the architecture specific hardirq.h by setting __ARCH_IRQ_EXIT_IRQS_DISABLED This patch adds this to UML where due to the way IRQs are handled it is an optimization (it works fine without it too). Signed-off-by: Anton Ivanov <aivanov@brocade.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10um: Prevent IRQ handler reentrancyAnton Ivanov
The existing IRQ handler design in UML does not prevent reentrancy This is mitigated by fd-enable/fd-disable semantics for the IO portion of the UML subsystem. The timer, however, can and is re-entered resulting in very deep stack usage and occasional stack exhaustion. This patch prevents this by checking if there is a timer interrupt in-flight before processing any pending timer interrupts. Signed-off-by: Anton Ivanov <aivanov@brocade.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10uml: flush stdout before forkingVegard Nossum
I was seeing some really weird behaviour where piping UML's output somewhere would cause output to get duplicated: $ ./vmlinux | head -n 40 Checking that ptrace can change system call numbers...Core dump limits : soft - 0 hard - NONE OK Checking syscall emulation patch for ptrace...Core dump limits : soft - 0 hard - NONE OK Checking advanced syscall emulation patch for ptrace...Core dump limits : soft - 0 hard - NONE OK Core dump limits : soft - 0 hard - NONE This is because these tests do a fork() which duplicates the non-empty stdout buffer, then glibc flushes the duplicated buffer as each child exits. A simple workaround is to flush before forking. Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10uml: fix hostfs mknod()Vegard Nossum
An inverted return value check in hostfs_mknod() caused the function to return success after handling it as an error (and cleaning up). It resulted in the following segfault when trying to bind() a named unix socket: Pid: 198, comm: a.out Not tainted 4.4.0-rc4 RIP: 0033:[<0000000061077df6>] RSP: 00000000daae5d60 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000006092a460 RCX: 00000000dfc54208 RDX: 0000000061073ef1 RSI: 0000000000000070 RDI: 00000000e027d600 RBP: 00000000daae5de0 R08: 00000000da980ac0 R09: 0000000000000000 R10: 0000000000000003 R11: 00007fb1ae08f72a R12: 0000000000000000 R13: 000000006092a460 R14: 00000000daaa97c0 R15: 00000000daaa9a88 Kernel panic - not syncing: Kernel mode fault at addr 0x40, ip 0x61077df6 CPU: 0 PID: 198 Comm: a.out Not tainted 4.4.0-rc4 #1 Stack: e027d620 dfc54208 0000006f da981398 61bee000 0000c1ed daae5de0 0000006e e027d620 dfcd4208 00000005 6092a460 Call Trace: [<60dedc67>] SyS_bind+0xf7/0x110 [<600587be>] handle_syscall+0x7e/0x80 [<60066ad7>] userspace+0x3e7/0x4e0 [<6006321f>] ? save_registers+0x1f/0x40 [<6006c88e>] ? arch_prctl+0x1be/0x1f0 [<60054985>] fork_handler+0x85/0x90 Let's also get rid of the "cosmic ray protection" while we're at it. Fixes: e9193059b1b3 "hostfs: fix races in dentry_name() and inode_name()" Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Richard Weinberger <richard@nod.at>