summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-12-15net: sched: fix clsact init error pathJiri Pirko
Since in qdisc_create, the destroy op is called when init fails, we don't do cleanup in init and leave it up to destroy. This fixes use-after-free when trying to put already freed block. Fixes: 6e40cf2d4dee ("net: sched: use extended variants of block_get/put in ingress and clsact qdiscs") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: phy: broadcom: Add entry for 5395 switch PHYsFlorian Fainelli
Add an entry for the builtin PHYs present in the Broadcom BCM5395 switch. This allows us to retrieve the PHY statistics among other things. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Chris Healy <cphealy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - fix the s2ram regression related to confusion around segment register restoration, plus related cleanups that make the code more robust - a guess-unwinder Kconfig dependency fix - an isoimage build target fix for certain tool chain combinations - instruction decoder opcode map fixes+updates, and the syncing of the kernel decoder headers to the objtool headers - a kmmio tracing fix - two 5-level paging related fixes - a topology enumeration fix on certain SMP systems" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Resync objtool's instruction decoder source code copy with the kernel's latest version x86/decoder: Fix and update the opcodes map x86/power: Make restore_processor_context() sane x86/power/32: Move SYSENTER MSR restoration to fix_processor_context() x86/power/64: Use struct desc_ptr for the IDT in struct saved_context x86/unwinder/guess: Prevent using CONFIG_UNWINDER_GUESS=y with CONFIG_STACKDEPOT=y x86/build: Don't verify mtools configuration file for isoimage x86/mm/kmmio: Fix mmiotrace for page unaligned addresses x86/boot/compressed/64: Print error if 5-level paging is not supported x86/boot/compressed/64: Detect and handle 5-level paging at boot-time x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation
2017-12-15Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Misc fixes: - Fix a S390 boot hang that was caused by the lock-break logic. Remove lock-break to begin with, as review suggested it was unreasonably fragile and our confidence in its continued good health is lower than our confidence in its removal. - Remove the lockdep cross-release checking code for now, because of unresolved false positive warnings. This should make lockdep work well everywhere again. - Get rid of the final (and single) ACCESS_ONCE() straggler and remove the API from v4.15. - Fix a liblockdep build warning" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/lib/lockdep: Add missing declaration of 'pr_cont()' checkpatch: Remove ACCESS_ONCE() warning compiler.h: Remove ACCESS_ONCE() tools/include: Remove ACCESS_ONCE() tools/perf: Convert ACCESS_ONCE() to READ_ONCE() locking/lockdep: Remove the cross-release locking checks locking/core: Remove break_lock field when CONFIG_GENERIC_LOCKBREAK=y locking/core: Fix deadlock during boot on systems with GENERIC_LOCKBREAK
2017-12-15Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two fixes: a crash fix for an ARM SoC platform, and kernel-doc warnings fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/rt: Do not pull from current CPU if only one CPU to pull sched/core: Fix kernel-doc warnings after code movement
2017-12-15Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf tooling fix from Ingo Molnar: "Synchronize kernel <-> tooling headers to resolve two build warnings in the perf build" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/headers: Synchronize kernel <-> tooling headers
2017-12-15Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull early_ioremap fix from Ingo Molnar: "A boot hang fix when the EFI earlyprintk driver is enabled" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep
2017-12-15Merge tag 'for-linus-4.15-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Two minor fixes for running as Xen dom0: - when built as 32 bit kernel on large machines the Xen LAPIC emulation should report a rather modern LAPIC in order to support enough APIC-Ids - The Xen LAPIC emulation is needed for dom0 only, so build it only for kernels supporting to run as Xen dom0" * tag 'for-linus-4.15-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: XEN_ACPI_PROCESSOR is Dom0-only x86/Xen: don't report ancient LAPIC version
2017-12-15SUNRPC: Fix a race in the receive code pathTrond Myklebust
We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot race with the call to xprt_complete_rqst(). Reported-by: Chuck Lever <chuck.lever@oracle.com> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=317 Fixes: ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect..") Cc: stable@vger.kernel.org # 4.14+ Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-12-15nfs: don't wait on commit in nfs_commit_inode() if there were no commit requestsScott Mayhew
If there were no commit requests, then nfs_commit_inode() should not wait on the commit or mark the inode dirty, otherwise the following BUG_ON can be triggered: [ 1917.130762] kernel BUG at fs/inode.c:578! [ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1] [ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries [ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod [ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G ------------ T 3.10.0-768.el7.ppc64 #1 [ 1917.130810] task: c0000005ecd88040 ti: c00000004cea0000 task.ti: c00000004cea0000 [ 1917.130813] NIP: c000000000354178 LR: c000000000354160 CTR: c00000000012db80 [ 1917.130816] REGS: c00000004cea3720 TRAP: 0700 Tainted: G ------------ T (3.10.0-768.el7.ppc64) [ 1917.130820] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI> CR: 22002822 XER: 20000000 [ 1917.130828] CFAR: c00000000011f594 SOFTE: 1 GPR00: c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750 GPR04: 000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03 GPR08: 0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8 GPR12: c00000000012db80 c000000007b31200 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0 GPR28: d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8 [ 1917.130867] NIP [c000000000354178] .evict+0x1c8/0x350 [ 1917.130871] LR [c000000000354160] .evict+0x1b0/0x350 [ 1917.130873] Call Trace: [ 1917.130876] [c00000004cea39a0] [c000000000354160] .evict+0x1b0/0x350 (unreliable) [ 1917.130880] [c00000004cea3a30] [c0000000003558cc] .evict_inodes+0x13c/0x270 [ 1917.130884] [c00000004cea3af0] [c000000000327d20] .kill_anon_super+0x70/0x1e0 [ 1917.130896] [c00000004cea3b80] [d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs] [ 1917.130900] [c00000004cea3c00] [c000000000328a20] .deactivate_locked_super+0xa0/0x1b0 [ 1917.130903] [c00000004cea3c80] [c00000000035ba54] .cleanup_mnt+0xd4/0x180 [ 1917.130907] [c00000004cea3d10] [c000000000119034] .task_work_run+0x114/0x150 [ 1917.130912] [c00000004cea3db0] [c00000000001ba6c] .do_notify_resume+0xcc/0x100 [ 1917.130916] [c00000004cea3e30] [c00000000000a7b0] .ret_from_except_lite+0x5c/0x60 [ 1917.130919] Instruction dump: [ 1917.130921] 7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0 [ 1917.130927] 694a0060 7d4a0074 794ad182 694a0001 <0b0a0000> 892d02a4 2f890000 40de0134 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org # 4.5+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-12-15xprtrdma: Spread reply processing over more CPUsChuck Lever
Commit d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler directly from RECV completion") introduced a performance regression for NFS I/O small enough to not need memory registration. In multi- threaded benchmarks that generate primarily small I/O requests, IOPS throughput is reduced by nearly a third. This patch restores the previous level of throughput. Because workqueues are typically BOUND (in particular ib_comp_wq, nfsiod_workqueue, and rpciod_workqueue), NFS/RDMA workloads tend to aggregate on the CPU that is handling Receive completions. The usual approach to addressing this problem is to create a QP and CQ for each CPU, and then schedule transactions on the QP for the CPU where you want the transaction to complete. The transaction then does not require an extra context switch during completion to end up on the same CPU where the transaction was started. This approach doesn't work for the Linux NFS/RDMA client because currently the Linux NFS client does not support multiple connections per client-server pair, and the RDMA core API does not make it straightforward for ULPs to determine which CPU is responsible for handling Receive completions for a CQ. So for the moment, record the CPU number in the rpcrdma_req before the transport sends each RPC Call. Then during Receive completion, queue the RPC completion on that same CPU. Additionally, move all RPC completion processing to the deferred handler so that even RPCs with simple small replies complete on the CPU that sent the corresponding RPC Call. Fixes: d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-12-15nfs: fix a deadlock in nfs client initializationScott Mayhew
The following deadlock can occur between a process waiting for a client to initialize in while walking the client list during nfsv4 server trunking detection and another process waiting for the nfs_clid_init_mutex so it can initialize that client: Process 1 Process 2 --------- --------- spin_lock(&nn->nfs_client_lock); list_add_tail(&CLIENTA->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); spin_lock(&nn->nfs_client_lock); list_add_tail(&CLIENTB->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); mutex_lock(&nfs_clid_init_mutex); nfs41_walk_client_list(clp, result, cred); nfs_wait_client_init_complete(CLIENTA); (waiting for nfs_clid_init_mutex) Make sure nfs_match_client() only evaluates clients that have completed initialization in order to prevent that deadlock. This patch also fixes v4.0 trunking behavior by not marking the client NFS_CS_READY until the clientid has been confirmed. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-12-15ip_gre: fix wrong return value of erspan_rcvHaishuang Yan
If pskb_may_pull return failed, return PACKET_REJECT instead of -ENOMEM. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15Merge branch 'sctp-stream-interleave'David S. Miller
Xin Long says: ==================== sctp: Implement Stream Interleave: Interaction with Other SCTP Extensions Stream Interleave would be implemented in two Parts: 1. The I-DATA Chunk Supporting User Message Interleaving 2. Interaction with Other SCTP Extensions Overview in section 2.3 of RFC8260 for Part 2: The usage of the I-DATA chunk might interfere with other SCTP extensions. Future SCTP extensions MUST describe if and how they interfere with the usage of I-DATA chunks. For the SCTP extensions already defined when this document was published, the details are given in the following subsections. As the 2nd part of Stream Interleave Implementation, this patchset mostly adds the support for SCTP Partial Reliability Extension with I-FORWARD-TSN chunk. Then adjusts stream scheduler and stream reconfig to make them work properly with I-DATA chunks. In the last patch, all stream interleave codes will be enabled by adding sysctl to allow users to use this feature. v1 -> v2: - removed the intl_enable check from sctp_chunk_event_lookup, as Marcelo's suggestion. - fixed a typo in changelog. ==================== Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15sctp: support sysctl to allow users to use stream interleaveXin Long
This is the last patch for support of stream interleave, after this patch, users could enable stream interleave by systcl -w net.sctp.intl_enable=1. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15sctp: update mid instead of ssn when doing stream and asoc resetXin Long
When using idata and doing stream and asoc reset, setting ssn with 0 could only clear the 1st 16 bits of mid. So to make this work for both data and idata, it sets mid with 0 instead of ssn, and also mid_uo for unordered idata also need to be cleared, as said in section 2.3.2 of RFC8260. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15sctp: add stream interleave support in stream schedulerXin Long
As Marcelo said in the stream scheduler patch: Support for I-DATA chunks, also described in RFC8260, with user message interleaving is straightforward as it just requires the schedulers to probe for the feature and ignore datamsg boundaries when dequeueing. All needs to do is just to ignore datamsg boundaries when dequeueing. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15sctp: implement handle_ftsn for sctp_stream_interleaveXin Long
handle_ftsn is added as a member of sctp_stream_interleave, used to skip ssn for data or mid for idata, called for SCTP_CMD_PROCESS_FWDTSN cmd. sctp_handle_iftsn works for ifwdtsn, and sctp_handle_fwdtsn works for fwdtsn. Note that different from sctp_handle_fwdtsn, sctp_handle_iftsn could do stream abort pd. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15sctp: implement report_ftsn for sctp_stream_interleaveXin Long
report_ftsn is added as a member of sctp_stream_interleave, used to skip tsn from tsnmap, remove old events from reasm or lobby queue, and abort pd for data or idata, called for SCTP_CMD_REPORT_FWDTSN cmd and asoc reset. sctp_report_iftsn works for ifwdtsn, and sctp_report_fwdtsn works for fwdtsn. Note that sctp_report_iftsn doesn't do asoc abort_pd, as stream abort_pd will be done when handling ifwdtsn. But when ftsn is equal with ftsn, which means asoc reset, asoc abort_pd has to be done. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15sctp: implement validate_ftsn for sctp_stream_interleaveXin Long
validate_ftsn is added as a member of sctp_stream_interleave, used to validate ssn/chunk type for fwdtsn or mid (message id)/chunk type for ifwdtsn, called in sctp_sf_eat_fwd_tsn, just as validate_data. If this check fails, an abort packet will be sent, as said in section 2.3.1 of RFC8260. As ifwdtsn and fwdtsn chunks have different length, it also defines ftsn_chunk_len for sctp_stream_interleave to describe the chunk size. Then it replaces all sizeof(struct sctp_fwdtsn_chunk) with sctp_ftsnchk_len. It also adds the process for ifwdtsn in rx path. As Marcelo pointed out, there's no need to add event table for ifwdtsn, but just share prsctp_chunk_event_table with fwdtsn's. It would drop fwdtsn chunk for ifwdtsn and drop ifwdtsn chunk for fwdtsn by calling validate_ftsn in sctp_sf_eat_fwd_tsn. After this patch, the ifwdtsn can be accepted. Note that this patch also removes the sctp.intl_enable check for idata chunks in sctp_chunk_event_lookup, as it will do this check in validate_data later. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15sctp: implement generate_ftsn for sctp_stream_interleaveXin Long
generate_ftsn is added as a member of sctp_stream_interleave, used to create fwdtsn or ifwdtsn chunk according to abandoned chunks, called in sctp_retransmit and sctp_outq_sack. sctp_generate_iftsn works for ifwdtsn, and sctp_generate_fwdtsn is still used for making fwdtsn. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15sctp: add basic structures and make chunk function for ifwdtsnXin Long
sctp_ifwdtsn_skip, sctp_ifwdtsn_hdr and sctp_ifwdtsn_chunk are used to define and parse I-FWD TSN chunk format, and sctp_make_ifwdtsn is a function to build the chunk. The I-FORWARD-TSN Chunk Format is defined in section 2.3.1 of RFC8260. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: phy: phylink: Handle NULL fwnode_handleFlorian Fainelli
Unlike the various of_* routines to fetch properties, fwnode_* routines can have an early check against a NULL fwnode_handle reference which makes them return -EINVAL (see fwnode_call_int_op), thus making it virtually impossible to differentiate what type of error is going on. Have an early check in phylink_register_sfp() so we can keep proceeding with the initialization, there is not much we can do without a valid fwnode_handle except return early and treat this similarly to -ENOENT. Fixes: 8fa7b9b6af25 ("phylink: convert to fwnode") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15qmi_wwan: set FLAG_SEND_ZLP to avoid network initiated disconnectBjørn Mork
It has been reported that the dummy byte we add to avoid ZLPs can be forwarded by the modem to the PGW/GGSN, and that some operators will drop the connection if this happens. In theory, QMI devices are based on CDC ECM and should as such both support ZLPs and silently ignore the dummy byte. The latter assumption failed. Let's test out the first. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: usb: qmi_wwan: add Telit ME910 PID 0x1101 supportDaniele Palmas
This patch adds support for Telit ME910 PID 0x1101. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15Merge branch 'net-sched-Make-qdisc-offload-uapi-uniform'David S. Miller
Yuval Mintz says: ==================== net: sched: Make qdisc offload uapi uniform Several qdiscs can already be offloaded to hardware, but there's an inconsistecy in regard to the uapi through which they indicate such an offload is taking place - indication is passed to the user via TCA_OPTIONS where each qdisc retains private logic for setting it. The recent addition of offloading to RED in 602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc") caused the addition of yet another uapi field for this purpose - TC_RED_OFFLOADED. For clarity and prevention of bloat in the uapi we want to eliminate said added uapi, replacing it with a common mechanism that can be used to reflect offload status of the various qdiscs. The first patch introduces TCA_HW_OFFLOAD as the generic message meant for this purpose. The second changes the current RED implementation into setting the internal bits necessary for passing it, and the third removes TC_RED_OFFLOADED as its no longer needed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15pkt_sched: Remove TC_RED_OFFLOADED from uapiYuval Mintz
Following the previous patch, RED is now using the new uniform uapi for indicating it's offloaded. As a result, TC_RED_OFFLOADED is no longer utilized by kernel and can be removed [as it's still not part of any stable release]. Fixes: 602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc") Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: sched: Move to new offload indication in REDYuval Mintz
Let RED utilize the new internal flag, TCQ_F_OFFLOADED, to mark a given qdisc as offloaded instead of using a dedicated indication. Also, change internal logic into looking at said flag when possible. Fixes: 602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc") Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: sched: Add TCA_HW_OFFLOADYuval Mintz
Qdiscs can be offloaded to HW, but current implementation isn't uniform. Instead, qdiscs either pass information about offload status via their TCA_OPTIONS or omit it altogether. Introduce a new attribute - TCA_HW_OFFLOAD that would form a uniform uAPI for the offloading status of qdiscs. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: alteon: acenic: clean up indentation issueColin Ian King
There is a hunk of code that is incorrectly indented with spaces and rather than a tab. Clean this up. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15Merge branch 'sfp-SFF-module-support'David S. Miller
Russell King says: ==================== Add SFF module support Add support for SFF modules. SFF modules are similar to SFP modules, but they have fewer control signals, and are soldered down rather than pluggable. They also have different IDs in the EEPROM to identify as soldered down SFF modules. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15sfp: add sff module supportRussell King
Add support for SFF modules, which are soldered down SFP modules. These have a different phys_id value, and also have the present and rate select signals omitted compared with their socketed counter-parts. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15dt-bindings: add sff,sff binding for SFP supportRussell King
Add "sff,sff" for SFF module support with SFP. These have a different phys_id value, and also have the present and rate select signals omitted compared with their socketed counter-parts. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15Merge branch 'nfp-fix-rtsym-and-XPB-register-handling-in-debug-dump'David S. Miller
Simon Horman says: ==================== nfp: fix rtsym and XPB register handling in debug dump this series resolves two problems in the recently added debug dump facility. * Correctly handle reading absolute rtysms * Correctly handle special-case PB register reads These fixes are for code only present in net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15nfp: fix XPB register reads in debug dumpCarl Heymann
For XPB registers reads, some island IDs require special handling (e.g. ARM island), which is already taken care of in nfp_xpb_readl(), so use that instead of a straight CPP read. Without this fix all "xpbm:ArmIsldXpbmMap.*" registers are reported as 0xffffffff. It has also been observed to cause a system reboot. With this fix correct values are reported, none of which are 0xffffffff. The values may be read using ethtool debug level 2. # ethtool -W <netdev> 2 # ethtool -w <netdev> data dump.dat Fixes: 0e6c4955e149 ("nfp: dump CPP, XPB and direct ME CSRs") Signed-off-by: Carl Heymann <carl.heymann@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15nfp: fix absolute rtsym handling in debug dumpCarl Heymann
In TLV-based ethtool debug dumps, don't do a CPP read for absolute rtsyms, use the addr field in the symbol table directly as the value. Without this fix rtsym gro_release_ring_0 is 4 bytes of zeros. With this fix the correct value, 0x0000004a 0x00000000 is reported. The values may be read using ethtool debug level 2. # ethtool -W <netdev> 2 # ethtool -w <netdev> data dump.dat Fixes: e1e798e3fd93 ("nfp: dump rtsyms") Signed-off-by: Carl Heymann <carl.heymann@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15Merge branch 'aquantia-fixes'David S. Miller
Igor Russkikh says: ==================== net: aquantia: Atlantic driver 12/2017 updates The patchset contains important hardware fix for machines with large MRRS and couple of improvement in stats and capabilities reporting patch v3: - Fixed patch #7 after Andrew's finding. NIC level stats actually have to be cleaned only on hw struct creation (and this is done in kzalloc). On each hwinit we only have to reset link state to make sure hw stats update will not increment nic stats during init. patch v2: - split into more detailed commits Comment from David on wrong defines case will be submitted separately later ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: aquantia: Increment driver versionIgor Russkikh
Add a suffix to distinguish kernel mainline version and aquantia releases Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: aquantia: Fix typo in ethtool statistics namesIgor Russkikh
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: aquantia: Update hw counters on hw initIgor Russkikh
On very first start we should read out current HW counter values to make diff based calculations later. This also should be done each time NIC gets down/up or wakes up after sleep state. We reset link state explicitly to prevent diffs from being summed this first time. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: aquantia: Improve link state and statistics check interval callbackIgor Russkikh
Reduce timeout from 2 secs to 1 sec. If link is down, reduce it to 500msec. This speeds up link detection. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: aquantia: Fill in multicast counter in ndev stats from hardwareIgor Russkikh
This metric comes from HW and is also diff-calculated, like other counters Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: aquantia: Fill ndev stat couters from hardwareIgor Russkikh
Originally they were filled from ring sw counters. These sometimes incorrectly calculate byte and packet amounts when using LRO/LSO and jumboframes. Filling ndev counters from hardware makes them precise. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: aquantia: Extend stat counters to 64bit valuesIgor Russkikh
Device hardware provides only 32bit counters. Using these directly causes byte counters to overflow soon. A separate nic level structure with 64 bit counters is now used to collect incrementally all the stats and report these counters to ethtool stats and ndev stats. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: aquantia: Fix hardware DMA stream overload on large MRRSIgor Russkikh
Systems with large MRRS on device (2K, 4K) with high data rates and/or large MTU, atlantic observes DMA packet buffer overflow. On some systems that causes PCIe transaction errors, hardware NMIs or datapath freeze. This patch 1) Limits MRRS from device side to 2K (thats maximum our hardware supports) 2) Limit maximum size of outstanding TX DMA data read requests. This makes hardware buffers running fine. Signed-off-by: Pavel Belous <pavel.belous@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: aquantia: Fix actual speed capabilities reportingIgor Russkikh
Different hardware device Ids correspond to different maximum speed available. Extra checks were added for devices D108 and D109 to remove unsupported speeds from these device capabilities list. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15Merge branch 'erspan-version-2'David S. Miller
William Tu says: ==================== ERSPAN version 2 (type III) support ERSPAN has two versions, v1 (type II) and v2 (type III). This patch series add support for erspan v2 based on existing erspan v1 implementation. The first patch refactors the existing erspan v1's header structure, making it extensible to put additional v2's header. The second and third patch introduces erspan v2's implementation to ipv4 and ipv6 erspan, for both native mode and collect metadata mode. Finally, test cases are added under the samples/bpf. Note: ERSPAN version 2 has many features and this patch does not implement all. One major use case of version 2 over version 1 is its timestamp and direction. So the traffic collector is able to distinguish the mirrorred traffic better. Other features such as SGT (security group tag), FT (frame type) for carrying non-ethernet packet, and optional subheader are not implemented yet. Example commandline for ERSPAN version 2: ip link add dev ip6erspan11 type ip6erspan seq key 102 \ local fc00:100::2 remote fc00:100::1 \ erspan_ver 2 erspan_dir 1 erspan_hwid 17 The corresponding iproute2 patch: https://marc.info/?l=linux-netdev&m=151321141525106&w=2 William Tu (4): net: erspan: refactor existing erspan code net: erspan: introduce erspan v2 for ip_gre ip6_gre: add erspan v2 support samples/bpf: add erspan v2 sample code include/net/erspan.h | 152 ++++++++++++++++++++++++++++++++++++++--- include/net/ip6_tunnel.h | 3 + include/net/ip_tunnels.h | 5 +- include/uapi/linux/if_ether.h | 1 + include/uapi/linux/if_tunnel.h | 3 + net/ipv4/ip_gre.c | 124 +++++++++++++++++++++++++++------ net/ipv6/ip6_gre.c | 139 +++++++++++++++++++++++++++++++------ net/openvswitch/flow_netlink.c | 8 +-- samples/bpf/tcbpf2_kern.c | 77 ++++++++++++++++++--- samples/bpf/test_tunnel_bpf.sh | 38 ++++++++--- 10 files changed, 472 insertions(+), 78 deletions(-) -- A simple script to test it: set -ex function cleanup() { set +ex ip netns del ns0 ip link del ip6erspan11 ip link del veth1 } function main() { trap cleanup 0 2 3 9 ip netns add ns0 ip link add veth0 type veth peer name veth1 ip link set veth0 netns ns0 # non-namespace ip addr add dev veth1 fc00:100::2/96 if [ "$1" == "v1" ]; then echo "create IP6 ERSPAN v1 tunnel" ip link add dev ip6erspan11 type ip6erspan seq key 102 \ local fc00:100::2 remote fc00:100::1 \ erspan 123 erspan_ver 1 else echo "create IP6 ERSPAN v2 tunnel" ip link add dev ip6erspan11 type ip6erspan seq key 102 \ local fc00:100::2 remote fc00:100::1 \ erspan_ver 2 erspan_dir 1 erspan_hwid 17 fi ip addr add dev ip6erspan11 fc00:200::2/96 ip addr add dev ip6erspan11 10.10.200.2/24 # namespace: ns0 ip netns exec ns0 ip addr add fc00:100::1/96 dev veth0 if [ "$1" == "v1" ]; then ip netns exec ns0 \ ip link add dev ip6erspan00 type ip6erspan seq key 102 \ local fc00:100::1 remote fc00:100::2 \ erspan 123 erspan_ver 1 else ip netns exec ns0 \ ip link add dev ip6erspan00 type ip6erspan seq key 102 \ local fc00:100::1 remote fc00:100::2 \ erspan_ver 2 erspan_dir 1 erspan_hwid 7 fi ip netns exec ns0 ip addr add dev ip6erspan00 fc00:200::1/96 ip netns exec ns0 ip addr add dev ip6erspan00 10.10.200.1/24 ip link set dev veth1 up ip link set dev ip6erspan11 up ip netns exec ns0 ip link set dev ip6erspan00 up ip netns exec ns0 ip link set dev veth0 up } main $1 ping6 -c 1 fc00:100::1 || true ping -c 3 10.10.200.1 exit 0 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15samples/bpf: add erspan v2 sample codeWilliam Tu
Extend the existing tests for ipv4 ipv6 erspan version 2. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15ip6_gre: add erspan v2 supportWilliam Tu
Similar to support for ipv4 erspan, this patch adds erspan v2 to ip6erspan tunnel. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: erspan: introduce erspan v2 for ip_greWilliam Tu
The patch adds support for erspan version 2. Not all features are supported in this patch. The SGT (security group tag), GRA (timestamp granularity), FT (frame type) are set to fixed value. Only hardware ID and direction are configurable. Optional subheader is also not supported. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>