summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-11-09net: dsa: setup and teardown default CPU portVivien Didelot
The dsa_dst_parse function called just before dsa_dst_apply does not parse the tree but does only one thing: it assigns the default CPU port to dst->cpu_dp and to each user ports. This patch simplifies this by calling a dsa_tree_setup_default_cpu function at the beginning of dsa_dst_apply directly. A dsa_port_is_user helper is added for convenience. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-09net: dsa: constify cpu_dp member of dsa_portVivien Didelot
A DSA port has a dedicated CPU port assigned to it, stored in the cpu_dp member. It is not meant to be modified by a port, thus make it const. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08netfilter: ipvs: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Wensong Zhang <wensong@linux-vs.org> Cc: Simon Horman <horms@verge.net.au> Cc: Julian Anastasov <ja@ssi.bg> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: lvs-devel@vger.kernel.org Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au>
2017-11-08netpoll: Use lockdep to assert IRQs are disabled/enabledFrederic Weisbecker
Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: David S. Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-14-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08openvswitch: enable NSH supportYi Yang
v16->17 - Fixed disputed check code: keep them in nsh_push and nsh_pop but also add them in __ovs_nla_copy_actions v15->v16 - Add csum recalculation for nsh_push, nsh_pop and set_nsh pointed out by Pravin - Move nsh key into the union with ipv4 and ipv6 and add check for nsh key in match_validate pointed out by Pravin - Add nsh check in validate_set and __ovs_nla_copy_actions v14->v15 - Check size in nsh_hdr_from_nlattr - Fixed four small issues pointed out By Jiri and Eric v13->v14 - Rename skb_push_nsh to nsh_push per Dave's comment - Rename skb_pop_nsh to nsh_pop per Dave's comment v12->v13 - Fix NSH header length check in set_nsh v11->v12 - Fix missing changes old comments pointed out - Fix new comments for v11 v10->v11 - Fix the left three disputable comments for v9 but not fixed in v10. v9->v10 - Change struct ovs_key_nsh to struct ovs_nsh_key_base base; __be32 context[NSH_MD1_CONTEXT_SIZE]; - Fix new comments for v9 v8->v9 - Fix build error reported by daily intel build because nsh module isn't selected by openvswitch v7->v8 - Rework nested value and mask for OVS_KEY_ATTR_NSH - Change pop_nsh to adapt to nsh kernel module - Fix many issues per comments from Jiri Benc v6->v7 - Remove NSH GSO patches in v6 because Jiri Benc reworked it as another patch series and they have been merged. - Change it to adapt to nsh kernel module added by NSH GSO patch series v5->v6 - Fix the rest comments for v4. - Add NSH GSO support for VxLAN-gpe + NSH and Eth + NSH. v4->v5 - Fix many comments by Jiri Benc and Eric Garver for v4. v3->v4 - Add new NSH match field ttl - Update NSH header to the latest format which will be final format and won't change per its author's confirmation. - Fix comments for v3. v2->v3 - Change OVS_KEY_ATTR_NSH to nested key to handle length-fixed attributes and length-variable attriubte more flexibly. - Remove struct ovs_action_push_nsh completely - Add code to handle nested attribute for SET_MASKED - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH to transfer NSH header data. - Fix comments and coding style issues by Jiri and Eric v1->v2 - Change encap_nsh and decap_nsh to push_nsh and pop_nsh - Dynamically allocate struct ovs_action_push_nsh for length-variable metadata. OVS master and 2.8 branch has merged NSH userspace patch series, this patch is to enable NSH support in kernel data path in order that OVS can support NSH in compat mode by porting this. Signed-off-by: Yi Yang <yi.y.yang@intel.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Eric Garver <e@erig.me> Acked-by: Pravin Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08pktgen: document 32-bit timestamp overflowArnd Bergmann
Timestamps in pktgen are currently retrieved using the deprecated do_gettimeofday() function that wraps its signed 32-bit seconds in 2038 (on 32-bit architectures) and requires a division operation to calculate microseconds. The pktgen header is also defined with the same limitations, hardcoding to a 32-bit seconds field that can be interpreted as unsigned to produce times that only wrap in 2106. Whatever code reads the timestamps should be aware of that problem in general, but probably doesn't care too much as we are mostly interested in the time passing between packets, and that is correctly represented. Using 64-bit nanoseconds would be cheaper and good for 584 years. Using monotonic times would also make this unambiguous by avoiding the overflow, but would make it harder to correlate to the times with those on remote machines. Either approach would require adding a new runtime flag and implementing the same thing on the remote side, which we probably don't want to do unless someone sees it as a real problem. Also, this should be coordinated with other pktgen implementations and might need a new magic number. For the moment, I'm documenting the overflow in the source code, and changing the implementation over to an open-coded ktime_get_real_ts64() plus division, so we don't have to look at it again while scanning for deprecated time interfaces. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08qrtr: Move to postcore_initcallBjorn Andersson
Registering qrtr with module_init makes the ability of typical platform code to create AF_QIPCRTR socket during probe a matter of link order luck. Moving qrtr to postcore_initcall() avoids this. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree, they are: 1) Speed up table replacement on busy systems with large tables (and many cores) in x_tables. Now xt_replace_table() synchronizes by itself by waiting until all cpus had an even seqcount and we use no use seqlock when fetching old counters, from Florian Westphal. 2) Add nf_l4proto_log_invalid() and nf_ct_l4proto_log_invalid() to speed up packet processing in the fast path when logging is not enabled, from Florian Westphal. 3) Precompute masked address from configuration plane in xt_connlimit, from Florian. 4) Don't use explicit size for set selection if performance set policy is selected. 5) Allow to get elements from an existing set in nf_tables. 6) Fix incorrect check in nft_hash_deactivate(), from Florian. 7) Cache netlink attribute size result in l4proto->nla_size, from Florian. 8) Handle NFPROTO_INET in nf_ct_netns_get() from conntrack core. 9) Use power efficient workqueue in conntrack garbage collector, from Vincent Guittot. 10) Remove unnecessary parameter, in conntrack l4proto functions, also from Florian. 11) Constify struct nf_conntrack_l3proto definitions, from Florian. 12) Remove all typedefs in nf_conntrack_h323 via coccinelle semantic patch, from Harsha Sharma. 13) Don't store address in the rbtree nodes in xt_connlimit, they are never used, from Florian. 14) Fix out of bound access in the conntrack h323 helper, patch from Eric Sesterhenn. 15) Print symbols for the address returned with %pS in IPVS, from Helge Deller. 16) Proc output should only display its own netns in IPVS, from KUWAZAWA Takuya. 17) Small clean up in size_entry_mwt(), from Colin Ian King. 18) Use test_and_clear_bit from nf_nat_proto_clean() instead of separated non-atomic test and then clear bit, from Florian Westphal. 19) Consolidate prefix length maps in ipset, from Aaron Conole. 20) Fix sparse warnings in ipset, from Jozsef Kadlecsik. 21) Simplify list_set_memsize(), from simran singhal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08rtnetlink: fix missing size for IFLA_IF_NETNSIDColin Ian King
The size for IFLA_IF_NETNSID is missing from the size calculation because the proceeding semicolon was not removed. Fix this by removing the semicolon. Detected by CoverityScan, CID#1461135 ("Structurally dead code") Fixes: 79e1ad148c84 ("rtnetlink: use netnsid to query interface") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08net: dsa: lan9303: Adjust indentingEgil Hjelmeland
Remove scripts/checkpatch.pl CHECKs by adjusting indenting. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08net_sch: cbs: Change TC_SETUP_CBS to TC_SETUP_QDISC_CBSNogah Frankel
Change TC_SETUP_CBS to TC_SETUP_QDISC_CBS to match the new convention.. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08net_sch: mqprio: Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIONogah Frankel
Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO to match the new convention. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08net_sch: red: Add offload ability to RED qdiscNogah Frankel
Add the ability to offload RED qdisc by using ndo_setup_tc. There are four commands for RED offloading: * TC_RED_SET: handles set and change. * TC_RED_DESTROY: handle qdisc destroy. * TC_RED_STATS: update the qdiscs counters (given as reference) * TC_RED_XSTAT: returns red xstats. Whether RED is being offloaded is being determined every time dump action is being called because parent change of this qdisc could change its offload state but doesn't require any RED function to be called. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08ila: Add a hook type for LWT routesTom Herbert
In LWT tunnels both an input and output route method is defined. If both of these are executed in the same path then double translation happens and the effect is not correct. This patch adds a new attribute that indicates the hook type. Two values are defined for route output and route output. ILA translation is only done for the one that is set. The default is to enable ILA on route output. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08ila: allow configuration of identifier typeTom Herbert
Allow identifier to be explicitly configured for a mapping. This can either be one of the identifier types specified in the ILA draft or a value of ILA_ATYPE_USE_FORMAT which means the identifier type is inferred from the identifier type field. If a value other than ILA_ATYPE_USE_FORMAT is set for a mapping then it is assumed that the identifier type field is not present in an identifier. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08ila: add checksum neutral map autoTom Herbert
Add checksum neutral auto that performs checksum neutral mapping without using the C-bit. This is enabled by configuration of a mapping. The checksum neutral function has been split into ila_csum_do_neutral_fmt and ila_csum_do_neutral_nofmt. The former handles the C-bit and includes it in the adjustment value. The latter just sets the adjustment value on the locator diff only. Added configuration for checksum neutral map aut in ila_lwt and ila_xlat. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08ila: cleanup checksum diffTom Herbert
Consolidate computing checksum diff into one function. Add get_csum_diff_iaddr that computes the checksum diff between an address argument and locator being written. get_csum_diff calls this using the destination address in the IP header as the argument. Also moved ila_init_saved_csum to be close to the checksum diff functions. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-07SUNRPC: Improve ordering of transport processingTrond Myklebust
Since it can take a while before a specific thread gets scheduled, it is better to just implement a first come first served queue mechanism. That way, if a thread is already scheduled and is idle, it can pick up the work to do from the queue. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07svcrdma: Enqueue after setting XPT_CLOSE in completion handlersChuck Lever
I noticed the server was sometimes not closing the connection after a flushed Send. For example, if the client responds with an RNR NAK to a Reply from the server, that client might be deadlocked, and thus wouldn't send any more traffic. Thus the server wouldn't have any opportunity to notice the XPT_CLOSE bit has been set. Enqueue the transport so that svcxprt notices the bit even if there is no more transport activity after a flushed completion, QP access error, or device removal event. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07rpc: remove some BUG()sJ. Bruce Fields
It would be kinder to WARN() and recover in several spots here instead of BUG()ing. Also, it looks like the read_u32_from_xdr_buf() call could actually fail, though it might require a broken (or malicious) client, so convert that to just an error return. Reported-by: Weston Andros Adamson <dros@monkey.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07svcrdma: Preserve CB send buffer across retransmitsChuck Lever
During each NFSv4 callback Call, an RDMA Send completion frees the page that contains the RPC Call message. If the upper layer determines that a retransmit is necessary, this is too soon. One possible symptom: after a GARBAGE_ARGS response an NFSv4.1 callback request, the following BUG fires on the NFS server: kernel: BUG: Bad page state in process kworker/0:2H pfn:7d3ce2 kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping: (null) index:0x0 kernel: flags: 0x2fffff80000000() kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000 kernel: page dumped because: nonzero _refcount kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801 mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811 kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] kernel: Call Trace: kernel: dump_stack+0x62/0x80 kernel: bad_page+0xfe/0x11a kernel: free_pages_check_bad+0x76/0x78 kernel: free_pcppages_bulk+0x364/0x441 kernel: ? ttwu_do_activate.isra.61+0x71/0x78 kernel: free_hot_cold_page+0x1c5/0x202 kernel: __put_page+0x2c/0x36 kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma] kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma] This issue exists all the way back to v4.5, but refactoring and code re-organization prevents this simple patch from applying to kernels older than v4.12. The fix is the same, however, if someone needs to backport it. Reported-by: Ben Coddington <bcodding@redhat.com> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=314 Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ') Cc: stable@vger.kernel.org # v4.12 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07sunrcp: make function _svc_create_xprt staticColin Ian King
The function _svc_create_xprt is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: symbol '_svc_create_xprt' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-07Merge branch 'linus' into locking/core, to resolve conflictsIngo Molnar
Conflicts: include/linux/compiler-clang.h include/linux/compiler-gcc.h include/linux/compiler-intel.h include/uapi/linux/stddef.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07ipv6: addrconf: fix a lockdep splatEric Dumazet
Fixes a case where GFP_ATOMIC allocation must be used instead of GFP_KERNEL one. [ 54.891146] lock_acquire+0xb3/0x2f0 [ 54.891153] ? fs_reclaim_acquire.part.60+0x5/0x30 [ 54.891165] fs_reclaim_acquire.part.60+0x29/0x30 [ 54.891170] ? fs_reclaim_acquire.part.60+0x5/0x30 [ 54.891178] kmem_cache_alloc_trace+0x3f/0x500 [ 54.891186] ? cyc2ns_read_end+0x1e/0x30 [ 54.891196] ipv6_add_addr+0x15a/0xc30 [ 54.891217] ? ipv6_create_tempaddr+0x2ea/0x5d0 [ 54.891223] ipv6_create_tempaddr+0x2ea/0x5d0 [ 54.891238] ? manage_tempaddrs+0x195/0x220 [ 54.891249] ? addrconf_prefix_rcv_add_addr+0x1c0/0x4f0 [ 54.891255] addrconf_prefix_rcv_add_addr+0x1c0/0x4f0 [ 54.891268] addrconf_prefix_rcv+0x2e5/0x9b0 [ 54.891279] ? neigh_update+0x446/0xb90 [ 54.891298] ? ndisc_router_discovery+0x5ab/0xf00 [ 54.891303] ndisc_router_discovery+0x5ab/0xf00 [ 54.891311] ? retint_kernel+0x2d/0x2d [ 54.891331] ndisc_rcv+0x1b6/0x270 [ 54.891340] icmpv6_rcv+0x6aa/0x9f0 [ 54.891345] ? ipv6_chk_mcast_addr+0x176/0x530 [ 54.891351] ? do_csum+0x17b/0x260 [ 54.891360] ip6_input_finish+0x194/0xb20 [ 54.891372] ip6_input+0x5b/0x2c0 [ 54.891380] ? ip6_rcv_finish+0x320/0x320 [ 54.891389] ip6_mc_input+0x15a/0x250 [ 54.891396] ipv6_rcv+0x772/0x1050 [ 54.891403] ? consume_skb+0xbe/0x2d0 [ 54.891412] ? ip6_make_skb+0x2a0/0x2a0 [ 54.891418] ? ip6_input+0x2c0/0x2c0 [ 54.891425] __netif_receive_skb_core+0xa0f/0x1600 [ 54.891436] ? process_backlog+0xac/0x400 [ 54.891441] process_backlog+0xfa/0x400 [ 54.891448] ? net_rx_action+0x145/0x1130 [ 54.891456] net_rx_action+0x310/0x1130 [ 54.891524] ? RTUSBBulkReceive+0x11d/0x190 [mt7610u_sta] [ 54.891538] __do_softirq+0x140/0xaba [ 54.891553] irq_exit+0x10b/0x160 [ 54.891561] do_IRQ+0xbb/0x1b0 Fixes: f3d9832e56c4 ("ipv6: addrconf: cleanup locking in ipv6_add_addr") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-07netfilter: nf_tables: get set elements via netlinkPablo Neira Ayuso
This patch adds a new get operation to look up for specific elements in a set via netlink interface. You can also use it to check if an interval already exists. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-07netfilter: nf_tables: performance set policy skips size description in selectionPablo Neira Ayuso
Use the complexity and space notations if policy is performance, this results in placing the bitmap set representation over the hashtable for key <= 16 for better performance as we discussed during the last NFWS in Faro, Portugal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-07netfilter: conntrack: use power efficient workqueueVincent Guittot
conntrack uses the bounded system_long_wq workqueue for its works that don't have to run on the cpu they have been queued. Using bounded workqueue prevents the scheduler to make smart decision about the best place to schedule the work. This patch replaces system_long_wq with system_power_efficient_wq. the work stays bounded to a cpu by default unless the CONFIG_WQ_POWER_EFFICIENT is enable. In the latter case, the work can be scheduled on the best cpu from a power or a performance point of view. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: conntrack: move nf_ct_netns_{get,put}() to corePablo Neira Ayuso
So we can call this from other expression that need conntrack in place to work. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>
2017-11-06netfilter: conntrack: don't cache nlattr_tuple_size result in nla_sizeFlorian Westphal
We currently call ->nlattr_tuple_size() once at register time and cache result in l4proto->nla_size. nla_size is the only member that is written to, avoiding this would allow to make l4proto trackers const. We can use ->nlattr_tuple_size() at run time, and cache result in the individual trackers instead. This is an intermediate step, next patch removes nlattr_size() callback and computes size at compile time, then removes nla_size. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: nft_hash: fix nft_hash_deactivateFlorian Westphal
Jindřich Makovička says: The logical OR looks fishy to me. Shouldn't be && there instead? Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1199 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: xt_connlimit: remove mask argumentFlorian Westphal
Instead of passing mask to all the helpers, just fixup the search key early. After rbtree conversion, each rbtree node stores connections of same 'addr & mask', so no need to pass the mask too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: ebtables: clean up initialization of bufColin Ian King
buf is initialized to buf_start and then set on the next statement to buf_start + offsets[i]. Clean this up to just initialize buf to buf_start + offsets[i] to clean up the clang build warning: "Value stored to 'buf' during its initialization is never read" Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: ipvs: Fix inappropriate output of procfsKUWAZAWA Takuya
Information about ipvs in different network namespace can be seen via procfs. How to reproduce: # ip netns add ns01 # ip netns add ns02 # ip netns exec ns01 ip a add dev lo 127.0.0.1/8 # ip netns exec ns02 ip a add dev lo 127.0.0.1/8 # ip netns exec ns01 ipvsadm -A -t 10.1.1.1:80 # ip netns exec ns02 ipvsadm -A -t 10.1.1.2:80 The ipvsadm displays information about its own network namespace only. # ip netns exec ns01 ipvsadm -Ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.1.1.1:80 wlc # ip netns exec ns02 ipvsadm -Ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.1.1.2:80 wlc But I can see information about other network namespace via procfs. # ip netns exec ns01 cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 0A010101:0050 wlc TCP 0A010102:0050 wlc # ip netns exec ns02 cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 0A010102:0050 wlc Signed-off-by: KUWAZAWA Takuya <albatross0@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: ipvs: Use %pS printk format for direct addressesHelge Deller
The debug and error printk functions in ipvs uses wrongly the %pF instead of the %pS printk format specifier for printing symbols for the address returned by _builtin_return_address(0). Fix it for the ia64, ppc64 and parisc64 architectures. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Cc: lvs-devel@vger.kernel.org Cc: netfilter-devel@vger.kernel.org Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06NFC: Convert timers to use timer_setup()Allen Pais
Switch to using the new timer_setup() and from_timer() for net/nfc/* Signed-off-by: Allen Pais <allen.pais@oracle.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-11-06NFC: fix device-allocation error returnJohan Hovold
A recent change fixing NFC device allocation itself introduced an error-handling bug by returning an error pointer in case device-id allocation failed. This is clearly broken as the callers still expected NULL to be returned on errors as detected by Dan's static checker. Fix this up by returning NULL in the event that we've run out of memory when allocating a new device id. Note that the offending commit is marked for stable (3.8) so this fix needs to be backported along with it. Fixes: 20777bc57c34 ("NFC: fix broken device allocation") Cc: stable <stable@vger.kernel.org> # 3.8 Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-11-05tcp: fix DSACK-based undo on non-duplicate ACKPriyaranjan Jha
Fixes DSACK-based undo when sender is in Open State and an ACK advances snd_una. Example scenario: - Sender goes into recovery and makes some spurious rtx. - It comes out of recovery and enters into open state. - It sends some more packets, let's say 4. - The receiver sends an ACK for the first two, but this ACK is lost. - The sender receives ack for first two, and DSACK for previous spurious rtx. Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yousuk Seung <ysseung@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05tcp: higher throughput under reordering with adaptive RACK reordering wndPriyaranjan Jha
Currently TCP RACK loss detection does not work well if packets are being reordered beyond its static reordering window (min_rtt/4).Under such reordering it may falsely trigger loss recoveries and reduce TCP throughput significantly. This patch improves that by increasing and reducing the reordering window based on DSACK, which is now supported in major TCP implementations. It makes RACK's reo_wnd adaptive based on DSACK and no. of recoveries. - If DSACK is received, increment reo_wnd by min_rtt/4 (upper bounded by srtt), since there is possibility that spurious retransmission was due to reordering delay longer than reo_wnd. - Persist the current reo_wnd value for TCP_RACK_RECOVERY_THRESH (16) no. of successful recoveries (accounts for full DSACK-based loss recovery undo). After that, reset it to default (min_rtt/4). - At max, reo_wnd is incremented only once per rtt. So that the new DSACK on which we are reacting, is due to the spurious retx (approx) after the reo_wnd has been updated last time. - reo_wnd is tracked in terms of steps (of min_rtt/4), rather than absolute value to account for change in rtt. In our internal testing, we observed significant increase in throughput, in scenarios where reordering exceeds min_rtt/4 (previous static value). Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: resolve tagging protocol at parse timeVivien Didelot
Extend the dsa_port_parse_cpu() function to resolve the tagging protocol at port parsing time, instead of waiting for the whole tree to be complete. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: add one port parsing function per typeVivien Didelot
Add dsa_port_parse_user, dsa_port_parse_dsa and dsa_port_parse_cpu functions to factorize the code shared by both OF and pdata parsing. They don't do much for the moment but will be extended later to support tagging protocol resolution for example. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: only check presence of link propertyVivien Didelot
When parsing a port, simply use of_property_read_bool which checks the presence of a given property, instead of parsing the link phandle. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: rework switch parsingVivien Didelot
When parsing a switch, we have to identify to which tree it belongs and parse its ports. Provide two functions to separate the OF and platform data specific paths. Also use the of_property_read_variable_u32_array function to parse the OF member array instead of calling of_property_read_u32_index twice. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: get tree before parsing portsVivien Didelot
We will need a reference to the dsa_switch_tree when parsing a CPU port, so fetch it right after parsing the member and before parsing ports. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: rework switch addition and removalVivien Didelot
This patch removes the unnecessary index argument from the dsa_dst_add_ds and dsa_dst_del_ds functions and renames them to dsa_tree_add_switch and dsa_tree_remove_switch respectively. In addition to a more explicit scope, we now check the presence of an existing switch with the same index directly within dsa_tree_add_switch. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: provide a find or new tree helperVivien Didelot
Rename dsa_get_dst to dsa_tree_find since it doesn't increment the reference counter, rename dsa_add_dst to dsa_tree_alloc for symmetry with dsa_tree_free, and provide a convenient dsa_tree_touch function to find or allocate a new tree. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: get and put tree reference countingVivien Didelot
Provide convenient dsa_tree_get and dsa_tree_put functions scoping a DSA tree used to increment and decrement its reference counter, instead of poking directly its kref structure. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: simplify tree reference countingVivien Didelot
DSA trees have a refcount used to automatically free the dsa_switch_tree structure once there is no switch devices inside of it. The refcount is incremented when a switch is added to the tree, and decremented when it is removed from it. But because of kref_init, the refcount is also incremented at initialization, and when looking up the tree from the list for symmetry. Thus the current code stores the number of switches plus one, and makes the switch registration more complex. To simplify the switch registration function, we reset the refcount to zero after initialization and don't increment it when looking up a tree. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: make tree index unsignedVivien Didelot
Similarly to a DSA switch and port, rename the tree index from "tree" to "index" and make it an unsigned int because it isn't supposed to be less than 0. u32 is an OF specific data used to retrieve the value and has no need to be propagated up to the tree index. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05bpf: remove old offload/analyzerJakub Kicinski
Thanks to the ability to load a program for a specific device, running verifier twice is no longer needed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05cls_bpf: allow attaching programs loaded for specific deviceJakub Kicinski
If TC program is loaded with skip_sw flag, we should allow the device-specific programs to be accepted. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>