summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-12-07netfilter: nat: skip checksum on offload SCTP packetsDavide Caratti
SCTP GSO and hardware can do CRC32c computation after netfilter processing, so we can avoid calling sctp_compute_checksum() on skb if skb->ip_summed is equal to CHECKSUM_PARTIAL. Moreover, set skb->ip_summed to CHECKSUM_NONE when the NAT code computes the CRC, to prevent offloaders from computing it again (on ixgbe this resulted in a transmission with wrong L4 checksum). Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: rpfilter: bypass ipv4 lbcast packets with zeronet sourceLiping Zhang
Otherwise, DHCP Discover packets(0.0.0.0->255.255.255.255) may be dropped incorrectly. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nf_tables: allow to filter stateful object dumps by typePablo Neira Ayuso
This patch adds the netlink code to filter out dump of stateful objects, through the NFTA_OBJ_TYPE netlink attribute. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nft_objref: support for stateful object mapsPablo Neira Ayuso
This patch allows us to refer to stateful object dictionaries, the source register indicates the key data to be used to look up for the corresponding state object. We can refer to these maps through names or, alternatively, the map transaction id. This allows us to refer to both anonymous and named maps. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nf_tables: add stateful object reference to set elementsPablo Neira Ayuso
This patch allows you to refer to stateful objects from set elements. This provides the infrastructure to create maps where the right hand side of the mapping is a stateful object. This allows us to build dictionaries of stateful objects, that you can use to perform fast lookups using any arbitrary key combination. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nft_quota: add depleted flag for objectsPablo Neira Ayuso
Notify on depleted quota objects. The NFT_QUOTA_F_DEPLETED flag indicates we have reached overquota. Add pointer to table from nft_object, so we can use it when sending the depletion notification to userspace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nf_tables: notify internal updates of stateful objectsPablo Neira Ayuso
Introduce nf_tables_obj_notify() to notify internal state changes in stateful objects. This is used by the quota object to report depletion in a follow up patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nf_tables: atomic dump and reset for stateful objectsPablo Neira Ayuso
This patch adds a new NFT_MSG_GETOBJ_RESET command perform an atomic dump-and-reset of the stateful object. This also comes with add support for atomic dump and reset for counter and quota objects. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-07netfilter: nft_quota: dump consumed quotaPablo Neira Ayuso
Add a new attribute NFTA_QUOTA_CONSUMED that displays the amount of quota that has been already consumed. This allows us to restore the internal state of the quota object between reboots as well as to monitor how wasted it is. This patch changes the logic to account for the consumed bytes, instead of the bytes that remain to be consumed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06i40e: move all updates for VLAN mode into i40e_sync_vsi_filtersJacob Keller
In a similar fashion to how we handled exiting VLAN mode, move the logic in i40e_vsi_add_vlan into i40e_sync_vsi_filters. Extract this logic into its own function for ease of understanding as it will become quite complex. The new function, i40e_correct_mac_vlan_filters() correctly updates all filters for when we need to enter VLAN mode, exit VLAN mode, and also enforces the PVID when assigned. Call i40e_correct_mac_vlan_filters from i40e_sync_vsi_filters passing it the number of active VLAN filters, and the two temporary lists. Remove the function for updating VLAN=0 filters from i40e_vsi_add_vlan. The end result is that the logic for entering and exiting VLAN mode is in one location which has the most knowledge about all filters. This ensures that we always correctly have the non-VLAN filters assigned to VID=0 or VID=-1 regardless of how we ended up getting to this result. Additionally this enforces the PVID at sync time so that we know for certain that an assigned PVID results in only filters with that PVID will be added to the firmware. Change-ID: I895cee81e9c92d0a16baee38bd0ca51bbb14e372 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: use (add|rm)_vlan_all_mac helper functions when changing PVIDJacob Keller
The current flow for adding or updating the PVID for a VF uses i40e_vsi_add_vlan and i40e_vsi_kill_vlan which each take, then release the hash lock. In addition the two functions also must take special care that they do not perform VLAN mode changes as this will make the code in i40e_ndo_set_vf_port_vlan behave incorrectly. Fix these issues by using the new helper functions i40e_add_vlan_all_mac and i40e_rm_vlan_all_mac which expect the hash lock to already be taken. Additionally these functions do not perform any state updates in regards to VLAN mode, so they are safe to use in the PVID update flow. It should be noted that we don't need the VLAN mode update code here, because there are only a few flows here. (a) we're adding a new PVID In this case, if we already had VLAN filters the VSI is knocked offline so we don't need to worry about pre-existing VLAN filters (b) we're replacing an existing PVID In this case, we can't have any VLAN filters except those with the old PVID which we already take care of manually. (c) we're removing an existing PVID Similarly to above, we can't have any existing VLAN filters except those with the old PVID which we already take care of correctly. Because of this, we do not need (or even want) the special accounting done in i40e_vsi_add_vlan, so use of the helpers is a saner alternative. It also opens the door for a future patch which will refactor the flow of i40e_vsi_add_vlan now that it is not needed in this function. Change-ID: Ia841f63da94e12b106f41cf7d28ce8ce92f2ad99 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: factor out addition/deletion of VLAN per each MAC addressJacob Keller
A future refactor of how the PF assigns a PVID to a VF will want to be able to add and remove a block of filters by VLAN without worrying about accidentally triggering the accounting for I40E_VLAN_ANY. Additionally the PVID assignment would like to be able to batch several changes under one use of the mac_filter_hash_lock. Factor out the addition and deletion of a VLAN on all MACs into their own function which i40e_vsi_(add|kill)_vlan can use. These new functions expect the caller to take the hash lock, as well as perform any necessary accounting for updating I40E_VLAN_ANY filters if we are now operating under VLAN mode. Change-ID: If79e5b60b770433275350a74b3f1880333a185d5 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: delete filter after adding its replacement when convertingJacob Keller
Fix a subtle issue with the code for converting VID=-1 filters into VID=0 filters when adding a new VLAN. Previously the code deleted the VID=-1 filter, and then added a new VID=0 filter. In the rare case that the addition fails due to -ENOMEM, we end up completely deleting the filter which prevents recovery if memory pressure subsides. While it is not strictly an issue because it is likely that memory issues would result in many other problems, we shouldn't delete the filter until after the addition succeeds. Change-ID: Icba07ddd04ecc6a3b27c2e29f2c1c8673d266826 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: refactor i40e_update_filter_state to avoid passing aq_errJacob Keller
The current caller of i40e_update_filter_state incorrectly passes aq_ret, an i40e_status variable, instead of the expected aq_err. This happens to work because i40e_status is actually just a typedef integer, and 0 is still the successful return. However i40e_update_filter_state has special handling for ENOSPC which is currently being ignored. Also notice that firmware does not update the per-filter response for many types of errors, such as EINVAL. Thus, modify the filter setup so that the firmware response memory is pre-set with I40E_AQC_MM_ERR_NO_RES. This enables us to refactor i40e_update_filter_state, removing the need to pass aq_err and avoiding a need for having 3 different flows for checking the filter state. The resulting code for i40e_update_filter_state is much simpler, only a single loop and we always check each filter response value every time. Since we pre-set the response value to match our expected error this correctly works for all success and error flows. Change-ID: Ie292c9511f34ee18c6ef40f955ad13e28b7aea7d Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: recalculate vsi->active_filters from hash contentsJacob Keller
Previous code refactors have accidentally caused issues with the counting of active_filters. Avoid similar issues in the future by simply re-counting the active filters every time after we handle add and delete of all the filters. Additionally this allows us to simplify the check for when we exit promiscuous mode since we can combine the check for failed filters at the same time. Additionally since we recount filters at the end we need to set vsi->promisc_threshold as well. The resulting code takes a bit longer since we do have to loop over filters again. However, the result is more readable and less likely to become incorrect due to failed accounting of filters in the future. Finally, this ensures that it is not possible for vsi->active_filters to ever underflow since we never decrement it. Change-ID: Ib4f3a377e60eb1fa6c91ea86cc02238c08edd102 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: defeature support for PTP L4 frame detection on XL710Jacob Keller
A product decision has been made to defeature detection of PTP frames over L4 (UDP) on the XL710 MAC. Do not advertise support for L4 timestamping. Change-ID: I41fbb0f84ebb27c43e23098c08156f2625c6ee06 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: lock service task correctlyMitch Williams
The service task lock was being set in the scheduling function, not the actual service task. This would potentially leave the bit set for a long time before the task actually ran. Furthermore, if the service task takes too long, it calls the schedule function to reschedule itself - which would fail to take the lock and do nothing. Instead, set and clear the lock bit in the service task itself. In the process, get rid of the i40e_service_event_complete() function, which is really just two lines of code that can be put right in the service task itself. Change-ID: I83155e682b686121e2897f4429eb7d3f7c669168 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: Add functions which apply correct PHY access method for read and write ↵Michal Kosiarz
operation Depending on external PHY type, register access method should be different. Clause22 or Clause45 can be chosen for different PHYs. Implemented functions apply correct access method for used device. Change-ID: If39d5f0da9c0b905a8cbdc1ab89885535e7d0426 Signed-off-by: Michal Kosiarz <michal.kosiarz@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: Add FEC for 25gCarolyn Wyborny
This patch adds adminq support for Forward Error Correction ("FEC")for 25g products. Change-ID: Iaff4910737c239d2c730e5c22a313ce9c37d3964 Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jacek Naczyk <jacek.naczyk@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: Add support for 25G devicesCarolyn Wyborny
Add support for 25G devices - defines and data structures. One tricky part here is that the firmware support for these Devices introduces a mismatch between the PHY type enum and the bitfields for the phy types. This change creates a macro and uses it to increment the 25G PHY values when creating 25G bitfields. Change-ID: I69b24d837d44cf9220bf5cb8dd46c5be89ce490b Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: use unsigned printf format specifier for active_filters countJacob Keller
Replace the %d specifier used for printing vsi->active_filters and vsi->promisc_threshold with an unsigned %u format specifier. While it is unlikely in practice that these values will ever reach such a large number they are unsigned values and thus should not be interpreted as negative numbers. Change-ID: Iff050fad5a1c8537c4c57fcd527441cd95cfc0d4 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06Changed version from 1.6.21 to 1.6.25Bimmy Pujari
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: Blink LED on 1G BaseT boardsHenry Tieman
Before this patch "ethtool -p" was not blinking the LEDs on boards with 1G BaseT PHYs. This commit identifies 1G BaseT boards as having the LEDs connected to the MAC. Also, renamed the flag to be more descriptive of usage. The flag is now I40E_FLAG_PHY_CONTROLS_LEDS. Change-ID: I4eb741da9780da7849ddf2dc4c0cb27ffa42a801 Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: remove code to handle dev_addr speciallyJacob Keller
The netdev->dev_addr MAC filter already exists in the MAC/VLAN hash table, as it is added when we configure the netdev in i40e_configure_netdev. Because we already know that this address will be updated in the hash_for_each loops, we do not need to handle it specially. This removes duplicate code and simplifies the i40e_vsi_add_vlan and i40e_vsi_kill_vlan functions. Because we know these filters must be part of the MAC/VLAN hash table, this should not have any functional impact on what filters are included and is merely a code simplification. Change-ID: I5e648302dbdd7cc29efc6d203b7019c11f0b5705 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e/i40evf: napi_poll must return the work doneAlexander Duyck
Currently the function i40e_napi-poll() returns 0 when it clean completely the Rx rings, but this foul budget accounting in core code. Fix this by returning the actual work done, capped to budget - 1, since the core doesn't allow to return the full budget when the driver modifies the NAPI status This is based on a similar change that was made for the ixgbe driver by Paolo Abeni. Change-ID: Ic3d93ad2fa2fc8ce3164bc461e69367da0f9173b Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: restore workaround for removing default MAC filterJacob Keller
A previous commit 53cb6e9e8949 ("i40e: Removal of workaround for simple MAC address filter deletion") removed a workaround for some firmware versions which was reported to not be necessary in production NICs. Unfortunately this workaround is necessary in some configurations, specifically the Ethernet Controller XL710 for 40GbE QSFP+ (8086:1583). Without this patch, the mentioned NICs with current firmware exhibit issues when adding VLANs, as outlined by the following reproduction: $modprobe i40e $ip link set <device> up $ip link add link <device> vlan100 type vlan id 100 $dmesg | tail <snip> kernel: i40e 0000:82:00.0: Error I40E_AQ_RC_EINVAL adding RX filters on PF, promiscuous mode forced on This results in filters being marked as FAILED and setting the device in promiscuous mode. The root cause of receiving the -EINVAL error response appears to be due to a conflict with the default MAC filter which still exists on the default firmware for this device. Attempting to add a new VLAN filter on the default MAC address conflicts with the IGNORE_VLAN setting on the default rule. Change-ID: I4d8f6d48ac5f60cfe981b3baad30eb4d7c170d61 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: simplify txd use count calculationMitch Williams
The i40e_txd_use_count function was fast but confusing. In the comments, it even admits that it's ugly. So replace it with a new function that is (very) slightly faster and has extensive commenting to help the thicker among us (including the author, who will forget in a week) understand how it works. Change-ID: Ifb533f13786a0bf39cb29f77969a5be2c83d9a87 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06i40e: Driver prints log message on link speed changeFilip Sadowski
This patch makes the driver log link speed change. Before applying the patch link messages were printed only on state change. Now message is printed when link is brought up or down and when speed changes. Change-ID: Ifbee14b4b16c24967450b3cecac6e8351dcc8f74 Signed-off-by: Filip Sadowski <filip.sadowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06tun: Use netif_receive_skb instead of netif_rxAndrey Konovalov
This patch changes tun.c to call netif_receive_skb instead of netif_rx when a packet is received (if CONFIG_4KSTACKS is not enabled to avoid stack exhaustion). The difference between the two is that netif_rx queues the packet into the backlog, and netif_receive_skb proccesses the packet in the current context. This patch is required for syzkaller [1] to collect coverage from packet receive paths, when a packet being received through tun (syzkaller collects coverage per process in the process context). As mentioned by Eric this change also speeds up tun/tap. As measured by Peter it speeds up his closed-loop single-stream tap/OVS benchmark by about 23%, from 700k packets/second to 867k packets/second. A similar patch was introduced back in 2010 [2, 3], but the author found out that the patch doesn't help with the task he had in mind (for cgroups to shape network traffic based on the original process) and decided not to go further with it. The main concern back then was about possible stack exhaustion with 4K stacks. [1] https://github.com/google/syzkaller [2] https://www.spinics.net/lists/netdev/thrd440.html#130570 [3] https://www.spinics.net/lists/netdev/msg130570.html Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06Merge branch 'w83977af_ir-neatening'David S. Miller
Joe Perches says: ==================== irda: w83977af_ir: Neatening Originally on top of Arnd's overly long udelay patches because I noticed a misindented block. That's now already fixed along with some other whitespace problems. These patches are the remainder style issues from my original series. Even though I haven't turned on the netwinder in a box in the garage in who knows how long, if this device is still used somewhere, might as well neaten the code too. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: Neaten loggingJoe Perches
Use more common logging style, standardize function output logging use. Miscellanea: o Add and use pr_fmt o Convert printks to pr_<level> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: Parenthesis alignmentJoe Perches
Neaten function declaration and definition arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: Use the common brace styleJoe Perches
Add braces where appropriate and remove an unnecessary else. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: Neaten pointer comparisonsJoe Perches
Convert pointer comparisons to NULL. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: Remove and add blank linesJoe Perches
Use a more typical vertical spacing style. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: More whitespace neateningJoe Perches
Add spaces around operators. git diff -w shows no differences. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06irda: w83977af_ir: Whitespace neateningJoe Perches
Remove leading and trailing whitespace. git diff -w shows no differences. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-12-06netfilter: nf_tables: add stateful object reference expressionPablo Neira Ayuso
This new expression allows us to refer to existing stateful objects from rules. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: nft_quota: add stateful object typePablo Neira Ayuso
Register a new quota stateful object type into the new stateful object infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: nft_counter: add stateful object typePablo Neira Ayuso
Register a new percpu counter stateful object type into the stateful object infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: nf_tables: add stateful objectsPablo Neira Ayuso
This patch augments nf_tables to support stateful objects. This new infrastructure allows you to create, dump and delete stateful objects, that are identified by a user-defined name. This patch adds the generic infrastructure, follow up patches add support for two stateful objects: counters and quotas. This patch provides a native infrastructure for nf_tables to replace nfacct, the extended accounting infrastructure for iptables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: add and use nf_fwd_netdev_egressFlorian Westphal
... so we can use current skb instead of working with a clone. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: ingress: translate 0 nf_hook_slow retval to -1Florian Westphal
The caller assumes that < 0 means that skb was stolen (or free'd). All other return values continue skb processing. nf_hook_slow returns 3 different return value types: A) a (negative) errno value: the skb was dropped (NF_DROP, e.g. by iptables '-j DROP' rule). B) 0. The skb was stolen by the hook or queued to userspace. C) 1. all hooks returned NF_ACCEPT so the caller should invoke the okfn so packet processing can continue. nft ingress facility currently doesn't have the 'okfn' that the NF_HOOK() macros use; there is no nfqueue support either. So 1 means that nf_hook_ingress() caller should go on processing the skb. In order to allow use of NF_STOLEN from ingress we need to translate this to an errno number, else we'd crash because we continue with already-free'd (or about to be free-d) skb. The errno value isn't checked, its just important that its less than 0, so return -1. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: xt_multiport: Fix wrong unmatch result with multiple portsGao Feng
I lost one test case in the last commit for xt_multiport. For example, the rule is "-m multiport --dports 22,80,443". When first port is unmatched and the second is matched, the curent codes could not return the right result. It would return false directly when the first port is unmatched. Fixes: dd2602d00f80 ("netfilter: xt_multiport: Use switch case instead of multiple condition checks") Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fieldsPablo Neira Ayuso
This patch adds a new flag that signals the kernel to update layer 4 checksum if the packet field belongs to the layer 4 pseudoheader. This implicitly provides stateless NAT 1:1 that is useful under very specific usecases. Since rules mangling layer 3 fields that are part of the pseudoheader may potentially convey any layer 4 packet, we have to deal with the layer 4 checksum adjustment using protocol specific code. This patch adds support for TCP, UDP and ICMPv6, since they include the pseudoheader in the layer 4 checksum calculation. ICMP doesn't, so we can skip it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: nft_fib_ipv4: initialize *dest to zeroLiping Zhang
Otherwise, if fib lookup fail, *dest will be filled with garbage value, so reverse path filtering will not work properly: # nft add rule x prerouting fib saddr oif eq 0 drop Fixes: f6d0cbcf09c5 ("netfilter: nf_tables: add fib expression") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: nft_fib: convert htonl to ntohl properlyLiping Zhang
Acctually ntohl and htonl are identical, so this doesn't affect anything, but it is conceptually wrong. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: x_tables: pack percpu counter allocationsFlorian Westphal
instead of allocating each xt_counter individually, allocate 4k chunks and then use these for counter allocation requests. This should speed up rule evaluation by increasing data locality, also speeds up ruleset loading because we reduce calls to the percpu allocator. As Eric points out we can't use PAGE_SIZE, page_allocator would fail on arches with 64k page size. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-06netfilter: x_tables: pass xt_counters struct to counter allocatorFlorian Westphal
Keeps some noise away from a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>