Age | Commit message | Author
2018-07-27 | samples: bpf: convert xdp_fwd_user.c to libbpf | Jakub Kicinski
Convert xdp_fwd_user.c to use libbpf instead of bpf_load.o. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 | tools: libbpf: add bpf_object__find_program_by_title() | Jakub Kicinski
Allow users to find programs by section names. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
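A lookup by section name amounts to a linear search over the object's programs, comparing each program's ELF section name with the requested title. The sketch below shows that idea. `demo_obj`, `demo_prog` and `demo_find_program_by_title()` are hypothetical, simplified stand-ins, not libbpf's internal types. The real entry point named in the commit is `bpf_object__find_program_by_title()`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for libbpf's object and program types. */
struct demo_prog {
    const char *section_name; /* ELF section the program was loaded from */
    int fd;
};

struct demo_obj {
    struct demo_prog *progs;
    size_t nr_progs;
};

/* Return the first program whose section name equals title, or NULL. */
static struct demo_prog *demo_find_program_by_title(const struct demo_obj *obj,
                                                    const char *title)
{
    if (!obj || !title)
        return NULL;
    for (size_t i = 0; i < obj->nr_progs; i++) {
        const char *name = obj->progs[i].section_name;
        if (name && strcmp(name, title) == 0)
            return &obj->progs[i];
    }
    return NULL;
}
```

Returning NULL for a missing title lets a caller such as a converted sample check the result before asking for the program's fd.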
2018-07-27 | tools: libbpf: handle NULL program gracefully in bpf_program__nth_fd() | Jakub Kicinski
bpf_map__fd() handles NULL map gracefully and returns -EINVAL. bpf_program__fd() and bpf_program__nth_fd() crash in this case. Make the behaviour more consistent by validating prog pointer as well. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
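The consistency rule above says a NULL object should yield -EINVAL instead of a crash, for programs just as for maps. The sketch below applies it to a hypothetical `demo_program` type rather than libbpf's internal struct. The -EINVAL-on-NULL convention comes from the message. The range check on `n` is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a program with one fd per loaded instance. */
struct demo_program {
    int nr_instances;
    int *fds;
};

/* Validate the pointer first, mirroring how bpf_map__fd() treats a NULL map. */
static int demo_program_nth_fd(const struct demo_program *prog, int n)
{
    if (!prog)
        return -EINVAL;
    if (!prog->fds || n < 0 || n >= prog->nr_instances)
        return -EINVAL; /* assumption: out-of-range index is also -EINVAL */
    return prog->fds[n];
}

/* The single-fd accessor simply delegates, so it inherits the NULL check. */
static int demo_program_fd(const struct demo_program *prog)
{
    return demo_program_nth_fd(prog, 0);
}
```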
2018-07-27 | Merge branch 'bpf-nfp-perf-event-improvements' | Daniel Borkmann
Jakub Kicinski says: ==================== This set is focused on improving the performance of perf events reported from BPF offload. Perf events can now be received on packet data queues, which significantly improves the performance (from total of 0.5 Msps to 5Msps per core). To get to this performance we need a fast path for control messages which will operate on raw buffers and recycle them immediately. Patch 5 replaces the map pointers for perf maps with map IDs. We look the pointers up in a hashtable, anyway, to validate they are correct, so there is no performance difference. Map IDs have the advantage of being easier to understand for users in case of errors (we no longer print raw pointers to the logs). Last patch improves info messages about map offload. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 | nfp: bpf: improve map offload info messages | Jakub Kicinski
FW can put constraints on map element size to maximize resource use and efficiency. When user attempts offload of a map which does not fit into those constraints an informational message is printed to kernel logs to inform user about the reason offload failed. Map offload does not have access to any advanced error reporting like verifier log or extack. There is also currently no way for us to nicely expose the FW capabilities to user space. Given all those constraints we should make sure log messages are as informative as possible. Improve them. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 | nfp: bpf: remember maps by ID | Jakub Kicinski
Record perf maps by map ID, not raw kernel pointer. This helps with debug messages, because printing pointers to logs is frowned upon, and makes debug easier for the users, as map ID is something they should be more familiar with. Note that perf maps are offload neutral, therefore IDs won't be orphaned. While at it use a rate limited print helper for the error message. Reported-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 | nfp: bpf: allow receiving perf events on data queues | Jakub Kicinski
Control queue is fairly low latency, and requires SKB allocations, which means we can't even reach 0.5Msps with perf events. Allow perf events to be delivered to data queues. This allows us to not only use multiple queues, but also receive and deliver to user space more than 5Msps per queue (Xeon E5-2630 v4 2.20GHz, no retpolines). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 | nfp: bpf: pass raw data buffer to nfp_bpf_event_output() | Jakub Kicinski
In preparation for SKB-less perf event handling make nfp_bpf_event_output() take buffer address and length, not SKB as parameters. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 | nfp: allow control message reception on data queues | Jakub Kicinski
Port id 0xffffffff is reserved for control messages. Allow reception of messages with this id on data queues. Hand off a raw buffer to the higher layer code, without allocating an SKB, for maximum efficiency. The RX handler can't modify or keep the buffer; after it returns, the buffer is handed back to the NIC RX free buffer list. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
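The demux decision can be sketched as a function called per received buffer with the port id taken from the RX metadata. Only the reserved id 0xffffffff comes from the message. `demo_rx_demux()` and `demo_handle_ctrl_msg()` are hypothetical names, and the return convention is this sketch's own.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CTRL_PORT_ID 0xffffffffu /* reserved port id for control messages */

static unsigned int demo_ctrl_msgs; /* control messages handled so far */

/* Hypothetical upper-layer handler: it may only read buf during the call. */
static void demo_handle_ctrl_msg(const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    demo_ctrl_msgs++;
}

/*
 * Returns 1 if the buffer was consumed as a control message (no SKB is
 * allocated and the caller can recycle the buffer to the RX free list
 * right away), or 0 if it should continue down the normal SKB path.
 */
static int demo_rx_demux(uint32_t port_id, const void *buf, size_t len)
{
    if (port_id == DEMO_CTRL_PORT_ID) {
        demo_handle_ctrl_msg(buf, len);
        return 1;
    }
    return 0;
}
```

Because the handler gets only a borrowed pointer and length, the buffer can be reused immediately after it returns. That is what avoids the per-message SKB allocation.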
2018-07-27 | nfp: move repr handling on RX path | Jakub Kicinski
Representor packets are received on PF queues with special metadata tag for demux. There is no reason to resolve the representor ID -> netdev after the skb has been allocated. Move the code, this will allow us to handle special FW messages without SKB allocation overhead. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 | samples/bpf: Add BTF build flags to Makefile | Taeung Song
To smoothly test BTF supported binary on samples/bpf, let samples/bpf/Makefile probe llc, pahole and llvm-objcopy for BPF support and use them like tools/testing/selftests/bpf/Makefile changed from the commit c0fa1b6c3efc ("bpf: btf: Add BTF tests"). Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 | samples/bpf: xdpsock: order memory on AArch64 | Brian Brooks
Define u_smp_rmb() and u_smp_wmb() to respective barrier instructions. This ensures the processor will order accesses to queue indices against accesses to queue ring entries. Signed-off-by: Brian Brooks <brian.brooks@linaro.org> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
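The sketch below shows how such barriers pair up around a single-producer/single-consumer ring. The producer writes the entry before publishing the index, and the consumer reads the index before reading the entry. The `dmb ishld`/`dmb ishst` encodings are an assumption about the AArch64 definitions. The non-AArch64 fallback uses GCC/Clang `__atomic_thread_fence()`. The ring layout and all `demo_*` names are hypothetical, not xdpsock's.

```c
#include <stdint.h>

#if defined(__aarch64__)
/* Assumed AArch64 definitions: load-load/store and store-store barriers. */
#define u_smp_rmb() __asm__ __volatile__("dmb ishld" : : : "memory")
#define u_smp_wmb() __asm__ __volatile__("dmb ishst" : : : "memory")
#else
/* Portable fallback for this sketch (GCC/Clang builtins). */
#define u_smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
#define u_smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#endif

#define DEMO_RING_SIZE 8u /* must be a power of two */

struct demo_ring {
    volatile uint32_t producer; /* free-running index, written by producer */
    volatile uint32_t consumer; /* free-running index, written by consumer */
    uint64_t entries[DEMO_RING_SIZE];
};

static int demo_ring_produce(struct demo_ring *r, uint64_t v)
{
    if (r->producer - r->consumer == DEMO_RING_SIZE)
        return -1; /* full */
    r->entries[r->producer & (DEMO_RING_SIZE - 1)] = v;
    u_smp_wmb(); /* entry must be visible before the index update */
    r->producer = r->producer + 1;
    return 0;
}

static int demo_ring_consume(struct demo_ring *r, uint64_t *v)
{
    if (r->consumer == r->producer)
        return -1; /* empty */
    u_smp_rmb(); /* read the producer index before reading the entry */
    *v = r->entries[r->consumer & (DEMO_RING_SIZE - 1)];
    u_smp_rmb(); /* finish reading the entry before releasing the slot */
    r->consumer = r->consumer + 1;
    return 0;
}
```

Without the barriers, a weakly ordered CPU such as an AArch64 core could make the index update visible before the entry it covers. The consumer would then read a stale entry.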
2018-07-26 | tools/bpftool: ignore build products | Taeung Song
Ignore the untracked build products under tools/bpf. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-25 | bpf: Add Python 3 support to selftests scripts for bpf | Jeremy Cline
Adjust tcp_client.py and tcp_server.py to work with Python 3 by using the print function, marking string literals as bytes, and using the newer exception syntax. This should be functionally equivalent and supports Python 3+. Signed-off-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-25 | bpf: btf: fix inconsistent IS_ERR and PTR_ERR | YueHaibing
Fix inconsistent IS_ERR and PTR_ERR in get_btf: the proper pointer to be passed as argument is '*btf'. This issue was detected with the help of Coccinelle. Fixes: 2d3feca8c44f ("bpf: btf: print map dump and lookup with btf info") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
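The ERR_PTR convention and the corrected pattern can be sketched in userspace. The error helpers below are re-implemented locally, which is an assumption; in-tree tools get them from a shared header. `demo_get_btf()` is a hypothetical stand-in for the fixed function, not its actual body.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Local re-implementations of the kernel-style error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct demo_btf { int dummy; };

/* Hypothetical constructor that fails, encoding -EINVAL in the pointer. */
static struct demo_btf *demo_btf_new_fail(void) { return ERR_PTR(-EINVAL); }

static int demo_get_btf(struct demo_btf **btf)
{
    *btf = demo_btf_new_fail();
    if (IS_ERR(*btf)) {
        /* Test and decode the same pointer: PTR_ERR(btf) would decode the
         * address of the caller's variable instead of the error code. */
        int err = (int)PTR_ERR(*btf);
        *btf = NULL;
        return err;
    }
    return 0;
}
```

With the buggy `PTR_ERR(btf)`, the function would return a truncated stack address rather than -EINVAL.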
2018-07-24 | net/sched: add skbprio scheduler | Nishanth Devarajan
Skbprio (SKB Priority Queue) is a queueing discipline that prioritizes packets according to their skb->priority field. Under congestion, already-enqueued lower priority packets will be dropped to make space available for higher priority packets. Skbprio was conceived as a solution for denial-of-service defenses that need to route packets with different priorities as a means to overcome DoS attacks.
v5
* Do not reference qdisc_dev(sch)->tx_queue_len for setting limit. Instead set default sch->limit to 64.
v4
* Drop Documentation/networking/sch_skbprio.txt doc file to move it to tc man page for Skbprio, in iproute2.
v3
* Drop max_limit parameter in struct skbprio_sched_data and instead use sch->limit.
* Reference qdisc_dev(sch)->tx_queue_len only once, during initialisation for qdisc (previously being referenced every time qdisc changes).
* Move qdisc's detailed description from in-code to Documentation/networking.
* When qdisc is saturated, enqueue incoming packet first before dequeueing lowest priority packet in queue - improves usage of call stack registers.
* Introduce and use overlimit stat to keep track of number of dropped packets.
v2
* Use skb->priority field rather than DS field. Rename queueing discipline as SKB Priority Queue (previously Gatekeeper Priority Queue).
* Queueing discipline is made classful to expose Skbprio's internal priority queues.
Signed-off-by: Nishanth Devarajan <ndev2021@gmail.com>
Reviewed-by: Sachin Paryani <sachin.paryani@gmail.com>
Reviewed-by: Cody Doucette <doucette@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
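The policy described above can be sketched with one FIFO per priority. When the queue is full, a newcomer that outranks the lowest queued priority is enqueued first, and then a lowest-priority packet is evicted. Every drop is counted in an overlimit stat. This is a simplified userspace model with hypothetical names and small illustrative sizes, not the code in net/sched. In particular, evicting from the tail of the lowest FIFO is this sketch's choice.

```c
#include <stdbool.h>

/* Illustrative sizes (assumptions): the real qdisc has more priorities and a
 * default sch->limit of 64. */
#define DEMO_NPRIO 4
#define DEMO_LIMIT 3

struct demo_fifo { int ids[DEMO_LIMIT]; int head, len; };

struct demo_skbprio {
    struct demo_fifo q[DEMO_NPRIO]; /* one FIFO per priority, higher index wins */
    int qlen;                       /* packets across all FIFOs */
    unsigned long overlimits;       /* packets dropped because the queue was full */
};

static void fifo_push(struct demo_fifo *f, int id)
{
    f->ids[(f->head + f->len) % DEMO_LIMIT] = id;
    f->len++;
}

static int fifo_pop_head(struct demo_fifo *f)
{
    int id = f->ids[f->head];
    f->head = (f->head + 1) % DEMO_LIMIT;
    f->len--;
    return id;
}

static int fifo_pop_tail(struct demo_fifo *f)
{
    f->len--;
    return f->ids[(f->head + f->len) % DEMO_LIMIT];
}

static int demo_lowest(const struct demo_skbprio *s)
{
    for (int p = 0; p < DEMO_NPRIO; p++)
        if (s->q[p].len)
            return p;
    return -1;
}

static int demo_highest(const struct demo_skbprio *s)
{
    for (int p = DEMO_NPRIO - 1; p >= 0; p--)
        if (s->q[p].len)
            return p;
    return -1;
}

/* Returns true if the packet was queued, false if it was dropped. */
static bool demo_enqueue(struct demo_skbprio *s, int id, unsigned int prio)
{
    if (prio >= DEMO_NPRIO)
        prio = DEMO_NPRIO - 1; /* clamp out-of-range priorities */
    if (s->qlen < DEMO_LIMIT) {
        fifo_push(&s->q[prio], id);
        s->qlen++;
        return true;
    }
    int low = demo_lowest(s);
    if ((int)prio <= low) { /* nothing lower to evict: drop the newcomer */
        s->overlimits++;
        return false;
    }
    fifo_push(&s->q[prio], id); /* enqueue first ... */
    fifo_pop_tail(&s->q[low]);  /* ... then evict a lowest-priority packet */
    s->overlimits++;
    return true;
}

/* Dequeue from the highest non-empty priority; returns -1 when empty. */
static int demo_dequeue(struct demo_skbprio *s)
{
    int high = demo_highest(s);
    if (high < 0)
        return -1;
    s->qlen--;
    return fifo_pop_head(&s->q[high]);
}
```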
2018-07-24 | net: phy: add GBit master / slave error detection | Heiner Kallweit
Certain PHYs have issues when operating in GBit slave mode and can be forced to master mode. One example is the RTL8211C; the Micrel PHY driver also has a DT setting to force master mode. If two such chips are link partners, autonegotiation will fail. The standard defines a latched-high bit, self-clearing on read, to indicate this error. Check this bit to inform the user. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
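The check can be sketched against a simulated PHY whose fault bit is latched high and cleared when read. Register 0x0a (1000BASE-T status) and bit 0x8000 mirror `MII_STAT1000` and `LPA_1000MSFAIL` from linux/mii.h. The `demo_*` PHY model and accessor are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MII_STAT1000   0x0a   /* 1000BASE-T status register */
#define DEMO_LPA_1000MSFAIL 0x8000 /* master/slave configuration fault */

/* Hypothetical PHY model holding the current STAT1000 contents. */
struct demo_phy { uint16_t stat1000; };

/* Register read with the standard's semantics: the fault bit self-clears. */
static uint16_t demo_phy_read(struct demo_phy *phy, int reg)
{
    uint16_t val = 0;

    if (reg == DEMO_MII_STAT1000) {
        val = phy->stat1000;
        phy->stat1000 &= (uint16_t)~DEMO_LPA_1000MSFAIL;
    }
    return val;
}

/* Returns true if a master/slave fault was latched since the last read. */
static bool demo_check_ms_fault(struct demo_phy *phy)
{
    return demo_phy_read(phy, DEMO_MII_STAT1000) & DEMO_LPA_1000MSFAIL;
}
```

Because the read clears the bit, a driver should read the register once and reuse the value. A second read would report no fault.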
2018-07-24 | Merge branch 'net-whitespace-cleanups' | David S. Miller
Stephen Hemminger says: ==================== net whitespace cleanups Ran script that I use to check for trailing whitespace and blank lines at end of files across all files in net/ directory. These are errors that checkpatch reports and git flags. These are the resulting fixes broken up mostly by subsystem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | net: remove blank lines at end of file | Stephen Hemminger
Several files have an extra line at end of file. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | l2tp: remove trailing newline | Stephen Hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | bpfilter: remove trailing newline | Stephen Hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | decnet: whitespace fixes | Stephen Hemminger
Remove trailing whitespace and extra lines at EOF. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | x25: remove blank lines at EOF | Stephen Hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | ax25: remove blank line at EOF | Stephen Hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | atm: remove blank lines at EOF | Stephen Hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | ila: remove blank lines at EOF | Stephen Hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | sctp: whitespace fixes | Stephen Hemminger
Remove blank line at EOF and trailing whitespace. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | xfrm: remove blank lines at EOF | Stephen Hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | mpls: remove trailing whitepace | Stephen Hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | llc: fix whitespace issues | Stephen Hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | rds: remove trailing whitespace and blank lines | Stephen Hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | wimax: remove blank lines at EOF | Stephen Hemminger
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | sched: fix trailing whitespace | Stephen Hemminger
Remove trailing whitespace and blank lines at EOF. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | net: remove redundant input checks in SIOCSIFTXQLEN case of dev_ifsioc | Tariq Toukan
The cited patch added a call to dev_change_tx_queue_len in the SIOCSIFTXQLEN case. This makes the new-length comparison done before the function call obsolete; remove it here. To decide whether to keep or remove the negative value check, we examine the range check in dev_change_tx_queue_len. On 64-bit we will fail with -ERANGE. The 32-bit int ifr_qlen will be sign extended to 64 bits when it is passed into dev_change_tx_queue_len(). And then for negative values this test triggers: if (new_len != (unsigned int)new_len) return -ERANGE; because: if (0xffffffffWHATEVER != 0x00000000WHATEVER) On 32-bit the signed value will be accepted, changing behavior. Therefore, the negative value check is kept. Fixes: 3f76df198288 ("net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
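The quoted range check can be reproduced in userspace to see why the negative-value check must stay. `demo_change_tx_queue_len()` copies only the quoted comparison; the real function also takes a struct net_device. `demo_siocsiftxqlen()` is a hypothetical stand-in for the ioctl case. Returning -EINVAL for negative values there is an assumption of this sketch.

```c
#include <errno.h>

/* Only the range check quoted from dev_change_tx_queue_len(). */
static int demo_change_tx_queue_len(unsigned long new_len)
{
    if (new_len != (unsigned int)new_len)
        return -ERANGE; /* value does not fit in 32 bits */
    return 0;
}

/* Stand-in for the SIOCSIFTXQLEN case: ifr_qlen is a signed 32-bit int. */
static int demo_siocsiftxqlen(int ifr_qlen)
{
    /* Kept: where unsigned long is 32 bits wide, -1 would otherwise pass
     * the range check as 4294967295. */
    if (ifr_qlen < 0)
        return -EINVAL;
    return demo_change_tx_queue_len(ifr_qlen);
}
```

Converting a negative int to unsigned long sets all the upper bits on a 64-bit target, so the comparison fails and yields -ERANGE. On a 32-bit target both sides are the same 32-bit value and the check passes.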
2018-07-24 | Merge branch 'cxgb4-collect-free-Tx-Rx-pages-and-page-pointers' | David S. Miller
Rahul Lakkireddy says: ==================== cxgb4: collect free Tx/Rx pages and page pointers Patch 1 collects number of free PSTRUCT page pointers in context memory. Patch 2 moves the collection logic for Tx/Rx free pages to common code, since this information needs to be collected in vmcore device dump as well. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | cxgb4: move Tx/Rx free pages collection to common code | Rahul Lakkireddy
This information needs to be collected in vmcore device dump as well. So, move to common code. Fixes: fa145d5dfd61 ("cxgb4: display number of rx and tx pages free") Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | cxgb4: collect number of free PSTRUCT page pointers | Rahul Lakkireddy
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | Merge branch 'mlxsw-Add-extack-messages-for-tc-flower' | David S. Miller
Ido Schimmel says: ==================== mlxsw: Add extack messages for tc flower Nir says: This patch set adds extack messages support to tc flower part of mlxsw. The messages provide clear reasoning to failures, as some of the available actions and keys are not supported in driver or HW and resources may get exhausted. The first patch deals with propagation of the extack pointer among the functions dealing with key parsing and action sets handling. Following patches 2-4 add appropriate messages across the different layers of mlxsw tc flower implementation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | mlxsw: spectrum_flower: Add extack messages | Nir Dotan
Return extack messages in order to explain failures of unsupported actions, keys and invalid user input. Signed-off-by: Nir Dotan <nird@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | mlxsw: spectrum_acl: Add extack messages | Nir Dotan
Return extack messages for failures in action set creation. Messages provide reasons for not being able to implement the action in HW. Signed-off-by: Nir Dotan <nird@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | mlxsw: core_acl_flex_actions: Add extack messages | Nir Dotan
Return extack messages for failures in action set creation. Errors may occur when action is not currently supported or due to lack of resources. Signed-off-by: Nir Dotan <nird@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | mlxsw: spectrum_acl: Propagate extack pointer | Nir Dotan
Propagate extack pointer in order to add extack messages for ACL. In the follow-up patches, appropriate messages will be added in various points. Signed-off-by: Nir Dotan <nird@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | netlink: do not store start function in netlink_cb | Florian Westphal
->start() is called once when dump is being initialized, there is no need to store it in netlink_cb. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | Merge tag 'mac80211-next-for-davem-2018-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next | David S. Miller
Johannes Berg says:
====================
Only a few things:
* HE (802.11ax) support in HWSIM
* bypass TXQ with NDP frames as they're special
* convert ahash -> shash in lib80211 TKIP
* avoid playing with tailroom counter defer unless needed to avoid issues in some cases
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 | mac80211: restrict delayed tailroom needed decrement | Manikanta Pubbisetty
As explained in ieee80211_delayed_tailroom_dec(), during roam, keys of the old AP will be destroyed and new keys will be installed. Deletion of the old key causes crypto_tx_tailroom_needed_cnt to go from 1 to 0 and the new key installation causes a transition from 0 to 1.
Whenever crypto_tx_tailroom_needed_cnt transitions from 0 to 1, we invoke synchronize_net(); the reason for doing this is to avoid a race in the TX path as explained in increment_tailroom_need_count(). This synchronize_net() operation can be slow and can affect the station roam time. To avoid this, decrementing the crypto_tx_tailroom_needed_cnt is delayed for a while so that upon installation of new key the transition would be from 1 to 2 instead of 0 to 1 and thereby improving the roam time.
This is all correct for a STA iftype, but deferring the tailroom_needed decrement for other iftypes may be unnecessary.
For example, let's consider the case of a 4-addr client connecting to an AP for which AP_VLAN interface is also created, let the initial value for tailroom_needed on the AP be 1.
* 4-addr client connects to the AP (AP: tailroom_needed = 1)
* AP will clear old keys, delay decrement of tailroom_needed count
* AP_VLAN is created, it takes the tailroom count from master (AP_VLAN: tailroom_needed = 1, AP: tailroom_needed = 1)
* Install new key for the station, assume key is plumbed in the HW, there won't be any change in tailroom_needed count on AP iface
* Delayed decrement of tailroom_needed count on AP (AP: tailroom_needed = 0, AP_VLAN: tailroom_needed = 1)
Because of the delayed decrement on AP iface, tailroom_needed count goes out of sync between AP (master iface) and AP_VLAN (slave iface) and there would be unnecessary tailroom created for the packets going through AP_VLAN iface.
Also, WARN_ONs were observed while trying to bring down the AP_VLAN interface:
(warn_slowpath_common) (warn_slowpath_null+0x18/0x20)
(warn_slowpath_null) (ieee80211_free_keys+0x114/0x1e4)
(ieee80211_free_keys) (ieee80211_del_virtual_monitor+0x51c/0x850)
(ieee80211_del_virtual_monitor) (ieee80211_stop+0x30/0x3c)
(ieee80211_stop) (__dev_close_many+0x94/0xb8)
(__dev_close_many) (dev_close_many+0x5c/0xc8)
Restricting delayed decrement to station interface alone fixes the problem and it makes sense to do so because delayed decrement is done to improve roam time which is applicable only for client devices.
Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-07-24 | wireless/lib80211: Convert from ahash to shash | Kees Cook
In preparing to remove all stack VLA usage from the kernel[1], this removes the discouraged use of AHASH_REQUEST_ON_STACK in favor of the smaller SHASH_DESC_ON_STACK by converting from ahash-wrapped-shash to direct shash. The stack allocation will be made a fixed size in a later patch to the crypto subsystem. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-07-23 | Merge tag 'wireless-drivers-next-for-davem-2018-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next | David S. Miller
Kalle Valo says:
====================
wireless-drivers-next patches for 4.19
The first set of patches for 4.19. Only smaller features and bug fixes, not really anything major. Also included are changes to include/linux/bitfield.h, we agreed with Johannes that it makes sense to apply them via wireless-drivers-next.
Major changes:
ath10k
* support channel 173
* fix spectral scan for QCA9984 and QCA9888 chipsets
ath6kl
* add support for Dell Wireless 1537
ti wlcore
* add support for runtime PM
* enable runtime PM autosuspend support
qtnfmac
* support changing MAC address
* enable source MAC address randomization support
libertas
* fix suspend and resume for SDIO cards
mt76
* add software DFS radar pattern detector for mt76x2 based devices
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 | Merge branch 'rds-ipv6' | David S. Miller
Ka-Cheong Poon says:
====================
rds: IPv6 support
This patch set adds IPv6 support to the kernel RDS and related modules. Existing RDS apps using IPv4 addresses continue to run without any problem. New RDS apps which want to use IPv6 addresses can do so by passing the address in struct sockaddr_in6 to bind(), connect() or sendmsg(). Those apps also need to use the new IPv6 equivalents of some of the existing socket options, as the existing options use a 32 bit integer to store the IP address.
All RDS code now uses struct in6_addr to store IP addresses. An IPv4 address is stored as an IPv4-mapped address.
Header file changes
There are many data structures (RDS socket options) used by RDS apps which use a 32 bit integer to store an IP address. To support IPv6, struct in6_addr needs to be used. To ensure backward compatibility, a new data structure is introduced for each of those data structures which use a 32 bit integer to represent an IP address, and new socket options are introduced to use those new structures. This means that existing apps should work without a problem with the new RDS module. Apps which want to use IPv6 can use the new data structures and socket options. An IPv4-mapped address is used to represent an IPv4 address in the new data structures.
Internally, all RDS data structures which contain an IP address are changed to use struct in6_addr to store the address. An IPv4 address is stored as an IPv4-mapped address. All the functions which take an IP address as argument are also changed to use struct in6_addr.
RDS/RDMA/IB uses a private data (struct rds_ib_connect_private) exchange between endpoints at RDS connection establishment time to support RDMA. This private data exchange uses a 32 bit integer to represent an IP address. This needs to be changed in order to support IPv6. A new private data struct rds6_ib_connect_private is introduced to handle this. To ensure backward compatibility, an IPv6 capable RDS stack uses another RDMA listener port (RDS_CM_PORT) to accept IPv6 connections, and it continues to use the original RDS_PORT for IPv4 RDS connections. When it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to send the connection set up request.
RDS/TCP changes
TCP related code is changed to support IPv6. Note that only an IPv6 TCP listener on port RDS_TCP_PORT is created, as it can accept both IPv4 and IPv6 connection requests.
IB/RDMA changes
The initial private data exchange between IB endpoints using RDMA is changed to support IPv6 addresses if the peer address is IPv6. To ensure backward compatibility, another RDMA listener port (RDS_CM_PORT) is used to accept IPv6 connections. An IPv6 capable RDS module continues to use the original RDS_PORT for IPv4 RDS connections. When it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to send the connection set up request.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
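An IPv4 address stored as an IPv4-mapped IPv6 address has the form ::ffff:a.b.c.d. The sketch below builds and unpacks that form with the standard netinet/in.h types. `demo_v4_to_mapped()` and `demo_mapped_to_v4()` are hypothetical helpers for illustration; they are not the functions RDS uses in the kernel.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Build ::ffff:a.b.c.d from an IPv4 address (both in network byte order). */
static void demo_v4_to_mapped(struct in_addr v4, struct in6_addr *out)
{
    memset(out, 0, sizeof(*out));
    out->s6_addr[10] = 0xff;
    out->s6_addr[11] = 0xff;
    memcpy(&out->s6_addr[12], &v4.s_addr, sizeof(v4.s_addr));
}

/* Recover the IPv4 address; returns 0 on success, -1 if not v4-mapped. */
static int demo_mapped_to_v4(const struct in6_addr *in, struct in_addr *v4)
{
    if (!IN6_IS_ADDR_V4MAPPED(in))
        return -1;
    memcpy(&v4->s_addr, &in->s6_addr[12], sizeof(v4->s_addr));
    return 0;
}
```

Keeping a single struct in6_addr representation means internal code paths handle both address families uniformly. Only the user-facing API and the wire formats need separate IPv4 variants.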
2018-07-23 | rds: Extend RDS API for IPv6 support | Ka-Cheong Poon
There are many data structures (RDS socket options) used by RDS apps which use a 32 bit integer to store IP address. To support IPv6, struct in6_addr needs to be used. To ensure backward compatibility, a new data structure is introduced for each of those data structures which use a 32 bit integer to represent an IP address. And new socket options are introduced to use those new structures. This means that existing apps should work without a problem with the new RDS module. For apps which want to use IPv6, those new data structures and socket options can be used. IPv4 mapped address is used to represent IPv4 address in the new data structures. v4: Revert changes to SO_RDS_TRANSPORT Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 | rds: Enable RDS IPv6 support | Ka-Cheong Poon
This patch enables RDS to use IPv6 addresses. For RDS/TCP, the listener is now an IPv6 endpoint which accepts both IPv4 and IPv6 connection requests. RDS/RDMA/IB uses a private data (struct rds_ib_connect_private) exchange between endpoints at RDS connection establishment time to support RDMA. This private data exchange uses a 32 bit integer to represent an IP address. This needs to be changed in order to support IPv6. A new private data struct rds6_ib_connect_private is introduced to handle this. To ensure backward compatibility, an IPv6 capable RDS stack uses another RDMA listener port (RDS_CM_PORT) to accept IPv6 connection. And it continues to use the original RDS_PORT for IPv4 RDS connections. When it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to send the connection set up request. v5: Fixed syntax problem (David Miller). v4: Changed port history comments in rds.h (Sowmini Varadhan). v3: Added support to set up IPv4 connection using mapped address (David Miller). Added support to set up connection between link local and non-link addresses. Various review comments from Santosh Shilimkar and Sowmini Varadhan. v2: Fixed bound and peer address scope mismatched issue. Added back rds_connect() IPv6 changes. Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>