summaryrefslogtreecommitdiff
path: root/net/mptcp/protocol.h
AgeCommit message (Collapse)Author
2021-05-25mptcp: validate 'id' when stopping the ADD_ADDR retransmit timerDavide Caratti
when Linux receives an echo-ed ADD_ADDR, it checks the IP address against the list of "announced" addresses. In case of a positive match, the timer that handles retransmissions is stopped regardless of the 'Address Id' in the received packet: this behaviour does not comply with RFC8684 3.4.1. Fix it by validating the 'Address Id' in received echo-ed ADD_ADDRs. Tested using packetdrill, with the following captured output: unpatched kernel: Out <...> Flags [.], ack 1, win 256, options [mptcp add-addr v1 id 1 198.51.100.2 hmac 0xfd2e62517888fe29,mptcp dss ack 3007449509], length 0 In <...> Flags [.], ack 1, win 257, options [mptcp add-addr v1-echo id 1 1.2.3.4,mptcp dss ack 3013740213], length 0 Out <...> Flags [.], ack 1, win 256, options [mptcp add-addr v1 id 1 198.51.100.2 hmac 0xfd2e62517888fe29,mptcp dss ack 3007449509], length 0 In <...> Flags [.], ack 1, win 257, options [mptcp add-addr v1-echo id 90 198.51.100.2,mptcp dss ack 3013740213], length 0 ^^^ retransmission is stopped here, but 'Address Id' is 90 patched kernel: Out <...> Flags [.], ack 1, win 256, options [mptcp add-addr v1 id 1 198.51.100.2 hmac 0x1cf372d59e05f4b8,mptcp dss ack 3007449509], length 0 In <...> Flags [.], ack 1, win 257, options [mptcp add-addr v1-echo id 1 1.2.3.4,mptcp dss ack 1672384568], length 0 Out <...> Flags [.], ack 1, win 256, options [mptcp add-addr v1 id 1 198.51.100.2 hmac 0x1cf372d59e05f4b8,mptcp dss ack 3007449509], length 0 In <...> Flags [.], ack 1, win 257, options [mptcp add-addr v1-echo id 90 198.51.100.2,mptcp dss ack 1672384568], length 0 Out <...> Flags [.], ack 1, win 256, options [mptcp add-addr v1 id 1 198.51.100.2 hmac 0x1cf372d59e05f4b8,mptcp dss ack 3007449509], length 0 In <...> Flags [.], ack 1, win 257, options [mptcp add-addr v1-echo id 1 198.51.100.2,mptcp dss ack 1672384568], length 0 ^^^ retransmission is stopped here, only when both 'Address Id' and 'IP Address' match Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-25mptcp: avoid OOB access in setsockopt()Paolo Abeni
We can't use tcp_set_congestion_control() on an mptcp socket, as such function can end-up accessing a tcp-specific field - prior_ssthresh - causing an OOB access. To allow propagating the correct ca algo on subflow, cache the ca name at initialization time. Additionally avoid overriding the user-selected CA (if any) at clone time. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/182 Fixes: aa1fbd94e5c7 ("mptcp: sockopt: add TCP_CONGESTION and TCP_INFO") Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16mptcp: export mptcp_subflow_activeGeliang Tang
This patch moved the static function mptcp_subflow_active to protocol.h as an inline one. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16mptcp: tag sequence_seq with socket stateFlorian Westphal
Paolo Abeni suggested to avoid re-syncing new subflows because they inherit options from listener. In case options were set on listener but are not set on mptcp-socket there is no need to do any synchronisation for new subflows. This change sets sockopt_seq of new mptcp sockets to the seq of the mptcp listener sock. Subflow sequence is set to the embedded tcp listener sk. Add a comment explaing why sk_state is involved in sockopt_seq generation. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16mptcp: add skeleton to sync msk socket options to subflowsFlorian Westphal
Handle following cases: 1. setsockopt is called with multiple subflows. Change might have to be mirrored to all of them. This is done directly in process context/setsockopt call. 2. Outgoing subflow is created after one or several setsockopt() calls have been made. Old setsockopt changes should be synced to the new socket. 3. Incoming subflow, after setsockopt call(s). Cases 2 and 3 are handled right after the join list is spliced to the conn list. Not all sockopt values can be just be copied by value, some require helper calls. Those can acquire socket lock (which can sleep). If the join->conn list splicing is done from preemptible context, synchronization can be done right away, otherwise its deferred to work queue. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16mptcp: move sockopt function into a new filePaolo Abeni
The MPTCP sockopt implementation is going to be much more big and complex soon. Let's move it to a different source file. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07mptcp: drop MPTCP_ADDR_IPVERSION_4/6Geliang Tang
Since the type of the address family in struct mptcp_options_received became sa_family_t, we should set AF_INET/AF_INET6 to it, instead of using MPTCP_ADDR_IPVERSION_4/6. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07mptcp: use mptcp_addr_info in mptcp_options_receivedGeliang Tang
This patch added a new struct mptcp_addr_info member addr in struct mptcp_options_received, and dropped the original family, addr_id, addr, addr6 and port fields in it. Then we can pass the parameter mp_opt.addr directly to mptcp_pm_add_addr_received and mptcp_pm_add_addr_echoed. Since the port number became big-endian now, use htons to convert the incoming port number to it. Also use ntohs to convert it when passing it to add_addr_generate_hmac or printing it out. Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07mptcp: drop OPTION_MPTCP_ADD_ADDR6Geliang Tang
Since the family field was added in struct mptcp_out_options, no need to use OPTION_MPTCP_ADD_ADDR6 to identify the IPv6 address. Drop it. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07mptcp: use mptcp_addr_info in mptcp_out_optionsGeliang Tang
This patch moved the mptcp_addr_info struct from protocol.h to mptcp.h, added a new struct mptcp_addr_info member addr in struct mptcp_out_options, and dropped the original addr, addr6, addr_id and port fields in it. Then we can use opts->addr to get the adding address from PM directly using mptcp_pm_add_addr_signal. Since the port number became big-endian now, use ntohs to convert it before sending it out with the ADD_ADDR suboption. Also convert it when passing it to add_addr_generate_hmac or printing it out. Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07mptcp: move flags and ifindex out of mptcp_addr_infoGeliang Tang
This patch moved the flags and ifindex fields from struct mptcp_addr_info to struct mptcp_pm_addr_entry. Add the flags and ifindex values as two new parameters to __mptcp_subflow_connect. In mptcp_pm_create_subflow_or_signal_addr, pass the local address entry's flags and ifindex fields to __mptcp_subflow_connect. In mptcp_pm_nl_add_addr_received, just pass two zeros to it. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-02mptcp: add mptcp reset option supportFlorian Westphal
The MPTCP reset option allows to carry a mptcp-specific error code that provides more information on the nature of a connection reset. Reset option data received gets stored in the subflow context so it can be sent to userspace via the 'subflow closed' netlink event. When a subflow is closed, the desired error code that should be sent to the peer is also placed in the subflow context structure. If a reset is sent before subflow establishment could complete, e.g. on HMAC failure during an MP_JOIN operation, the mptcp skb extension is used to store the reset information. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26mptcp: rename mptcp_pm_nl_add_addr_send_ackGeliang Tang
Since mptcp_pm_nl_add_addr_send_ack is now used for both ADD_ADDR and RM_ADDR cases, rename it to mptcp_pm_nl_addr_send_ack. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26mptcp: send ack for rm_addrGeliang Tang
This patch changes the sending ACK conditions for the ADD_ADDR, send an ACK packet for RM_ADDR too. In mptcp_pm_remove_addr, invoke mptcp_pm_nl_add_addr_send_ack to send the ACK packet. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26mptcp: move to next addr when subflow creation failGeliang Tang
When an invalid address was announced, the subflow couldn't be created for this address. Therefore mptcp_pm_nl_subflow_established couldn't be invoked. Then the next addresses in the local address list didn't have a chance to be announced. This patch invokes the new function mptcp_pm_add_addr_echoed when the address is echoed. In it, use mptcp_lookup_anno_list_by_saddr to check whether this address is in the anno_list. If it is, PM schedules the status MPTCP_PM_SUBFLOW_ESTABLISHED to invoke mptcp_pm_create_subflow_or_signal_addr to deal with the next address in the local address list. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26mptcp: export lookup_anno_list_by_saddrGeliang Tang
This patch exported the static function lookup_anno_list_by_saddr, and renamed it to mptcp_lookup_anno_list_by_saddr. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26mptcp: drop unused subflow in mptcp_pm_subflow_establishedGeliang Tang
This patch drops the unused parameter subflow in mptcp_pm_subflow_established(). Fixes: 926bdeab5535 ("mptcp: Implement path manager interface commands") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26mptcp: drop argument port from mptcp_pm_announce_addrGeliang Tang
Drop the redundant argument 'port' from mptcp_pm_announce_addr, use the port field of another argument 'addr' instead. Fixes: 0f5c9e3f079f ("mptcp: add port parameter for mptcp_pm_announce_addr") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-26mptcp: clean-up the rtx pathPaolo Abeni
After the previous patch we can easily avoid invoking the workqueue to perform the retransmission, if the msk socket lock is held at rtx timer expiration. This also simplifies the relevant code. Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12mptcp: remove multi subflows in PMGeliang Tang
This patch dealt with removing multi subflows in PM: In mptcp_pm_remove_subflow, changed the input parameter local_id as an list of removing address ids, and passed the list to mptcp_pm_nl_rm_subflow_received. In mptcp_pm_nl_rm_subflow_received, iterated each address id from the received ids list. Then shut down and closed each address id's subsocket. In mptcp_nl_remove_subflow_and_signal_addr, put the single address id into an ids list, and passed it to mptcp_pm_remove_subflow. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12mptcp: remove multi addresses in PMGeliang Tang
This patch dropped the member rm_id of struct mptcp_pm_data. Use rm_list_rx in mptcp_pm_nl_rm_addr_received instead of using rm_id. In mptcp_pm_nl_rm_addr_received, iterated each address id from pm.rm_list_rx, then shut down and closed each address id's subsocket. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12mptcp: add rm_list_rx in mptcp_pm_dataGeliang Tang
This patch added a new member rm_list_rx for struct mptcp_pm_data as an list of the removing address ids on the incoming direction. Initialized its nr field to zero in mptcp_pm_data_init. In mptcp_pm_rm_addr_received, set it as the input rm_list. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12mptcp: add rm_list in mptcp_options_receivedGeliang Tang
This patch changed the member rm_id in struct mptcp_options_received as a list of the removing address ids, and renamed it to rm_list. In mptcp_parse_option, parsed the RM_ADDR suboption and filled them into the rm_list in struct mptcp_options_received. In mptcp_incoming_options, passed this rm_list to the function mptcp_pm_rm_addr_received. It also changed the parameter type of mptcp_pm_rm_addr_received. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12mptcp: add rm_list_tx in mptcp_pm_dataGeliang Tang
This patch added a new member rm_list_tx for struct mptcp_pm_data as the removing address list on the outgoing direction. Initialize its nr field to zero in mptcp_pm_data_init. In mptcp_pm_remove_anno_addr, put the single address id into an removing list, and passed it to mptcp_pm_remove_addr. In mptcp_pm_remove_addr, save the input rm_list to rm_list_tx in struct mptcp_pm_data. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12mptcp: add rm_list in mptcp_out_optionsGeliang Tang
This patch defined a new struct mptcp_rm_list, the ids field was an array of the removing address ids, the nr field was the valid number of removing address ids in the array. The array size was definced as a new macro MPTCP_RM_IDS_MAX. Changed the member rm_id of struct mptcp_out_options to rm_list. In mptcp_established_options_rm_addr, invoked mptcp_pm_rm_addr_signal to get the rm_list. According the number of addresses in it, calculated the padded RM_ADDR suboption length. And saved the ids array in struct mptcp_out_options's rm_list member. In mptcp_write_options, iterated each address id from struct mptcp_out_options's rm_list member, set the invalid ones as TCPOPT_NOP, then filled them into the RM_ADDR suboption. Changed TCPOLEN_MPTCP_RM_ADDR_BASE from 4 to 3. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08mptcp: fix length of ADD_ADDR with port sub-optionDavide Caratti
in current Linux, MPTCP peers advertising endpoints with port numbers use a sub-option length that wrongly accounts for the trailing TCP NOP. Also, receivers will only process incoming ADD_ADDR with port having such wrong sub-option length. Fix this, making ADD_ADDR compliant to RFC8684 §3.4.1. this can be verified running tcpdump on the kselftests artifacts: unpatched kernel: [root@bottarga mptcp]# tcpdump -tnnr unpatched.pcap | grep add-addr reading from file unpatched.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535 IP 10.0.1.1.10000 > 10.0.1.2.53078: Flags [.], ack 101, win 509, options [nop,nop,TS val 214459678 ecr 521312851,mptcp add-addr v1 id 1 a00:201:2774:2d88:7436:85c3:17fd:101], length 0 IP 10.0.1.2.53078 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 521312852 ecr 214459678,mptcp add-addr[bad opt]] patched kernel: [root@bottarga mptcp]# tcpdump -tnnr patched.pcap | grep add-addr reading from file patched.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535 IP 10.0.1.1.10000 > 10.0.1.2.38178: Flags [.], ack 101, win 509, options [nop,nop,TS val 3728873902 ecr 2732713192,mptcp add-addr v1 id 1 10.0.2.1:10100 hmac 0xbccdfcbe59292a1f,nop,nop], length 0 IP 10.0.1.2.38178 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 2732713195 ecr 3728873902,mptcp add-addr v1-echo id 1 10.0.2.1:10100,nop,nop], length 0 Fixes: 22fb85ffaefb ("mptcp: add port support for ADD_ADDR suboption writing") CC: stable@vger.kernel.org # 5.11+ Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-and-tested-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
2021-02-15mptcp: add local addr info in mptcp_infoGeliang Tang
Add mptcpi_local_addr_used and mptcpi_local_addr_max in struct mptcp_info. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12mptcp: add netlink event supportFlorian Westphal
Allow userspace (mptcpd) to subscribe to mptcp genl multicast events. This implementation reuses the same event API as the mptcp kernel fork to ease integration of existing tools, e.g. mptcpd. Supported events include: 1. start and close of an mptcp connection 2. start and close of subflows (joins) 3. announce and withdrawals of addresses 4. subflow priority (backup/non-backup) change. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12mptcp: pass subflow socket to a few helpersFlorian Westphal
Pass the first/initial subflow to the existing functions so they can pass this on to the notification handler that is added later in the series. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12mptcp: split __mptcp_close_ssk helperFlorian Westphal
Prepare for subflow close events: When mptcp connection is torn down its enough to send the mptcp socket close notification rather than a subflow close event for all of the subflows followed by the mptcp close event. This splits the helper: mptcp_close_ssk() will emit the close notification, __mptcp_close_ssk will not. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12mptcp: move pm netlink work into pm_netlinkFlorian Westphal
Allows to make some functions static and avoids acquire of the pm spinlock in protocol.c. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11mptcp: better msk receive window updatesPaolo Abeni
Move mptcp_cleanup_rbuf() related checks inside the mentioned helper and extend them to mirror TCP checks more closely. Additionally drop the 'rmem_pending' hack, since commit 879526030c8b ("mptcp: protect the rx path with the msk socket spinlock") we can use instead 'rmem_released'. Fixes: ea4ca586b16f ("mptcp: refine MPTCP-level ack scheduling") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11mptcp: fix spurious retransmissionsPaolo Abeni
Syzkaller was able to trigger the following splat again: WARNING: CPU: 1 PID: 12512 at net/mptcp/protocol.c:761 mptcp_reset_timer+0x12a/0x160 net/mptcp/protocol.c:761 Modules linked in: CPU: 1 PID: 12512 Comm: kworker/1:6 Not tainted 5.10.0-rc6 #52 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:mptcp_reset_timer+0x12a/0x160 net/mptcp/protocol.c:761 Code: e8 4b 0c ad ff e8 56 21 88 fe 48 b8 00 00 00 00 00 fc ff df 48 c7 04 03 00 00 00 00 48 83 c4 40 5b 5d 41 5c c3 e8 36 21 88 fe <0f> 0b 41 bc c8 00 00 00 eb 98 e8 e7 b1 af fe e9 30 ff ff ff 48 c7 RSP: 0018:ffffc900018c7c68 EFLAGS: 00010293 RAX: ffff888108cb1c80 RBX: 1ffff92000318f8d RCX: ffffffff82ad0307 RDX: 0000000000000000 RSI: ffffffff82ad036a RDI: 0000000000000007 RBP: ffff888113e2d000 R08: ffff888108cb1c80 R09: ffffed10227c5ab7 R10: ffff888113e2d5b7 R11: ffffed10227c5ab6 R12: 0000000000000000 R13: ffff88801f100000 R14: ffff888113e2d5b0 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff88811b500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd76a874ef8 CR3: 000000001689c005 CR4: 0000000000170ee0 Call Trace: mptcp_worker+0xaa4/0x1560 net/mptcp/protocol.c:2334 process_one_work+0x8d3/0x1200 kernel/workqueue.c:2272 worker_thread+0x9c/0x1090 kernel/workqueue.c:2418 kthread+0x303/0x410 kernel/kthread.c:292 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296 The mptcp_worker tries to update the MPTCP retransmission timer even if such timer is not currently scheduled. The mptcp_rtx_head() return value is bogus: we can have enqueued data not yet transmitted. The above may additionally cause spurious, unneeded MPTCP-level retransmissions. Fix the issue adding an explicit clearing of the rtx queue before trying to retransmit and checking for unacked data. Additionally drop an unneeded timer stop call and the unused mptcp_rtx_tail() helper. Reported-by: Christoph Paasch <cpaasch@apple.com> Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11mptcp: deliver ssk errors to mskPaolo Abeni
Currently all errors received on msk subflows are ignored. We need to catch at least the errors on connect() and on fallback sockets. Use a custom sk_error_report callback at subflow level, and do the real action under the msk socket lock - via the usual sock_owned_by_user()/release_callback() schema. Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-06mptcp: pm: add lockdep assertionsFlorian Westphal
Add a few assertions to make sure functions are called with the needed locks held. Two functions gain might_sleep annotations because they contain conditional calls to functions that sleep. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: add port number check for MP_JOINGeliang Tang
This patch adds two new helpers, subflow_use_different_sport and subflow_use_different_dport, to check whether the subflow's source or destination port number is different from the msk's port number. When receiving the MP_JOIN's SYN/SYNACK/ACK, we do these port number checks and print out the different port numbers. And furthermore, when receiving the MP_JOIN's SYN/ACK, we also use a new helper mptcp_pm_sport_in_anno_list to check whether this port number is announced. If it isn't, we need to abort this connection. This patch also populates the local address's port field in local_address. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: create the listening socket for new portGeliang Tang
This patch creates a listening socket when an address with a port-number is added by PM netlink. Then binds the new port to the socket, and listens for new connections. When the address is removed or the addresses are flushed by PM netlink, release the listening socket. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: drop *_max fields in mptcp_pm_dataGeliang Tang
This patch drops the per-msk values add_addr_signal_max, add_addr_accept_max, local_addr_max and subflows_max fields in struct mptcp_pm_data, uses the pernet *_max values instead. And adds four new helpers to get the pernet *_max values separately. Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: fix length of MP_PRIO suboptionDavide Caratti
With version 0 of the protocol it was legal to encode the 'Subflow Id' in the MP_PRIO suboption, to specify which subflow would change its 'Backup' flag. This has been removed from v1 specification: thus, according to RFC 8684 §3.3.8, the resulting 'Length' for MP_PRIO changed from 4 to 3 byte. Current Linux generates / parses MP_PRIO according to the old spec, using 'Length' equal to 4, and hardcoding 1 as 'Subflow Id'; RFC compliance can improve if we change 'Length' in other to become 3, leaving a 'Nop' after the MP_PRIO suboption. In this way the kernel will emit and accept *only* MP_PRIO suboptions that are compliant to version 1 of the MPTCP protocol. unpatched 5.11-rc kernel: [root@bottarga ~]# tcpdump -tnnr unpatched.pcap | grep prio reading from file unpatched.pcap, link-type LINUX_SLL (Linux cooked v1) dropped privs to tcpdump IP 10.0.3.2.48433 > 10.0.1.1.10006: Flags [.], ack 1, win 502, options [nop,nop,TS val 4032325513 ecr 1876514270,mptcp prio non-backup id 1,mptcp dss ack 14084896651682217737], length 0 patched 5.11-rc kernel: [root@bottarga ~]# tcpdump -tnnr patched.pcap | grep prio reading from file patched.pcap, link-type LINUX_SLL (Linux cooked v1) dropped privs to tcpdump IP 10.0.3.2.49735 > 10.0.1.1.10006: Flags [.], ack 1, win 502, options [nop,nop,TS val 1276737699 ecr 2686399734,mptcp prio non-backup,nop,mptcp dss ack 18433038869082491686], length 0 Changes since v2: - when accounting for option space, don't increment 'TCPOLEN_MPTCP_PRIO' and use 'TCPOLEN_MPTCP_PRIO_ALIGN' instead, thanks to Matthieu Baerts. Changes since v1: - refactor patch to avoid using 'TCPOLEN_MPTCP_PRIO' with its old value, thanks to Geliang Tang. Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Matteo Croce <mcroce@linux.microsoft.com> Link: https://lore.kernel.org/r/846cdd41e6ad6ec88ef23fee1552ab39c2f5a3d1.1612184361.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22mptcp: implement delegated actionsPaolo Abeni
On MPTCP-level ack reception, the packet scheduler may select a subflow other then the current one. Prior to this commit we rely on the workqueue to trigger action on such subflow. This changeset introduces an infrastructure that allows any MPTCP subflow to schedule actions (MPTCP xmit) on others subflows without resorting to (multiple) process reschedule. A dummy NAPI instance is used instead. When MPTCP needs to trigger action an a different subflow, it enqueues the target subflow on the NAPI backlog and schedule such instance as needed. The dummy NAPI poll method walks the sockets backlog and tries to acquire the (BH) socket lock on each of them. If the socket is owned by the user space, the action will be completed by the sock release cb, otherwise push is started. This change leverages the delegated action infrastructure to avoid invoking the MPTCP worker to spool the pending data, when the packet scheduler picks a subflow other then the one currently processing the incoming MPTCP-level ack. Additionally we further refine the subflow selection invoking the packet scheduler for each chunk of data even inside __mptcp_subflow_push_pending(). v1 -> v2: - fix possible UaF at shutdown time, resetting sock ops after removing the ulp context Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22mptcp: re-enable sndbuf autotunePaolo Abeni
After commit 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks"), MPTCP never sets the flag bit SOCK_NOSPACE on its subflow. As a side effect, autotune never takes place, as it happens inside tcp_new_space(), which in turn is called only when the mentioned bit is set. Let's sendmsg() set the subflows NOSPACE bit when looking for more memory and use the subflow write_space callback to propagate the snd buf update and wake-up the user-space. Additionally, this allows dropping a bunch of duplicate code and makes the SNDBUF_LIMITED chrono relevant again for MPTCP subflows. Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22mptcp: always graft subflow socket to parentPaolo Abeni
Currently, incoming subflows link to the parent socket, while outgoing ones link to a per subflow socket. The latter is not really needed, except at the initial connect() time and for the first subflow. Always graft the outgoing subflow to the parent socket and free the unneeded ones early. This allows some code cleanup, reduces the amount of memory used and will simplify the next patch Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09mptcp: add the incoming MP_PRIO supportGeliang Tang
This patch added the incoming MP_PRIO logic: Added a flag named mp_prio in struct mptcp_options_received, to mark the MP_PRIO is received, and save the priority value to struct mptcp_options_received's backup member. Then invoke mptcp_pm_mp_prio_received with the receiving subsocket and the backup value. In mptcp_pm_mp_prio_received, get the subflow context according the input subsocket, and change the subflow's backup as the incoming priority value. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09mptcp: add the outgoing MP_PRIO supportGeliang Tang
This patch added the outgoing MP_PRIO logic: In mptcp_pm_nl_mp_prio_send_ack, find the related subflow and subsocket according to the input parameter addr. Save the input priority value to suflow's backup, then set subflow's send_mp_prio flag to true, and save the input priority value to suflow's request_bkup. Finally, send out a pure ACK on the related subsocket. In mptcp_established_options_mp_prio, check whether the subflow's send_mp_prio is set. If it is, this is the packet for sending MP_PRIO. So save subflow->request_bkup value to mptcp_out_options's backup, and change the option type to OPTION_MPTCP_PRIO. In mptcp_write_options, clear the send_mp_prio flag and send out the MP_PRIO suboption with mptcp_out_options's backup value. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-17mptcp: push pending frames when subflow has free spacePaolo Abeni
When multiple subflows are active, we can receive a window update on subflow with no write space available. MPTCP will try to push frames on such subflow and will fail. Pending frames will be pushed only after receiving a window update on a subflow with some wspace available. Overall the above could lead to suboptimal aggregate bandwidth usage. Instead, we should try to push pending frames as soon as the subflow reaches both conditions mentioned above. We can finally enable self-tests with asymmetric links, as the above makes them finally pass. Fixes: 6f8a612a33e4 ("mptcp: keep track of advertised windows right edge") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14mptcp: parse and act on incoming FASTCLOSE optionFlorian Westphal
parse the MPTCP FASTCLOSE subtype. If provided key matches the local one, schedule the work queue to close (with tcp reset) all subflows. The MPTCP socket moves to closed state immediately. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14mptcp: use MPTCPOPT_HMAC_LEN macroGeliang Tang
Use the macro MPTCPOPT_HMAC_LEN instead of a constant in struct mptcp_options_received. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-09mptcp: link MPC subflow into msk only after acceptPaolo Abeni
Christoph reported the following splat: WARNING: CPU: 0 PID: 4615 at net/ipv4/inet_connection_sock.c:1031 inet_csk_listen_stop+0x8e8/0xad0 net/ipv4/inet_connection_sock.c:1031 Modules linked in: CPU: 0 PID: 4615 Comm: syz-executor.4 Not tainted 5.9.0 #37 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:inet_csk_listen_stop+0x8e8/0xad0 net/ipv4/inet_connection_sock.c:1031 Code: 03 00 00 00 e8 79 b2 3d ff e9 ad f9 ff ff e8 1f 76 ba fe be 02 00 00 00 4c 89 f7 e8 62 b2 3d ff e9 14 f9 ff ff e8 08 76 ba fe <0f> 0b e9 97 f8 ff ff e8 fc 75 ba fe be 03 00 00 00 4c 89 f7 e8 3f RSP: 0018:ffffc900037f7948 EFLAGS: 00010293 RAX: ffff88810a349c80 RBX: ffff888114ee1b00 RCX: ffffffff827b14cd RDX: 0000000000000000 RSI: ffffffff827b1c38 RDI: 0000000000000005 RBP: ffff88810a2a8000 R08: ffff88810a349c80 R09: fffff520006fef1f R10: 0000000000000003 R11: fffff520006fef1e R12: ffff888114ee2d00 R13: dffffc0000000000 R14: 0000000000000001 R15: ffff888114ee1d68 FS: 00007f2ac1945700(0000) GS:ffff88811b400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd44798bc0 CR3: 0000000109810002 CR4: 0000000000170ef0 Call Trace: __tcp_close+0xd86/0x1110 net/ipv4/tcp.c:2433 __mptcp_close_ssk+0x256/0x430 net/mptcp/protocol.c:1761 __mptcp_destroy_sock+0x49b/0x770 net/mptcp/protocol.c:2127 mptcp_close+0x62d/0x910 net/mptcp/protocol.c:2184 inet_release+0xe9/0x1f0 net/ipv4/af_inet.c:434 __sock_release+0xd2/0x280 net/socket.c:596 sock_close+0x15/0x20 net/socket.c:1277 __fput+0x276/0x960 fs/file_table.c:281 task_work_run+0x109/0x1d0 kernel/task_work.c:151 get_signal+0xe8f/0x1d40 kernel/signal.c:2561 arch_do_signal+0x88/0x1b60 arch/x86/kernel/signal.c:811 exit_to_user_mode_loop kernel/entry/common.c:161 [inline] exit_to_user_mode_prepare+0x9b/0xf0 kernel/entry/common.c:191 syscall_exit_to_user_mode+0x22/0x150 kernel/entry/common.c:266 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f2ac1254469 Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48 RSP: 002b:00007f2ac1944dc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffbf RBX: 000000000069bf00 RCX: 00007f2ac1254469 RDX: 0000000000000000 RSI: 0000000000008982 RDI: 0000000000000003 RBP: 000000000069bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000069bf0c R13: 00007ffeb53f178f R14: 00000000004668b0 R15: 0000000000000003 After commit 0397c6d85f9c ("mptcp: keep unaccepted MPC subflow into join list"), the msk's workqueue and/or PM can touch the MPC subflow - and acquire its socket lock - even if it's still unaccepted. If the above event races with the relevant listener socket close, we can end-up with the above splat. This change addresses the issue delaying the MPC socket insertion in conn_list at accept time - that is, partially reverting the blamed commit. We must additionally ensure that mptcp_pm_fully_established() happens after accept() time, or the PM will not be able to handle properly such event - conn_list could be empty otherwise. In the receive path, we check the subflow list node to ensure it is out of the listener queue. Be sure client subflows do not match transiently such condition moving them into the join list earlier at creation time. Since we now have multiple mptcp_pm_fully_established() call sites from different code-paths, said helper can now race with itself. Use an additional PM status bit to avoid multiple notifications. Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/103 Fixes: 0397c6d85f9c ("mptcp: keep unaccepted MPC subflow into join list"), Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09mptcp: rename add_addr_signal and mptcp_add_addr_statusGeliang Tang
Since the RM_ADDR signal had been reused with add_addr_signal, it's not suitable to call it add_addr_signal or mptcp_add_addr_status. So this patch renamed add_addr_signal to addr_signal, and renamed mptcp_add_addr_status to mptcp_addr_signal_status. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>