Age | Commit message (Collapse) | Author |
|
[ Upstream commit 679e9ddcf90dbdf98aaaa71a492454654b627bcb ]
When application sending data more than sndbuf_space, there have chances
application will sleep in epoll_wait, and will never be wakeup again. This
is caused by a race between smc_poll and smc_cdc_tx_handler.
application tasklet
smc_tx_sendmsg(len > sndbuf_space) |
epoll_wait for EPOLL_OUT,timeout=0 |
smc_poll |
if (!smc->conn.sndbuf_space) |
| smc_cdc_tx_handler
| atomic_add sndbuf_space
| smc_tx_sndbuf_nonfull
| if (!test_bit SOCK_NOSPACE)
| do not sk_write_space;
set_bit SOCK_NOSPACE; |
return mask=0; |
Application will sleep in epoll_wait as smc_poll returns 0. And
smc_cdc_tx_handler will not call sk_write_space because the SOCK_NOSPACE
has not be set. If there is no inflight cdc msg, sk_write_space will not be
called any more, and application will sleep in epoll_wait forever.
So check sndbuf_space again after NOSPACE flag is set to break the race.
Fixes: 8dce2786a290 ("net/smc: smc_poll improvements")
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5211c9729484c923f8d2e06bd29f9322cc42bb8f ]
Considering scenario:
smc_cdc_rx_handler
__smc_release
sock_set_flag
smc_close_active()
sock_set_flag
__set_bit(DEAD) __set_bit(DONE)
Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
if the DEAD flag lost, the state SMC_CLOSED will be never be reached
in smc_close_passive_work:
if (sock_flag(sk, SOCK_DEAD) &&
smc_close_sent_any_close(conn)) {
sk->sk_state = SMC_CLOSED;
} else {
/* just shutdown, but not yet closed locally */
sk->sk_state = SMC_APPFINCLOSEWAIT;
}
Replace sock_set_flags or __set_bit to set_bit will fix this problem.
Since set_bit is atomic.
Fixes: b38d732477e4 ("smc: socket closing and linkgroup cleanup")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ce7ca794712f186da99719e8b4e97bd5ddbb04c3 ]
Before determining whether the msg has unsupported options, it has been
prematurely terminated by the wrong status check.
For the application, the general usages of MSG_FASTOPEN likes
fd = socket(...)
/* rather than connect */
sendto(fd, data, len, MSG_FASTOPEN)
Hence, We need to check the flag before state check, because the sock
state here is always SMC_INIT when applications tries MSG_FASTOPEN.
Once we found unsupported options, fallback it to TCP.
Fixes: ee9dfbef02d1 ("net/smc: handle sockopts forcing fallback")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
v2 -> v1: Optimize code style
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a8424a9b4522a3ab9f32175ad6d848739079071f ]
For passive connections, the refcount increment has been done in
smc_clcsock_accept()-->smc_sock_alloc().
Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc")
Signed-off-by: Yacan Liu <liuyacan@corp.netease.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220830152314.838736-1-liuyacan@corp.netease.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 75c1edf23b95a9c66923d9269d8e86e4dbde151f ]
Same trigger condition as commit 86434744. When setsockopt runs
in parallel to a connect(), and switch the socket into fallback
mode. Then the sk_refcnt is incremented in smc_connect(), but
its state stay in SMC_INIT (NOT SMC_ACTIVE). This cause the
corresponding sk_refcnt decrement in __smc_release() will not be
performed.
Fixes: 86434744fedf ("net/smc: add fallback check to connect()")
Signed-off-by: liuyacan <liuyacan@corp.netease.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4e2e65e2e56c6ceb4ea1719360080c0af083229e ]
In the current implementation, when TCP initiates a connection
to an unavailable [ip,port], ECONNREFUSED will be stored in the
TCP socket, but SMC will not. However, some apps (like curl) use
getsockopt(,,SO_ERROR,,) to get the error information, which makes
them miss the error message and behave strangely.
Fixes: 50717a37db03 ("net/smc: nonblocking connect rework")
Signed-off-by: liuyacan <liuyacan@corp.netease.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1a74e99323746353bba11562a2f2d0aa8102f402 ]
Since commit e5d5aadcf3cd ("net/smc: fix sk_refcnt underflow on linkdown
and fallback"), for a fallback connection, __smc_release() does not call
sock_put() if its state is already SMC_CLOSED.
When calling smc_shutdown() after falling back, its state is set to
SMC_CLOSED but does not call sock_put(), so this patch calls it.
Reported-and-tested-by: syzbot+6e29a053eb165bd50de5@syzkaller.appspotmail.com
Fixes: e5d5aadcf3cd ("net/smc: fix sk_refcnt underflow on linkdown and fallback")
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5c15b3123f65f8fbb1b445d9a7e8812e0e435df2 ]
In nginx/wrk benchmark, there's a hung problem with high probability
on case likes that: (client will last several minutes to exit)
server: smc_run nginx
client: smc_run wrk -c 10000 -t 1 http://server
Client hangs with the following backtrace:
0 [ffffa7ce8Of3bbf8] __schedule at ffffffff9f9eOd5f
1 [ffffa7ce8Of3bc88] schedule at ffffffff9f9eløe6
2 [ffffa7ce8Of3bcaO] schedule_timeout at ffffffff9f9e3f3c
3 [ffffa7ce8Of3bd2O] wait_for_common at ffffffff9f9el9de
4 [ffffa7ce8Of3bd8O] __flush_work at ffffffff9fOfeOl3
5 [ffffa7ce8øf3bdfO] smc_release at ffffffffcO697d24 [smc]
6 [ffffa7ce8Of3be2O] __sock_release at ffffffff9f8O2e2d
7 [ffffa7ce8Of3be4ø] sock_close at ffffffff9f8ø2ebl
8 [ffffa7ce8øf3be48] __fput at ffffffff9f334f93
9 [ffffa7ce8Of3be78] task_work_run at ffffffff9flOlff5
10 [ffffa7ce8Of3beaO] do_exit at ffffffff9fOe5Ol2
11 [ffffa7ce8Of3bflO] do_group_exit at ffffffff9fOe592a
12 [ffffa7ce8Of3bf38] __x64_sys_exit_group at ffffffff9fOe5994
13 [ffffa7ce8Of3bf4O] do_syscall_64 at ffffffff9f9d4373
14 [ffffa7ce8Of3bfsO] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c
This issue dues to flush_work(), which is used to wait for
smc_connect_work() to finish in smc_release(). Once lots of
smc_connect_work() was pending or all executing work dangling,
smc_release() has to block until one worker comes to free, which
is equivalent to wait another smc_connnect_work() to finish.
In order to fix this, There are two changes:
1. For those idle smc_connect_work(), cancel it from the workqueue; for
executing smc_connect_work(), waiting for it to finish. For that
purpose, replace flush_work() with cancel_work_sync().
2. Since smc_connect() hold a reference for passive closing, if
smc_connect_work() has been cancelled, release the reference.
Fixes: 24ac3a08e658 ("net/smc: rebuild nonblocking connect")
Reported-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7a61432dc81375be06b02f0061247d3efbdfce3a ]
Possible recursive locking is detected by lockdep when SMC
falls back to TCP. The corresponding warnings are as follows:
============================================
WARNING: possible recursive locking detected
5.16.0-rc1+ #18 Tainted: G E
--------------------------------------------
wrk/1391 is trying to acquire lock:
ffff975246c8e7d8 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0x109/0x250 [smc]
but task is already holding lock:
ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&ei->socket.wq.wait);
lock(&ei->socket.wq.wait);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by wrk/1391:
#0: ffff975246040130 (sk_lock-AF_SMC){+.+.}-{0:0}, at: smc_connect+0x43/0x150 [smc]
#1: ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc]
stack backtrace:
Call Trace:
<TASK>
dump_stack_lvl+0x56/0x7b
__lock_acquire+0x951/0x11f0
lock_acquire+0x27a/0x320
? smc_switch_to_fallback+0x109/0x250 [smc]
? smc_switch_to_fallback+0xfe/0x250 [smc]
_raw_spin_lock_irq+0x3b/0x80
? smc_switch_to_fallback+0x109/0x250 [smc]
smc_switch_to_fallback+0x109/0x250 [smc]
smc_connect_fallback+0xe/0x30 [smc]
__smc_connect+0xcf/0x1090 [smc]
? mark_held_locks+0x61/0x80
? __local_bh_enable_ip+0x77/0xe0
? lockdep_hardirqs_on+0xbf/0x130
? smc_connect+0x12a/0x150 [smc]
smc_connect+0x12a/0x150 [smc]
__sys_connect+0x8a/0xc0
? syscall_enter_from_user_mode+0x20/0x70
__x64_sys_connect+0x16/0x20
do_syscall_64+0x34/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
The nested locking in smc_switch_to_fallback() is considered to
possibly cause a deadlock because smc_wait->lock and clc_wait->lock
are the same type of lock. But actually it is safe so far since
there is no other place trying to obtain smc_wait->lock when
clc_wait->lock is held. So the patch replaces spin_lock() with
spin_lock_nested() to avoid false report by lockdep.
Link: https://lkml.org/lkml/2021/11/19/962
Fixes: 2153bd1e3d3d ("Transfer remaining wait queue entries during fallback")
Reported-by: syzbot+e979d3597f48262cb4ee@syzkaller.appspotmail.com
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2153bd1e3d3dbf6a3403572084ef6ed31c53c5f0 ]
The SMC fallback is incomplete currently. There may be some
wait queue entries remaining in smc socket->wq, which should
be removed to clcsocket->wq during the fallback.
For example, in nginx/wrk benchmark, this issue causes an
all-zeros test result:
server: nginx -g 'daemon off;'
client: smc_run wrk -c 1 -t 1 -d 5 http://11.200.15.93/index.html
Running 5s test @ http://11.200.15.93/index.html
1 threads and 1 connections
Thread Stats Avg Stdev Max ± Stdev
Latency 0.00us 0.00us 0.00us -nan%
Req/Sec 0.00 0.00 0.00 -nan%
0 requests in 5.00s, 0.00B read
Requests/sec: 0.00
Transfer/sec: 0.00B
The reason for this all-zeros result is that when wrk used SMC
to replace TCP, it added an eppoll_entry into smc socket->wq
and expected to be notified if epoll events like EPOLL_IN/
EPOLL_OUT occurred on the smc socket.
However, once a fallback occurred, wrk switches to use clcsocket.
Now it is clcsocket->wq instead of smc socket->wq which will
be woken up. The eppoll_entry remaining in smc socket->wq does
not work anymore and wrk stops the test.
This patch fixes this issue by removing remaining wait queue
entries from smc socket->wq to clcsocket->wq during the fallback.
Link: https://www.spinics.net/lists/netdev/msg779769.html
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bacb6c1e47691cda4a95056c21b5487fb7199fcc ]
When applications call shutdown() with SHUT_RDWR in userspace,
smc_close_active() calls kernel_sock_shutdown(), and it is called
twice in smc_shutdown().
This fixes this by checking sk_state before do clcsock shutdown, and
avoids missing the application's call of smc_shutdown().
Link: https://lore.kernel.org/linux-s390/1f67548e-cbf6-0dce-82b5-10288a4583bd@linux.ibm.com/
Fixes: 606a63c9783a ("net/smc: Ensure the active closing peer first closes clcsock")
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20211126024134.45693-1-tonylu@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e5d5aadcf3cd59949316df49c27cb21788d7efe4 ]
We got the following WARNING when running ab/nginx
test with RDMA link flapping (up-down-up).
The reason is when smc_sock fallback and at linkdown
happens simultaneously, we may got the following situation:
__smc_lgr_terminate()
--> smc_conn_kill()
--> smc_close_active_abort()
smc_sock->sk_state = SMC_CLOSED
sock_put(smc_sock)
smc_sock was set to SMC_CLOSED and sock_put() been called
when terminate the link group. But later application call
close() on the socket, then we got:
__smc_release():
if (smc_sock->fallback)
smc_sock->sk_state = SMC_CLOSED
sock_put(smc_sock)
Again we set the smc_sock to CLOSED through it's already
in CLOSED state, and double put the refcnt, so the following
warning happens:
refcount_t: underflow; use-after-free.
WARNING: CPU: 5 PID: 860 at lib/refcount.c:28 refcount_warn_saturate+0x8d/0xf0
Modules linked in:
CPU: 5 PID: 860 Comm: nginx Not tainted 5.10.46+ #403
Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
RIP: 0010:refcount_warn_saturate+0x8d/0xf0
Code: 05 5c 1e b5 01 01 e8 52 25 bc ff 0f 0b c3 80 3d 4f 1e b5 01 00 75 ad 48
RSP: 0018:ffffc90000527e50 EFLAGS: 00010286
RAX: 0000000000000026 RBX: ffff8881300df2c0 RCX: 0000000000000027
RDX: 0000000000000000 RSI: ffff88813bd58040 RDI: ffff88813bd58048
RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000001
R10: ffff8881300df2c0 R11: ffffc90000527c78 R12: ffff8881300df340
R13: ffff8881300df930 R14: ffff88810b3dad80 R15: ffff8881300df4f8
FS: 00007f739de8fb80(0000) GS:ffff88813bd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000a01b008 CR3: 0000000111b64003 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
smc_release+0x353/0x3f0
__sock_release+0x3d/0xb0
sock_close+0x11/0x20
__fput+0x93/0x230
task_work_run+0x65/0xa0
exit_to_user_mode_prepare+0xf9/0x100
syscall_exit_to_user_mode+0x27/0x190
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This patch adds check in __smc_release() to make
sure we won't do an extra sock_put() and set the
socket to CLOSED when its already in CLOSED state.
Fixes: 51f1de79ad8e (net/smc: replace sock_put worker by socket refcounting)
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f3a3a0fe0b644582fa5d83dd94b398f99fc57914 ]
There should use TCPF_SYN_RECV instead of TCP_SYN_RECV.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8621436671f3a4bba5db57482e1ee604708bf1eb ]
syzbot is able to setup kTLS on an SMC socket which coincidentally
uses sk_user_data too. Later, kTLS treats it as psock so triggers a
refcnt warning. The root cause is that smc_setsockopt() simply calls
TCP setsockopt() which includes TCP_ULP. I do not think it makes
sense to setup kTLS on top of SMC sockets, so we should just disallow
this setup.
It is hard to find a commit to blame, but we can apply this patch
since the beginning of TCP_ULP.
Reported-and-tested-by: syzbot+b54a1ce86ba4a623b7f0@syzkaller.appspotmail.com
Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 67f562e3e147750a02b2a91d21a163fc44a1d13e upstream.
SMC does not work together with FASTOPEN. If sendmsg() is called with
flag MSG_FASTOPEN in SMC_INIT state, the SMC-socket switches to
fallback mode. To handle the previous ioctl FIOASYNC call correctly
in this case, it is necessary to transfer the socket wait queue
fasync_list to the internal TCP socket.
Reported-by: syzbot+4b1fe8105f8044a26162@syzkaller.appspotmail.com
Fixes: ee9dfbef02d18 ("net/smc: handle sockopts forcing fallback")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 86434744fedf0cfe07a9eee3f4632c0e25c1d136 upstream.
FASTOPEN setsockopt() or sendmsg() may switch the SMC socket to fallback
mode. Once fallback mode is active, the native TCP socket functions are
called. Nevertheless there is a small race window, when FASTOPEN
setsockopt/sendmsg runs in parallel to a connect(), and switch the
socket into fallback mode before connect() takes the sock lock.
Make sure the SMC-specific connect setup is omitted in this case.
This way a syzbot-reported refcount problem is fixed, triggered by
different threads running non-blocking connect() and FASTOPEN_KEY
setsockopt.
Reported-by: syzbot+96d3f9ff6a86d37e44c8@syzkaller.appspotmail.com
Fixes: 6d6dd528d5af ("net/smc: fix refcount non-blocking connect() -part 2")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
FASTOPEN does not work with SMC-sockets. Since SMC allows fallback to
TCP native during connection start, the FASTOPEN setsockopts trigger
this fallback, if the SMC-socket is still in state SMC_INIT.
But if a FASTOPEN setsockopt is called after a non-blocking connect(),
this is broken, and fallback does not make sense.
This change complements
commit cd2063604ea6 ("net/smc: avoid fallback in case of non-blocking connect")
and fixes the syzbot reported problem "WARNING in smc_unhash_sk".
Reported-by: syzbot+8488cc4cf1c9e09b8b86@syzkaller.appspotmail.com
Fixes: e1bbdd570474 ("net/smc: reduce sock_put() for fallback sockets")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If an SMC socket is immediately terminated after a non-blocking connect()
has been called, a memory leak is possible.
Due to the sock_hold move in
commit 301428ea3708 ("net/smc: fix refcounting for non-blocking connect()")
an extra sock_put() is needed in smc_connect_work(), if the internal
TCP socket is aborted and cancels the sk_stream_wait_connect() of the
connect worker.
Reported-by: syzbot+4b73ad6fc767e576e275@syzkaller.appspotmail.com
Fixes: 301428ea3708 ("net/smc: fix refcounting for non-blocking connect()")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If a nonblocking socket is immediately closed after connect(),
the connect worker may not have started. This results in a refcount
problem, since sock_hold() is called from the connect worker.
This patch moves the sock_hold in front of the connect worker
scheduling.
Reported-by: syzbot+4c063e6dea39e4b79f29@syzkaller.appspotmail.com
Fixes: 50717a37db03 ("net/smc: nonblocking connect rework")
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Creating of an SMC-R connection with vlan-id fails, because
smc_listen_work() determines the vlan_id of the connection,
saves it in struct smc_init_info ini, but clears the ini area
again if SMC-D is not applicable.
This patch just resets the ISM device before investigating
SMC-R availability.
Fixes: bc36d2fc93eb ("net/smc: consolidate function parameters")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For SMC sockets forced to fallback to TCP, the file is propagated
from the outer SMC to the internal TCP socket. When closing the SMC
socket, the internal TCP socket file pointer must be restored to the
original NULL value, otherwise memory leaks may show up (found with
CONFIG_DEBUG_KMEMLEAK).
The internal TCP socket is released in smc_clcsock_release(), which
calls __sock_release() function in net/socket.c. This calls the
needed iput(SOCK_INODE(sock)) only, if the file pointer has been reset
to the original NULL-value.
Fixes: 07603b230895 ("net/smc: propagate file from SMC to TCP socket")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
FASTOPEN is not possible with SMC. sendmsg() with msg_flag MSG_FASTOPEN
triggers a fallback to TCP if the socket is in state SMC_INIT.
But if a nonblocking connect is already started, fallback to TCP
is no longer possible, even though the socket may still be in state
SMC_INIT.
And if a nonblocking connect is already started, a listen() call
does not make sense.
Reported-by: syzbot+bd8cc73d665590a1fcad@syzkaller.appspotmail.com
Fixes: 50717a37db032 ("net/smc: nonblocking connect rework")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The setsockopts options TCP_NODELAY and TCP_CORK may schedule the
tx worker. Make sure the socket is not yet moved into SMC_CLOSED
state (for instance by a shutdown SHUT_RDWR call).
Reported-by: syzbot+92209502e7aab127c75f@syzkaller.appspotmail.com
Reported-by: syzbot+b972214bb803a343f4fe@syzkaller.appspotmail.com
Fixes: 01d2f7e2cdd31 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The new route handling in ip_mc_finish_output() from 'net' overlapped
with the new support for returning congestion notifications from BPF
programs.
In order to handle this I had to take the dev_loopback_xmit() calls
out of the switch statement.
The aquantia driver conflicts were simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are common steps when releasing an accepted or unaccepted socket.
Move this code into a common routine.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If register_pernet_subsys success in smc_init,
we should cleanup it in case any other error.
Fixes: 64e28b52c7a6 (net/smc: add pnet table namespace support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have MODULE_LICENCE("GPL*") inside which was used in the initial
scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Conflict resolution of af_smc.c from Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rework smc_conn_create() to always return a valid DECLINE reason code.
This removes the need to translate the return codes on 4 different
places and allows to easily add more detailed return codes by changing
smc_conn_create() only.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rework smc_listen_work() to provide improved reason codes when an
SMC connection is declined. This allows better debugging on user side.
This also adds 3 more detailed reason codes in smc_clc.h to indicate
what type of device was not found (ism or rdma or both), or if ism
cannot talk to the peer.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In smc_listen_work() the variables rc and reason_code are defined which
have the same meaning. Eliminate reason_code in favor of the shorter
name rc. No functional changes.
Rename the functions smc_check_ism() and smc_check_rdma() into
smc_find_ism_device() and smc_find_rdma_device() to make there purpose
more clear. No functional changes.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The vlan_id of the underlying CLC socket was retrieved two times
during processing of the listen handshaking. Change this to get the
vlan id one time in connect and in listen processing, and reuse the id.
And add a new CLC DECLINE return code for the case when the retrieval
of the vlan id failed.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During initialization of an SMC socket a lot of function parameters need
to get passed down the function call path. Consolidate the parameters
in a helper struct so there are less enough parameters to get all passed
by register.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The check for a matching ip prefix and subnet was only done for SMC-R
in smc_listen_rdma_check() but not when an SMC-D connection was
possible. Rename the function into smc_listen_prfx_check() and move its
call to a place where it is called for both SMC variants.
And add a new CLC DECLINE reason for the case when the IP prefix or
subnet check fails so the reason for the failing SMC connection can be
found out more easily.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For nonblocking sockets move the kernel_connect() from the connect
worker into the initial smc_connect part to return kernel_connect()
errors other than -EINPROGRESS to user space.
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit <26d92e951fe0>
("net/smc: move unhash as early as possible in smc_release()")
fixes one occurrence in the smc code, but the same pattern exists
in other places. This patch covers the remaining occurrences and
makes sure, the unhash operation is done before the smc->clcsock is
released. This avoids a potential use-after-free in smc_diag_dump().
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
fcntl(fd, F_SETOWN, getpid()) selects the recipient of SIGURG signals
that are delivered when out-of-band data arrives on socket fd.
If an SMC socket program makes use of such an fcntl() call, it fails
in case of fallback to TCP-mode. In case of fallback the traffic is
processed with the internal TCP socket. Propagating field "file" from the
SMC socket to the internal TCP socket fixes the issue.
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the clcsock is already released using sock_release() and a pending
smc_listen_work accesses the clcsock than that will fail. Solve this
by canceling and waiting for the work to complete first. Because the
work holds the sock_lock it must make sure that the lock is not hold
before the new helper smc_clcsock_release() is invoked. And before the
smc_listen_work starts working check if the parent listen socket is
still valid, otherwise stop the work early.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds namespace support to the pnet table code. Each network
namespace gets its own pnet table. Infiniband and smcd device pnetids
can only be modified in the initial namespace. In other namespaces they
can still be used as if they were set by the underlying hardware.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
An ipvlan bug fix in 'net' conflicted with the abstraction away
of the IPV6 specific support in 'net-next'.
Similarly, a bug fix for mlx5 in 'net' conflicted with the flow
action conversion in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If SMC client and server connections are both established at the same
time, smc_connect_rdma() cannot send a CLC confirm message while
smc_listen_work() is waiting for one due to lock contention. This can
result in timeouts in smc_clc_wait_msg() and failed SMC connections.
In case of SMC-R, there are two types of LGRs (client and server LGRs)
which can be protected by separate locks. So, this patch splits the LGR
pending lock into two separate locks for client and server to avoid the
locking issue for SMC-R.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If SMC client and server connections are both established at the same
time, smc_connect_ism() cannot send a CLC confirm message while
smc_listen_work() is waiting for one due to lock contention. This can
result in timeouts in smc_clc_wait_msg() and failed SMC connections.
In case of SMC-D, the LGR pending lock is not needed while
smc_listen_work() is waiting for the CLC confirm message. So, this patch
releases the lock earlier for SMC-D to avoid the locking issue.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to RFC7609 (http://www.rfc-editor.org/info/rfc7609)
first the SMC-R connection is shut down and then the normal TCP
connection FIN processing drives cleanup of the internal TCP connection.
The unconditional release of the clcsock during active socket closing
has to be postponed if the peer has not yet signalled socket closing.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add SO_TIMESTAMPING_NEW variant of socket timestamp options.
This is the y2038 safe versions of the SO_TIMESTAMPING_OLD
for all architectures.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: chris@zankel.net
Cc: fenghua.yu@intel.com
Cc: rth@twiddle.net
Cc: tglx@linutronix.de
Cc: ubraun@linux.ibm.com
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-s390@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a socket was connected and is now shut down for read, return 0 to
indicate end of data in recvmsg and splice_read (like TCP) and do not
return ENOTCONN. This behavior is required by the socket api.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In smc_release() we release smc->clcsock before unhash the smc
sock, but a parallel smc_diag_dump() may be still reading
smc->clcsock, therefore this could cause a use-after-free as
reported by syzbot.
Reported-and-tested-by: syzbot+fbd1e5476e4c94c7b34e@syzkaller.appspotmail.com
Fixes: 51f1de79ad8e ("net/smc: replace sock_put worker by socket refcounting")
Cc: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: syzbot+0bf2e01269f1274b4b03@syzkaller.appspotmail.com
Reported-by: syzbot+e3132895630f957306bc@syzkaller.appspotmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.
Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
clcsock can be released while kernel_accept() references it in TCP
listen worker. Also, clcsock needs to wake up before released if TCP
fallback is used and the clcsock is blocked by accept. Add a lock to
safely release clcsock and call kernel_sock_shutdown() to wake up
clcsock from accept in smc_release().
Reported-by: syzbot+0bf2e01269f1274b4b03@syzkaller.appspotmail.com
Reported-by: syzbot+e3132895630f957306bc@syzkaller.appspotmail.com
Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
When an rmb is no longer in use by a connection, unregister its rkey at
the remote peer with an LLC DELETE RKEY message. With this change,
unused buffers held in the buffer pool are no longer registered at the
remote peer. They are registered before the buffer is actually used and
unregistered when they are no longer used by a connection.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|