Age | Commit message (Collapse) | Author |
|
Extend UVERBS_METHOD_REG_MR to get DMAH and pass it to all drivers.
It will be used in mlx5 driver as part of the next patch from the
series.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/2ae1e628c0675db81f092cc00d3ad6fbf6139405.1752752567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Currently, the capability check is done in the default
init_user_ns user namespace. When a process runs in a
non default user namespace, such check fails. Due to this
when a process is running using Podman, it fails to modify
the QP.
Since the RDMA device is a resource within a network namespace,
use the network namespace associated with the RDMA device to
determine its owning user namespace.
Fixes: 0cadb4db79e1 ("RDMA/uverbs: Restrict usage of privileged QKEYs")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://patch.msgid.link/099eb263622ccdd27014db7e02fec824a3307829.1750963874.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Currently, the capability check is done in the default
init_user_ns user namespace. When a process runs in a
non default user namespace, such check fails. Due to this
when a process is running using Podman, it fails to create
the QP.
Since the RDMA device is a resource within a network namespace,
use the network namespace associated with the RDMA device to
determine its owning user namespace.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://patch.msgid.link/3914ef9702b01de8843a391ce397fca67d0fc7af.1750963874.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Currently, the capability check is done in the default
init_user_ns user namespace. When a process runs in a
non default user namespace, such check fails. Due to this
when a process is running using Podman, it fails to create
the QP.
Since the RDMA device is a resource within a network namespace,
use the network namespace associated with the RDMA device to
determine its owning user namespace.
Fixes: 2dee0e545894 ("IB/uverbs: Enable QP creation with a given source QP number")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://patch.msgid.link/0e5920d1dfe836817bb07576b192da41b637130b.1750963874.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Currently, the capability check is done in the default
init_user_ns user namespace. When a process runs in a
non default user namespace, such check fails. Due to this
when a process is running using Podman, it fails to create
the flow resource.
Since the RDMA device is a resource within a network namespace,
use the network namespace associated with the RDMA device to
determine its owning user namespace.
Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Link: https://patch.msgid.link/6df6f2f24627874c4f6d041c19dc1f6f29f68f84.1750963874.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
As opposed to open-code, using the ERR_CAST macro clearly indicates that
this is a pointer to an error value and a type conversion was performed.
Link: https://patch.msgid.link/r/202504012109233981_YPVbd4wQzmAzP3tA5IG@zte.com.cn
Signed-off-by: Li Haoran <li.haoran7@zte.com.cn>
Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Currently, the IB uverbs API calls uobj_get_uobj_read(), which in turn
uses the rdma_lookup_get_uobject() helper to retrieve user objects.
In case of failure, uobj_get_uobj_read() returns NULL, overriding the
error code from rdma_lookup_get_uobject(). The IB uverbs API then
translates this NULL to -EINVAL, masking the actual error and
complicating debugging. For example, applications calling ibv_modify_qp
that fails with EBUSY when retrieving the QP uobject will see the
overridden error code EINVAL instead, masking the actual error.
Furthermore, based on rdma-core commit:
"2a22f1ced5f3 ("Merge pull request #1568 from jakemoroni/master")"
Kernel's IB uverbs return values are either ignored and passed on as is
to application or overridden with other errnos in a few cases.
Thus, to improve error reporting and debuggability, propagate the
original error from rdma_lookup_get_uobject() instead of replacing it
with EINVAL.
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Link: https://patch.msgid.link/64f9d3711b183984e939962c2f83383904f97dfb.1740577869.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add support for file descriptor array attribute for GET_CONTEXT
commands.
Check that the file descriptor (fd) array represents fds for valid UCAPs.
Store the enabled UCAPs from the fd array as a bitmask in ib_ucontext.
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Link: https://patch.msgid.link/ebfb30bc947e2259b193c96a319c80e82599045b.1741261611.git.leon@kernel.org
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In the expression "cmd.wqe_size * cmd.wr_count", both variables are u32
values that come from the user so the multiplication can lead to integer
wrapping. Then we pass the result to uverbs_request_next_ptr() which also
could potentially wrap. The "cmd.sge_count * sizeof(struct ib_uverbs_sge)"
multiplication can also overflow on 32bit systems although it's fine on
64bit systems.
This patch does two things. First, I've re-arranged the condition in
uverbs_request_next_ptr() so that the use controlled variable "len" is on
one side of the comparison by itself without any math. Then I've modified
all the callers to use size_mul() for the multiplications.
Fixes: 67cdb40ca444 ("[IB] uverbs: Implement more commands")
Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/b8765ab3-c2da-4611-aae0-ddd6ba173d23@stanley.mountain
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We want the compiler to see that fdput() on empty instance
is a no-op. The emptiness check is that file reference is NULL,
while fdput() is "fput() if FDPUT_FPUT is present in flags".
The reason why fdput() on empty instance is a no-op is something
compiler can't see - it's that we never generate instances with
NULL file reference combined with non-zero flags.
It's not that hard to deal with - the real primitives behind
fdget() et.al. are returning an unsigned long value, unpacked by (inlined)
__to_fd() into the current struct file * + int. The lower bits are
used to store flags, while the rest encodes the pointer. Linus suggested
that keeping this unsigned long around with the extractions done by inlined
accessors should generate a sane code and that turns out to be the case.
Namely, turning struct fd into a struct-wrapped unsinged long, with
fd_empty(f) => unlikely(f.word == 0)
fd_file(f) => (struct file *)(f.word & ~3)
fdput(f) => if (f.word & 1) fput(fd_file(f))
ends up with compiler doing the right thing. The cost is the patch
footprint, of course - we need to switch f.file to fd_file(f) all over
the tree, and it's not doable with simple search and replace; there are
false positives, etc.
Note that the sole member of that structure is an opaque
unsigned long - all accesses should be done via wrappers and I don't
want to use a name that would invite manual casts to file pointers,
etc. The value of that member is equal either to (unsigned long)p | flags,
p being an address of some struct file instance, or to 0 for an empty fd.
For now the new predicate (fd_empty(f)) has no users; all the
existing checks have form (!fd_file(f)). We will convert to fd_empty()
use later; here we only define it (and tell the compiler that it's
unlikely to return true).
This commit only deals with representation change; there will
be followups.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
This commit converts (almost) all of f.file to
fd_file(f). It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).
NOTE: mass conversion to fd_empty(), tempting as it
might be, is a bad idea; better do that piecewise in commit
that convert from fdget...() to CLASS(...).
[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
caught by git; fs/stat.c one got caught by git grep]
[fs/xattr.c conflict]
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Changes the create_cq verb signature by sending the entire uverbs attr
bundle as a parameter. This allows drivers to send driver specific attrs
through ioctl for the create_cq verb and access them in their driver
specific code.
Also adds a new enum value for driver specific ioctl attributes for
methods already supporting UHW.
Link: https://lore.kernel.org/r/ed147343987c0d43fd391c1b2f85e2f425747387.1719512393.git.leon@kernel.org
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
When a struct containing a flexible array is included in another struct,
and there is a member after the struct-with-flex-array, there is a
possibility of memory overlap. These cases must be audited [1]. See:
struct inner {
...
int flex[];
};
struct outer {
...
struct inner header;
int overlap;
...
};
This is the scenario for all the "struct *_filter" structures that are
included in the following "struct ib_flow_spec_*" structures:
struct ib_flow_spec_eth
struct ib_flow_spec_ib
struct ib_flow_spec_ipv4
struct ib_flow_spec_ipv6
struct ib_flow_spec_tcp_udp
struct ib_flow_spec_tunnel
struct ib_flow_spec_esp
struct ib_flow_spec_gre
struct ib_flow_spec_mpls
The pattern is like the one shown below:
struct *_filter {
...
u8 real_sz[];
};
struct ib_flow_spec_* {
...
struct *_filter val;
struct *_filter mask;
};
In this case, the trailing flexible array "real_sz" is never allocated
and is only used to calculate the size of the structures. Here the use
of the "offsetof" helper can be changed by the "sizeof" operator because
the goal is to get the size of these structures. Therefore, the trailing
flexible arrays can also be removed.
However, due to the trailing padding that can be induced in structs it
is possible that the:
offsetof(struct *_filter, real_sz) != sizeof(struct *_filter)
This situation happens with the "struct ib_flow_ipv6_filter" and to
avoid it the "__packed" macro is used in this structure. But now, the
"sizeof(struct ib_flow_ipv6_filter)" has changed. This is not a problem
since this size is not used in the code.
The situation now is that "sizeof(struct ib_flow_spec_ipv6)" has also
changed (this struct contains the struct ib_flow_ipv6_filter). This is
also not a problem since it is only used to set the size of the "union
ib_flow_spec", which can store all the "ib_flow_spec_*" structures.
Link: https://lore.kernel.org/r/20240217142913.4285-1-erick.archer@gmx.com
Signed-off-by: Erick Archer <erick.archer@gmx.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Add netlink command that enables/disables privileged QKEY by default.
It is disabled by default, since according to IB spec only privileged
users are allowed to use privileged QKEY.
According to the IB specification rel-1.6, section 3.5.3:
"QKEYs with the most significant bit set are considered controlled
QKEYs, and a HCA does not allow a consumer to arbitrarily specify a
controlled QKEY."
Using rdma tool,
$rdma system set privileged-qkey on
When enabled non-privileged users would be able to use
controlled QKEYs which are considered privileged.
Using rdma tool,
$rdma system set privileged-qkey off
When disabled only privileged users would be able to use
controlled QKEYs.
You can also use the command below to check the parameter state:
$rdma system show
netns shared privileged-qkey off copy-on-fork on
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/90398be70a9d23d2aa9d0f9fd11d2c264c1be534.1696848201.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
According to the IB specification rel-1.6, section 3.5.3:
"QKEYs with the most significant bit set are considered controlled
QKEYs, and a HCA does not allow a consumer to arbitrarily specify a
controlled QKEY."
Thus, block non-privileged users from setting such a QKEY.
Cc: stable@vger.kernel.org
Fixes: bc38a6abdd5a ("[PATCH] IB uverbs: core implementation")
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://lore.kernel.org/r/c00c809ddafaaf87d6f6cb827978670989a511b3.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Set 'iova' and 'length' on ib_mr in ib_uverbs and ib_core layers to let all
drivers have the members filled. Also, this commit removes redundancy in
the respective drivers.
Previously, commit 04c0a5fcfcf65 ("IB/uverbs: Set IOVA on IB MR in uverbs
layer") changed to set 'iova', but seems to have missed 'length' and the
ib_core layer at that time.
Fixes: 04c0a5fcfcf65 ("IB/uverbs: Set IOVA on IB MR in uverbs layer")
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://lore.kernel.org/r/20220921080844.1616883-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Split out flags from ib_device::device_cap_flags that are only used
internally to the kernel into kernel_cap_flags that is not part of the
uapi. This limits the device_cap_flags to being the same bitmap that will
be copied to userspace.
This cleanly splits out the uverbs flags from the kernel flags to avoid
confusion in the flags bitmap.
Add some short comments describing which each of the kernel flags is
connected to. Remove unused kernel flags.
Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
1) Part of enum ib_device_cap_flags are used by ibv_query_device(3)
or ibv_query_device_ex(3), so we define them in
include/uapi/rdma/ib_user_verbs.h and only expose them to userspace.
2) Reformat enum ib_device_cap_flags by removing the indent before '='.
Link: https://lore.kernel.org/r/20220331032419.313904-2-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The struct member variable create_flags is assigned twice. Remove the
unnecessary assignment.
Fixes: ece9ca97ccdc ("RDMA/uverbs: Do not check the input length on create_cq/qp paths")
Link: https://lore.kernel.org/r/20211207064607.541695-1-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
If the driver returns a new MR during rereg it has to fill it with the
IOVA from the proper source. If IB_MR_REREG_TRANS is set then the IOVA is
cmd.hca_va, otherwise the IOVA comes from the old MR. mlx5 for example has
two calls inside rereg_mr:
return create_real_mr(new_pd, umem, mr->ibmr.iova,
new_access_flags);
and
return create_real_mr(new_pd, new_umem, iova, new_access_flags);
Unconditionally overwriting the iova in the newly allocated MR will
corrupt the iova if the first path is used.
Remove the redundant initializations from ib_uverbs_rereg_mr().
Fixes: 6e0954b11c05 ("RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr")
Link: https://lore.kernel.org/r/4b0a31bbc372842613286a10d7a8cbb0ee6069c7.1635400472.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Unify create QP creation interface to make clean approach to create
XRC_TGT and regular QPs.
Link: https://lore.kernel.org/r/5cd50e7d8ad9112545a1a61dea62799a5cb3224a.1628014762.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The QP usecnts were incremented through QP attributes structure while
decreased through QP itself. Rely on the ib_creat_qp_user() code that
initialized all QP parameters prior returning to the user and increment
exactly like destroy does.
Link: https://lore.kernel.org/r/25d256a3bb1fc480b77d7fe439817b993de48610.1628014762.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
All QP creation flows called ib_create_qp_security(), but differently.
This caused to the need to provide exclusion conditions for the XRC_TGT,
because such QP already had selinux configuration call.
In order to fix it, move ib_create_qp_security() to the general QP
creation routine.
Link: https://lore.kernel.org/r/4d7cd6f5828aca37fb62283e6b126b73ab86b18c.1628014762.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Linux 5.13-rc7
Needed for dependencies in following patches. Merge conflict in rxe_cmop.c
resolved by compining both patches.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
There are some '%u' for 'int' and '%d' for 'unsigend int', they should be
fixed.
Link: https://lore.kernel.org/r/1623325232-30900-1-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Validate port value provided by the user and with that remove no longer
needed validation by the driver. The missing check in the mlx5_ib driver
could cause to the below oops.
Call trace:
_create_flow_rule+0x2d4/0xf28 [mlx5_ib]
mlx5_ib_create_flow+0x2d0/0x5b0 [mlx5_ib]
ib_uverbs_ex_create_flow+0x4cc/0x624 [ib_uverbs]
ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xd4/0x150 [ib_uverbs]
ib_uverbs_cmd_verbs.isra.7+0xb28/0xc50 [ib_uverbs]
ib_uverbs_ioctl+0x158/0x1d0 [ib_uverbs]
do_vfs_ioctl+0xd0/0xaf0
ksys_ioctl+0x84/0xb4
__arm64_sys_ioctl+0x28/0xc4
el0_svc_common.constprop.3+0xa4/0x254
el0_svc_handler+0x84/0xa0
el0_svc+0x10/0x26c
Code: b9401260 f9615681 51000400 8b001c20 (f9403c1a)
Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs")
Link: https://lore.kernel.org/r/faad30dc5219a01727f47db3dc2f029d07c82c00.1623309971.git.leonro@nvidia.com
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The mlx4 and mlx5 implemented differently the WQ input checks. Instead of
duplicating mlx4 logic in the mlx5, let's prepare the input in the central
place.
The mlx5 implementation didn't check for validity of state input. It is
not real bug because our FW checked that, but still worth to fix.
Fixes: f213c0527210 ("IB/uverbs: Add WQ support")
Link: https://lore.kernel.org/r/ac41ad6a81b095b1a8ad453dcf62cf8d3c5da779.1621413310.git.leonro@nvidia.com
Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Space is not required after '(', before ')', before ',' and between '*'
and symbol name of a definition.
Link: https://lore.kernel.org/r/1617783353-48249-5-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Current code uses many different types when dealing with a port of a RDMA
device: u8, unsigned int and u32. Switch to u32 to clean up the logic.
This allows us to make (at least) the core view consistent and use the
same type. Unfortunately not all places can be converted. Many uverbs
functions expect port to be u8 so keep those places in order not to break
UAPIs. HW/Spec defined values must also not be changed.
With the switch to u32 we now can support devices with more than 255
ports. U32_MAX is reserved to make control logic a bit easier to deal
with. As a device with U32_MAX ports probably isn't going to happen any
time soon this seems like a non issue.
When a device with more than 255 ports is created uverbs will report the
RDMA device as having 255 ports as this is the max currently supported.
The verbs interface is not changed yet because the IBTA spec limits the
port size in too many places to be u8 and all applications that relies in
verbs won't be able to cope with this change. At this stage, we are
extending the interfaces that are using vendor channel solely
Once the limitation is lifted mlx5 in switchdev mode will be able to have
thousands of SFs created by the device. As the only instance of an RDMA
device that reports more than 255 ports will be a representor device and
it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other
ULPs aren't effected by this change and their sysfs/interfaces that are
exposes to userspace can remain unchanged.
While here cleanup some alignment issues and remove unneeded sanity
checks (mainly in rdmavt),
Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
An INI QP doesn't require receive CQ, the creation flow sets the recv counts
to zero:
if (cmd->qp_type == IB_QPT_XRC_INI) {
cmd->max_recv_wr = 0;
cmd->max_recv_sge = 0;
The new IOCTL path also does not set the rcq, so make things the same.
Link: https://lore.kernel.org/r/20201216071755.149449-1-yangx.jy@cn.fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
mlx5 has an ugly flow where it tries to allocate a new MR and replace the
existing MR in the same memory during rereg. This is very complicated and
buggy. Instead of trying to replace in-place inside the driver, provide
support from uverbs to change the entire HW object assigned to a handle
during rereg_mr.
Since destroying a MR is allowed to fail (ie if a MW is pointing at it)
and can't be detected in advance, the algorithm creates a completely new
uobject to hold the new MR and swaps the IDR entries of the two objects.
The old MR in the temporary IDR entry is destroyed, and if it fails
rereg_mr succeeds and destruction is deferred to FD release. This
complexity is why this cannot live in a driver safely.
Link: https://lore.kernel.org/r/20201130075839.278575-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
No reason only one caller checks this. This properly blocks ODP
from the rereg flow if the device does not support ODP.
Link: https://lore.kernel.org/r/20201130075839.278575-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Unknown flags should be EOPNOTSUPP, only zero flags is EINVAL. Flags is
actually the rereg action to perform.
The checking of the start/hca_va/etc is also redundant and ib_umem_get()
does these checks and returns proper error codes.
Fixes: 7e6edb9b2e0b ("IB/core: Add user MR re-registration support")
Link: https://lore.kernel.org/r/20201130075839.278575-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The latest changes in restrack name handling allowed to simplify the QP
creation code to support all types of QPs.
For example XRC QP are presented with rdmatool.
$ ibv_xsrq_pingpong &
$ rdma res show qp
link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/1 lqpn 7 type UD state RTS sq-psn 0 comm [mlx5_ib]
link ibp0s9/1 lqpn 42 type XRC_TGT state INIT sq-psn 0 path-mig-state MIGRATED comm [ib_uverbs]
link ibp0s9/1 lqpn 43 type XRC_INI state INIT sq-psn 0 path-mig-state MIGRATED pdn 197 pid 419 comm ibv_xsrq_pingpong
Link: https://lore.kernel.org/r/20201117070148.1974114-4-leon@kernel.org
Reviewed-by: Mark Zhang <markz@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Remove the ib_is_destroyable_retryable() concept.
The idea here was to allow the drivers to forcibly clean the HW object
even if they otherwise didn't want to (eg because of usecnt). This was an
attempt to clean up in a world where drivers were not allowed to fail HW
object destruction.
Now that we are going back to allowing HW objects to fail destroy this
doesn't make sense. Instead if a uobject's HW object can't be destroyed it
is left on the uobject list and it is up to uverbs_destroy_ufile_hw() to
clean it. Multiple passes over the uobject list allow hidden dependencies
to be resolved. If that fails the HW driver is broken, throw a WARN_ON and
leak the HW object memory.
All the other tricky failure paths (eg on creation error unwind) have
already been updated to this new model.
Link: https://lore.kernel.org/r/20201104144556.3809085-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Drivers that need a uverbs AH should instead set the create_user_ah() op
similar to reg_user_mr(). MODIFY_AH and QUERY_AH cmds were never
implemented so are just deleted.
Link: https://lore.kernel.org/r/11-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Each driver should check that it can support the provided attr_mask during
modify_qp. IB_USER_VERBS_EX_CMD_MODIFY_QP was being used to block
modify_qp_ex because the driver didn't check RATE_LIMIT.
Link: https://lore.kernel.org/r/6-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
These functions all depend on the driver providing a specific op:
- REREG_MR is rereg_user_mr(). bnxt_re set this without providing the op
- ATTACH/DEATCH_MCAST is attach_mcast()/detach_mcast(). usnic set this
without providing the op
- OPEN_QP doesn't involve the driver but requires a XRCD. qedr provides
xrcd but forgot to set it, usnic doesn't provide XRCD but set it anyhow.
- OPEN/CLOSE_XRCD are the ops alloc_xrcd()/dealloc_xrcd()
- CREATE_SRQ/DESTROY_SRQ are the ops create_srq()/destroy_srq()
- QUERY/MODIFY_SRQ is op query_srq()/modify_srq(). hns sets this but
sometimes supplies a NULL op.
- RESIZE_CQ is op resize_cq(). bnxt_re sets this boes doesn't supply an op
- ALLOC/DEALLOC_MW is alloc_mw()/dealloc_mw(). cxgb4 provided an
(now deleted) implementation but no userspace
All drivers were checked that no drivers provide the op without also
setting uverbs_cmd_mask so this should have no functional change.
Link: https://lore.kernel.org/r/4-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Since a while now the uverbs layer checks if the driver implements a
function before allowing the ucmd to proceed. This largely obsoletes the
cmd_mask stuff, but there is some tricky bits in drivers preventing it
from being removed.
Remove the easy elements of uverbs_ex_cmd_mask by pre-setting them in the
core code. These are triggered soley based on the related ops function
pointer.
query_device_ex is not triggered based on an op, but all drivers already
implement something compatible with the extension, so enable it globally
too.
Link: https://lore.kernel.org/r/2-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The ioctl flow checks that the user provides only a supported list of QP
types, while write flow didn't do it and relied on the driver to check
it. Align those flows to fail as early as possible.
Link: https://lore.kernel.org/r/20200926102450.2966017-8-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Use rdma_restrack_set_name() and rdma_restrack_parent_name() instead of
tricky uses of rdma_restrack_attach_task()/rdma_restrack_uadd().
This uniformly makes all restracks add'd using rdma_restrack_add().
Link: https://lore.kernel.org/r/20200922091106.2152715-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Refactor the restrack code to make sure the kref inside the restrack entry
properly kref's the object in which it is embedded. This slight change is
needed for future conversions of MR and QP which are refcounted before the
release and kfree.
The ideal flow from ib_core perspective as follows:
* Allocate ib_* structure with rdma_zalloc_*.
* Set everything that is known to ib_core to that newly created object.
* Initialize kref with restrack help
* Call to driver specific allocation functions.
* Insert into restrack DB
....
* Return and release restrack with restrack_put.
Largely this means a rdma_restrack_new() should be called near allocating
the containing structure.
Link: https://lore.kernel.org/r/20200922091106.2152715-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Move struct ib_rwq_ind_table allocation to ib_core.
Link: https://lore.kernel.org/r/20200902081623.746359-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Move allocation and destruction of memory windows under ib_core
responsibility and clean drivers to ensure that no updates to MW
ib_core structures are done in driver layer.
Link: https://lore.kernel.org/r/20200902081623.746359-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The "domain" argument is constant and modern device (mlx5) doesn't support
anything except IB_FLOW_DOMAIN_USER, so delete this extra parameter and
simplify code.
Link: https://lore.kernel.org/r/20200730081235.1581127-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Fix reported by kbuild warning.
drivers/infiniband/core/uverbs_cmd.c:1897:47: warning: Shifting signed 32-bit value by 31 bits is undefined behaviour [shiftTooManyBitsSigned]
BUILD_BUG_ON(IB_USER_LAST_QP_ATTR_MASK == (1 << 31));
^
Link: https://lore.kernel.org/r/20200720175627.1273096-3-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The kbuild reported the following warning, so clean whole uverbs_cmd.c
file.
drivers/infiniband/core/uverbs_cmd.c:1066:6: warning: Variable 'ret' is reassigned a value before the old one has been used. [redundantAssignment]
ret = uverbs_request(attrs, &cmd, sizeof(cmd));
^
drivers/infiniband/core/uverbs_cmd.c:1064:0: note: Variable 'ret' is reassigned a value before the old one has been used.
int ret = -EINVAL;
^
Link: https://lore.kernel.org/r/20200720175627.1273096-2-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The automatic object lifetime model allows us to change the write()
interface to have the same logic as the ioctl() path. Update the
create/alloc functions to be in the following format, so the code flow
will be the same:
* Allocate objects
* Initialize them
* Call to the drivers, this is last step that is allowed to fail
* Finalize object
* Return response and allow to core code to handle abort/commit
respectively.
Link: https://lore.kernel.org/r/20200719052223.75245-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Set IOVA on IB MR in uverbs layer to let all drivers have it, this
includes both reg/rereg MR flows.
As part of this change cleaned-up this setting from the drivers that
already did it by themselves in their user flows.
Fixes: e6f0330106f4 ("mlx4_ib: set user mr attributes in struct ib_mr")
Link: https://lore.kernel.org/r/20200630093916.332097-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|