summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-08-10RDMA/netlink: Rename netlink callback structLeon Romanovsky
The RDMA netlink client infrastructure was removed and made obsolete. The old infrastructure defined struct ibnl_client_cbs. Now that all uses of this have been updated to the new infrastructure, rename the struct to be compliant with the current stack naming standards: struct rdma_nl_cbs. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10RDMA/netlink: Simplify and rename ibnl_chk_listenersLeon Romanovsky
Make ibnl_chk_listeners function to be one line by removing unneeded comparison. Rename that function to be complaint to other functions in RDMA netlink. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10RDMA/netlink: Rename and remove redundant parameter from ibnl_multicastLeon Romanovsky
The pointer to netlink header was not used in the ibnl_multicast function, so let's remove it and simplify the function signature. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10RDMA/netlink: Rename and remove redundant parameter from ibnl_unicast*Leon Romanovsky
Netlink message header is not needed for unicast reply, hence remove it. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10RDMA/netlink: Simplify the put_msg and put_attrLeon Romanovsky
Reuse standard macros to cancel the netlink message in case of error. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10RDMA/netlink: Add flag to consolidate common handlingLeon Romanovsky
Add ability to provide flags to control RDMA netlink callbacks and convert addr.c and sa_query.c to be first users of such infrastructure. It allows to move their CAP_NET_ADMIN checks into netlink core. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10RDMA/iwcm: Remove extra EXPORT_SYMBOLSLeon Romanovsky
The iwcm exports functions which are not used outside of ib_core. This patch simply removes these EXPORT_SYMBOLS. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
2017-08-10RDMA/iwcm: Remove useless check of netlink client validityLeon Romanovsky
RDMA netlink implementation guarantees that supplied client number is in allowed range. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
2017-08-10RDMA/netlink: Avoid double pass for RDMA netlink messagesLeon Romanovsky
The standard netlink_rcv_skb function skips messages without NLM_F_REQUEST flag in it, while SA netlink client issues them. In commit bc10ed7d3d19 ("IB/core: Add rdma netlink helper functions") the local function was introduced to allow such messages. This led to double pass for every incoming message. In this patch, we unify that local implementation and netlink_rcv_skb functions, so there will be no need for double pass anymore. As a outcome, this combined function gained more strict check for NLM_F_REQUEST flag and it is now allowed for SA pathquery client only. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10RDMA/netlink: Remove redundant owner option for netlink callbacksLeon Romanovsky
Owner field is not needed to be set because netlink is part of ib_core which will be unloaded last after all other modules are unloaded. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10RDMA/netlink: Remove netlink clients infrastructureLeon Romanovsky
RDMA netlink has a complicated infrastructure for dynamically registering and de-registering netlink clients to the NETLINK_RDMA group. The complicated portion of this code is not widely used because 2 of the 3 current clients are statically compiled together with netlink.c. The infrastructure, therefore, is deemed overkill. Refactor the code to eliminate the dynamically added clients. Now all clients are pre-registered in a client array at compile time, and at run time they merely check-in with the infrastructure to pass their callback table for inclusion in the pre-sized client array. This also allows for future cleanups and removal of unneeded code in the iwcm* netlink handler. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
2017-08-09RDMA/core: Add wait/retry version of ibnl_unicastIsmail, Mustafa
Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait, and modify ibnl_unicast to not wait/retry. This eliminates the undesirable wait for future users of ibnl_unicast. Change Portmapper calls originating from kernel to user-space to use ibnl_unicast_wait and take advantage of the wait/retry logic in netlink_unicast. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-08-08nvme-rdma: use intelligent affinity based queue mappingsSagi Grimberg
Use the generic block layer affinity mapping helper. Also, limit nr_hw_queues to the rdma device number of irq vectors as we don't really need more. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08block: Add rdma affinity based queue mapping helperSagi Grimberg
Like pci and virtio, we add a rdma helper for affinity spreading. This achieves optimal mq affinity assignments according to the underlying rdma device affinity maps. Reviewed-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08mlx5: support ->get_vector_affinitySagi Grimberg
Simply refer to the generic affinity mask helper. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08RDMA/core: expose affinity mappings per completion vectorSagi Grimberg
This will allow ULPs to intelligently locate threads based on completion vector cpu affinity mappings. In case the driver does not expose a get_vector_affinity callout, return NULL so the caller can maintain a fallback logic. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08mlx5: move affinity hints assignments to generic codeSagi Grimberg
generic api takes care of spreading affinity similar to what mlx5 open coded (and even handles better asymmetric configurations). Ask the generic API to spread affinity for us, and feed him pre_vectors that do not participate in affinity settings (which is an improvement to what we had before). The affinity assignments should match what mlx5 tried to do earlier but now we do not set affinity to async, cmd and pages dedicated vectors. Also, remove mlx5e_get_cpu and introduce mlx5e_get_node (used for allocation purposes) and mlx5_get_vector_affinity (for indirection table construction) as they provide the needed information. Luckily, we have generic helpers to get cpumask and node given a irq vector. mlx5_get_vector_affinity will be used by mlx5_ib in a subsequent patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08mlx5e: don't assume anything on the irq affinity mappings of the deviceSagi Grimberg
mlx5e currently assumes that irq affinity is really spread first irq vectors across device home node cpus, with the new generic affinity mappings this is no longer the case, hence mlxe should not rely on this anymore. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08mlx5: convert to generic pci_alloc_irq_vectorsSagi Grimberg
Now that we have a generic code to allocate an array of irq vectors and even correctly spread their affinity, correctly handle cpu hotplug events and more, were much better off using it. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08IB/CM: Set appropriate slid and dlid when handling CM requestDasaratharaman Chandramouli
If extended LIDs are being used, a connection request contains OPA GIDs in them. Extract the lids from the OPA gids and populate slid/dlid fields in the path records that are created when handling a connection request. Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08IB/CM: Create appropriate path records when handling CM requestDasaratharaman Chandramouli
When handling an incoming conection request, ib_cm creates either an IB or an OPA path record based on the gid field in the request. Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08IB/CM: Add OPA Path record support to CMHiatt, Don
Add OPA path record support to the Connection Manager. Signed-off-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08IB/core: Change wc.slid from 16 to 32 bitsHiatt, Don
slid field in struct ib_wc is increased to 32 bits. This enables core components to use larger LIDs if needed. The user ABI is unchanged and return 16 bit values when queried. Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08IB/core: Change port_attr.sm_lid from 16 to 32 bitsDasaratharaman Chandramouli
sm_lid field in struct ib_port_attr is increased to 32 bits. This enables core components to use larger LIDs if needed. The user ABI is unchanged and return 16 bit values when queried. Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08IB/core: Change port_attr.lid size from 16 to 32 bitsDasaratharaman Chandramouli
lid field in struct ib_port_attr is increased to 32 bits. This enables core components to use larger LIDs if needed. The user ABI is unchanged and return 16 bit values when queried. Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08IB/mad: Change slid in RMPP recv from 16 to 32 bitsDasaratharaman Chandramouli
MAD RMPP contains slid field which is 16 bits in length, increase it to 32 bits. Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08IB/IPoIB: Increase local_lid to 32 bitsDasaratharaman Chandramouli
IPoIB contains local_lid field which is 16 bits in length, increase it to 32 bits. Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08IB/srpt: Increase lid and sm_lid to 32 bitsDasaratharaman Chandramouli
srpt contains lid and sm_lid fields which are 16 bits in length, increase them to 32 bits. Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08IB/core: Convert ah_attr from OPA to IB when copying to userDasaratharaman Chandramouli
OPA address handle atttibutes that have 32 bit LIDs would have to be converted to IB address handle attribute with the LID field programmed in the GID before copying to user space. Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-07Merge tag 'rdma-rc-2017-07-26' of ↵Doug Ledford
git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma into leon-ipoib IPoIB fixes for 4.13 The patchset provides various fixes for IPoIB. It is combination of fixes to various issues discovered during verification along with static checkers cleanup patches. Most of the patches are from pre-git era and hence lack of Fixes lines. There is one exception in this IPoIB group - addition of patch revert: Revert "IB/core: Allow QP state transition from reset to error", but it followed by proper fix to the annoying print, so I thought it is appropriate to include it. Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04IB/hns: checking for IS_ERR() instead of NULLDan Carpenter
The hns_roce_v1_create_lp_qp() returns NULL on error, not error pointers. Fixes: bfcc681bd09d ("IB/hns: Fix the bug when free mr") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04RDMA/mlx5: Fix existence check for extended address vectorLeon Romanovsky
The extended address vector is the highest bit in be32 variable, but it was compared with the lowest. This patch fixes the endianness of that check and removes already declared define. Fixes: 17d2f88f92ce ("IB/mlx5: Add ODP atomics support") Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04IB/uverbs: Fix device cleanupYishai Hadas
Uverbs device should be cleaned up only when there is no potential usage of. As part of ib_uverbs_remove_one which might be triggered upon reset flow the device reference count is decreased as expected and leave the final cleanup to the FDs that were opened. Current code increases reference count upon opening a new command FD and decreases it upon closing the file. The event FD is opened internally and rely on the command FD by taking on it a reference count. In case that the command FD was closed and just later the event FD we may ensure that the device resources as of srcu are still alive as they are still in use. Fixing the above by moving the reference count decreasing to the place where the command FD is really freed instead of doing that when it was just closed. fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04RDMA/uverbs: Prevent leak of reserved fieldLeon Romanovsky
initialize to zero the response structure to prevent the leakage of "resp.reserved" field. drivers/infiniband/core/uverbs_cmd.c:1178 ib_uverbs_resize_cq() warn: check that 'resp.reserved' doesn't leak information Fixes: 33b9b3ee9709 ("IB: Add userspace support for resizing CQs") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04IB/core: Fix race condition in resolving IP to MACParav Pandit
Currently while resolving IP address to MAC address single delayed work is used for resolving multiple such resolve requests. This singled work is essentially performs two tasks. (a) any retry needed to resolve and (b) it executes the callback function for all completed requests While work is executing callbacks, any new work scheduled on for this workqueue is lost because workqueue has completed looking at all pending requests and now looking at callbacks, but work is still under execution. Any further retry to look at pending requests in process_req() after executing callbacks would lead to similar race condition (may be reduce the probably further but doesn't eliminate it). Retrying to enqueue work that from queue_req() context is not something rest of the kernel modules have followed. Therefore fix in this patch utilizes kernel facility to enqueue multiple work items to a workqueue. This ensures that no such requests gets lost in synchronization. Request list is still maintained so that rdma_cancel_addr() can unlink the request and get the completion with error sooner. Neighbour update event handling continues to be handled in same way as before. Additionally process_req() work entry cancels any pending work for a request that gets completed while processing those requests. Originally ib_addr was ST workqueue, but it became MT work queue with patch of [1]. This patch again makes it similar to ST so that neighbour update events handler work item doesn't race with other work items. In one such below trace, (though on 4.5 based kernel) it can be seen that process_req() never executed the callback, which is likely for an event that was schedule by queue_req() when previous callback was getting executed by workqueue. [<ffffffff816b0dde>] schedule+0x3e/0x90 [<ffffffff816b3c45>] schedule_timeout+0x1b5/0x210 [<ffffffff81618c37>] ? ip_route_output_flow+0x27/0x70 [<ffffffffa027f9c9>] ? addr_resolve+0x149/0x1b0 [ib_addr] [<ffffffff816b228f>] wait_for_completion+0x10f/0x170 [<ffffffff810b6140>] ? try_to_wake_up+0x210/0x210 [<ffffffffa027f220>] ? rdma_copy_addr+0xa0/0xa0 [ib_addr] [<ffffffffa0280120>] rdma_addr_find_l2_eth_by_grh+0x1d0/0x278 [ib_addr] [<ffffffff81321297>] ? sub_alloc+0x77/0x1c0 [<ffffffffa02943b7>] ib_init_ah_from_wc+0x3a7/0x5a0 [ib_core] [<ffffffffa0457aba>] cm_req_handler+0xea/0x580 [ib_cm] [<ffffffff81015982>] ? __switch_to+0x212/0x5e0 [<ffffffffa04582fd>] cm_work_handler+0x6d/0x150 [ib_cm] [<ffffffff810a14c1>] process_one_work+0x151/0x4b0 [<ffffffff810a1940>] worker_thread+0x120/0x480 [<ffffffff816b074b>] ? __schedule+0x30b/0x890 [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0 [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0 [<ffffffff810a6b1e>] kthread+0xce/0xf0 [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff816b53a2>] ret_from_fork+0x42/0x70 [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70 INFO: task kworker/u144:1:156520 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u144:1 D ffff883ffe1d7600 0 156520 2 0x00000080 Workqueue: ib_addr process_req [ib_addr] ffff883f446fbbd8 0000000000000046 ffff881f95280000 ffff881ff24de200 ffff883f66120000 ffff883f446f8008 ffff881f95280000 ffff883f6f9208c4 ffff883f6f9208c8 00000000ffffffff ffff883f446fbbf8 ffffffff816b0dde [1] http://lkml.iu.edu/hypermail/linux/kernel/1608.1/05834.html Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Always perform offline transitionSebastian Sanchez
Always initiate an offline transition request when a link down occurs. The firmware will use this request to confirm that the driver has seen the link down message. A host version is set to indicate this driver behavior to the firmware. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Prevent link down request double queuingSebastian Sanchez
When link interrupts occur, multiple link down requests could be queued up when only one is needed. This could get the hfi1 out of sync with its link partner during LNI. Only allow one link down request to be queued at any one time. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Create workqueue for link eventsSebastian Sanchez
Currently, link down interrupts queue link entries on a workqueue intended for sending events only. Create a workqueue for queuing link events. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/{rdmavt, hfi1, qib}: Fix panic with post receive and SGE compressionMike Marciniszyn
The server side of qperf panics as follows: [242446.336860] IP: report_bug+0x64/0x10 [242446.341031] PGD 1c0c067 [242446.341032] P4D 1c0c067 [242446.343951] PUD 1c0d063 [242446.346870] PMD 8587ea067 [242446.349788] PTE 800000083e14016 [242446.352901] [242446.358352] Oops: 0003 [#1] SM [242446.437919] CPU: 1 PID: 7442 Comm: irq/92-hfi1_0 k Not tainted 4.12.0-mam-asm #1 [242446.446365] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/201 [242446.458397] task: ffff8808392d2b80 task.stack: ffffc9000664000 [242446.465097] RIP: 0010:report_bug+0x64/0x10 [242446.469859] RSP: 0018:ffffc900066439c0 EFLAGS: 0001000 [242446.475784] RAX: ffffffffa06647e4 RBX: ffffffffa06461e1 RCX: 000000000000000 [242446.483840] RDX: 0000000000000907 RSI: ffffffffa0675040 RDI: ffffffffffff740 [242446.491897] RBP: ffffc900066439e0 R08: 0000000000000001 R09: 000000000000025 [242446.499953] R10: ffffffff81a253df R11: 0000000000000133 R12: ffffc90006643b3 [242446.508010] R13: ffffffffa065bbf0 R14: 00000000000001e5 R15: 000000000000000 [242446.516067] FS: 0000000000000000(0000) GS:ffff88085f640000(0000) knlGS:000000000000000 [242446.525191] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003 [242446.531698] CR2: ffffffffa06647ee CR3: 0000000001c09000 CR4: 00000000001406e [242446.539756] Call Trace [242446.542582] fixup_bug+0x2c/0x5 [242446.546277] do_trap+0x12b/0x18 [242446.549972] do_error_trap+0x89/0x11 [242446.554171] ? hfi1_copy_sge+0x271/0x2b0 [hfi1 [242446.559324] ? ttwu_do_wakeup+0x1e/0x14 [242446.563795] ? ttwu_do_activate+0x77/0x8 [242446.568363] do_invalid_op+0x20/0x3 [242446.572448] invalid_op+0x1e/0x3 [242446.576247] RIP: 0010:hfi1_copy_sge+0x271/0x2b0 [hfi1 [242446.582075] RSP: 0018:ffffc90006643be8 EFLAGS: 0001004 [242446.587999] RAX: 0000000000000000 RBX: ffff88083e0fa240 RCX: 000000000000000 [242446.596058] RDX: 0000000000000000 RSI: ffff880842508000 RDI: ffff88083e0fa24 [242446.604116] RBP: ffffc90006643c28 R08: 0000000000000000 R09: 000000000000000 [242446.612172] R10: ffffc90009473640 R11: 0000000000000133 R12: 000000000000000 [242446.620228] R13: 0000000000000000 R14: 0000000000002000 R15: ffff88084250800 [242446.628293] ? hfi1_copy_sge+0x1a1/0x2b0 [hfi1 [242446.633449] hfi1_rc_rcv+0x3da/0x1270 [hfi1 [242446.638312] ? sc_buffer_alloc+0x113/0x150 [hfi1 [242446.643662] hfi1_ib_rcv+0x1c9/0x2e0 [hfi1 [242446.648428] process_receive_ib+0x19a/0x270 [hfi1 [242446.653866] ? process_rcv_qp_work+0xd2/0x160 [hfi1 [242446.659505] handle_receive_interrupt_nodma_rtail+0x184/0x2e0 [hfi1 [242446.666693] ? irq_finalize_oneshot+0x100/0x10 [242446.671846] receive_context_thread+0x1b/0x140 [hfi1 [242446.677576] irq_thread_fn+0x1e/0x4 [242446.681659] irq_thread+0x13c/0x1b [242446.685646] ? irq_forced_thread_fn+0x60/0x6 [242446.690604] kthread+0x112/0x15 [242446.694298] ? irq_thread_check_affinity+0xe0/0xe [242446.699738] ? kthread_park+0x60/0x6 [242446.703919] ? do_syscall_64+0x67/0x15 [242446.708292] ret_from_fork+0x25/0x3 [242446.712374] Code: 63 78 04 44 0f b7 70 08 41 89 d0 4c 8d 2c 38 41 83 e0 01 f6 c2 02 74 17 66 45 85 c0 74 11 f6 c2 04 b9 01 00 00 00 75 bb 83 ca 04 <66> 89 50 0a 66 45 85 c0 74 52 0f b6 48 0b 41 0f b7 f6 4d 89 e0 [242446.733527] RIP: report_bug+0x64/0x100 RSP: ffffc900066439c [242446.739935] CR2: ffffffffa06647e [242446.743763] ---[ end trace 0e90a20d0aa494f7 ]-- The root cause is that the qib/hfi1 post receive call to rvt_lkey_ok() doesn't interpret the new return value from rvt_lkey_ok() properly leading to an mr reference count underrun. Additionally, remove an unused argument in rvt_sge_adjacent() aw well as an unneeded incr local in rvt_post_one_wr(). Fixes: Commit 14fe13fcd3af ("IB/rdmavt: Compress adjacent SGEs in rvt_lkey_ok()") Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Disambiguate corruption and uninitialized error casesJan Sokolowski
The error messages when checksum validation of the platform configuration fields populated into the ASIC scratch registers fails are ambiguous. Disambiguate them. Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Only set fd pointer when base context is completely initializedMichael J. Ruhl
The allocate_ctxt() function adds the context to the fd data structure. Since the context is not completely initialized, this can cause confusion as to whether the context is valid or not. Move the fd reference from allocate_ctxt() to setup_base_ctxt(). Update the necessary functions to be aware of this move. Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Do not enable disabled port on cable insertJan Sokolowski
Fix issue where a disabled port can be enabled by inserting a cable. The port should be explicitly enabled instead. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Harden state transition to Armed and ActiveAlex Estrin
There is a window that allows other threads to read state of 'host_link_state' as a new, before the hardware actual state is set. This patch closes the window by indicating a new state only after hardware transition is complete. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Split copy_to_user data copy for better securityMichael J. Ruhl
A copy_to_user() call assumes that two members of a data structure are sequential. Since this may not always be true, separate the copies to ensure a safe copy. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Verify port data VLs credits on transition to ArmedAlex Estrin
There is a window where the FM can read the buffer control table and decide not to program buffers. When a port goes down, the code clears the table and if it is not programmed, posted SDMA descriptors will never complete due to no buffer credits. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Move saving PCI values to a separate functionBartlomiej Dudek
During PCIe initialization some registers' values from PCI config space are saved in order to restore them later (i.e. after reset). Restoring those value is done by a function called restore_pci_variables, while saving them is put directly into function hfi1_pcie_ddinit. Move saving values to a separate function in the image of restoring functionality. Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Fix initialization failure for debug firmwareByczkowski, Jakub
Loading debug signed firmware fails if started immediately after failed attempt to load production firmware. A short delay is required so add about a 100us delay after RSA check failure. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Fix code consistency for if/else blocks in chip.cJan Sokolowski
Code structure is not consistent for if/else blocks and break instructions in set_link_state for case HLS_UP_INIT. Physical state uses break in case of an error and if/else blocks for logical use cases. These blocks should be implemented consistently. Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Send MAD traps until repressedMichael J. Ruhl
A trap should be sent to the FM until the FM sends a repress message. This is in line with the IBTA 13.4.9. Add the ability to resend traps until a repress message is received. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael N. Henry <michael.n.henry@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Pass the context pointer rather than the indexMichael J. Ruhl
The hfi1_rcvctrl() function receives an index which it then converts to an rcd. Since most functions have the rcd, use that instead. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>