Age | Commit message (Collapse) | Author |
|
[ Upstream commit b5162bb6aa1ec04dff4509b025883524b6d7e7ca ]
Smatch detected a potential use-after-free of an ndlp oject in
dev_loss_tmo_callbk during driver unload or fatal error handling.
Fix by reordering code to avoid potential use-after-free if initial
nodelist reference has been previously removed.
Fixes: 4281f44ea8bf ("scsi: lpfc: Prevent NDLP reference count underflow in dev_loss_tmo callback")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-scsi/41c1d855-9eb5-416f-ac12-8b61929201a3@stanley.mountain/
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20250425194806.3585-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 92b99f1a73b7951f05590fd01640dcafdbc82c21 ]
In attempt to reduce the amount of unnecessary ndlp->lock acquisitions
in the lpfc driver, change nlpa_flag into an unsigned long bitmask and
use clear_bit/test_bit bitwise atomic APIs instead of reliance on
ndlp->lock for synchronization.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20241031223219.152342-10-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stable-dep-of: b5162bb6aa1e ("scsi: lpfc: Avoid potential ndlp use-after-free in dev_loss_tmo_callbk")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 32566a6f1ae558d0e79fed6e17a75c253367a57f ]
An RPI is tightly bound to an NDLP structure and is freed only upon
release of an NDLP object. As such, there should be no logic that frees
an RPI outside of the lpfc_nlp_release() routine. In order to reinforce
the original design usage of RPIs, remove the NLP_RELEASE_RPI flag and
related logic.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20241031223219.152342-9-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stable-dep-of: b5162bb6aa1e ("scsi: lpfc: Avoid potential ndlp use-after-free in dev_loss_tmo_callbk")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 05ae6c9c7315d844fbc15afe393f5ba5e5771126 ]
In lpfc_check_sli_ndlp(), the get_job_els_rsp64_did remote_id assignment
does not apply for GEN_REQUEST64 commands as it only has meaning for a
ELS_REQUEST64 command. So, if (iocb->ndlp == ndlp) is false, we could
erroneously return the wrong value. Fix by replacing the fallthrough
statement with a break statement before the remote_id check.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20250425194806.3585-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 23ed62897746f49f195d819ce6edeb1db27d1b72 ]
With repeated port swaps between separate fabrics, there can be multiple
registrations for fabric well known address 0xfffffe. This can cause ndlp
reference confusion due to the usage of a single ndlp ptr that stores the
rport object in fc_rport struct private storage during transport
registration. Subsequent registrations update the ndlp->rport field with
the newer rport, so when transport layer triggers dev_loss_tmo for the
earlier registered rport the ndlp->rport private storage is referencing the
newer rport instead of the older rport in dev_loss_tmo callbk.
Because the older ndlp->rport object is already cleaned up elsewhere in
driver code during the time of fabric swap, check that the rport provided
in dev_loss_tmo callbk actually matches the rport stored in the LLDD's
ndlp->rport field. Otherwise, skip dev_loss_tmo work on a stale rport.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20250131000524.163662-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 56c3d809b7b450379162d0b8a70bbe71ab8db706 ]
After a port swap between separate fabrics, there may be multiple nodes in
the vport's fc_nodes list with the same fabric well known address.
Duplication is temporary and eventually resolves itself after dev_loss_tmo
expires, but nameserver queries may still occur before dev_loss_tmo. This
possibly results in returning stale fabric ndlp objects. Fix by adding an
nlp_state check to ensure the ndlp search routine returns the correct newer
allocated ndlp fabric object.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20250131000524.163662-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4281f44ea8bfedd25938a0031bebba1473ece9ad ]
Current dev_loss_tmo handling checks whether there has been a previous
call to unregister with SCSI transport. If so, the NDLP kref count is
decremented a second time in dev_loss_tmo as the final kref release.
However, this can sometimes result in a reference count underflow if
there is also a race to unregister with NVMe transport as well. Add a
check for NVMe transport registration before decrementing the final
kref. If NVMe transport is still registered, then the NVMe transport
unregistration is designated as the final kref decrement.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20241031223219.152342-8-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
With a FLOGI outstanding and loss of physical link connection to the fabric
for the duration of dev_loss_tmo, there is a fabric ndlp kref imbalance
that decrements the kref and sets the NLP_IN_RECOV_POST_DEV_LOSS flag at
the same time.
The issue is that when the FLOGI completion routine executes, the fabric
ndlp could already be freed because of the final kref put from the
dev_loss_tmo handler. Fix by early returning before the ndlp kref put if
the ndlp is deemed a candidate for NLP_IN_RECOV_POST_DEV_LOSS in the FLOGI
completion routine.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240912232447.45607-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
topology
In direct attached topology, certain target vendors that are quick to issue
FLOGI followed by a cable pull for more than dev_loss_tmo may result in a
kref imbalance for the remote port ndlp object.
Add an nlp_get when the defer_flogi_acc flag is set. This is expected to
balance the nlp_put in the defer_flogi_acc clause in the
lpfc_issue_els_flogi() routine. Because we need to retain the ndlp ptr,
reorganize all of the defer_flogi_acc information into one
lpfc_defer_flogi_acc struct.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240726231512.92867-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When the HBA is undergoing a reset or is handling an errata event, NULL ptr
dereference crashes may occur in routines such as
lpfc_sli_flush_io_rings(), lpfc_dev_loss_tmo_callbk(), or
lpfc_abort_handler().
Add NULL ptr checks before dereferencing hdwq pointers that may have been
freed due to operations colliding with a reset or errata event handler.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240726231512.92867-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In rare cases when a fabric node is recovered after a link bounce and
before dev_loss_tmo callbk is reached, the driver may leave the fabric node
in an inconsistent state with the NLP_IN_DEV_LOSS flag perpetually set.
In lpfc_dev_loss_tmo_callbk, a check is added for a recovered fabric node.
If the node is recovered, then don't queue the lpfc_dev_loss_tmo_handler
work. In lpfc_dev_loss_tmo_handler, the path taken for the recovered fabric
nodes is updated to clear the NLP_IN_DEV_LOSS flag.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240628172011.25921-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Certain vendor specific targets initially register with the fabric as an
initiator function first and then re-register as a target function
afterwards.
The timing of the target function re-registration can cause a race
condition such that the driver is stuck assuming the remote port as an
initiator function and never discovers the target's hosted LUNs.
Expand the nlp_state qualifier to also include NLP_STE_PRLI_ISSUE because
the state means that PRLI was issued but we have not quite reached
MAPPED_NODE state yet. If we received an RSCN in the PRLI_ISSUE state,
then we should restart discovery again by going into DEVICE_RECOVERY.
Fixes: dded1dc31aa4 ("scsi: lpfc: Modify when a node should be put in device recovery mode during RSCN")
Cc: <stable@vger.kernel.org> # v6.6+
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240628172011.25921-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In attempt to reduce the amount of unnecessary phba->hbalock acquisitions
in the lpfc driver, change hba_flag into an unsigned long bitmask and use
clear_bit/test_bit bitwise atomic APIs instead of reliance on phba->hbalock
for synchronization.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240429221547.6842-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In LPFC_MBOXQ_t, the ctx_buf ptr shouldn't be defined as a generic void
*ptr. It is named ctx_buf and it should only be used as an lpfc_dmabuf
*ptr. Due to the void* declaration, there have been abuses of ctx_buf for
things not related to lpfc_dmabuf.
So, set the ptr type for *ctx_buf as lpfc_dmabuf. Remove all type casts on
ctx_buf because it is no longer a void *ptr. Convert the abuse of ctx_buf
for something not related to lpfc_dmabuf to use the void *context3 ptr.
A particular abuse of the ctx_buf warranted a new void *ext_buf ptr.
However, the usage of this new void *ext_buf is not generic. It is
intended to only hold virtual addresses for extended mailbox commands.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240305200503.57317-10-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In LPFC_MBOXQ_t data structure, the ctx_ndlp ptr shouldn't be defined as a
generic void *ptr. It is named ctx_ndlp and it should only be used as an
lpfc_nodelist *ptr. Due to the void* declaration, there have been abuses
of ctx_ndlp for things not related to ndlp.
So, set the ptr type for *ctx_ndlp as lpfc_nodelist. Remove all type casts
on ctx_ndlp because it is no longer a void *ptr. Convert the abuse of
ctx_ndlp for things not related to ndlps to use the void *context3 ptr.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240305200503.57317-9-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
lpfc_worker_wake_up() calls the lpfc_work_done() routine, which takes the
hbalock. Thus, lpfc_worker_wake_up() should not be called while holding the
hbalock to avoid potential deadlock.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240305200503.57317-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Update copyrights to 2024 for files modified in the 14.4.0.0 patch set.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240131185112.149731-18-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In attempt to reduce the amount of unnecessary shost_lock acquisitions in
the lpfc driver, change load_flag into an unsigned long bitmask and use
clear_bit/test_bit bitwise atomic APIs instead of reliance on shost_lock
for synchronization.
Also, correct the test for FC_UNLOADING in lpfc_ct_handle_mibreq, which
incorrectly tests vport->fc_flag rather than vport->load_flag.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240131185112.149731-16-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In attempt to reduce the amount of unnecessary shost_lock acquisitions in
the lpfc driver, change fc_flag into an unsigned long bitmask and use
clear_bit/test_bit bitwise atomic APIs instead of reliance on shost_lock
for synchronization.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240131185112.149731-15-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In attempt to reduce the amount of unnecessary shost_lock acquisitions in
the lpfc driver, replace shost_lock with an explicit fc_nodes_list_lock
spinlock when accessing vport->fc_nodes lists. Although vport memory
region is owned by shost->hostdata, it is driver private memory and an
explicit fc_nodes list lock for fc_nodes list mutations is more appropriate
than locking the entire shost.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240131185112.149731-14-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There is no reason to use the shost_lock to synchronize an LLDD statistics
counter. Convert all the nlp state statistic counters into atomic_t.
Corresponding zeroing, increments, and reads are converted to atomic
versions.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240131185112.149731-13-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Upon first RSCN receipt of a target server's remote port that is initially
acting as an initiator function, the driver marks the ndlp->nlp_type as an
initiator role. Then later, when processing an RSCN for a target function
role switch, that ndlp remote port is permanently stuck as an initiator
role and can never transition to be discovered as an updated target role
function.
Remove the NLP_RCV_PLOGI early return if statement clause so that the
NLP_NPR_2B_DISC flag gets set. This allows for role change detections.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240131185112.149731-7-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In lpfc_check_nlp_post_devloss(), retaking of the ndlp lock in the if
statement is useless because the very next line unlocks. Simply return to
avoid relocking.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231031191224.150862-5-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Justin Tee <justintee8345@gmail.com> says:
Update lpfc to revision 14.2.0.15
This patch set contains error handling fixes, ELS bug fixes, and
logging improvements.
The patches were cut against Martin's 6.7/scsi-queue tree.
Link: https://lore.kernel.org/r/20231009161812.97232-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The preexisting LOG_NODE message flag frequently spams a subset of the same
log messages during normal FC driver operations. When analyzing driver
logs, this sometimes leads to difficulty in troubleshooting.
Because LOG_IP log message flag is unused, convert it to a new
LOG_NODE_VERBOSE flag. The LOG_NODE_VERBOSE shall specifically be used for
diagnosing issues that require precise ndlp tracking detail.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231009161812.97232-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
During rmmod, when dev_loss_tmo callback is called, an ndlp kref count is
decremented twice. Once for SCSI transport registration and second to
remove the initial node allocation kref. If there is also an NVMe
transport registration, another reference count decrement is expected in
lpfc_nvme_unregister_port().
Race conditions between the NVMe transport remoteport_delete and
dev_loss_tmo callbacks sometimes results in premature ndlp object release
resulting in use-after-free issues.
Fix by not dropping the ndlp object in dev_loss_tmo callback with an
outstanding NVMe transport registration. Inversely, mark the final
NLP_DROPPED flag in lpfc_nvme_unregister_port when rmmod flag is set.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230908211923.37603-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When a dev_loss_tmo event occurs, an ndlp lock is taken before checking
nlp_flag for NLP_DROPPED. There is an attempt to restore the ndlp lock
when exiting the if statement, but the nlp_put kref could be the final
decrement causing a use-after-free memory access on a released ndlp object.
Instead of trying to reacquire the ndlp lock after checking nlp_flag, just
return after calling nlp_put.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230908211852.37576-1-justintee8345@gmail.com
Reviewed-by: "Ewan D. Milne" <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
|
|
Only nodes whose state is at least past a PLOGI issue and strictly less
than a PRLI issue should be put into device recovery mode upon RSCN
receipt. Previously, the allowance of LOGO and PRLI completion states did
not make sense because those nodes should be allowed to flow through and
marked as NPort dissappeared as is normally done. A follow up RSCN GID_FT
would recover those nodes in such cases.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230804195546.157839-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The ndlp kref count implementation in lpfc_dev_loss_tmo_callbk() removes
the initial node reference when a vport is unloading. When lpfc_cleanup()
sends a DEVICE_RM event and is in NPR state, the driver calls
lpfc_drop_node(). Subsequently, lpfc_drop_node() also removes an ndlp kref
thinking it is the initial reference. This unintentionally introduces an
extra kref decrement on the ndlp object.
Fix by using the NLP_DROPPED node flag in lpfc_dev_loss_tmo_callbk() and
lpfc_drop_node() to coordinate the removal of the initial node reference.
In lpfc_dev_loss_tmo_callbk(), remove the SCSI transport reference provided
the node is registered in the dev_loss context because the driver cannot
call the SCSI transport in dev_loss context or afterwards. And, have
lpfc_drop_node() not remove a reference if another thread is acting or has
already acted on it.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Conditionalize when to put an ndlp into recovery mode when processing
RSCNs. As long as an ndlp state is beyond a PLOGI issue and has been
mapped to a transport layer before, the ndlp qualifies to be put into
recovery mode. Otherwise, treat the ndlp rport normally through the
discovery engine.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The variable phba->fcf.fcf_flag is often protected by the lock
phba->hbalock() when is accessed. Here is an example in
lpfc_unregister_fcf_rescan():
spin_lock_irq(&phba->hbalock);
phba->fcf.fcf_flag |= FCF_INIT_DISC;
spin_unlock_irq(&phba->hbalock);
However, in the same function, phba->fcf.fcf_flag is assigned with 0
without holding the lock, and thus can cause a data race:
phba->fcf.fcf_flag = 0;
To fix this possible data race, a lock and unlock pair is added when
accessing the variable phba->fcf.fcf_flag.
Reported-by: BassCheck <bass@buaa.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Link: https://lore.kernel.org/r/20230630024748.1035993-1-islituo@gmail.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Justin Tee <justintee8345@gmail.com> says:
Update lpfc to revision 14.2.0.13
This patch set contains discovery bug fixes, firmware logging
improvements, clean up of CQ handling, and statistics collection
enhancements.
The patches were cut against Martin's 6.5/scsi-queue tree.
Link: https://lore.kernel.org/r/20230523183206.7728-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Pre-existing device loss recovery logic via the NLP_IN_RECOV_POST_DEV_LOSS
flag only handled Fabric Port Login, Fabric Controller, Management, and
Name Server addresses.
Fabric domain controllers fall under the same category for usage of the
NLP_IN_RECOV_POST_DEV_LOSS flag. Add a default case statement to mark an
ndlp for device loss recovery.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-4-justintee8345@gmail.com
Acked-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In dev_loss_tmo callback routine, we early return if the ndlp is in a state
of rediscovery. This occurs when a target proactively PLOGIs or PRLIs
after an RSCN before the dev_loss_tmo callback routine is scheduled to run.
Move clear of the NLP_IN_DEV_LOSS flag before the ndlp state check in such
cases.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
lpfc_register_remote_port()
Due to a target port D_ID swap, it is possible for the
lpfc_register_remote_port() routine to touch post mortem fc_rport memory
when trying to access fc_rport->dd_data.
The D_ID swap causes a simultaneous call to lpfc_unregister_remote_port(),
where fc_remote_port_delete() reclaims fc_rport memory.
Remove the fc_rport->dd_data->pnode NULL assignment because the following
line reassigns ndlp->rport with an fc_rport object from
fc_remote_port_add() anyways. The pnode nullification is superfluous.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-2-justintee8345@gmail.com
Acked-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
strlcpy() reads the entire source buffer first. This read may exceed the
destination size limit. This is both inefficient and can lead to linear
read overflows if a source string is not NUL-terminated [1]. In an effort
to remove strlcpy() completely [2], replace strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Link: https://lore.kernel.org/r/20230530155745.343032-1-azeemshaikh38@gmail.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
lpfc_nlp_not_used()
Smatch detected a double free path because lpfc_nlp_not_used() releases an
ndlp object before reaching lpfc_nlp_put() at the end of
lpfc_cmpl_els_logo_acc().
Remove the outdated lpfc_nlp_not_used() routine. In
lpfc_mbx_cmpl_ns_reg_login(), replace the call with lpfc_nlp_put(). In
lpfc_cmpl_els_logo_acc(), replace the call with lpfc_unreg_rpi() and keep
the lpfc_nlp_put() at the end of the routine. If ndlp's rpi was
registered, then lpfc_unreg_rpi()'s completion routine performs the final
ndlp clean up after lpfc_nlp_put() is called from lpfc_cmpl_els_logo_acc().
Otherwise if ndlp has no rpi registered, the lpfc_nlp_put() at the end of
lpfc_cmpl_els_logo_acc() is the final ndlp clean up.
Fixes: 4430f7fd09ec ("scsi: lpfc: Rework locations of ndlp reference taking")
Cc: <stable@vger.kernel.org> # v5.11+
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/all/Y3OefhyyJNKH%2Fiaf@kili/
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230417191558.83100-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Extended status reason code errors should mask off the IOERR_PARAM_MASK
before checking strict equalities for IOERR values.
Update the lpfc_error_lost_link() routine as such.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230301231626.9621-9-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When mapped to a target with multiple virtual ports, a link bounce
sometimes results in unsuccessful rediscovery of all of the target's
virtual ports. This is because a succession of repeat RSCNs for the
virtual target ports leaves ndlps in the REG_LOGIN state with the
NLP_REG_LOGIN_SEND flag set. With NLP_REG_LOGIN_SEND set, during the next
PLOGI, the driver will UNREG_RPI. When UNREG_RPI is processed, the driver
can be in the middle of PRLI_ISSUE or MAPPED state resulting in an illegal
state transition by the discovery engine and stalling.
Fix by calling the discovery state machine with DEVICE_RECOVERY event
during RSCN processing. This will set the NLP_IGNR_REG_CMPL bit and
prevent the old REG_LOGIN state from advancing. Then for the new PLOGI
issue, add the check for the NLP_IGNR_REG_CMPL bit to delay issuing the new
PLOGI until the queued REG_LOGIN and UNREG_LOGIN have been processed.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230301231626.9621-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Remove the repeated word "the" in comments.
[mkp: fixed additional typos in the changed lines]
Link: https://lore.kernel.org/r/20230217083046.4090-1-liubo03@inspur.com
Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Update copyrights to 2023 for files modified in the 14.2.0.10 patch set.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
With faulty cables in PT2PT topology, an unintentional ndlp double kref
decrement can occur.
If a FLOGI request is outstanding before the link goes down, the missing
FLOGI_ACC causes an F_Port ndlp to remain in the UNUSED state. During link
down, lpfc_cleanup_rpis() is called and decrements an ndlp kref.
Additionally, when the driver later decides to abort the FLOGI, the FLOGI
completion handler decrements the ndlp kref a second time.
Remove duplicate clean up logic in lpfc_cleanup_rpis() because the updated
FLOGI completion handler already handles the ndlp kref decrement.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When a FLOGI completes with a sequence timeout error, a freed kref ptr
dereference crash can occur due to a timing race involving ndlp referencing
in lpfc_dev_loss_tmo_callbk.
Fix by ensuring the driver accounts for an outstanding FLOGI when dev_loss
is active. Also, don't remove the HBA_FLOGI_OUTSTANDING flag when the
FLOGI is retried to allow the driver to handle the reference counts
correctly in lpfc_dev_loss_tmo_handler.
Reported-by: Dietmar Hahn <dietmar.hahn@fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@fujitsu.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20221116011921.105995-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value,
simply use the get_random_{u8,u16}() functions, which are faster than
wasting the additional bytes from a 32-bit value. This was done
mechanically with this coccinelle script:
@@
expression E;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u16;
typedef __be16;
typedef __le16;
typedef u8;
@@
(
- (get_random_u32() & 0xffff)
+ get_random_u16()
|
- (get_random_u32() & 0xff)
+ get_random_u8()
|
- (get_random_u32() % 65536)
+ get_random_u16()
|
- (get_random_u32() % 256)
+ get_random_u8()
|
- (get_random_u32() >> 16)
+ get_random_u16()
|
- (get_random_u32() >> 24)
+ get_random_u8()
|
- (u16)get_random_u32()
+ get_random_u16()
|
- (u8)get_random_u32()
+ get_random_u8()
|
- (__be16)get_random_u32()
+ (__be16)get_random_u16()
|
- (__le16)get_random_u32()
+ (__le16)get_random_u16()
|
- prandom_u32_max(65536)
+ get_random_u16()
|
- prandom_u32_max(256)
+ get_random_u8()
|
- E->inet_id = get_random_u32()
+ E->inet_id = get_random_u16()
)
@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u16;
identifier v;
@@
- u16 v = get_random_u32();
+ u16 v = get_random_u16();
@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u8;
identifier v;
@@
- u8 v = get_random_u32();
+ u8 v = get_random_u8();
@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u16;
u16 v;
@@
- v = get_random_u32();
+ v = get_random_u16();
@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u8;
u8 v;
@@
- v = get_random_u32();
+ v = get_random_u8();
// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@
((T)get_random_u32()@p & (LITERAL))
// Examine limits
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@
value = None
if literal.startswith('0x'):
value = int(literal, 16)
elif literal[0] in '123456789':
value = int(literal, 10)
if value is None:
print("I don't know how to handle %s" % (literal))
cocci.include_match(False)
elif value < 256:
coccinelle.RESULT = cocci.make_ident("get_random_u8")
elif value < 65536:
coccinelle.RESULT = cocci.make_ident("get_random_u16")
else:
print("Skipping large mask of %s" % (literal))
cocci.include_match(False)
// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
identifier add_one.RESULT;
identifier FUNC;
@@
- (FUNC()@p & (LITERAL))
+ (RESULT() & LITERAL)
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This patch fixes below Smatch reported issues:
1. lpfc_hbadisc.c:3020 lpfc_mbx_cmpl_fcf_rr_read_fcf_rec()
error: uninitialized symbol 'vlan_id'.
2. lpfc_hbadisc.c:3121 lpfc_mbx_cmpl_read_fcf_rec()
error: uninitialized symbol 'vlan_id'.
3. lpfc_init.c:335 lpfc_dump_wakeup_param_cmpl()
warn: always true condition '(prg->dist < 4) => (0-3 < 4)'
4. lpfc_init.c:2419 lpfc_parse_vpd()
warn: inconsistent indenting.
5. lpfc_init.c:13248 lpfc_sli4_enable_msi()
warn: 'phba->pcidev->irq' 2147483648 can't fit into 65535
'eqhdl->irq'
6. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get()
error: uninitialized symbol 'ext_cnt'
7. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get()
error: uninitialized symbol 'ext_size'
8. lpfc_vmid.c:248 lpfc_vmid_get_appid()
warn: sleeping in atomic context.
9. lpfc_init.c:8342 lpfc_sli4_driver_resource_setup()
warn: missing error code 'rc'.
10. lpfc_init.c:13573 lpfc_sli4_hba_unset()
warn: variable dereferenced before check 'phba->pport' (see
line 13546)
11. lpfc_auth.c:1923 lpfc_auth_handle_dhchap_reply()
error: double free of 'hash_value'
Fixes:
1. Initialize vlan_id to LPFC_FCOE_NULL_VID.
2. Initialize vlan_id to LPFC_FCOE_NULL_VID.
3. prg->dist is a 2 bit field. Its value can only be between 0-3.
Remove redundent check 'if (prg->dist < 4)'.
4. Fix inconsistent indenting. Moved logic into helper function
lpfc_fill_vpd().
5. Define 'eqhdl->irq' as int value as pci_irq_vector() returns int.
Also, check for return value of pci_irq_vector() and log message in
case of failure.
6. Initialize 'ext_cnt' to 0.
7. Initialize 'ext_size' to 0.
8. Use alloc_percpu_gfp() with GFP_ATOMIC flag.
9. 'rc' was not updated when dma_pool_create() fails. Update 'rc =
-ENOMEM' when dma_pool_create() fails before calling goto statement.
10. Add check for 'phba->pport' in lpfc_cpuhp_remove().
11. Initialize 'hash_value' to NULL, same like 'aug_chal' variable.
Link: https://lore.kernel.org/r/20220911221505.117655-13-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Firmware reports link degrade signaling via ACQES.
Handlers and new additions to the SET_FEATURES mbox command are implemented
so that link degrade parameters for 64GB capable links are reported through
EDC ELS frames.
Link: https://lore.kernel.org/r/20220911221505.117655-12-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When a FLOGI is received before we have issued our FLOGI, the ACC response
to the received FLOGI is issued with SID 2 instead of the expected fabric
controller SID. Certain target vendors ignore the malformed ACC with SID 2
and wait for a properly filled ACC with a fabric controller SID.
The lpfc_sli_prep_wqe() routine depends on the FC_PT2PT flag to fill in the
fabric controller SID when in PT2PT mode, but due to a previous commit the
flag was getting cleared. Fix by adding a check for the defer_flogi_acc
flag to know whether or not to clear the FC_PT2PT flag on link up.
Link: https://lore.kernel.org/r/20220911221505.117655-3-jsmart2021@gmail.com
Fixes: 439b93293ff2 ("scsi: lpfc: Fix unsolicited FLOGI receive handling during PT2PT discovery")
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The SANDiags feature is unused, and related code is removed.
Link: https://lore.kernel.org/r/20220819011736.14141-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
During a stress offline/online test in PT2PT topology, target rediscovery
can fail with a specific target vendor array.
When the HBA transitions to online mode it is possible to receive an
unsolicited FLOGI before processing the Link Up event. The received FLOGI
will set the defer_flogi_acc_flag, which instructs the driver to wait until
it transmits its own FLOGI before ACKing the received FLOGI. In this
failure scenario, the link up processing clears the set
defer_flogi_acc_flag before we have sent out the FLOGI. As the target has
the higher WWPN and is responsible for sending the PLOGI, the target is
stuck waiting for its FLOGI_ACC that the driver will never send.
Remove the clear of defer_flogi_acc_flag from Link Up event processing. In
this stress test case, the defer_flogi_acc_flag is cleared during the Link
Down event processing anyways.
Link: https://lore.kernel.org/r/20220819011736.14141-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|