Date:      Wed, 29 Dec 2021 16:57:27 -0500
From:      Alexander Motin <mav@FreeBSD.org>
To:        John Baldwin <jhb@FreeBSD.org>, scsi@FreeBSD.org
Cc:        =?UTF-8?Q?Edward_Tomasz_Napiera=c5=82a?= <trasz@freebsd.org>
Subject:   Re: iSCSI target: Handling in-flight requests during ctld shutdown
Message-ID:  <b6c090ac-6cb0-6173-422d-9aef0b37b8ee@FreeBSD.org>
In-Reply-To: <fd383f6f-5a19-e2bb-5383-e559271eb3cd@FreeBSD.org>
References:  <fd383f6f-5a19-e2bb-5383-e559271eb3cd@FreeBSD.org>

On 29.12.2021 16:39, John Baldwin wrote:
> One of the tests Chelsio QA has been running against our iSCSI stack
> with cxgbei offload enabled is to run a bunch of iozone's on an
> initiator while running a script on the target that keeps stopping
> ctld (for a minute or so), then starting it again and letting it run
> for about 5 minutes until stopping it again.
> 
> One of the errors found last night is that the target reported the
> following error to the initiator:
> 
> (da7:iscsi10:0:0:0): CAM status: SCSI Status Error
> (da7:iscsi10:0:0:0): SCSI status: Check Condition
> (da7:iscsi10:0:0:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
> (da7:iscsi10:0:0:0): Actual Retry Count: 44
> (da7:iscsi10:0:0:0): Error 5, Unretryable error
> g_vfs_done():da7[WRITE(offset=9797632, length=32768)]error = 6
> UFS: forcibly unmounting /dev/da7 from /ISCSI8


> So my question I think is what is the expected behavior?  Is the internal
> error really expected to make it on the wire to be sent to the other side?
> Since the connection is shutting down should we just discard the reply
> altogether rather than reporting an internal error?  If we discarded the
> reply then the initiator in this particular test would have retried the
> original request once ctld was restarted and continued running without an
> error.

The HARDWARE ERROR is obviously not expected by the initiator.  It
should not be leaked once we have decided to kill the connection.
The initiator may retry the command and still work happily after
reconnecting, but it would be cleaner not to rely on that.
cfiscsi_session_terminate_tasks() aborts all running commands with
CTL_TASK_I_T_NEXUS_RESET, which makes them not return statuses to the
initiator, but I suppose this is the other side of the race now.
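
A minimal sketch of the "discard the reply" idea, for illustration only.
All names here (struct session, complete_command(), the terminating flag)
are hypothetical stand-ins, not the real CTL or cfiscsi code, and the
locking that a real fix would need to close the race is left out:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command statuses; not the real CTL status codes. */
enum cmd_status { CMD_OK, CMD_INTERNAL_FAILURE };

/* Hypothetical session state; not the real cfiscsi session structure. */
struct session {
	bool	terminating;	/* set once we decide to kill the connection */
	int	responses_sent;	/* statuses that actually reached the wire */
};

/*
 * Completion path: once the session is terminating, drop the status
 * instead of sending it, so the initiator sees only a lost connection
 * and retries the command after it reconnects.
 * Returns true if a response was sent.
 */
static bool
complete_command(struct session *s, enum cmd_status st)
{
	(void)st;		/* the status value is irrelevant once we drop */
	if (s->terminating)
		return (false);
	s->responses_sent++;
	return (true);
}
```

With such a check, an internal failure raised during teardown would never
reach the initiator, provided the flag is set before the command completes.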

-- 
Alexander Motin


