Date:      Thu, 30 Dec 2021 11:29:20 -0800
From:      John Baldwin <jhb@FreeBSD.org>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        Edward Tomasz Napierała <trasz@freebsd.org>, scsi@FreeBSD.org, Ken Merry <ken@freebsd.org>
Subject:   Re: iSCSI target: Handling in-flight requests during ctld shutdown
Message-ID:  <42e175d9-1693-29e2-0b5b-3fa513aa2a2d@FreeBSD.org>
In-Reply-To: <b6c090ac-6cb0-6173-422d-9aef0b37b8ee@FreeBSD.org>
References:  <fd383f6f-5a19-e2bb-5383-e559271eb3cd@FreeBSD.org> <b6c090ac-6cb0-6173-422d-9aef0b37b8ee@FreeBSD.org>

On 12/29/21 1:57 PM, Alexander Motin wrote:
> On 29.12.2021 16:39, John Baldwin wrote:
>> One of the tests Chelsio QA has been running against our iSCSI stack
>> with cxgbei offload enabled is to run a bunch of iozone's on an
>> initiator while running a script on the target that keeps stopping
>> ctld (for a minute or so), then starting it again and letting it run
>> for about 5 minutes until stopping it again.
>>
>> One of the errors found last night is that the target reported the
>> following error to the initiator:
>>
>> (da7:iscsi10:0:0:0): CAM status: SCSI Status Error
>> (da7:iscsi10:0:0:0): SCSI status: Check Condition
>> (da7:iscsi10:0:0:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal
>> target failure)
>> (da7:iscsi10:0:0:0): Actual Retry Count: 44
>> (da7:iscsi10:0:0:0): Error 5, Unretryable error
>> g_vfs_done():da7[WRITE(offset=9797632, length=32768)]error = 6
>> UFS: forcibly unmounting /dev/da7 from /ISCSI8
> 
> 
>> So my question I think is what is the expected behavior?  Is the
>> internal error
>> really expected to make it on the wire to be sent to the other side?  Since
>> the connection is shutting down should we just discard the reply altogether
>> rather than reporting an internal error?  If we discarded the reply then
>> the
>> initiator in this particular test would have retried the original
>> request once
>> ctld was restarted and continued running without an error.
> 
> The HARDWARE ERROR is obviously not expected by the initiator.  It
> should better not be leaked after we decided to kill the connection.
> Initiator may retry it and still work happily after reconnect, but
> cleaner would be to not rely on that.  cfiscsi_session_terminate_tasks()
> aborts all running commands by CTL_TASK_I_T_NEXUS_RESET, that make them
> not return statuses to initiator, but I suppose this is the other side
> of the race now.

Hmm, I wonder if we should be setting CTL_FLAG_ABORT instead of setting the
port_status when aborting an I/O?  The comment in ctl_frontend_iscsi.c claims
the backends check the port_status, but I don't see any checks for port_status
at all in backends.  I do see checks for CTL_FLAG_ABORT, and the handler for
the CTL_TASK_I_T_NEXUS_RESET does set CTL_FLAG_ABORT on pending requests.

For the tasks in cfiscsi_session_terminate_tasks(), those should already have
CTL_FLAG_ABORT set anyway, but it wouldn't hurt if it were set again by
cfiscsi_data_wait_abort().  For the cfiscsi_task_management_done() case I'm
less certain, but I suspect there too that returning an internal error status
back to the initiator is not expected and that it would be better to just set
CTL_FLAG_ABORT and drop any response?
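
To make the distinction concrete, here is a minimal standalone sketch of the
two ways of marking an aborted I/O.  It is a model, not CTL code: struct
mock_io, MOCK_FLAG_ABORT, the value 42, and all three functions are
hypothetical stand-ins, and it assumes that a nonzero port_status ends up
being reported as an internal target failure, which is what the error
earlier in this thread suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CTL_FLAG_ABORT; not the value from the CTL headers. */
#define MOCK_FLAG_ABORT 0x1u

/* Hypothetical model of the two header fields discussed above. */
struct mock_io {
	uint32_t flags;       /* models the I/O flags word */
	uint32_t port_status; /* models port_status; nonzero means a frontend error */
};

enum mock_outcome {
	SEND_GOOD,           /* normal status goes back to the initiator */
	SEND_INTERNAL_ERROR, /* Check Condition with internal target failure */
	DROP_SILENTLY        /* no response is put on the wire */
};

/* Models the current approach: the abort path records an error status. */
static void
abort_with_port_status(struct mock_io *io)
{
	assert(io != NULL);
	io->port_status = 42; /* arbitrary nonzero value, chosen for illustration */
}

/* Models the proposal: the abort path only marks the I/O as aborted. */
static void
abort_with_flag(struct mock_io *io)
{
	assert(io != NULL);
	io->flags |= MOCK_FLAG_ABORT;
}

/* Models completion: an aborted I/O is dropped before any status is built. */
static enum mock_outcome
complete_io(const struct mock_io *io)
{
	assert(io != NULL);
	if (io->flags & MOCK_FLAG_ABORT)
		return (DROP_SILENTLY);
	if (io->port_status != 0)
		return (SEND_INTERNAL_ERROR);
	return (SEND_GOOD);
}
```

In this model an I/O marked via abort_with_port_status() completes as
SEND_INTERNAL_ERROR, while one marked via abort_with_flag() completes as
DROP_SILENTLY, so the initiator would simply retry after reconnecting.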

-- 
John Baldwin


