From nobody Thu Dec 30 19:29:20 2021
Message-ID: <42e175d9-1693-29e2-0b5b-3fa513aa2a2d@FreeBSD.org>
Date: Thu, 30 Dec 2021 11:29:20 -0800
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-scsi
List-Help: <mailto:scsi+help@freebsd.org>
List-Post: <mailto:scsi@freebsd.org>
List-Subscribe: <mailto:scsi+subscribe@freebsd.org>
List-Unsubscribe: <mailto:scsi+unsubscribe@freebsd.org>
Sender: owner-freebsd-scsi@freebsd.org
X-BeenThere: freebsd-scsi@freebsd.org
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:91.0)
 Gecko/20100101 Thunderbird/91.4.1
Content-Language: en-US
To: Alexander Motin <mav@FreeBSD.org>
Cc: Edward Tomasz Napierała <trasz@freebsd.org>,
 scsi@FreeBSD.org, Ken Merry <ken@freebsd.org>
References: <fd383f6f-5a19-e2bb-5383-e559271eb3cd@FreeBSD.org>
 <b6c090ac-6cb0-6173-422d-9aef0b37b8ee@FreeBSD.org>
From: John Baldwin <jhb@FreeBSD.org>
Subject: Re: iSCSI target: Handling in-flight requests during ctld shutdown
In-Reply-To: <b6c090ac-6cb0-6173-422d-9aef0b37b8ee@FreeBSD.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 12/29/21 1:57 PM, Alexander Motin wrote:
> On 29.12.2021 16:39, John Baldwin wrote:
>> One of the tests Chelsio QA has been running against our iSCSI stack
>> with cxgbei offload enabled is to run several iozone instances on an
>> initiator while running a script on the target that keeps stopping
>> ctld (for a minute or so), then starting it again and letting it run
>> for about 5 minutes until stopping it again.
>>
>> One of the errors found last night is that the target reported the
>> following error to the initiator:
>>
>> (da7:iscsi10:0:0:0): CAM status: SCSI Status Error
>> (da7:iscsi10:0:0:0): SCSI status: Check Condition
>> (da7:iscsi10:0:0:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
>> (da7:iscsi10:0:0:0): Actual Retry Count: 44
>> (da7:iscsi10:0:0:0): Error 5, Unretryable error
>> g_vfs_done():da7[WRITE(offset=9797632, length=32768)]error = 6
>> UFS: forcibly unmounting /dev/da7 from /ISCSI8
> 
> 
>> So my question I think is what is the expected behavior?  Is the
>> internal error really expected to make it on the wire to be sent to the
>> other side?  Since the connection is shutting down should we just discard
>> the reply altogether rather than reporting an internal error?  If we
>> discarded the reply then the initiator in this particular test would have
>> retried the original request once ctld was restarted and continued running
>> without an error.
> 
> The HARDWARE ERROR is obviously not expected by the initiator.  It
> should not be leaked after we have decided to kill the connection.
> The initiator may retry it and keep working happily after reconnecting,
> but it would be cleaner not to rely on that.
> cfiscsi_session_terminate_tasks() aborts all running commands with
> CTL_TASK_I_T_NEXUS_RESET, which keeps them from returning status to the
> initiator, but I suppose this is the other side of the race now.

Hmm, I wonder if we should be setting CTL_FLAG_ABORT instead of setting the
port_status when aborting an I/O?  The comment in ctl_frontend_iscsi.c claims
the backends check the port_status, but I don't see any checks for port_status
at all in backends.  I do see checks for CTL_FLAG_ABORT, and the handler for
CTL_TASK_I_T_NEXUS_RESET does set CTL_FLAG_ABORT on pending requests.
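
A minimal userspace model of that difference, assuming simplified
stand-ins for CTL's I/O structure.  Everything named model_* plus
abort_via_port_status(), abort_via_flag(), nexus_reset() and
backend_process() is hypothetical; only the roles of CTL_FLAG_ABORT,
port_status and CTL_TASK_I_T_NEXUS_RESET come from the discussion.  The
backend model checks only the abort flag, mirroring the observation above,
so an abort recorded solely in port_status does not stop execution.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CTL_FLAG_ABORT; not the real CTL value. */
#define MODEL_FLAG_ABORT 0x1u

/* Hypothetical, heavily simplified stand-in for a CTL I/O. */
struct model_io {
	unsigned int flags;	/* CTL-style I/O flags */
	int port_status;	/* frontend-private status; backends ignore it */
	int initiator;		/* I_T nexus the I/O belongs to */
	bool executed;		/* set when the backend runs the I/O */
};

/* Abort as the frontend comment assumes works: port_status only. */
static void
abort_via_port_status(struct model_io *io)
{
	io->port_status = 1;
}

/* Abort as the I_T nexus reset handler does: set the abort flag. */
static void
abort_via_flag(struct model_io *io)
{
	io->flags |= MODEL_FLAG_ABORT;
}

/* Model of an I_T nexus reset: flag every pending I/O from one initiator. */
static void
nexus_reset(struct model_io *ios, size_t n, int initiator)
{
	for (size_t i = 0; i < n; i++)
		if (ios[i].initiator == initiator)
			abort_via_flag(&ios[i]);
}

/*
 * Backend model: like the real backends (per the observation above) it
 * looks only at the abort flag.  Returns true if the I/O was executed.
 */
static bool
backend_process(struct model_io *io)
{
	if (io->flags & MODEL_FLAG_ABORT)
		return false;
	io->executed = true;
	return true;
}
```

Calling backend_process() on an I/O aborted only through
abort_via_port_status() still executes it, which is the gap described
above; after nexus_reset() or abort_via_flag() the same call returns false.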

The tasks aborted by cfiscsi_session_terminate_tasks() should already have
CTL_FLAG_ABORT set anyway, but it wouldn't hurt if it were set again by
cfiscsi_data_wait_abort().  For the cfiscsi_task_management_done() case I'm
less certain, but I suspect there too that returning an internal error status
back to the initiator is not expected and that it would be better to just set
CTL_FLAG_ABORT and drop any response?
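
A sketch of the completion side of that proposal, with the same caveats:
frontend_done(), data_wait_abort(), the status enum and the response
counter are hypothetical stand-ins, not the real ctl_frontend_iscsi.c
code.  data_wait_abort() is assumed here to fail the I/O with an internal
error (the status that reached the initiator in the report above) while
also setting the abort flag.  The done callback then checks the flag and
sends nothing, so the initiator only sees the connection drop and retries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CTL_FLAG_ABORT; not the real CTL value. */
#define MODEL_FLAG_ABORT 0x1u

enum model_status { MODEL_STATUS_GOOD, MODEL_STATUS_INTERNAL_ERROR };

/* Hypothetical, simplified I/O: abort flag plus completion status. */
struct model_io {
	unsigned int flags;
	enum model_status status;	/* status the I/O completed with */
};

/* Stand-ins for "a response PDU went out on the wire". */
static int responses_sent;
static enum model_status last_sent;

/*
 * Hypothetical analogue of aborting an I/O that is waiting for data when
 * the connection drops: it still fails the I/O, but also sets the flag.
 */
static void
data_wait_abort(struct model_io *io)
{
	io->status = MODEL_STATUS_INTERNAL_ERROR;
	io->flags |= MODEL_FLAG_ABORT;
}

/*
 * Hypothetical completion callback: aborted I/O gets no response at all,
 * whatever status it carries; everything else is sent normally.
 */
static void
frontend_done(struct model_io *io)
{
	if (io->flags & MODEL_FLAG_ABORT)
		return;
	last_sent = io->status;
	responses_sent++;
}
```

Calling frontend_done() on an I/O that went through data_wait_abort()
leaves responses_sent unchanged, while an I/O that fails without the flag
set still has its internal error sent, which is the leak reported above.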

-- 
John Baldwin