From: John Baldwin
Date: Thu, 30 Dec 2021 11:29:20 -0800
To: Alexander Motin
Cc: Edward Tomasz Napierała, scsi@FreeBSD.org, Ken Merry
Subject: Re: iSCSI target: Handling in-flight requests during ctld shutdown
Message-ID: <42e175d9-1693-29e2-0b5b-3fa513aa2a2d@FreeBSD.org>
List-Id: SCSI subsystem
List-Archive: https://lists.freebsd.org/archives/freebsd-scsi

On 12/29/21 1:57 PM, Alexander Motin wrote:
> On 29.12.2021 16:39, John Baldwin wrote:
>> One of the tests Chelsio QA has been running against our iSCSI stack
>> with cxgbei offload enabled is to run a bunch of iozone's on an
>> initiator while running a script on the target that keeps stopping
>> ctld (for a minute or so), then starting it again and letting it run
>> for about 5 minutes until stopping it again.
>>
>> One of the errors found last night is that the target reported the
>> following error to the initiator:
>>
>> (da7:iscsi10:0:0:0): CAM status: SCSI Status Error
>> (da7:iscsi10:0:0:0): SCSI status: Check Condition
>> (da7:iscsi10:0:0:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal
>> target failure)
>> (da7:iscsi10:0:0:0): Actual Retry Count: 44
>> (da7:iscsi10:0:0:0): Error 5, Unretryable error
>> g_vfs_done():da7[WRITE(offset=9797632, length=32768)]error = 6
>> UFS: forcibly unmounting /dev/da7 from /ISCSI8
>
>
>> So my question I think is what is the expected behavior?  Is the internal
>> error really expected to make it on the wire to be sent to the other side?
>> Since the connection is shutting down should we just discard the reply
>> altogether rather than reporting an internal error?  If we discarded the
>> reply then the initiator in this particular test would have retried the
>> original request once ctld was restarted and continued running without an
>> error.
>
> The HARDWARE ERROR is obviously not expected by the initiator.  It
> should better not be leaked after we decided to kill the connection.
> Initiator may retry it and still work happily after reconnect, but
> cleaner would be to not rely on that.  cfiscsi_session_terminate_tasks()
> aborts all running commands by CTL_TASK_I_T_NEXUS_RESET, that make them
> not return statuses to initiator, but I suppose this is the other side
> of the race now.

Hmm, I wonder if we should be setting CTL_FLAG_ABORT instead of setting
the port_status when aborting an I/O?  The comment in ctl_frontend_iscsi.c
claims the backends check the port_status, but I don't see any checks for
port_status at all in backends.  I do see checks for CTL_FLAG_ABORT, and
the handler for the CTL_TASK_I_T_NEXUS_RESET does set CTL_FLAG_ABORT on
pending requests.

For the tasks in cfiscsi_session_terminate_tasks(), those should already
have CTL_FLAG_ABORT set anyway, but it wouldn't hurt if it were set again
by cfiscsi_data_wait_abort().  For the cfiscsi_task_management_done case
I'm less certain, but I suspect there too that returning an internal error
status back to the initiator is not expected and that it would be better
to just set CTL_FLAG_ABORT and drop any response?

-- 
John Baldwin