From owner-freebsd-hackers@FreeBSD.ORG Fri Jan 12 21:40:36 2007
Message-ID: <45A7F6A4.4030707@samsco.org>
Date: Fri, 12 Jan 2007 13:59:16 -0700
From: Scott Long
To: Wilko Bulte
Cc: freebsd-scsi@freebsd.org, Pawel Jakub Dawidek, freebsd-hackers@freebsd.org
Subject: Re: iSCSI disconnects dilema
References: <20070112195549.GA77181@freebie.xs4all.nl>
In-Reply-To: <20070112195549.GA77181@freebie.xs4all.nl>
List-Id: Technical Discussions relating to FreeBSD

Wilko Bulte wrote:
> On Fri, Jan 12, 2007 at 09:31:04PM +0200, Danny Braniss wrote..
>>> On Tue, Jan 09, 2007 at 09:06:46AM +0200, Danny Braniss wrote:
>>>> Hi,
>>>> While I think I have almost solved the problem of network disconnects,
>>>> a major problem dawned on me:
>>>> When a 'local' disk crashes, the kernel will probably hang/panic/crash.
>>>> If I don't try to recover, then there is no change in the above scenario.
>>>> If I try to recover, then the client does not know that it should
>>>> umount/fsck/mount.
>>>> While all this seems familiar (removing a floppy/disk-on-key while it's
>>>> mounted, where we could always say "you shouldn't have done that!"), with
>>>> a network connection it can happen very often - rebooting the target, a
>>>> network hiccup, etc.
>>>>
>>>> So, any ideas?
>>> In my opinion it should be done this way:
>>>
>>> You have a queue of I/O requests. You send them to the other end and wait
>>> for confirmation. Until confirmation is received, you keep the requests
>>> queued. If the other end dies, you try to reconnect (until some timeout
>>> expires, the processes which sent those requests will just wait). If you
>>> reconnect successfully, you resend the unconfirmed requests; if you are
>>> not able to reconnect, you just pass the errors up.
>>>
>>> This is what I did in ggate and it seems to work.
>> That is basically what I'm doing - unacked requests get requeued.
>> The problem I fear (and maybe I'm paranoid :-):
>
> Paranoia is a Good Thing(TM) in data storage land :-)
>
>> Assume the following scenario: the client (initiator) sends a write command,
>> the target acks it, then it crashes. If the write was never completed,
>> the initiator goes on as if nothing ever happened.
>
> Yes, but what can the initiator do about that?
> I mean, it does not have any
> visibility of what the target has (or has not) done with the data.
>
> This is roughly the same as a RAID box accepting a write into a writeback
> cache and ACK-ing to the host. You can only assume that the RAID box's cache
> will get flushed to the spindles properly. All the usual horror scenarios
> with a broken battery backup of the cache and a power failure etc. apply here.
>
> Wilko
>

I forget, does iSCSI have a concept of a flush_cache command, or the
equivalent of what parallel SCSI does with ordered tags? If so, then
that's how your app or OS knows that the transaction got committed to
stable storage. It has long been assumed in the external storage world
that you are at the mercy of the external storage cache, so the problem
that Danny is referring to is nothing new.

The real question is how to expose whatever equivalent mechanism iSCSI
provides in a way that the OS/app can make use of. For example, CAM
issues an ordered tag periodically to flush the disk cache to stable
storage. Most storage drivers, including CAM, will issue some sort of a
flush_cache command to the controller and media during system shutdown.

Scott
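
For illustration only, here is a minimal Python sketch of how the two ideas in this thread could fit together: the initiator keeps each write queued, as in the requeue scheme described above, but only forgets it once a flush has been confirmed rather than on the plain ack. Holding writes until the flush is an assumption of this sketch, not something anyone in the thread says they implemented. FakeTarget, Initiator and all their methods are hypothetical names; none of this is the ggate, CAM or iSCSI API.

```python
from collections import OrderedDict


class FakeTarget:
    """Hypothetical target with a volatile write-back cache."""

    def __init__(self):
        self.cache = {}   # acknowledged, but not yet on stable storage
        self.stable = {}  # contents that survive a crash

    def write(self, lba, data):
        self.cache[lba] = data
        return True       # ack as soon as the data is in the cache

    def flush(self):
        # Stand-in for a flush_cache style command.
        self.stable.update(self.cache)
        self.cache.clear()
        return True

    def crash(self):
        self.cache.clear()  # everything that was only cached is lost


class Initiator:
    """Hypothetical initiator that retains writes until a flush succeeds."""

    def __init__(self, target):
        self.target = target
        self.next_tag = 0
        self.unflushed = OrderedDict()  # tag -> (lba, data), in issue order

    def write(self, lba, data):
        tag = self.next_tag
        self.next_tag += 1
        # Assumption: an ack alone is not treated as proof of durability,
        # so the request stays queued even after the target acks it.
        self.unflushed[tag] = (lba, data)
        self.target.write(lba, data)
        return tag

    def flush(self):
        # Only a confirmed flush lets the initiator drop its copies.
        if self.target.flush():
            self.unflushed.clear()

    def reconnect(self, target):
        # After a disconnect or target reboot, replay in original order
        # everything that was never confirmed as flushed.
        self.target = target
        for lba, data in self.unflushed.values():
            self.target.write(lba, data)
```

In this sketch, a write that was acked and then lost in a target crash is still in `unflushed`, so calling `reconnect()` followed by `flush()` puts it on stable storage. The cost is that the initiator has to buffer every write between flushes, which is one reason a real implementation might choose differently.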