From: Gregor Moeller
Date: Sun, 12 Sep 2010 23:04:51 +0200
To: freebsd-scsi@freebsd.org
Subject: lost iscsi devices not recognised by zfs

Hi folks,

given the following setup:

iSCSI initiator: 2.2.4.2, 8.1-STABLE (also tried 8.1-PRERELEASE with initiator 2.1.0)
iSCSI target: 8.1-STABLE as of 2010-09-12

ZFS pool on the initiator box:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      mirror    ONLINE       0     0     0
        ada1    ONLINE       0     0     0
        da0     ONLINE       0     0     0
      mirror    ONLINE       0     0     0
        ada2    ONLINE       0     0     0
        da1     ONLINE       0     0     0
      mirror    ONLINE       0     0     0
        ada3    ONLINE       0     0     0
        da2     ONLINE       0     0     0

ada1, ada2, ada3: local disks
da0, da1, da2: iSCSI disks

Because of the mirroring, a loss of all iSCSI devices shouldn't render the pool unusable.
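To make the redundancy argument concrete, here is a minimal Python sketch that models the pool layout above and checks whether every mirror still has a member once a set of devices is gone. The dictionary keys ("mirror-0" etc.) and the function names are hypothetical labels chosen for illustration; this only models the layout on paper and says nothing about how ZFS itself reacts.

```python
# Hypothetical model of the pool layout from the post; the "mirror-N"
# labels are made up here, the member names come from zpool status.
POOL = {
    "mirror-0": ["ada1", "da0"],
    "mirror-1": ["ada2", "da1"],
    "mirror-2": ["ada3", "da2"],
}


def surviving_members(pool, lost):
    """Return, per mirror, the members that are not in the lost set."""
    lost = set(lost)
    return {
        vdev: [dev for dev in members if dev not in lost]
        for vdev, members in pool.items()
    }


def every_mirror_survives(pool, lost):
    """True if each mirror keeps at least one member after the loss."""
    return all(surviving_members(pool, lost).values())


if __name__ == "__main__":
    iscsi_disks = ["da0", "da1", "da2"]
    print(surviving_members(POOL, iscsi_disks))
    print(every_mirror_survives(POOL, iscsi_disks))
```

With all three iSCSI disks lost, each mirror still holds its local ada disk, which is why the pool would be expected to stay usable in a degraded state.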
Problem: When I stop the iSCSI target service to simulate an error on the target box (reboot etc.), the initiator recognises this:

    iscontrol[5634]: trapped signal 30
    iscontrol[5628]: trapped signal 30
    iscontrol[5638]: trapped signal 30
    iscontrol: supervise going down
    iscontrol: supervise going down
    iscontrol[5628]: sess flags=2000040d
    iscontrol[5634]: sess flags=2000040d
    iscontrol[5628]: Reconnect
    iscontrol[5634]: Reconnect
    iscontrol: supervise going down
    iscontrol[5638]: sess flags=2000040d
    iscontrol[5638]: Reconnect
    recvpdu: Socket is not connected
    recvpdu failed
    iscontrol[5638]: terminated
    recvpdu: Socket is not connected
    recvpdu failed
    iscontrol[5634]: terminated
    recvpdu: Socket is not connected
    recvpdu failed
    iscontrol[5628]: terminated

and shortly after this I see:

    (da2:iscsi2:0:0:0): lost device
    (da1:iscsi1:0:0:0): lost device
    (da0:iscsi0:0:0:0): lost device

But somehow ZFS does not recognise the lost devices. The devices remain in ZFS "online" status with some r/w errors reported. The pool is unusable though: any r/w access hangs, as does a zpool detach tank da0 or an ls /tank.

If I restart the target service after some minutes, the initiator doesn't reconnect, although the processes are still running:

    root 5628 0.0 0.0 9200 1616 ?? DEs 7:19PM 0:00.00 iscontrol -c /etc/iscsi/disk1.conf -n disk1
    root 5634 0.0 0.0 9200 1616 ?? DEs 7:19PM 0:00.00 iscontrol -c /etc/iscsi/disk2.conf -n disk2
    root 5638 0.0 0.0 9200 1616 ?? DEs 7:19PM 0:00.00 iscontrol -c /etc/iscsi/disk3.conf -n disk3

I don't know whether this issue is related to the initiator (my guess), to ZFS, or to some other component (or maybe even to my misunderstanding of some concepts), so I kindly ask if someone can give me a hint.

Best regards,
Gregor