From: "James R. Van Artsdalen"
Date: Sun, 28 Feb 2010 23:58:24 -0600
To: freebsd-fs
Subject: [zfs] attach by name/uuid still attaches wrong device
Message-ID: <4B8B5780.2050601@jrv.org>

FreeBSD bigtex.housenet.jrv 9.0-CURRENT FreeBSD 9.0-CURRENT #2 r200727M: Tue Dec 22 23:25:56 CST 2009 james@bigtex.housenet.jrv:/usr/obj/usr/src/sys/BIGTEX amd64

It appears that zfs/vdev_geom.c can still attach to the wrong device in some cases.
Note in the zpool status output how ada10 appears in two different vdevs.

What happened is that a disk failed completely (scbus3 target 3) and is no longer detected by the driver. At boot time:

1. ZFS fails to attach by path and UUID, since what was at ada11 is now at ada10 and has a different UUID.
2. ZFS fails to attach by UUID since that UUID is on a dead drive and can no longer be found anywhere.
3. ZFS then attaches by path blindly, even though that drive is in a different part of the pool and has a different UUID.

I don't think it's possible to do this right in vdev_geom.c: there's no way to guess what is intended without a hint from higher ZFS layers as to which drives should be found and which are new.

The best fixes I can think of are to expose drives by serial number in GEOM, or perhaps as a fall-back expose names that are geographic locations, i.e., "/dev/scbus0/target3/lun0".

# zpool status
  pool: bigtex
 state: DEGRADED
status: One or more devices could not be used because the label is missing
        or invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        bigtex                                          DEGRADED     0     0     0
          mirror                                        ONLINE       0     0     0
            ada6                                        ONLINE       0     0     0
            ada13                                       ONLINE       0     0     0
          mirror                                        ONLINE       0     0     0
            ada4                                        ONLINE       0     0     0
            ada11                                       ONLINE       0     0     0
          mirror                                        ONLINE       0     0     0
            gptid/dbb5f9fd-5e40-11de-bef4-001aa01b0286  ONLINE       0     0     0
            ada2p7                                      ONLINE       0     0     0
          mirror                                        ONLINE       0     0     0
            ada7                                        ONLINE       0     0     0
            ada14                                       ONLINE       0     0     0
          mirror                                        ONLINE       0     0     0
            ada3                                        ONLINE       0     0     0
            ada10                                       ONLINE       0     0     0
          mirror                                        ONLINE       0     0     0
            ada5                                        ONLINE       0     0     0
            ada12                                       ONLINE       0     0     0
          mirror                                        ONLINE       0     0     0
            ada9                                        ONLINE       0     0     0
            ada15                                       ONLINE       0     0     0
          mirror                                        DEGRADED     0     0     0
            ada10                                       FAULTED     10  754K     0  corrupted data
            ada16                                       ONLINE       0     0     0

errors: No known data errors

# camcontrol devlist
at scbus0 target 0 lun 0 (ada2,pass6)
at scbus0 target 1 lun 0 (ada3,pass7)
at scbus0 target 2 lun 0 (ada4,pass8)
at scbus0 target 3 lun 0 (ada5,pass9)
at scbus0 target 15 lun 0 (pass0,pmp0)
at scbus3 target 0 lun 0 (ada6,pass10)
at scbus3 target 1 lun 0 (ada7,pass11)
at scbus3 target 2 lun 0 (ada9,pass13)
at scbus3 target 15 lun 0 (pass1,pmp1)
at scbus4 target 0 lun 0 (ada8,pass12)
at scbus4 target 1 lun 0 (ada10,pass14)
at scbus4 target 2 lun 0 (ada11,pass15)
at scbus4 target 3 lun 0 (ada12,pass16)
at scbus4 target 15 lun 0 (pass2,pmp2)
at scbus7 target 0 lun 0 (ada13,pass17)
at scbus7 target 1 lun 0 (ada14,pass18)
at scbus7 target 2 lun 0 (ada15,pass19)
at scbus7 target 3 lun 0 (ada16,pass20)
at scbus7 target 15 lun 0 (pass3,pmp3)
at scbus8 target 0 lun 0 (pass4,ada0)
at scbus11 target 0 lun 0 (pass5,ada1)

# grep ada10 /var/run/dmesg.boot
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_open_by_path:466[1]: Found provider by name /dev/ada10.
vdev_geom_attach:112[1]: Attaching to ada10.
vdev_geom_attach:138[1]: Found consumer for ada10.
vdev_geom_attach:157[1]: Used existing consumer for ada10.
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_detach:173[1]: Closing access to ada10.
vdev_geom_open_by_path:477[1]: guid mismatch for provider /dev/ada10: 3665972767133355802 != 12768899409278570370.
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_open_by_path:466[1]: Found provider by name /dev/ada10.
vdev_geom_attach:112[1]: Attaching to ada10.
vdev_geom_attach:138[1]: Found consumer for ada10.
vdev_geom_attach:157[1]: Used existing consumer for ada10.
vdev_geom_detach:173[1]: Closing access to ada10.
vdev_geom_detach:173[1]: Closing access to ada10.
vdev_geom_detach:177[1]: Destroyed consumer to ada10.
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_attach:112[1]: Attaching to ada10.
vdev_geom_attach:153[1]: Created consumer for ada10.
vdev_geom_open_by_guid:446[1]: Attach by guid [12768899409278570370] succeeded, provider /dev/ada10.
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_open_by_path:466[1]: Found provider by name /dev/ada10.
vdev_geom_attach:112[1]: Attaching to ada10.
vdev_geom_attach:138[1]: Found consumer for ada10.
vdev_geom_attach:157[1]: Used existing consumer for ada10.
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_detach:173[1]: Closing access to ada10.
vdev_geom_open_by_path:477[1]: guid mismatch for provider /dev/ada10: 3665972767133355802 != 12768899409278570370.
vdev_geom_read_guid:301[1]: Reading guid from ada10...
vdev_geom_read_guid:339[1]: guid for ada10 is 12768899409278570370
vdev_geom_open_by_path:466[1]: Found provider by name /dev/ada10.
vdev_geom_attach:112[1]: Attaching to ada10.
vdev_geom_attach:138[1]: Found consumer for ada10.
vdev_geom_attach:157[1]: Used existing consumer for ada10.
vdev_geom_detach:173[1]: Closing access to ada10.
#
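
The three-step attach order described in the message (path and GUID together, then GUID alone, then path alone) can be illustrated with a small model. This is only a sketch in Python, not the C code in vdev_geom.c: the names Vdev, open_vdev, providers and allow_blind_path are hypothetical, and the old path "ada11" for the surviving disk is an assumption taken from the remark that what was at ada11 is now at ada10. The two GUIDs are the ones shown in the dmesg lines.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Vdev:
    path: str  # provider name recorded in the pool configuration
    guid: int  # GUID recorded in the pool configuration


def open_vdev(vdev: Vdev, providers: Dict[str, int],
              allow_blind_path: bool = True) -> Tuple[Optional[str], str]:
    """Model of the fallback order described in the message.

    providers maps a provider name to the GUID read from its on-disk label.
    Returns (provider name or None, which step matched).
    """
    # Step 1: the provider at the recorded path carries the recorded GUID.
    if providers.get(vdev.path) == vdev.guid:
        return vdev.path, "path+guid"
    # Step 2: some other provider carries the recorded GUID.
    for name, guid in providers.items():
        if guid == vdev.guid:
            return name, "guid"
    # Step 3: take whatever now sits at the recorded path, GUID unchecked.
    if allow_blind_path and vdev.path in providers:
        return vdev.path, "path only"
    return None, "not found"


# Only the provider that appears in the dmesg excerpt is modelled here.
providers = {"ada10": 12768899409278570370}

# Surviving disk: assumed to have been recorded as ada11, now seen as ada10.
live = Vdev(path="ada11", guid=12768899409278570370)
# Dead disk: recorded at ada10 with the GUID from the "guid mismatch" line.
dead = Vdev(path="ada10", guid=3665972767133355802)

print(open_vdev(live, providers))
print(open_vdev(dead, providers))
print(open_vdev(dead, providers, allow_blind_path=False))
```

In this model both vdevs resolve to ada10: the surviving disk through its GUID and the dead disk through the unchecked path, which mirrors ada10 showing up in two mirrors in the zpool status output. With the hypothetical allow_blind_path switch turned off, the dead vdev is reported as not found instead. That switch is only there to isolate the effect of the last step; it does not address the point made above that vdev_geom.c cannot tell which drives are expected and which are new.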