Date:      Wed, 20 Aug 2008 09:47:38 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Colin Moller <colin@lefty.tv>
Cc:        swank@storefront.com, colin@storefront.com, current@freebsd.org, elo@storefront.com
Subject:   Re: zpool import hanging on unexpectedly-rebooted machine
Message-ID:  <20080820074738.GA1701@garage.freebsd.pl>
In-Reply-To: <48A95C6F.2010002@lefty.tv>
References:  <48A95C6F.2010002@lefty.tv>

On Mon, Aug 18, 2008 at 04:26:39AM -0700, Colin Moller wrote:
> Hey all,
>
> I've got an interestingly frustrating problem on my hands with our
> 7.0-STABLE boxes running ZFS.  Sun X4500 box running amd64, 16GB of
> RAM., 46x1TB disks in RAIDZ1. (other two for the OS.)
>
> Uname for the box is:
> FreeBSD sf-nas1-c160a.storefront.com 7.0-STABLE FreeBSD 7.0-STABLE #1:
> Sat May 31 14:54:22 PDT 2008
> root@sf-nas1-c160a.storefront.com:/usr/obj/usr/src/sys/X4500  amd64
>
> The box has been running relatively reliably for some months now, but
> our hosting provider decided to reboot it on us without asking.  After
> the box came back, it had lost /boot/zfs/zpool.cache, so I needed to
> reimport the only zpool on the machine (named zfsdata).
>
> Running zpool import gives me the output I'm expecting, showing a single
> zpool called zfsdata, status of ONLINE, and all the disks are showing up.
>
> However, when I run zpool import -f <numerical_pool_id>, the zpool
> command simply hangs up with no disk and no CPU activity.  I've run
> truss on the zpool import, and the last thing I see happening is:
>
> open("/dev/ad96",O_RDONLY,030115000)             = 6 (0x6)
> ioctl(6,DIOCGIDENT,0xffff9480)                   = 0 (0x0)
> close(6)                                         = 0 (0x0)
>
> After turning on vfs.zfs.debug, I also see this on the console:
>
> zfs_ereport_post:293[1]: time=1219057172.795893475 ereport_version=0
> class=fs.zfs.checksum zfs_scheme_version=0 pool=zfsdata
> pool_guid=316648131406719055 pool_context=2
> vdev_guid=7326417523786577584 vdev_type=disk vdev_path=/dev/ad12
> vdev_devid=ad:GTF000PAHX5TMF parent_guid=6708978418893991394
> parent_type=raidz zio_err=0 zio_offset=89290496000 zio_size=512
> zio_object=132 zio_level=0 zio_blkid=244

If I read this correctly, it reports a checksum error on disk /dev/ad12.
Because this is RAIDZ, ZFS probably tries to self-heal, and maybe
something goes wrong there. I have never seen a similar problem, so I'm
not sure how to help you. Even if upgrading to -CURRENT is not an option
for you, maybe you can still install -CURRENT on a USB pen drive and
recompile it with the new patch? You may also try removing this disk
(ad12) to see if it behaves any better. Anyway, please keep me informed
about what's going on.
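
For anyone who wants to read such console lines mechanically, here is a
minimal sketch in Python. It assumes the ereport line is a prefix ending
in ": " followed by whitespace-separated key=value tokens, which is how
the line quoted above looks; parse_ereport is a hypothetical helper name,
not part of ZFS or any FreeBSD tool.

```python
def parse_ereport(line):
    """Return the key=value fields of one ereport console line as a dict."""
    # Drop the "zfs_ereport_post:293[1]: " prefix (assumed to end at the
    # first ": "), then split the remaining payload on whitespace.
    _, _, payload = line.partition(": ")
    fields = {}
    for token in payload.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore any token that is not of the form key=value
            fields[key] = value
    return fields


# The ereport from the original message, joined into a single line.
report = parse_ereport(
    "zfs_ereport_post:293[1]: time=1219057172.795893475 ereport_version=0 "
    "class=fs.zfs.checksum zfs_scheme_version=0 pool=zfsdata "
    "pool_guid=316648131406719055 pool_context=2 "
    "vdev_guid=7326417523786577584 vdev_type=disk vdev_path=/dev/ad12 "
    "vdev_devid=ad:GTF000PAHX5TMF parent_guid=6708978418893991394 "
    "parent_type=raidz zio_err=0 zio_offset=89290496000 zio_size=512 "
    "zio_object=132 zio_level=0 zio_blkid=244"
)

# The three fields the reading above relies on.
print(report["class"], report["vdev_path"], report["parent_type"])
```

The class, vdev_path and parent_type fields are the ones behind the
reading above: a checksum event on /dev/ad12, whose parent vdev is raidz.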

--
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!



