Date: Sun, 18 May 2008 05:42:17 -0700
From: Jeremy Chadwick <koitsu@FreeBSD.org>
To: Andrew Hill <lists@thefrog.net>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS lockup in "zfs" state
Message-ID: <20080518124217.GA16222@eos.sc1.parodius.com>
In-Reply-To: <683A6ED2-0E54-42D7-8212-898221C05150@thefrog.net>
References: <683A6ED2-0E54-42D7-8212-898221C05150@thefrog.net>

On Sun, May 18, 2008 at 05:11:37PM +1000, Andrew Hill wrote:
> right now, i'm getting uptimes in the order of days before everything locks
> up, i assume it's related to this bug, though i'm also getting the following
> output when it locks up
>
> ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350494631
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=234920650
> ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=443427007
> ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350174938
> ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350494631
> ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=234920650
> ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=443427007
> ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350174938
> ad2: FAILURE - WRITE_DMA48 timed out LBA=350494631
> ad0: FAILURE - WRITE_DMA timed out LBA=234920650
> ad2: FAILURE - WRITE_DMA48 timed out LBA=443427007
> ad0: FAILURE - WRITE_DMA48 timed out LBA=350174938

I've documented this fairly well, although I suppose I could write up a
diagnosis method as an addendum. Anyway:

http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues

One thing: are the timeouts always on ad0 and ad2?

> typically repeated for a number of different LBA values before the system
> panics. I don't know if this is more likely to be related to the cause of
> the lockups (e.g. faulty hardware/driver) or if it's an effect of the lockup
> (e.g. waiting on a deadlocked thread)... from what i've found searching
> mailing lists, this kind of error seems to turn up with faulty
> hardware/drivers so i guess it could just be that zfs exposes the faults
> because it's using the hardware differently to my previous ufs setup...

It is possible you have some bad hardware, but there are many of us who
have seen the above (with or without ZFS) on perfectly good hardware.
For some, changing cables fixed the problem, while for others absolutely
nothing fixed it (changed cables, changed controller brands, changed to
new disks).

If the DMA timeouts are easily reproducible, please get in touch with
Scott Long <scottl@samsco.org>, who is in the process of researching why
these happen. Serial console access might be required.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |