Date:      Sun, 18 May 2008 05:42:17 -0700
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        Andrew Hill <lists@thefrog.net>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS lockup in "zfs" state
Message-ID:  <20080518124217.GA16222@eos.sc1.parodius.com>
In-Reply-To: <683A6ED2-0E54-42D7-8212-898221C05150@thefrog.net>
References:  <683A6ED2-0E54-42D7-8212-898221C05150@thefrog.net>

On Sun, May 18, 2008 at 05:11:37PM +1000, Andrew Hill wrote:
> right now, i'm getting uptimes in the order of days before everything locks 
> up, i assume its related to this bug, though i'm also getting the following 
> output when it locks up
>
> ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350494631
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=234920650
> ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=443427007
> ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350174938
> ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350494631
> ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=234920650
> ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=443427007
> ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350174938
> ad2: FAILURE - WRITE_DMA48 timed out LBA=350494631
> ad0: FAILURE - WRITE_DMA timed out LBA=234920650
> ad2: FAILURE - WRITE_DMA48 timed out LBA=443427007
> ad0: FAILURE - WRITE_DMA48 timed out LBA=350174938

I've documented this fairly well, although I suppose I could write up a
diagnosis method as an addendum.  Anyway:

http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues

One thing: are the timeouts always on ad0 and ad2?
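
One rough way to check is to tally the kernel messages per device. The
following is a minimal Python sketch, assuming the messages have the same
shape as the ones quoted above; the names `LINE_RE`, `tally` and `SAMPLE` are
made up for illustration and are not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches ATA messages of the form seen in this thread, e.g.
#   ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350494631
#   ad0: FAILURE - WRITE_DMA timed out LBA=234920650
# Assumption: other message formats are simply ignored.
LINE_RE = re.compile(
    r"(?P<dev>ad\d+): (?P<kind>TIMEOUT|FAILURE) - (?P<cmd>\w+) .*LBA=(?P<lba>\d+)"
)


def tally(lines):
    """Count TIMEOUT/FAILURE messages per (device, kind) pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[(m.group("dev"), m.group("kind"))] += 1
    return counts


# A few lines copied from the report above, used as sample input.
SAMPLE = """\
ad2: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=350494631
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=234920650
ad2: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=350494631
ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=234920650
ad2: FAILURE - WRITE_DMA48 timed out LBA=350494631
ad0: FAILURE - WRITE_DMA timed out LBA=234920650
"""

counts = tally(SAMPLE.splitlines())
for (dev, kind), n in sorted(counts.items()):
    print(f"{dev} {kind}: {n}")
```

Feeding it the lines of the system log instead of SAMPLE (typically
/var/log/messages, though that depends on the syslog configuration) would
show whether any device other than ad0 and ad2 ever appears.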

> typically repeated for a number of different LBA values before the system 
> panics. I don't know if this is more likely to be related to the cause of 
> the lockups (e.g. faulty hardware/driver) or if its an effect of the lockup 
> (e.g. waiting on a deadlocked thread)... from what i've found searching 
> mailing lists, this kind of error seems to turn up with faulty 
> hardware/drivers so i guess it could just be that zfs exposes the faults 
> because its using the hardware differently to my previous ufs setup...

It is possible you have some bad hardware, but there are many of us who
have seen the above (with or without ZFS) on perfectly good hardware.
For some, changing cables fixed the problem, while for others absolutely
nothing fixed it (changed cables, changed controller brands, changed to
new disks).

If the DMA timeouts are easily reproducible, please get in touch with
Scott Long <scottl@samsco.org>, who is in the process of researching why
these happen.  Serial console access might be required.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |