Date:      Sun, 25 Oct 2009 00:54:47 +0000 (UTC)
From:      Marcin Wisnicki <mwisnicki+freebsd@gmail.com>
To:        freebsd-geom@freebsd.org
Subject:   Infinite loop in GEOM_JOURNAL when device dies
Message-ID:  <hc07kn$l0v$1@ger.gmane.org>

Hello,

My system looks like this:

FreeBSD 7.2-STABLE #3: Sat Oct 17 20:50:32 CEST 2009
da1 at umass-sim1 bus 1 target 0 lun 0
da1: <WD 2500BMV External 1.05> Fixed Direct Access SCSI-4 device
da1: 40.000MB/s transfers
da1: 238475MB (488397168 512 byte sectors: 255H 63S/T 30401C)
GEOM_JOURNAL: Journal 584260361: da1p1 contains data.
GEOM_JOURNAL: Journal 584260361: da1p1 contains journal.
GEOM_JOURNAL: Journal da1p1 consistent.

/dev/ufs/tank1u on /vol/store/tank1 (ufs, asynchronous, local, noatime, acls, gjournal)


Device da1 is an external WD Passport hard drive connected to a powered
USB hub. The UFS filesystem on da1p1.journal is labeled "tank1u".
Unfortunately, from time to time (anywhere from 1 day to many weeks after
startup) it stops working:

umass1: BBB reset failed, IOERROR
umass1: BBB bulk-in clear stall failed, IOERROR
umass1: BBB bulk-out clear stall failed, IOERROR
umass1: BBB reset failed, IOERROR
umass1: BBB bulk-in clear stall failed, IOERROR
umass1: BBB bulk-out clear stall failed, IOERROR
umass1: BBB reset failed, IOERROR
umass1: at uhub4 port 2 (addr 4) disconnected
GEOM_JOURNAL: Error while reading data from da1p1 (error=22).


At this point I usually get:
panic: ufs_dirbad: /vol/store/tank1: bad dir ino 2 at offset 0: mangled entry

This is unfortunate, but at least the system recovers on its own.
However, today it didn't panic; instead, the following happened:

(da1:umass-sim1:1:0:0): lost device
GEOM_JOURNAL: Lost provider da1p1.
GEOM_JOURNAL: Cannot destroy journal da1p1 (error=16). Destroy it manually after last close.


The system was still working, but when I tried running "ls /vol/store", I
got this on the serial console:

(da1:umass-sim1:1:0:0): lost device
GEOM_JOURNAL: Lost provider da1p1.
GEOM_JOURNAL: Cannot destroy journal da1p1 (error=16). Destroy it manually after last close.

GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
g_vfs_done():ufs/tank1u[READ(offset=254163107960586240, length=16384)]error = 5
bad block 9261869914, ino 432515
g_vfs_done():ufs/tank1u[READ(offset=7447739922700238848, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
[skip]
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(o
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
[skip]
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(o
17932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
[skip many pages]
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(of
ength=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(offset=18968309645312, length=16384)]error = 5
[infinite loop ?]

(I have unmangled and reformatted the output for readability.)
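
As a rough sanity check on the offsets in the log: da1 reports 488397168
sectors of 512 bytes, so any valid read must fall within about 250 GB. The
sketch below is Python and purely illustrative. The names are made up for
this example, and the whole-disk size is used as an upper bound because the
exact size of da1p1.journal is not shown in this report.

```python
# Geometry taken from the "da1: 238475MB (488397168 512 byte sectors ...)" line.
SECTORS = 488397168
SECTOR_SIZE = 512
DISK_BYTES = SECTORS * SECTOR_SIZE  # upper bound; the journaled partition is smaller

# Offsets copied from the g_vfs_done() messages in the log.
OFFSETS = [
    254163107960586240,
    7447739922700238848,
    -9220267016017932288,
    18968309645312,
]


def in_range(offset, length=16384, size=DISK_BYTES):
    """Return True if a read of `length` bytes at `offset` fits on the device."""
    return 0 <= offset and offset + length <= size


for off in OFFSETS:
    # Show the raw 64-bit pattern too, since negative values are just
    # large unsigned numbers printed as signed.
    unsigned = off & 0xFFFFFFFFFFFFFFFF
    print(f"{off:>22}  0x{unsigned:016x}  in range: {in_range(off)}")
```

None of the logged offsets fits within the device, which suggests the
filesystem code was acting on garbage block pointers rather than on real
on-disk locations.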

While I could still ping the machine, nothing in userland worked, and the
console kept printing geom errors and wouldn't accept any input, so I had
to press the reset button.

I think gjournal should somehow destroy itself if the underlying provider
dies, just as the provider of an unplugged disk is destroyed.
UFS has supposedly been able to handle disappearing devices for some time
now, and even though this does not really work yet, a panic is better than
an infinite loop.
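
For reference, the numeric codes in the kernel messages above are standard
errno values. A minimal Python sketch, assuming the FreeBSD sys/errno.h
numbering (the table and helper names are illustrative, not part of any
GEOM tool):

```python
import re

# Errno numbers and descriptions as defined in FreeBSD's sys/errno.h.
# Only the codes that appear in this report are listed.
ERRNO_NAMES = {
    5: ("EIO", "Input/output error"),
    6: ("ENXIO", "Device not configured"),
    16: ("EBUSY", "Device busy"),
    22: ("EINVAL", "Invalid argument"),
}

# Matches both "error=22" and "error = 5" as printed by the kernel.
ERROR_RE = re.compile(r"error\s*=\s*(\d+)")


def decode_errors(lines):
    """Return a sorted list of (number, name, description) seen in the lines."""
    found = set()
    for line in lines:
        for match in ERROR_RE.finditer(line):
            code = int(match.group(1))
            name, text = ERRNO_NAMES.get(code, ("?", "unknown"))
            found.add((code, name, text))
    return sorted(found)


SAMPLE = [
    "GEOM_JOURNAL: Error while reading data from da1p1 (error=22).",
    "GEOM_JOURNAL: Cannot destroy journal da1p1 (error=16).",
    "GEOM_JOURNAL: Error while reading data from da1p1 (error=6).",
    "g_vfs_done():ufs/tank1u[READ(offset=18968309645312, length=16384)]error = 5",
]

for code, name, text in decode_errors(SAMPLE):
    print(f"error={code}: {name} ({text})")
```

In particular, error=16 (EBUSY) on the destroy attempt is consistent with
the journal still being held open by the mounted filesystem, and error=6
(ENXIO) with the provider being gone.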

The SMART log shows some READ DMA EXT errors but no permanent damage:
smartd runs periodic tests, which complete without failure, and all error
counters remain at 0.
BTW, the drive worked fine in Windows; it just "stalled" for a moment sometimes.



