Date: Sun, 25 Oct 2009 00:54:47 +0000 (UTC)
From: Marcin Wisnicki <mwisnicki+freebsd@gmail.com>
To: freebsd-geom@freebsd.org
Subject: Infinite loop in GEOM_JOURNAL when device dies
Message-ID: <hc07kn$l0v$1@ger.gmane.org>
Hello,

My system looks like this:

FreeBSD 7.2-STABLE #3: Sat Oct 17 20:50:32 CEST 2009
da1 at umass-sim1 bus 1 target 0 lun 0
da1: <WD 2500BMV External 1.05> Fixed Direct Access SCSI-4 device
da1: 40.000MB/s transfers
da1: 238475MB (488397168 512 byte sectors: 255H 63S/T 30401C)
GEOM_JOURNAL: Journal 584260361: da1p1 contains data.
GEOM_JOURNAL: Journal 584260361: da1p1 contains journal.
GEOM_JOURNAL: Journal da1p1 consistent.
/dev/ufs/tank1u on /vol/store/tank1 (ufs, asynchronous, local, noatime, acls, gjournal)

Device da1 is an external WD Passport HDD connected to a powered USB hub. The UFS filesystem on da1p1.journal is labeled "tank1u".

Unfortunately, from time to time (anywhere from one day to many weeks after startup) it stops working:

umass1: BBB reset failed, IOERROR
umass1: BBB bulk-in clear stall failed, IOERROR
umass1: BBB bulk-out clear stall failed, IOERROR
umass1: BBB reset failed, IOERROR
umass1: BBB bulk-in clear stall failed, IOERROR
umass1: BBB bulk-out clear stall failed, IOERROR
umass1: BBB reset failed, IOERROR
umass1: at uhub4 port 2 (addr 4) disconnected
GEOM_JOURNAL: Error while reading data from da1p1 (error=22).

At this point I usually get:

panic: ufs_dirbad: /vol/store/tank1: bad dir ino 2 at offset 0: mangled entry

This is unfortunate, but at least the system recovers by itself. Today, however, it didn't panic; instead, the following happened:

(da1:umass-sim1:1:0:0): lost device
GEOM_JOURNAL: Lost provider da1p1.
GEOM_JOURNAL: Cannot destroy journal da1p1 (error=16). Destroy it manually after last close.

The system was still working, but when I tried running "ls /vol/store", I got this on the serial console:

(da1:umass-sim1:1:0:0): lost device
GEOM_JOURNAL: Lost provider da1p1.
GEOM_JOURNAL: Cannot destroy journal da1p1 (error=16). Destroy it manually after last close.
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
g_vfs_done():ufs/tank1u[READ(offset=254163107960586240, length=16384)]error = 5
bad block 9261869914, ino 432515
g_vfs_done():ufs/tank1u[READ(offset=7447739922700238848, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
[skip]
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(o
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
[skip]
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(o 17932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
[skip many pages]
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(of ength=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(offset=18968309645312, length=16384)]error = 5
[infinite loop ?]

(I have unmangled and reformatted the output for readability.)

While I could still ping the machine, nothing in userland worked, and the console was constantly printing GEOM errors and wouldn't accept any input, so I had to press the reset button.

I think gjournal should somehow destroy itself when its underlying provider dies, just as the provider of an unplugged disk does. UFS is supposed to have handled disappearing devices for some time now, and even though this does not really work yet, a panic is better than an infinite loop.
The SMART log shows some READ DMA EXT errors but no permanent damage: smartd runs periodic tests, which complete without failure, and all error counters remain at 0. BTW, the drive worked fine under Windows; it just "stalled" for a moment sometimes.