Date: Wed, 18 Feb 2004 00:20:02 +0100 From: Matthias Andree <ma@dt.e-technik.uni-dortmund.de> To: freebsd-stable@freebsd.org Subject: ahc and massive ffs+softupdates corruption Message-ID: <m38yj15m59.fsf@merlin.emma.line.org>
next in thread | raw e-mail | index | archive | help
Hi, I have seen a SCSI timeout that preceded massive file system corruption some days ago. As this happened with the drive's write cache turned off and with ffs softupdates, I think it's worth reporting. Hardware: Micropolis 4345WS (UWSCSI disk drive, 7200/min, device da0) attached to an Adaptec 2940UW Pro (ahc0, aic7880 chip); on the same bus there's a Yamaha CRW4416S CD writer (current firmware) and a Plextor PX-20TS CD-ROM (firmware 1.00). This machine had a SCSI timeout problem on Friday Feb 6th and went down hard, suffering massive file system corruption on /var. At that time, the machine was running portupgrade -a. /var is using softupdates and uses default mount options. As said before, the drive's FWC enable was set to 0 in both the current and saved editions of mode page 8, and I wonder how such massive corruption can happen. I was under the impression that softupdates prevented any on-disk corruptions that require user intervention at fsck time. Given that the write cache was off, I am wondering if there are any ffs+softupdates or tagged command queueing bugs left (that might reorder writes - ordered tag forgotten or something). Here's the log, first the SCSI crash, then, on Feb 7, then excerpts from the reboot including fsck. / and /usr also suffered minor file system inconsistencies that "fsck -p" was able to fix automatically. Feb 6 20:10:20 libertas /kernel: swap_pager: indefinite wait buffer: device: #da/0x20001, blkno: 4296, size: 24576 Feb 6 20:10:41 libertas /kernel: (da0:ahc0:0:0:0): SCB 0x0 - timed out Feb 6 20:13:12 libertas /kernel: (da0:ahc0:0:0:0): Other SCB Timeout Feb 6 20:13:12 libertas /kernel: (da0:ahc0:0:0:0): SCB 0x21 - timed out Feb 6 20:13:21 libertas /kernel: (da0:ahc0:0:0:0): BDR message in message buffer Feb 6 20:13:21 libertas /kernel: (da0:ahc0:0:0:0): SCB 0x21 - timed out Feb 6 20:13:30 libertas /kernel: (da0:ahc0:0:0:0): no longer in timeout, status = 34b Feb 6 20:13:30 libertas /kernel: ahc0: Issued Channel A Bus Reset. 12 SCBs aborted Feb 6 20:13:30 libertas /kernel: swap_pager: indefinite wait buffer: device: #da/0x20001, blkno: 4296, size: 24576 Feb 6 20:13:30 libertas /kernel: swap_pager: indefinite wait buffer: device: #da/0x20001, blkno: 2056, size: 8192 Feb 6 20:13:30 libertas /kernel: (da0:ahc0:0:0:0): SCB 0x20 - timed out Feb 6 20:13:37 libertas /kernel: (da0:ahc0:0:0:0): Other SCB Timeout Feb 6 20:13:37 libertas /kernel: (da0:ahc0:0:0:0): SCB 0xa - timed out Feb 6 20:13:44 libertas /kernel: (da0:ahc0:0:0:0): BDR message in message buffer Feb 6 20:13:44 libertas /kernel: (da0:ahc0:0:0:0): SCB 0xa - timed out Feb 6 20:13:51 libertas /kernel: (da0:ahc0:0:0:0): no longer in timeout, status = 34b Feb 6 20:13:51 libertas /kernel: ahc0: Issued Channel A Bus Reset. 3 SCBs aborted Feb 6 20:13:51 libertas /kernel: (da0:ahc0:0:0:0): SCB 0x22 - timed out Feb 6 20:13:58 libertas /kernel: (da0:ahc0:0:0:0): Other SCB Timeout Feb 6 20:13:58 libertas /kernel: (da0:ahc0:0:0:0): SCB 0xa - timed out Feb 6 20:14:05 libertas /kernel: (da0:ahc0:0:0:0): BDR message in message buffer Feb 6 20:14:05 libertas /kernel: (da0:ahc0:0:0:0): SCB 0xa - timed out Feb 6 20:14:14 libertas /kernel: (da0:ahc0:0:0:0): no longer in timeout, status = 34b Feb 6 20:14:14 libertas /kernel: ahc0: Issued Channel A Bus Reset. 4 SCBs aborted Feb 6 20:15:01 libertas /usr/sbin/cron[58193]: (root) CMD (/usr/libexec/atrun) Feb 6 20:20:00 libertas /usr/sbin/cron[64038]: (root) CMD (/usr/libexec/atrun) Feb 7 12:37:03 libertas /kernel: ahc0: <Adaptec 2940 Pro Ultra SCSI adapter> port 0xe800-0xe8ff mem 0xe3020000-0xe3020fff irq 15 at devi ce 8.0 on pci0 Feb 7 12:37:03 libertas /kernel: aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs Feb 7 12:37:04 libertas /kernel: da0 at ahc0 bus 0 target 0 lun 0 Feb 7 12:37:04 libertas /kernel: da0: <MICROP 4345WS x43h> Fixed Direct Access SCSI-2 device Feb 7 12:37:04 libertas /kernel: da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled Feb 7 12:37:04 libertas /kernel: da0: 4340MB (8890029 512 byte sectors: 255H 63S/T 553C) Feb 7 12:37:04 libertas /kernel: /dev/da0s1e: Feb 7 12:37:04 libertas /kernel: UNKNOWN FILE TYPE I=256 Feb 7 12:37:04 libertas /kernel: Feb 7 12:37:04 libertas /kernel: /dev/da0s1e: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY. Feb 7 12:37:04 libertas /kernel: THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY: Feb 7 12:37:04 libertas /kernel: Feb 7 12:37:04 libertas /kernel: /dev/da0s1e (/var) Feb 7 12:37:04 libertas /kernel: Automatic file system check failed . . . help! Feb 7 12:37:04 libertas /kernel: Enter full pathname of shell or RETURN for /bin/sh: Feb 7 12:37:04 libertas /kernel: # Entered shell and typed "fsck": Feb 7 12:37:04 libertas /kernel: ** /dev/da0s1e Feb 7 12:37:04 libertas /kernel: ** Last Mounted on /var Feb 7 12:37:04 libertas /kernel: ** Phase 1 - Check Blocks and Sizes Feb 7 12:37:04 libertas /kernel: UNKNOWN FILE TYPE I=256 Feb 7 12:37:04 libertas /kernel: Feb 7 12:37:04 libertas /kernel: UNEXPECTED SOFT UPDATE INCONSISTENCY Feb 7 12:37:04 libertas /kernel: Feb 7 12:37:04 libertas /kernel: CLEAR? [yn] Feb 7 12:37:04 libertas /kernel: Last 6 lines repeat with "I=" figures monotonically increasing, finally: Feb 7 12:37:04 libertas /kernel: UNKNOWN FILE TYPE I=264 Feb 7 12:37:04 libertas /kernel: Feb 7 12:37:04 libertas /kernel: UNEXPECTED SOFT UPDATE INCONSISTENCY Feb 7 12:37:04 libertas /kernel: Feb 7 12:37:04 libertas /kernel: CLEAR? [yn] Feb 7 12:37:04 libertas /kernel: Feb 7 12:37:04 libertas /kernel: Feb 7 12:37:04 libertas /kernel: UNREF FILE The log ends here, but isn't complete, I have agreed to reconnect dozens of files to lost+found after fsck asked, many but not all of them from /var/db/pkg. -- Matthias Andree Encrypt your mail: my GnuPG key ID is 0x052E7D95
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?m38yj15m59.fsf>