Date: Mon, 5 Jul 2010 17:23:03 -0400 (EDT)
From: Charles Sprickman <spork@bway.net>
To: freebsd-fs@freebsd.org
Subject: 7.2 - ufs2 corruption
Message-ID: <alpine.OSX.2.00.1007051701020.33454@hotlap.local>
Howdy,

I've posted previously about this, but I'm going to give it one more shot before I start reformatting and/or upgrading things.

I have a largish filesystem (1.3TB) that holds a few jails, the main one being a mail server. It's running 7.2/amd64 on a Dell 2970 with the mfi RAID card, 6GB RAM, and UFS2. SU was enabled; I disabled it for testing, to no effect.

The symptoms are as follows:

Various applications (imap, an rsync backup script, a quota counter) will log "Bad file descriptor" errors:

  du: ./cur/1271801961.M21831P98582V0000005BI08E85975_0.foo.net,S=2824:2,S: Bad file descriptor

The kernel also starts logging messages like this to the console:

  g_vfs_done():mfid0s1e[READ(offset=2456998070156636160, length=16384)]error = 5
  g_vfs_done():mfid0s1e[READ(offset=-7347040593908226048, length=16384)]error = 5
  g_vfs_done():mfid0s1e[READ(offset=2456998070156636160, length=16384)]error = 5
  g_vfs_done():mfid0s1e[READ(offset=-7347040593908226048, length=16384)]error = 5
  g_vfs_done():mfid0s1e[READ(offset=2456998070156636160, length=16384)]error = 5

Note that the offsets look a bit... suspicious, especially the negative ones.

Usually the box will panic shortly after the daily run, within a day or two of those "g_vfs_done()" messages showing up. The fs is damaged badly enough that the box can't save a dump. The panic always looks like this:

  panic: ufs_dirbad: /spool: bad dir ino 151699770 at offset 163920: mangled entry
  cpuid = 0
  Uptime: 70d22h56m48s
  Physical memory: 6130 MB
  Dumping 811 MB: 796 780 764 748 732 716 700 684 668 652 636 620 604 588 572 556 540 524 508 492 476 460 444 428 412 396 380 364 348 332 316 300 284
  ** DUMP FAILED (ERROR 16) **

  panic: ufs_dirbad: /spool: bad dir ino 150073505 at offset 150: mangled entry
  cpuid = 2
  Uptime: 13d22h30m21s
  Physical memory: 6130 MB
  Dumping 816 MB: 801 785 769 753 737 721 705 689
  ** DUMP FAILED (ERROR 16) **
  Automatic reboot in 15 seconds - press a key on the console to abort
  Rebooting...
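A quick way to quantify how suspicious those offsets are is to compare each one against the size of the device. The sketch below does this in Python and is not part of the original report. It assumes the exact g_vfs_done() message format quoted above. PROVIDER_SIZE (roughly the 1.3TB mentioned) and check_line() are hypothetical names chosen for illustration. "error = 5" is decoded as EIO through Python's errno table. A negative offset, or one like 2456998070156636160 bytes (about 2.1 EiB), cannot be a valid read against a 1.3TB provider.

```python
import errno
import re

# Matches console lines in the format quoted above, e.g.
# g_vfs_done():mfid0s1e[READ(offset=-7347040593908226048, length=16384)]error = 5
G_VFS_RE = re.compile(
    r"g_vfs_done\(\):(?P<dev>\S+?)\[(?P<op>[A-Z]+)\("
    r"offset=(?P<offset>-?\d+), length=(?P<length>\d+)\)\]error = (?P<error>\d+)"
)

# Hypothetical provider size for mfid0s1e: roughly the 1.3TB filesystem in the post.
PROVIDER_SIZE = 1_300_000_000_000


def check_line(line, provider_size=PROVIDER_SIZE):
    """Describe one g_vfs_done() message, or return None if the line doesn't match."""
    m = G_VFS_RE.search(line)
    if m is None:
        return None
    offset = int(m.group("offset"))
    length = int(m.group("length"))
    err = int(m.group("error"))
    return {
        "dev": m.group("dev"),
        "op": m.group("op"),
        "offset": offset,
        "length": length,
        # Translate the numeric errno into its symbolic name (5 -> "EIO").
        "error": errno.errorcode.get(err, str(err)),
        # A request is only plausible if it lies entirely inside the provider.
        "in_range": 0 <= offset and offset + length <= provider_size,
    }


if __name__ == "__main__":
    for sample in (
        "g_vfs_done():mfid0s1e[READ(offset=2456998070156636160, length=16384)]error = 5",
        "g_vfs_done():mfid0s1e[READ(offset=-7347040593908226048, length=16384)]error = 5",
    ):
        info = check_line(sample)
        status = "ok" if info["in_range"] else "OUT OF RANGE"
        print(info["dev"], info["offset"], info["error"], status)
```

Both sample lines are flagged as out of range. To check real logs, set PROVIDER_SIZE to the actual size of the slice, for example the mediasize reported by diskinfo(8), and feed the function the lines from /var/log/messages.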
The fs, specifically "/spool" (which is where the errors always originate), will be pretty trashed and require a manual fsck. The first pass finds and fixes errors but does not mark the fs clean; it can take anywhere from 2 to 4 passes to get a clean fs. The box then runs fine for a few weeks or a few months until the "g_vfs_done" errors start popping up, and then the whole cycle repeats.

Are there any *known* issues with either the fs or possibly the mfi driver in 7.2?

My plan was to do something like this:

 - shut down services and copy all of /spool off to the backups server
 - newfs /spool
 - copy everything back

If it continues, I'll repeat the above with a 7.3 upgrade before running newfs. If it still continues, then I'll just go nuts and see what 8.0 or 8.1 does. But I'd really like to avoid that.

Any tips?

Thanks,

Charles
