Date:      Fri, 12 Dec 2014 18:16:17 +0100
From:      Walter Hop <freebsd@spam.lifeforms.nl>
To:        freebsd-fs@freebsd.org
Subject:   Serious FS hangs and panics on 10.1
Message-ID:  <553B39FA-7DBC-4536-9FD4-11A98E0D4740@spam.lifeforms.nl>

Hi all,

As some may have read on -stable, various users have been seeing system
hangs since 10.1-RC when unmounting the root filesystem on 10.1 with
UFS+softupdates. I'll recap: hangs occur for instance when /sbin/init
has been meddled with, so people generally experience it after running
freebsd-update. With the 10.1-p1 update, the bug and mailing list posts
got additional activity, so it's a recurring theme. I verified the
problem still exists in CURRENT, and found lock order reversals which
may or may not be related.
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458)

Now the above problem has a simple mitigation: just disable softupdates
before doing freebsd-update, and you won't hang. Okay, a little
startling, but I'm still sleeping okay.

Now today, the 10.1 story seems to look a lot worse, with a 10.1 box
getting back-to-back kernel panics in VFS functions. This is a box
serving SVN repositories, and SVN is known to exercise a filesystem
pretty thoroughly (even uncovering NTFS bugs in pre-SP1 Windows 7).
We've updated this box from 10.0 to 10.1 a week ago. The four panics
that we saw (trace below) had the exact same instruction pointer and
stack trace, so I'm pretty positive we're not looking at a random
hardware fluke.

The last panics were spaced only minutes apart, which was pretty scary.
I was fearing persistent disk corruption, but the panics stopped when...
I disabled softupdates! This was my first shot, as this also solved my
other stability problem on 10.1. Anyway, the machine has been stable so
far.

Maybe these two problems are unrelated; it might be too early to tell.
In any case, I am getting the strong vibe that something was changed in
UFS/VFS/softupdates between 10.0 and 10.1 that's possibly very
problematic and has a risk of causing data loss in the future.

Our experience with 10.0 has been remarkably good (same for earlier
releases for that matter... in fact I don't think I can remember the
last kernel panic in production at all... maybe on 5.2-STABLE?). So
that's why we were very happy to see 10.1, but it feels really
troublesome in the filesystem department, which is very uncharacteristic
for FreeBSD.

That said, I'd prefer spending some more energy on getting 10.1 working
well, rather than downgrading or jumping to other systems... But I think
it really needs some love.

Any ideas on what we could do?

Thanks!
WH

-- 
Walter Hop | PGP key: https://lifeforms.nl/pgp


Panic:

kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 0; apic id = 00
kernel: fault virtual address      = 0x30058
kernel: fault code                 = supervisor write data, page not present
kernel: instruction pointer        = 0x20:0xffffffff8090e46a
kernel: stack pointer              = 0x28:0xfffffe000024d780
kernel: frame pointer              = 0x28:0xfffffe000024d850
kernel: code segment               = base 0x0, limit 0xfffff, type 0x1b
kernel:                            = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags           = interrupt enabled, resume, IOPL = 0
kernel: current process            = 27466 (httpd)
kernel: trap number                = 12
kernel: panic: page fault
kernel: cpuid = 0
kernel: KDB: stack backtrace:
kernel: #0 0xffffffff80963000 at kdb_backtrace+0x60
kernel: #1 0xffffffff80928125 at panic+0x155
kernel: #2 0xffffffff80d24f1f at trap_fatal+0x38f
kernel: #3 0xffffffff80d25238 at trap_pfault+0x308
kernel: #4 0xffffffff80d2489a at trap+0x47a
kernel: #5 0xffffffff80d0a782 at calltrap+0x8
kernel: #6 0xffffffff8090ec35 at lf_advlock+0x45
kernel: #7 0xffffffff809b8e69 at vop_stdadvlock+0xa9
kernel: #8 0xffffffff80e44247 at VOP_ADVLOCK_APV+0xa7
kernel: #9 0xffffffff808e4919 at kern_fcntl+0xb39
kernel: #10 0xffffffff808e3d5c at kern_fcntl_freebsd+0xac
kernel: #11 0xffffffff80d25851 at amd64_syscall+0x351
kernel: #12 0xffffffff80d0aa6b at Xfast_syscall+0xfb
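
For anyone reading the trace: frames #6 through #10 are the fcntl(2)
advisory record-locking path (kern_fcntl -> VOP_ADVLOCK -> lf_advlock).
The message does not say which file httpd was locking or how, so the
snippet below is only a rough sketch of the kind of userland call that
enters this path, not a reproducer. The function name try_advisory_lock
and the temporary lock file are made up for illustration.

```python
import fcntl
import os
import tempfile


def try_advisory_lock(path):
    """Take and release an exclusive POSIX advisory lock on path.

    Returns True if the lock was acquired, False if it could not be taken
    (for example because another process already holds it).
    Hypothetical helper; not taken from the original report.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # lockf() is a wrapper around the fcntl() locking calls
            # (F_SETLK here, because of LOCK_NB). On FreeBSD that request
            # is handled by kern_fcntl, which calls VOP_ADVLOCK.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        # Release the lock again before closing the descriptor.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Assumption: any writable directory on the affected filesystem works;
    # a temporary directory is used here only to keep the sketch harmless.
    with tempfile.TemporaryDirectory() as tmp:
        print(try_advisory_lock(os.path.join(tmp, "lockfile")))
```

Running something like this in a loop on a UFS+softupdates filesystem
might help tell whether plain lock traffic is enough to hit the fault,
but that is untested speculation.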



