From owner-freebsd-fs@FreeBSD.ORG  Fri Dec 12 17:16:27 2014
From: Walter Hop <freebsd@spam.lifeforms.nl>
Subject: Serious FS hangs and panics on 10.1
Message-Id: <553B39FA-7DBC-4536-9FD4-11A98E0D4740@spam.lifeforms.nl>
Date: Fri, 12 Dec 2014 18:16:17 +0100
To: freebsd-fs@freebsd.org

Hi all,

As some may have read on -stable, various users have been seeing system
hangs since 10.1-RC when the root filesystem is unmounted on 10.1 with
UFS+softupdates. To recap: the hangs occur, for instance, when
/sbin/init has been modified, so people generally hit them after running
freebsd-update. With the 10.1-p1 update, the bug report and the mailing
list threads saw renewed activity, so it's a recurring theme. I verified
that the problem still exists in CURRENT, and found lock order reversals
which may or may not be related.
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458)

Now the above problem has a simple mitigation: just disable softupdates
before doing freebsd-update, and you won't hang. Okay, a little
startling, but I'm still sleeping okay.

Today, the 10.1 story looks a lot worse: a 10.1 box is getting
back-to-back kernel panics in VFS functions. This box serves SVN
repositories, and SVN is known to exercise a filesystem pretty
thoroughly (even uncovering NTFS bugs in pre-SP1 Windows 7). We updated
this box from 10.0 to 10.1 a week ago. The four panics we saw (trace
below) had the exact same instruction pointer and stack trace, so I'm
pretty positive we're not looking at a random hardware fluke.

The last panics were spaced only minutes apart, which was pretty scary.
I feared persistent disk corruption, but the panics stopped when... I
disabled softupdates! This was my first shot, as it also solved my other
stability problem on 10.1. The machine has been stable so far.

Maybe these two problems are unrelated; it might be too early to tell.
In any case, I am getting the strong vibe that something changed in
UFS/VFS/softupdates between 10.0 and 10.1 that's possibly very
problematic and risks causing data loss in the future.

Our experience with 10.0 has been remarkably good (same for earlier
releases, for that matter... in fact I don't think I can remember the
last kernel panic in production at all... maybe on 5.2-STABLE?). So we
were very happy to see 10.1, but it feels really troublesome in the
filesystem department, which is very uncharacteristic for FreeBSD.

That said, I'd prefer spending some more energy on getting 10.1 working
well, rather than downgrading or jumping to other systems... But I think
it really needs some love.

Any ideas on what we could do?

Thanks!
WH

-- 
Walter Hop | PGP key: https://lifeforms.nl/pgp


Panic:

kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 0; apic id = 00
kernel: fault virtual address      = 0x30058
kernel: fault code         = supervisor write data, page not present
kernel: instruction pointer        = 0x20:0xffffffff8090e46a
kernel: stack pointer              = 0x28:0xfffffe000024d780
kernel: frame pointer              = 0x28:0xfffffe000024d850
kernel: code segment               = base 0x0, limit 0xfffff, type 0x1b
kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags   = interrupt enabled, resume, IOPL = 0
kernel: current process            = 27466 (httpd)
kernel: trap number                = 12
kernel: panic: page fault
kernel: cpuid = 0
kernel: KDB: stack backtrace:
kernel: #0 0xffffffff80963000 at kdb_backtrace+0x60
kernel: #1 0xffffffff80928125 at panic+0x155
kernel: #2 0xffffffff80d24f1f at trap_fatal+0x38f
kernel: #3 0xffffffff80d25238 at trap_pfault+0x308
kernel: #4 0xffffffff80d2489a at trap+0x47a
kernel: #5 0xffffffff80d0a782 at calltrap+0x8
kernel: #6 0xffffffff8090ec35 at lf_advlock+0x45
kernel: #7 0xffffffff809b8e69 at vop_stdadvlock+0xa9
kernel: #8 0xffffffff80e44247 at VOP_ADVLOCK_APV+0xa7
kernel: #9 0xffffffff808e4919 at kern_fcntl+0xb39
kernel: #10 0xffffffff808e3d5c at kern_fcntl_freebsd+0xac
kernel: #11 0xffffffff80d25851 at amd64_syscall+0x351
kernel: #12 0xffffffff80d0aa6b at Xfast_syscall+0xfb
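
The backtrace above enters the kernel through fcntl(2) and ends in
lf_advlock, the advisory record-locking path, with httpd as the current
process. Below is a minimal sketch for exercising that same path from
userland. The function name exercise_advisory_locks and the file name
"locktest" are made up for illustration. It is only an assumption that
plain lock/unlock traffic gets anywhere near the fault; nothing in this
report confirms it, and a real trigger may well need concurrent
processes or SVN's specific access pattern.

```python
import fcntl
import os
import tempfile


def exercise_advisory_locks(path, iterations=1000):
    """Repeatedly take and release an exclusive advisory lock on `path`.

    Hypothetical helper: returns the number of completed lock/unlock cycles.
    """
    done = 0
    # Open read-write so that an exclusive (write) lock is permitted.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        for _ in range(iterations):
            # fcntl.lockf() wraps the fcntl(2) locking calls, i.e. the
            # syscall seen at the bottom of the backtrace.
            fcntl.lockf(fd, fcntl.LOCK_EX)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            done += 1
    finally:
        os.close(fd)
    return done


if __name__ == "__main__":
    # Assumption: to test a particular filesystem, replace the temporary
    # directory with a path on the UFS+softupdates mount in question.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "locktest")
        print(exercise_advisory_locks(target, iterations=100))
```

Run on a scratch machine only; if it does reach the bug, the expected
outcome is a panic like the one above.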