From owner-freebsd-bugs@FreeBSD.ORG Sat Mar 7 23:00:09 2009
Return-Path:
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 1D86A1065674
    for ; Sat, 7 Mar 2009 23:00:09 +0000 (UTC)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
    by mx1.freebsd.org (Postfix) with ESMTP id DF9298FC13
    for ; Sat, 7 Mar 2009 23:00:08 +0000 (UTC)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
    by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n27N058d079447
    for ; Sat, 7 Mar 2009 23:00:05 GMT
    (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n27N05RS079446;
    Sat, 7 Mar 2009 23:00:05 GMT
    (envelope-from gnats)
Resent-Date: Sat, 7 Mar 2009 23:00:05 GMT
Resent-Message-Id: <200903072300.n27N05RS079446@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Dieter
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id BFB94106566B
    for ; Sat, 7 Mar 2009 22:55:50 +0000 (UTC)
    (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
    by mx1.freebsd.org (Postfix) with ESMTP id A36238FC08
    for ; Sat, 7 Mar 2009 22:55:50 +0000 (UTC)
    (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
    by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n27MtoXY024467
    for ; Sat, 7 Mar 2009 22:55:50 GMT
    (envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
    by www.freebsd.org (8.14.3/8.14.3/Submit) id n27Mtos4024466;
    Sat, 7 Mar 2009 22:55:50 GMT
    (envelope-from nobody)
Message-Id: <200903072255.n27Mtos4024466@www.freebsd.org>
Date: Sat, 7 Mar 2009 22:55:50 GMT
From: Dieter
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: kern/132397: reboot causes filesystem corruption
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 07 Mar 2009 23:00:09 -0000

>Number: 132397
>Category: kern
>Synopsis: reboot causes filesystem corruption
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sat Mar 07 23:00:05 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator: Dieter
>Release: 7.1
>Organization:
>Environment:
7.1-RELEASE amd64
>Description:
FreeBSD 7.1 amd64
soft updates, disk write cache off

System was running fine. I typed reboot. It printed out a *very* long
string of numbers of buffers left (see below); usually the string is
very short (one line). It gave up with 3 buffers left to go. (Why?)
A filesystem is now corrupt and fsck causes a panic.

Through the miracle of soft updates, filesystems can now survive power
outages and the reset button with no damage, but an orderly reboot
causes unfixable corruption?
THIS IS COMPLETELY UNACCEPTABLE AND NEEDS TO BE FIXED!

Q1) What "timed out"? (see below) kproc_shutdown() maybe?

Q2) Why does the number of buffers sometimes go up?

Q3) Why doesn't it get all the buffers synced out to disk?
It is like something is still generating dirty buffers, but aren't all
processes killed before it gets to this point?

I suppose I can crank up

    static int kproc_shutdown_wait = 60;

in kern_shutdown.c, but that just gives it more time. It could still fail.

kern_kthread.c says:

     * Advise a kernel process to suspend (or resume) in its main loop.
     * Participation is voluntary.

Voluntary, eh?

Q4) Could something refuse to suspend and cause this? Why is this allowed?

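To make Q4 concrete, here is a minimal user-space sketch in C of a voluntary suspend request with a timeout. It is not the kernel code: `sim_kproc`, `sim_kproc_step` and `sim_suspend` are hypothetical names, ticks stand in for seconds, and the one idea taken from the kern_kthread.c comment is that a process only suspends when it reaches its own check point. Returning EWOULDBLOCK on timeout is an assumption about how such a wait might report failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a kernel process; not a FreeBSD structure. */
struct sim_kproc {
    int suspend_requested;  /* set by the caller asking for a suspend */
    int suspended;          /* set by the process itself when it complies */
    int check_interval;     /* ticks between voluntary checks; 0 = never checks */
    int tick;               /* ticks of work done so far */
};

/* One tick of the process's main loop: it notices the request only at
 * its own check points, which is what "voluntary" means here. */
static void sim_kproc_step(struct sim_kproc *p)
{
    p->tick++;
    if (p->check_interval > 0 && p->tick % p->check_interval == 0 &&
        p->suspend_requested)
        p->suspended = 1;
}

/* Ask the process to suspend and wait at most timeout_ticks for it.
 * Returns 0 if it suspended, EWOULDBLOCK if the wait timed out. */
static int sim_suspend(struct sim_kproc *p, int timeout_ticks)
{
    int t;

    assert(timeout_ticks >= 0);
    p->suspend_requested = 1;
    for (t = 0; t < timeout_ticks; t++) {
        sim_kproc_step(p);
        if (p->suspended)
            return 0;
    }
    return EWOULDBLOCK;
}
```

With `check_interval = 5` and a 60-tick timeout the request succeeds at the process's first check point. With `check_interval = 0` the caller can only time out, and raising the timeout changes how long it waits, not the outcome.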
In kern_shutdown.c:

    /*
     * With soft updates, some buffers that are
     * written will be remarked as dirty until other
     * buffers are written.
     */
    for (iter = pbusy = 0; iter < 20; iter++) {

I suppose the "20" could be increased, but again, it could still fail.

Q5) Does this buffer syncing not follow the soft updates protocol?
If not, why not?

There is something fundamentally wrong here, but I don't know what.
Barring a disk write error, which is not the case here, this should
never happen.

Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...1 3 1 3 2 0 1 2 0 1 0 2 2 0 2 2 2 2 2 2 0 2 0 2 2 2 2 0 2 2 2 2 0 1 0 2 0 2 2 2 2 0 2 0 2 1 0 2 2 0 2 0 1 2 2 2 1 0 1 2 2 timed out
Syncing disks, buffers remaining... 40 40 34 33 33 17 17 11 10 10 2 1 1 37 39 17 17 5 5 39 40 23 23 8 8 2 1 1 37 38 17 17 5 5 40 41 23 23 8 7 7 1 1 34 35 17 17 7 6 6 1 31 35 23 22 22 21 21 9 9 2 1 1 27 26 26 21 21 21 1 1 27 24 24 21 21 10 1 0 4 3 3 30 28 28 22 21 21 21 30 22 21 21 21 1 1 27 25 25 22 22 10 10 3 3 27 24 2 3 23 21 21 7 6 6 30 27 26 26 21 21 21 31 22 22 21 21 8 7 7 1 1 26 22 22 21 21 9 9 2 2 28 25 25 21 21 21 1 1 25 22 22 21 21 9 9 2 2 28 26 26 21 21 21 1 1 26 24 2 4 21 21 21 30 22 21 21 21 3 2 2 28 26 25 25 21 21 21 1 1 26 23 23 21 21 21 31 22 22 21 21 10 9 9 2 2 28 25 24 24 21 21 10 10 2 2 29 26 26 21 21 21 1 1 27 24 24 21 21 21 31 23 22 22 21 21 7 7 31 30 29 29 22 22 21 21 10 9 9 3 2 2 28 26 25 25 21 21 21 1 1 26 23 23 21 21 10 10 3 3 30 27 27 21 21 21 2 2 29 26 26 21 21 21 1 31 36 23 23 21 21 10 10 3 3 30 28 27 27 21 21 21 1 1 26 23 23 21 21 10 10 2 1 1 27 24 23 23 21 21 6 6 31 28 27 27 21 21 21 2 1 1 27 25 25 21 21 21 29 21 21 21 2 2 28 25 25 21 21 21 31 22 22 21 21 9 9 3 2 2 29 27 26 26 21 21 21 1 1 27 24 24 21 21 21 31 23 22 22 21 21 9 9 3 3 30 27 27 21 21 21 1 1 27 24 24 22 22 21 21 5 5 31 28 28 21 21 21 3 3 28 25 25 21 21 21 1 1 26 21 21 21 2 2 29 26 25 25 21 21 10 10 2 2 28 24 24 21 21 10 10 1 1 27 24 23 23 21 21 9 8 8 1 1 27 24 24 21 21
10 9 9 2 1 1 25 22 22 21 21 9 9 3 2 2 28 24 23 23 21 21 9 9 3 3 30 26 26 21 21 21 1 1 26 22 22 21 21 10 10 4 3 3 30 27 26 26 21 21 21 31 22 21 21 21 4 4 30 26 26 21 21 21 1 1 27 23 23 21 21 10 10 3 3 28 25 25 21 21 21 1 31 35 22 22 21 21 10 1 0 3 3 28 26 25 25 21 21 21 1 1 26 23 23 21 21 21 31 22 21 21 21 3 3 29 26 26 21 21 21 1 1 26 23 23 21 21 7 7 1 1 26 23 23 21 21 9 9 1 40 46 37 37 33 32 32 26 26 14 14 5 5 40 40 24 23 23 6 6 1 38 44 36 36 31 31 24 24 10 10 3 3 39 41 24 24 14 14 5 5 40 41 24 24 9 9 3 3 39 41 25 25 10 9 9 4 4 40 41 25 25 10 10 2 2 38 39 2 3 23 7 7 1 1 36 37 8 8 1 1 38 37 37 32 31 31 25 24 24 9 9 3 3 36 36 22 16 16 6 6 31 28 28 22 21 21 21 31 22 21 21 21 31 22 21 21 21 3 2 2 28 25 24 24 21 21 21 1 1 26 22 22 21 21 9 9 2 1 1 27 25 24 24 21 21 21 1 1 25 22 22 21 21 9 9 3 2 2 29 26 25 25 21 21 21 1 1 26 23 23 21 21 10 10 3 3 29 26 26 21 21 21 1 1 26 23 23 2 1 21 21 31 23 22 22 21 21 10 10 4 4 29 26 26 21 21 21 1 1 26 23 23 21 21 21 1 31 35 22 22 21 21 9 9 1 1 27 24 23 23 21 21 21 1 1 26 22 22 21 21 9 8 8 2 1 1 27 2 5 24 24 21 21 7 7 29 27 26 26 21 21 21 31 22 22 21 21 8 7 7 1 1 26 23 22 22 21 2 1 10 10 4 4 30 27 27 21 21 21 1 1 27 24 24 21 21 21 1 31 35 22 21 21 21 4 3 3 20 17 16 16 15 14 15 5 5 2 2 3 3 3 3 5 3 3 5 3 3 3 5 3 2 5 3 3 4 3 3 2 1 1 3 4 3 3 3 3 3 1 1 3 3 5 3 3 4 4 3 3 3 3 3 5 5 3 2 3 3 3 3 5 3 2 5 3 3 4 3 3 4 3 3 5 4 4 3 2 5 3 2 3 2 2 3 3 3 3 5 3 3 4 3 3 5 4 3 3 3 5 3 2 3 3 3 3 5 3 3 3 5 4 4 3 3 3 3 5 1 1 3 3 3 3 2 3 3 2 2 3 3 3 3 3 3 3 2 2 5 3 3 3 4 3 3 3 3 2 3 2 5 3 3 3 3 3 3 2 2 2 3 2 3 3 2 2 3 3 3 3 2 2 3 3 3 3 3 3 3 3 3 2 2 5 3 3 3 3 2 2 3 5 3 3 3 3 3 2 2 3 3 2 2 2 2 3 3 2 2 3 2 2 5 3 2 2 3 5 1 2 3 3 3 4 5 3 2 2 3 2 2 3 3 2 2 5 3 3 3 3 2 3 5 3 3 5 3 3 2 2 2 3 3 3 3 3 3 2 3 3 5 3 2 5 3 2 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 6 3 3 3 3 5 3 3 3 3 3 3 3 3 2 2 3 5 3 2 5 3 2 3 3 3 3 3 3 3 3 3 2 3 3 3 3 5 3 2 5 3 2 3 3 3 3 5 3 2 3 3 3 3 3 3 3 3 3 5 3 3 5 3 3 5 5 3 3 3 3 3 5 5 3 3 5 3 3 5 3 3 5 3 3 5 1 1 3 4 5 3 3 3 5 3 3 3 3 3 3 3 4 
3 2 3 3 3 5 3 2 3 3 3 4 3 3 3 3 3 3 3 4 3 2 3 3 3 3 3 3 3 3 3 3 3 5 3 3 2 2 3 5 3 2 5 3 2 3 3 5 3 3 3 3 3 5 2 5 3 3 4 3 3 3 5 3 2 5 3 3 3 3 5 3 3 3 4 3 3 3 3 3 5 3 3 3 3 4 3 3 3 5 3 3 5 4 4 5 3 2 5 3 2 3 5 3 3 3 3 3 3 3 3 3 3 3 5 3 3 1 3 3 5 4 4 3 2 3 3 3 3 3 5 3 3 5 3 3 4 3 2 3 3 5 3 3 2 1 1 5 3 3 3 5 4 4 3 2 5 3 3 3 3 3 3 3 5 3 2 3 2 3 3 3 3 5 3 3 3 3 3 3 3 3 3 3 3 3 3 3 5 3 3 3 5 3 3 3 4 3 3 4 3 3 3 3 5 2 3 3 3 2 3 3 3 3 5 3 2 3 4 3 2 3 1 1 3 4 4 3 3 3 4 3 3 3 3 2 2 4 3 3 4 3 3 3 2 2 3 3 3 3 3 2 2 3 3 3 3 2 2 4 3 3 3 2 2 3 3 3 3 3 3 3 2 3 3 2 2 3 3 2 2 3 3 3 2 2 3 3 2 2 3 3 2 3 2 2 3 3 3 3 3 3 3 3 2 2 2 3 3 3 3 4 3 3 2 2 3 3 3 3 3 3 2 3 4 2 2 1 1 3 4 3 3 3 3 2 2 4 3 3 3 3 3 2 2 3 2 4 2 2 2 3 3 3 3 2 2 3 3 3 3 3 3 3 3 3 3 3 3

I wasn't able to capture all of it.
>How-To-Repeat:
unknown
>Fix:
unknown
>Release-Note:
>Audit-Trail:
>Unformatted:
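
As a rough illustration of why the `iter < 20` loop quoted from kern_shutdown.c can give up with buffers still outstanding, here is a small user-space simulation in C. Everything in it is hypothetical: `sim_cache`, `sim_sync_pass` and `sync_with_retry_limit` are made-up names, and the "half written, a few re-dirtied" rule is an invented stand-in for soft updates dependencies. Resetting `iter` whenever the count drops is an assumption about the real loop that the report does not quote; if it holds, it would explain how far more than 20 numbers get printed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the buffer cache: only a busy count and a
 * rule for how many buffers come back dirty after each sync pass. */
struct sim_cache {
    int busy;     /* buffers still dirty or in flight */
    int redirty;  /* buffers re-marked dirty by every pass */
    int passes;   /* sync passes performed so far */
};

/* One simulated sync pass: write out half of the busy buffers (rounded
 * up), then re-dirty a fixed number because of pending dependencies. */
static void sim_sync_pass(struct sim_cache *c)
{
    c->busy -= (c->busy + 1) / 2;
    c->busy += c->redirty;
    c->passes++;
}

/* Retry loop shaped like the quoted one. Returns the number of buffers
 * still busy when the loop stops (0 means everything was synced). */
static int sync_with_retry_limit(struct sim_cache *c, int max_stalled)
{
    int iter, pbusy, nbusy;

    assert(max_stalled > 0);
    for (iter = pbusy = 0; iter < max_stalled; iter++) {
        nbusy = c->busy;
        if (nbusy == 0)
            break;
        printf("%d ", nbusy);
        if (nbusy < pbusy)
            iter = 0;   /* assumption: progress restarts the retry budget */
        pbusy = nbusy;
        sim_sync_pass(c);
    }
    printf("\n");
    return c->busy;
}
```

Starting from 40 busy buffers with `redirty = 0`, the simulated loop drains to 0 in six passes. With `redirty = 3` it settles at 6 busy buffers, makes no further progress, and gives up once `max_stalled` consecutive passes fail to lower the count. In this toy model, raising the limit only delays the exit, which matches the concern that increasing the "20" could still fail.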