From owner-freebsd-current@FreeBSD.ORG  Thu Aug 26 06:00:46 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3AFC016A4CE
	for <freebsd-current@FreeBSD.org>;
	Thu, 26 Aug 2004 06:00:46 +0000 (GMT)
Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C005E43D45
	for <freebsd-current@FreeBSD.org>;
	Thu, 26 Aug 2004 06:00:45 +0000 (GMT)
	(envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
	by gw.catspoiler.org (8.12.11/8.12.11) with ESMTP id i7Q60Z2m005989;
	Wed, 25 Aug 2004 23:00:39 -0700 (PDT)
	(envelope-from truckman@FreeBSD.org)
Message-Id: <200408260600.i7Q60Z2m005989@gw.catspoiler.org>
Date: Wed, 25 Aug 2004 23:00:35 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: noackjr@alumni.rice.edu
In-Reply-To: <412D46D3.9010900@alumni.rice.edu>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
cc: freebsd-current@FreeBSD.org
cc: bettan@nerim.net
Subject: Re: reboot on freebsd 5.3-beta1

On 25 Aug, Jon Noack wrote:
> On 08/25/04 13:31, Don Lewis wrote:
>> On 25 Aug, bettan wrote:
>>> When I reboot, I get "Syncing disk, vnodes remaining" followed by
>>> numbers, but it isn't quick, and I don't unmount my file systems
>>> before the reboot.
>> 
>> It should print a number once per second.  The numbers should quickly
>> decrease to something in the low single digits. It is not unusual to see
>> the number decrease to zero and then bounce back up a couple of times.
>> Spending about 10 seconds or so at this stage is not unexpected.
> 
> This is a relatively recent change in behavior to work around the fact
> that IDE/ATA controllers/drives report that they have successfully
> written data before they actually perform the write.  With the old quick
> sync, people were experiencing corruption and data loss when rebooting.
> The new behavior is much more conservative and is designed to minimize
> risk.

Before this change, the system attempted to sync all the file systems
after it disabled the syncer thread.  The problem was that if soft
updates was in use and there was a lot of file system activity shortly
before system shutdown, there could be some mutually dependent file
system writes buffered that the final sync code couldn't handle.  When
this happened, the system would get stuck at the "syncing disks, buffers
remaining..." stage and after an initial decrease, the number of buffers
would stabilize at some non-zero value.  Eventually the final sync code
would time out and shut down the system with the mounted file systems
marked unclean.  I could easily reproduce this problem by rebooting
shortly after running mergemaster, so I got in the habit of running the
sync command three times and waiting a bit before rebooting as a
workaround.
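
As a rough illustration of that stall, here is a toy model in Python.
It is not the kernel's final sync code: the function name, the
dependency map, and the "give up after a few stalled passes" rule are
all assumptions made up for the sketch.  Each pass writes every buffer
whose dependencies have already been written, so a pair of mutually
dependent buffers never drains and the count levels off above zero.

```python
def final_sync(deps, max_stalled_passes=3):
    """Toy model of a final sync loop (hypothetical, not kernel code).

    deps maps a buffer name to the set of buffers that must be
    written before it.  Returns (remaining-count per pass, clean?).
    """
    dirty = set(deps)
    history = []
    stalled = 0
    while dirty and stalled < max_stalled_passes:
        # A buffer is writable only if nothing it depends on is dirty.
        writable = {b for b in dirty if not (deps[b] & dirty)}
        dirty -= writable
        history.append(len(dirty))
        # Count consecutive passes that made no progress.
        stalled = 0 if writable else stalled + 1
    return history, not dirty


# "c" and "d" depend on each other, so neither can ever be written.
buffers = {"a": set(), "b": {"a"}, "c": {"d"}, "d": {"c"}}
history, clean = final_sync(buffers)
print(history, "clean" if clean else "unclean")
```

In this model the remaining count drops at first, then sits at the
same non-zero value until the loop gives up and reports the file
systems as unclean, which is the pattern described above.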

The IDE/ATA problem only happened when powering off the system.  I
*think* it might have been fixed by explicitly telling the drives to do
a cache flush.  There is also a tuneable delay
(kern.shutdown.poweroff_delay) before turning off the power to give the
drives time to flush their write caches even if they ignore any explicit
flush command.
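
For anyone who wants to check the current setting from a script, a
minimal sketch follows.  It assumes a FreeBSD system with the standard
sysctl(8) utility on the PATH; the function name and the injectable
"run" parameter are only there for illustration and testing.  The unit
of the value is not stated here, so check the description printed by
"sysctl -d kern.shutdown.poweroff_delay" before changing it.

```python
import subprocess


def poweroff_delay(run=subprocess.run):
    """Return kern.shutdown.poweroff_delay as an integer.

    Shells out to `sysctl -n`, which prints only the value.
    `run` is injectable so the parsing can be tested off FreeBSD.
    """
    result = run(
        ["sysctl", "-n", "kern.shutdown.poweroff_delay"],
        capture_output=True,
        text=True,
        check=True,  # raise if sysctl exits non-zero
    )
    return int(result.stdout.strip())
```

Calling poweroff_delay() on a machine without this sysctl raises an
exception rather than returning a guess.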

The system shutdown delay currently observed at the "Syncing disk, vnodes
remaining" stage should be roughly the same as the delay previously seen
at the "syncing disks, buffers remaining..." stage.  I've got some ideas
on how to speed it up.