From owner-freebsd-stable@FreeBSD.ORG Tue May 3 08:18:31 2005
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 7E0B316A4EE
	for ; Tue, 3 May 2005 08:18:31 +0000 (GMT)
Received: from gen129.n001.c02.escapebox.net
	(gen129.n001.c02.escapebox.net [213.73.91.129])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8C45243D39
	for ; Tue, 3 May 2005 08:18:30 +0000 (GMT)
	(envelope-from gemini@geminix.org)
Message-ID: <427733D2.5020300@geminix.org>
Date: Tue, 03 May 2005 10:18:26 +0200
From: Uwe Doering
Organization: Private UNIX Site
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.7) Gecko/20050501
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: freebsd-stable@FreeBSD.ORG
References: <200505021407.j42E7gFv095417@lurza.secnetix.de>
In-Reply-To: <200505021407.j42E7gFv095417@lurza.secnetix.de>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Received: from gemini by geminix.org with esmtpsa (TLSv1:AES256-SHA:256)
	(Exim 4.50 (FreeBSD)) id 1DSsbs-000PpE-61;
	Tue, 03 May 2005 10:18:29 +0200
Subject: Re: kernel: swap_pager: indefinite wait buffer - on 5.3-RELEASE-p5
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Tue, 03 May 2005 08:18:31 -0000

Oliver Fromme wrote:
> Uwe Doering wrote:
> > Oliver Fromme wrote:
> > > If they're really identical (i.e. the same size and same
> > > geometry), then you can use dd(1) for duplication, like
> > > this:
> > >
> > >    # dd if=/dev/ad0 of=/dev/ad1 bs=64k conv=noerror,sync
> > >
> > > The "noerror,sync" part is important so the dd command will
> > > not stop when it hits any bad spots on the source drive and
> > > instead will fill the blocks with zeroes on the destination
> > > drive.
> > > Since it's only the swap partition, you shouldn't
> > > lose any data.
> >
> > I would like to point out that the conclusion you're drawing in the last
> > sentence is invalid IMHO.
>
> I'm afraid I don't agree.
>
> > "indefinite wait buffer" messages at
> > apparently random block numbers just indicate that the pager was unable
> > to access the swap area (in its entirety!) when it wanted to.  It means
> > that the disk drive was either dead at that point in time or busy trying
> > to deal with a bad sector.
> >
> > This sector could have been anywhere on the disk.  It just kept the disk
> > drive busy for long enough that the pager started to complain.
>
> The OP specifically said that the swap_pager messages were
> the only kernel messages that he got.  That indicates that
> only the swap partition is affected, because otherwise
> there would have been other kernel messages indicating
> I/O errors from one of the filesystems on that disk.

Your assumption here is that the filesystem code would become impatient,
too.  This is not the case.  The swap pager has a timeout built in (20
seconds IIRC) after which it prints a warning message and continues
waiting, but there is nothing like this in the filesystem code.

If the disk drive is dead or busy trying to deal with a bad sector in a
filesystem, the kernel will wait silently and indefinitely until the disk
drive either succeeds in recovering the sector or fails to do so.  In the
latter case the kernel would log an I/O error, but only once it hears
back from the disk drive and not any earlier, in contrast to the swap
pager.  That's why you often see only swap pager messages in the case of
a dying disk drive.

I checked the kernel sources, but of course I could have missed the
relevant lines.  In that case I would appreciate a pointer to the place
at which the filesystem code generates a warning message comparable to
that from the swap pager.
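
The difference described above can be modelled in user space.  What
follows is only a rough sketch in Python, not the kernel code: the
function names, the use of threading.Event as a stand-in for "the disk
has answered", and the log callback are all assumptions made for
illustration.  The 20 second default is the "IIRC" figure from above and
is not verified here.

```python
import threading

# Assumed value: the "20 seconds IIRC" figure, not checked against the kernel.
SWAP_WAIT_TIMEOUT = 20.0


def swap_pager_style_wait(done, timeout=SWAP_WAIT_TIMEOUT, log=print):
    """Wait for `done`, logging a warning whenever `timeout` expires.

    Hypothetical model of a waiter that complains but never gives up.
    `done` is a threading.Event that is set once the (simulated) disk
    has answered.  Returns the number of warnings that were logged.
    """
    warnings = 0
    # Event.wait() returns False when the timeout expires without the
    # event being set, so each pass through the loop is one expired wait.
    while not done.wait(timeout):
        warnings += 1
        log("swap_pager: indefinite wait buffer")
    return warnings


def filesystem_style_wait(done):
    """Wait for `done` with no timeout and no message at all.

    Hypothetical model of a waiter that stays silent until the disk
    answers.  Always returns 0, because nothing is ever logged.
    """
    done.wait()
    return 0
```

With a simulated disk that answers late, only the first function
produces any output, although both are blocked on the same slow device.
That is the asymmetry I am referring to: a lack of filesystem messages
says nothing about where on the disk the bad sector is.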

    Uwe
-- 
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net