From: Dmitry Morozovsky
To: Konstantin Belousov
Cc: freebsd-fs@freebsd.org
Date: Tue, 8 Jan 2013 11:29:30 +0400 (MSK)
Subject: Re: zfs -> ufs rsync: livelock in wdrain state

On Tue, 8 Jan 2013, Konstantin Belousov wrote:

> > Now, during the last rsync, the process is stuck as [snip]
> > root@moose:/ar# sync
> > load: 0.00 cmd: sync 67229 [wdrain] 468.17r 0.00u 0.00s 0% 596k
> >
> > Any hints? A quick search through the FreeBSD mailing lists and/or open PRs
> > does not reveal much.
> >
>
> Are there any kernel messages about the disk system?
>
> The wdrain means that the amount of dirty buffers accumulated exceeds
> the allowed maximum.
> The transient 'wdrain' state is normal on a machine
> doing a lot of writes to a filesystem that uses the buffer cache, say UFS.
> Failure to clean the dirty buffers is usually related to disk I/O stalling.
>
> It cannot be denied that a bug could cause a stuck 'wdrain' state, but
> in the last five or so years all the cases I investigated were due to
> disks.

Yes, it seems so:

root@moose:~# camcontrol devlist
load: 0.03 cmd: camcontrol 49735 [devfs] 2.68r 0.00u 0.00s 0% 820k

and then the machine is in the well-known "hardly alive" state: TCP
connections get established, but process switching does not happen.

I will investigate the hardware, thank you.

--
Sincerely,
D.Marck                                     [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer:                                 marck@FreeBSD.org ]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
------------------------------------------------------------------------
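Konstantin's explanation (writers sleep in 'wdrain' while too many dirty buffers are outstanding, and a stalled disk keeps them there) can be illustrated with a toy high/low watermark throttle. This is a simplified Python sketch, not FreeBSD's buffer-cache code: the class and method names and the lo/hi values are hypothetical. A writer that finds the backlog at the high watermark waits until completed writes bring it down to the low watermark; if the disk never completes anything, the writer cannot proceed, which the sketch makes visible with a timeout.

```python
import threading
from typing import Optional


class DirtyBufferThrottle:
    """Toy high/low watermark write throttle (hypothetical; not FreeBSD code)."""

    def __init__(self, lo: int, hi: int) -> None:
        if not 0 <= lo < hi:
            raise ValueError("need 0 <= lo < hi")
        self.lo = lo    # blocked writers resume once dirty <= lo
        self.hi = hi    # writers block once dirty >= hi
        self.dirty = 0  # buffers queued but not yet written to disk
        self._cv = threading.Condition()

    def add_dirty(self, timeout: Optional[float] = None) -> bool:
        """Account one new dirty buffer; return False if the wait timed out."""
        with self._cv:
            if self.dirty >= self.hi:
                # Analogue of the 'wdrain' sleep: wait until the disk drains
                # the backlog down to the low watermark.
                if not self._cv.wait_for(lambda: self.dirty <= self.lo, timeout):
                    return False
            self.dirty += 1
            return True

    def write_completed(self, n: int = 1) -> None:
        """Called when the disk finishes writing n buffers."""
        with self._cv:
            self.dirty = max(0, self.dirty - n)
            if self.dirty <= self.lo:
                self._cv.notify_all()


def demo(disk_alive: bool) -> bool:
    throttle = DirtyBufferThrottle(lo=2, hi=4)
    for _ in range(4):  # fill the backlog up to the high watermark
        throttle.add_dirty()
    if disk_alive:
        # A healthy disk completes queued writes shortly afterwards.
        threading.Timer(0.05, throttle.write_completed, args=(3,)).start()
    # The fifth writer must wait for a drain; with a stalled disk it never
    # comes, so only the timeout gets it out (a sleep without one would hang).
    return throttle.add_dirty(timeout=0.5)


print(demo(disk_alive=True))   # True: transient wait, then the write proceeds
print(demo(disk_alive=False))  # False: still waiting when the timeout expires
```

With a healthy disk the wait is short, matching the "transient 'wdrain'" case; with a stalled disk the writer never gets past the watermark, which is consistent with the stuck `sync` in the report.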