From owner-freebsd-fs@FreeBSD.ORG  Tue Jan  8 07:29:33 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 4A14E9B6
 for <freebsd-fs@freebsd.org>; Tue,  8 Jan 2013 07:29:33 +0000 (UTC)
 (envelope-from marck@rinet.ru)
Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68])
 by mx1.freebsd.org (Postfix) with ESMTP id C1729F3E
 for <freebsd-fs@freebsd.org>; Tue,  8 Jan 2013 07:29:32 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
 by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r087TU7L034949;
 Tue, 8 Jan 2013 11:29:30 +0400 (MSK) (envelope-from marck@rinet.ru)
Date: Tue, 8 Jan 2013 11:29:30 +0400 (MSK)
From: Dmitry Morozovsky <marck@rinet.ru>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: zfs -> ufs rsync: livelock in wdrain state
In-Reply-To: <20130108001231.GB82219@kib.kiev.ua>
Message-ID: <alpine.BSF.2.00.1301081127340.7949@woozle.rinet.ru>
References: <alpine.BSF.2.00.1301080013520.7949@woozle.rinet.ru>
 <20130108001231.GB82219@kib.kiev.ua>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
X-NCC-RegID: ru.rinet
X-OpenPGP-Key-ID: 6B691B03
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7
 (woozle.rinet.ru [0.0.0.0]); Tue, 08 Jan 2013 11:29:30 +0400 (MSK)
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Jan 2013 07:29:33 -0000

On Tue, 8 Jan 2013, Konstantin Belousov wrote:

> > Now, during last rsync, the process is stuck as

[snip]

> > root@moose:/ar# sync
> > load: 0.00  cmd: sync 67229 [wdrain] 468.17r 0.00u 0.00s 0% 596k
> > 
> > Any hints? Quick searching through FreeBSD mailing lists and/or open PRs does 
> > not reveal much.
> > 
> 
> Are there any kernel messages about the disk system?
> 
> The wdrain means that the amount of dirty buffers accumulated exceeds
> the allowed maximum. The transient 'wdrain' state is normal on a machine
> doing a lot of writes to a filesystem using the buffer cache, say UFS. Failure
> to clean the dirty buffers is usually related to the disk i/o stalling.
> 
> It cannot be denied that a bug could cause a stuck 'wdrain' state, but
> in the last five or so years all the cases I investigated were due to
> disks.

Yes, it seems so:

root@moose:~# camcontrol devlist
load: 0.03  cmd: camcontrol 49735 [devfs] 2.68r 0.00u 0.00s 0% 820k

and then the machine is in the well-known "hardly alive" state: TCP connections 
are established, but process switching does not proceed.

Will investigate the hardware, thank you.
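The 'wdrain' explanation quoted above describes a write throttle: writers block once too much write I/O is outstanding and resume only after the disk drains it. The C sketch below is a hypothetical userland model of that high/low-water-mark behavior. Its field names loosely echo FreeBSD's vfs.hirunningspace and vfs.lorunningspace sysctls, but the struct, the functions (issue_write, complete_write, simulate) and the thresholds are illustrative assumptions, not the kernel's vfs_bio.c code. It shows why a stalled disk turns a transient 'wdrain' into a permanent one: with no completions, the outstanding total never falls to the low-water mark, so the sleeping writer is never released.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userland model of a buffer-cache write throttle.
 * Names echo FreeBSD's vfs.hirunningspace / vfs.lorunningspace sysctls,
 * but this is an illustration, not the kernel implementation.
 */
struct wthrottle {
    long running; /* bytes of write I/O issued but not yet completed */
    long hi;      /* a writer that finds running > hi must sleep ("wdrain") */
    long lo;      /* sleepers are released once running drops to lo or below */
    bool asleep;  /* true while some writer is blocked in the model */
};

/* Issue a write of len bytes. Returns false if the caller would block. */
static bool issue_write(struct wthrottle *t, long len)
{
    if (t->running > t->hi) {
        t->asleep = true;       /* the caller would sleep in "wdrain" */
        return false;
    }
    t->running += len;
    return true;
}

/* The disk finishes len bytes; wake sleepers at the low-water mark. */
static void complete_write(struct wthrottle *t, long len)
{
    assert(len <= t->running);  /* cannot complete more than is in flight */
    t->running -= len;
    if (t->asleep && t->running <= t->lo)
        t->asleep = false;
}

/*
 * Attempt nwrites writes of len bytes each; after every attempt the disk
 * completes up to done_per_step bytes (0 models a stalled disk).
 * Returns how many writes were accepted; refused writes are not retried.
 */
static int simulate(struct wthrottle *t, int nwrites, long len,
                    long done_per_step)
{
    int accepted = 0;

    for (int i = 0; i < nwrites; i++) {
        if (issue_write(t, len))
            accepted++;
        long done = done_per_step < t->running ? done_per_step : t->running;
        complete_write(t, done);
    }
    return accepted;
}
```

For example, with hi = 100, lo = 50 and 10-byte writes, simulate() accepts 11 of 20 writes and leaves the writer asleep with 110 bytes outstanding when done_per_step is 0, while a disk that completes 10 bytes per step lets all 20 through and nobody sleeps.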

-- 
Sincerely,
D.Marck                                     [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer:                                 marck@FreeBSD.org ]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
------------------------------------------------------------------------