From: Kai Gallasch <gallasch@free.de>
Date: Sat, 9 Oct 2010 15:37:04 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Locked up processes after upgrade to ZFS v15
In-Reply-To: <20101009111241.GA58948@icarus.home.lan>

On 09.10.2010 at 13:12, Jeremy Chadwick wrote:

> On Wed, Oct 06, 2010 at 02:28:31PM +0200, Kai Gallasch wrote:
>> Two days ago I upgraded my server to 8.1-STABLE (amd64) and upgraded ZFS from v14 to v15.
>> After the zpool & zfs upgrade the server was running stable for about half a day, but then apache processes running inside jails would lock up and could not be terminated any more.

> On RELENG_7, the system used ZFS v14, had the same tunings, and had an
> uptime of 221 days w/out issue.

8.0 and 8.1-STABLE + ZFS v14 also ran very solid on my servers - dang!

> With RELENG_8, the system lasted approximately 12 hours (about half a
> day) before getting into a state that looks almost identical to Kai's
> system: existing processes were stuck (unkillable, even with -9). New
> processes could be spawned (including ones which used the ZFS
> filesystems), and commands executed successfully.

Same here. I can provoke the locked-process problem by starting one of my webserver jails. The first httpd process locks up after at most 30 minutes.

The problem is that after many httpd forks apache cannot fork any more child processes, and the stuck (unkillable) httpd processes all hold an open socket bound to the IP address of the webserver. So a restart of apache is not possible, because $IP:80 is already occupied.

The jail also cannot be stopped or started in this state. The only choices are: restart the whole jail-host server (some processes would not die - "ps axl advised" - plus unclean unmounts of UFS partitions), or delete the IP address from the network interface and migrate the jail to another server (zfs send/receive). No fun at all. BTW: zfs destroy does not work here either.

> init complained about wedged processes when the system was rebooted:

I use 'procstat -k -k -a | grep faul' to look for this condition. It finds all processes in the process table whose kernel stack contains 'trap_pfault'.

> Oct 9 02:00:56 init: some processes would not die; ps axl advised
>
> No indication of any hardware issues on the console.

Here too.
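In case it helps anyone check for the same symptoms, this is roughly what I run on the jail host once the lockup hits. The grep patterns and the port are only examples - adjust them to your own setup:

  # list kernel stacks of all processes and keep the ones stuck in a page fault
  procstat -k -k -a | grep trap_pfault

  # confirm that the stuck processes are the ones still holding $IP:80
  sockstat -4 -l | grep :80

The second command is just to verify that the unkillable httpd children are the ones keeping the jail's port 80 occupied.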
> The administrator who was handling the issue did not use "ps -l", "top",
> nor "procstat -k", so we don't have any indication of what the process
> state was in, nor what the kernel calling stack looked like that lead up
> to the wedging. All he stated was that the processes were in D/I
> states, which doesn't help since that's what they're in normally anyway.
> If I was around I would have forced DDB and done "call doadump" to
> investigate things post-mortem.

Another sign is an increased process count in 'top'.

> Monitoring graphs of the system during this time don't indicate any
> signs of memory thrashing (though bsnmp-ucd doesn't provide as much
> granularity as top does); the system looks normal except for a slightly
> decreased load average (probably as a result of the deadlocked
> processes).

My server currently has 28 GB RAM, with less than 60% in use and no special ZFS tuning in loader.conf - although I did try setting vm.pmap.pg_ps_enabled="0" to find out whether the locked processes had anything to do with it. Setting it did not prevent the problem from recurring.

> Aside from the top/procstat/kernel dump aspect, what other information
> would kernel folks be interested in? Is "call doadump" sufficient for
> post-mortem investigation? I need to know since if/when this happens
> again (likely), I want to get folks as much information as possible.

I'm also willing to help, but I need explicit instructions. I could provoke such a lockup on one of my servers, but I don't have much time to leave a server in that state - so there is only a small window in which to collect the wanted debug data.

> Also, a question for Kai: what did you end up doing to resolve this
> problem? Did you roll back to an older FreeBSD, or...?

This bug struck me really hard, because the affected server is not part of a cluster and hosts about 50 jails (mail, web, databases).

The problem is that sockets held open by locked processes cannot be closed, so a restart of a jammed service is not possible.

Theoretically I had the option of booting into the old world/kernel, but I'm sure the old zfs.ko could not mount a ZFS v15 filesystem. AFAIK there is no zfs downgrade command or utility.

Of course a bare-metal recovery of the whole server from tape was also a last option. But really??

My 'solution' (example commands are sketched in the P.S. below):

- move the most unstable jails to other servers and restore them to UFS partitions
- move everything else in the zpool temporarily to other servers running ZFS (zfs send/receive)
- zfs destroy -r
- zpool destroy
- gpart create ...
- gpart add -t freebsd-ufs ...
- restore all jails from ZFS to UFS

So the server is now reverted to UFS - just for my peace of mind - even though I waste around 50% of the RAID capacity on reserved FS allocation and accept all the other disadvantages compared to a volume manager. I will still use ZFS on several machines, but for some time not for critical data. ZFS is a nifty thing, but I really depend on a stable FS. (Of course, for other people ZFS v15 may be running smoothly.)

I must repeat: I offer my help if someone wants to dig into the locking problem.

Regards, Kai.
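P.S. For completeness, here is a rough sketch of the commands behind the migration steps listed above. The pool, dataset, snapshot, host, device and mountpoint names (tank, tank/jails, @evacuate, otherhost, backup, da0, /jails) are only examples - adjust them to your own layout before trying anything like this.

  # snapshot everything recursively and stream it to a temporary host
  zfs snapshot -r tank@evacuate
  zfs send -R tank@evacuate | ssh otherhost zfs receive -d backup

  # once the data is confirmed on the other side, tear down the pool
  zfs destroy -r tank/jails
  zpool destroy tank

  # repartition the former pool disk for UFS and create the filesystem
  gpart create -s gpt da0
  gpart add -t freebsd-ufs -l jails da0
  newfs -U /dev/da0p1
  mkdir -p /jails
  mount /dev/da0p1 /jails

  # copy the jails back from the temporary host onto UFS
  ssh otherhost 'tar -cf - -C /backup/jails .' | tar -xpf - -C /jails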