Date:      Sun, 22 Jul 2012 02:32:35 -0400
From:      Rafi Rubin <rafi@ugcs.caltech.edu>
To:        Peter Maloney <peter.maloney@brockmann-consult.de>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS deadlock
Message-ID:  <500B9E83.6030509@ugcs.caltech.edu>
In-Reply-To: <500AC46D.309@brockmann-consult.de>
References:  <500A9787.5060109@ugcs.caltech.edu> <500AC46D.309@brockmann-consult.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'll have to read those more carefully later.  From what I've read so far, it
looks like that happens with a single nfs client.

In my case, a single client isn't a problem, and when I drop nfsd to a
single thread, many clients (in reality served just one at a time, of course)
are safe.  I have no problem reading the .zfs directory and my snapshots.
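
For reference, one way to pin nfsd to a single server thread on FreeBSD is
through /etc/rc.conf. This is only a sketch: it assumes the limit is applied
with nfsd's -n flag, and the message does not say how the limit was actually
configured on this server.

```shell
# /etc/rc.conf fragment (assumed setup, not taken from the original message)
nfs_server_enable="YES"
# -u and -t serve UDP and TCP clients; -n 1 asks nfsd for a single server thread
nfs_server_flags="-u -t -n 1"
```

nfsd has to be restarted for the new flags to take effect.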

I'm also not sure if the .zfs dir is actually more sensitive than anything
else.  I was just happy to find something that triggered this bug reliably.  I
plan to explore more after I set up a non-production machine for testing.

We've been seeing crashes on the order of once every 2-7 days.  And I doubt
anyone has been using the .zfs directories (but I can't be sure).  Also, the
failure mode isn't quite the same (crashing or rebooting the server instead of
just freezing one volume), but I've been changing things around to try
to bisect the problem.  This is just the first consistent bug that I've been
able to identify.

Rafi

On 07/21/12 11:02, Peter Maloney wrote:
> I've had the same thing happen. It only happens with Linux clients. My 
> workaround is to mount an empty directory on top of .zfs. I submitted a PR 
> about it also. And I can't reproduce it on other machines, including the 
> replicated backup server with all the same snapshots. It only happens on 
> the one production machine, possibly because it has more nfs clients
> active than what I created for my tests.
> 
> Here's my forum thread: http://forums.freebsd.org/showthread.php?t=29648
> 
> Here's my PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/168947
> 
> 
> On 07/21/2012 01:50 PM, Rafi Rubin wrote:
>> I have a small server with a mirrored pair of hard drives that handles 
>> generally light loads for a number of linux machines over nfs v3.
>> 
>> I've been seeing some deadlocks (and possibly worse) lately.  I've 
>> narrowed down the freezing to a simple test run simultaneously on a 
>> number of the client machines:
>> 
>> mount host:/food/bar
>> ls bar/.zfs/snapshot
>> 
>> When I do that, all the client machines hang on any access to the
>> server. On the server, only the volume "bar" hangs, the rest are fine.
>> 
>> This occurs even when there is no other load on the server.  Also, after
>> rebooting the server, the client machines eventually complete the ls and
>> behave normally.
>> 
>> 
>> I think this doesn't happen if the directory is cached, either in memory 
>> or on an L2 ssd, but I need to rerun some tests.
>> 
>> For now the machine seems to be stable with nfsd limited to a single 
>> thread.
>> 
>> 
>> Any help debugging would be appreciated,
>> Rafi
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBAgAGBQJQC56AAAoJEPILXytRLnK2C6sP/RtSugG8JXkG50B8U7uiBsPh
UPci3oXYh9uP6XALR2Y+41wmem4Hv8BCQylKK2pkdZkU5G4vRn25xE8PTxFTw4st
27EaD8c++Bihg+djP7B+jdfGR4QTIDNpjH1sqk9GZuZ/uppjEjh5GiNjQrnskOEK
u3Q6VubBNuHAZ6/QFERsy8ZCYmyWtkQ2g7g99tLDrFDYm48lEabxeE1cuS/Ck6BT
IBIqtaujLEWZ8eo8/VG143a5DxMwIjF4FeZ1ingAkiCOFHQu6QEqh7dPfK8CAxg/
sdKS7e3QSi2RKBC5MuV19jQhCXo05QxewfhWtQ4go1yyD9Q733fS7WA9pqtUrLEr
DU5mrqJv7U+V68R/I7a0lM+g67H0yoxnYYk6oYfjPfc3o2aSQ+Sa1cfm94BzvOVs
19wGrh1TpKZFWV0DAJqndHLSknPd7Ly6E4r6drLerTABssSXIJxCOJEAjQG8ftzJ
ftvxYPydzR0hwff/LEDnNWfd4fLmi++f0zc9mrYAAerrZDAPfT+6FHCASYmU9V6+
HA47UztAsDcTkcM3uOPhgmHQI2FnB5uaBLdK/Z+q96kpSvgsKtWyu/jLlZ/KMjNh
xB+hf3OZ8q3q+QoSHDERmd2AXo9ZhTfDLIVfkU10iLaNdxoXhOq8uo5iCX0nzJK2
/sqd8O9CVd91+MfF24X5
=LFIk
-----END PGP SIGNATURE-----


