Date:      Mon, 30 Nov 2009 08:09:16 -0500
From:      "Robert N. M. Watson" <rwatson@FreeBSD.org>
To:        Eirik Øverby <ltning@anduin.net>
Cc:        pyunyh@gmail.com, weldon@excelsusphoto.com, freebsd-current@freebsd.org, Gavin Atkinson <gavin@freebsd.org>
Subject:   Re: FreeBSD 8.0 - network stack crashes?
Message-ID:  <BA47FDA1-1097-4C43-AF71-51E7227795B5@FreeBSD.org>
In-Reply-To: <A0C9ED20-5536-44E2-B26B-0F1AEC2AF79C@anduin.net>
References:  <A1648B95-F36D-459D-BBC4-FFCA63FC1E4C@anduin.net> <20091129013026.GA1355@michelle.cdnetworks.com> <74BFE523-4BB3-4748-98BA-71FBD9829CD5@anduin.net> <alpine.BSF.2.00.0911291427240.80654@fledge.watson.org> <34AD565D-814A-446A-B9CA-AC16DD762E1B@anduin.net> <A0C9ED20-5536-44E2-B26B-0F1AEC2AF79C@anduin.net>

On 30 Nov 2009, at 05:36, Eirik Øverby wrote:

> Short follow-up: Making OpenBSD use TCP mounts (it defaults to UDP)
> seems to solve the issue.
>
> So this is a UDP-NFS-related problem, it would seem?

Could well be. Let's try another debugging tactic -- there are two
possible things going on here: a resource leak, or resource exhaustion
leading to deadlock. If you shut down to single-user mode from
multi-user, let the system quiesce for a few minutes, and then run
netstat -m, what does it look like? Do vast numbers of mbufs+clusters
get freed, or do they remain accounted for as allocated?

(If they remain allocated, they were likely leaked, since most/all
sockets will have been closed, releasing their resources on shutdown to
single user when all processes are killed.)
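
To make the before/after comparison less eyeball-driven, here is a rough
sketch of a parser for two saved copies of the netstat -m output. It
assumes the "N/N/N mbufs in use (current/cache/total)" and
"N/N/N/N mbuf clusters in use (current/cache/total/max)" line shapes; the
function names and the sample numbers are made up for illustration, so
adjust the pattern if your output differs.

```python
import re

# Assumed line shapes from `netstat -m` (check against your own output):
#   "1024/2306/3330 mbufs in use (current/cache/total)"
#   "1022/1540/2562/25600 mbuf clusters in use (current/cache/total/max)"
LINE_RE = re.compile(
    r"^(\d+(?:/\d+)*) (mbufs|mbuf clusters) in use \(([a-z/]+)\)$"
)


def parse_netstat_m(text):
    """Return e.g. {"mbuf clusters": {"current": 1022, "cache": 1540, ...}}."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            values = [int(v) for v in match.group(1).split("/")]
            names = match.group(3).split("/")
            result[match.group(2)] = dict(zip(names, values))
    return result


def still_allocated(before, after, kind="mbuf clusters"):
    """Fraction of the 'current' count that survived the quiesce period."""
    b = parse_netstat_m(before)[kind]["current"]
    a = parse_netstat_m(after)[kind]["current"]
    return a / b if b else 0.0


# Hypothetical sample values, not taken from the affected server.
before = "20000/500/20500/25600 mbuf clusters in use (current/cache/total/max)"
after = "19800/700/20500/25600 mbuf clusters in use (current/cache/total/max)"
fraction = still_allocated(before, after)
print(f"{fraction:.0%} of clusters still allocated after quiesce")
```

A fraction that stays close to 1 after everything has been killed points
towards a leak; a fraction that collapses towards 0 points towards
exhaustion under load instead.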

The theory of an mbuf leak in NFS isn't an unlikely theory -- the socket
code there continues to change, and rare edge cases frequently lead to
leaks (per my earlier e-mail). Perhaps there's a case the OpenBSD client
is triggering that other NFS clients normally don't. If we think that's
the case, the next step is usually to narrow down what causes the leak
to trigger a lot (i.e., the backup starting), and then grab a packet
trace that we can analyze with wireshark. We'll want to look at the
types of errors being returned for RPCs and, in particular, if there's
one that happens about the same number of times as the resource has
leaked over the same window, look at the code and see if that error case
is handled properly.
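
As a rough illustration of that matching step, the sketch below ranks
per-error counts against the number of buffers leaked over the same
window. The function name, the counter names, and the numbers are all
hypothetical; the real inputs would come from the packet trace (or from
netstat -s deltas) and from the netstat -m delta.

```python
def rank_counters(leak_delta, counter_deltas):
    """Sort counters by how close their growth is to the leak growth.

    leak_delta: increase in allocated mbufs/clusters over the window.
    counter_deltas: {counter name: increase over the same window}.
    Returns (name, delta, ratio) tuples, best match first. A ratio near
    1.0 means roughly one event per leaked buffer.
    """
    if leak_delta <= 0:
        return []
    ranked = []
    for name, delta in counter_deltas.items():
        if delta <= 0:
            continue  # a counter that did not grow cannot explain the leak
        ranked.append((name, delta, delta / leak_delta))
    ranked.sort(key=lambda item: abs(item[2] - 1.0))
    return ranked


# Hypothetical deltas over one backup run, for illustration only.
deltas = {"rpc_error_a": 990, "rpc_error_b": 12, "rpc_ok": 50000, "idle": 0}
for name, delta, ratio in rank_counters(1000, deltas):
    print(f"{name}: +{delta} (ratio {ratio:.2f})")
```

Whatever lands at the top of that list is only a candidate; the error
path in the code still has to be read to confirm it misses a free.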

If this is definitely an NFS leak bug, we should get the NFS folks'
attention by sticking "NFS mbuf leak" in the subject line and CC'ing
rmacklem/dfr. :-)

Robert




> /Eirik
>
> On 30. nov. 2009, at 11.22, Eirik Øverby wrote:
>
>> Hi,
>>
>> I have something that might be more interesting than any counter ...
>> It seems to me as if the problem *only* manifests itself when an
>> OpenBSD box is backing up to this FreeBSD 8.0-NFS-ZFS server. All other
>> boxes are FreeBSD, and I have so far today been unable to reproduce the
>> problem from any of those. As soon as I interrupted the backup running
>> from OpenBSD, the mbuf cluster usage stabilized.
>>
>> How's that for a mystery in the morning?
>>
>> /Eirik
>>
>> On 29. nov. 2009, at 15.29, Robert Watson wrote:
>>
>>> On Sun, 29 Nov 2009, Eirik Øverby wrote:
>>>
>>>> I just did that (-rxcsum -txcsum -tso), but the numbers still keep
>>>> rising. I'll wait and see if it goes down again, then reboot with those
>>>> values to see how it behaves. But right away it doesn't look too good ..
>>>
>>> It would be interesting to know if any of the counters in the output
>>> of netstat -s grow linearly with the allocation count in netstat -m.
>>> Often times leaks are associated with edge cases in the stack (typically
>>> because if they are in common cases the bug is detected really quickly!)
>>> -- usually error handling, where in some error case the unwinding fails
>>> to free an mbuf that it should free.  These are notoriously hard to
>>> track down, unfortunately, but the stats output (especially where delta
>>> alloc is linear to delta stat) may inform the situation some more.
>>>
>>> Robert N M Watson
>>> Computer Laboratory
>>> University of Cambridge
>>
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>>
>