Date: Mon, 7 Dec 2009 16:28:52 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: "Robert N. M. Watson" <rwatson@FreeBSD.org>
Cc: pyunyh@gmail.com, dfr@FreeBSD.org, weldon@excelsusphoto.com, freebsd-current@FreeBSD.org, Eirik Øverby <ltning@anduin.net>, Gavin Atkinson <gavin@FreeBSD.org>
Subject: Re: FreeBSD 8.0 - network stack crashes?
Message-ID: <Pine.GSO.4.63.0912071623300.11928@muncher.cs.uoguelph.ca>
In-Reply-To: <BA47FDA1-1097-4C43-AF71-51E7227795B5@FreeBSD.org>
References: <A1648B95-F36D-459D-BBC4-FFCA63FC1E4C@anduin.net>
 <20091129013026.GA1355@michelle.cdnetworks.com>
 <74BFE523-4BB3-4748-98BA-71FBD9829CD5@anduin.net>
 <alpine.BSF.2.00.0911291427240.80654@fledge.watson.org>
 <34AD565D-814A-446A-B9CA-AC16DD762E1B@anduin.net>
 <A0C9ED20-5536-44E2-B26B-0F1AEC2AF79C@anduin.net>
 <BA47FDA1-1097-4C43-AF71-51E7227795B5@FreeBSD.org>

On Mon, 30 Nov 2009, Robert N. M. Watson wrote:

>
> On 30 Nov 2009, at 05:36, Eirik Øverby wrote:
>
>> Short follow-up: Making OpenBSD use TCP mounts (it defaults to UDP)
>> seems to solve the issue.
>>
>> So this is a UDP-NFS-related problem, it would seem?
>
> Could well be. Let's try another debugging tactic -- there are two
> possible things going on here: resource leak, and resource exhaustion
> leading to deadlock. If you shut down to single user mode from
> multi-user, and let the system quiesce for a few minutes, then run
> netstat -m, what does it look like? Do vast numbers of mbufs+clusters
> get freed, or do they remain accounted for as allocated?
>
> (If they remain allocated, they were likely leaked, since most/all
> sockets will have been closed, releasing their resources on shutdown to
> single user when all processes are killed)
>
> The theory of an mbuf leak in NFS isn't an unlikely theory -- the socket
> code there continues to change, and rare edge cases frequently lead to
> leaks (per my earlier e-mail). Perhaps there's a case the OpenBSD client
> is triggering that other NFS clients normally don't. If we think that's
> the case, the next step is usually to narrow down what causes the leak
> to trigger a lot (i.e., the backup starting), and then grab a packet
> trace that we can analyze with wireshark. We'll want to look at the
> types of errors being returned for RPCs and, in particular, if there's
> one that happens about the same number of times as the resource has
> leaked over the same window, look at the code and see if that error case
> is handled properly.
>
> If this is definitely an NFS leak bug, we should get the NFS folks'
> attention by sticking "NFS mbuf leak" in the subject line and CC'ing
> rmacklem/dfr. :-)
>
It's a bit of a shot in the dark, but could you please test the following
patch? It patches for a possible mbuf leak + a possible M_SONAME leak
(I have no idea if these ever occur in practice). It also fixes a case
where the return value for svc_dg_reply() would have been TRUE for
failure.

It was all I could see from a quick look.

rick

--- rpc/svc_dg.c.sav	2009-12-07 15:37:45.000000000 -0500
+++ rpc/svc_dg.c	2009-12-07 15:48:50.000000000 -0500
@@ -221,6 +221,8 @@
 	xdrmbuf_create(&xdrs, mreq, XDR_DECODE);
 	if (! xdr_callmsg(&xdrs, msg)) {
 		XDR_DESTROY(&xdrs);
+		if (raddr != NULL)
+			free(raddr, M_SONAME);
 		return (FALSE);
 	}
 
@@ -259,11 +261,13 @@
 		m_fixhdr(mrep);
 		error = sosend(xprt->xp_socket, addr, NULL, mrep, NULL,
 		    0, curthread);
-		if (!error) {
-			stat = TRUE;
+		if (error) {
+			stat = FALSE;
 		}
 	} else {
 		m_freem(mrep);
+		if (m != NULL)
+			m_freem(m);
 	}
 
 	XDR_DESTROY(&xdrs);
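
The two kinds of fix in the patch above (releasing a buffer on an early
error return, and reporting failure when the send fails) can be sketched
in user-space C. This is only a toy model under stated assumptions, not
the kernel code: toy_alloc, toy_free, toy_recv, toy_reply and the
toy_outstanding counter are hypothetical names invented for
illustration, and the counter merely stands in for the sort of
allocation accounting that netstat -m reports. It also assumes the
status flag is already true when the send step is reached, which is what
the description of the old behaviour suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Toy count of live allocations, loosely analogous to the "in use"
 * figures netstat -m shows for mbufs. Hypothetical, not kernel code.
 */
int toy_outstanding = 0;

void *
toy_alloc(size_t len)
{
	void *p = malloc(len);

	if (p != NULL)
		toy_outstanding++;
	return (p);
}

void
toy_free(void *p)
{
	if (p == NULL)
		return;
	assert(toy_outstanding > 0);
	toy_outstanding--;
	free(p);
}

/*
 * Hypothetical receive path: allocate a peer address, then "decode".
 * With free_on_error == false the early return leaks raddr, which is
 * the kind of leak the first hunk guards against.
 */
bool
toy_recv(bool decode_ok, bool free_on_error, void **raddrp)
{
	void *raddr = toy_alloc(16);

	if (raddr == NULL)
		return (false);
	if (!decode_ok) {
		if (free_on_error)
			toy_free(raddr);
		return (false);
	}
	*raddrp = raddr;	/* success: the caller now owns raddr */
	return (true);
}

/*
 * Hypothetical reply path. stat is assumed to be true on entry to the
 * send step, so it only needs clearing when the send fails.
 */
bool
toy_reply(int send_error)
{
	bool stat = true;

	if (send_error != 0)
		stat = false;
	return (stat);
}
```

Calling toy_recv(false, false, &addr) leaves toy_outstanding one higher
than before, while toy_recv(false, true, &addr) leaves it unchanged. A
count that stays elevated after all work has stopped points to a leak
rather than to temporary exhaustion, which is the distinction the
single-user netstat -m check is meant to draw.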