From owner-freebsd-fs@FreeBSD.ORG Fri Jan 25 15:07:02 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 09C488D5; Fri, 25 Jan 2013 15:07:02 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 75E861C0; Fri, 25 Jan 2013 15:07:01 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEANqeAlGDaFvO/2dsb2JhbABEhka4I3OCHgEBAQMBAQEBIAQnIAsFFg4KAgINGQIpAQkmBggCBQQBHASHaAYMq3CSVoEji3GDGIETA4hhin6CLoEcjyyDFYFRNQ X-IronPort-AV: E=Sophos;i="4.84,538,1355115600"; d="scan'208";a="13610977" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 25 Jan 2013 10:06:54 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C9321B403F; Fri, 25 Jan 2013 10:06:54 -0500 (EST) Date: Fri, 25 Jan 2013 10:06:54 -0500 (EST) From: Rick Macklem To: Konstantin Belousov Message-ID: <1436756651.2351478.1359126414802.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20130125113606.GO2522@kib.kiev.ua> Subject: Re: 9.1-stable crashes while copying data from a NFS mounted directory MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, yongari@freebsd.org, Christian Gusenbauer X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Jan 2013 15:07:02 -0000 Konstantin Belousov wrote: > On Thu, Jan 24, 2013 at 09:39:38PM -0500, Rick Macklem wrote: > > John Baldwin wrote: > > > On Thursday, January 24, 2013 4:22:12 pm Konstantin Belousov > > > wrote: > > > > On Thu, Jan 24, 2013 at 09:50:52PM +0100, Christian Gusenbauer > > > > wrote: > > > > > On Thursday 24 January 2013 20:37:09 Konstantin Belousov > > > > > wrote: > > > > > > On Thu, Jan 24, 2013 at 07:50:49PM +0100, Christian > > > > > > Gusenbauer > > > > > > wrote: > > > > > > > On Thursday 24 January 2013 19:07:23 Konstantin Belousov > > > > > > > wrote: > > > > > > > > On Thu, Jan 24, 2013 at 08:03:59PM +0200, Konstantin > > > > > > > > Belousov wrote: > > > > > > > > > On Thu, Jan 24, 2013 at 06:05:57PM +0100, Christian > > > > > > > > > Gusenbauer wrote: > > > > > > > > > > Hi! > > > > > > > > > > > > > > > > > > > > I'm using 9.1 stable svn revision 245605 and I get > > > > > > > > > > the > > > > > > > > > > panic below > > > > > > > > > > if I execute the following commands (as single > > > > > > > > > > user): > > > > > > > > > > > > > > > > > > > > # swapon -a > > > > > > > > > > # dumpon /dev/ada0s3b > > > > > > > > > > # mount -u / > > > > > > > > > > # ifconfig age0 inet 192.168.2.2 mtu 6144 up > > > > > > > > > > # mount -t nfs -o rsize=32768 data:/multimedia /mnt > > > > > > > > > > # cp /mnt/Movies/test/a.m2ts /tmp > > > > > > > > > > > > > > > > > > > > then the system panics almost immediately. I'll > > > > > > > > > > attach > > > > > > > > > > the stack > > > > > > > > > > trace. > > > > > > > > > > > > > > > > > > > > Note, that I'm using jumbo frames (6144 byte) on a > > > > > > > > > > 1Gbit > > > > > > > > > > network, > > > > > > > > > > maybe that's the cause for the panic, because the > > > > > > > > > > bcopy > > > > > > > > > > (see stack > > > > > > > > > > frame #15) fails. > > > > > > > > > > > > > > > > > > > > Any clues? > > > > > > > > > > > > > > > > > > I tried a similar operation with the nfs mount of > > > > > > > > > rsize=32768 and mtu > > > > > > > > > 6144, but the machine runs HEAD and em instead of age. > > > > > > > > > I > > > > > > > > > was unable > > > > > > > > > to reproduce the panic on the copy of the 5GB file > > > > > > > > > from > > > > > > > > > nfs mount. > > > > > > > > > > > > > > Hmmm, I did a quick test. If I do not change the MTU, so > > > > > > > just > > > > > > > configuring > > > > > > > age0 with > > > > > > > > > > > > > > # ifconfig age0 inet 192.168.2.2 up > > > > > > > > > > > > > > then I can copy all files from the mounted directory > > > > > > > without > > > > > > > any > > > > > > > problems, too. So it's probably age0 related? > > > > > > > > > > > > From your backtrace and the buffer printout, I see somewhat > > > > > > strange thing. > > > > > > The buffer data address is 0xffffff8171418000, while kernel > > > > > > faulted > > > > > > at the attempt to write at 0xffffff8171413000, which is is > > > > > > lower > > > > > > then > > > > > > the buffer data pointer, at the attempt to bcopy to the > > > > > > buffer. > > > > > > > > > > > > The other data suggests that there were no overflow of the > > > > > > data > > > > > > from the > > > > > > server response. So it might be that mbuf_len(mp) returned > > > > > > negative number > > > > > > ? I am not sure is it possible at all. > > > > > > > > > > > > Try this debugging patch, please. You need to add INVARIANTS > > > > > > etc > > > > > > to the > > > > > > kernel config. > > > > > > > > > > > > diff --git a/sys/fs/nfs/nfs_commonsubs.c > > > > > > b/sys/fs/nfs/nfs_commonsubs.c > > > > > > index efc0786..9a6bda5 100644 > > > > > > --- a/sys/fs/nfs/nfs_commonsubs.c > > > > > > +++ b/sys/fs/nfs/nfs_commonsubs.c > > > > > > @@ -218,6 +218,7 @@ nfsm_mbufuio(struct nfsrv_descript *nd, > > > > > > struct uio > > > > > > *uiop, int siz) } > > > > > > mbufcp = NFSMTOD(mp, caddr_t); > > > > > > len = mbuf_len(mp); > > > > > > + KASSERT(len > 0, ("len %d", len)); > > > > > > } > > > > > > xfer = (left > len) ? len : left; > > > > > > #ifdef notdef > > > > > > @@ -239,6 +240,8 @@ nfsm_mbufuio(struct nfsrv_descript *nd, > > > > > > struct uio > > > > > > *uiop, int siz) uiop->uio_resid -= xfer; > > > > > > } > > > > > > if (uiop->uio_iov->iov_len <= siz) { > > > > > > + KASSERT(uiop->uio_iovcnt > 1, ("uio_iovcnt %d", > > > > > > + uiop->uio_iovcnt)); > > > > > > uiop->uio_iovcnt--; > > > > > > uiop->uio_iov++; > > > > > > } else { > > > > > > > > > > > > I thought that server have returned too long response, but > > > > > > it > > > > > > seems to > > > > > > be not the case from your data. Still, I think the patch > > > > > > below > > > > > > might be > > > > > > due. > > > > > > > > > > > > diff --git a/sys/fs/nfsclient/nfs_clrpcops.c > > > > > > b/sys/fs/nfsclient/nfs_clrpcops.c index be0476a..a89b907 > > > > > > 100644 > > > > > > --- a/sys/fs/nfsclient/nfs_clrpcops.c > > > > > > +++ b/sys/fs/nfsclient/nfs_clrpcops.c > > > > > > @@ -1444,7 +1444,7 @@ nfsrpc_readrpc(vnode_t vp, struct uio > > > > > > *uiop, struct > > > > > > ucred *cred, NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED); > > > > > > eof = fxdr_unsigned(int, *tl); > > > > > > } > > > > > > - NFSM_STRSIZ(retlen, rsize); > > > > > > + NFSM_STRSIZ(retlen, len); > > > > > > error = nfsm_mbufuio(nd, uiop, retlen); > > > > > > if (error) > > > > > > goto nfsmout; > > > > > > > I think this patch is appropriate, although I don't see it as too > > critical. It just tightens the "sanity check" on the read reply > > length (which should never exceed what the client requested). > I agree, but client cannot control the server response. Righto. However, the server can put the correct "size" at the beginning of the reply and then follow it with the wrong amount of data. Those are the cases that the EBADRPC reply from nfsm_mbufuio() hopefully catches. But, I agree that changing "rsize" to "len" is correct in this case and would catch the case where the server replies with too large a "size". > Anyway, I > think > there is too much things that could go wrong if the server actively > exploit the client code. > > > > > nfsm_mbufuio() shouldn't transfer more than the uio structure can > > handle, even if the replied read size is larger than requested. > Yes, this what happen, I suppose, due to the decrement of the > uio_iovcnt > and the EBADRPC error return at the beginning of loop. But IMO the > situation should be catched and asserted instead. This is why I added > KASSERT(uio_iovcnt > 1) before the decrement. > > I do not think that we should both add my KASSERT for iovcnt and leave > the EBADRPC return. What is your preference there ? > Well, if a server sends a reply with "size == 16384", but then follows it with 32768bytes of data, I don't think that should panic the client, since it isn't a client bug. I think this is where the EBADRPC reply for iovcnt will happen. The reverse: A reply with "size == 32768", but followed by 16384bytes of data, would hit the end of the mbuf list. I don't think that should be a KASSERT() either since, again, it is a server bug and not a client one. Now, the case where the mbuf list is bogus (bad m_len or ???) I see the argument for KASSERT()s, since it is a bug somewhere in the client machine and that bug may soon (or have already) corrupted other data structures, even if there is no damage done for this case once caught. I'm fine with KASSERT()s for these. (A KASSERT() panic with a message is probably easier to debug than a bcopy() crash, although you did the latter amazingly well;-) > > > > It does seem that nfsm_mbufuio() should apply a sanity check on > > m_len. I think m_len == 0 is ok, but negative or very large should > > be checked for. Maybe just return EBADRPC after a printf() instead > > of a KASSERT(), as a safety belt against a trashed m_len from a > > driver > > or ??? > The same as with the overflowed size, it would only hide another bug > in > the kernel. Probably, the assert m_len >= 0 and other useful > assertions > should be performed in the central place in the network stack, to > catch > an error earlier and for all consumers. > Yes. That sounds like a good idea to me. To be honest, there are a lot of places where a bogus mbuf list can trash the NFS code, so catching it in a central place would be good, I think. Thanks for looking into this, rick > > > > rick > > > > > > > I applied your patches and now I get a > > > > > > > > > > panic: len -4 > > > > > cpuid = 1 > > > > > KDB: enter: panic > > > > > Dumping 377 out of 6116 > > > > > MB:..5%..13%..22%..34%..43%..51%..64%..73%..81%..94% > > > > > > > > > This means that the age driver either produced corrupted mbuf > > > > chain, > > > > or filled wrong negative value into the mbuf len field. I am > > > > quite > > > > certain that the issue is in the driver. > > > > > > > > I added the net@ to Cc:, hopefully you could get help there. > > > > > > And I've cc'd Pyun who has written most of this driver and is > > > likely > > > the one > > > most familiar with its handling of jumbo frames. > > > > > > -- > > > John Baldwin > > > _______________________________________________ > > > freebsd-fs@freebsd.org mailing list > > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > To unsubscribe, send any mail to > > > "freebsd-fs-unsubscribe@freebsd.org"