From owner-freebsd-fs@FreeBSD.ORG  Mon Jan 28 06:35:45 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id C902ED8C;
 Mon, 28 Jan 2013 06:35:45 +0000 (UTC)
 (envelope-from pyunyh@gmail.com)
Received: from mail-pb0-f47.google.com (mail-pb0-f47.google.com
 [209.85.160.47]) by mx1.freebsd.org (Postfix) with ESMTP id 6E955BF2;
 Mon, 28 Jan 2013 06:35:45 +0000 (UTC)
Received: by mail-pb0-f47.google.com with SMTP id rp8so514298pbb.34
 for <multiple recipients>; Sun, 27 Jan 2013 22:35:39 -0800 (PST)
X-Received: by 10.66.84.195 with SMTP id b3mr33785573paz.30.1359354939703;
 Sun, 27 Jan 2013 22:35:39 -0800 (PST)
Received: from pyunyh@gmail.com (lpe4.p59-icn.cdngp.net. [114.111.62.249])
 by mx.google.com with ESMTPS id x6sm6157347paw.0.2013.01.27.22.35.35
 (version=TLSv1 cipher=RC4-SHA bits=128/128);
 Sun, 27 Jan 2013 22:35:38 -0800 (PST)
Received: by pyunyh@gmail.com (sSMTP sendmail emulation);
 Mon, 28 Jan 2013 15:35:31 +0900
From: YongHyeon PYUN <pyunyh@gmail.com>
Date: Mon, 28 Jan 2013 15:35:31 +0900
To: Christian Gusenbauer <c47g@gmx.at>
Subject: Re: 9.1-stable crashes while copying data from a NFS mounted directory
Message-ID: <20130128063531.GC1447@michelle.cdnetworks.com>
References: <201301241805.57623.c47g@gmx.at>
 <20130125043043.GA1429@michelle.cdnetworks.com>
 <20130125045048.GB1429@michelle.cdnetworks.com>
 <201301251809.50929.c47g@gmx.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201301251809.50929.c47g@gmx.at>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-fs@freebsd.org, net@freebsd.org, yongari@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
Reply-To: pyunyh@gmail.com
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 28 Jan 2013 06:35:45 -0000

On Fri, Jan 25, 2013 at 06:09:50PM +0100, Christian Gusenbauer wrote:
> On Friday 25 January 2013 05:50:48 YongHyeon PYUN wrote:
> > On Fri, Jan 25, 2013 at 01:30:43PM +0900, YongHyeon PYUN wrote:
> > > On Thu, Jan 24, 2013 at 05:21:50PM -0500, John Baldwin wrote:
> > > > On Thursday, January 24, 2013 4:22:12 pm Konstantin Belousov wrote:
> > > > > On Thu, Jan 24, 2013 at 09:50:52PM +0100, Christian Gusenbauer wrote:
> > > > > > On Thursday 24 January 2013 20:37:09 Konstantin Belousov wrote:
> > > > > > > On Thu, Jan 24, 2013 at 07:50:49PM +0100, Christian Gusenbauer wrote:
> > > > > > > > On Thursday 24 January 2013 19:07:23 Konstantin Belousov wrote:
> > > > > > > > > On Thu, Jan 24, 2013 at 08:03:59PM +0200, Konstantin Belousov wrote:
> > > > > > > > > > On Thu, Jan 24, 2013 at 06:05:57PM +0100, Christian Gusenbauer wrote:
> > > > > > > > > > > Hi!
> > > > > > > > > > > 
> > > > > > > > > > > I'm using 9.1-STABLE (svn revision 245605) and I get the
> > > > > > > > > > > panic below if I run the following commands (in
> > > > > > > > > > > single-user mode):
> > > > > > > > > > > 
> > > > > > > > > > > # swapon -a
> > > > > > > > > > > # dumpon /dev/ada0s3b
> > > > > > > > > > > # mount -u /
> > > > > > > > > > > # ifconfig age0 inet 192.168.2.2 mtu 6144 up
> > > > > > > > > > > # mount -t nfs -o rsize=32768 data:/multimedia /mnt
> > > > > > > > > > > # cp /mnt/Movies/test/a.m2ts /tmp
> > > > > > > > > > > 
> > > > > > > > > > > then the system panics almost immediately. I'll attach
> > > > > > > > > > > the stack trace.
> > > > > > > > > > > 
> > > > > > > > > > > Note that I'm using jumbo frames (6144 bytes) on a 1 Gbit
> > > > > > > > > > > network; maybe that's the cause of the panic, because
> > > > > > > > > > > the bcopy (see stack frame #15) fails.
> > > > > > > > > > > 
> > > > > > > > > > > Any clues?
> > > > > > > > > > 
> > > > > > > > > > I tried a similar operation with an NFS mount using
> > > > > > > > > > rsize=32768 and MTU 6144, but on a machine running HEAD with
> > > > > > > > > > em instead of age. I was unable to reproduce the panic while
> > > > > > > > > > copying a 5GB file from the NFS mount.
> > > > > > > > 
> > > > > > > > Hmmm, I did a quick test. If I do not change the MTU, i.e. I
> > > > > > > > just configure age0 with
> > > > > > > > 
> > > > > > > > # ifconfig age0 inet 192.168.2.2 up
> > > > > > > > 
> > > > > > > > then I can copy all files from the mounted directory without
> > > > > > > > any problems, too. So it's probably age0 related?
> > > > > > > 
> > > > > > > From your backtrace and the buffer printout, I see something
> > > > > > > strange. The buffer data address is 0xffffff8171418000,
> > > > > > > while the kernel faulted on an attempt to write at
> > > > > > > 0xffffff8171413000, which is lower than the buffer data
> > > > > > > pointer, during the bcopy into the buffer.
> > > > > > > 
> > > > > > > The other data suggests that there was no overflow of the data
> > > > > > > from the server response. So might mbuf_len(mp) have
> > > > > > > returned a negative number? I am not sure that is possible at all.
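A rough user-space model of that hypothesis. Only the `xfer = (left > len) ? len : left;` line is taken from the nfsm_mbufuio() context quoted in the patch below; it is an assumption that the loop then advances left, len and the destination cursor by xfer, and struct copy_step and model_copy_step() are hypothetical names. No data is copied; only the length bookkeeping is modelled.

```c
#include <assert.h>

/*
 * Hypothetical model of one pass through the inner copy loop of
 * nfsm_mbufuio(): only the length bookkeeping, no bytes are copied.
 */
struct copy_step {
	int xfer;	/* byte count the loop would pass to bcopy */
	int left;	/* bytes still wanted for the current iovec */
	int len;	/* bytes still unread in the current mbuf */
	long dst_off;	/* destination cursor, relative to where it started */
};

static struct copy_step
model_copy_step(int left, int len, long dst_off)
{
	struct copy_step s;

	assert(left > 0);			/* the loop only runs while left > 0 */
	s.xfer = (left > len) ? len : left;	/* expression from the patch context */
	/* Assumed: both counters and the destination advance by xfer. */
	s.left = left - s.xfer;
	s.len = len - s.xfer;
	s.dst_off = dst_off + s.xfer;
	return (s);
}
```

In this model a sane mbuf (32 bytes wanted, 1448 available) gives xfer == 32, while model_copy_step(32768, -4, 0) gives xfer == -4, left == 32772, len == 0 and dst_off == -4: the destination steps backwards and the remaining count grows. Whether that alone explains the exact fault address is not established here.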
> > > > > > > 
> > > > > > > Try this debugging patch, please. You need to add INVARIANTS etc.
> > > > > > > to the kernel config.
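KASSERT() checks are compiled in only when the kernel is built with "options INVARIANTS" (which also needs "options INVARIANT_SUPPORT"). The message argument is a parenthesized printf-style argument list that ends up as the panic message, which is why the first assertion below reports a -4 length as "panic: len -4". A user-space model of that macro shape, where MODEL_KASSERT and model_panic() are hypothetical stand-ins that record the message instead of halting:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Pretend this is a kernel built with "options INVARIANTS". */
#define	INVARIANTS	1

/* Hypothetical panic(9) stand-in: records the message instead of halting. */
static char model_panic_msg[128];

static void
model_panic(const char *fmt, ...)
{
	static const char prefix[] = "panic: ";
	size_t n = strlen(prefix);
	va_list ap;

	assert(fmt != NULL);
	memcpy(model_panic_msg, prefix, n);
	va_start(ap, fmt);
	vsnprintf(model_panic_msg + n, sizeof(model_panic_msg) - n, fmt, ap);
	va_end(ap);
}

/*
 * Same shape as KASSERT(exp, msg): msg is a parenthesized argument
 * list, so "model_panic msg" expands to an ordinary call.  Without
 * INVARIANTS the check disappears entirely.
 */
#ifdef INVARIANTS
#define	MODEL_KASSERT(exp, msg) do {	\
	if (!(exp))			\
		model_panic msg;	\
} while (0)
#else
#define	MODEL_KASSERT(exp, msg) do { } while (0)
#endif
```

With len == -4, MODEL_KASSERT(len > 0, ("len %d", len)) leaves "panic: len -4" in model_panic_msg; with a positive len it does nothing.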
> > > > > > > 
> > > > > > > diff --git a/sys/fs/nfs/nfs_commonsubs.c b/sys/fs/nfs/nfs_commonsubs.c
> > > > > > > index efc0786..9a6bda5 100644
> > > > > > > --- a/sys/fs/nfs/nfs_commonsubs.c
> > > > > > > +++ b/sys/fs/nfs/nfs_commonsubs.c
> > > > > > > @@ -218,6 +218,7 @@ nfsm_mbufuio(struct nfsrv_descript *nd, struct uio *uiop, int siz)
> > > > > > >  				}
> > > > > > >  				mbufcp = NFSMTOD(mp, caddr_t);
> > > > > > >  				len = mbuf_len(mp);
> > > > > > > +				KASSERT(len > 0, ("len %d", len));
> > > > > > >  			}
> > > > > > >  			xfer = (left > len) ? len : left;
> > > > > > >  #ifdef notdef
> > > > > > > @@ -239,6 +240,8 @@ nfsm_mbufuio(struct nfsrv_descript *nd, struct uio *uiop, int siz)
> > > > > > >  			uiop->uio_resid -= xfer;
> > > > > > >  		}
> > > > > > >  		if (uiop->uio_iov->iov_len <= siz) {
> > > > > > > +			KASSERT(uiop->uio_iovcnt > 1, ("uio_iovcnt %d",
> > > > > > > +			    uiop->uio_iovcnt));
> > > > > > >  			uiop->uio_iovcnt--;
> > > > > > >  			uiop->uio_iov++;
> > > > > > >  		} else {
> > > > > > > 
> > > > > > > I thought the server had returned too long a response, but your
> > > > > > > data suggests that is not the case. Still, I think the patch
> > > > > > > below is warranted.
> > > > > > > 
> > > > > > > diff --git a/sys/fs/nfsclient/nfs_clrpcops.c b/sys/fs/nfsclient/nfs_clrpcops.c
> > > > > > > index be0476a..a89b907 100644
> > > > > > > --- a/sys/fs/nfsclient/nfs_clrpcops.c
> > > > > > > +++ b/sys/fs/nfsclient/nfs_clrpcops.c
> > > > > > > @@ -1444,7 +1444,7 @@ nfsrpc_readrpc(vnode_t vp, struct uio *uiop, struct ucred *cred,
> > > > > > >  			NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED);
> > > > > > >  			eof = fxdr_unsigned(int, *tl);
> > > > > > >  		}
> > > > > > > -		NFSM_STRSIZ(retlen, rsize);
> > > > > > > +		NFSM_STRSIZ(retlen, len);
> > > > > > >  		error = nfsm_mbufuio(nd, uiop, retlen);
> > > > > > >  		if (error)
> > > > > > >  			goto nfsmout;
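Why len is the tighter bound: in nfsrpc_readrpc() each READ asks for len bytes, which, as far as I can tell from that function, is the remaining transfer size capped at rsize, so for the last, short chunk of a read len is smaller than rsize. A minimal model of the difference, assuming the reply check simply rejects a retlen larger than its bound; read_request_len() and reply_len_ok() are hypothetical helpers, not the NFSM_STRSIZ macro itself:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical: per-RPC read size, i.e. what is still to be read
 * (tsiz) capped at the mount's rsize.
 */
static int
read_request_len(int tsiz, int rsize)
{
	assert(tsiz > 0 && rsize > 0);
	return ((tsiz > rsize) ? rsize : tsiz);
}

/* Hypothetical reply check: accept retlen only if 0 <= retlen <= bound. */
static bool
reply_len_ok(int retlen, int bound)
{
	return (retlen >= 0 && retlen <= bound);
}
```

In this model a 4096-byte reply to a 1000-byte request passes a check against rsize (32768) but is rejected by a check against len.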
> > > > > > 
> > > > > > I applied your patches and now I get a
> > > > > > 
> > > > > > panic: len -4
> > > > > > cpuid = 1
> > > > > > KDB: enter: panic
> > > > > > Dumping 377 out of 6116 MB:..5%..13%..22%..34%..43%..51%..64%..73%..81%..94%
> > > > > 
> > > > > This means that the age driver either produced a corrupted mbuf chain
> > > > > or put a bogus negative value into the mbuf length field. I am quite
> > > > > certain that the issue is in the driver.
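One way to tell those two cases apart when debugging is to walk the chain a driver hands up and check that no buffer claims a negative length and that the lengths add up to the packet length (in the real kernel the leading mbuf's m_pkthdr.len is expected to equal the sum of m_len over the chain). A sketch over a hypothetical, stripped-down mbuf type; struct model_mbuf and model_chain_check() are illustrative names, not kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down mbuf: only the fields this check needs. */
struct model_mbuf {
	int m_len;			/* bytes of data in this buffer */
	struct model_mbuf *m_next;	/* next buffer of the same packet */
};

enum chain_status {
	CHAIN_OK,		/* no negative lengths, sum matches pkt_len */
	CHAIN_NEG_LEN,		/* some buffer claims a negative length */
	CHAIN_LEN_MISMATCH	/* lengths do not add up to pkt_len */
};

/*
 * Walk the chain starting at m and compare it with the length the
 * packet header claims (pkt_len, the model's m_pkthdr.len).
 */
static enum chain_status
model_chain_check(const struct model_mbuf *m, int pkt_len)
{
	long total = 0;

	assert(pkt_len >= 0);
	for (; m != NULL; m = m->m_next) {
		if (m->m_len < 0)
			return (CHAIN_NEG_LEN);
		total += m->m_len;
	}
	return (total == pkt_len ? CHAIN_OK : CHAIN_LEN_MISMATCH);
}
```

A chain of 2048 + 2048 + 1000 bytes checks out against 5096; setting the last buffer's length to -4 is reported as CHAIN_NEG_LEN before any sum is compared.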
> > > > > 
> > > > > I added net@ to Cc:; hopefully you can get help there.
> > > > 
> > > > And I've cc'd Pyun who has written most of this driver and is likely
> > > > the one most familiar with its handling of jumbo frames.
> > > 
> > > Try the attached patch and let me know how it goes.
> > > Note that I don't have age(4) hardware anymore, so it wasn't tested at all.
> > 
> > Sorry, ignore the previous patch and use this one (age.diff2) instead.
> 
> Thanks for the patch! I ignored the first and applied only the second one, but 
> unfortunately that did not change anything. I still get the "panic: len -4" 
> :-(.

Ok, I contacted QAC, got a hint about its descriptor usage, and
realized the controller does not work as I initially expected!
When I wrote age(4), the hardware was available to me for only a
couple of weeks, so I may not have had enough time to test it
thoroughly. Sorry about that.
I'll let you know when an experimental patch is available. Due to the
lack of hardware, it will take longer than it used to.

Thanks for reporting!

> 
> Ciao,
> Christian.