Date:      Tue, 21 Oct 2014 11:10:07 +0200
From:      "Tobias C. Berner" <tcberner@gmail.com>
To:        araujo@freebsd.org
Cc:        Rick Macklem <rmacklem@uoguelph.ca>, Allan Jude <allanjude@freebsd.org>, freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: kernel page fault with nfs
Message-ID:  <9752382.X6OBTMl3Me@noxon.firefly>
In-Reply-To: <CAOfEmZi9UF7o8sXN2=Ta2+LHwbZPNanzbMZcRQkYsc5Et+0EtA@mail.gmail.com>
References:  <9401112.gOMxVk2Kpo@noxon.firefly> <CAOshKtdY7ekU=zoWeR5OJAYszhEWAkpJYk6OD72+WoxMyqaKAQ@mail.gmail.com> <CAOfEmZi9UF7o8sXN2=Ta2+LHwbZPNanzbMZcRQkYsc5Et+0EtA@mail.gmail.com>

Hi Marcelo

The following is the current fstab line, which seems to run smoothly:
odo.firefly:/storage/multimedia  /multimedia  nfs  readahead=4,soft,intr,rw,tcp,wsize=32768,rsize=32768,late  0 0

nfsstat -m:
odo.firefly:/storage/multimedia on /multimedia
nfsv3,tcp,resvport,soft,intr,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=4,wcommitsize=2798255,timeout=120,retrans=2




Now the bad line (no different apart from the typo in wsize/rsize):
odo.firefly:/storage/multimedia  /multimedia  nfs  readahead=4,soft,intr,rw,tcp,wsize=32767,rsize=32767,late  0 0
which leads to the page faults.
And as you said, wsize/rsize get rounded down to a multiple of 512:
odo.firefly:/storage/multimedia on /multimedia
nfsv3,tcp,resvport,soft,intr,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32256,wsize=32256,readdirsize=32256,readahead=4,wcommitsize=2798255,timeout=120,retrans=2
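
The arithmetic behind the two nfsstat outputs can be checked with a small sketch. It only mirrors the observed behaviour (round down to a multiple of 512); it is not taken from the NFS client source, and the names round_down and is_power_of_two are made up for illustration.

```python
# Illustrative only: mimics the rounding seen in "nfsstat -m",
# not the actual kernel code path.
GRANULARITY = 512  # assumption: value inferred from the observed output


def round_down(value, multiple=GRANULARITY):
    """Round value down to the nearest multiple of `multiple`."""
    return (value // multiple) * multiple


def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


for requested in (32768, 32767):
    effective = round_down(requested)
    print(requested, effective, is_power_of_two(effective))
```

With wsize/rsize=32768 the effective size stays a power of 2, while 32767 becomes 32256 (63 * 512), which is not.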


I can easily reproduce the page fault by letting, for example, multimedia/gpodder
write to the NFS mount.
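
Rick's suggestion quoted further down (clip to the power of 2 below the value) could be sketched roughly as follows. This is a userland illustration under my own assumptions, not a patch: parse_mount_options and power_of_two_at_or_below are hypothetical helper names, and the option string is the one from the failing mount above.

```python
def parse_mount_options(line):
    """Split a comma-separated option string into a dict.

    Flags without '=' (for example 'soft') map to True.
    """
    options = {}
    for item in line.strip().split(","):
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


def power_of_two_at_or_below(n):
    """Return the largest power of two that is <= n (n must be >= 1)."""
    if n < 1:
        raise ValueError("size must be positive")
    return 1 << (n.bit_length() - 1)


# Option string copied from the failing mount reported in this message.
BAD = ("nfsv3,tcp,resvport,soft,intr,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,"
       "acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32256,"
       "wsize=32256,readdirsize=32256,readahead=4,wcommitsize=2798255,"
       "timeout=120,retrans=2")

opts = parse_mount_options(BAD)
for name in ("rsize", "wsize"):
    size = int(opts[name])
    clipped = power_of_two_at_or_below(size)
    if clipped != size:
        print(f"{name}={size} is not a power of 2; clipped value would be {clipped}")
```

Whether clipping like this is the right fix for the client is exactly the open question in this thread; the sketch only shows what the clipped values would be.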



Hope this helps.

Best regards, Tobias




On Tuesday 21 October 2014 15.45:24 Marcelo Araujo wrote:
> Hello Tobias,
> 
> That sounds good; at least you have not had any crashes so far.
> I agree with you, it seems to be a bug. I'm going to take a look at it.
> 
> Could you share your testbed with me, or how you reproduce the issue?
> 
> Best Regards,
> 
> 2014-10-21 15:36 GMT+08:00 T.C.Berner <tcberner@gmail.com>:
> > The system now has an uptime of >24h using NFS heavily.
> > 
> > So wsize/rsize=2^15-1 seems to have been the problem, which IMHO is
> > therefore a bug.
> > 
> > Best regards, Tobias
> > 
> > 2014-10-21 5:11 GMT+02:00 Marcelo Araujo <araujobsdport@gmail.com>:
> >> Hello Tobias,
> >> 
> >> Any news?
> >> 
> >> 
> >> Best Regards,
> >> 
> >> 2014-10-20 20:55 GMT+08:00 Rick Macklem <rmacklem@uoguelph.ca>:
> >>> Tobias C. Berner wrote:
> >>> > Now that I posted it, 32767 should of course be 2^15=32768. Let me
> >>> > recheck if it still
> >>> > hangs with the correct value.
> >>> > 
> >>> > On Monday 20 October 2014 09.15:39 Tobias C. Berner wrote:
> >>> > > Hi Marcelo
> >>> > > 
> >>> > > Yes, I'm using readahead:
> >>> > > The mountoptions are
> >>> > > "readahead=4,soft,intr,rw,tcp,wsize=32767,rsize=32767,late"
> >>> 
> >>> If you type "nfsstat -m", you will see what is actually getting used.
> >>> (I suspect the above rsize/wsize got clipped to 32256 or something like
> >>> that. I think it clips it to a multiple of 512.)
> >>> 
> >>> If rsize/wsize are not a power of 2, there are issues, although I've never
> >>> been able to see why it is broken. Maybe it should clip it to the power of
> >>> 2 below the value, since it causes unexplained problems otherwise.
> >>> 
> >>> rick
> >>> 
> >>> > > Best regards, Tobias
> >>> > > 
> >>> > > On Monday 20 October 2014 10.41:30 Marcelo Araujo wrote:
> >>> > > > Hello Tobias,
> >>> > > > 
> >>> > > > Could you show how you are mounting the NFS share?
> >>> > > > Are you using the 'readahead' option?
> >>> > > > 
> >>> > > > Best Regards,
> >>> > > > 
> >>> > > > 2014-10-19 17:40 GMT+08:00 Tobias C. Berner <tcberner@gmail.com>:
> >>> > > > > both are at 1100038.
> >>> > > > > 
> >>> > > > > On Sunday 19 October 2014 11.12:36 Marcelo Araujo wrote:
> >>> > > > > > It is still strange. Could you do what Allan said and send us
> >>> > > > > > the result, in case you are not sure you have world and kernel
> >>> > > > > > at the same revision?
> >>> > > > > > 
> >>> > > > > > On Oct 19, 2014 6:48 AM, "Tobias C. Berner" <tcberner@gmail.com> wrote:
> >>> > > > > > > Hi
> >>> > > > > > > 
> >>> > > > > > > World is from October 16; I installed world and kernel then.
> >>> > > > > > > 
> >>> > > > > > > The kernel was later rebuilt with debug options.
> >>> > > > > > > 
> >>> > > > > > > Is the following more sensible?
> >>> > 
> >>> > 
> >>> > ##################################################
> >>> > 
> >>> > > > > > > # kgdb NOXON/kernel.debug vmcore.1

Date:      Tue, 21 Oct 2014 17:13:32 +0800
From:      Marcelo Araujo <araujobsdport@gmail.com>
Reply-To:  araujo@FreeBSD.org
To:        "Tobias C. Berner" <tcberner@gmail.com>
Cc:        Rick Macklem <rmacklem@uoguelph.ca>, Allan Jude <allanjude@freebsd.org>, freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: kernel page fault with nfs
Message-ID:  <CAOfEmZi-CChh1464xOTcOG=GMzfVQ_XZ++_uZzNQX6+cvx-pGg@mail.gmail.com>
In-Reply-To: <9752382.X6OBTMl3Me@noxon.firefly>
References:  <9401112.gOMxVk2Kpo@noxon.firefly> <CAOshKtdY7ekU=zoWeR5OJAYszhEWAkpJYk6OD72+WoxMyqaKAQ@mail.gmail.com> <CAOfEmZi9UF7o8sXN2=Ta2+LHwbZPNanzbMZcRQkYsc5Et+0EtA@mail.gmail.com> <9752382.X6OBTMl3Me@noxon.firefly>

Tobias,

Thank you very much, this really helps to reproduce the problem.
I'm going to try it as soon as possible, and I will keep you informed.

Best Regards,

2014-10-21 17:10 GMT+08:00 Tobias C. Berner <tcberner@gmail.com>:

> > >>> > > > > > > > Is the following more sensible?
> > >>> >
> > >>> > ##################################################
> > >>> >
> > >>> > > > > > > > # kgdb NOXON/kernel.debug vmcore.1
> > >>> > > > > > > >
> > >>> > > > > > > > Fatal trap 12: page fault while in kernel mode
> > >>> > > > > > > > cpuid = 5; apic id = 05
> > >>> > > > > > > > fault virtual address = 0xfffffe07d1744000
> > >>> > > > > > > > fault code = supervisor write data, page not present
> > >>> > > > > > > > instruction pointer = 0x20:0xffffffff80d4d58a
> > >>> > > > > > > > stack pointer = 0x28:0xfffffe086057f240
> > >>> > > > > > > > frame pointer = 0x28:0xfffffe086057f2f0
> > >>> > > > > > > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > >>> > > > > > > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > >>> > > > > > > > processor eflags = interrupt enabled, resume, IOPL = 0
> > >>> > > > > > > > current process = 6524 (python2.7)
> > >>> > > > > > > >
> > >>> > > > > > > > (kgdb) bt
> > >>> > > > > > > > #0 doadump (textdump=1) at pcpu.h:219
> > >>> > > > > > > > #1 0xffffffff80926b6d in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:447
> > >>> > > > > > > > #2 0xffffffff809270c0 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:746
> > >>> > > > > > > > #3 0xffffffff8035f167 in db_panic (addr=<value optimized out>, have_addr=2, count=0, modif=0x0) at /usr/src/sys/ddb/db_command.c:473
> > >>> > > > > > > > #4 0xffffffff8035ed7d in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:440
> > >>> > > > > > > > #5 0xffffffff8035eaf4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493
> > >>> > > > > > > > #6 0xffffffff80361600 in db_trap (type=<value optimized out>, code=0) at /usr/src/sys/ddb/db_main.c:251
> > >>> > > > > > > > #7 0xffffffff80966f01 in kdb_trap (type=12, code=0, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
> > >>> > > > > > > > #8 0xffffffff80d4fa7c in trap_fatal (frame=0xfffffe086057f190, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:861
> > >>> > > > > > > > #9 0xffffffff80d4fe0c in trap_pfault (frame=0xfffffe086057f190, usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:677
> > >>> > > > > > > > #10 0xffffffff80d4f42e in trap (frame=0xfffffe086057f190) at /usr/src/sys/amd64/amd64/trap.c:426
> > >>> > > > > > > > #11 0xffffffff80d33972 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
> > >>> > > > > > > > #12 0xffffffff80d4d58a in bzero () at /usr/src/sys/amd64/amd64/support.S:53
> > >>> > > > > > > > #13 0xffffffff80830463 in ncl_doio (vp=0xfffff801e7f99938, bp=0xfffffe07c5a168e8, cr=<value optimized out>, td=<value optimized out>, called_from_strategy=<value optimized out>) at /usr/src/sys/fs/nfsclient/nfs_clbio.c:1648



-- 
Marcelo Araujo
araujo@FreeBSD.org
http://www.FreeBSD.org
Power To Server.


