From owner-freebsd-stable@FreeBSD.ORG Tue Sep 27 11:59:37 2011
Date: Tue, 27 Sep 2011 04:59:35 -0700
From: Jeremy Chadwick
To: Kirill Yelizarov
Cc: rmacklem@uoguelph.ca, freebsd-stable@freebsd.org
Subject: Re: NFSD hang
Message-ID: <20110927115935.GA29196@icarus.home.lan>
In-Reply-To: <1317121450.5432.YahooMailNeo@web120529.mail.ne1.yahoo.com>
References: <1317017670.4307.YahooMailNeo@web120527.mail.ne1.yahoo.com> <20110926063210.GA54741@icarus.home.lan> <1317034584.14989.YahooMailNeo@web120530.mail.ne1.yahoo.com> <1317121450.5432.YahooMailNeo@web120529.mail.ne1.yahoo.com>

On Tue, Sep 27, 2011 at 04:04:10AM -0700, Kirill Yelizarov wrote:
> I found I had sync enabled on my
server so I set zfs set sync=disabled data
> and will look for failures. Are there any other settings for NFS over ZFS I can check or set?
>
> ________________________________
>
> # uname -a
> FreeBSD brat.faberlic.com 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Jun  9 11:22:38 MSD 2011     root@**:/usr/obj/usr/src/sys/BRAT  amd64
> Sources were taken at that time.
>
> There are a lot of these. Should I paste them all here, or is part enough?
>
> brat# procstat -k -k 1666
>   PID    TID COMM             TDNAME           KSTACK
>  1666 100323 nfsd             nfsd: master     mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_run+0x8b nfssvc_nfsd+0x97 nfssvc_nfsserver+0x53 nfssvc+0x44 syscallenter+0x186 syscall+0x40 Xfast_syscall+0xe2
>  1666 100391 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100392 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100393 nfsd             nfsd: service
>  1666 100394 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100395 nfsd             nfsd: service
>  1666 100396 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100397 nfsd             nfsd: service
>  1666 100398 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100399 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100400 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100401 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100402 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100403 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100404 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100405 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100406 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100407 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100408 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100409 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100410 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100411 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100412 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100413 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100414 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100415 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100416 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100417 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100418 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100419 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100420 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100421 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100422 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100423 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100424 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100425 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100426 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100427 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100428 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100429 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100430 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100431 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100432 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100433 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100434 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100435 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100436 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100437 nfsd             nfsd: service
>  1666 100438 nfsd             nfsd: service
>  1666 100439 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100440 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100441 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100442 nfsd             nfsd: service
>  1666 100443 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100444 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100445 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100446 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>  1666 100447 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _cv_timedwait_sig+0x11d svc_run_internal+0x939 svc_thread_start+0xb fork_exit+0x114 fork_trampoline+0xe
>
> ________________________________
> From: Jeremy Chadwick
> To: Kirill Yelizarov
> Cc: "freebsd-stable@freebsd.org"
> Sent: Monday, September 26, 2011 10:32 AM
> Subject: Re: NFSD hang
>
> On Sun, Sep 25, 2011 at 11:14:30PM -0700, Kirill Yelizarov wrote:
> > Good Day!
> > I've got a problem with an NFS share on a ZFS volume. Everything worked fine for a few months and now it hangs. This share stores logs from 9 servers at night, about 1-2Gb from each server. ZFS is filled to 26% and it is v28.
> >
> > last pid: 46573;  load averages: 195.82, 199.86, 200.12    up 108+21:56:50 10:05:06
> > 432 processes: 208 running, 224 sleeping
> > CPU:  0.0% user,  0.0% nice,  100% system,  0.0% interrupt,  0.0% idle
> > Mem: 280M Active, 1469M Inact, 9584M Wired, 161M Cache, 1232M Buf, 311M Free
> > Swap: 16G Total, 16G Free
> >
> >   PID USERNAME      THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> >  1666 root          256  76    0  5788K  5120K RUN    14 476.8H 1508.64% nfsd
> >
> > # zpool list
> > NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> > data  3.62T   954G  2.69T    25%  1.00x  ONLINE  -
> >
> > # zfs list
> > NAME   USED  AVAIL  REFER  MOUNTPOINT
> > data   954G  2.64T   954G  /data
> >
> > # zfs mount
> > data                            /data
> >
> > What should I look for to resolve it?
>
> What version of FreeBSD exactly, and what build date?
>
> Please provide output from "procstat -k -k 1666" (yes, two -k's).

Can you explain the correlation between the "sync" parameter (which I
have to assume was set to "standard" -- the default -- on all of your
filesystems) and your nfsd issue? I do not see the correlation.

My intention in asking for procstat -k -k output (which you did provide;
thank you) was for Rick Macklem (who's currently working on NFS on
FreeBSD) to chime in with some insights. He may be busy, but I've CC'd
him here.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
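
Editorial note on reading the quoted procstat -k -k output: with this many nfsd threads, it can be easier to group threads by kernel stack than to read every row. In the paste above, most service threads show one identical stack and a few show none. The following is a minimal sketch and not part of the original exchange. The function name group_by_stack and the shortened sample rows are illustrative assumptions, and the parser assumes frames always look like "symbol+0xoffset" as they do in the quoted output.

```python
import re
from collections import defaultdict

# A frame looks like "symbol+0xoffset", e.g. "mi_switch+0x176".
FRAME = re.compile(r"\b[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-fA-F]+\b")


def group_by_stack(text):
    """Map each distinct kernel stack (a tuple of frames) to the TIDs showing it.

    Hypothetical helper: it is not part of procstat or any FreeBSD tool.
    """
    groups = defaultdict(list)
    for line in text.splitlines():
        # Drop mail quote markers ("> ") so quoted output can be pasted as-is.
        line = line.lstrip("> \t")
        fields = line.split()
        # Data rows start with two integers (PID, TID); skip headers and blanks.
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        frames = tuple(FRAME.findall(line))
        groups[frames].append(int(fields[1]))
    return dict(groups)


# Sample rows with stacks shortened for illustration only.
sample = """\
>   PID    TID COMM             TDNAME           KSTACK
>  1666 100392 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309
>  1666 100393 nfsd             nfsd: service
>  1666 100394 nfsd             nfsd: service    mi_switch+0x176 sleepq_catch_signals+0x309
"""

for frames, tids in group_by_stack(sample).items():
    label = " ".join(frames) if frames else "(no stack shown)"
    print(f"{len(tids)} thread(s): {label}")
```

Matching frames by pattern rather than by column position avoids having to split the TDNAME column, which contains a space ("nfsd: service").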