From owner-freebsd-performance@FreeBSD.ORG  Sat May 19 04:35:27 2007
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
X-Original-To: freebsd-performance@freebsd.org
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 3068116A400;
	Sat, 19 May 2007 04:35:27 +0000 (UTC)
	(envelope-from aedwards@sandvine.com)
Received: from gw.sandvine.com (gw.sandvine.com [199.243.201.138])
	by mx1.freebsd.org (Postfix) with ESMTP id D196813C459;
	Sat, 19 May 2007 04:35:26 +0000 (UTC)
	(envelope-from aedwards@sandvine.com)
Received: from exchange-2.sandvine.com ([192.168.16.12]) by gw.sandvine.com
	with Microsoft SMTPSVC(6.0.3790.1830); 
	Sat, 19 May 2007 00:34:25 -0400
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
	charset="US-ASCII"
Content-Transfer-Encoding: quoted-printable
Date: Sat, 19 May 2007 00:34:25 -0400
Message-ID: <5230D3C40B842D4F9FB3CD368021BEF72F092A@exchange-2.sandvine.com>
In-Reply-To: <5230D3C40B842D4F9FB3CD368021BEF72F0926@exchange-2.sandvine.com>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Ufs dead-locks on freebsd 6.2
Thread-Index: AceZgA8XJQM9a6noQX+h86ioyzHh9wAHOkwgAAtyb/A=
From: "Andrew Edwards" <aedwards@sandvine.com>
To: <freebsd-fs@freebsd.org>,
	<freebsd-performance@freebsd.org>
X-OriginalArrivalTime: 19 May 2007 04:34:25.0729 (UTC)
	FILETIME=[FB085B10:01C799CE]
Cc: 
Subject: RE: Ufs dead-locks on freebsd 6.2
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 May 2007 04:35:27 -0000

Fsck didn't help, but below is a list of the processes that were stuck
in disk wait.  One potential problem I've hit: I have MRTG scripts that
get launched from cron every minute.  MRTG is supposed to have a locking
mechanism to prevent the same script from running twice at the same
time, but I suspect that since the filesystem was inaccessible the cron
jobs just kept piling up until the system would eventually crash.  I
caught it when the load average was at 620 and killed all the cron jobs
I could.  That brought the load average down to under 1; however, system
time is still taking up 30% of the CPU and the disks are basically
idle.  I can still do an ls -l on the root of all my mounted UFS and NFS
filesystems, but on one it's taking considerably longer than the rest.
The rsync that I was running is copying into the /d2 fs.
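
To illustrate the kind of guard that keeps per-minute cron jobs from
stacking up, here is a minimal sketch.  It is not MRTG's own locking;
the function names and the lock path are hypothetical.  It takes a
non-blocking exclusive flock() and skips the run if a previous instance
still holds it.  Note the assumption that the lock file lives on a
filesystem that stays responsive: if the lock file is on the hung
filesystem, the open() itself can block and the guard does not help.

```python
import fcntl
import os
import subprocess


def try_lock(path):
    """Return an fd holding an exclusive lock on path, or None if it is held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes flock() fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def run_exclusive(lock_path, argv):
    """Run argv unless another run holds the lock.

    Returns the command's exit status, or None if the run was skipped.
    """
    fd = try_lock(lock_path)
    if fd is None:
        return None
    try:
        return subprocess.call(argv)
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

A cron entry would then call a small wrapper around
run_exclusive("/tmp/mrtg.lock", [...]) instead of the script directly
(the path here is only an example).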

The system is still running and I can make TCP connections, and some
things I have running from inetd work, but ssh stops responding right
away and I can't log on via the console.  So I've captured a core dump
of the system and rebooted so that I could use it again.  Are there any
suggestions as to what to do next?  I'm debating installing an Adaptec
RAID card and rebuilding the system to see if I get the same problem; my
worry is that it's the Intel RAID drivers that are causing this problem,
and I have 4 other systems with the same card.


  PID  TT  STAT      TIME COMMAND
    2  ??  DL     0:04.86 [g_event]
    3  ??  DL     2:05.90 [g_up]
    4  ??  DL     1:07.95 [g_down]
    5  ??  DL     0:00.00 [xpt_thrd]
    6  ??  DL     0:00.00 [kqueue taskq]
    7  ??  DL     0:00.00 [thread taskq]
    8  ??  DL     0:06.96 [pagedaemon]
    9  ??  DL     0:00.00 [vmdaemon]
   15  ??  DL     0:22.28 [yarrow]
   24  ??  DL     0:00.01 [usb0]
   25  ??  DL     0:00.00 [usbtask]
   27  ??  DL     0:00.01 [usb1]
   29  ??  DL     0:00.01 [usb2]
   36  ??  DL     1:28.73 [pagezero]
   37  ??  DL     0:08.76 [bufdaemon]
   38  ??  DL     0:00.54 [vnlru]
   39  ??  DL     1:08.12 [syncer]
   40  ??  DL     0:04.00 [softdepflush]
   41  ??  DL     0:11.05 [schedcpu]
27182  ??  Ds     0:05.75 /usr/sbin/syslogd -l /var/run/log -l /var/named/var/run/log -b 127.0.0.1 -a 10.128.0.0/10
27471  ??  Is     0:01.10 /usr/local/bin/postmaster -D /usr/local/pgsql/data (postgres)
27594  ??  Is     0:00.04 /usr/libexec/ftpd -m -D -l -l
27602  ??  DL     0:00.28 [smbiod1]
96581  ??  D      0:00.00 cron: running job (cron)
96582  ??  D      0:00.00 cron: running job (cron)
96583  ??  D      0:00.00 cron: running job (cron)
96585  ??  D      0:00.00 cron: running job (cron)
96586  ??  D      0:00.00 cron: running job (cron)
96587  ??  D      0:00.00 cron: running job (cron)
96588  ??  D      0:00.00 cron: running job (cron)
96589  ??  D      0:00.00 cron: running job (cron)
96590  ??  D      0:00.00 cron: running job (cron)
96591  ??  D      0:00.00 cron: running job (cron)
96592  ??  D      0:00.00 cron: running job (cron)
96593  ??  D      0:00.00 cron: running job (cron)
96594  ??  D      0:00.00 cron: running job (cron)
96607  ??  D      0:00.00 cron: running job (cron)
96608  ??  D      0:00.00 cron: running job (cron)
96609  ??  D      0:00.00 cron: running job (cron)
96610  ??  D      0:00.00 cron: running job (cron)
96611  ??  D      0:00.00 cron: running job (cron)
96612  ??  D      0:00.00 cron: running job (cron)
96613  ??  D      0:00.00 cron: running job (cron)
96614  ??  D      0:00.00 cron: running job (cron)
96615  ??  D      0:00.00 cron: running job (cron)
96616  ??  D      0:00.00 cron: running job (cron)
96617  ??  D      0:00.00 cron: running job (cron)
96631  ??  D      0:00.00 cron: running job (cron)
96632  ??  D      0:00.00 cron: running job (cron)
96633  ??  D      0:00.00 cron: running job (cron)
96634  ??  D      0:00.00 cron: running job (cron)
96635  ??  D      0:00.00 cron: running job (cron)
96636  ??  D      0:00.00 cron: running job (cron)
96637  ??  D      0:00.00 cron: running job (cron)
96638  ??  D      0:00.00 cron: running job (cron)
96639  ??  D      0:00.00 cron: running job (cron)
96642  ??  D      0:00.00 cron: running job (cron)
96650  ??  D      0:00.00 cron: running job (cron)
29393  p0  D+    22:04.58 /usr/local/bin/rsync
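
For a quick tally of a listing like the one above, a minimal sketch
follows.  It assumes the five-column layout shown (PID, TT, STAT, TIME,
COMMAND) and treats bracketed commands as kernel threads, which are
left out because they normally sleep in state D anyway.  The function
name is hypothetical.

```python
from collections import Counter


def blocked_processes(ps_text):
    """Count non-kernel processes whose STAT starts with D, keyed by command."""
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split(None, 4)
        # Skip the header and any wrapped continuation lines.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        stat, command = fields[2], fields[4]
        if stat.startswith("D") and not command.startswith("["):
            counts[command] += 1
    return counts
```

Feeding it the saved ps output and printing counts.most_common() shows
which commands account for most of the stuck processes.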

real    0m0.012s
user    0m0.000s
sys     0m0.010s
/

real    0m0.019s
user    0m0.000s
sys     0m0.016s
/var

real    0m0.028s
user    0m0.008s
sys     0m0.018s
/diskless

real    0m0.017s
user    0m0.008s
sys     0m0.007s
/usr

real    0m0.016s
user    0m0.000s
sys     0m0.015s
/d2

real    0m0.024s
user    0m0.000s
sys     0m0.023s
/exports/home

real    0m2.559s
user    0m0.216s
sys     0m2.307s
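
A check like the timings above can itself hang on a wedged filesystem,
so here is a minimal sketch that lists a directory in a background
thread and gives up after a timeout.  The function name and the default
timeout are assumptions, and the worker thread may stay stuck in the
kernel after the timeout; the point is only that the caller gets an
answer.

```python
import os
import threading
import time


def time_listing(path, timeout=10.0):
    """Return seconds taken to list and stat path's entries.

    Returns None if the listing timed out or the directory could not
    be read.
    """
    result = {}

    def worker():
        start = time.monotonic()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    try:
                        entry.stat(follow_symlinks=False)
                    except OSError:
                        pass  # entry vanished or is unreadable; keep going
        except OSError:
            return  # directory missing or unreadable: leave result empty
        result["elapsed"] = time.monotonic() - start

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)  # stop waiting after timeout seconds
    return result.get("elapsed")
```

Calling time_listing() on each mount point in turn (/, /var, /usr, /d2
and so on) gives one number per filesystem, with None marking the one
that did not answer in time.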

-----Original Message-----
From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org]
On Behalf Of Andrew Edwards
Sent: Friday, May 18, 2007 6:44 PM
To: freebsd-fs@freebsd.org; freebsd-performance@freebsd.org
Subject: RE: Ufs dead-locks on freebsd 6.2

Okay, I let memtest run for a full day and there have been no memory
errors.  What do I do next?  Just to be on the safe side I'll fsck all
of my filesystems and try to reproduce the problem again.

I also don't know what zonelimit is; I see this on similarly configured
machines, but running 5.4.  I know it's related to the network, as I
periodically get network connections to work, i.e. ssh, ftp (both server
and client side), but eventually the box will deadlock.  Should I start
a different thread on this?  It happens about once every 30 days on two
servers, although I haven't checked the exact timing.

-----Original Message-----
From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org]
On Behalf Of Eric Anderson
Sent: Friday, May 18, 2007 3:09 PM
To: Kris Kennaway
Cc: freebsd-fs@freebsd.org
Subject: Re: Ufs dead-locks on freebsd 6.2

On 05/18/07 14:00, Kris Kennaway wrote:
> On Thu, May 17, 2007 at 11:38:20PM -0500, Eric Anderson wrote:
>> On 05/17/07 12:47, Kostik Belousov wrote:
>>> On Thu, May 17, 2007 at 01:03:37PM -0400, Andrew Edwards wrote:
>>>> Here it is.
>>>>
>>>> db> show vnode 0xccd47984
>>>> vnode 0xccd47984: tag ufs, type VDIR
>>>>    usecount 5135, writecount 0, refcount 5137 mountedhere 0
>>>>    flags (VV_ROOT)
>>>>    v_object 0xcd02518c ref 0 pages 1
>>>>    #0 0xc0593f0d at lockmgr+0x4ed
>>>> #1 0xc06b8e0e at ffs_lock+0x76
>>>> #2 0xc0739787 at VOP_LOCK_APV+0x87
>>>> #3 0xc0601c28 at vn_lock+0xac
>>>> #4 0xc05ee832 at lookup+0xde
>>>> #5 0xc05ee4b2 at namei+0x39a
>>>> #6 0xc05e2ab0 at unp_connect+0xf0
>>>> #7 0xc05e1a6a at uipc_connect+0x66
>>>> #8 0xc05d9992 at soconnect+0x4e
>>>> #9 0xc05dec60 at kern_connect+0x74
>>>> #10 0xc05debdf at connect+0x2f
>>>> #11 0xc0723e2b at syscall+0x25b
>>>> #12 0xc070ee0f at Xint0x80_syscall+0x1f
>>>>
>>>>        ino 2, on dev amrd0s1a
>>> It seems to be the sort of thing that cannot happen. VOP_LOCK()
>>> returned 0, but the vnode was not really locked.
>>>
>>> Although claiming that kernel code cannot have such a bug is too
>>> optimistic, I would first make sure that:
>>> 1. You checked the memory of the machine.
>>> 2. Your kernel is built from pristine sources.
>>
>> This looks precisely like a lock I was seeing on one of my NFS
>> servers.  Only one of the filesystems would cause it, but it was the
>> same one each time, not necessarily under any kind of load.  Things
>> like mountd would get wedged in state 'ufs', and other things would
>> get stuck in one of the lock states (I can't recall).
>
> ...so you cannot conclude that it looks "precisely like" this case.
>
> Please, don't confuse bug reports by this kind of claim unless you
> have made a detailed comparison of the debugging traces to yours.


Understood - my mistake.

Eric


_______________________________________________
freebsd-fs@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"