From owner-freebsd-current@FreeBSD.ORG Sun Jan 6 17:13:38 2008
Delivered-To: freebsd-current@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 17ECD16A46C;
	Sun, 6 Jan 2008 17:13:37 +0000 (UTC)
	(envelope-from kris@FreeBSD.org)
Received: from weak.local (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 3EA2013C4F2;
	Sun, 6 Jan 2008 17:13:31 +0000 (UTC)
	(envelope-from kris@FreeBSD.org)
Message-ID: <47810C39.8010302@FreeBSD.org>
Date: Sun, 06 Jan 2008 18:13:29 +0100
From: Kris Kennaway
User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031)
MIME-Version: 1.0
To: Henri Hennebert
References: <20080104163352.GA42835@lor.one-eyed-alien.net>
	<9bbcef730801040958t36e48c9fjd0fbfabd49b08b97@mail.gmail.com>
	<200801061051.26817.peter.schuller@infidyne.com>
	<9bbcef730801060458k4bc9f2d6uc3f097d70e087b68@mail.gmail.com>
	<4780D289.7020509@FreeBSD.org> <4780F839.5020200@restart.be>
	<4780FBE2.8040208@FreeBSD.org> <47810621.8080406@restart.be>
In-Reply-To: <47810621.8080406@restart.be>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-current@FreeBSD.org, Peter Schuller, Ivan Voras, Brooks Davis
Subject: Re: When will ZFS become stable?
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Sun, 06 Jan 2008 17:13:38 -0000

Henri Hennebert wrote:
> Kris Kennaway wrote:
>> Henri Hennebert wrote:
>>> Kris Kennaway wrote:
>>>> Ivan Voras wrote:
>>>>> On 06/01/2008, Peter Schuller wrote:
>>>>>>> This number is not so large. It seems to be easily crashed by rsync,
>>>>>>> for example (speaking from my own experience, and also some of my
>>>>>>> colleagues).
>>>>>> I can definitely say this is not *generally* true, as I do a lot of
>>>>>> rsyncing/rdiff-backup:ing and similar stuff (with many files / large
>>>>>> files) on ZFS without any stability issues. Problems for me have been
>>>>>> limited to 32bit and the memory exhaustion issue rather than "hard"
>>>>>> issues.
>>>>>
>>>>> It's not generally true since kmem problems with rsync are often hard
>>>>> to repeat - I have them on one machine, but not on another, similar
>>>>> machine. This nonrepeatability is also a part of the problem.
>>>>>
>>>>>> But perhaps that's all you are referring to.
>>>>>
>>>>> Mostly. I did have a ZFS crash with rsync that wasn't kmem related,
>>>>> but only once.
>>>>
>>>> kmem problems are just tuning. They are not indicative of stability
>>>> problems in ZFS. Please report any further non-kmem panics you
>>>> experience.
>>>
>>> I encounter 2 times a deadlock during high I/O activity (the last one
>>> during rsync + rm -r on a 5GB hierarchy (openoffice-2/work).
>>>
>>> I was running with this patch:
>>> http://people.freebsd.org/~pjd/patches/zgd_done.patch
>>> db> show allpcpu
>>> Current CPU: 1
>>>
>>> cpuid = 0
>>> curthread = 0xa5ebe440: pid 3422 "txg_thread_enter"
>>> curpcb = 0xeb175d90
>>> fpcurthread = none
>>> idlethread = 0xa5529aa0: pid 12 "idle: cpu0"
>>> APIC ID = 0
>>> currentldt = 0x50
>>>
>>> cpuid = 1
>>> curthread = 0xa56ab220: pid 47 "arc_reclaim_thread"
>>> curpcb = 0xe6837d90
>>> fpcurthread = none
>>> idlethread = 0xa5529880: pid 11 "idle: cpu1"
>>> APIC ID = 1
>>> currentldt = 0x50
>>>
>>> With the 2 times arc_reclaim_thread `running`
>>
>> Backtraces of the affected processes (or just alltrace) are usually
>
> noted for next time
>
>> required to proceed with debugging, and lock status is also often
>> vital (show alllocks, requires witness).
>
> I add it to my kernel config
>
>> Also, in the case when threads are
>> actually running (not deadlocked), then it is often useful to
>> repeatedly break/continue and sample many backtraces to try and
>> determine where the threads are looping.
>
> I do this after the second deadlock and arc_reclaim_thread was always
> there and second cpu was idle.

To repeat, it is important not just to note which thread is running, but
*what the thread is doing*. This means repeatedly comparing the
backtraces, which will allow you to build up a picture of which part of
the code it is looping in.

Kris
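
For illustration only, comparing repeated backtrace samples can be
sketched in a few lines of Python. This is a minimal sketch, not a DDB
feature: it assumes each sample has already been transcribed by hand
into a list of function names (innermost frame first). The helper names
and the frame names in the sample data are hypothetical placeholders,
not taken from any real trace.

```python
from collections import Counter


def frame_frequency(samples):
    """Count in how many samples each function name appears."""
    counts = Counter()
    for trace in samples:
        counts.update(set(trace))  # count a function once per sample
    return counts


def common_outer_frames(samples):
    """Return the frames shared by every sample, outermost first."""
    if not samples:
        return []
    shared = []
    # Walk all traces in lockstep, starting from the outermost frame.
    for frames in zip(*(reversed(trace) for trace in samples)):
        if len(set(frames)) != 1:
            break  # the traces diverge below this point
        shared.append(frames[0])
    return shared


# Hypothetical samples (placeholder names), innermost frame first.
samples = [
    ["helper_a", "reclaim_loop", "thread_entry"],
    ["helper_b", "reclaim_loop", "thread_entry"],
    ["reclaim_loop", "thread_entry"],
]

print("shared call path:", " -> ".join(common_outer_frames(samples)))
for name, seen in frame_frequency(samples).most_common():
    print(f"{name}: seen in {seen} of {len(samples)} samples")
```

The deepest frame that every sample shares is the likely location of the
loop; frames that appear in only some samples are the calls made from
inside it.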