From owner-freebsd-fs@FreeBSD.ORG Sun Apr 6 21:23:45 2008 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E78EB106566B for ; Sun, 6 Apr 2008 21:23:45 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.156]) by mx1.freebsd.org (Postfix) with ESMTP id 6029D8FC1F for ; Sun, 6 Apr 2008 21:23:44 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: by fg-out-1718.google.com with SMTP id 16so1281233fgg.35 for ; Sun, 06 Apr 2008 14:23:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition:x-google-sender-auth; bh=AurCNhkEpVTH8HbartThzq8u6lwvOgdFOtvvcA/Ep6A=; b=COnLq+fXrEYyZ66CMWKkyYe3zMvVLo3dsicdn414Gnvc46me7KbMjptXi4ZSfemVpWcKggziQUJ4S2UtF7V/kJcqct+J7jfsh1ufjkzQ/Qj+tHfs4CyM61w9/plz/yl2g60Q+dPkX0KW4bXOvx0+8dcTBdy2moY9PaLByn8bS98= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition:x-google-sender-auth; b=CqIV3BypsIzS5zhgk45hQ3i2iZ4RFcA4zTsIxKNLaoboA7P1k4yIXznSAF7HFFfrheSPxLiLJuX4SWS5ZuaraStqHkR1c2KlBrA5mCsV8pq9naEUj1HjxZEXUG8dN49D8i4+ku2D+XEYXNtSLVbRLuGslSPCFm8fGhQXII7K7cs= Received: by 10.86.94.11 with SMTP id r11mr2866468fgb.56.1207515279993; Sun, 06 Apr 2008 13:54:39 -0700 (PDT) Received: by 10.86.36.15 with HTTP; Sun, 6 Apr 2008 13:54:39 -0700 (PDT) Message-ID: <3bbf2fe10804061354n4a4ad211s96695e8ccef4f99@mail.gmail.com> Date: Sun, 6 Apr 2008 22:54:39 +0200 From: "Attilio Rao" Sender: asmrookie@gmail.com To: "FreeBSD Current" , fs@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Google-Sender-Auth: 
47325ec4e9e58fe2 Cc: Subject: [FOLLOW UP] lockmgr rewriting -- round 2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Apr 2008 21:23:46 -0000

As already mentioned in an earlier e-mail, an optimized version of lockmgr was committed to the stock CVS kernel less than half an hour ago. This was necessary because the patch needed wider coverage for testing and feedback, after a good "troubleshooting" period in which several bugs were fixed.

What does the new patch introduce in the KPI?
Together with a faster implementation, the patch provides the ability to use a rwlock as interlock, through the functions lockmgr_rw() and lockmgr_args_rw(). This feature (requested by jeff@) has been implemented through the generic lock layer in a generalized function (__lockmgr_args(), the real core function of the new lockmgr).

What is missing?
Currently, WITNESS support has been dropped, but it is supposed to be re-added very soon. The other surrounding support (lockmgr stack, ktr traces, showing vnodes, etc.) has been judged good enough for finding bugs so far that WITNESS support can be delayed for the moment. Also, draining can be subject to a race that was already present in the old version of lockmgr. Basically, if a drainer finds a queue of shared waiters and wakes them up, and later the first shared owner wakes up the drainer, the drainer and the other (eventual) shared owners will race for the lock acquisition. However, LK_DRAIN is not widely used and the current implementation will have this fixed asap (in the meantime, this issue is much better mitigated than in the old implementation).

Future plans?
In order to get the most out of the primitive, adaptive spinning support can be tried. Other wakeup mechanisms can also be explored.

Benchmarks?
Kris Kennaway has already done some rough benchmarks with the patch.
On some benchmarks (like mysql in write mode) he saw a big performance boost (on the order of 200%), while on others a smaller one (myisam improves by 10%). We expect, however, that many workloads will be improved by this work, so we look forward to your direct experiences from the battlefield.

This work was made possible by the generous sponsorship of Google through the Summer of Code (2007) program, by the FreeBSD developer community which picked up this project, and in particular by Jeff Roberson and Kris Kennaway, who found several bugs through reviews and testing. Peter Holm and Daniel Gerzo also tested the patch and confirmed the stability of the work on their particular workloads.

Any further comments and reports will be appreciated by the community.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-fs@FreeBSD.ORG Mon Apr 7 08:52:01 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 98F031065672 for ; Mon, 7 Apr 2008 08:52:01 +0000 (UTC) (envelope-from uspoerlein@gmail.com) Received: from acme.spoerlein.net (cl-43.dus-01.de.sixxs.net [IPv6:2a01:198:200:2a::2]) by mx1.freebsd.org (Postfix) with ESMTP id 2BD028FC1D for ; Mon, 7 Apr 2008 08:52:00 +0000 (UTC) (envelope-from uspoerlein@gmail.com) Received: from acme.spoerlein.net (localhost.spoerlein.net [127.0.0.1]) by acme.spoerlein.net (8.14.2/8.14.2) with ESMTP id m378pwwF030377 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 7 Apr 2008 10:51:58 +0200 (CEST) (envelope-from uspoerlein@gmail.com) Received: (from uqs@localhost) by acme.spoerlein.net (8.14.2/8.14.2/Submit) id m378puWe030376; Mon, 7 Apr 2008 10:51:56 +0200 (CEST) (envelope-from uspoerlein@gmail.com) Date: Mon, 7 Apr 2008 10:51:56 +0200 From: Ulrich Spoerlein To: Attila Nagy Message-ID: 
<20080407085156.GA30355@acme.spoerlein.net> References: <47F0D02B.8060504@fsn.hu> <20080331152251.62526181@peedub.jennejohn.org> <47F0EDD6.8060402@fsn.hu> <47F0F1E8.1080504@fsn.hu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47F0F1E8.1080504@fsn.hu> User-Agent: Mutt/1.5.17 (2007-11-01) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS hangs very often X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Apr 2008 08:52:01 -0000 On Mon, 31.03.2008 at 05:15:14 +0000, Attila Nagy wrote: > On 2008.03.31. 15:57, Attila Nagy wrote: > > My system completely locks up, I can't start new processes, but > > runnings ones -which don't do IO- can continue (for example a top). > > I don't know ZFS internals (BTW, /usr and others are of course > > different ZFS filesystems on the pool), but it might be, that > > something major gets locked and that's why it stops here. > > > > Anyways, if somebody can help to back this out, I'm here to try > > patches, or do experiments. > I forgot to tell -I don't know, maybe it's important-, that I have an > SMP box (but tried with UP kernel, the effect is the same) and > compression is enabled on every filesystems. Hi, I had frequent deadlocks as you just describe when using GMIRROR on SMP systems and running with PREEMPTION. Could you try a kernel without PREEMPTION? Or perhaps break the GMIRRORs for testing purposes? Cheers, Ulrich Spoerlein -- It is better to remain silent and be thought a fool, than to speak, and remove all doubt. 
From owner-freebsd-fs@FreeBSD.ORG Mon Apr 7 08:57:58 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 55DA71065671 for ; Mon, 7 Apr 2008 08:57:58 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 1D07C8FC14 for ; Mon, 7 Apr 2008 08:57:51 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from [172.16.129.140] (fw.axelero.hu [195.228.243.120]) by people.fsn.hu (Postfix) with ESMTP id 0E8C5AF88D; Mon, 7 Apr 2008 10:57:45 +0200 (CEST) Message-ID: <47F9E208.7050100@fsn.hu> Date: Mon, 07 Apr 2008 10:57:44 +0200 From: Attila Nagy User-Agent: Thunderbird 2.0.0.12 (Windows/20080213) MIME-Version: 1.0 To: Ulrich Spoerlein References: <47F0D02B.8060504@fsn.hu> <20080331152251.62526181@peedub.jennejohn.org> <47F0EDD6.8060402@fsn.hu> <47F0F1E8.1080504@fsn.hu> <20080407085156.GA30355@acme.spoerlein.net> In-Reply-To: <20080407085156.GA30355@acme.spoerlein.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: ZFS hangs very often X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Apr 2008 08:57:58 -0000 On 2008.04.07. 10:51, Ulrich Spoerlein wrote: > On Mon, 31.03.2008 at 05:15:14 +0000, Attila Nagy wrote: > >> On 2008.03.31. 15:57, Attila Nagy wrote: >> >>> My system completely locks up, I can't start new processes, but >>> runnings ones -which don't do IO- can continue (for example a top). >>> I don't know ZFS internals (BTW, /usr and others are of course >>> different ZFS filesystems on the pool), but it might be, that >>> something major gets locked and that's why it stops here. 
>>>
>>> Anyways, if somebody can help to back this out, I'm here to try
>>> patches, or do experiments.
>>>
>> I forgot to tell -I don't know, maybe it's important-, that I have an
>> SMP box (but tried with UP kernel, the effect is the same) and
>> compression is enabled on every filesystems.
>>
> I had frequent deadlocks as you just describe when using GMIRROR on SMP
> systems and running with PREEMPTION.
>
> Could you try a kernel without PREEMPTION? Or perhaps break the GMIRRORs
> for testing purposes?
>
I've had the same effect with UP, and besides swapping, I don't use the
gmirror-ed partitions much (they only hold swap and /; everything else is
on ZFS, even /usr, /tmp, /var and others). Also, when the machine stops,
processes get stuck in ZFS-related states.
But I will try (as far as I can remember, I already did, but I will re-check).

From owner-freebsd-fs@FreeBSD.ORG Mon Apr 7 11:06:58 2008 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6D9F81065670 for ; Mon, 7 Apr 2008 11:06:58 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 5DE5F8FC1C for ; Mon, 7 Apr 2008 11:06:58 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m37B6wfG048762 for ; Mon, 7 Apr 2008 11:06:58 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m37B6vb3048758 for freebsd-fs@FreeBSD.org; Mon, 7 Apr 2008 11:06:57 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 7 Apr 2008 11:06:57 GMT Message-Id: <200804071106.m37B6vb3048758@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to 
owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Apr 2008 11:06:58 -0000

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o bin/122172   fs         [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE

5 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o bin/118249   fs         mv(1): moving a directory changes its mtime

6 problems total.
From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 03:40:05 2008 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D8D01106566B for ; Tue, 8 Apr 2008 03:40:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C71A08FC2B for ; Tue, 8 Apr 2008 03:40:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m383e5nA038854 for ; Tue, 8 Apr 2008 03:40:05 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m383e5tX038853; Tue, 8 Apr 2008 03:40:05 GMT (envelope-from gnats) Date: Tue, 8 Apr 2008 03:40:05 GMT Message-Id: <200804080340.m383e5tX038853@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: John Hein Cc: Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: John Hein List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 03:40:05 -0000 The following reply was made to PR bin/122172; it has been noted by GNATS. From: John Hein To: bug-followup@FreeBSD.org Cc: Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6 Date: Mon, 7 Apr 2008 21:16:37 -0600 This doesn't help your problem directly, but we've been using amd with NIS maps and 6.3/i386 without any problems. What's your configuration? You might have to debug a little further to find out how fp gets set to NULL. You could also try the newer version of am-utils in ports just to see if it behaves differently. 
Have you tried searching back from your cvsup date to see when it stops seg faulting for you? From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 06:34:01 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 52EE11065670; Tue, 8 Apr 2008 06:34:01 +0000 (UTC) (envelope-from johan@headweb.com) Received: from core.stromnet.se (core.stromnet.se [83.218.84.131]) by mx1.freebsd.org (Postfix) with ESMTP id 0DE938FC12; Tue, 8 Apr 2008 06:34:00 +0000 (UTC) (envelope-from johan@headweb.com) Received: from localhost (core.stromnet.se [83.218.84.131]) by core.stromnet.se (Postfix) with ESMTP id 7AC6CD46405; Tue, 8 Apr 2008 08:17:42 +0200 (CEST) X-Virus-Scanned: amavisd-new at stromnet.se Received: from core.stromnet.se ([83.218.84.131]) by localhost (core.stromnet.se [83.218.84.135]) (amavisd-new, port 10024) with ESMTP id da9-Akidm75y; Tue, 8 Apr 2008 08:17:38 +0200 (CEST) Received: from johan-mp.stromnet.se (90-224-172-102-no129.tbcn.telia.com [90.224.172.102]) by core.stromnet.se (Postfix) with ESMTP id BE6AFD46412; Tue, 8 Apr 2008 08:17:38 +0200 (CEST) Message-Id: <0B67CBBD-11CB-44C2-807D-5F00654CDD35@headweb.com> From: =?ISO-8859-1?Q?Johan_Str=F6m?= To: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Apple Message framework v919.2) Date: Tue, 8 Apr 2008 08:17:38 +0200 X-Mailer: Apple Mail (2.919.2) Cc: Subject: ZFS deadlock X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 06:34:01 -0000 Hello A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 =20 disks, 3 mirrors) seems to have gotten stuck. 
From Ctrl-T:

load: 0.50 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.43 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.11 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k

Worked for a while then that stopped working too (was over ssh). When
trying a local login I only got

load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k

I found one post like this earlier (by Xin LI), but nobody seemed to
have replied...
in my current conf, I think my kmem/kmem_max is at 512Mb (not sure
though, since I've edited my file yesterday for next reboot), with 2G
of system RAM.. Normally I'd run kmem(max) 1G (with arcsize of 512M.
currently it is at default), but since I just got back to 2G total mem
after some hardware problems I've been running at those lows (1G total
is kind of tight with zfs..)

Well, just wanted to report... The box is not totally dead yet, i.e. I
can still do Ctrl-T on console, but that's it.. I don't really know
what more I can do so.. I don't have KDB/DDB.
I'll wait another hour or so before I hard reboot it, unless it
"unlocks" or if anyone has any suggestions.
Thanks -- Johan Str=F6m Stromnet johan@stromnet.se http://www.stromnet.se/ From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 07:32:00 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DBA5E106566C; Tue, 8 Apr 2008 07:32:00 +0000 (UTC) (envelope-from jdc@parodius.com) Received: from mx01.sc1.parodius.com (mx01.sc1.parodius.com [72.20.106.3]) by mx1.freebsd.org (Postfix) with ESMTP id C8C8D8FC1E; Tue, 8 Apr 2008 07:32:00 +0000 (UTC) (envelope-from jdc@parodius.com) Received: by mx01.sc1.parodius.com (Postfix, from userid 1000) id B11EB1CC033; Tue, 8 Apr 2008 00:32:00 -0700 (PDT) Date: Tue, 8 Apr 2008 00:32:00 -0700 From: Jeremy Chadwick To: Johan =?iso-8859-1?Q?Str=F6m?= Message-ID: <20080408073200.GA32128@eos.sc1.parodius.com> References: <0B67CBBD-11CB-44C2-807D-5F00654CDD35@headweb.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <0B67CBBD-11CB-44C2-807D-5F00654CDD35@headweb.com> User-Agent: Mutt/1.5.17 (2007-11-01) Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: ZFS deadlock X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 07:32:01 -0000 On Tue, Apr 08, 2008 at 08:17:38AM +0200, Johan Ström wrote: > Hello > > A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 disks, 3 > mirrors) seems to have gotten stuck. 
From Ctrl-T:
>
> load: 0.50 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u
> 0.04s 0% 3404k
> load: 0.43 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u
> 0.04s 0% 3404k
> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u
> 0.04s 0% 3404k
> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u
> 0.04s 0% 3404k
> load: 0.11 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u
> 0.04s 0% 3404k
>
> Worked for a while then that stopped working too (was over ssh). When
> trying a local login I only got
>
> load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k
>
> I found one post like this earlier (by Xin LI), but nobody seemed to have
> replied...
> in my current conf, I think my kmem/kmem_max is at 512Mb (not sure though,
> since I've edited my file yesterday for next reboot), with 2G of system
> RAM.. Normally I'd run kmem(max) 1G (with arcsize of 512M. currently it is
> at default), but since I just got back to 2G total mem after some hardware
> problems I've been running at those lows (1G total is kind of tight with
> zfs..)
>
> Well, just wanted to report... The box is not totally dead yet, i.e. I can
> still do Ctrl-T on console, but that's it.. I don't really know what more I
> can do so.. I don't have KDB/DDB.
> I'll wait another hour or so before I hard reboot it, unless it "unlocks"
> or if anyone has any suggestions.

I don't think there are any suggestions left to give. Many people,
including myself, have experienced this kind of problem. It's
well-documented both on my Common Issues page and on the official FreeBSD
ZFS Wiki.

ZFS is still considered highly experimental, so if your data is at all
important to you, perform backups or switch to another filesystem
provider.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 07:37:42 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0A61C106566C; Tue, 8 Apr 2008 07:37:42 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from tarsier.delphij.net (delphij-pt.tunnel.tserv2.fmt.ipv6.he.net [IPv6:2001:470:1f03:2c9::2]) by mx1.freebsd.org (Postfix) with ESMTP id 82BEE8FC27; Tue, 8 Apr 2008 07:37:41 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from tarsier.geekcn.org (tarsier.geekcn.org [202.108.54.204]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by tarsier.delphij.net (Postfix) with ESMTPS id 578A228448; Tue, 8 Apr 2008 15:37:40 +0800 (CST) Received: from localhost (tarsier.geekcn.org [202.108.54.204]) by tarsier.geekcn.org (Postfix) with ESMTP id A4B33EBBF39; Tue, 8 Apr 2008 15:37:39 +0800 (CST) X-Virus-Scanned: amavisd-new at geekcn.org Received: from tarsier.geekcn.org ([202.108.54.204]) by localhost (mail.geekcn.org [202.108.54.204]) (amavisd-new, port 10024) with ESMTP id RAmcJtc1xAgk; Tue, 8 Apr 2008 15:37:34 +0800 (CST) Received: from li-xins-macbook.lan (c-67-161-39-180.hsd1.ca.comcast.net [67.161.39.180]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by tarsier.geekcn.org (Postfix) with ESMTPSA id 884A8EBB100; Tue, 8 Apr 2008 15:37:31 +0800 (CST) DomainKey-Signature: a=rsa-sha1; s=default; d=delphij.net; c=nofws; q=dns; h=message-id:date:from:reply-to:organization:user-agent: mime-version:to:cc:subject:references:in-reply-to: x-enigmail-version:openpgp:content-type; b=AC1kOmRIDEHaq0JTBs1tNmRX8JUPzht3iX+A6qSySbyJP2uN5N1y3aobF9SIZ5u5G PPCP+X/eoPwIFXXneTaVw== Message-ID: <47FB20B5.8050205@delphij.net> Date: Tue, 08 Apr 2008 00:37:25 -0700 From: LI Xin Organization: The FreeBSD Project User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213) 
MIME-Version: 1.0 To: =?ISO-8859-1?Q?Johan_Str=F6m?= References: <0B67CBBD-11CB-44C2-807D-5F00654CDD35@headweb.com> In-Reply-To: <0B67CBBD-11CB-44C2-807D-5F00654CDD35@headweb.com> X-Enigmail-Version: 0.95.6 OpenPGP: url=http://www.delphij.net/delphij.asc Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="------------enigE795D5CFBD7AB26F932D8DB3" Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek , freebsd-stable@freebsd.org Subject: Re: ZFS deadlock X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: d@delphij.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 07:37:42 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigE795D5CFBD7AB26F932D8DB3 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: quoted-printable

Johan Ström wrote:
> Hello
>
> A box of mine running RELENG_7_0 and ZFS over a couple of disks (6
> disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:
>
> load: 0.50 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock]
> 0.02u 0.04s 0% 3404k
> load: 0.43 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock]
> 0.02u 0.04s 0% 3404k
> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock]
> 0.02u 0.04s 0% 3404k
> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock]
> 0.02u 0.04s 0% 3404k
> load: 0.11 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock]
> 0.02u 0.04s 0% 3404k
>
> Worked for a while then that stopped working too (was over ssh). When
> trying a local login I only got
>
> load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k
>
> I found one post like this earlier (by Xin LI), but nobody seemed to
> have replied...
> in my current conf, I think my kmem/kmem_max is at 512Mb (not sure
> though, since I've edited my file yesterday for next reboot), with 2G of
> system RAM.. Normally I'd run kmem(max) 1G (with arcsize of 512M.
> currently it is at default), but since I just got back to 2G total mem
> after some hardware problems I've been running at those lows (1G total is
> kind of tight with zfs..)
>
> Well, just wanted to report... The box is not totally dead yet, i.e. I can
> still do Ctrl-T on console, but that's it.. I don't really know what more
> I can do so.. I don't have KDB/DDB.
> I'll wait another hour or so before I hard reboot it, unless it
> "unlocks" or if anyone has any suggestions.

The key is to increase your kmem and prevent it from being exhausted. I
think the more recent OpenSolaris ZFS code has some improvements, but I do
not have spare devices at hand to test and debug :(

Maybe pjd@ would get a new import at some point? I have cc'ed him.

Cheers,
--
Xin LI http://www.delphij.net/
FreeBSD - The Power to Serve!
--------------enigE795D5CFBD7AB26F932D8DB3 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFH+yC1OfuToMruuMARCqN0AKCIKKc84mc47mc70QEHXgI3cbIzlACfclIE OCVHk4KNeYm7i6JdbM+7dkI= =yO3F -----END PGP SIGNATURE----- --------------enigE795D5CFBD7AB26F932D8DB3-- From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 07:38:02 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 49E171065671; Tue, 8 Apr 2008 07:38:02 +0000 (UTC) (envelope-from johan@headweb.com) Received: from core.stromnet.se (core.stromnet.se [83.218.84.131]) by mx1.freebsd.org (Postfix) with ESMTP id E00BA8FC21; Tue, 8 Apr 2008 07:38:01 +0000 (UTC) (envelope-from johan@headweb.com) Received: from localhost (core.stromnet.se [83.218.84.131]) by core.stromnet.se (Postfix) with ESMTP id 68A90D4640C; Tue, 8 Apr 2008 09:38:00 +0200 (CEST) X-Virus-Scanned: amavisd-new at stromnet.se Received: from core.stromnet.se ([83.218.84.131]) by localhost (core.stromnet.se [83.218.84.135]) (amavisd-new, port 10024) with ESMTP id Xia7q9CnVQxy; Tue, 8 Apr 2008 09:37:58 +0200 (CEST) Received: from johan-mp.stromnet.se (90-224-172-102-no129.tbcn.telia.com [90.224.172.102]) by core.stromnet.se (Postfix) with ESMTP id 00D5CD4640F; Tue, 8 Apr 2008 09:37:57 +0200 (CEST) Message-Id: From: =?ISO-8859-1?Q?Johan_Str=F6m?= To: Jeremy Chadwick In-Reply-To: <20080408073200.GA32128@eos.sc1.parodius.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Apple Message framework v919.2) Date: Tue, 8 Apr 2008 09:37:57 +0200 References: <0B67CBBD-11CB-44C2-807D-5F00654CDD35@headweb.com> 
<20080408073200.GA32128@eos.sc1.parodius.com> X-Mailer: Apple Mail (2.919.2) Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: ZFS deadlock X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 07:38:02 -0000

On Apr 8, 2008, at 9:32 AM, Jeremy Chadwick wrote:

> On Tue, Apr 08, 2008 at 08:17:38AM +0200, Johan Ström wrote:
>> Hello
>>
>> A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 disks, 3
>> mirrors) seems to have gotten stuck. From Ctrl-T:
>>
>> load: 0.50 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u
>> 0.04s 0% 3404k
>> load: 0.43 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u
>> 0.04s 0% 3404k
>> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u
>> 0.04s 0% 3404k
>> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u
>> 0.04s 0% 3404k
>> load: 0.11 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u
>> 0.04s 0% 3404k
>>
>> Worked for a while then that stopped working too (was over ssh). When
>> trying a local login I only got
>>
>> load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k
>>
>> I found one post like this earlier (by Xin LI), but nobody seemed to have
>> replied...
>> in my current conf, I think my kmem/kmem_max is at 512Mb (not sure though,
>> since I've edited my file yesterday for next reboot), with 2G of system
>> RAM.. Normally I'd run kmem(max) 1G (with arcsize of 512M. currently it is
>> at default), but since I just got back to 2G total mem after some hardware
>> problems I've been running at those lows (1G total is kind of tight with
>> zfs..)
>>
>> Well, just wanted to report...
The box is not totally dead yet, i.e. I can
>> still do Ctrl-T on console, but that's it.. I don't really know what more I
>> can do so.. I don't have KDB/DDB.
>> I'll wait another hour or so before I hard reboot it, unless it "unlocks"
>> or if anyone has any suggestions.
>
> I don't think there are any suggestions left to give. Many people,
> including myself, have experienced this kind of problem. It's
> well-documented both on my Common Issues page, and the official FreeBSD ZFS
> Wiki.

Ah.. I guess I was just too restrictive with the googling on
"zfs:&buf_hash_table.ht_locks[i].ht_lock".

>
>
> ZFS is still considered highly experimental, so if your data is at all
> important to you, perform backups or switch to another filesystem
> provider.

That I am aware of.

Thanks.

From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 07:40:20 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 33EE41065688; Tue, 8 Apr 2008 07:40:20 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from tarsier.delphij.net (delphij-pt.tunnel.tserv2.fmt.ipv6.he.net [IPv6:2001:470:1f03:2c9::2]) by mx1.freebsd.org (Postfix) with ESMTP id C2F428FC16; Tue, 8 Apr 2008 07:40:19 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from tarsier.geekcn.org (tarsier.geekcn.org [202.108.54.204]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by tarsier.delphij.net (Postfix) with ESMTPS id AC78228448; Tue, 8 Apr 2008 15:40:18 +0800 (CST) Received: from localhost (tarsier.geekcn.org [202.108.54.204]) by tarsier.geekcn.org (Postfix) with ESMTP id 78D33EBBF3C; Tue, 8 Apr 2008 15:40:18 +0800 (CST) X-Virus-Scanned: amavisd-new at geekcn.org Received: from tarsier.geekcn.org ([202.108.54.204]) by localhost (mail.geekcn.org [202.108.54.204]) (amavisd-new, port 10024) with ESMTP id WR+e4fmOZzmE; Tue, 8 Apr 2008 
15:40:13 +0800 (CST) Received: from li-xins-macbook.lan (c-67-161-39-180.hsd1.ca.comcast.net [67.161.39.180]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by tarsier.geekcn.org (Postfix) with ESMTPSA id 18216EBBF44; Tue, 8 Apr 2008 15:40:11 +0800 (CST) DomainKey-Signature: a=rsa-sha1; s=default; d=delphij.net; c=nofws; q=dns; h=message-id:date:from:reply-to:organization:user-agent: mime-version:to:cc:subject:references:in-reply-to: x-enigmail-version:openpgp:content-type; b=aTZySrHHg/Pm9Z8vL6h3eX2k2e83FEnDldHrx+z0ImHgLXyljoGBLsu6P1d/uB7jQ Q4saVG4APpfrbOO7pRrOQ== Message-ID: <47FB2155.1030106@delphij.net> Date: Tue, 08 Apr 2008 00:40:05 -0700 From: LI Xin Organization: The FreeBSD Project User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213) MIME-Version: 1.0 To: =?ISO-8859-1?Q?Johan_Str=F6m?= References: <0B67CBBD-11CB-44C2-807D-5F00654CDD35@headweb.com> In-Reply-To: <0B67CBBD-11CB-44C2-807D-5F00654CDD35@headweb.com> X-Enigmail-Version: 0.95.6 OpenPGP: url=http://www.delphij.net/delphij.asc Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="------------enigC768FF74A1AA2FF26112D23A" Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: ZFS deadlock X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: d@delphij.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 07:40:20 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigC768FF74A1AA2FF26112D23A Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: quoted-printable For your question: just reboot would be fine, you may want to tune your arc size (to be smaller) and kmem space (to be larger), which would reduce the chance that this would happen, or eliminate it, depending on your workload. 
This situation is not recoverable and you can trust ZFS that you will not lose data if they are already sync'ed. -- Xin LI http://www.delphij.net/ FreeBSD - The Power to Serve! --------------enigC768FF74A1AA2FF26112D23A Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFH+yFVOfuToMruuMARCsn6AJ9+gLwO6qE1EMh88KrHzoTPUqfLWwCeP7cJ AGlkPJ5DNkNw172KJ/bapKs= =uROd -----END PGP SIGNATURE----- --------------enigC768FF74A1AA2FF26112D23A-- From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 07:42:21 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4A92E106566B; Tue, 8 Apr 2008 07:42:21 +0000 (UTC) (envelope-from johan@headweb.com) Received: from core.stromnet.se (core.stromnet.se [83.218.84.131]) by mx1.freebsd.org (Postfix) with ESMTP id 02CAA8FC27; Tue, 8 Apr 2008 07:42:20 +0000 (UTC) (envelope-from johan@headweb.com) Received: from localhost (core.stromnet.se [83.218.84.131]) by core.stromnet.se (Postfix) with ESMTP id 1D0A2D46414; Tue, 8 Apr 2008 09:42:20 +0200 (CEST) X-Virus-Scanned: amavisd-new at stromnet.se Received: from core.stromnet.se ([83.218.84.131]) by localhost (core.stromnet.se [83.218.84.135]) (amavisd-new, port 10024) with ESMTP id aHQhOZyssf6f; Tue, 8 Apr 2008 09:42:16 +0200 (CEST) Received: from johan-mp.stromnet.se (90-224-172-102-no129.tbcn.telia.com [90.224.172.102]) by core.stromnet.se (Postfix) with ESMTP id 4B0F8D46418; Tue, 8 Apr 2008 09:42:16 +0200 (CEST) Message-Id: From: =?ISO-8859-1?Q?Johan_Str=F6m?= To: d@delphij.net In-Reply-To: <47FB20B5.8050205@delphij.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes Content-Transfer-Encoding: quoted-printable 
Mime-Version: 1.0 (Apple Message framework v919.2) Date: Tue, 8 Apr 2008 09:42:15 +0200 References: <0B67CBBD-11CB-44C2-807D-5F00654CDD35@headweb.com> <47FB20B5.8050205@delphij.net> X-Mailer: Apple Mail (2.919.2) Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: ZFS deadlock X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 07:42:21 -0000 On Apr 8, 2008, at 9:37 AM, LI Xin wrote: > Johan Ström wrote: >> Hello >> A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T: >> load: 0.50 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k >> load: 0.43 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k >> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k >> load: 0.10 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k >> load: 0.11 cmd: zsh 40188 [zfs:&buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k >> Worked for a while then that stopped working too (was over ssh). When trying a local login I only got >> load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k >> I found one post like this earlier (by Xin LI), but nobody seemed to have replied... >> In my current conf, I think my kmem/kmem_max is at 512MB (not sure though, since I edited my file yesterday for next reboot), with 2G of system RAM. Normally I'd run kmem(max) 1G (with arcsize of 512M; currently it is at default), but since I just got back to 2G total mem after some hardware problems I've been running at those lows (1G total is kind of tight with zfs..) >> Well, just wanted to report... 
The box is not totally dead yet, i.e. I can still do Ctrl-T on console, but that's it.. I don't really know what more I can do, so.. I don't have KDB/DDB. >> I'll wait another hour or so before I hard reboot it, unless it "unlocks" or if anyone has any suggestions. > > The key is to increase your kmem and prevent it from being exhausted. I think more recent OpenSolaris's ZFS code has some improvements but I do not have spare devices at hand to test and debug :( Yep, never had the problem when I was running with 2G total mem, but then one stick (damn consumer crap) failed and I was left with 1G, and I started to have random problems. Going to tune kmem back up now that I have more mem again, and thinking about putting in 4G too.. > > > Maybe pjd@ would get a new import at some point? I have cc'ed him. > > Cheers, > -- > Xin LI http://www.delphij.net/ > FreeBSD - The Power to Serve! > From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 07:55:27 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A3740106564A; Tue, 8 Apr 2008 07:55:27 +0000 (UTC) (envelope-from johan@headweb.com) Received: from core.stromnet.se (core.stromnet.se [83.218.84.131]) by mx1.freebsd.org (Postfix) with ESMTP id 5CC908FC29; Tue, 8 Apr 2008 07:55:27 +0000 (UTC) (envelope-from johan@headweb.com) Received: from localhost (core.stromnet.se [83.218.84.131]) by core.stromnet.se (Postfix) with ESMTP id 9A0CFD4640C; Tue, 8 Apr 2008 09:55:26 +0200 (CEST) X-Virus-Scanned: amavisd-new at stromnet.se Received: from core.stromnet.se ([83.218.84.131]) by localhost (core.stromnet.se [83.218.84.135]) (amavisd-new, port 10024) with ESMTP id IawDjCJo1AZA; Tue, 8 Apr 2008 09:55:22 +0200 (CEST) Received: from johan-mp.stromnet.se (90-224-172-102-no129.tbcn.telia.com [90.224.172.102]) by core.stromnet.se (Postfix) with ESMTP id CB8A4D46405; Tue, 8 Apr 
2008 09:55:22 +0200 (CEST) Message-Id: <3886278B-F65A-44BD-8307-C9889727FEA3@headweb.com> From: =?ISO-8859-1?Q?Johan_Str=F6m?= To: d@delphij.net In-Reply-To: <47FB2155.1030106@delphij.net> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v919.2) Date: Tue, 8 Apr 2008 09:55:22 +0200 References: <0B67CBBD-11CB-44C2-807D-5F00654CDD35@headweb.com> <47FB2155.1030106@delphij.net> X-Mailer: Apple Mail (2.919.2) Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: ZFS deadlock X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 07:55:27 -0000 On Apr 8, 2008, at 9:40 AM, LI Xin wrote: > For your question: just reboot would be fine, you may want to tune > your arc size (to be smaller) and kmem space (to be larger), which > would reduce the chance that this would happen, or eliminate it, > depending on your workload. Back online now, with kmem/kmem_max to 1G and arcsize to 512M. Are those reasonable on a 2G machine? I think I've read that from somewhere, but cannot find that (arc at least) in the TuningGuide now. > > This situation is not recoverable and you can trust ZFS that you > will not lose data if they are already sync'ed. > Actually, I've had a lot of hard crashes lately on this machine (bad hw) but not a single time I have lost data (to my knowledge at least...). In that regard, comparing to UFS, ZFS is waaay better! :) > -- > Xin LI http://www.delphij.net/ > FreeBSD - The Power to Serve! 
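For reference, the settings described in this message (kmem/kmem_max at 1G, ARC capped at 512M) would look roughly like the following in /boot/loader.conf. This is only a sketch: the thread never shows the actual file for this machine, the tunable names are the ones commonly used for ZFS on FreeBSD 7 and should be checked against the running system, and the values are simply the ones mentioned above rather than a recommendation.

```
# /boot/loader.conf (sketch; values as described in the message above)
# Size of the kernel memory map and its upper bound.
vm.kmem_size="1024M"
vm.kmem_size_max="1024M"
# Cap on the ZFS ARC size.
vfs.zfs.arc_max="512M"
```

Loader tunables are read at boot, so a change like this only takes effect after a reboot.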
> From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 14:30:22 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 69A6910656C4 for ; Tue, 8 Apr 2008 14:30:22 +0000 (UTC) (envelope-from ender@enderzone.com) Received: from www.ksdhost.com (www.ksdhost.com [75.126.66.82]) by mx1.freebsd.org (Postfix) with ESMTP id 1AF198FC13 for ; Tue, 8 Apr 2008 14:30:22 +0000 (UTC) (envelope-from ender@enderzone.com) Received: (qmail 25229 invoked from network); 8 Apr 2008 10:30:21 -0400 Received: from unknown (HELO ?192.168.1.6?) (206.48.228.163) by www.ksdhost.com with SMTP; 8 Apr 2008 10:30:21 -0400 Message-ID: <47FB7355.4060802@enderzone.com> Date: Tue, 08 Apr 2008 09:29:57 -0400 From: Ender User-Agent: Thunderbird 2.0.0.12 (Windows/20080213) MIME-Version: 1.0 To: =?ISO-8859-1?Q?Johan_Str=F6m?= References: <0B67CBBD-11CB-44C2-807D-5F00654CDD35@headweb.com> <47FB2155.1030106@delphij.net> <3886278B-F65A-44BD-8307-C9889727FEA3@headweb.com> In-Reply-To: <3886278B-F65A-44BD-8307-C9889727FEA3@headweb.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: ZFS deadlock X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 14:30:22 -0000 Johan Ström wrote: > On Apr 8, 2008, at 9:40 AM, LI Xin wrote: > >> For your question: just reboot would be fine, you may want to tune >> your arc size (to be smaller) and kmem space (to be larger), which >> would reduce the chance that this would happen, or eliminate it, >> depending on your workload. > > Back online now, with kmem/kmem_max to 1G and arcsize to 512M. Are > those reasonable on a 2G machine? 
I think I've read that from > somewhere, but cannot find that (arc at least) in the TuningGuide now. > Depending on your work load you are just buying more time, so "reasonable" is a matter of perspective. :( I didn't see if you said you are on 32bit or 64bit? Keep in mind the kmem max is 1.5-2G on amd64 regardless of how much memory you have. If 512M arcsize crashes too soon for your tastes you can always lower it down to 256M, or 128M, etc. From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 16:26:30 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3E3A91065671 for ; Tue, 8 Apr 2008 16:26:30 +0000 (UTC) (envelope-from ender@enderzone.com) Received: from www.ksdhost.com (www.ksdhost.com [75.126.66.82]) by mx1.freebsd.org (Postfix) with ESMTP id E426E8FC1F for ; Tue, 8 Apr 2008 16:26:29 +0000 (UTC) (envelope-from ender@enderzone.com) Received: (qmail 34570 invoked from network); 8 Apr 2008 12:26:29 -0400 Received: from unknown (HELO ?192.168.1.6?) 
(206.48.228.163) by www.ksdhost.com with SMTP; 8 Apr 2008 12:26:29 -0400 Message-ID: <47FB8E8D.1030801@enderzone.com> Date: Tue, 08 Apr 2008 11:26:05 -0400 From: Ender User-Agent: Thunderbird 2.0.0.12 (Windows/20080213) MIME-Version: 1.0 To: Spike Ilacqua References: <0B67CBBD-11CB-44C2-807D-5F00654CDD35@headweb.com> <47FB2155.1030106@delphij.net> <3886278B-F65A-44BD-8307-C9889727FEA3@headweb.com> <47FB7355.4060802@enderzone.com> <47FB99AC.7080504@indra.com> In-Reply-To: <47FB99AC.7080504@indra.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: ZFS deadlock X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 16:26:30 -0000 Spike Ilacqua wrote: >> Depending on your work load you are just buying more time, so >> "reasonable" is a matter of perspective. :( I didn't see if you said >> you are on 32bit or 64bit? Keep in mind the kmem max is 1.5-2G on >> amd64 regardless of how much memory you have. If 512M arcsize crashes >> too soon for your tastes you can always lower it down to 256M, or >> 128M, etc. > > I tried for several weeks to get ZFS stable on a 64bit system with a > 1.5G kernel. The best uptime I ever got was 72 hours, the worst was > 2, the average about 24. Interestingly, most of the hangs were at off > hours, when the system was lightly loaded, had lots of free memory, > etc. That suggests to me a slow leak of some sort. > > Anyway, ZFS is not ready for production. Some people may get lucky, > but you can't count on it. > > Spike Very interesting. With 1.5G of kmem and a 64M arc_max the best uptime I had was 5 days, the worst 1 day. Also, most of my crashes happen at off hours as well. 
Another tidbit of information running things out of /tank instead of /tank/foo/bar/foo seems to lead to longer uptime, you might want to try that as well. From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 16:39:49 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0389A1065681; Tue, 8 Apr 2008 16:39:49 +0000 (UTC) (envelope-from spike@indra.com) Received: from smtp.indra.com (smtp.indra.com [209.169.0.20]) by mx1.freebsd.org (Postfix) with ESMTP id 45EA48FC2D; Tue, 8 Apr 2008 16:39:48 +0000 (UTC) (envelope-from spike@indra.com) Received: from coke.indra.com (coke.indra.com [209.169.23.199]) (authenticated bits=0) by smtp.indra.com (8.13.8/8.13.8) with ESMTP id m38GK4Am060045 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Tue, 8 Apr 2008 10:20:04 -0600 (MDT) (envelope-from spike@indra.com) Message-ID: <47FB99AC.7080504@indra.com> Date: Tue, 08 Apr 2008 10:13:32 -0600 From: Spike Ilacqua User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213) MIME-Version: 1.0 To: Ender References: <0B67CBBD-11CB-44C2-807D-5F00654CDD35@headweb.com> <47FB2155.1030106@delphij.net> <3886278B-F65A-44BD-8307-C9889727FEA3@headweb.com> <47FB7355.4060802@enderzone.com> In-Reply-To: <47FB7355.4060802@enderzone.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org, =?ISO-8859-1?Q?Johan_Str=F6m?= Subject: Re: ZFS deadlock X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 16:39:49 -0000 > Depending on your work load you are just buying more time, so > "reasonable" is a matter of perspective. :( I didn't see if you said > you are on 32bit or 64bit? 
Keep in mind the kmem max is 1.5-2G on amd64 > regardless of how much memory you have. If 512M arcsize crashes too soon > for your tastes you can always lower it down to 256M, or 128M, etc. I tried for several weeks to get ZFS stable on a 64bit system with a 1.5G kernel. The best uptime I ever got was 72 hours, the worst was 2, the average about 24. Interestingly, most of the hangs were at off hours, when the system was lightly loaded, had lots of free memory, etc. That suggests to me a slow leak of some sort. Anyway, ZFS is not ready for production. Some people may get lucky, but you can't count on it. Spike From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 17:35:27 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 78C4B106564A for ; Tue, 8 Apr 2008 17:35:27 +0000 (UTC) (envelope-from peter@pean.org) Received: from proxy1.bredband.net (proxy1.bredband.net [195.54.101.71]) by mx1.freebsd.org (Postfix) with ESMTP id 34C188FC25 for ; Tue, 8 Apr 2008 17:35:26 +0000 (UTC) (envelope-from peter@pean.org) Received: from ironport.bredband.com (195.54.101.120) by proxy1.bredband.net (7.3.127) id 47E10B8B005FAF77 for freebsd-fs@freebsd.org; Tue, 8 Apr 2008 19:15:21 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ah9KAC9F+0dV4WMOPGdsb2JhbACBXI9jAQEBATCaQA Received: from c-0e63e155.166-7-64736c14.cust.bredbandsbolaget.se (HELO pi.pean.org) ([85.225.99.14]) by ironport1.bredband.com with ESMTP; 08 Apr 2008 19:15:21 +0200 Message-Id: <68CDE3E3-DB02-470B-9BDB-81DB01A431F3@pean.org> From: =?ISO-8859-1?Q?Peter_Ankerst=E5l?= To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v919.2) Date: Tue, 8 Apr 2008 19:15:20 +0200 X-Mailer: Apple Mail (2.919.2) Subject: ZFS eats up 3GB of ram then the machine starts dropping connections 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 17:35:27 -0000 Hi, I have a machine with a 1.3T pool, hosting about 10 jails. When the periodic daily was running, ZFS began to eat memory, up to 3GB (out of the installed 4GB): something like 200MB Inuse, 3GB Wired and just a few MB Free. After that the machine got really laggy and began dropping connections. httpd was serving half web pages and sshd was shutting down connections with error messages like "Invalid package size" or "Invalid MAC". I tried to look it up and found that vfs.zfs.prefetch_disable="1" could help, but the problem remains. It seems like ZFS takes all the memory it can possibly get and then refuses to give it back. I've tried different approaches to replicate this problem, and something like ls -R / does exactly the same thing. It starts eating shitloads of RAM, and when I stop it, it also stops eating RAM, but the amount of Wired RAM stays there. It doesn't go up (or down) until you do another ls or find. 
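One way to watch the behaviour described above while ls -R / is running is to poll a few counters from a second terminal. A minimal sh sketch, assuming the sysctl names vm.stats.vm.v_wire_count, hw.pagesize and kstat.zfs.misc.arcstats.size are present on the system (they are not taken from this report, so check with sysctl -a first); the function names are made up for illustration:

```sh
#!/bin/sh
# Convert a page count ($1) and a page size in bytes ($2) to whole megabytes.
pages_to_mb() {
    echo $(( $1 * $2 / 1048576 ))
}

# Print wired memory and ARC size in MB.
# FreeBSD only; the sysctl names below are assumptions, not from the report.
show_wired() {
    pages=$(sysctl -n vm.stats.vm.v_wire_count)
    psize=$(sysctl -n hw.pagesize)
    arc=$(sysctl -n kstat.zfs.misc.arcstats.size)
    echo "wired: $(pages_to_mb "$pages" "$psize") MB, arc: $(( arc / 1048576 )) MB"
}
```

Calling show_wired in a loop (for example: while :; do show_wired; sleep 5; done) would show whether the Wired figure tracks the ARC size and whether either ever shrinks once the ls has stopped.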
What about the machine then:

# uname -a
FreeBSD ninja.jails.se 7.0-RELEASE FreeBSD 7.0-RELEASE #0: Sun Feb 24 10:35:36 UTC 2008 root@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64

# cat /boot/loader.conf
vfs.zfs.prefetch_disable="1"
vm.kmem_size_max="1073741824"
vm.kmem_size="1073741824"

# cat /etc/sysctl.conf
kern.maxvnodes=400000

Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (2401.93-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x6fb Stepping = 11
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd
  AMD Features=0x20100800
  AMD Features2=0x1
  Cores per package: 4
usable memory = 3744874496 (3571 MB)
avail memory = 3590160384 (3423 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3

From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 18:00:03 2008 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8CC401065678 for ; Tue, 8 Apr 2008 18:00:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 7AD2A8FC13 for ; Tue, 8 Apr 2008 18:00:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m38I03Fb024425 for ; Tue, 8 Apr 2008 18:00:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m38I03rB024424; Tue, 8 Apr 2008 18:00:03 GMT (envelope-from gnats) Date: Tue, 8 Apr 2008 18:00:03 GMT Message-Id: <200804081800.m38I03rB024424@freefall.freebsd.org> To: 
freebsd-fs@FreeBSD.org From: John Hein Cc: Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: John Hein List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 18:00:03 -0000 The following reply was made to PR bin/122172; it has been noted by GNATS. From: John Hein To: Lee Damon Cc: bug-followup@FreeBSD.org Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6 Date: Tue, 8 Apr 2008 11:52:18 -0600 Lee Damon wrote at 09:40 -0700 on Apr 8, 2008: > John Hein wrote: > > This doesn't help your problem directly, but we've been using amd with > > NIS maps and 6.3/i386 without any problems. What's your configuration? > > The maps are flat files but we use LDAP. > > > You could also try the newer version of am-utils in ports just > > to see if it behaves differently. > > thanks for the hints. Sadly the version in the ports tree died the same > horrible death. You should put that information in the PR (CC restored). > > Have you tried searching back from your cvsup date to see when > > it stops seg faulting for you? > > These are production machines, I can't take them down for the time it > would take to do that :( Unfortunately, all I have are debugging suggestions... - Bring up a non-production machine to play with. - Bring up a virtual machine or jail to play with. - Start with a bare bones amd config (e.g., without anything but the default maps & .conf files). If there's no core dump, then add back parts of your config until it dies. - Compile amd with debug on and turn up the debug level to see if you get any hints. - Trace deeper into the code to find the source of the null ptr. - Try asking on the am-utils mailing list. 
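The "compile amd with debug on and turn up the debug level" suggestion above might look something like this in amd.conf. This is a sketch under assumptions: it relies on am-utils having been built with debugging enabled, the log path is a placeholder, and the option names are given from memory of amd.conf(5), so they should be checked against the installed man page before use.

```
[global]
# Placeholder path; any file amd can write to will do.
log_file = /var/log/amd.log
# Log every message class amd knows about.
log_options = all
# Only honoured when amd was built with debugging support.
debug_options = all
```

Combined with the bare bones config suggested in the same list, the log produced this way may narrow down which map or option triggers the crash.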
From owner-freebsd-fs@FreeBSD.ORG Tue Apr 8 18:50:03 2008 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BD9221065671 for ; Tue, 8 Apr 2008 18:50:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id AC7D08FC23 for ; Tue, 8 Apr 2008 18:50:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m38Io3xR028487 for ; Tue, 8 Apr 2008 18:50:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m38Io3GO028486; Tue, 8 Apr 2008 18:50:03 GMT (envelope-from gnats) Date: Tue, 8 Apr 2008 18:50:03 GMT Message-Id: <200804081850.m38Io3GO028486@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Lee Damon Cc: Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Lee Damon List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2008 18:50:03 -0000 The following reply was made to PR bin/122172; it has been noted by GNATS. From: Lee Damon To: bug-followup@FreeBSD.org, nomad@crow.ee.washington.edu Cc: Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6 Date: Tue, 08 Apr 2008 10:59:27 -0700 > You could also try the newer version of am-utils in ports just > to see if it behaves differently. Just tried, same failure (exited with signal 10). Corefile & binary are available if you want them but the port compile defaulted to no debugging and I forgot to turn it on so there's not a lot of information there. 
Since these are both production machines and amd crashing requires the host to reboot I can't easily test again. nomad