From owner-freebsd-fs@FreeBSD.ORG Wed Sep 22 10:38:16 2010
Message-ID: <4C99DC90.70208@fsn.hu>
Date: Wed, 22 Sep 2010 12:38:08 +0200
From: Attila Nagy
To: freebsd-fs@FreeBSD.org
Subject: zcolli (zcollide) state, what does znode dying means?

Hello,

I have a machine running a very recent 8-STABLE that is heavily hammered with file system operations. Everything works fine for a few minutes, then a lot of processes get into the zcolli state (according to top). At that point there are two possible outcomes:

1. The disks calm down for a while (for long seconds there is no IO, or only a very small amount, verified with gstat). top shows nearly 100% system time and a lot of processes are on the run queue (the load is sky-high, between 300 and 1000). All operations stop: top still refreshes, but I can't really execute new programs. Then the zcolli states suddenly clear, IO resumes and the run queue shrinks.

2. The system remains in this state. After 5-10 minutes there is still no change and only a reset helps. It doesn't even react to CTRL-ALT-DEL; running programs such as top still refresh, but no disk IO can be done.

The zcollide state only appears here:
http://fxr.watson.org/fxr/source/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c#L915
and the code there says it is due to a dying znode.

My question is: what does a dying znode mean? I don't think it's related to the on-disk structure, because the disks seem healthy and respond quickly (or at least evenly slowly, due to the load; I can't see any disk with read errors or slow responses).

Having slowdowns due to this is bad, but having lockups is a lot worse...

Thanks,