From: Matthias Gamsjager
Date: Wed, 3 Feb 2010 11:39:11 +0100
To: Attila Nagy
Cc: freebsd-fs@freebsd.org
Subject: Re: Machine stops for some seconds with ZFS

What's the point in having a cache device that is slower than the hard
disks themselves? Could you please try the build without the slow cache
device?

On Wed, Feb 3, 2010 at 10:48 AM, Attila Nagy wrote:
> Hello,
>
> After a long time, I've switched back to ZFS on my desktop. It runs
> 8-STABLE/amd64 with two SATA disks and a USB pendrive.
> One partition from each disk is used for the zpool, which is encrypted
> using GELI, and the pendrive is there for L2ARC:
>   NAME            STATE     READ WRITE CKSUM
>   data            ONLINE       0     0     0
>     mirror        ONLINE       0     0     0
>       ad0s1d.eli  ONLINE       0     0     0
>       ad1s1d.eli  ONLINE       0     0     0
>   cache
>     da0           ONLINE       0     0     0
>
> Today, after 12 days of uptime the machine has frozen. I could ping it from
> a different machine, even could open a telnet to its ssh port, but I
> couldn't get the ssh banner.
>
> Now I'm building a 9-CURRENT kernel and world to see whether the same
> problem persists with that, and during the make process I've noticed a
> strange thing.
> I build with -j4 (the machine has one dual core CPU), so the fans are
> screaming during the process. But every few minutes (I couldn't recognize
> any patterns in it) the machine goes completely silent (even more silent
> than normally), and everything halts.
> During this, the top running on the machine can refresh itself, and I can
> type on pass-through ssh connections (that is, I use the machine in question
> to access other machines with ssh), but I can't open new ssh connections to
> it, and can't start anything new (for example from an open shell).
> ping is running seamlessly during this, and top shows the following:
>
> last pid: 36503;  load averages:  1.59,  3.04,  3.01    up 0+00:49:53  10:32:10
> 97 processes:  1 running, 96 sleeping
> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 218M Active, 24M Inact, 639M Wired, 40M Cache, 6208K Buf, 1022M Free
> Swap: 4096M Total, 4096M Free
>
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>  1342 root          1  44    0  3204K   620K select  0   0:02  0.00% make
>  1424 root          1  44    0  3204K  1036K select  0   0:01  0.00% make
>  1280 root          1  44    0 12540K  1900K select  0   0:01  0.00% hald-addon-storage
>  1234 haldaemon     1  44    0 24116K  4464K select  0   0:01  0.00% hald
> 93600 root          1  44    0  3204K  1028K select  0   0:00  0.00% make
>  1260 root          1  44    0 19704K  2688K select  0   0:00  0.00% hald-addon-mouse-sy
> 15142 bra           1  44    0  9332K  2864K CPU0    0   0:00  0.00% top
>  1263 root          1  44    0 12540K  1896K cgticb  0   0:00  0.00% hald-addon-storage
> 94415 bra           1  44    0 37944K  4992K select  1   0:00  0.00% sshd
> 35837 root          1  44    0  5252K  2424K select  1   0:00  0.00% make
> 95361 bra           1  44    0 37944K  4992K select  1   0:00  0.00% sshd
> 35973 root          1  44    0  3204K  1772K select  0   0:00  0.00% make
>   608 root          1  44    0  6892K  1436K select  1   0:00  0.00% syslogd
> 96928 root          1  44    0  3204K   728K select  0   0:00  0.00% make
> 94369 root          1  51    0 37944K  4584K sbwait  0   0:00  0.00% sshd
> 82631 root          1  50    0 37944K  4584K sbwait  0   0:00  0.00% sshd
> 16304 root          1  44    0 37944K  4576K zio->i  1   0:00  0.00% sshd
>   951 _ntp          1  44    0  6876K  1692K select  0   0:00  0.00% ntpd
>  1238 root          1  76    0 16768K  2372K select  0   0:00  0.00% hald-runner
>  4916 root          1  44    0  3204K   728K select  1   0:00  0.00% make
> 95338 root          1  49    0 37944K  4584K sbwait  1   0:00  0.00% sshd
>  1259 root          1  44    0 10280K  2712K pause   1   0:00  0.00% csh
> 33357 bra           1  44    0 21596K  4004K select  0   0:00  0.00% ssh
> 16405 bra           1  44    0 37944K  5012K zio->i  0   0:00  0.00% sshd
>  1044 root          1  44    0  9104K  1796K kqread  0   0:00  0.00% master
> 34765 root          1  76    0  8260K  1764K wait    1   0:00  0.00% sh
> 82685 bra           1  44    0 37944K  4960K select  1   0:00  0.00% sshd
>  1065 postfix       1  44    0  9100K  1872K kqread  0   0:00  0.00% qmgr
>  1237 root         17  44    0 27460K  4124K waitvt  0   0:00  0.00% console-kit-daemon
> 95362 bra           1  44    0 10216K  2612K ttyin   0   0:00  0.00% bash
> 34764 root          1  44    0  3204K   852K select  0   0:00  0.00% make
>  1222 root          1  49    0 21672K  1896K wait    0   0:00  0.00% login
> 35728 root          1  44    0  3204K   860K select  0   0:00  0.00% make
>  1064 postfix       1  44    0  9104K  1772K zio->i  1   0:00  0.00% pickup
> 82696 bra           1  44    0 10216K  2596K wait    0   0:00  0.00% bash
> 94417 bra           1  44    0 10216K  2596K wait    1   0:00  0.00% bash
> 35455 root          1  44    0  3204K   744K select  0   0:00  0.00% make
> 35774 root          1  44    0  3204K   728K select  1   0:00  0.00% make
> 16409 bra           1  44    0 10216K  2592K ttyin   0   0:00  0.00% bash
>  1155 root          1  44    0  7948K  1604K nanslp  0   0:00  0.00% cron
>  1077 messagebus    1  53    0  8092K  2060K select  0   0:00  0.00% dbus-daemon
>  1149 root          1  44    0 26012K  3960K select  1   0:00  0.00% sshd
> 35729 root          1  76    0  8260K  1760K wait    0   0:00  0.00% sh
>  4921 root          1  57    0  8260K  1748K wait    0   0:00  0.00% sh
>   825 root          1  76    0 39212K  2372K lockf   1   0:00  0.00% saslauthd
> 35460 root          1  76    0  8260K  1748K wait    0   0:00  0.00% sh
> 34761 root          1  48    0  8260K  1740K wait    1   0:00  0.00% sh
> 96923 root          1  50    0  8260K  1740K wait    0   0:00  0.00% sh
>
>
> As you can see, top reports that the machine is 100% idle, while a make -j4
> buildworld runs. This lasts for a few seconds (10-20), then everything goes
> back to normal, the fans start to scream, the build continues and I can use
> the machine.
> This occasional halt is new to me (but I've just switched to ZFS on my
> desktop; on a server it's harder to notice if you don't use it for
> interactive sessions), but I could see the final freeze on more than one
> server.
> How could I help to debug this, and the final one?
>
> Thanks,
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
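
To compare a stall with and without the cache device, it may help to tally the STATE column of the top output instead of reading it by eye. What follows is only a rough sketch, not something from the original report: the function name count_states is made up for illustration, the sample rows are copied from the top listing quoted above, and the 12-column layout (with the C column) is assumed to match this dual-core machine.

```python
from collections import Counter


def count_states(top_text):
    """Tally the STATE column of FreeBSD top(1) process rows.

    Assumes the 12-column SMP layout quoted in this thread:
    PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
    """
    states = Counter()
    for line in top_text.splitlines():
        fields = line.split()
        # Process rows start with a numeric PID; this skips the summary
        # lines and the column header.
        if len(fields) >= 12 and fields[0].isdigit():
            states[fields[7]] += 1
    return states


# Sample rows copied from the top output quoted in this message.
sample = """\
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1342 root          1  44    0  3204K   620K select  0   0:02  0.00% make
16304 root          1  44    0 37944K  4576K zio->i  1   0:00  0.00% sshd
16405 bra           1  44    0 37944K  5012K zio->i  0   0:00  0.00% sshd
 1064 postfix       1  44    0  9104K  1772K zio->i  1   0:00  0.00% pickup
"""

if __name__ == "__main__":
    for state, count in count_states(sample).most_common():
        print(state, count)
```

In the listing above, the rows in the zio->i state (two sshd processes and pickup) appear to be the ones waiting on ZFS I/O, so a count of that state that drops once the pendrive is removed from the pool would point towards the cache device.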