Date: Wed, 3 Feb 2010 11:39:11 +0100
From: Matthias Gamsjager <mgamsjager@gmail.com>
To: Attila Nagy <bra@fsn.hu>
Cc: freebsd-fs@freebsd.org
Subject: Re: Machine stops for some seconds with ZFS
Message-ID: <585602e11002030239y3da31f7bkbf593a04950c351e@mail.gmail.com>
In-Reply-To: <4B694689.2030704@fsn.hu>
References: <4B694689.2030704@fsn.hu>
What's the point of having a cache device that is slower than the hard
disks themselves? Could you please try the build without the slow cache
device?

On Wed, Feb 3, 2010 at 10:48 AM, Attila Nagy <bra@fsn.hu> wrote:
> Hello,
>
> After a long time, I've switched back to ZFS on my desktop. It runs
> 8-STABLE/amd64 with two SATA disks and a USB pendrive.
> One partition from each disk, encrypted using GELI, is used for the
> zpool, and the pendrive is there for L2ARC:
>   NAME            STATE     READ WRITE CKSUM
>   data            ONLINE       0     0     0
>     mirror        ONLINE       0     0     0
>       ad0s1d.eli  ONLINE       0     0     0
>       ad1s1d.eli  ONLINE       0     0     0
>   cache
>     da0           ONLINE       0     0     0
>
> Today, after 12 days of uptime, the machine froze. I could ping it from
> a different machine, and could even open a telnet connection to its ssh
> port, but I couldn't get the ssh banner.
>
> Now I'm building a 9-CURRENT kernel and world to see whether the same
> problem persists with that, and during the make process I've noticed a
> strange thing.
> I build with -j4 (the machine has one dual core CPU), so the fans are
> screaming during the process. But every few minutes (I couldn't
> recognize any pattern in it) the machine goes completely silent (even
> more silent than normal), and everything halts.
> During this, the top running on the machine can refresh itself, and I
> can type on pass-through ssh connections (that is, I use the machine in
> question to access other machines with ssh), but I can't open new ssh
> connections to it, and can't start anything new (for example from an
> open shell).
> ping runs seamlessly during this, and top shows the following:
>
> last pid: 36503;  load averages:  1.59,  3.04,  3.01    up 0+00:49:53  10:32:10
> 97 processes:  1 running, 96 sleeping
> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 218M Active, 24M Inact, 639M Wired, 40M Cache, 6208K Buf, 1022M Free
> Swap: 4096M Total, 4096M Free
>
>   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>  1342 root         1  44    0  3204K   620K select 0   0:02  0.00% make
>  1424 root         1  44    0  3204K  1036K select 0   0:01  0.00% make
>  1280 root         1  44    0 12540K  1900K select 0   0:01  0.00% hald-addon-storage
>  1234 haldaemon    1  44    0 24116K  4464K select 0   0:01  0.00% hald
> 93600 root         1  44    0  3204K  1028K select 0   0:00  0.00% make
>  1260 root         1  44    0 19704K  2688K select 0   0:00  0.00% hald-addon-mouse-sy
> 15142 bra          1  44    0  9332K  2864K CPU0   0   0:00  0.00% top
>  1263 root         1  44    0 12540K  1896K cgticb 0   0:00  0.00% hald-addon-storage
> 94415 bra          1  44    0 37944K  4992K select 1   0:00  0.00% sshd
> 35837 root         1  44    0  5252K  2424K select 1   0:00  0.00% make
> 95361 bra          1  44    0 37944K  4992K select 1   0:00  0.00% sshd
> 35973 root         1  44    0  3204K  1772K select 0   0:00  0.00% make
>   608 root         1  44    0  6892K  1436K select 1   0:00  0.00% syslogd
> 96928 root         1  44    0  3204K   728K select 0   0:00  0.00% make
> 94369 root         1  51    0 37944K  4584K sbwait 0   0:00  0.00% sshd
> 82631 root         1  50    0 37944K  4584K sbwait 0   0:00  0.00% sshd
> 16304 root         1  44    0 37944K  4576K zio->i 1   0:00  0.00% sshd
>   951 _ntp         1  44    0  6876K  1692K select 0   0:00  0.00% ntpd
>  1238 root         1  76    0 16768K  2372K select 0   0:00  0.00% hald-runner
>  4916 root         1  44    0  3204K   728K select 1   0:00  0.00% make
> 95338 root         1  49    0 37944K  4584K sbwait 1   0:00  0.00% sshd
>  1259 root         1  44    0 10280K  2712K pause  1   0:00  0.00% csh
> 33357 bra          1  44    0 21596K  4004K select 0   0:00  0.00% ssh
> 16405 bra          1  44    0 37944K  5012K zio->i 0   0:00  0.00% sshd
>  1044 root         1  44    0  9104K  1796K kqread 0   0:00  0.00% master
> 34765 root         1  76    0  8260K  1764K wait   1   0:00  0.00% sh
> 82685 bra          1  44    0 37944K  4960K select 1   0:00  0.00% sshd
>  1065 postfix      1  44    0  9100K  1872K kqread 0   0:00  0.00% qmgr
>  1237 root        17  44    0 27460K  4124K waitvt 0   0:00  0.00% console-kit-daemon
> 95362 bra          1  44    0 10216K  2612K ttyin  0   0:00  0.00% bash
> 34764 root         1  44    0  3204K   852K select 0   0:00  0.00% make
>  1222 root         1  49    0 21672K  1896K wait   0   0:00  0.00% login
> 35728 root         1  44    0  3204K   860K select 0   0:00  0.00% make
>  1064 postfix      1  44    0  9104K  1772K zio->i 1   0:00  0.00% pickup
> 82696 bra          1  44    0 10216K  2596K wait   0   0:00  0.00% bash
> 94417 bra          1  44    0 10216K  2596K wait   1   0:00  0.00% bash
> 35455 root         1  44    0  3204K   744K select 0   0:00  0.00% make
> 35774 root         1  44    0  3204K   728K select 1   0:00  0.00% make
> 16409 bra          1  44    0 10216K  2592K ttyin  0   0:00  0.00% bash
>  1155 root         1  44    0  7948K  1604K nanslp 0   0:00  0.00% cron
>  1077 messagebus   1  53    0  8092K  2060K select 0   0:00  0.00% dbus-daemon
>  1149 root         1  44    0 26012K  3960K select 1   0:00  0.00% sshd
> 35729 root         1  76    0  8260K  1760K wait   0   0:00  0.00% sh
>  4921 root         1  57    0  8260K  1748K wait   0   0:00  0.00% sh
>   825 root         1  76    0 39212K  2372K lockf  1   0:00  0.00% saslauthd
> 35460 root         1  76    0  8260K  1748K wait   0   0:00  0.00% sh
> 34761 root         1  48    0  8260K  1740K wait   1   0:00  0.00% sh
> 96923 root         1  50    0  8260K  1740K wait   0   0:00  0.00% sh
>
>
> As you can see, top reports that the machine is 100% idle while a
> make -j4 buildworld runs. This lasts for a few seconds (10-20), then
> everything goes back to normal: the fans start to scream, the build
> continues and I can use the machine.
> This occasional halt is new to me (I've only just switched to ZFS on my
> desktop, and on a server it's harder to notice if you don't use it for
> interactive sessions), but I have seen the final freeze on more than
> one server.
> How could I help to debug this, and the final one?
>
> Thanks,
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>