Date:      Wed, 3 Feb 2010 11:39:11 +0100
From:      Matthias Gamsjager <mgamsjager@gmail.com>
To:        Attila Nagy <bra@fsn.hu>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Machine stops for some seconds with ZFS
Message-ID:  <585602e11002030239y3da31f7bkbf593a04950c351e@mail.gmail.com>
In-Reply-To: <4B694689.2030704@fsn.hu>
References:  <4B694689.2030704@fsn.hu>

What's the point of having a cache device that is slower than the
hard disks themselves?
Could you please try the build without the slow cache device?
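
For reference, a minimal sketch of how the cache device could be taken out
for the test. It assumes the pool and device names from your zpool status
output (data and da0). The run helper and the APPLY variable are only
illustrative names of mine; by default the script just prints the commands
and touches nothing.

```shell
#!/bin/sh
# Assumption: pool "data" and cache device "da0", taken from the quoted
# zpool status output. Adjust both for another system.
POOL=data
CACHE_DEV=da0

# Print each command by default; run it for real only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Drop the L2ARC device. Cache devices hold no unique data, so the pool
# itself is unaffected.
remove_cache() {
    run zpool remove "$POOL" "$CACHE_DEV"
    run zpool status "$POOL"
}

# Re-attach the device later if the stalls turn out to be unrelated.
restore_cache() {
    run zpool add "$POOL" cache "$CACHE_DEV"
}

remove_cache
```

Running it with APPLY=1 as root would actually remove the device; calling
restore_cache afterwards puts it back.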

On Wed, Feb 3, 2010 at 10:48 AM, Attila Nagy <bra@fsn.hu> wrote:
> Hello,
>
> After a long time, I've switched back to ZFS on my desktop. It runs
> 8-STABLE/amd64 with two SATA disks and an USB pendrive.
> One-one partition is used from each disk for the zpool, which is encrypted
> using GELI, and the pendrive is there for L2ARC:
>   NAME            STATE     READ WRITE CKSUM
>   data            ONLINE       0     0     0
>     mirror        ONLINE       0     0     0
>       ad0s1d.eli  ONLINE       0     0     0
>       ad1s1d.eli  ONLINE       0     0     0
>   cache
>     da0           ONLINE       0     0     0
>
> Today, after 12 days of uptime the machine has frozen. I could ping it from
> a different machine, even could open a telnet to its ssh port, but I
> couldn't get the ssh banner.
>
> Now I'm building a 9-CURRENT kernel and world to see whether the same
> problem persists with that, and during the make process I've noticed a
> strange thing.
> I build with -j4 (the machine has one dual core CPU), so the fans are
> screaming during the process. But every few minutes (I couldn't recognize
> any patterns in it) the machine goes completely silent (even more silent
> than normally), and everything halts.
> During this, the top running on the machine can refresh itself, and I can
> type on pass through ssh connections (that is, I use the machine in question
> to access other machines with ssh), but I can't open new ssh connections to
> it, and can't start anything new (for example from an open shell).
> ping is running seamlessly during this, and top shows the following:
>
> last pid: 36503;  load averages:  1.59,  3.04,  3.01    up 0+00:49:53  10:32:10
> 97 processes:  1 running, 96 sleeping
> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 218M Active, 24M Inact, 639M Wired, 40M Cache, 6208K Buf, 1022M Free
> Swap: 4096M Total, 4096M Free
>
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>  1342 root          1  44    0  3204K   620K select  0   0:02  0.00% make
>  1424 root          1  44    0  3204K  1036K select  0   0:01  0.00% make
>  1280 root          1  44    0 12540K  1900K select  0   0:01  0.00% hald-addon-storage
>  1234 haldaemon     1  44    0 24116K  4464K select  0   0:01  0.00% hald
> 93600 root          1  44    0  3204K  1028K select  0   0:00  0.00% make
>  1260 root          1  44    0 19704K  2688K select  0   0:00  0.00% hald-addon-mouse-sy
> 15142 bra           1  44    0  9332K  2864K CPU0    0   0:00  0.00% top
>  1263 root          1  44    0 12540K  1896K cgticb  0   0:00  0.00% hald-addon-storage
> 94415 bra           1  44    0 37944K  4992K select  1   0:00  0.00% sshd
> 35837 root          1  44    0  5252K  2424K select  1   0:00  0.00% make
> 95361 bra           1  44    0 37944K  4992K select  1   0:00  0.00% sshd
> 35973 root          1  44    0  3204K  1772K select  0   0:00  0.00% make
>   608 root          1  44    0  6892K  1436K select  1   0:00  0.00% syslogd
> 96928 root          1  44    0  3204K   728K select  0   0:00  0.00% make
> 94369 root          1  51    0 37944K  4584K sbwait  0   0:00  0.00% sshd
> 82631 root          1  50    0 37944K  4584K sbwait  0   0:00  0.00% sshd
> 16304 root          1  44    0 37944K  4576K zio->i  1   0:00  0.00% sshd
>   951 _ntp          1  44    0  6876K  1692K select  0   0:00  0.00% ntpd
>  1238 root          1  76    0 16768K  2372K select  0   0:00  0.00% hald-runner
>  4916 root          1  44    0  3204K   728K select  1   0:00  0.00% make
> 95338 root          1  49    0 37944K  4584K sbwait  1   0:00  0.00% sshd
>  1259 root          1  44    0 10280K  2712K pause   1   0:00  0.00% csh
> 33357 bra           1  44    0 21596K  4004K select  0   0:00  0.00% ssh
> 16405 bra           1  44    0 37944K  5012K zio->i  0   0:00  0.00% sshd
>  1044 root          1  44    0  9104K  1796K kqread  0   0:00  0.00% master
> 34765 root          1  76    0  8260K  1764K wait    1   0:00  0.00% sh
> 82685 bra           1  44    0 37944K  4960K select  1   0:00  0.00% sshd
>  1065 postfix       1  44    0  9100K  1872K kqread  0   0:00  0.00% qmgr
>  1237 root         17  44    0 27460K  4124K waitvt  0   0:00  0.00% console-kit-daemon
> 95362 bra           1  44    0 10216K  2612K ttyin   0   0:00  0.00% bash
> 34764 root          1  44    0  3204K   852K select  0   0:00  0.00% make
>  1222 root          1  49    0 21672K  1896K wait    0   0:00  0.00% login
> 35728 root          1  44    0  3204K   860K select  0   0:00  0.00% make
>  1064 postfix       1  44    0  9104K  1772K zio->i  1   0:00  0.00% pickup
> 82696 bra           1  44    0 10216K  2596K wait    0   0:00  0.00% bash
> 94417 bra           1  44    0 10216K  2596K wait    1   0:00  0.00% bash
> 35455 root          1  44    0  3204K   744K select  0   0:00  0.00% make
> 35774 root          1  44    0  3204K   728K select  1   0:00  0.00% make
> 16409 bra           1  44    0 10216K  2592K ttyin   0   0:00  0.00% bash
>  1155 root          1  44    0  7948K  1604K nanslp  0   0:00  0.00% cron
>  1077 messagebus    1  53    0  8092K  2060K select  0   0:00  0.00% dbus-daemon
>  1149 root          1  44    0 26012K  3960K select  1   0:00  0.00% sshd
> 35729 root          1  76    0  8260K  1760K wait    0   0:00  0.00% sh
>  4921 root          1  57    0  8260K  1748K wait    0   0:00  0.00% sh
>   825 root          1  76    0 39212K  2372K lockf   1   0:00  0.00% saslauthd
> 35460 root          1  76    0  8260K  1748K wait    0   0:00  0.00% sh
> 34761 root          1  48    0  8260K  1740K wait    1   0:00  0.00% sh
> 96923 root          1  50    0  8260K  1740K wait    0   0:00  0.00% sh
>
>
> As you can see, top reports that the machine is 100% idle, while a make -j4
> buildworld runs. This lasts for few seconds (10-20), then everything goes
> back to normal, the fans start to scream, the build continues and I can use
> the machine.
> This occasional halt is new to me -but I'm just switched to ZFS on my
> desktop, in a server it's harder to notice if you don't use it for
> interactive sessions-, but I could see the final freeze on more than one
> servers.
> How could I help to debug this, and the final one?
>
> Thanks,


