Date:      Sun, 2 Dec 2007 13:33:09 +0100
From:      Johan Ström <johan@stromnet.se>
To:        m.rebele@web.de
Cc:        freebsd-current@freebsd.org
Subject:   Re: 7.0-Beta 3: zfs makes system reboot
Message-ID:  <CAAD8692-4EBA-4BEF-9523-721EFFC5643E@stromnet.se>
In-Reply-To: <475039D5.4020204@web.de>
References:  <475039D5.4020204@web.de>

On Nov 30, 2007, at 17:27 , Michael Rebele wrote:

> Hello,
>
> I have been testing ZFS since 7.0-Beta 1.
> At first I only had access to a 32-bit machine (P4/3GHz with 2GB
> RAM, 2xHD for RAID1 and 2xHD for ZFS RAID 0).
>
> While running iozone with the following call:
> iozone -R -a -z -b file.wks -g 4G -f testile
>
> (This is inspired by Dominic Kay from Sun, see
> http://blogs.sun.com/dom/entry/zfs_v_vxfs_iozone for details).
>
> the well-known "kmem_malloc" error occurred and stopped the system:
> (panic: kmem_malloc(131072): kmem_map too small: 398491648 total
> allocated cpuid=1)
>
> I tested several optimizations as suggested in the ZFS Tuning Guide
> and in several postings on this list.
> The problem stayed mainly the same: it stopped with a "kmem_malloc"
> panic or rebooted without warning, depending on the configuration,
> i.e. whether I raised the vm.kmem_* sizes, only KVA_PAGES, or both.
> It never completed the benchmark; with more memory in
> vm.kmem_size and vm.kmem_size_max, the problem just came later.
>
>
>
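For context, the tuning the Guide suggests for this kmem_map panic boils
down to a handful of loader.conf knobs plus, on i386, a larger KVA. A
minimal sketch follows; the actual values are assumptions for a 2-4 GB
box, not what Michael used:

    # /boot/loader.conf -- illustrative values, adjust per machine
    vm.kmem_size="1024M"            # enlarge the kernel memory map ZFS allocates from
    vm.kmem_size_max="1024M"
    vfs.zfs.arc_max="512M"          # cap the ARC so it cannot exhaust kmem
    vfs.zfs.prefetch_disable="1"

    # i386 only, in the kernel config file: grow the kernel virtual address space
    options KVA_PAGES=512
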
> But OK, the main target for ZFS is amd64, not i386. I now
> have access to an Intel Woodcrest system, a Xeon 5160 with
> 4GB RAM and 1xHD. It has UFS for the system and home, and one ZFS
> filesystem only for data (for the iozone benchmark).
> It has a vanilla kernel, I haven't touched it. I've tested the
> default settings from Beta 3 and applied the tuning tips from the
> Tuning Guide.
> It shows the same behaviour as the 32-bit machine, with one major
> difference: it always reboots. There is no kmem_malloc error
> message (which made the system hang).
>
> The problem is the "-z" option of the iozone benchmark. Without it,
> the benchmark works (on the i386 and on the amd64 machine). This
> option makes iozone test small record sizes for large files. On
> a UFS filesystem, iozone works with the "-z" option, so it
> seems to me that this is a problem with ZFS.
>
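To make the workaround concrete, here are two invocations that avoid the
failing combination (sketches only; the 512M cap is an assumption, not
something from Michael's runs):

    # Same run without -z: auto mode then skips the small record sizes on
    # the multi-gigabyte files, and the benchmark completes on ZFS.
    iozone -R -a -b file.wks -g 4G -f testile

    # Or keep -z but limit the maximum file size so the small-record
    # passes stay modest.
    iozone -R -a -z -b file.wks -g 512M -f testile
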
> Here is some more information (from the amd64 system):
>
> 1. The captured iozone output
>
> [root@zfs /tank/iozone]# iozone -R -a -z -b filez-512M.wks -g 4G -f testile
> ...


For the record, I can reproduce the same thing on amd64 FreeBSD
RELENG_7 from two days ago (installed from Beta 3 two days ago). It's a C2D
box with 2 GB of memory and two SATA drives in a zpool mirror. No special
tweaking whatsoever yet.
The panic was "page fault, supervisor read instruction, page not
present", so not the (apparently) regular kmem_malloc? So I doubt the
other patch that Alexandre linked to would help?

iozone got to
         Run began: Sun Dec  2 13:11:53 2007

         Excel chart generation enabled
         Auto Mode
         Cross over of record size disabled.
         Using maximum file size of 4194304 kilobytes.
         Command line used: iozone -R -a -z -b file.wks -g 4G -f testile
         Output is in Kbytes/sec
         Time Resolution = 0.000001 seconds.
         Processor cache size set to 1024 Kbytes.
         Processor cache line size set to 32 bytes.
         File stride size set to 17 * record size.
                                                              random    random     bkwd   record   stride
               KB  reclen   write  rewrite     read    reread     read     write     read  rewrite     read   fwrite frewrite    fread  freread
               64       4  122584   489126   969761  1210227  1033216    503814   769584   516414   877797   291206   460591   703068   735831
               64       8  204474   735831  1452528  1518251  1279447    799377  1255511   752329  1460430   372410   727850  1087638  1279447
......
           131072       4   65734    71698  1011780   970967   755928      5479  1008858   494172   931232    65869    68155   906746   910950
           131072       8   79507    74422  1699148  1710185  1350184     10907  1612344   929991  1372725    34699    74782  1407638  1429434
           131072      16   82479    74279  2411000  2426173  2095714     25327  2299061  1608974  2038950    71102    69200  1887231  1893067
           131072      32   75268    73077  3276650  3326454  2954789     70573  3195793  2697621  2987611
then it died

No cores were dumped, however. I'm running swap on a gmirror though, and
if I recall correctly at least 6.x couldn't dump to a gmirror, so I guess
7.x can't either. Although the dump message DID say it dumped
memory (and it did say "Dump complete"), savecore didn't find any dumps
at boot.
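
For anyone else hitting this, one way people worked around it at the time
was to point the dump device at one of the raw providers underneath the
gmirror rather than at the mirror itself, so the dump lands on a plain
partition that savecore can read at boot. A sketch; the device name is an
assumption for this box:

    # /etc/rc.conf
    dumpdev="/dev/ad0s1b"          # a plain swap partition underneath the gmirror

    # or one-off, as root:
    dumpon /dev/ad0s1b
    savecore /var/crash /dev/ad0s1b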

The box didn't do anything else during this test and is not running
any apps yet. I haven't encountered the problem before, but then again
I've only been playing with it for two days without any real hard test
(just scp'ed about 50 gigs of data to it, but that's it).

--
Johan Ström
Stromnet
johan@stromnet.se
http://www.stromnet.se/



