Date:      Thu, 14 Jan 2010 10:10:20 -0600
From:      "Doug Poland" <doug@polands.org>
To:        "Ivan Voras" <ivoras@freebsd.org>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: 8.0-R-p2 ZFS: unixbench causing kmem exhaustion panic
Message-ID:  <e9d87b468534fb7eccbc1a4fc2fced19.squirrel@email.polands.org>
In-Reply-To: <9bbcef731001140650h5d887843ubc6d555da993e8b6@mail.gmail.com>
References:  <8418112cdfada93d83ca0cb5307c1d21.squirrel@email.polands.org> <hil1e8$ebs$1@ger.gmane.org> <b78f9b16683331ad0f574ecfc1b7f995.squirrel@email.polands.org> <9bbcef731001131035x604cdea1t81b14589cb10ad25@mail.gmail.com> <b41ca31fbeacf104143509e8cba2fe66.squirrel@email.polands.org> <9bbcef731001131157h256c4d14mbb241bc4326405f8@mail.gmail.com> <3aa09fd8723749d1fa65f1b9a6faac60.squirrel@email.polands.org> <cb290c7a06dd633dfc1cd5bd8b4fd99a.squirrel@email.polands.org> <himnfv$acn$1@ger.gmane.org> <27117211dd662bcf93055f4351243396.squirrel@email.polands.org> <9bbcef731001140650h5d887843ubc6d555da993e8b6@mail.gmail.com>


On Thu, January 14, 2010 08:50, Ivan Voras wrote:
> 2010/1/14 Doug Poland <doug@polands.org>:
>>>>
>>>> kstat.zfs.misc.arcstats.size
>>>>
>>>> seemed to fluctuate between about 164,000,000 and 180,000,000 bytes
>>>> during this last run
>>>
>>> Is that with or without panicking?
>>>
>> with a panic
>>
>>
>>> If the system did panic then it looks like the problem is a memory
>>> leak somewhere else in the kernel, which you could confirm by
>>> monitoring vmstat -z.
>>>
>> I'll give that a try.  Am I looking for specific items in vmstat
>> -z?   arc*, zil*, zfs*, zio*?  Please advise.
>
> You should look for whatever is allocating all your memory between 180
> MB (which is your ARC size) and 1.2 GB (which is your kmem size).
>
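
One way to act on that suggestion is to diff two vmstat -z snapshots and
look for a zone whose USED count keeps growing.  A minimal sketch (the two
sample lines and their numbers are made up, standing in for live output):

```shell
# Sketch: find a growing UMA zone by diffing the USED column (3rd
# comma-separated field) of two `vmstat -z` snapshots.  On the real box,
# capture them with:  vmstat -z > /tmp/z1; sleep 60; vmstat -z > /tmp/z2
# The sample lines below are made-up stand-ins for live output.
snap1='zio_cache:                720,        0,     1000,       98,   100'
snap2='zio_cache:                720,        0,    53562,       98,   200'
used1=$(echo "$snap1" | awk -F', *' '{print $3}')
used2=$(echo "$snap2" | awk -F', *' '{print $3}')
echo "zio_cache USED delta: $((used2 - used1))"   # -> 52562
```

A zone whose delta keeps climbing between snapshots, while FREE stays
small, is the one eating the kmem map.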

OK, another run, this time back to vfs.zfs.arc_max=512M in
/boot/loader.conf, and a panic:

panic: kmem_malloc(131072): kmem_map too small: 1294258176 total allocated

I admit I don't fully understand which metrics matter for a proper
analysis of this issue.  In this case, I captured the following within
1 second of the panic:

sysctl kstat.zfs.misc.arcstats.size: 41739944
sysctl vfs.numvnodes: 678
sysctl vfs.zfs.arc_max: 536870912
sysctl vfs.zfs.arc_meta_limit: 134217728
sysctl vfs.zfs.arc_meta_used: 7228584
sysctl vfs.zfs.arc_min: 67108864
sysctl vfs.zfs.cache_flush_disable: 0
sysctl vfs.zfs.debug: 0
sysctl vfs.zfs.mdcomp_disable: 0
sysctl vfs.zfs.prefetch_disable: 1
sysctl vfs.zfs.recover: 0
sysctl vfs.zfs.scrub_limit: 10
sysctl vfs.zfs.super_owner: 0
sysctl vfs.zfs.txg.synctime: 5
sysctl vfs.zfs.txg.timeout: 30
sysctl vfs.zfs.vdev.aggregation_limit: 131072
sysctl vfs.zfs.vdev.cache.bshift: 16
sysctl vfs.zfs.vdev.cache.max: 16384
sysctl vfs.zfs.vdev.cache.size: 10485760
sysctl vfs.zfs.vdev.max_pending: 35
sysctl vfs.zfs.vdev.min_pending: 4
sysctl vfs.zfs.vdev.ramp_rate: 2
sysctl vfs.zfs.vdev.time_shift: 6
sysctl vfs.zfs.version.acl: 1
sysctl vfs.zfs.version.dmu_backup_header: 2
sysctl vfs.zfs.version.dmu_backup_stream: 1
sysctl vfs.zfs.version.spa: 13
sysctl vfs.zfs.version.vdev_boot: 1
sysctl vfs.zfs.version.zpl: 3
sysctl vfs.zfs.zfetch.array_rd_sz: 1048576
sysctl vfs.zfs.zfetch.block_cap: 256
sysctl vfs.zfs.zfetch.max_streams: 8
sysctl vfs.zfs.zfetch.min_sec_reap: 2
sysctl vfs.zfs.zil_disable: 0
sysctl vm.kmem_size: 1327202304
sysctl vm.kmem_size_max: 329853485875
sysctl vm.kmem_size_min: 0
sysctl vm.kmem_size_scale: 3


vmstat -z | egrep -i 'zfs|zil|arc|zio|files'
ITEM                     SIZE     LIMIT      USED      FREE  REQUESTS
Files:                     80,        0,      116,      199,   850713
zio_cache:                720,        0,    53562,       98, 86386955
arc_buf_hdr_t:            208,        0,     1193,       31,    11990
arc_buf_t:                 72,        0,     1180,      120,    11990
zil_lwb_cache:            200,        0,    11580,     2594,    62407
zfs_znode_cache:          376,        0,      605,       55,      654

vmstat -m | grep solaris | sed 's/K//' | awk '{print "solaris:", $3*1024}'


  solaris: 1285068800


The culprit appears to be the "solaris" total from vmstat -m: it
fluctuates wildly during the run and is always close to vm.kmem_size at
the time of the panic.
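
Put as a direct comparison (a sketch using the figures captured above;
note vmstat -m reports MemUse in kilobytes, while vm.kmem_size is bytes,
which is what the sed/awk pipeline converts):

```shell
# Compare the solaris malloc total against the kmem map, using the
# numbers from this run (both already in bytes).
solaris_bytes=1285068800   # vmstat -m, solaris MemUse, converted above
kmem_size=1327202304       # sysctl vm.kmem_size
pct=$((solaris_bytes * 100 / kmem_size))
echo "solaris allocations: ${pct}% of kmem_size"   # -> 96%
```

So at panic time the opensolaris malloc type alone accounts for nearly
the whole kmem map, far beyond the ~42 MB the ARC itself reports.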

Again, I'm not sure what to look for here; thank you for patiently
helping me along in this process.  If you have any tips, or can point
me to docs on how to easily monitor these values, I will endeavor to
do so.
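
A minimal sampling loop along those lines (a sketch: it takes three
1-second samples here for illustration; on the affected box you would
run it unbounded and read the tail of the log after the panic):

```shell
# Sketch of a 1-second sampling loop: append the interesting counters
# to a log file so the last samples before a panic survive on disk.
# Limited to 3 iterations here; run unbounded on the real box.
# Errors are silenced so the sketch also runs where these sysctls
# don't exist.
log=$(mktemp /tmp/zfs-kmem.XXXXXX)
n=0
while [ "$n" -lt 3 ]; do
    date +%s >> "$log"
    sysctl kstat.zfs.misc.arcstats.size vm.kmem_size >> "$log" 2>/dev/null
    vmstat -m 2>/dev/null | grep -i solaris >> "$log"
    n=$((n + 1))
    sleep 1
done
echo "samples logged to $log"
```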


-- 
Regards,
Doug



