Date:      Sat, 04 Aug 2018 20:38:04 +0200
From:      Mark Martinec <Mark.Martinec+freebsd@ijs.si>
To:        stable@freebsd.org
Cc:        Mark Johnston <markj@freebsd.org>
Subject:   Re: All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64
Message-ID:  <87f6a55cc2ee3d754ddb89475bbfbab8@ijs.si>
In-Reply-To: <20180804170154.GA12146@raichu>
References:  <1a039af7758679ba1085934b4fb81b57@ijs.si> <3e56e4de076111c04c2595068ba71eec@ijs.si> <20180731220948.GA97237@raichu> <2ec91ebeaba54fda5e9437f868d4d590@ijs.si> <b3aa2bbe947914f8933b24cf0d0b15f0@ijs.si> <20180804170154.GA12146@raichu>

2018-08-04 19:01, Mark Johnston wrote:
> I think running "zpool list" is adding a lot of noise to the output.
> Could you retry without doing that?

No. As I said previously, running "zpool list" (with one defunct
zfs pool present) *is* the sole cause of the zfs memory leak.
With each invocation of "zpool list" the "solaris" malloc
jumps up by the same amount and never drops. Without
running it (e.g. repeatedly, as under 'telegraf' monitoring
of zfs), the machine runs normally and never runs out of
memory, and the "solaris" malloc count no longer grows steadily.
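The per-invocation growth described above can be checked with a small
script. This is only a sketch: the function names are made up for
illustration, and it assumes that "vmstat -m" prints the malloc type in
the first column and MemUse (with a trailing K) in the third.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the thread.

# Read "vmstat -m" output on stdin and print the MemUse value (in K)
# of the "solaris" malloc type.
# Assumption: columns are Type, InUse, MemUse, ... and MemUse ends in "K".
solaris_memuse() {
    awk '$1 == "solaris" { sub(/K$/, "", $3); print $3 }'
}

# Run "zpool list" a number of times (default 5) and print how much
# the "solaris" MemUse changed after each run.
measure_leak() {
    runs=${1:-5}
    prev=$(vmstat -m | solaris_memuse)
    i=1
    while [ "$i" -le "$runs" ]; do
        zpool list >/dev/null 2>&1
        cur=$(vmstat -m | solaris_memuse)
        echo "run $i: solaris MemUse ${cur}K (delta $((cur - prev))K)"
        prev=$cur
        i=$((i + 1))
    done
}
```

Calling "measure_leak 10" as root on the affected machine should print
one line per run; a constant positive delta that never comes back down
would match the behaviour described here.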

This leak was introduced sometime between 10.3 and 11.1R-p11,
and is still there with 11.2.

   Mark


> On Fri, Aug 03, 2018 at 09:11:42PM +0200, Mark Martinec wrote:
>> More attempts at tracking this down. The suggested dtrace command
>> usually aborts with:
>> 
>>    Assertion failed: (buf->dtbd_timestamp >= first_timestamp),
>>      file /usr/src/cddl/contrib/opensolaris/lib/libdtrace/common/dt_consume.c,
>>      line 3330.
> 
> Hrmm.  As a workaround you can add "-x temporal=off" to the dtrace(1)
> invocation.
> 
>> but with some luck soon after each machine reboot I can leave the dtrace
>> running for about 10 or 20 seconds (max) before terminating it with a ^C,
>> and succeed in collecting the report.  If I miss the opportunity to leave
>> dtrace running just long enough to collect useful info, but not long
>> enough for it to hit the assertion check, then any further attempt
>> to run the dtrace script hits the assertion fault immediately.
>> 
>> Btw, (just in case) I have recompiled the kernel from source
>> (base/release/11.2.0) with debugging symbols, although the behaviour
>> has not changed:
>> 
>>    FreeBSD floki.ijs.si 11.2-RELEASE FreeBSD 11.2-RELEASE #0 r337238:
>>      Fri Aug 3 17:29:42 CEST 2018
>>      mark@xxx.ijs.si:/usr/obj/usr/src/sys/FLOKI amd64
>> 
>> 
>> Anyway, after several attempts I was able to collect a useful dtrace
>> output from the suggested dtrace script:
>> 
>> # dtrace -n 'dtmalloc::solaris:malloc {@allocs[stack(), args[3]] = count()}
>>    dtmalloc::solaris:free {@frees[stack(), args[3]] = count()}'
>> 
>> while running "zpool list" repeatedly in another terminal screen:
> 
> I think running "zpool list" is adding a lot of noise to the output.
> Could you retry without doing that?
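
For reference, the "-x temporal=off" workaround suggested in the quoted
text can be combined with the dtrace script from the thread as sketched
below. The variable and function names are only illustrative, and the
invocation needs root on a system with the dtmalloc provider available.

```shell
#!/bin/sh
# Sketch only: D_SCRIPT and trace_solaris_malloc are hypothetical names.

# D script quoted in the thread: count "solaris" malloc and free calls,
# keyed by kernel stack and by args[3].
D_SCRIPT='dtmalloc::solaris:malloc {@allocs[stack(), args[3]] = count()}
dtmalloc::solaris:free {@frees[stack(), args[3]] = count()}'

# Run the script with temporal ordering disabled, the workaround
# suggested in the thread for the dt_consume.c assertion failure.
trace_solaris_malloc() {
    dtrace -x temporal=off -n "$D_SCRIPT"
}
```

Run "trace_solaris_malloc", exercise the workload in another terminal,
then stop it with ^C to print the aggregations.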


