From owner-freebsd-stable@freebsd.org Tue Aug 11 15:54:17 2015
Message-ID: <55CA1AA4.7020401@byte.nl>
Date: Tue, 11 Aug 2015 17:54:12 +0200
From: Daniel Genis
To: freebsd-stable@freebsd.org
Subject: Re: Continuously increasing L2ARC header size
References: <55CA09D3.5020007@byte.nl>
In-Reply-To: <55CA09D3.5020007@byte.nl>

Hello everyone,

Looking closer and comparing our servers, we do in fact have 2 servers
that behave differently, but they also have a different workload (mail
instead of file storage). They have identical hardware, but were
installed later than the servers that do have the issue. Their L2ARC
size is stable, unlike the one I described previously, where the
reported L2ARC size is bigger than the amount of data in the pool.

Non-leaking server:

L2 ARC Summary: (HEALTHY)
        Passed Headroom:                        63.05m
        Tried Lock Failures:                    198.57m
        IO In Progress:                         53.57k
        Low Memory Aborts:                      32
        Free on Write:                          21.40k
        Writes While Full:                      16.50k
        R/W Clashes:                            3.50k
        Bad Checksums:                          0
        IO Errors:                              0
        SPA Mismatch:                           613.80m

L2 ARC Size: (Adaptive)                         443.42 GiB
        Header Size:                    0.27%   1.22 GiB

L2 ARC Evicts:
        Lock Retries:                           1.33k
        Upon Reading:                           0

L2 ARC Breakdown:                               191.36m
        Hit Ratio:                      28.27%  54.09m
        Miss Ratio:                     71.73%  137.27m
        Feeds:                                  1.68m

L2 ARC Buffer:
        Bytes Scanned:                          4.28 PiB
        Buffer Iterations:                      1.68m
        List Iterations:                        107.55m
        NULL List Iterations:                   1.16m

L2 ARC Writes:
        Writes Sent:                    100.00% 915.04k

There are no bad checksums or I/O errors. The reported L2ARC size of
443 GiB is sensible compared to the cache device's actual size (373G).

Yet apart from the workload, I cannot find any difference between the 2
"mail" fileservers and the other 6+ "web" fileservers that provide
storage for websites. They are identical in every regard except for the
workload and the time of installation: the mail fileservers were
installed much later.

I just wanted to add this, as it may be very relevant.

With kind regards,

Daniel

On 08/11/2015 04:42 PM, Daniel Genis wrote:
> Dear FreeBSD community,
>
> We're facing a somewhat odd issue, perhaps similar to what is discussed
> here: https://forums.freebsd.org/threads/l2arc-degraded.47540/
>
> The issue is that the L2ARC header seems to grow without limit, similar
> to a memory leak, pushing more and more memory out of the ARC over time.
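The comparison made in this reply can be put into numbers. The Python sketch below uses the rounded zfs-stats figures for both servers (443.42 GiB and 1.22 GiB above, 1.89 TiB and 16.98 GiB in the original report) together with the diskinfo mediasize of the 373G SSD. It assumes the mail servers use the same cache device, since the hardware is described as identical, and it assumes zfs-stats reports Header Size as a percentage of L2 ARC Size, which the published figures are consistent with. The `summarize` helper is hypothetical.

```python
# Hypothetical helper comparing the two servers from this thread.
# Inputs are the rounded zfs-stats figures, so results are approximate.

GIB = 1024 ** 3
TIB = 1024 ** 4

# diskinfo -v da23: mediasize in bytes (assumed identical on the mail servers).
DEVICE_BYTES = 400088457216

# name: (reported "L2 ARC Size" in bytes, "Header Size" in bytes)
servers = {
    "mail (non-leaking)": (443.42 * GIB, 1.22 * GIB),
    "web (leaking)": (1.89 * TIB, 16.98 * GIB),
}


def summarize(l2_size, hdr_size, device_bytes=DEVICE_BYTES):
    """Return (reported L2ARC size / device size, header as % of L2ARC size)."""
    return l2_size / device_bytes, 100.0 * hdr_size / l2_size


for name, (l2_size, hdr_size) in servers.items():
    ratio, hdr_pct = summarize(l2_size, hdr_size)
    print(f"{name}: {ratio:.2f}x device size, header {hdr_pct:.2f}% of L2 size")
```

With these inputs it reports about 1.19x the device size and 0.28% header overhead for the mail server, and about 5.19x and 0.88% for the web server. The 0.28% versus zfs-stats' 0.27% comes from the rounded inputs.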
>
> For example, the output of "zpool iostat -v 1":
>
>                  capacity     operations    bandwidth
> pool          alloc   free   read  write   read  write
> ------------  -----  -----  -----  -----  -----  -----
> syspool       1.15G   275G      0      0      0      0
>   mirror      1.15G   275G      0      0      0      0
>     gpt/zfs0      -      -      0      0      0      0
>     gpt/zfs1      -      -      0      0      0      0
> ------------  -----  -----  -----  -----  -----  -----
> tank          1.21T  1.51T    229  1.99K  3.67M  9.48M
>   mirror       124G   154G     67    125   787K   503K
>     da0           -      -     20     27   440K   503K
>     da1           -      -     45     28   379K   503K
> [...]
>   mirror       124G   154G     34    164   454K   612K
>     da18          -      -     26     12   417K   612K
>     da19          -      -      6     13  58.8K   612K
> logs              -      -      -      -      -      -
>   mirror       117M  74.4G      0    109      0  1.75M
>     da21          -      -      0    109      0  1.75M
>     da22          -      -      0    109      0  1.75M
> cache             -      -      -      -      -      -
>   da23        1.67T  16.0E    302      7  2.85M   223K
> ------------  -----  -----  -----  -----  -----  -----
>
>
> Here the cache shows 1.67T in use and 16.0E free.
> The cache is a 373GB Intel SSD.
>
> # diskinfo -v da23
> da23
>         512             # sectorsize
>         400088457216    # mediasize in bytes (373G)
>         781422768       # mediasize in sectors
>         4096            # stripesize
>         0               # stripeoffset
>         48641           # Cylinders according to firmware.
>         255             # Heads according to firmware.
>         63              # Sectors according to firmware.
>         BTTV4234089C400HGN      # Disk ident.
>         id1,enc@n500e004aaaaaaa3e/type@0/slot@18        # Physical path
>
>
>
> The L2ARC stats section from "zfs-stats -a":
>
> L2 ARC Summary: (DEGRADED)
>         Passed Headroom:                        133.33m
>         Tried Lock Failures:                    4.90b
>         IO In Progress:                         313.63k
>         Low Memory Aborts:                      1.52k
>         Free on Write:                          589.79k
>         Writes While Full:                      34.57k
>         R/W Clashes:                            46.95k
>         Bad Checksums:                          408.40m
>         IO Errors:                              151.99m
>         SPA Mismatch:                           632.00m
>
> L2 ARC Size: (Adaptive)                         1.89 TiB
>         Header Size:                    0.88%   16.98 GiB
>
> L2 ARC Evicts:
>         Lock Retries:                           1.27k
>         Upon Reading:                           2
>
> L2 ARC Breakdown:                               2.10b
>         Hit Ratio:                      32.89%  691.15m
>         Miss Ratio:                     67.11%  1.41b
>         Feeds:                                  3.70m
>
> L2 ARC Buffer:
>         Bytes Scanned:                          10.70 PiB
>         Buffer Iterations:                      3.70m
>         List Iterations:                        236.30m
>         NULL List Iterations:                   24.86m
>
> L2 ARC Writes:
>         Writes Sent:                    100.00% 3.38m
>
>
> Here we can see that the Header Size is currently almost 17 GiB.
> This header size grows continuously without (apparent) limit.
> Also, ZFS appears to think it is holding 1.89 TiB in the L2ARC, which
> seems very unlikely for a 373G cache device.
>
> # freebsd-version
> 10.1-RELEASE-p13
>
> # uname -a
> FreeBSD servername 10.1-RELEASE-p10 FreeBSD 10.1-RELEASE-p10 #0: Wed May 13 06:54:13 UTC 2015
> root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
>
> # uptime
> 4:35PM  up 42 days, 15:24, 1 user, load averages: 1.35, 0.96, 0.84
>
>
> Does anyone know how we can alleviate the issue?
> We originally thought the issue was caused by the bug described in
> https://www.freebsd.org/security/advisories/FreeBSD-EN-15:07.zfs.asc
>
> We have since updated our servers, but the header size still seems to
> keep growing. For reference, we have multiple FreeBSD fileservers, used
> mostly over NFS, all with identical configuration (but varying
> workloads). They all still show these symptoms.
>
> Any tips/hints/pointers are appreciated!
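One possible reading of the "16.0E free" value, offered as an assumption rather than a diagnosis: if a vdev's free space is computed as an unsigned 64-bit "size minus allocated", then an allocated figure larger than the device wraps around to just under 2^64 bytes, which is exactly 16 EiB. The Python sketch below emulates that subtraction with the da23 numbers from the report. Both helpers are hypothetical, and `human` only approximates zpool's formatting.

```python
# Sketch: emulate an unsigned 64-bit "size - alloc" for the da23 cache vdev.
# Assumption: zpool's free column behaves like uint64 subtraction.

UINT64_MOD = 2 ** 64  # exactly 16 EiB

SECTORS, SECTOR_SIZE = 781422768, 512   # diskinfo -v da23
DEVICE_BYTES = 400088457216             # diskinfo: mediasize in bytes
assert SECTORS * SECTOR_SIZE == DEVICE_BYTES

ALLOC_BYTES = int(1.67 * 1024 ** 4)     # "1.67T" alloc from zpool iostat


def unsigned_free(size, alloc):
    """Subtract like a uint64: negative results wrap modulo 2**64."""
    return (size - alloc) % UINT64_MOD


def human(n):
    """Format a byte count with binary suffixes, roughly like zpool does."""
    for suffix in "BKMGTPE":
        if n < 1024 or suffix == "E":
            return f"{n:.1f}{suffix}"
        n /= 1024


free = unsigned_free(DEVICE_BYTES, ALLOC_BYTES)
print("device:", human(DEVICE_BYTES))  # size of the 373G SSD
print("free:  ", human(free))          # wrapped-around free space
```

It prints 16.0E for the free space, matching the da23 line in the iostat output. That is at least consistent with the cache vdev's allocation accounting having run past the device's capacity, in line with the implausible 1.89 TiB L2ARC size.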
>
> With kind regards,
>
> Daniel

-- 
Kind regards,

Daniel Genis
Technical Staff Member
Byte Internet

W http://www.byte.nl/
E daniel@byte.nl
T 020 521 6226
F 020 521 6227
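The Python sketch below puts "grows continuously" into numbers. It derives an average growth rate from the original report's 16.98 GiB header size and its 42 days, 15:24 uptime. It assumes the header started near zero at boot and was sampled together with the uptime, and it makes no claim that growth is linear. The monotonicity check and its sample values are hypothetical. On FreeBSD, periodic readings could probably be taken from the sysctl kstat.zfs.misc.arcstats.l2_hdr_size, but treat that name as an assumption to verify on the system.

```python
from datetime import timedelta

# Figures from the original report: zfs-stats "Header Size" and uptime.
HEADER_GIB = 16.98
UPTIME = timedelta(days=42, hours=15, minutes=24)


def gib_per_day(header_gib, uptime):
    """Average growth in GiB/day, assuming the header was ~0 at boot."""
    return header_gib / (uptime.total_seconds() / 86400)


def grows_monotonically(samples):
    """True if no sample is smaller than the one before it."""
    return all(later >= earlier for earlier, later in zip(samples, samples[1:]))


rate = gib_per_day(HEADER_GIB, UPTIME)
print(f"average header growth: ~{rate:.2f} GiB/day")

# Hypothetical periodic readings in GiB (not taken from the thread); real ones
# might come from sysctl kstat.zfs.misc.arcstats.l2_hdr_size (assumed name).
hypothetical_samples = [16.10, 16.42, 16.71, 16.98]
print("monotonic growth:", grows_monotonically(hypothetical_samples))
```

Under those assumptions the average comes to roughly 0.40 GiB of header memory per day.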