From: Peter
Subject: Re: ZFS l2arc broken in 10.3
Date: Wed, 12 Oct 2016 22:18:44 +0200
To: freebsd-stable@FreeBSD.ORG

Details:

After upgrading two machines from 9.3 to 10.3-STABLE, the l2arc on one
of them stays empty (capacity alloc = 0), although it is online and
gets accessed. It worked well on 9.3.

I ran the following tests:

* Create a zpool on a stick, with two volumes: one filesystem and one
  cache. The cache stays at alloc=0. Export it and move it to the other
  machine. The cache immediately fills. Move it back, and the cache
  stays at alloc=0.
  -> This rules out all zpool/zfs get/set options, as they should
     travel with the pool.

* Boot the GENERIC kernel. l2arc stays at alloc=0.
  -> This rules out all my nonstandard kernel options.

* Boot in single user mode. l2arc stays at alloc=0.
  -> This rules out all /etc/* config files.

* Delete the zpool.cache and reimport the pools.
  l2arc stays at alloc=0.

* Copy the /boot/loader.conf settings to the other machine.
  The l2arc still works there.

I could not think of any remaining place where this could come from,
except the kernel code itself.

Looking there, I found these counters nicely incrementing each second:

    kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 50758
    kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 27121
    kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 40589375488

But this counter was incrementing as well:

    kstat.zfs.misc.arcstats.l2_write_full: 14604

Then, with some printf calls added to the code, I saw these values:

    buf_sz = hdr->b_size;
    align = (size_t)1 << dev->l2ad_vdev->vdev_ashift;
    buf_a_sz = P2ROUNDUP(buf_sz, align);

    if ((write_asize + buf_a_sz) > target_sz) {
            full = B_TRUE;
            mutex_exit(hash_lock);
            ARCSTAT_BUMP(arcstat_l2_write_full);
            break;
    }

    buf_sz      = 1536
    align       = 512
    buf_a_sz    = 18446744069414585856
    write_asize = 0
    target_sz   = 16777216

Here buf_a_sz is obviously off by (2^64 - 2^32): the expected value is
1536, so the "full" check fires on every buffer and nothing is ever
written to the cache device.

Maybe this is an effect of cross-compiling i386 on amd64. But anyway,
as long as i386 is still supported, it should not happen.

Now, my real concern is: if this really obvious ... made it undetected
until 10.3, how many other missing typecasts are still in the code??
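
For anyone who wants to check the arithmetic, the printed value can be
reproduced in userland. This is only a sketch under assumptions, not the
kernel code: it assumes P2ROUNDUP is defined as in the illumos-derived
sysmacros.h, that buf_sz is a 64-bit quantity, and that align is 32 bits
wide (as size_t is on i386). The helper names roundup_mixed and
roundup_typed are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition, as in the illumos-derived sysmacros.h. */
#define P2ROUNDUP(x, align)	(-(-(x) & -(align)))

/* Variant that casts both operands to one type before negating. */
#define P2ROUNDUP_TYPED(x, align, type)	\
	(-(-(type)(x) & -(type)(align)))

/*
 * Hypothetical helper: a 64-bit size rounded up with a 32-bit alignment,
 * which stands in for size_t on i386.
 */
static uint64_t
roundup_mixed(uint64_t buf_sz, uint32_t align)
{
	/* The macro is only meaningful for power-of-two alignments. */
	assert(align != 0 && (align & (align - 1)) == 0);
	/*
	 * -(align) wraps within 32 bits and is then zero-extended to
	 * 64 bits, so the mask loses its upper 32 one-bits.
	 */
	return (P2ROUNDUP(buf_sz, align));
}

/* Hypothetical helper: same rounding, with both operands widened first. */
static uint64_t
roundup_typed(uint64_t buf_sz, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (P2ROUNDUP_TYPED(buf_sz, align, uint64_t));
}
```

With buf_sz = 1536 and align = 512, roundup_mixed() yields
18446744069414585856, the same value the printf showed, while
roundup_typed() yields 1536. If those assumptions hold for the kernel
code, the mixed operand widths alone are enough to explain the result.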