From: Andriy Gapon
To: Peter, freebsd-stable@FreeBSD.org
Subject: Re: ZFS l2arc broken in 10.3
Date: Thu, 13 Oct 2016 14:23:25 +0300
Message-ID: <32ee460e-64ba-2270-d587-d27007bbb3ab@FreeBSD.org>

On 12/10/2016 23:18, Peter wrote:
> Details:
> After upgrading 2 machines from 9.3 to 10.3-STABLE, on one of them the
> l2arc stays empty (capacity alloc = 0), although it is online and gets
> accessed. It did work well on 9.3.
>
> I did the following tests:
>  * Create a zpool on a stick, with two volumes: one filesystem and one
>    cache. The cache stays with alloc=0.
>    Export it and move it into the other machine. The cache immediately
>    fills.
>    Move it back, the cache stays with alloc=0.
>    -> this rules out all zpool/zfs get/set options, as they should
>       walk with the pool.
>  * Boot the GENERIC kernel. l2arc stays with alloc=0.
>    -> this rules out all my nonstandard kernel options.
>  * Boot in single user mode. l2arc stays with alloc=0.
>    -> this rules out all /etc/* config files.
>  * Delete the zpool.cache and reimport pools. l2arc stays with alloc=0.
>  * Copy the /boot/loader.conf settings to the other machine. The l2arc
>    still works there.
>
> I could not think of any remaining place where this could come from,
> except the kernel code itself.
> From there, I found these counters nicely incrementing each second:
>  kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 50758
>  kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 27121
>  kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 40589375488
> But also this counter incrementing:
>  kstat.zfs.misc.arcstats.l2_write_full: 14604
>
> Then with some printf in the code I saw these values provided:
>               buf_sz = hdr->b_size;
>               align = (size_t)1 << dev->l2ad_vdev->vdev_ashift;
>               buf_a_sz = P2ROUNDUP(buf_sz, align);
>               if ((write_asize + buf_a_sz) > target_sz) {
>                       full = B_TRUE;
>                       mutex_exit(hash_lock);
>                       ARCSTAT_BUMP(arcstat_l2_write_full);
>                       break;
>               }
>
>  buf_sz = 1536
>  align = 512
>  buf_a_sz = 18446744069414585856
>  write_asize = 0
>  target_sz = 16777216
>
> where buf_a_sz is obviously off by (2^64 - 2^32).
>
> Maybe this is an effect of crosscompiling i386 on amd64.

Yes, the problem is specific to 32-bit platforms where size_t is 32-bit.

> But anyway, as long as
> i386 is still supported, it should not happen.

Certainly.

> Now, my real concern is: if this really obvious ... made it undetected until
> 10.3, how many other missing typecasts are still in the code??

No need to be dramatic here. That particular piece of code is very new.
I committed it to head in April (r297848), MFC-ed even later.
Apparently no one else who uses 32-bit systems and has L2ARC configured had a
chance to run into the bug.

Thank you very much for discovering and analyzing the bug and providing a fix
for it!

-- 
Andriy Gapon