Date: Wed, 12 Oct 2016 22:18:44 +0200
From: Peter <pmc@citylink.dinoex.sub.org>
To: freebsd-stable@FreeBSD.ORG
Subject: Re: ZFS l2arc broken in 10.3
Message-ID: <ntm5r4$473$1@oper.dinoex.de>
In-Reply-To: <ntlssq$6hj$2@oper.dinoex.de>
References: <ntlssq$6hj$2@oper.dinoex.de>
Details:

After upgrading two machines from 9.3 to 10.3-STABLE, on one of them the l2arc stays empty (capacity alloc = 0), although it is online and gets accessed. It worked well on 9.3.

I did the following tests:

* Create a zpool on a stick, with two volumes: one filesystem and one cache. The cache stays at alloc=0. Export it and move it into the other machine: the cache fills immediately. Move it back: the cache stays at alloc=0.
  -> This rules out all zpool/zfs get/set options, as they should travel with the pool.
* Boot the GENERIC kernel. The l2arc stays at alloc=0.
  -> This rules out all my nonstandard kernel options.
* Boot in single-user mode. The l2arc stays at alloc=0.
  -> This rules out all /etc/* config files.
* Delete zpool.cache and reimport the pools. The l2arc stays at alloc=0.
* Copy the /boot/loader.conf settings to the other machine. The l2arc still works there.

I could not think of any remaining place this could come from, except the kernel code itself. There I found these counters nicely incrementing every second:

    kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 50758
    kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 27121
    kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 40589375488

But this counter was incrementing as well:

    kstat.zfs.misc.arcstats.l2_write_full: 14604

Then, with some printf()s added to this code:

    buf_sz = hdr->b_size;
    align = (size_t)1 << dev->l2ad_vdev->vdev_ashift;
    buf_a_sz = P2ROUNDUP(buf_sz, align);

    if ((write_asize + buf_a_sz) > target_sz) {
            full = B_TRUE;
            mutex_exit(hash_lock);
            ARCSTAT_BUMP(arcstat_l2_write_full);
            break;
    }

I saw these values:

    buf_sz      = 1536
    align       = 512
    buf_a_sz    = 18446744069414585856
    write_asize = 0
    target_sz   = 16777216

where buf_a_sz is obviously off by (2^64 - 2^32). Maybe this is an effect of cross-compiling i386 on amd64. But anyway, as long as i386 is still supported, it should not happen.

Now, my real concern is: if this really obvious ... made it through undetected until 10.3, how many other missing typecasts are still in the code?