From: Ken Merry
To: Andriy Gapon
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r320156 - in head: cddl/contrib/opensolaris/cmd/zdb cddl/contrib/opensolaris/cmd/ztest cddl/contrib/opensolaris/lib/libzfs/common sys/cddl/contrib/opensolaris/common/zfs sys/cddl/contri...
Date: Tue, 20 Jun 2017 16:29:47 -0400
Message-Id: <81F84BCA-E973-4D78-B81C-1D398ADFA47E@freebsd.org>
In-Reply-To: <201706201739.v5KHdPhO051256@repo.freebsd.org>
References: <201706201739.v5KHdPhO051256@repo.freebsd.org>
List-Id: SVN commit messages for the src tree for head/-current

I don’t know for sure that this commit is the cause, but it (and r320153) are the only ZFS commits between a version of head from June 14th that boots off a ZFS mirror, and one that panics.
Here=E2=80=99s the stack trace: Fatal trap 12: page fault while in kernel mode cpuid =3D 22;=20 Fatal trap 12: page fault while in kernel mode cpuid =3D 9; apic id =3D 09 fault virtual address =3D 0x0 fault code =3D supervisor read data, page not present instruction pointer =3D 0x20:0xffffffff81e47f21 stack pointer =3D 0x28:0xfffffe08b37f8810 frame pointer =3D 0x28:0xfffffe08b37f8860 code segment =3D base 0x0, limit 0xfffff, type 0x1b =3D DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags =3D interrupt enabled, resume, IOPL =3D 0 current process =3D 0 (zio_free_issue_0_3) [ thread pid 0 tid 100478 ] Stopped at 0xffffffff81e47f21 =3D zio_vdev_io_start+0x1f1: testb = $0x1,(%rax) db> bt Tracing pid 0 tid 100478 td 0xfffff80193156000 zio_vdev_io_start() at 0xffffffff81e47f21 =3D = zio_vdev_io_start+0x1f1/frame 0xfffffe08b37f8860 zio_execute() at 0xffffffff81e4312c =3D zio_execute+0x36c/frame = 0xfffffe08b37f88b0 zio_nowait() at 0xffffffff81e422b8 =3D zio_nowait+0xb8/frame = 0xfffffe08b37f88e0 vdev_mirror_io_start() at 0xffffffff81e224fc =3D = vdev_mirror_io_start+0x38c/frame 0xfffffe08b37f8930 zio_vdev_io_start() at 0xffffffff81e48030 =3D = zio_vdev_io_start+0x300/frame 0xfffffe08b37f8990 zio_execute() at 0xffffffff81e4312c =3D zio_execute+0x36c/frame = 0xfffffe08b37f89e0 taskqueue_run_locked() at 0xffffffff809a9d6d =3D = taskqueue_run_locked+0x13d/frame 0xfffffe08b37f8a40 taskqueue_thread_loop() at 0xffffffff809aab28 =3D = taskqueue_thread_loop+0x88/frame 0xfffffe08b37f8a70 fork_exit() at 0xffffffff8091e3e4 =3D fork_exit+0x84/frame = 0xfffffe08b37f8ab0 fork_trampoline() at 0xffffffff80d930fe =3D fork_trampoline+0xe/frame = 0xfffffe08b37f8ab0 --- trap 0, rip =3D 0, rsp =3D 0, rbp =3D 0 --- db>=20 (kgdb) list *(zio_vdev_io_start+0x1f1) 0xd9f21 is in zio_vdev_io_start = (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/u= ts/common/fs/zfs/zio.c:350). 
345 346 /* 347 * Ensure that anyone expecting this zio to contain a = linear ABD isn't 348 * going to get a nasty surprise when they try to access = the data. 349 */ 350 IMPLY(abd_is_linear(zio->io_abd), abd_is_linear(data)); 351 352 zt->zt_orig_abd =3D zio->io_abd; 353 zt->zt_orig_size =3D zio->io_size; 354 zt->zt_bufsize =3D bufsize; I=E2=80=99ll try rebooting and see if the problem goes away. If not, = I=E2=80=99ll roll back the ABD change and see if the problem goes away. Ken =E2=80=94=20 Ken Merry ken@FreeBSD.ORG > On Jun 20, 2017, at 1:39 PM, Andriy Gapon wrote: >=20 > Author: avg > Date: Tue Jun 20 17:39:24 2017 > New Revision: 320156 > URL: https://svnweb.freebsd.org/changeset/base/320156 >=20 > Log: > MFV r318946: 8021 ARC buf data scatter-ization >=20 > illumos/illumos-gate@770499e185d15678ccb0be57ebc626ad18d93383 > = https://github.com/illumos/illumos-gate/commit/770499e185d15678ccb0be57ebc= 626ad18d93383 >=20 > https://www.illumos.org/issues/8021 > The ARC buf data project (known simply as "ABD" since its genesis = in the ZoL > community) changes the way the ARC allocates `b_pdata` memory from = using linear > `void *` buffers to using scatter/gather lists of fixed-size 1KB = chunks. This > improves ZFS's performance by helping to defragment the address = space occupied > by the ARC, in particular for cases where compressed ARC is = enabled. It could > also ease future work to allocate pages directly from `segkpm` for = minimal- > overhead memory allocations, bypassing the `kmem` subsystem. > This is essentially the same change as the one which recently = landed in ZFS on > Linux, although they made some platform-specific changes while = adapting this > work to their codebase: > 1. Implemented the equivalent of the `segkpm` suggestion for future = work > mentioned above to bypass issues that they've had with the Linux = kernel memory > allocator. > 2. 
Changed the internal representation of the ABD's scatter/gather = list so it > could be used to pass I/O directly into Linux block device drivers. = (This > feature is not available in the illumos block device interface = yet.) >=20 > FreeBSD notes: > - the actual (default) chunk size is 4KB (despite the text above = saying 1KB) > - we can try to reimplement ABDs, so that they are not permanently > mapped into the KVA unless explicitly requested, especially on > platforms with scarce KVA > - we can try to use unmapped I/O and avoid intermediate allocation of = a > linear, virtual memory mapped buffer > - we can try to avoid extra data copying by referring to chunks / = pages > in the original ABD >=20 > Reviewed by: Matthew Ahrens > Reviewed by: George Wilson > Reviewed by: Paul Dagnelie > Reviewed by: John Kennedy > Reviewed by: Prakash Surya > Reviewed by: Prashanth Sreenivasa > Reviewed by: Pavel Zakharov > Reviewed by: Chris Williamson > Approved by: Richard Lowe > Author: Dan Kimmel >=20 > MFC after: 3 weeks >=20 > Added: > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c > - copied, changed from r318946, = vendor-sys/illumos/dist/uts/common/fs/zfs/abd.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/abd.h > - copied, changed from r318946, = vendor-sys/illumos/dist/uts/common/fs/zfs/sys/abd.h > Modified: > head/cddl/contrib/opensolaris/cmd/zdb/zdb.c > head/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c > head/cddl/contrib/opensolaris/cmd/ztest/ztest.c > head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c > head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c > head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h > head/sys/cddl/contrib/opensolaris/uts/common/Makefile.files > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/blkptr.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c > 
head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/edonr_zfs.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lz4.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sha256.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/skein_zfs.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/ddt.h > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h > = head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_checksum.h > = head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_compress.h > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_cache.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_checksum.c > head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c > head/sys/conf/files > Directory Properties: > head/cddl/contrib/opensolaris/ (props changed) > head/cddl/contrib/opensolaris/cmd/zdb/ (props changed) > head/cddl/contrib/opensolaris/lib/libzfs/ (props changed) > 
head/sys/cddl/contrib/opensolaris/ (props changed) >=20 > Modified: head/cddl/contrib/opensolaris/cmd/zdb/zdb.c > = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D > --- head/cddl/contrib/opensolaris/cmd/zdb/zdb.c Tue Jun 20 = 17:38:25 2017 (r320155) > +++ head/cddl/contrib/opensolaris/cmd/zdb/zdb.c Tue Jun 20 = 17:39:24 2017 (r320156) > @@ -59,6 +59,7 @@ > #include > #include > #include > +#include > #include > #undef verify > #include > @@ -2410,7 +2411,7 @@ zdb_blkptr_done(zio_t *zio) > zdb_cb_t *zcb =3D zio->io_private; > zbookmark_phys_t *zb =3D &zio->io_bookmark; >=20 > - zio_data_buf_free(zio->io_data, zio->io_size); > + abd_free(zio->io_abd); >=20 > mutex_enter(&spa->spa_scrub_lock); > spa->spa_scrub_inflight--; > @@ -2477,7 +2478,7 @@ zdb_blkptr_cb(spa_t *spa, zilog_t *zilog, const = blkptr > if (!BP_IS_EMBEDDED(bp) && > (dump_opt['c'] > 1 || (dump_opt['c'] && is_metadata))) { > size_t size =3D BP_GET_PSIZE(bp); > - void *data =3D zio_data_buf_alloc(size); > + abd_t *abd =3D abd_alloc(size, B_FALSE); > int flags =3D ZIO_FLAG_CANFAIL | ZIO_FLAG_SCRUB | = ZIO_FLAG_RAW; >=20 > /* If it's an intent log block, failure is expected. */ > @@ -2490,7 +2491,7 @@ zdb_blkptr_cb(spa_t *spa, zilog_t *zilog, const = blkptr > spa->spa_scrub_inflight++; > mutex_exit(&spa->spa_scrub_lock); >=20 > - zio_nowait(zio_read(NULL, spa, bp, data, size, > + zio_nowait(zio_read(NULL, spa, bp, abd, size, > zdb_blkptr_done, zcb, ZIO_PRIORITY_ASYNC_READ, = flags, zb)); > } >=20 > @@ -3270,6 +3271,13 @@ name: > return (NULL); > } >=20 > +/* ARGSUSED */ > +static int > +random_get_pseudo_bytes_cb(void *buf, size_t len, void *unused) > +{ > + return (random_get_pseudo_bytes(buf, len)); > +} > + > /* > * Read a block from a pool and print it out. 
The syntax of the > * block descriptor is: > @@ -3301,7 +3309,8 @@ zdb_read_block(char *thing, spa_t *spa) > uint64_t offset =3D 0, size =3D 0, psize =3D 0, lsize =3D 0, = blkptr_offset =3D 0; > zio_t *zio; > vdev_t *vd; > - void *pbuf, *lbuf, *buf; > + abd_t *pabd; > + void *lbuf, *buf; > char *s, *p, *dup, *vdev, *flagstr; > int i, error; >=20 > @@ -3373,7 +3382,7 @@ zdb_read_block(char *thing, spa_t *spa) > psize =3D size; > lsize =3D size; >=20 > - pbuf =3D umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL); > + pabd =3D abd_alloc_linear(SPA_MAXBLOCKSIZE, B_FALSE); > lbuf =3D umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL); >=20 > BP_ZERO(bp); > @@ -3401,15 +3410,15 @@ zdb_read_block(char *thing, spa_t *spa) > /* > * Treat this as a normal block read. > */ > - zio_nowait(zio_read(zio, spa, bp, pbuf, psize, NULL, = NULL, > + zio_nowait(zio_read(zio, spa, bp, pabd, psize, NULL, = NULL, > ZIO_PRIORITY_SYNC_READ, > ZIO_FLAG_CANFAIL | ZIO_FLAG_RAW, NULL)); > } else { > /* > * Treat this as a vdev child I/O. 
> */ > - zio_nowait(zio_vdev_child_io(zio, bp, vd, offset, pbuf, = psize, > - ZIO_TYPE_READ, ZIO_PRIORITY_SYNC_READ, > + zio_nowait(zio_vdev_child_io(zio, bp, vd, offset, pabd, > + psize, ZIO_TYPE_READ, ZIO_PRIORITY_SYNC_READ, > ZIO_FLAG_DONT_CACHE | ZIO_FLAG_DONT_QUEUE | > ZIO_FLAG_DONT_PROPAGATE | ZIO_FLAG_DONT_RETRY | > ZIO_FLAG_CANFAIL | ZIO_FLAG_RAW, NULL, NULL)); > @@ -3432,21 +3441,21 @@ zdb_read_block(char *thing, spa_t *spa) > void *pbuf2 =3D umem_alloc(SPA_MAXBLOCKSIZE, = UMEM_NOFAIL); > void *lbuf2 =3D umem_alloc(SPA_MAXBLOCKSIZE, = UMEM_NOFAIL); >=20 > - bcopy(pbuf, pbuf2, psize); > + abd_copy_to_buf(pbuf2, pabd, psize); >=20 > - VERIFY(random_get_pseudo_bytes((uint8_t *)pbuf + psize, > - SPA_MAXBLOCKSIZE - psize) =3D=3D 0); > + VERIFY0(abd_iterate_func(pabd, psize, SPA_MAXBLOCKSIZE - = psize, > + random_get_pseudo_bytes_cb, NULL)); >=20 > - VERIFY(random_get_pseudo_bytes((uint8_t *)pbuf2 + psize, > - SPA_MAXBLOCKSIZE - psize) =3D=3D 0); > + VERIFY0(random_get_pseudo_bytes((uint8_t *)pbuf2 + = psize, > + SPA_MAXBLOCKSIZE - psize)); >=20 > for (lsize =3D SPA_MAXBLOCKSIZE; lsize > psize; > lsize -=3D SPA_MINBLOCKSIZE) { > for (c =3D 0; c < ZIO_COMPRESS_FUNCTIONS; c++) { > - if (zio_decompress_data(c, pbuf, lbuf, > - psize, lsize) =3D=3D 0 && > - zio_decompress_data(c, pbuf2, lbuf2, > - psize, lsize) =3D=3D 0 && > + if (zio_decompress_data(c, pabd, > + lbuf, psize, lsize) =3D=3D 0 && > + zio_decompress_data_buf(c, pbuf2, > + lbuf2, psize, lsize) =3D=3D 0 && > bcmp(lbuf, lbuf2, lsize) =3D=3D 0) > break; > } > @@ -3465,7 +3474,7 @@ zdb_read_block(char *thing, spa_t *spa) > buf =3D lbuf; > size =3D lsize; > } else { > - buf =3D pbuf; > + buf =3D abd_to_buf(pabd); > size =3D psize; > } >=20 > @@ -3483,7 +3492,7 @@ zdb_read_block(char *thing, spa_t *spa) > zdb_dump_block(thing, buf, size, flags); >=20 > out: > - umem_free(pbuf, SPA_MAXBLOCKSIZE); > + abd_free(pabd); > umem_free(lbuf, SPA_MAXBLOCKSIZE); > free(dup); > } >=20 > Modified: 
head/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c > = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D > --- head/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c Tue Jun 20 = 17:38:25 2017 (r320155) > +++ head/cddl/contrib/opensolaris/cmd/zdb/zdb_il.c Tue Jun 20 = 17:39:24 2017 (r320156) > @@ -24,7 +24,7 @@ > */ >=20 > /* > - * Copyright (c) 2013, 2014 by Delphix. All rights reserved. > + * Copyright (c) 2013, 2016 by Delphix. All rights reserved. > */ >=20 > /* > @@ -41,6 +41,7 @@ > #include > #include > #include > +#include >=20 > extern uint8_t dump_opt[256]; >=20 > @@ -117,13 +118,27 @@ zil_prt_rec_rename(zilog_t *zilog, int txtype, = lr_rena > } >=20 > /* ARGSUSED */ > +static int > +zil_prt_rec_write_cb(void *data, size_t len, void *unused) > +{ > + char *cdata =3D data; > + for (int i =3D 0; i < len; i++) { > + if (isprint(*cdata)) > + (void) printf("%c ", *cdata); > + else > + (void) printf("%2X", *cdata); > + cdata++; > + } > + return (0); > +} > + > +/* ARGSUSED */ > static void > zil_prt_rec_write(zilog_t *zilog, int txtype, lr_write_t *lr) > { > - char *data, *dlimit; > + abd_t *data; > blkptr_t *bp =3D &lr->lr_blkptr; > zbookmark_phys_t zb; > - char buf[SPA_MAXBLOCKSIZE]; > int verbose =3D MAX(dump_opt['d'], dump_opt['i']); > int error; >=20 > @@ -144,7 +159,6 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, = lr_write > if (BP_IS_HOLE(bp)) { > (void) printf("\t\t\tLSIZE 0x%llx\n", > (u_longlong_t)BP_GET_LSIZE(bp)); > - bzero(buf, sizeof (buf)); > (void) printf("%s\n", prefix); > return; > } > @@ -157,28 +171,26 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, = lr_write > lr->lr_foid, ZB_ZIL_LEVEL, > lr->lr_offset / BP_GET_LSIZE(bp)); >=20 > + data =3D abd_alloc(BP_GET_LSIZE(bp), B_FALSE); > error =3D zio_wait(zio_read(NULL, zilog->zl_spa, > - bp, buf, BP_GET_LSIZE(bp), 
NULL, NULL, > + bp, data, BP_GET_LSIZE(bp), NULL, NULL, > ZIO_PRIORITY_SYNC_READ, ZIO_FLAG_CANFAIL, &zb)); > if (error) > - return; > - data =3D buf; > + goto out; > } else { > - data =3D (char *)(lr + 1); > + /* data is stored after the end of the lr_write record = */ > + data =3D abd_alloc(lr->lr_length, B_FALSE); > + abd_copy_from_buf(data, lr + 1, lr->lr_length); > } >=20 > - dlimit =3D data + MIN(lr->lr_length, > - (verbose < 6 ? 20 : SPA_MAXBLOCKSIZE)); > - > (void) printf("%s", prefix); > - while (data < dlimit) { > - if (isprint(*data)) > - (void) printf("%c ", *data); > - else > - (void) printf("%2X", *data); > - data++; > - } > + (void) abd_iterate_func(data, > + 0, MIN(lr->lr_length, (verbose < 6 ? 20 : = SPA_MAXBLOCKSIZE)), > + zil_prt_rec_write_cb, NULL); > (void) printf("\n"); > + > +out: > + abd_free(data); > } >=20 > /* ARGSUSED */ >=20 > Modified: head/cddl/contrib/opensolaris/cmd/ztest/ztest.c > = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D > --- head/cddl/contrib/opensolaris/cmd/ztest/ztest.c Tue Jun 20 = 17:38:25 2017 (r320155) > +++ head/cddl/contrib/opensolaris/cmd/ztest/ztest.c Tue Jun 20 = 17:39:24 2017 (r320156) > @@ -112,6 +112,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -190,6 +191,7 @@ extern uint64_t metaslab_df_alloc_threshold; > extern uint64_t zfs_deadman_synctime_ms; > extern int metaslab_preload_limit; > extern boolean_t zfs_compressed_arc_enabled; > +extern boolean_t zfs_abd_scatter_enabled; >=20 > static ztest_shared_opts_t *ztest_shared_opts; > static ztest_shared_opts_t ztest_opts; > @@ -5042,7 +5044,7 @@ ztest_ddt_repair(ztest_ds_t *zd, uint64_t id) > enum zio_checksum checksum =3D spa_dedup_checksum(spa); > dmu_buf_t *db; > dmu_tx_t *tx; > - void *buf; > + abd_t *abd; > blkptr_t blk; > int 
copies =3D 2 * ZIO_DEDUPDITTO_MIN; >=20 > @@ -5122,14 +5124,14 @@ ztest_ddt_repair(ztest_ds_t *zd, uint64_t id) > * Damage the block. Dedup-ditto will save us when we read it = later. > */ > psize =3D BP_GET_PSIZE(&blk); > - buf =3D zio_buf_alloc(psize); > - ztest_pattern_set(buf, psize, ~pattern); > + abd =3D abd_alloc_linear(psize, B_TRUE); > + ztest_pattern_set(abd_to_buf(abd), psize, ~pattern); >=20 > (void) zio_wait(zio_rewrite(NULL, spa, 0, &blk, > - buf, psize, NULL, NULL, ZIO_PRIORITY_SYNC_WRITE, > + abd, psize, NULL, NULL, ZIO_PRIORITY_SYNC_WRITE, > ZIO_FLAG_CANFAIL | ZIO_FLAG_INDUCE_DAMAGE, NULL)); >=20 > - zio_buf_free(buf, psize); > + abd_free(abd); >=20 > (void) rw_unlock(&ztest_name_lock); > } > @@ -5413,6 +5415,12 @@ ztest_resume_thread(void *arg) > */ > if (ztest_random(10) =3D=3D 0) > zfs_compressed_arc_enabled =3D ztest_random(2); > + > + /* > + * Periodically change the zfs_abd_scatter_enabled = setting. > + */ > + if (ztest_random(10) =3D=3D 0) > + zfs_abd_scatter_enabled =3D ztest_random(2); > } > return (NULL); > } >=20 > Modified: = head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c > = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D > --- head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c = Tue Jun 20 17:38:25 2017 (r320155) > +++ head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c = Tue Jun 20 17:39:24 2017 (r320156) > @@ -199,19 +199,19 @@ dump_record(dmu_replay_record_t *drr, void = *payload, i > { > ASSERT3U(offsetof(dmu_replay_record_t, = drr_u.drr_checksum.drr_checksum), > =3D=3D, sizeof (dmu_replay_record_t) - sizeof = (zio_cksum_t)); > - fletcher_4_incremental_native(drr, > + (void) fletcher_4_incremental_native(drr, > offsetof(dmu_replay_record_t, = drr_u.drr_checksum.drr_checksum), zc); > if 
(drr->drr_type !=3D DRR_BEGIN) { > ASSERT(ZIO_CHECKSUM_IS_ZERO(&drr->drr_u. > drr_checksum.drr_checksum)); > drr->drr_u.drr_checksum.drr_checksum =3D *zc; > } > - = fletcher_4_incremental_native(&drr->drr_u.drr_checksum.drr_checksum, > - sizeof (zio_cksum_t), zc); > + (void) fletcher_4_incremental_native( > + &drr->drr_u.drr_checksum.drr_checksum, sizeof (zio_cksum_t), = zc); > if (write(outfd, drr, sizeof (*drr)) =3D=3D -1) > return (errno); > if (payload_len !=3D 0) { > - fletcher_4_incremental_native(payload, payload_len, zc); > + (void) fletcher_4_incremental_native(payload, = payload_len, zc); > if (write(outfd, payload, payload_len) =3D=3D -1) > return (errno); > } > @@ -2096,9 +2096,9 @@ recv_read(libzfs_handle_t *hdl, int fd, void = *buf, int >=20 > if (zc) { > if (byteswap) > - fletcher_4_incremental_byteswap(buf, ilen, zc); > + (void) fletcher_4_incremental_byteswap(buf, = ilen, zc); > else > - fletcher_4_incremental_native(buf, ilen, zc); > + (void) fletcher_4_incremental_native(buf, ilen, = zc); > } > return (0); > } > @@ -3688,7 +3688,8 @@ zfs_receive_impl(libzfs_handle_t *hdl, const = char *tos > * recv_read() above; do it again correctly. 
> */ > bzero(&zcksum, sizeof (zio_cksum_t)); > - fletcher_4_incremental_byteswap(&drr, sizeof (drr), = &zcksum); > + (void) fletcher_4_incremental_byteswap(&drr, > + sizeof (drr), &zcksum); > flags->byteswap =3D B_TRUE; >=20 > drr.drr_type =3D BSWAP_32(drr.drr_type); >=20 > Modified: head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c > = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D > --- head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c = Tue Jun 20 17:38:25 2017 (r320155) > +++ head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c = Tue Jun 20 17:39:24 2017 (r320156) > @@ -24,6 +24,7 @@ > */ > /* > * Copyright 2013 Saso Kiselkov. All rights reserved. > + * Copyright (c) 2016 by Delphix. All rights reserved. > */ >=20 > /* > @@ -133,17 +134,29 @@ > #include > #include > #include > +#include >=20 > -/*ARGSUSED*/ > void > -fletcher_2_native(const void *buf, uint64_t size, > - const void *ctx_template, zio_cksum_t *zcp) > +fletcher_init(zio_cksum_t *zcp) > { > + ZIO_SET_CHECKSUM(zcp, 0, 0, 0, 0); > +} > + > +int > +fletcher_2_incremental_native(void *buf, size_t size, void *data) > +{ > + zio_cksum_t *zcp =3D data; > + > const uint64_t *ip =3D buf; > const uint64_t *ipend =3D ip + (size / sizeof (uint64_t)); > uint64_t a0, b0, a1, b1; >=20 > - for (a0 =3D b0 =3D a1 =3D b1 =3D 0; ip < ipend; ip +=3D 2) { > + a0 =3D zcp->zc_word[0]; > + a1 =3D zcp->zc_word[1]; > + b0 =3D zcp->zc_word[2]; > + b1 =3D zcp->zc_word[3]; > + > + for (; ip < ipend; ip +=3D 2) { > a0 +=3D ip[0]; > a1 +=3D ip[1]; > b0 +=3D a0; > @@ -151,18 +164,33 @@ fletcher_2_native(const void *buf, uint64_t = size, > } >=20 > ZIO_SET_CHECKSUM(zcp, a0, a1, b0, b1); > + return (0); > } >=20 > /*ARGSUSED*/ > void > -fletcher_2_byteswap(const void *buf, uint64_t size, > +fletcher_2_native(const 
void *buf, size_t size, > const void *ctx_template, zio_cksum_t *zcp) > { > + fletcher_init(zcp); > + (void) fletcher_2_incremental_native((void *) buf, size, zcp); > +} > + > +int > +fletcher_2_incremental_byteswap(void *buf, size_t size, void *data) > +{ > + zio_cksum_t *zcp =3D data; > + > const uint64_t *ip =3D buf; > const uint64_t *ipend =3D ip + (size / sizeof (uint64_t)); > uint64_t a0, b0, a1, b1; >=20 > - for (a0 =3D b0 =3D a1 =3D b1 =3D 0; ip < ipend; ip +=3D 2) { > + a0 =3D zcp->zc_word[0]; > + a1 =3D zcp->zc_word[1]; > + b0 =3D zcp->zc_word[2]; > + b1 =3D zcp->zc_word[3]; > + > + for (; ip < ipend; ip +=3D 2) { > a0 +=3D BSWAP_64(ip[0]); > a1 +=3D BSWAP_64(ip[1]); > b0 +=3D a0; > @@ -170,50 +198,23 @@ fletcher_2_byteswap(const void *buf, uint64_t = size, > } >=20 > ZIO_SET_CHECKSUM(zcp, a0, a1, b0, b1); > + return (0); > } >=20 > /*ARGSUSED*/ > void > -fletcher_4_native(const void *buf, uint64_t size, > +fletcher_2_byteswap(const void *buf, size_t size, > const void *ctx_template, zio_cksum_t *zcp) > { > - const uint32_t *ip =3D buf; > - const uint32_t *ipend =3D ip + (size / sizeof (uint32_t)); > - uint64_t a, b, c, d; > - > - for (a =3D b =3D c =3D d =3D 0; ip < ipend; ip++) { > - a +=3D ip[0]; > - b +=3D a; > - c +=3D b; > - d +=3D c; > - } > - > - ZIO_SET_CHECKSUM(zcp, a, b, c, d); > + fletcher_init(zcp); > + (void) fletcher_2_incremental_byteswap((void *) buf, size, zcp); > } >=20 > -/*ARGSUSED*/ > -void > -fletcher_4_byteswap(const void *buf, uint64_t size, > - const void *ctx_template, zio_cksum_t *zcp) > +int > +fletcher_4_incremental_native(void *buf, size_t size, void *data) > { > - const uint32_t *ip =3D buf; > - const uint32_t *ipend =3D ip + (size / sizeof (uint32_t)); > - uint64_t a, b, c, d; > + zio_cksum_t *zcp =3D data; >=20 > - for (a =3D b =3D c =3D d =3D 0; ip < ipend; ip++) { > - a +=3D BSWAP_32(ip[0]); > - b +=3D a; > - c +=3D b; > - d +=3D c; > - } > - > - ZIO_SET_CHECKSUM(zcp, a, b, c, d); > -} > - > -void > 
-fletcher_4_incremental_native(const void *buf, uint64_t size, > - zio_cksum_t *zcp) > -{ > const uint32_t *ip =3D buf; > const uint32_t *ipend =3D ip + (size / sizeof (uint32_t)); > uint64_t a, b, c, d; > @@ -231,12 +232,23 @@ fletcher_4_incremental_native(const void *buf, = uint64_ > } >=20 > ZIO_SET_CHECKSUM(zcp, a, b, c, d); > + return (0); > } >=20 > +/*ARGSUSED*/ > void > -fletcher_4_incremental_byteswap(const void *buf, uint64_t size, > - zio_cksum_t *zcp) > +fletcher_4_native(const void *buf, size_t size, > + const void *ctx_template, zio_cksum_t *zcp) > { > + fletcher_init(zcp); > + (void) fletcher_4_incremental_native((void *) buf, size, zcp); > +} > + > +int > +fletcher_4_incremental_byteswap(void *buf, size_t size, void *data) > +{ > + zio_cksum_t *zcp =3D data; > + > const uint32_t *ip =3D buf; > const uint32_t *ipend =3D ip + (size / sizeof (uint32_t)); > uint64_t a, b, c, d; > @@ -254,4 +266,14 @@ fletcher_4_incremental_byteswap(const void *buf, = uint6 > } >=20 > ZIO_SET_CHECKSUM(zcp, a, b, c, d); > + return (0); > +} > + > +/*ARGSUSED*/ > +void > +fletcher_4_byteswap(const void *buf, size_t size, > + const void *ctx_template, zio_cksum_t *zcp) > +{ > + fletcher_init(zcp); > + (void) fletcher_4_incremental_byteswap((void *) buf, size, zcp); > } >=20 > Modified: head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h > = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D > --- head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h = Tue Jun 20 17:38:25 2017 (r320155) > +++ head/sys/cddl/contrib/opensolaris/common/zfs/zfs_fletcher.h = Tue Jun 20 17:39:24 2017 (r320156) > @@ -24,6 +24,7 @@ > */ > /* > * Copyright 2013 Saso Kiselkov. All rights reserved. > + * Copyright (c) 2016 by Delphix. All rights reserved. 
> */ >=20 > #ifndef _ZFS_FLETCHER_H > @@ -40,12 +41,15 @@ extern "C" { > * fletcher checksum functions > */ >=20 > -void fletcher_2_native(const void *, uint64_t, const void *, = zio_cksum_t *); > -void fletcher_2_byteswap(const void *, uint64_t, const void *, = zio_cksum_t *); > -void fletcher_4_native(const void *, uint64_t, const void *, = zio_cksum_t *); > -void fletcher_4_byteswap(const void *, uint64_t, const void *, = zio_cksum_t *); > -void fletcher_4_incremental_native(const void *, uint64_t, = zio_cksum_t *); > -void fletcher_4_incremental_byteswap(const void *, uint64_t, = zio_cksum_t *); > +void fletcher_init(zio_cksum_t *); > +void fletcher_2_native(const void *, size_t, const void *, = zio_cksum_t *); > +void fletcher_2_byteswap(const void *, size_t, const void *, = zio_cksum_t *); > +int fletcher_2_incremental_native(void *, size_t, void *); > +int fletcher_2_incremental_byteswap(void *, size_t, void *); > +void fletcher_4_native(const void *, size_t, const void *, = zio_cksum_t *); > +void fletcher_4_byteswap(const void *, size_t, const void *, = zio_cksum_t *); > +int fletcher_4_incremental_native(void *, size_t, void *); > +int fletcher_4_incremental_byteswap(void *, size_t, void *); >=20 > #ifdef __cplusplus > } >=20 > Modified: head/sys/cddl/contrib/opensolaris/uts/common/Makefile.files > = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D > --- head/sys/cddl/contrib/opensolaris/uts/common/Makefile.files = Tue Jun 20 17:38:25 2017 (r320155) > +++ head/sys/cddl/contrib/opensolaris/uts/common/Makefile.files = Tue Jun 20 17:39:24 2017 (r320156) > @@ -33,6 +33,7 @@ > # common to all SunOS systems. 
>=20 > ZFS_COMMON_OBJS +=3D \ > + abd.o \ > arc.o \ > bplist.o \ > blkptr.o \ >=20 > Copied and modified: = head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c (from r318946, = vendor-sys/illumos/dist/uts/common/fs/zfs/abd.c) > = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D > --- vendor-sys/illumos/dist/uts/common/fs/zfs/abd.c Fri May 26 = 12:13:27 2017 (r318946, copy source) > +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c Tue Jun = 20 17:39:24 2017 (r320156) > @@ -174,6 +174,7 @@ abd_free_chunk(void *c) > void > abd_init(void) > { > +#ifdef illumos > vmem_t *data_alloc_arena =3D NULL; >=20 > #ifdef _KERNEL > @@ -186,7 +187,10 @@ abd_init(void) > */ > abd_chunk_cache =3D kmem_cache_create("abd_chunk", = zfs_abd_chunk_size, 0, > NULL, NULL, NULL, NULL, data_alloc_arena, KMC_NOTOUCH); > - > +#else > + abd_chunk_cache =3D kmem_cache_create("abd_chunk", = zfs_abd_chunk_size, 0, > + NULL, NULL, NULL, NULL, 0, KMC_NOTOUCH | KMC_NODEBUG); > +#endif > abd_ksp =3D kstat_create("zfs", 0, "abdstats", "misc", = KSTAT_TYPE_NAMED, > sizeof (abd_stats) / sizeof (kstat_named_t), = KSTAT_FLAG_VIRTUAL); > if (abd_ksp !=3D NULL) { >=20 > Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c > = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D > --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c Tue Jun = 20 17:38:25 2017 (r320155) > +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c Tue Jun = 20 17:39:24 2017 (r320156) > @@ -128,14 +128,14 @@ > * the arc_buf_hdr_t that will point to the data block in memory. 
A = block can > * only be read by a consumer if it has an l1arc_buf_hdr_t. The L1ARC > * caches data in two ways -- in a list of ARC buffers (arc_buf_t) and > - * also in the arc_buf_hdr_t's private physical data block pointer = (b_pdata). > + * also in the arc_buf_hdr_t's private physical data block pointer = (b_pabd). > * > * The L1ARC's data pointer may or may not be uncompressed. The ARC = has the > - * ability to store the physical data (b_pdata) associated with the = DVA of the > - * arc_buf_hdr_t. Since the b_pdata is a copy of the on-disk physical = block, > + * ability to store the physical data (b_pabd) associated with the = DVA of the > + * arc_buf_hdr_t. Since the b_pabd is a copy of the on-disk physical = block, > * it will match its on-disk compression characteristics. This = behavior can be > * disabled by setting 'zfs_compressed_arc_enabled' to B_FALSE. When = the > - * compressed ARC functionality is disabled, the b_pdata will point = to an > + * compressed ARC functionality is disabled, the b_pabd will point to = an > * uncompressed version of the on-disk data. > * > * Data in the L1ARC is not accessed by consumers of the ARC directly. = Each > @@ -174,7 +174,7 @@ > * | l1arc_buf_hdr_t > * | | arc_buf_t > * | b_buf +------------>+-----------+ arc_buf_t > - * | b_pdata +-+ |b_next +---->+-----------+ > + * | b_pabd +-+ |b_next +---->+-----------+ > * +-----------+ | |-----------| |b_next +-->NULL > * | |b_comp =3D T | +-----------+ > * | |b_data +-+ |b_comp =3D F | > @@ -191,8 +191,8 @@ > * When a consumer reads a block, the ARC must first look to see if = the > * arc_buf_hdr_t is cached. 
If the hdr is cached then the ARC allocates a new
>   * arc_buf_t and either copies uncompressed data into a new data buffer from an
> - * existing uncompressed arc_buf_t, decompresses the hdr's b_pdata buffer into a
> - * new data buffer, or shares the hdr's b_pdata buffer, depending on whether the
> + * existing uncompressed arc_buf_t, decompresses the hdr's b_pabd buffer into a
> + * new data buffer, or shares the hdr's b_pabd buffer, depending on whether the
>   * hdr is compressed and the desired compression characteristics of the
>   * arc_buf_t consumer. If the arc_buf_t ends up sharing data with the
>   * arc_buf_hdr_t and both of them are uncompressed then the arc_buf_t must be
> @@ -216,7 +216,7 @@
>   * | | arc_buf_t (shared)
>   * | b_buf +------------>+---------+ arc_buf_t
>   * | | |b_next +---->+---------+
> - * | b_pdata +-+ |---------| |b_next +-->NULL
> + * | b_pabd +-+ |---------| |b_next +-->NULL
>   * +-----------+ | | | +---------+
>   * | |b_data +-+ | |
>   * | +---------+ | |b_data +-+
> @@ -230,19 +230,19 @@
>   * | +------+ |
>   * +---------------------------------+
>   *
> - * Writing to the ARC requires that the ARC first discard the hdr's b_pdata
> + * Writing to the ARC requires that the ARC first discard the hdr's b_pabd
>   * since the physical block is about to be rewritten. The new data contents
>   * will be contained in the arc_buf_t. As the I/O pipeline performs the write,
>   * it may compress the data before writing it to disk. The ARC will be called
>   * with the transformed data and will bcopy the transformed on-disk block into
> - * a newly allocated b_pdata. Writes are always done into buffers which have
> + * a newly allocated b_pabd. Writes are always done into buffers which have
>   * either been loaned (and hence are new and don't have other readers) or
>   * buffers which have been released (and hence have their own hdr, if there
>   * were originally other readers of the buf's original hdr).
This ensures that
>   * the ARC only needs to update a single buf and its hdr after a write occurs.
>   *
> - * When the L2ARC is in use, it will also take advantage of the b_pdata. The
> - * L2ARC will always write the contents of b_pdata to the L2ARC. This means
> + * When the L2ARC is in use, it will also take advantage of the b_pabd. The
> + * L2ARC will always write the contents of b_pabd to the L2ARC. This means
>   * that when compressed ARC is enabled that the L2ARC blocks are identical
>   * to the on-disk block in the main data pool. This provides a significant
>   * advantage since the ARC can leverage the bp's checksum when reading from the
> @@ -263,7 +263,9 @@
>  #include
>  #include
>  #include
> +#include
>  #include
> +#include
>  #ifdef _KERNEL
>  #include
>  #include
> @@ -307,7 +309,7 @@ int zfs_arc_evict_batch_limit = 10;
>  /* number of seconds before growing cache again */
>  static int arc_grow_retry = 60;
>
> -/* shift of arc_c for calculating overflow limit in arc_get_data_buf */
> +/* shift of arc_c for calculating overflow limit in arc_get_data_impl */
>  int zfs_arc_overflow_shift = 8;
>
>  /* shift of arc_c for calculating both min and max arc_p */
> @@ -543,13 +545,13 @@ typedef struct arc_stats {
>      kstat_named_t arcstat_c_max;
>      kstat_named_t arcstat_size;
>      /*
> -     * Number of compressed bytes stored in the arc_buf_hdr_t's b_pdata.
> +     * Number of compressed bytes stored in the arc_buf_hdr_t's b_pabd.
>       * Note that the compressed bytes may match the uncompressed bytes
>       * if the block is either not compressed or compressed arc is disabled.
>       */
>      kstat_named_t arcstat_compressed_size;
>      /*
> -     * Uncompressed size of the data stored in b_pdata. If compressed
> +     * Uncompressed size of the data stored in b_pabd. If compressed
>       * arc is disabled then this value will be identical to the stat
>       * above.
>       */
> @@ -988,7 +990,7 @@ typedef struct l1arc_buf_hdr {
>      refcount_t b_refcnt;
>
>      arc_callback_t *b_acb;
> -    void *b_pdata;
> +    abd_t *b_pabd;
>  } l1arc_buf_hdr_t;
>
>  typedef struct l2arc_dev l2arc_dev_t;
> @@ -1341,7 +1343,7 @@ typedef struct l2arc_read_callback {
>      blkptr_t l2rcb_bp; /* original blkptr */
>      zbookmark_phys_t l2rcb_zb; /* original bookmark */
>      int l2rcb_flags; /* original flags */
> -    void *l2rcb_data; /* temporary buffer */
> +    void *l2rcb_abd; /* temporary buffer */
>  } l2arc_read_callback_t;
>
>  typedef struct l2arc_write_callback {
> @@ -1351,7 +1353,7 @@ typedef struct l2arc_write_callback {
>
>  typedef struct l2arc_data_free {
>      /* protected by l2arc_free_on_write_mtx */
> -    void *l2df_data;
> +    abd_t *l2df_abd;
>      size_t l2df_size;
>      arc_buf_contents_t l2df_type;
>      list_node_t l2df_list_node;
> @@ -1361,10 +1363,14 @@ static kmutex_t l2arc_feed_thr_lock;
>  static kcondvar_t l2arc_feed_thr_cv;
>  static uint8_t l2arc_thread_exit;
>
> +static abd_t *arc_get_data_abd(arc_buf_hdr_t *, uint64_t, void *);
>  static void *arc_get_data_buf(arc_buf_hdr_t *, uint64_t, void *);
> +static void arc_get_data_impl(arc_buf_hdr_t *, uint64_t, void *);
> +static void arc_free_data_abd(arc_buf_hdr_t *, abd_t *, uint64_t, void *);
>  static void arc_free_data_buf(arc_buf_hdr_t *, void *, uint64_t, void *);
> -static void arc_hdr_free_pdata(arc_buf_hdr_t *hdr);
> -static void arc_hdr_alloc_pdata(arc_buf_hdr_t *);
> +static void arc_free_data_impl(arc_buf_hdr_t *hdr, uint64_t size, void *tag);
> +static void arc_hdr_free_pabd(arc_buf_hdr_t *);
> +static void arc_hdr_alloc_pabd(arc_buf_hdr_t *);
>  static void arc_access(arc_buf_hdr_t *, kmutex_t *);
>  static boolean_t arc_is_overflowing();
>  static void arc_buf_watch(arc_buf_t *);
> @@ -1718,7 +1724,9 @@ static inline boolean_t
>  arc_buf_is_shared(arc_buf_t *buf)
>  {
>      boolean_t shared = (buf->b_data != NULL &&
> -        buf->b_data == buf->b_hdr->b_l1hdr.b_pdata);
> +
buf->b_hdr->b_l1hdr.b_pabd != NULL &&
> +        abd_is_linear(buf->b_hdr->b_l1hdr.b_pabd) &&
> +        buf->b_data == abd_to_buf(buf->b_hdr->b_l1hdr.b_pabd));
>      IMPLY(shared, HDR_SHARED_DATA(buf->b_hdr));
>      IMPLY(shared, ARC_BUF_SHARED(buf));
>      IMPLY(shared, ARC_BUF_COMPRESSED(buf) || ARC_BUF_LAST(buf));
> @@ -1822,7 +1830,8 @@ arc_cksum_is_equal(arc_buf_hdr_t *hdr, zio_t *zio)
>          uint64_t csize;
>
>          void *cbuf = zio_buf_alloc(HDR_GET_PSIZE(hdr));
> -        csize = zio_compress_data(compress, zio->io_data, cbuf, lsize);
> +        csize = zio_compress_data(compress, zio->io_abd, cbuf, lsize);
> +
>          ASSERT3U(csize, <=, HDR_GET_PSIZE(hdr));
>          if (csize < HDR_GET_PSIZE(hdr)) {
>              /*
> @@ -1857,7 +1866,7 @@ arc_cksum_is_equal(arc_buf_hdr_t *hdr, zio_t *zio)
>       * logical I/O size and not just a gang fragment.
>       */
>      valid_cksum = (zio_checksum_error_impl(zio->io_spa, zio->io_bp,
> -        BP_GET_CHECKSUM(zio->io_bp), zio->io_data, zio->io_size,
> +        BP_GET_CHECKSUM(zio->io_bp), zio->io_abd, zio->io_size,
>          zio->io_offset, NULL) == 0);
>      zio_pop_transforms(zio);
>      return (valid_cksum);
> @@ -2161,7 +2170,7 @@ arc_buf_fill(arc_buf_t *buf, boolean_t compressed)
>
>      if (hdr_compressed == compressed) {
>          if (!arc_buf_is_shared(buf)) {
> -            bcopy(hdr->b_l1hdr.b_pdata, buf->b_data,
> +            abd_copy_to_buf(buf->b_data, hdr->b_l1hdr.b_pabd,
>                  arc_buf_size(buf));
>          }
>      } else {
> @@ -2213,7 +2222,7 @@ arc_buf_fill(arc_buf_t *buf, boolean_t compressed)
>              return (0);
>          } else {
>              int error = zio_decompress_data(HDR_GET_COMPRESS(hdr),
> -                hdr->b_l1hdr.b_pdata, buf->b_data,
> +                hdr->b_l1hdr.b_pabd, buf->b_data,
>                  HDR_GET_PSIZE(hdr), HDR_GET_LSIZE(hdr));
>
>              /*
> @@ -2250,7 +2259,7 @@ arc_decompress(arc_buf_t *buf)
>  }
>
>  /*
> - * Return the size of the block, b_pdata, that is stored in the arc_buf_hdr_t.
> + * Return the size of the block, b_pabd, that is stored in the arc_buf_hdr_t.
>   */
>  static uint64_t
>  arc_hdr_size(arc_buf_hdr_t *hdr)
> @@ -2282,14 +2291,14 @@ arc_evictable_space_increment(arc_buf_hdr_t *hdr, arc_
>      if (GHOST_STATE(state)) {
>          ASSERT0(hdr->b_l1hdr.b_bufcnt);
>          ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
> -        ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +        ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
>          (void) refcount_add_many(&state->arcs_esize[type],
>              HDR_GET_LSIZE(hdr), hdr);
>          return;
>      }
>
>      ASSERT(!GHOST_STATE(state));
> -    if (hdr->b_l1hdr.b_pdata != NULL) {
> +    if (hdr->b_l1hdr.b_pabd != NULL) {
>          (void) refcount_add_many(&state->arcs_esize[type],
>              arc_hdr_size(hdr), hdr);
>      }
> @@ -2317,14 +2326,14 @@ arc_evictable_space_decrement(arc_buf_hdr_t *hdr, arc_
>      if (GHOST_STATE(state)) {
>          ASSERT0(hdr->b_l1hdr.b_bufcnt);
>          ASSERT3P(hdr->b_l1hdr.b_buf, ==, NULL);
> -        ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +        ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
>          (void) refcount_remove_many(&state->arcs_esize[type],
>              HDR_GET_LSIZE(hdr), hdr);
>          return;
>      }
>
>      ASSERT(!GHOST_STATE(state));
> -    if (hdr->b_l1hdr.b_pdata != NULL) {
> +    if (hdr->b_l1hdr.b_pabd != NULL) {
>          (void) refcount_remove_many(&state->arcs_esize[type],
>              arc_hdr_size(hdr), hdr);
>      }
> @@ -2421,7 +2430,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
>          old_state = hdr->b_l1hdr.b_state;
>          refcnt = refcount_count(&hdr->b_l1hdr.b_refcnt);
>          bufcnt = hdr->b_l1hdr.b_bufcnt;
> -        update_old = (bufcnt > 0 || hdr->b_l1hdr.b_pdata != NULL);
> +        update_old = (bufcnt > 0 || hdr->b_l1hdr.b_pabd != NULL);
>      } else {
>          old_state = arc_l2c_only;
>          refcnt = 0;
> @@ -2491,7 +2500,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
>               */
>              (void) refcount_add_many(&new_state->arcs_size,
>                  HDR_GET_LSIZE(hdr), hdr);
> -            ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +            ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
>          } else {
>              uint32_t buffers = 0;
>
> @@ -2520,7 +2529,7 @@ arc_change_state(arc_state_t
*new_state, arc_buf_hdr_t
>              }
>              ASSERT3U(bufcnt, ==, buffers);
>
> -            if (hdr->b_l1hdr.b_pdata != NULL) {
> +            if (hdr->b_l1hdr.b_pabd != NULL) {
>                  (void) refcount_add_many(&new_state->arcs_size,
>                      arc_hdr_size(hdr), hdr);
>              } else {
> @@ -2533,7 +2542,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
>          ASSERT(HDR_HAS_L1HDR(hdr));
>          if (GHOST_STATE(old_state)) {
>              ASSERT0(bufcnt);
> -            ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +            ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
>
>              /*
>               * When moving a header off of a ghost state,
> @@ -2573,7 +2582,7 @@ arc_change_state(arc_state_t *new_state, arc_buf_hdr_t
>                      buf);
>              }
>              ASSERT3U(bufcnt, ==, buffers);
> -            ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
> +            ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);
>              (void) refcount_remove_many(
>                  &old_state->arcs_size, arc_hdr_size(hdr), hdr);
>          }
> @@ -2655,7 +2664,7 @@ arc_space_return(uint64_t space, arc_space_type_t type
>
>  /*
>   * Given a hdr and a buf, returns whether that buf can share its b_data buffer
> - * with the hdr's b_pdata.
> + * with the hdr's b_pabd.
>   */
>  static boolean_t
>  arc_can_share(arc_buf_hdr_t *hdr, arc_buf_t *buf)
> @@ -2732,20 +2741,23 @@ arc_buf_alloc_impl(arc_buf_hdr_t *hdr, void *tag, bool
>      /*
>       * If the hdr's data can be shared then we share the data buffer and
>       * set the appropriate bit in the hdr's b_flags to indicate the hdr is
> -     * sharing it's b_pdata with the arc_buf_t. Otherwise, we allocate a new
> +     * sharing it's b_pabd with the arc_buf_t. Otherwise, we allocate a new
>       * buffer to store the buf's data.
>       *
> -     * There is one additional restriction here because we're sharing
> -     * hdr -> buf instead of the usual buf -> hdr: the hdr can't be actively
> -     * involved in an L2ARC write, because if this buf is used by an
> -     * arc_write() then the hdr's data buffer will be released when the
> +     * There are two additional restrictions here because we're sharing
> +     * hdr -> buf instead of the usual buf -> hdr. First, the hdr can't be
> +     * actively involved in an L2ARC write, because if this buf is used by
> +     * an arc_write() then the hdr's data buffer will be released when the
>       * write completes, even though the L2ARC write might still be using it.
> +     * Second, the hdr's ABD must be linear so that the buf's user doesn't
> +     * need to be ABD-aware.
>       */
> -    boolean_t can_share = arc_can_share(hdr, buf) && !HDR_L2_WRITING(hdr);
> +    boolean_t can_share = arc_can_share(hdr, buf) && !HDR_L2_WRITING(hdr) &&
> +        abd_is_linear(hdr->b_l1hdr.b_pabd);
>
>      /* Set up b_data and sharing */
>      if (can_share) {
> -        buf->b_data = hdr->b_l1hdr.b_pdata;
> +        buf->b_data = abd_to_buf(hdr->b_l1hdr.b_pabd);
>          buf->b_flags |= ARC_BUF_FLAG_SHARED;
>          arc_hdr_set_flags(hdr, ARC_FLAG_SHARED_DATA);
>      } else {
> @@ -2841,11 +2853,11 @@ arc_loan_inuse_buf(arc_buf_t *buf, void *tag)
>  }
>
>  static void
> -l2arc_free_data_on_write(void *data, size_t size, arc_buf_contents_t type)
> +l2arc_free_abd_on_write(abd_t *abd, size_t size, arc_buf_contents_t type)
>  {
>      l2arc_data_free_t *df = kmem_alloc(sizeof (*df), KM_SLEEP);
>
> -    df->l2df_data = data;
> +    df->l2df_abd = abd;
>      df->l2df_size = size;
>      df->l2df_type = type;
>      mutex_enter(&l2arc_free_on_write_mtx);
> @@ -2876,7 +2888,7 @@ arc_hdr_free_on_write(arc_buf_hdr_t *hdr)
>          arc_space_return(size, ARC_SPACE_DATA);
>      }
>
> -    l2arc_free_data_on_write(hdr->b_l1hdr.b_pdata, size, type);
> +    l2arc_free_abd_on_write(hdr->b_l1hdr.b_pabd, size, type);
>  }
>
>  /*
> @@ -2890,7 +2902,7 @@
arc_share_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
>      arc_state_t *state = hdr->b_l1hdr.b_state;
>
>      ASSERT(arc_can_share(hdr, buf));
> -    ASSERT3P(hdr->b_l1hdr.b_pdata, ==, NULL);
> +    ASSERT3P(hdr->b_l1hdr.b_pabd, ==, NULL);
>      ASSERT(MUTEX_HELD(HDR_LOCK(hdr)) || HDR_EMPTY(hdr));
>
>      /*
> @@ -2899,7 +2911,9 @@ arc_share_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
>       * the refcount whenever an arc_buf_t is shared.
>       */
>      refcount_transfer_ownership(&state->arcs_size, buf, hdr);
> -    hdr->b_l1hdr.b_pdata = buf->b_data;
> +    hdr->b_l1hdr.b_pabd = abd_get_from_buf(buf->b_data, arc_buf_size(buf));
> +    abd_take_ownership_of_buf(hdr->b_l1hdr.b_pabd,
> +        HDR_ISTYPE_METADATA(hdr));
>      arc_hdr_set_flags(hdr, ARC_FLAG_SHARED_DATA);
>      buf->b_flags |= ARC_BUF_FLAG_SHARED;
>
> @@ -2919,7 +2933,7 @@ arc_unshare_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
>      arc_state_t *state = hdr->b_l1hdr.b_state;
>
>      ASSERT(arc_buf_is_shared(buf));
> -    ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
> +    ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);
>      ASSERT(MUTEX_HELD(HDR_LOCK(hdr)) || HDR_EMPTY(hdr));
>
>      /*
> @@ -2928,7 +2942,9 @@ arc_unshare_buf(arc_buf_hdr_t *hdr, arc_buf_t *buf)
>       */
>      refcount_transfer_ownership(&state->arcs_size, hdr, buf);
>      arc_hdr_clear_flags(hdr, ARC_FLAG_SHARED_DATA);
> -    hdr->b_l1hdr.b_pdata = NULL;
> +    abd_release_ownership_of_buf(hdr->b_l1hdr.b_pabd);
> +    abd_put(hdr->b_l1hdr.b_pabd);
> +    hdr->b_l1hdr.b_pabd = NULL;
>      buf->b_flags &= ~ARC_BUF_FLAG_SHARED;
>
>      /*
> @@ -3025,7 +3041,7 @@ arc_buf_destroy_impl(arc_buf_t *buf)
>      if (ARC_BUF_SHARED(buf) && !ARC_BUF_COMPRESSED(buf)) {
>          /*
>           * If the current arc_buf_t is sharing its data buffer with the
> -         * hdr, then reassign the hdr's b_pdata to share it with the new
> +         * hdr, then reassign the hdr's b_pabd to share it with the new
>           * buffer at the end of the list. The shared buffer is always
>           * the last one on the hdr's buffer list.
>           *
> @@ -3040,8 +3056,8 @@ arc_buf_destroy_impl(arc_buf_t *buf)
>              /* hdr is uncompressed so can't have compressed buf */
>              VERIFY(!ARC_BUF_COMPRESSED(lastbuf));
>
> -            ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
> -            arc_hdr_free_pdata(hdr);
> +            ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);
> +            arc_hdr_free_pabd(hdr);
>
>              /*
>               * We must setup a new shared block between the
> @@ -3079,26 +3095,26 @@ arc_buf_destroy_impl(arc_buf_t *buf)
>      }
>
> *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
>