Date: Tue, 20 Jun 2017 17:25:53 -0400
From: "Kenneth D. Merry"
To: Andriy Gapon
Cc: src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, svn-src-head@FreeBSD.org
Subject: Re: svn commit: r320156 - in head: cddl/contrib/opensolaris/cmd/zdb cddl/contrib/opensolaris/cmd/ztest cddl/contrib/opensolaris/lib/libzfs/common sys/cddl/contrib/opensolaris/common/zfs sys/cddl/contri...
Message-ID: <20170620212553.GA30559@mithlond.kdm.org>
References: <201706201739.v5KHdPhO051256@repo.freebsd.org> <81F84BCA-E973-4D78-B81C-1D398ADFA47E@freebsd.org>

On Tue, Jun 20, 2017 at 23:37:10 +0300, Andriy Gapon wrote:
> On 20/06/2017 23:29, Ken Merry wrote:
> > I don't know for sure that this commit is the cause, but it (and r320153)
> > are the only ZFS commits between a version of head from June 14th that
> > boots off a ZFS mirror, and one that panics.
> >
> > Here's the stack trace:
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 22;
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 9; apic id = 09
> > fault virtual address = 0x0
> > fault code = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff81e47f21
> > stack pointer = 0x28:0xfffffe08b37f8810
> > frame pointer = 0x28:0xfffffe08b37f8860
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = interrupt enabled, resume, IOPL = 0
> > current process = 0 (zio_free_issue_0_3)
> > [ thread pid 0 tid 100478 ]
> > Stopped at 0xffffffff81e47f21 = zio_vdev_io_start+0x1f1: testb $0x1,(%rax)
> > db> bt
> > Tracing pid 0 tid 100478 td 0xfffff80193156000
> > zio_vdev_io_start() at 0xffffffff81e47f21 = zio_vdev_io_start+0x1f1/frame 0xfffffe08b37f8860
> > zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b37f88b0
> > zio_nowait() at 0xffffffff81e422b8 = zio_nowait+0xb8/frame 0xfffffe08b37f88e0
> > vdev_mirror_io_start() at 0xffffffff81e224fc = vdev_mirror_io_start+0x38c/frame 0xfffffe08b37f8930
> > zio_vdev_io_start() at 0xffffffff81e48030 = zio_vdev_io_start+0x300/frame 0xfffffe08b37f8990
> > zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b37f89e0
> > taskqueue_run_locked() at 0xffffffff809a9d6d = taskqueue_run_locked+0x13d/frame 0xfffffe08b37f8a40
> > taskqueue_thread_loop() at 0xffffffff809aab28 = taskqueue_thread_loop+0x88/frame 0xfffffe08b37f8a70
> > fork_exit() at 0xffffffff8091e3e4 = fork_exit+0x84/frame 0xfffffe08b37f8ab0
> > fork_trampoline() at 0xffffffff80d930fe = fork_trampoline+0xe/frame 0xfffffe08b37f8ab0
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > db>
> >
> > (kgdb) list *(zio_vdev_io_start+0x1f1)
> > 0xd9f21 is in zio_vdev_io_start (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:350).
> > 345
> > 346     /*
> > 347      * Ensure that anyone expecting this zio to contain a linear ABD isn't
> > 348      * going to get a nasty surprise when they try to access the data.
> > 349      */
> > 350     IMPLY(abd_is_linear(zio->io_abd), abd_is_linear(data));
> > 351
> > 352     zt->zt_orig_abd = zio->io_abd;
> > 353     zt->zt_orig_size = zio->io_size;
> > 354     zt->zt_bufsize = bufsize;
> >
> > I'll try rebooting and see if the problem goes away. If not, I'll roll
> > back the ABD change and see if the problem goes away.
>
> Judging from the thread that panic-ed the problem may have to do with our TRIM
> support. Unfortunately, I didn't have a chance to test the change on a system
> with working TRIM and, so, I missed it.
> I will look into this further, but it's almost obvious that the problem is
> caused by zio->io_abd being NULL for a zio of type ZIO_TYPE_FREE.

FWIW, avg sent me a patch for this particular problem (by checking for NULL
before dereferencing the pointer), and although it got me past the above
problem, I hit another related panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 6;

Fatal trap 12: page fault while in kernel mode
cpuid = 14; apic id = 22
fault virtual address = 0x4
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff81d92a2d
stack pointer = 0x0:0xfffffe08b36e0710
frame pointer = 0x0:0xfffffe08b36e0730
code segment = base 0x0, limit 0xfffff, type 0x1b

Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address = 0x4

Fatal trap 12: page fault while in kernel mode
cpuid = 8; apic id = 08
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (zio_free_issue_4_1)
[ thread pid 0 tid 100799 ]
Stopped at 0xffffffff81d92a2d = abd_verify+0xd: movl 0x4(%r14),%eax
db> bt
Tracing pid 0 tid 100799 td 0xfffff801931b8560
abd_verify() at 0xffffffff81d92a2d = abd_verify+0xd/frame 0xfffffe08b36e0730
abd_put() at 0xffffffff81d92eff = abd_put+0xf/frame 0xfffffe08b36e0750
vdev_raidz_map_free() at 0xffffffff81e26312 = vdev_raidz_map_free+0x82/frame 0xfffffe08b36e0780
zio_vdev_io_assess() at 0xffffffff81e48646 = zio_vdev_io_assess+0x116/frame 0xfffffe08b36e07b0
zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b36e0800
zio_vdev_io_start() at 0xffffffff81e48184 = zio_vdev_io_start+0x454/frame 0xfffffe08b36e0860
zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b36e08b0
zio_nowait() at 0xffffffff81e422b8 = zio_nowait+0xb8/frame 0xfffffe08b36e08e0
vdev_mirror_io_start() at 0xffffffff81e224fc = vdev_mirror_io_start+0x38c/frame 0xfffffe08b36e0930
zio_vdev_io_start() at 0xffffffff81e48030 = zio_vdev_io_start+0x300/frame 0xfffffe08b36e0990
zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b36e09e0
taskqueue_run_locked() at 0xffffffff809a9d6d = taskqueue_run_locked+0x13d/frame 0xfffffe08b36e0a40
taskqueue_thread_loop() at 0xffffffff809aab28 = taskqueue_thread_loop+0x88/frame 0xfffffe08b36e0a70
fork_exit() at 0xffffffff8091e3e4 = fork_exit+0x84/frame 0xfffffe08b36e0ab0
fork_trampoline() at 0xffffffff80d930fe = fork_trampoline+0xe/frame 0xfffffe08b36e0ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db>

(kgdb) list *(abd_verify+0xd)
0x24a2d is in abd_verify (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:231).
226 }
227
228 static inline void
229 abd_verify(abd_t *abd)
230 {
231     ASSERT3U(abd->abd_size, >, 0);
232     ASSERT3U(abd->abd_size, <=, SPA_MAXBLOCKSIZE);
233     ASSERT3U(abd->abd_flags, ==, abd->abd_flags & (ABD_FLAG_LINEAR |
234         ABD_FLAG_OWNER | ABD_FLAG_META));
235     IMPLY(abd->abd_parent != NULL, !(abd->abd_flags & ABD_FLAG_OWNER));
(kgdb) list *(abd_put+0xf)
0x24eff is in abd_put (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:514).
509  */
510 void
511 abd_put(abd_t *abd)
512 {
513     abd_verify(abd);
514     ASSERT(!(abd->abd_flags & ABD_FLAG_OWNER));
515
516     if (abd->abd_parent != NULL) {
517         (void) refcount_remove_many(&abd->abd_parent->abd_children,
518             abd->abd_size, abd);
(kgdb) list *(vdev_raidz_map_free+0x82)
0xb8312 is in vdev_raidz_map_free (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c:281).
276             zio_buf_free(rm->rm_col[c].rc_gdata,
277                 rm->rm_col[c].rc_size);
278     }
279
280     size = 0;
281     for (c = rm->rm_firstdatacol; c < rm->rm_cols; c++) {
282         abd_put(rm->rm_col[c].rc_abd);
283         size += rm->rm_col[c].rc_size;
284     }
285
(kgdb) list *(zio_vdev_io_assess+0x116)
0xda646 is in zio_vdev_io_assess (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3315).
3310     if (vd == NULL && !(zio->io_flags & ZIO_FLAG_CONFIG_WRITER))
3311         spa_config_exit(zio->io_spa, SCL_ZIO, zio);
3312
3313     if (zio->io_vsd != NULL) {
3314         zio->io_vsd_ops->vsd_free(zio);
3315         zio->io_vsd = NULL;
3316     }
3317
3318     if (zio_injection_enabled && zio->io_error == 0)
3319         zio->io_error = zio_handle_fault_injection(zio, EIO);
(kgdb)

So, I disabled trim by setting vfs.zfs.trim.enabled=0 in the loader, and I can
boot now.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
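
For readers following the two traces: both faults come from dereferencing an ABD pointer that appears to be NULL for a free (TRIM) zio, first in the linearity check and then when the raidz map columns are released. The following is only a user-space sketch of that failure pattern and of the "check for NULL before dereferencing" idea mentioned above. It is not avg's patch and not the real ZFS code: every mock_ name, the struct layouts, and the choice to treat a NULL buffer as "not linear" are assumptions made for illustration.

```c
/*
 * User-space illustration only. All mock_* types and functions are
 * hypothetical stand-ins; they are not the ZFS abd_t/zio_t definitions.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_ABD_FLAG_LINEAR 0x1u

typedef struct mock_abd {
	unsigned int abd_flags;
	unsigned int abd_size;
} mock_abd_t;

typedef struct mock_zio {
	mock_abd_t *io_abd;	/* assumed NULL for a free (TRIM) request */
} mock_zio_t;

/*
 * Unguarded helper: reads through the pointer, so a NULL argument is a
 * bug. The assert stands in for the page fault seen in the kernel.
 */
static bool
mock_abd_is_linear(const mock_abd_t *abd)
{
	assert(abd != NULL);
	return ((abd->abd_flags & MOCK_ABD_FLAG_LINEAR) != 0);
}

/*
 * Guarded check: a request without a data buffer is reported as
 * "not linear" (an assumption) instead of being dereferenced.
 */
static bool
mock_zio_data_is_linear(const mock_zio_t *zio)
{
	if (zio->io_abd == NULL)
		return (false);
	return (mock_abd_is_linear(zio->io_abd));
}

/*
 * Guarded release loop: skip columns whose buffer was never set up,
 * and count how many were actually verified and released.
 */
static size_t
mock_put_columns(mock_abd_t *const *cols, size_t ncols)
{
	size_t released = 0;

	for (size_t c = 0; c < ncols; c++) {
		if (cols[c] == NULL)
			continue;
		assert(cols[c]->abd_size > 0);	/* mirrors the size check */
		released++;
	}
	return (released);
}
```

Whether skipping NULL buffers is the right fix in the kernel, or whether free zios should avoid these code paths altogether, is a separate question that this sketch does not answer.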