Date:      Tue, 20 Jun 2017 17:25:53 -0400
From:      "Kenneth D. Merry" <ken@FreeBSD.ORG>
To:        Andriy Gapon <avg@FreeBSD.org>
Cc:        src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, svn-src-head@FreeBSD.org
Subject:   Re: svn commit: r320156 - in head: cddl/contrib/opensolaris/cmd/zdb cddl/contrib/opensolaris/cmd/ztest cddl/contrib/opensolaris/lib/libzfs/common sys/cddl/contrib/opensolaris/common/zfs sys/cddl/contri...
Message-ID:  <20170620212553.GA30559@mithlond.kdm.org>
In-Reply-To: <fc648de9-576d-b5c4-0436-e9597decadf2@FreeBSD.org>
References:  <201706201739.v5KHdPhO051256@repo.freebsd.org> <81F84BCA-E973-4D78-B81C-1D398ADFA47E@freebsd.org> <fc648de9-576d-b5c4-0436-e9597decadf2@FreeBSD.org>

On Tue, Jun 20, 2017 at 23:37:10 +0300, Andriy Gapon wrote:
> On 20/06/2017 23:29, Ken Merry wrote:
> > I don't know for sure that this commit is the cause, but it (and r320153) are the only ZFS commits between a version of head from June 14th that boots off a ZFS mirror, and one that panics.
> > 
> > Here's the stack trace:
> > 
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 22; 
> > 
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 9; apic id = 09
> > fault virtual address   = 0x0
> > fault code              = supervisor read data, page not present
> > instruction pointer     = 0x20:0xffffffff81e47f21
> > stack pointer           = 0x28:0xfffffe08b37f8810
> > frame pointer           = 0x28:0xfffffe08b37f8860
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 0 (zio_free_issue_0_3)
> > [ thread pid 0 tid 100478 ]
> > Stopped at      0xffffffff81e47f21 = zio_vdev_io_start+0x1f1:   testb   $0x1,(%rax)
> > db> bt
> > Tracing pid 0 tid 100478 td 0xfffff80193156000
> > zio_vdev_io_start() at 0xffffffff81e47f21 = zio_vdev_io_start+0x1f1/frame 0xfffffe08b37f8860
> > zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b37f88b0
> > zio_nowait() at 0xffffffff81e422b8 = zio_nowait+0xb8/frame 0xfffffe08b37f88e0
> > vdev_mirror_io_start() at 0xffffffff81e224fc = vdev_mirror_io_start+0x38c/frame 0xfffffe08b37f8930
> > zio_vdev_io_start() at 0xffffffff81e48030 = zio_vdev_io_start+0x300/frame 0xfffffe08b37f8990
> > zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b37f89e0
> > taskqueue_run_locked() at 0xffffffff809a9d6d = taskqueue_run_locked+0x13d/frame 0xfffffe08b37f8a40
> > taskqueue_thread_loop() at 0xffffffff809aab28 = taskqueue_thread_loop+0x88/frame 0xfffffe08b37f8a70
> > fork_exit() at 0xffffffff8091e3e4 = fork_exit+0x84/frame 0xfffffe08b37f8ab0
> > fork_trampoline() at 0xffffffff80d930fe = fork_trampoline+0xe/frame 0xfffffe08b37f8ab0
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > db> 
> > 
> > (kgdb) list *(zio_vdev_io_start+0x1f1)
> > 0xd9f21 is in zio_vdev_io_start (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:350).
> > 345
> > 346             /*
> > 347              * Ensure that anyone expecting this zio to contain a linear ABD isn't
> > 348              * going to get a nasty surprise when they try to access the data.
> > 349              */
> > 350             IMPLY(abd_is_linear(zio->io_abd), abd_is_linear(data));
> > 351
> > 352             zt->zt_orig_abd = zio->io_abd;
> > 353             zt->zt_orig_size = zio->io_size;
> > 354             zt->zt_bufsize = bufsize;
> > 
> > I'll try rebooting and see if the problem goes away.  If not, I'll roll back the ABD change and see if the problem goes away.
> 
> Judging from the thread that panicked, the problem may have to do with our TRIM
> support.  Unfortunately, I didn't have a chance to test the change on a system
> with working TRIM, so I missed it.
> I will look into this further, but it's almost obvious that the problem is
> caused by zio->io_abd being NULL for a zio of type ZIO_TYPE_FREE.

FWIW, avg sent me a patch for this particular problem (by checking for NULL
before dereferencing the pointer), and although it got me past the above
problem, I hit another related panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 6; 

Fatal trap 12: page fault while in kernel mode
cpuid = 14; apic id = 22
fault virtual address   = 0x4
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81d92a2d
stack pointer           = 0x0:0xfffffe08b36e0710
frame pointer           = 0x0:0xfffffe08b36e0730
code segment            = base 0x0, limit 0xfffff, type 0x1b


Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address   = 0x4
Fatal trap 12: page fault while in kernel mode
cpuid = 8; apic id = 08
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (zio_free_issue_4_1)
[ thread pid 0 tid 100799 ]
Stopped at      0xffffffff81d92a2d = abd_verify+0xd:    movl    0x4(%r14),%eax
db> bt
Tracing pid 0 tid 100799 td 0xfffff801931b8560
abd_verify() at 0xffffffff81d92a2d = abd_verify+0xd/frame 0xfffffe08b36e0730
abd_put() at 0xffffffff81d92eff = abd_put+0xf/frame 0xfffffe08b36e0750
vdev_raidz_map_free() at 0xffffffff81e26312 = vdev_raidz_map_free+0x82/frame 0xfffffe08b36e0780
zio_vdev_io_assess() at 0xffffffff81e48646 = zio_vdev_io_assess+0x116/frame 0xfffffe08b36e07b0
zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b36e0800
zio_vdev_io_start() at 0xffffffff81e48184 = zio_vdev_io_start+0x454/frame 0xfffffe08b36e0860
zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b36e08b0
zio_nowait() at 0xffffffff81e422b8 = zio_nowait+0xb8/frame 0xfffffe08b36e08e0
vdev_mirror_io_start() at 0xffffffff81e224fc = vdev_mirror_io_start+0x38c/frame 0xfffffe08b36e0930
zio_vdev_io_start() at 0xffffffff81e48030 = zio_vdev_io_start+0x300/frame 0xfffffe08b36e0990
zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b36e09e0
taskqueue_run_locked() at 0xffffffff809a9d6d = taskqueue_run_locked+0x13d/frame 0xfffffe08b36e0a40
taskqueue_thread_loop() at 0xffffffff809aab28 = taskqueue_thread_loop+0x88/frame 0xfffffe08b36e0a70
fork_exit() at 0xffffffff8091e3e4 = fork_exit+0x84/frame 0xfffffe08b36e0ab0
fork_trampoline() at 0xffffffff80d930fe = fork_trampoline+0xe/frame 0xfffffe08b36e0ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db> 

(kgdb) list *(abd_verify+0xd)

0x24a2d is in abd_verify (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:231).
226     }
227
228     static inline void
229     abd_verify(abd_t *abd)
230     {
231             ASSERT3U(abd->abd_size, >, 0);
232             ASSERT3U(abd->abd_size, <=, SPA_MAXBLOCKSIZE);
233             ASSERT3U(abd->abd_flags, ==, abd->abd_flags & (ABD_FLAG_LINEAR |
234                 ABD_FLAG_OWNER | ABD_FLAG_META));
235             IMPLY(abd->abd_parent != NULL, !(abd->abd_flags & ABD_FLAG_OWNER));
(kgdb) list *(abd_put+0xf)
0x24eff is in abd_put (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:514).
509      */
510     void
511     abd_put(abd_t *abd)
512     {
513             abd_verify(abd);
514             ASSERT(!(abd->abd_flags & ABD_FLAG_OWNER));
515
516             if (abd->abd_parent != NULL) {
517                     (void) refcount_remove_many(&abd->abd_parent->abd_children,
518                         abd->abd_size, abd);
(kgdb) list *(vdev_raidz_map_free+0x82)
0xb8312 is in vdev_raidz_map_free (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c:281).
276                             zio_buf_free(rm->rm_col[c].rc_gdata,
277                                 rm->rm_col[c].rc_size);
278             }
279
280             size = 0;
281             for (c = rm->rm_firstdatacol; c < rm->rm_cols; c++) {
282                     abd_put(rm->rm_col[c].rc_abd);
283                     size += rm->rm_col[c].rc_size;
284             }
285
(kgdb) list *(zio_vdev_io_assess+0x116)
0xda646 is in zio_vdev_io_assess (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3315).
3310            if (vd == NULL && !(zio->io_flags & ZIO_FLAG_CONFIG_WRITER))
3311                    spa_config_exit(zio->io_spa, SCL_ZIO, zio);
3312
3313            if (zio->io_vsd != NULL) {
3314                    zio->io_vsd_ops->vsd_free(zio);
3315                    zio->io_vsd = NULL;
3316            }
3317
3318            if (zio_injection_enabled && zio->io_error == 0)
3319                    zio->io_error = zio_handle_fault_injection(zio, EIO);
(kgdb) 

So I disabled TRIM by setting vfs.zfs.trim.enabled=0 in the loader, and I
can boot now.
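
For illustration only, here is a minimal userland sketch of the kind of
guard discussed above: check an ABD pointer for NULL before dereferencing
it.  This is not the patch avg sent and not the real ZFS code.  The struct
layouts and the helpers zio_check_linear() and abd_put_checked() are
hypothetical stand-ins; only the names abd_t, io_abd, abd_flags, abd_size,
abd_is_linear and ABD_FLAG_LINEAR come from the listings above.  In the
second trace the faulting instruction reads 0x4(%r14) and the fault address
is 0x4, so %r14 (the abd pointer) was 0 there as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; NOT the real definitions. */
#define ABD_FLAG_LINEAR	0x1u

typedef struct abd {
	unsigned int	abd_flags;
	unsigned int	abd_size;
} abd_t;

typedef struct zio {
	abd_t	*io_abd;	/* assumed NULL for a free (TRIM) zio */
} zio_t;

static int
abd_is_linear(const abd_t *abd)
{
	return ((abd->abd_flags & ABD_FLAG_LINEAR) != 0);
}

/*
 * Guarded form of IMPLY(abd_is_linear(zio->io_abd), abd_is_linear(data)).
 * IMPLY(A, B) holds when !A || B.  Returns 1 when the condition holds or
 * the check is skipped because there is no buffer, 0 when it is violated.
 */
static int
zio_check_linear(const zio_t *zio, const abd_t *data)
{
	if (zio->io_abd == NULL || data == NULL)
		return (1);	/* nothing to dereference, skip the check */
	return (!abd_is_linear(zio->io_abd) || abd_is_linear(data));
}

/*
 * Guarded release in the spirit of abd_put(): skip a NULL ABD instead of
 * verifying it.  Returns 1 if the ABD was verified, 0 if it was skipped.
 */
static int
abd_put_checked(abd_t *abd)
{
	if (abd == NULL)
		return (0);
	assert(abd->abd_size > 0);	/* stands in for abd_verify() */
	return (1);
}
```

Whether a NULL check is the right fix, or whether free zios should never
reach these paths in the first place, is a separate question that the
sketch does not try to answer.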

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG


