Date:      Wed, 21 Jun 2017 09:59:14 +0200
From:      "O. Hartmann" <ohartmann@walstatt.org>
To:        "Kenneth D. Merry" <ken@FreeBSD.ORG>
Cc:        Andriy Gapon <avg@FreeBSD.org>, svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject:   Re: svn commit: r320156 - in head: cddl/contrib/opensolaris/cmd/zdb cddl/contrib/opensolaris/cmd/ztest cddl/contrib/opensolaris/lib/libzfs/common sys/cddl/contrib/opensolaris/common/zfs sys/cddl/contri...
Message-ID:  <20170621095903.4fefe0b5@thor.intern.walstatt.dynvpn.de>
In-Reply-To: <20170620212553.GA30559@mithlond.kdm.org>
References:  <201706201739.v5KHdPhO051256@repo.freebsd.org> <81F84BCA-E973-4D78-B81C-1D398ADFA47E@freebsd.org> <fc648de9-576d-b5c4-0436-e9597decadf2@FreeBSD.org> <20170620212553.GA30559@mithlond.kdm.org>


On Tue, 20 Jun 2017 17:25:53 -0400,
"Kenneth D. Merry" <ken@FreeBSD.ORG> wrote:

> On Tue, Jun 20, 2017 at 23:37:10 +0300, Andriy Gapon wrote:
> > On 20/06/2017 23:29, Ken Merry wrote:
> > > I don't know for sure that this commit is the cause, but it (and r320153) are the
> > > only ZFS commits between a version of head from June 14th that boots off a ZFS
> > > mirror, and one that panics.

r320153 is running well and stable here, but with r320156 the kernels on all
of my ZFS machines panic immediately (they have ZFS built into the kernel, not
loaded as a module).

For the moment, I have gone back to r320153. I'm sorry for not having any
debugging information; the boxes are built without debugging options at the
moment.

> > >
> > > Here's the stack trace:
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 22;
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 9; apic id = 09
> > > fault virtual address   = 0x0
> > > fault code              = supervisor read data, page not present
> > > instruction pointer     = 0x20:0xffffffff81e47f21
> > > stack pointer           = 0x28:0xfffffe08b37f8810
> > > frame pointer           = 0x28:0xfffffe08b37f8860
> > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> > >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags        = interrupt enabled, resume, IOPL = 0
> > > current process         = 0 (zio_free_issue_0_3)
> > > [ thread pid 0 tid 100478 ]
> > > Stopped at      0xffffffff81e47f21 = zio_vdev_io_start+0x1f1:   testb   $0x1,(%rax)
> > > db> bt
> > > Tracing pid 0 tid 100478 td 0xfffff80193156000
> > > zio_vdev_io_start() at 0xffffffff81e47f21 = zio_vdev_io_start+0x1f1/frame 0xfffffe08b37f8860
> > > zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b37f88b0
> > > zio_nowait() at 0xffffffff81e422b8 = zio_nowait+0xb8/frame 0xfffffe08b37f88e0
> > > vdev_mirror_io_start() at 0xffffffff81e224fc = vdev_mirror_io_start+0x38c/frame 0xfffffe08b37f8930
> > > zio_vdev_io_start() at 0xffffffff81e48030 = zio_vdev_io_start+0x300/frame 0xfffffe08b37f8990
> > > zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b37f89e0
> > > taskqueue_run_locked() at 0xffffffff809a9d6d = taskqueue_run_locked+0x13d/frame 0xfffffe08b37f8a40
> > > taskqueue_thread_loop() at 0xffffffff809aab28 = taskqueue_thread_loop+0x88/frame 0xfffffe08b37f8a70
> > > fork_exit() at 0xffffffff8091e3e4 = fork_exit+0x84/frame 0xfffffe08b37f8ab0
> > > fork_trampoline() at 0xffffffff80d930fe = fork_trampoline+0xe/frame 0xfffffe08b37f8ab0
> > > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > > db>
> > >
> > > (kgdb) list *(zio_vdev_io_start+0x1f1)
> > > 0xd9f21 is in zio_vdev_io_start
> > > (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:350).
> > > 345
> > > 346             /*
> > > 347              * Ensure that anyone expecting this zio to contain a linear ABD isn't
> > > 348              * going to get a nasty surprise when they try to access the data.
> > > 349              */
> > > 350             IMPLY(abd_is_linear(zio->io_abd), abd_is_linear(data));
> > > 351
> > > 352             zt->zt_orig_abd = zio->io_abd;
> > > 353             zt->zt_orig_size = zio->io_size;
> > > 354             zt->zt_bufsize = bufsize;
> > >
> > > I'll try rebooting and see if the problem goes away.  If not, I'll roll back
> > > the ABD change and see if the problem goes away.
> >
> > Judging from the thread that panicked, the problem may have to do with our TRIM
> > support.  Unfortunately, I didn't have a chance to test the change on a system
> > with working TRIM, so I missed it.
> > I will look into this further, but it's almost obvious that the problem is
> > caused by zio->io_abd being NULL for a zio of type ZIO_TYPE_FREE.
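
The failure mode described above (a check that dereferences zio->io_abd
while a free zio carries no buffer) can be sketched in a few lines of
self-contained C. This is only an illustration under stated assumptions: it is
neither the real ZFS code nor the patch avg sent, every mock_ name is
hypothetical, and the structures are simplified stand-ins for abd_t and zio_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real ZFS definitions. */
enum { MOCK_ABD_FLAG_LINEAR = 1 << 0 };

typedef struct mock_abd {
	unsigned int flags;
} mock_abd_t;

typedef enum { MOCK_ZIO_TYPE_WRITE, MOCK_ZIO_TYPE_FREE } mock_zio_type_t;

typedef struct mock_zio {
	mock_zio_type_t type;
	mock_abd_t *abd;	/* assumed NULL for a free (TRIM) zio */
} mock_zio_t;

/* Dereferences abd unconditionally, so a NULL abd faults here. */
static bool
mock_abd_is_linear(const mock_abd_t *abd)
{
	return ((abd->flags & MOCK_ABD_FLAG_LINEAR) != 0);
}

/*
 * Evaluates "source is linear implies data is linear", i.e. !a || b,
 * but skips the check entirely when the zio has no buffer attached.
 */
static bool
mock_transform_check(const mock_zio_t *zio, const mock_abd_t *data)
{
	assert(data != NULL);

	if (zio->abd == NULL)
		return (true);	/* nothing to compare against */

	return (!mock_abd_is_linear(zio->abd) || mock_abd_is_linear(data));
}
```

With a free zio whose abd is NULL, mock_transform_check() returns true without
touching the pointer; with a linear source buffer and a non-linear data buffer
it returns false.
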
>
> FWIW, avg sent me a patch for this particular problem (by checking for NULL
> before dereferencing the pointer), and although it got me past the above
> problem, I hit another related panic:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 6;
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 14; apic id = 22
> fault virtual address   = 0x4
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff81d92a2d
> stack pointer           = 0x0:0xfffffe08b36e0710
> frame pointer           = 0x0:0xfffffe08b36e0730
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 11; apic id = 0b
> fault virtual address   = 0x4
> Fatal trap 12: page fault while in kernel mode
> cpuid = 8; apic id = 08
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 0 (zio_free_issue_4_1)
> [ thread pid 0 tid 100799 ]
> Stopped at      0xffffffff81d92a2d = abd_verify+0xd:    movl    0x4(%r14),%eax
> db> bt
> Tracing pid 0 tid 100799 td 0xfffff801931b8560
> abd_verify() at 0xffffffff81d92a2d = abd_verify+0xd/frame 0xfffffe08b36e0730
> abd_put() at 0xffffffff81d92eff = abd_put+0xf/frame 0xfffffe08b36e0750
> vdev_raidz_map_free() at 0xffffffff81e26312 = vdev_raidz_map_free+0x82/frame 0xfffffe08b36e0780
> zio_vdev_io_assess() at 0xffffffff81e48646 = zio_vdev_io_assess+0x116/frame 0xfffffe08b36e07b0
> zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b36e0800
> zio_vdev_io_start() at 0xffffffff81e48184 = zio_vdev_io_start+0x454/frame 0xfffffe08b36e0860
> zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b36e08b0
> zio_nowait() at 0xffffffff81e422b8 = zio_nowait+0xb8/frame 0xfffffe08b36e08e0
> vdev_mirror_io_start() at 0xffffffff81e224fc = vdev_mirror_io_start+0x38c/frame 0xfffffe08b36e0930
> zio_vdev_io_start() at 0xffffffff81e48030 = zio_vdev_io_start+0x300/frame 0xfffffe08b36e0990
> zio_execute() at 0xffffffff81e4312c = zio_execute+0x36c/frame 0xfffffe08b36e09e0
> taskqueue_run_locked() at 0xffffffff809a9d6d = taskqueue_run_locked+0x13d/frame 0xfffffe08b36e0a40
> taskqueue_thread_loop() at 0xffffffff809aab28 = taskqueue_thread_loop+0x88/frame 0xfffffe08b36e0a70
> fork_exit() at 0xffffffff8091e3e4 = fork_exit+0x84/frame 0xfffffe08b36e0ab0
> fork_trampoline() at 0xffffffff80d930fe = fork_trampoline+0xe/frame 0xfffffe08b36e0ab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> db>
>
> (kgdb) list *(abd_verify+0xd)
>
> 0x24a2d is in abd_verify
> (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:231).
> 226     }
> 227
> 228     static inline void
> 229     abd_verify(abd_t *abd)
> 230     {
> 231             ASSERT3U(abd->abd_size, >, 0);
> 232             ASSERT3U(abd->abd_size, <=, SPA_MAXBLOCKSIZE);
> 233             ASSERT3U(abd->abd_flags, ==, abd->abd_flags & (ABD_FLAG_LINEAR |
> 234                 ABD_FLAG_OWNER | ABD_FLAG_META));
> 235             IMPLY(abd->abd_parent != NULL, !(abd->abd_flags & ABD_FLAG_OWNER));
> (kgdb) list *(abd_put+0xf)
> 0x24eff is in abd_put
> (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:514).
> 509      */
> 510     void
> 511     abd_put(abd_t *abd)
> 512     {
> 513             abd_verify(abd);
> 514             ASSERT(!(abd->abd_flags & ABD_FLAG_OWNER));
> 515
> 516             if (abd->abd_parent != NULL) {
> 517                     (void) refcount_remove_many(&abd->abd_parent->abd_children,
> 518                         abd->abd_size, abd);
> (kgdb) list *(vdev_raidz_map_free+0x82)
> 0xb8312 is in vdev_raidz_map_free
> (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c:281).
> 276                             zio_buf_free(rm->rm_col[c].rc_gdata,
> 277                                 rm->rm_col[c].rc_size);
> 278             }
> 279
> 280             size = 0;
> 281             for (c = rm->rm_firstdatacol; c < rm->rm_cols; c++) {
> 282                     abd_put(rm->rm_col[c].rc_abd);
> 283                     size += rm->rm_col[c].rc_size;
> 284             }
> 285
> (kgdb) list *(zio_vdev_io_assess+0x116)
> 0xda646 is in zio_vdev_io_assess
> (/usr/home/kenm/perforce4/kenm/FreeBSD-test/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3315).
> 3310            if (vd == NULL && !(zio->io_flags & ZIO_FLAG_CONFIG_WRITER))
> 3311                    spa_config_exit(zio->io_spa, SCL_ZIO, zio);
> 3312
> 3313            if (zio->io_vsd != NULL) {
> 3314                    zio->io_vsd_ops->vsd_free(zio);
> 3315                    zio->io_vsd = NULL;
> 3316            }
> 3317
> 3318            if (zio_injection_enabled && zio->io_error == 0)
> 3319                    zio->io_error = zio_handle_fault_injection(zio, EIO);
> (kgdb)
>
> So, I disabled trim by setting vfs.zfs.trim.enabled=0 in the loader, and I
> can boot now.
>=20
> Ken
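
A side note on the second trap: a fault virtual address of 0x4, together with
the load "movl 0x4(%r14),%eax" in abd_verify(), would be consistent with
reading a field at offset 4 through a NULL abd pointer. The arithmetic can be
sketched as follows. This is an assumption-laden illustration only:
mock_abd_layout_t is a hypothetical structure, and its field order is chosen
for the example rather than taken from the real abd_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration; NOT the real abd_t. */
typedef struct mock_abd_layout {
	uint32_t flags;			/* offset 0 */
	uint32_t size;			/* offset 4 */
	struct mock_abd_layout *parent;
} mock_abd_layout_t;

/*
 * Address that a load of the size field touches when the structure
 * pointer holds the value base (0 models a NULL pointer).
 */
static uintptr_t
mock_size_field_address(uintptr_t base)
{
	return (base + offsetof(mock_abd_layout_t, size));
}
```

For base 0 this yields 0x4, which is how a NULL structure pointer can show up
as a small non-zero fault address rather than as 0x0.
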



-- 
O. Hartmann

I object to the use or transfer of my data for advertising purposes or for
market or opinion research (§ 28 Abs. 4 BDSG).



