From owner-freebsd-current@FreeBSD.ORG Wed Oct 19 21:13:33 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4015E106564A for ; Wed, 19 Oct 2011 21:13:33 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay01.ispgateway.de (smtprelay01.ispgateway.de [80.67.31.24]) by mx1.freebsd.org (Postfix) with ESMTP id A3F478FC14 for ; Wed, 19 Oct 2011 21:13:32 +0000 (UTC) Received: from [78.35.72.188] (helo=fabiankeil.de) by smtprelay01.ispgateway.de with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1RGdS6-0001SU-Tq for freebsd-current@freebsd.org; Wed, 19 Oct 2011 23:13:31 +0200 Date: Wed, 19 Oct 2011 23:09:15 +0200 From: Fabian Keil To: freebsd-current@freebsd.org Message-ID: <20111019230915.041aa981@fabiankeil.de> In-Reply-To: <20110927220015.375ac343@fabiankeil.de> References: <20110927220015.375ac343@fabiankeil.de> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/=nSlf2SV8M+zQPGl.+Cmhad"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 Subject: Re: Fatal trap 12: page fault while in kernel mode -- Stopped at atomic_subtract_int+0x4 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Oct 2011 21:13:33 -0000 --Sig_/=nSlf2SV8M+zQPGl.+Cmhad Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Fabian Keil wrote: > I pretty reproducible get the following (handtranscribed) panic > when sending an zfs snapshot to geli provider based on an USB > stick that disappears (due to a bug, or because it's unplugged):=20 >=20 > Fatal trap 12: page fault while in kernel mode > cpuid =3D 0: apic id =3D 00 > fault virtual address =3D 0x288 > fault code =3D supervisor write data, page not present > instruction pointer =3D 0x20:0xffffffff808e2984 > stack pointer =3D 0x28:0xffffff800023fba0 > frame pointer =3D 0x28:0xffffff800023fbb0 > code segment =3D base 0x0, limit 0xfffff, type 0x1b > =3D DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags =3D interrupt enabled, resume, IOPL =3D 0 > current process =3D 13 (g_up) > [ thread pid 13 tid 100010 ] > Stopped at atomic_subtract_int+0x4: lock subl %esi,(%rdi) > db> where > Tracing pid 13 tid 100010 td 0xfffffe00027998c0 > atomic_subtract_int() at atomic_subtract_int+0x4 > g_io_schdule_up() at g_io_schedule_up+0xa6 > g_up_procbody() at g_up_procbody+0x5c > fork_exit() at fork_exit+0x11f > fork_trampoline() at fork_trampoline+0xe > --- trap 0, rip =3D 0, rsp =3D 0xffffff800023fd00, rbp 0 --- >=20 > It seems to be important that ZFS is actually writing to the stick. > If the stick is unplugged while the operation is stalled for other > reasons, this particular panic doesn't seem to occur. >=20 > While I end up in the debugger, dumping core doesn't work > and produces a double fault and a bunch of duplicated > messages (again handtranscribed): The duplicated messages have been recently fixed. =20 > db> dump > Dumping 443 out of 1974 MB: Dumping 443 out of 1974 MB >=20 > Fatal double fault > Fatal double fault > rip =3D 0xffffffff8066a9e0 > rip =3D 0xffffffff8066a9e0 > rsp =3D 0xffffff800023c000 > rsp =3D 0xffffff800023c000 > rbp =3D 0xffffff800023c040 > rbp =3D 0xffffff800023c040 > cpuid =3D 0; cpuid =3D 0; apic id =3D 00 > apic id =3D 00 > panic: double fault > panic: double fault > cpuid =3D 0 > cpuid =3D 0 > KDB: stack backtrace: > KDB: stack backtrace: > db_trac_self_wrapper() at db_trace_self_wrapper+0x2a > kdb_backtrace() at kdb_backtrace+0x37 > panic() at panic+0x187 > dblfault_handler() at dblfault_handler+0xa4 > Xdblfault() at Xdblfault+0xa8 > --- trap 0x17, rip =3D 0xffffffff8066a9e8, rsp =3D 0xffffffff80e56158, rb= p =3D 0xffffff800023c040 --- > mi_switch() at mi_switch+0x270 > critical_exit() at critical_exit+0x9b > spinlock_exit() at spinlock_exit+0x17 > mi_switch() at mi_switch+0x275 > critical_exit() at critical_exit+0x9b > spinlock_exit() at spinlock_exit+0x17 > [several pages of the previous three lines skipped] > mi_switch() at mi_switch+0x275 > critical_exit() at critical_exit+0x9b > spinlock_exit() at spinlock_exit+0x17 > intr_even_schedule_thread() at intr_event_schedule_thread+0xbb > ahci_end_transaction() at ahci_end_transaction+0x398 > ahci_ch_intr() at ahci_ch_intr+0x2b5 > ahcipoll() at ahcipoll+0x15 > xpt_polled_action() at xpt_polled_action+0xf7 >=20 > I first noticed the problem with CURRENT from a week ago, > but given that USB sticks don't usually disappear for me > while sending snapshots to them, the problem might not > be new. >=20 > I'm using amd64, the panic above is from a custom kernel > without WITNESS and INVARIANTS, but enabling them doesn't > seem to affect the trace before the double fault. >=20 > I wasn't able to reproduce the panic by unplugging the stick > while writing to the pool using dd (but only tried once). Here's another one, again with recent HEAD. This time the USB stick disappeared while the pool was being scrubbed and dumping actually worked. The stick seems to reproducibly disappear after scrubbing it for a while and the panic seems to be reproducible as well. The stack trace looks a bit different, but I'm not sure if this is because it's a slightly different situation or because of changes in HEAD. fk@r500 /usr/crash $kgdb kernel.3/kernel vmcore.3 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain condition= s. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: ugen7.2: at usbus7 (disconnected) umass0: at uhub7, port 2, addr 2 (disconnected) (da0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR (da0:umass-sim0:0:0:0): Retrying command (da0:umass-sim0:0:0:0): Selection timeout (da0:umass-sim0:0:0:0): Retrying command (da0:umass-sim0:0:0:0): Selection timeout (da0:umass-sim0:0:0:0): Retrying command (da0:umass-sim0:0:0:0): Selection timeout (da0:umass-sim0:0:0:0): Retrying command (pass2:umass-sim0:0:0:0): lost device (pass2:umass-sim0:0:0:0): removing device entry (da0:umass-sim0:0:0:0): lost device - 1 outstanding (da0:umass-sim0:0:0:0): Error 6, Retries exhausted (da0:umass-sim0:0:0:0): oustanding 0 GEOM_ELI: Crypto WRITE request failed (error=3D6). label/extreme.eli[WRITE(= offset=3D2958844928, length=3D6144)] GEOM_ELI: Crypto WRITE request failed (error=3D6). label/extreme.eli[WRITE(= offset=3D2958851584, length=3D21504)] GEOM_ELI: Crypto WRITE request failed (error=3D6). label/extreme.eli[WRITE(= offset=3D2390255616, length=3D18944)] GEOM_ELI: g_eli_read_done() failed label/extreme.eli[READ(offset=3D270336, = length=3D8192)] Fatal trap 12: page fault while in kernel mode cpuid =3D 1; apic id =3D 01 fault virtual address =3D 0x284 fault code =3D supervisor read data, page not present instruction pointer =3D 0x20:0xffffffff81314512 stack pointer =3D 0x28:0xffffff8000244b90 frame pointer =3D 0x28:0xffffff8000244bc0 code segment =3D base 0x0, limit 0xfffff, type 0x1b =3D DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags =3D interrupt enabled, resume, IOPL =3D 0 current process =3D 13 (g_down) Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kerne= l/zfs.ko.symbols...done. done. [...] Loaded symbols for /boot/kernel/fdescfs.ko #0 doadump (textdump=3D0) at /usr/src/sys/kern/kern_shutdown.c:260 260 if (textdump && textdump_pending) { (kgdb) where #0 doadump (textdump=3D0) at /usr/src/sys/kern/kern_shutdown.c:260 #1 0xffffffff80333380 in db_dump (dummy=3DVariable "dummy" is not availabl= e. ) at /usr/src/sys/ddb/db_command.c:537 #2 0xffffffff80332cb1 in db_command (last_cmdp=3D0xffffffff80dfd100, cmd_t= able=3DVariable "cmd_table" is not available. ) at /usr/src/sys/ddb/db_command.c:448 #3 0xffffffff80332f00 in db_command_loop () at /usr/src/sys/ddb/db_command= .c:501 #4 0xffffffff80335039 in db_trap (type=3DVariable "type" is not available. ) at /usr/src/sys/ddb/db_main.c:229 #5 0xffffffff806a1061 in kdb_trap (type=3D12, code=3D0, tf=3D0xffffff80002= 44ae0) at /usr/src/sys/kern/subr_kdb.c:625 #6 0xffffffff80903bbd in trap_fatal (frame=3D0xffffff8000244ae0, eva=3DVar= iable "eva" is not available. ) at /usr/src/sys/amd64/amd64/trap.c:813 #7 0xffffffff80903f26 in trap_pfault (frame=3D0xffffff8000244ae0, usermode= =3D0) at /usr/src/sys/amd64/amd64/trap.c:734 #8 0xffffffff8090448f in trap (frame=3D0xffffff8000244ae0) at /usr/src/sys= /amd64/amd64/trap.c:473 #9 0xffffffff808ee1e3 in calltrap () at /usr/src/sys/amd64/amd64/exception= .S:228 #10 0xffffffff81314512 in g_eli_start (bp=3D0xfffffe000c44cd98) at /usr/src= /sys/modules/geom/geom_eli/../../../geom/eli/g_eli.c:320 #11 0xffffffff80600304 in g_io_schedule_down (tp=3DVariable "tp" is not ava= ilable. ) at /usr/src/sys/geom/geom_io.c:632 #12 0xffffffff8060060c in g_down_procbody (arg=3DVariable "arg" is not avai= lable. ) at /usr/src/sys/geom/geom_kern.c:110 #13 0xffffffff806399ef in fork_exit (callout=3D0xffffffff806005b0 , arg=3D0x0, frame=3D0xffffff8000244c50) at /usr/src/sys/kern/kern_= fork.c:995 #14 0xffffffff808ee70e in fork_trampoline () at /usr/src/sys/amd64/amd64/ex= ception.S:602 #15 0x0000000000000000 in ?? () #16 0x0000000000000000 in ?? () #17 0x0000000000000001 in ?? () #18 0x0000000000000000 in ?? () [...] #39 0xffffffff80e398f0 in sleepq_chains () #40 0xfffffe0002713888 in ?? () #41 0x0000000000000000 in ?? () #42 0xfffffe0002713460 in ?? () #43 0xffffff8000244ad0 in ?? () #44 0xffffff8000244a78 in ?? () #45 0xfffffe00027138c0 in ?? () #46 0xffffffff80693ca0 in sched_switch (td=3D0xffffffff806005b0, newtd=3D0x= 0, flags=3DVariable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1853 Previous frame inner to this frame (corrupt stack?) (kgdb) f 10 #10 0xffffffff81314512 in g_eli_start (bp=3D0xfffffe000c44cd98) at /usr/src= /sys/modules/geom/geom_eli/../../../geom/eli/g_eli.c:320 320 } (kgdb) p *bp $1 =3D {bio_cmd =3D 1 '\001', bio_flags =3D 0 '\0', bio_cflags =3D 0 '\0', = bio_pflags =3D 255 '=FF', bio_dev =3D 0x0, bio_disk =3D 0x0, bio_offset =3D= 4022607872, bio_bcount =3D 0, bio_data =3D 0xffffff8005747000 "\200*=F8\f", bio_error =3D 0, bio_resid = =3D 0, bio_done =3D 0xffffffff811a72d0 , bio_driver1 =3D= 0xfffffe0004c83828, bio_driver2 =3D 0x0, bio_caller1 =3D 0xfffffe001cf5ba50, bio_caller2 =3D 0x0, bio_queue =3D {t= qe_next =3D 0xfffffe000c44b3a0, tqe_prev =3D 0xffffffff80e1c840}, bio_attri= bute =3D 0x0, bio_from =3D 0xfffffe000c782a80, bio_to =3D 0xfffffe000cc8ca00, bio_length =3D 8192, bio_completed =3D 0, = bio_children =3D 1, bio_inbed =3D 0, bio_parent =3D 0x0, bio_t0 =3D {sec = =3D 243, frac =3D 14970883905652654531}, bio_task =3D 0, bio_task_arg =3D 0x0, bio_classifier1 =3D 0x0, bio_classi= fier2 =3D 0x0, bio_pblkno =3D 0} Fabian --Sig_/=nSlf2SV8M+zQPGl.+Cmhad Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk6fPH4ACgkQBYqIVf93VJ2zbACgku3xNwtrWEFmpBe7vlSazLB3 DEYAmwaD/GQz1f5rTp9z8+JPgMRX84rv =rDGs -----END PGP SIGNATURE----- --Sig_/=nSlf2SV8M+zQPGl.+Cmhad--