Date: Fri, 07 Aug 2009 09:45:38 -0400
From: Boris Kochergin <spawk@acm.poly.edu>
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS RAID-Z panic on vdev failure + subsequent panics and hangs
Message-ID: <4A7C3002.8000003@acm.poly.edu>
In-Reply-To: <20090807074400.GB1607@garage.freebsd.pl>
References: <4A78AA71.9050107@acm.poly.edu> <4A78AFB2.10103@acm.poly.edu> <20090805115621.GG1784@garage.freebsd.pl> <4A798A12.4070408@acm.poly.edu> <20090807073738.GA1607@garage.freebsd.pl> <20090807074400.GB1607@garage.freebsd.pl>
Pawel Jakub Dawidek wrote:
> On Fri, Aug 07, 2009 at 09:37:38AM +0200, Pawel Jakub Dawidek wrote:
>
>> On Wed, Aug 05, 2009 at 09:33:06AM -0400, Boris Kochergin wrote:
>>
>>> Fatal trap 12: page fault while in kernel mode
>>> fault virtual address   = 0xffffffffffffffe9
>>> fault code              = supervisor read data, page not present
>>> instruction pointer     = 0x20:0xffffffff8103a9e7
>>> stack pointer           = 0x28:0xffffff8077f26430
>>> frame pointer           = 0x28:0xffffff8077f26500
>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>>> processor eflags        = interrupt enabled, resume, IOPL = 0
>>> current process         = 972 (cp)
>>>
>> [...]
>>
>>> /usr/src/sys/amd64/amd64/trap.c:494
>>> #11 0xffffffff80854d73 in calltrap () at
>>> /usr/src/sys/amd64/amd64/exception.S:224
>>> #12 0xffffffff8103a9e7 in arc_evict (state=Variable "state" is not
>>> available.
>>> ) at
>>> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1489
>>>
>> Could you tell me what do you have at this line in your source? I don't
>> think you use HEAD... What exact FreeBSD version are you using?
>>
>
> You already gave version number in your first mail, sorry about that.
> 8.0-BETA2 should be very close to HEAD (or it actually was HEAD), so I
> guess this is the code we are looking at:
>
> 1488:                   /* "lookahead" for better eviction candidate */
> 1489:                   if (recycle && ab->b_size != bytes &&
> 1490:                       ab_prev && ab_prev->b_size == bytes)
> 1491:                           continue;
>
> If 'ab' is corrupted it should panic earlier, so it seems ab_prev is
> corrupted, can you see what it points to in gdb?
>

Yeah, that's what the code looks like. For convenience, I've put the
source tree the system was built using up at:

http://acm.poly.edu/~spawk/src/

Maybe my kgdb chops aren't up to par, but I can't seem to see what
ab_prev points to:

(kgdb) up
#12 0xffffffff8103a9e7 in arc_evict (state=Variable "state" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1489
1489                    if (recycle && ab->b_size != bytes &&
Current language:  auto; currently c
(kgdb) list
1484                        LBOLT - ab->b_arc_access < arc_min_prefetch_lifespan)) {
1485                            skipped++;
1486                            continue;
1487                    }
1488                    /* "lookahead" for better eviction candidate */
1489                    if (recycle && ab->b_size != bytes &&
1490                        ab_prev && ab_prev->b_size == bytes)
1491                            continue;
1492                    hash_lock = HDR_LOCK(ab);
1493                    have_lock = MUTEX_HELD(hash_lock);
(kgdb) print ab
$13 = (arc_buf_hdr_t *) 0xffffff0003ebc410
(kgdb) print ab->b_size
$14 = 1
(kgdb) print bytes
$15 = 16384
(kgdb) print ab_prev
No symbol "ab_prev" in current context.

-Boris
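
A side note on the values printed above: with ab->b_size == 1 and
bytes == 16384, the second operand of the test at line 1489 is true, so
if recycle is set, short-circuit evaluation goes on to test ab_prev and,
when it is non-NULL, to read ab_prev->b_size. That would be consistent
with the suspicion that ab_prev is the bad pointer, though it does not
prove it. The following is only a minimal standalone sketch of that
evaluation order, not the arc.c code: struct hdr, prev_size(),
lookahead_skips() and the prev_derefs counter are hypothetical names made
up for illustration, and recycle being true is an assumption, since the
kgdb session above does not show its value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for arc_buf_hdr_t; only the field used here. */
struct hdr {
	uint64_t b_size;
};

/* Counts how many times ab_prev->b_size is actually read. */
static int prev_derefs;

static uint64_t
prev_size(const struct hdr *ab_prev)
{
	prev_derefs++;
	return (ab_prev->b_size);
}

/*
 * Same operand order as the "lookahead" test quoted above.
 * Returns 1 if the loop would 'continue', 0 otherwise.
 */
static int
lookahead_skips(int recycle, const struct hdr *ab,
    const struct hdr *ab_prev, uint64_t bytes)
{
	return (recycle && ab->b_size != bytes &&
	    ab_prev != NULL && prev_size(ab_prev) == bytes);
}
```

Calling lookahead_skips(1, &ab, &prev, 16384) with ab.b_size == 1 reaches
prev_size() exactly once; with recycle == 0 or ab_prev == NULL it never
touches ab_prev. A non-NULL but invalid ab_prev is the one case where the
last operand would fault.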