From owner-freebsd-current Wed Mar 17 19: 1:55 1999 Delivered-To: freebsd-current@freebsd.org Received: from allegro.lemis.com (allegro.lemis.com [192.109.197.134]) by hub.freebsd.org (Postfix) with ESMTP id 1CE9215264 for ; Wed, 17 Mar 1999 19:01:48 -0800 (PST) (envelope-from grog@freebie.lemis.com) Received: from freebie.lemis.com (freebie.lemis.com [192.109.197.137]) by allegro.lemis.com (8.9.1/8.9.0) with ESMTP id NAA24755; Thu, 18 Mar 1999 13:31:25 +1030 (CST) Received: (from grog@localhost) by freebie.lemis.com (8.9.3/8.9.0) id NAA76509; Thu, 18 Mar 1999 13:31:24 +1030 (CST) Message-ID: <19990318133123.S429@lemis.com> Date: Thu, 18 Mar 1999 13:31:23 +1030 From: Greg Lehey To: vallo@matti.ee Cc: freebsd-current@freebsd.org Subject: Help! (was: Crash while newfs'ing innocent vinum volume on fresh system.) References: <19990317142107.A78437@matti.ee> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Mailer: Mutt 0.93.2i In-Reply-To: <19990317142107.A78437@matti.ee>; from Vallo Kallaste on Wed, Mar 17, 1999 at 02:21:07PM +0200 WWW-Home-Page: http://www.lemis.com/~grog Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-41-739-7062 Sender: owner-freebsd-current@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wednesday, 17 March 1999 at 14:21:07 +0200, Vallo Kallaste wrote: > I have reinstalled fresh 3.1-RELEASE to the machine in question, > before swapped out two memory DIMM's to the single new 32MB one, just > to be sure it's not some kind of memory error. This is definitely not a memory error. I wish I knew what it was, though. In my last message I said it wasn't appropriate to copy -CURRENT, but maybe it is: I just can't figure out how this is happening. > Then I downloaded your newest archive for vinum installation: > vinum-4.0-CURRENT.tar.gz... > Copied the same kernel configuration file as usual to the right place > from backup. Made some minor changes to kernel configuration, removed > DMA support for ata disks, I don't trust the support very much yet. > Made new debug kernel, copied to /var/crash, stripped the original > with -g and installed. Reboot. For just testing Vinum, you don't need to build a kernel. > Newfs -n1 -d0 -v /dev/vinum/rsvol .... two lines newfs output and > crash: > > Script started on Wed Mar 17 13:24:45 1999 > sh-2.02# gdb -k ker=07nel.gdb vmcore.0 > (kgdb) add-symbol-file /modules/vinum.ko 0x00005ba8+ > A syntax error in expression, near `'. You shouldn't need to do this any more. The .gdbinit file I put in this directory did it for you. When removing dumps, don't remove the =2Egdbinit. > #9 0xf01eadba in trap (frame=3D{tf_es =3D -203751408, tf_ds =3D -2037514= 08, > tf_edi =3D -559038242, tf_esi =3D -2147483648, tf_ebp =3D -20371514= 4, > tf_isp =3D -203715180, tf_ebx =3D -258066920, tf_edx =3D 16384, tf_= ecx =3D 14, > tf_eax =3D -16162, tf_trapno =3D 12, tf_err =3D 0, tf_eip =3D -2576= 39372, > tf_cs =3D 8, tf_eflags =3D 66182, tf_esp =3D -257699840, tf_ss =3D = -245981592}) > at ../../i386/i386/trap.c:437 > #10 0xf0a4bc34 in ?? () > #11 0xf0a4bae1 in ?? () > #12 0xf0a4b8b6 in ?? () > #13 0xf012ffec in physio (strategy=3D0xf0a4b808 , bp=3D0x0, > dev=3D23296, rw=3D0, minp=3D0xf01300dc , uio=3D0xf3db8f34) > at ../../kern/kern_physio.c:113 > #14 0xf0a4c97c in ?? () > #15 0xf0162f2f in spec_write (ap=3D0xf3db8ef8) > at ../../miscfs/specfs/spec_vnops.c:355 These are the critical frames. With syumbols, they read: #9 0xf01eadba in trap (frame=3D{tf_es =3D -203751408, tf_ds =3D -203751408= , tf_edi =3D -559038242, tf_esi =3D -2147483648, tf_ebp =3D -203715144, tf_isp =3D -203715180, tf_ebx =3D -258066920, = tf_edx =3D 16384, tf_ecx =3D 14, tf_eax =3D -16162, tf_trapno =3D 12, tf_err =3D 0, tf_eip =3D -257639372, tf_cs =3D 8, t= f_eflags =3D 66182, tf_esp =3D -257699840, tf_ss =3D -245981592}) at ../../i386/i386/trap.c:437 #10 0xf0a4bc34 in launch_requests (rq=3D0xf0a3cb00, reviveok=3D0) at /usr/src/sys/modules/vinum/../../dev/vinum/vinumrequest.c:370 #11 0xf0a4bae1 in vinumstart (bp=3D0xf1569e68, reviveok=3D0) at /usr/src/sys/modules/vinum/../../dev/vinum/vinumrequest.c:300 #12 0xf0a4b8b6 in vinumstrategy (bp=3D0xf1569e68) at /usr/src/sys/modules/v= inum/../../dev/vinum/vinumrequest.c:151 #13 0xf012ffec in physio (strategy=3D0xf0a4b808 , bp=3D0x0, = dev=3D23296, rw=3D0, minp=3D0xf01300dc , uio=3D0xf3db8f34) at ../../kern/kern_physio.c:113 #14 0xf0a4c97c in vinumwrite (dev=3D23296, uio=3D0xf3db8f34, ioflag=3D1) at /usr/src/sys/modules/vinum/../../dev/vinum/vinumrequest.c:995 #15 0xf0162f2f in spec_write (ap=3D0xf3db8ef8) at ../../miscfs/specfs/spec_= vnops.c:355 This is a problem I've seen before, but it completely baffles me. The request passed to launch_requests (frame 10) has been deallocated. Some of the debug code I put in caught it: (kgdb) p freeinfo[7] $2 =3D { time =3D { tv_sec =3D 921669613,=20 tv_usec =3D 289712 },=20 seq =3D 24,=20 size =3D 36,=20 line =3D 174,=20 address =3D 0xf0a3cb00 "=DE=C0=AD=DEh\235\"=F0=DE=C0=AD=DE=C0=C8=A3=F0=DE= =C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0= =AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE=DE=C0=AD=DE",=20 file =3D "vinuminterrupt.c" } This was called from freerq, which frees the complete request. freerq is called from only four places: one on completion of the request (which in this case is just about to be started), one if the request is aborted (which also sets bp->b_error, which is not set here), once in a read context (which is not applicable here: it's a write), and once just before the call to launch_requests in frame 11: 292 if ((status > REQUEST_RECOVERED) /* can'= t satisfy it */ 293 ||(bp->b_flags & B_DONE)) { /* XXX = shouldn't get this without bad status */ 294 if (status =3D=3D REQUEST_DOWN) { /* = not enough subdisks */ 295 bp->b_error =3D EIO; /* I/= O error */ 296 bp->b_flags |=3D B_ERROR; 297 } 298 if ((bp->b_flags & B_DONE) =3D=3D 0) 299 biodone(bp); 300 freerq(rq); 301 return -1; 302 } 303 return launch_requests(rq, reviveok); /* now = start the requests if we can */ [the line numbers are off by 3, but the code here is unchanged]. If we go through this code, we will also return. There's something to suggest that this code was executed, since the buffer was also marked done (at line 298/300). I checked the assembly code and confirmed that it was, in fact, correct. So where is this coming from? I'm completely baffled. It doesn't happen to most people, though I have had reports of one or two other cases. About the only clue is that the problem didn't occur when I removed the debug memory allocator, but I don't know whether it went away or into hiding. I'd really like to find out what's going on here. Here's what I've done: I've built a version of vinum on your site with even more debug code, and would like you to run it again. It'll crash, but hopefully the dump will give me more information. If it's OK for me to run the tests, tell me so and I'll do it during the night. When you run this time, do the following: # vinum vinum -> debug 104 vinum -> setdaemon 1 Thanks for your patience. Greg -- See complete headers for address, home page and phone numbers finger grog@lemis.com for PGP public key To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message