Date: Fri, 21 Jul 2006 09:50:24 -0700 (PDT) From: "R. B. Riddick" <arne_woerner@yahoo.com> To: Andrew - Supernews <andrew@supernews.net>, freebsd-geom@freebsd.org Cc: ade@freebsd.org Subject: Re: gmirror panics on startup, and some other cases Message-ID: <20060721165024.12901.qmail@web30309.mail.mud.yahoo.com> In-Reply-To: <E1G3Nqb-000BqI-Iv@trinity.supernews.net>
next in thread | previous in thread | raw e-mail | index | archive | help
Hi! Maybe u want to test my brand-new geom_raid5? I am trying to do stress tests since yesterday, but I cannot produce any crashes anymore (I tried concurrent reads+writes while I removed+inserted disks)... geom_raid5 can be used in RAID1 mode by just giving 2 disks... It might be a bit slower than geom_mirror... U can download it here: http://home.tiscali.de/cmdr_faako/geom_raid5.tbz I would be glad to hear, if it works on other boxes, too... I would be glad, if somebody could say something about this strange exceptions, that are handled in g_raid5.c in line 411 asoasf (and so on and so forth?; I mean: perge perge) and 2044 perge perge. Bye Arne --- Andrew - Supernews <andrew@supernews.net> wrote: > Running RELENG_6 as of June 21 2006, we ran into what looks like a > couple of related gmirror bugs relating to synchronization problems > during destruction. In the worst case these can cause a kernel panic > before reaching single-user mode, if geom_mirror is loaded from > loader.conf. > > We can reproduce it as follows (might require two CPUs, haven't > tested it without SMP): > > 1. create a new gmirror device from only one disk, e.g. > gmirror label m0 da10 > > 2. insert a new disk to the existing device, e.g. > gmirror insert m0 da20 > This will start synchronization of the new disk. > > 3. Remove the original disk (what actually happened to us was that > this disk partially failed, but that's not necessary to demonstrate > the problem): > gmirror remove m0 da10 > > Most likely you will get an immediate panic from doing that (I've had > at least two different ones that I've not tracked down yet). > > Subsequently, though, you'll get a further panic any time that gmirror > tastes the remaining disk (da20 in this example). What is happening in > that case appears to be: > > g_mirror_taste sees a disk it likes, so it creates the new gmirror > geom (m0), and the various ancillary bits, starts a kernel thread for > it, and sends a "new disk" message to that thread; having sent the > message, it waits for it to be processed, and then re-acquires a lock > on &sc->sc_lock. > > In the m0 kernel thread, the "new disk" message is processed. The disk > goes to state "new", and the device state changed to "running". The > thread then posts another message to itself changing the disk state > from "new" to "synchronizing" (which isn't really meaningful since > there is only one disk in the mirror now); the device state check then > sees that there are no disks in either the "new" or "active" state, > and destroys the whole device, including destroying a lock out from > under the tasting thread which is waiting on it, hence the panic. > > Log appended below. Haven't been able to get a really usable crashdump > out of it yet. > > -- > Andrew, Supernews > http://www.supernews.com > > --8=-- > > g_mirror_taste(MIRROR, da20) > GEOM_MIRROR[2]: Tasting da20. > g_access(0xb0ae5280(da20), 1, 0, 0) > open delta:[r1w0e0] old:[r0w0e0] provider:[r0w0e0] 0xb0aba180(da20) > g_disk_access(da20, 1, 0, 0) > bio_request(0xb0ab6294) from 0xb0ae5280(mirror:taste) to 0xb0aba180(da20) cmd > 1 > g_io_deliver(0xb0ab6294) from 0xb0ae5280(mirror:taste) to 0xb0aba180(da20) > cmd 1 error 0 off 147015821312 len 512 > g_access(0xb0ae5280(da20), -1, 0, 0) > open delta:[r-1w0e0] old:[r1w0e0] provider:[r1w0e0] 0xb0aba180(da20) > g_disk_access(da20, -1, 0, 0) > g_detach(0xb0ae5280) > g_destroy_consumer(0xb0ae5280) > g_destroy_geom(0xb0abad00(mirror:taste)) > magic: GEOM::MIRROR > version: 3 > name: m0 > mid: 3504796118 > did: 2410144083 > all: 1 > genid: 0 > syncid: 1 > priority: 0 > slice: 4096 > balance: split > mediasize: 147015821312 > sectorsize: 512 > syncoffset: 1297350656 > mflags: NONE > dflags: SYNCHRONIZING > hcprovider: > provsize: 147015821824 > MD5 hash: eb7116ccf587743165eb48f285103a01 > GEOM_MIRROR[1]: Creating device m0 (id=3504796118). > GEOM_MIRROR[0]: Device m0 created (id=3504796118). > GEOM_MIRROR[1]: root_mount_hold 0xb0ab4c30 > GEOM_MIRROR[1]: Adding disk da20 to m0. > GEOM_MIRROR[2]: Adding disk da20. > g_access(0xb0ae5240(da20), 1, 1, 1) > open delta:[r1w1e1] old:[r0w0e0] provider:[r0w0e0] 0xb0aba180(da20) > g_disk_access(da20, 1, 1, 1) > g_post_event_x(0xa04bd774, 0xb0aba180, 2, 0) > ref 0xb0aba180 > GEOM_MIRROR[2]: Disk da20 connected. > GEOM_MIRROR[4]: g_mirror_event_send: Sending event 0xb0ab7aa0. > GEOM_MIRROR[4]: g_mirror_event_send: Waking up 0xb0a7fa00. > GEOM_MIRROR[4]: g_mirror_event_send: Sleeping 0xb0ab7aa0. > GEOM_MIRROR[4]: g_mirror_event_send: Sleeping 0xb0ab7aa0. > GEOM_MIRROR[5]: g_mirror_worker: Let's see... > GEOM_MIRROR[3]: Running event for disk da20. > GEOM_MIRROR[3]: Changing disk da20 state from NONE to NEW. > GEOM_MIRROR[1]: Disk da20 state changed from NONE to NEW (device m0). > GEOM_MIRROR[0]: Device m0: provider da20 detected. > GEOM_MIRROR[1]: Device m0 state changed from STARTING to RUNNING. > GEOM_MIRROR[3]: State for da20 disk: SYNCHRONIZING. > GEOM_MIRROR[4]: g_mirror_event_send: Sending event 0xb0ab7260. > GEOM_MIRROR[4]: g_mirror_event_send: Waking up 0xb0a7fa00. > GEOM_MIRROR[4]: g_mirror_worker: Waking up 0xb0ab7aa0. > GGEOM_MIRROR[4]: g_mirror_event_send: Woken up 0xb0ab7aa0. > EOM_MIRROR[5]: g_mirror_worker: I'm here 1. > GEOM_MIRROR[5]: g_mirror_worker: Let's see... > GEOM_MIRROR[3]: Running event for disk da20. > GEOM_MIRROR[3]: Changing disk da20 state from NEW to SYNCHRONIZING. > GEOM_MIRROR[1]: Disk da20 state changed from NEW to SYNCHRONIZING (device > m0). > GEOM_MIRROR[1]: root_mount_rel[1683] 0xb0ab4c30 > GEOM_MIRROR[2]: No I/O requests for m0, it can be destroyed. > bio_request(0xb0b14108) from 0xb0ae5240(m0) to 0xb0aba180(da20) cmd 2 > g_io_deliver(0xb0b14108) from 0xb0ae5240(m0) to 0xb0aba180(da20) cmd 2 error > 0 off 147015821312 len 512 > GEOM_MIRROR[2]: Metadata on da20 updated. > GEOM_MIRROR[2]: Access da20 r-1w-1e-1 = 0 > g_access(0xb0ae5240(da20), -1, -1, -1) > open delta:[r-1w-1e-1] old:[r1w1e1] provider:[r1w1e1] 0xb0aba180(da20) > g_disk_access(da20, -1, -1, -1) > g_post_event_x(0xa04bd0a0, 0xb0aba180, 2, 0) > ref 0xb0aba180 > g_post_event_x(0xb0b2c904, 0xb0ae5240, 2, 0) > g_wither_geom(0xb0a86000(m0.sync)) > GEOM_MIRROR[0]: Device m0 destroyed. > g_wither_geom(0xb0abae80(m0)) > GEOM_MIRROR[1]: Thread exiting. > > > Fatal trap 12: page fault while in kernel mode > cpuid = 0; apic id = 00 > fault virtual address = 0x1c > fault code = supervisor write, page not present > instruction pointer = 0x20:0xa04f81e5 > stack pointer = 0x28:0xd28c6b64 > frame pointer = 0x28:0xd28c6b6c > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, def32 1, gran 1 > processor eflags = interrupt enabled, resume, IOPL = 0 > current process = 2 (g_event) > trap number = 12 > panic: page fault > cpuid = 0 > KDB: stack backtrace: > kdb_backtrace(100,b07b9000,28,d28c6b24,c) at kdb_backtrace+0x29 > panic(a06b4123,a06d4d09,0,fffff,b07bf09b) at panic+0x114 > trap_fatal(d28c6b24,1c,b07b9000,0,c) at trap_fatal+0x2ce > trap_pfault(d28c6b24,0,1c) at trap_pfault+0x1f7 > trap(a0510008,28,d28c0028,b0a7fa2c,4) at trap+0x325 > calltrap() at calltrap+0x5 > --- trap 0xc, eip = 0xa04f81e5, esp = 0xd28c6b64, ebp = 0xd28c6b6c --- > _sx_xlock(b0a7fa2c,b0b35a45,d1,b0a7fb18,b0abb900) at _sx_xlock+0x79 > g_mirror_event_send(b0abb900,1,2,0,b0a7fa00) at g_mirror_event_send+0x1ce > g_mirror_add_disk(b0a7fa00,b0aba180,d28c6c18,b0a7fa2c,b0b35a45) at > g_mirror_add_disk+0x1ab > g_mirror_taste(b0b37040,b0aba180,0) at g_mirror_taste+0x1d5 > g_load_class(b0ab4950,0) at g_load_class+0x143 > one_event(d28c6d10,a04bbb71,258,190,b07b7000) at one_event+0x188 > g_run_events(258,190,b07b7000,a04bbad8,d28c6d24) at g_run_events+0x9 > g_event_procbody(0,d28c6d38) at g_event_procbody+0x99 > fork_exit(a04bbad8,0,d28c6d38) at fork_exit+0x71 > fork_trampoline() at fork_trampoline+0x8 > --- trap 0x1, eip = 0, esp = 0xd28c6d6c, ebp = 0 --- > Uptime: 3m8s > Dumping 3327 MB (2 chunks) > chunk 0: 1MB (155 pages) ... ok > chunk 1: 3327MB (851568 pages) 3311panic: ahd_run_qoutfifo recursion > cpuid = 0 > Uptime: 3m11s > Automatic reboot in 15 seconds - press a key on the console to abort > Rebooting... > cpu_reset: Stopping other CPUs > > > --8=---- > > _______________________________________________ > freebsd-geom@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-geom > To unsubscribe, send any mail to "freebsd-geom-unsubscribe@freebsd.org" > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060721165024.12901.qmail>