FreeBSD Mail Archives

Date:      Fri, 21 Jul 2006 09:50:24 -0700 (PDT)
From:      "R. B. Riddick" <arne_woerner@yahoo.com>
To:        Andrew - Supernews <andrew@supernews.net>, freebsd-geom@freebsd.org
Cc:        ade@freebsd.org
Subject:   Re: gmirror panics on startup, and some other cases
Message-ID:  <20060721165024.12901.qmail@web30309.mail.mud.yahoo.com>
In-Reply-To: <E1G3Nqb-000BqI-Iv@trinity.supernews.net>

Hi!

Maybe you want to test my brand-new geom_raid5?

I have been running stress tests since yesterday, and I can no longer produce
any crashes (I tried concurrent reads and writes while removing and
re-inserting disks)...

geom_raid5 can be used in RAID1 mode by giving it just 2 disks... It might be a
bit slower than geom_mirror...

You can download it here:
http://home.tiscali.de/cmdr_faako/geom_raid5.tbz

I would be glad to hear whether it works on other boxes, too...

I would also be glad if somebody could say something about the strange
exceptions that are handled in g_raid5.c at line 411 and following, and at
line 2044 and following.

Bye
Arne


--- Andrew - Supernews <andrew@supernews.net> wrote:

> Running RELENG_6 as of June 21 2006, we ran into what looks like a
> couple of related gmirror bugs involving synchronization problems
> during destruction. In the worst case these can cause a kernel panic
> before reaching single-user mode, if geom_mirror is loaded from
> loader.conf.
> 
> We can reproduce it as follows (might require two CPUs, haven't
> tested it without SMP):
> 
> 1. create a new gmirror device from only one disk, e.g.
>     gmirror label m0 da10
> 
> 2. insert a new disk to the existing device, e.g.
>     gmirror insert m0 da20
> This will start synchronization of the new disk.
> 
> 3. Remove the original disk (what actually happened to us was that
> this disk partially failed, but that's not necessary to demonstrate
> the problem):
>     gmirror remove m0 da10
> 
> Most likely you will get an immediate panic from doing that (I've had
> at least two different ones that I've not tracked down yet).
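
The three steps above can be collected into a small helper. This is only a
sketch: the names REPRO_COMMANDS and run_repro are made up for illustration,
the device names are the ones from the example, and by default nothing is
executed. On an affected system, running it for real is expected to panic the
machine.

```python
import subprocess

# Commands exactly as given in steps 1-3; m0, da10 and da20 are example names.
REPRO_COMMANDS = [
    ["gmirror", "label", "m0", "da10"],   # 1. create a one-disk mirror
    ["gmirror", "insert", "m0", "da20"],  # 2. add a disk, synchronization starts
    ["gmirror", "remove", "m0", "da10"],  # 3. remove the only synchronized disk
]

def run_repro(dry_run=True):
    """Return the commands in order; execute them only if dry_run is False."""
    for cmd in REPRO_COMMANDS:
        if not dry_run:
            # Hypothetical usage: requires root and disposable disks.
            subprocess.run(cmd, check=True)
    return [" ".join(cmd) for cmd in REPRO_COMMANDS]
```

Calling run_repro() with no arguments only returns the command lines, so they
can be reviewed before anything touches a disk.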
> 
> Subsequently, though, you'll get a further panic any time that gmirror
> tastes the remaining disk (da20 in this example). What is happening in
> that case appears to be:
> 
> g_mirror_taste sees a disk it likes, so it creates the new gmirror
> geom (m0), and the various ancillary bits, starts a kernel thread for
> it, and sends a "new disk" message to that thread; having sent the
> message, it waits for it to be processed, and then re-acquires a lock
> on &sc->sc_lock.
> 
> In the m0 kernel thread, the "new disk" message is processed. The disk
> goes to state "new", and the device state changes to "running".  The
> thread then posts another message to itself changing the disk state
> from "new" to "synchronizing" (which isn't really meaningful since
> there is only one disk in the mirror now); the device state check then
> sees that there are no disks in either the "new" or "active" state,
> and destroys the whole device, including destroying a lock out from
> under the tasting thread which is waiting on it, hence the panic.
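
The sequence described above can be followed in a toy, single-threaded model.
This is not the gmirror code: Device, worker_step and taste are names invented
here, the "destroyed" flag stands in for the freed sc_lock, and the model
assumes the tasted disk's metadata marks it as synchronizing (as the dflags
line in the log below suggests).

```python
from collections import deque

class Device:
    """Toy stand-in for the mirror device; 'destroyed' models the freed lock."""
    def __init__(self):
        self.disks = {}          # disk name -> state string
        self.events = deque()    # pending (disk, new_state) messages
        self.destroyed = False

def worker_step(dev):
    """Handle one message, then run the device state check."""
    disk, state = dev.events.popleft()
    dev.disks[disk] = state
    if state == "NEW":
        # Assumed: the disk was mid-sync, so the worker queues a follow-up change.
        dev.events.append((disk, "SYNCHRONIZING"))
    if not any(s in ("NEW", "ACTIVE") for s in dev.disks.values()):
        dev.destroyed = True     # whole device torn down, lock included

def taste(dev, disk):
    """Send the 'new disk' message, let the worker run, then retake the lock."""
    dev.events.append((disk, "NEW"))
    while dev.events and not dev.destroyed:
        worker_step(dev)
    return "lock on destroyed device" if dev.destroyed else "lock ok"
```

With an empty Device, taste(dev, "da20") ends with the device destroyed before
the taster retakes the lock; if dev.disks already holds a disk in state
"ACTIVE", the device survives the same sequence.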
> 
> Log appended below. Haven't been able to get a really usable crashdump
> out of it yet.
> 
> -- 
> Andrew, Supernews
> http://www.supernews.com
> 
> --8=--
> 
> g_mirror_taste(MIRROR, da20)
> GEOM_MIRROR[2]: Tasting da20.
> g_access(0xb0ae5280(da20), 1, 0, 0)
> open delta:[r1w0e0] old:[r0w0e0] provider:[r0w0e0] 0xb0aba180(da20)
> g_disk_access(da20, 1, 0, 0)
> bio_request(0xb0ab6294) from 0xb0ae5280(mirror:taste) to 0xb0aba180(da20) cmd 1
> g_io_deliver(0xb0ab6294) from 0xb0ae5280(mirror:taste) to 0xb0aba180(da20) cmd 1 error 0 off 147015821312 len 512
> g_access(0xb0ae5280(da20), -1, 0, 0)
> open delta:[r-1w0e0] old:[r1w0e0] provider:[r1w0e0] 0xb0aba180(da20)
> g_disk_access(da20, -1, 0, 0)
> g_detach(0xb0ae5280)
> g_destroy_consumer(0xb0ae5280)
> g_destroy_geom(0xb0abad00(mirror:taste))
>      magic: GEOM::MIRROR
>    version: 3
>       name: m0
>        mid: 3504796118
>        did: 2410144083
>        all: 1
>      genid: 0
>     syncid: 1
>   priority: 0
>      slice: 4096
>    balance: split
>  mediasize: 147015821312
> sectorsize: 512
> syncoffset: 1297350656
>     mflags: NONE
>     dflags: SYNCHRONIZING
> hcprovider: 
>   provsize: 147015821824
>   MD5 hash: eb7116ccf587743165eb48f285103a01
> GEOM_MIRROR[1]: Creating device m0 (id=3504796118).
> GEOM_MIRROR[0]: Device m0 created (id=3504796118).
> GEOM_MIRROR[1]: root_mount_hold 0xb0ab4c30
> GEOM_MIRROR[1]: Adding disk da20 to m0.
> GEOM_MIRROR[2]: Adding disk da20.
> g_access(0xb0ae5240(da20), 1, 1, 1)
> open delta:[r1w1e1] old:[r0w0e0] provider:[r0w0e0] 0xb0aba180(da20)
> g_disk_access(da20, 1, 1, 1)
> g_post_event_x(0xa04bd774, 0xb0aba180, 2, 0)
>   ref 0xb0aba180
> GEOM_MIRROR[2]: Disk da20 connected.
> GEOM_MIRROR[4]: g_mirror_event_send: Sending event 0xb0ab7aa0.
> GEOM_MIRROR[4]: g_mirror_event_send: Waking up 0xb0a7fa00.
> GEOM_MIRROR[4]: g_mirror_event_send: Sleeping 0xb0ab7aa0.
> GEOM_MIRROR[4]: g_mirror_event_send: Sleeping 0xb0ab7aa0.
> GEOM_MIRROR[5]: g_mirror_worker: Let's see...
> GEOM_MIRROR[3]: Running event for disk da20.
> GEOM_MIRROR[3]: Changing disk da20 state from NONE to NEW.
> GEOM_MIRROR[1]: Disk da20 state changed from NONE to NEW (device m0).
> GEOM_MIRROR[0]: Device m0: provider da20 detected.
> GEOM_MIRROR[1]: Device m0 state changed from STARTING to RUNNING.
> GEOM_MIRROR[3]: State for da20 disk: SYNCHRONIZING.
> GEOM_MIRROR[4]: g_mirror_event_send: Sending event 0xb0ab7260.
> GEOM_MIRROR[4]: g_mirror_event_send: Waking up 0xb0a7fa00.
> GEOM_MIRROR[4]: g_mirror_worker: Waking up 0xb0ab7aa0.
> GGEOM_MIRROR[4]: g_mirror_event_send: Woken up 0xb0ab7aa0.
> EOM_MIRROR[5]: g_mirror_worker: I'm here 1.
> GEOM_MIRROR[5]: g_mirror_worker: Let's see...
> GEOM_MIRROR[3]: Running event for disk da20.
> GEOM_MIRROR[3]: Changing disk da20 state from NEW to SYNCHRONIZING.
> GEOM_MIRROR[1]: Disk da20 state changed from NEW to SYNCHRONIZING (device m0).
> GEOM_MIRROR[1]: root_mount_rel[1683] 0xb0ab4c30
> GEOM_MIRROR[2]: No I/O requests for m0, it can be destroyed.
> bio_request(0xb0b14108) from 0xb0ae5240(m0) to 0xb0aba180(da20) cmd 2
> g_io_deliver(0xb0b14108) from 0xb0ae5240(m0) to 0xb0aba180(da20) cmd 2 error 0 off 147015821312 len 512
> GEOM_MIRROR[2]: Metadata on da20 updated.
> GEOM_MIRROR[2]: Access da20 r-1w-1e-1 = 0
> g_access(0xb0ae5240(da20), -1, -1, -1)
> open delta:[r-1w-1e-1] old:[r1w1e1] provider:[r1w1e1] 0xb0aba180(da20)
> g_disk_access(da20, -1, -1, -1)
> g_post_event_x(0xa04bd0a0, 0xb0aba180, 2, 0)
>   ref 0xb0aba180
> g_post_event_x(0xb0b2c904, 0xb0ae5240, 2, 0)
> g_wither_geom(0xb0a86000(m0.sync))
> GEOM_MIRROR[0]: Device m0 destroyed.
> g_wither_geom(0xb0abae80(m0))
> GEOM_MIRROR[1]: Thread exiting.
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x1c
> fault code              = supervisor write, page not present
> instruction pointer     = 0x20:0xa04f81e5
> stack pointer           = 0x28:0xd28c6b64
> frame pointer           = 0x28:0xd28c6b6c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 2 (g_event)
> trap number             = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> kdb_backtrace(100,b07b9000,28,d28c6b24,c) at kdb_backtrace+0x29
> panic(a06b4123,a06d4d09,0,fffff,b07bf09b) at panic+0x114
> trap_fatal(d28c6b24,1c,b07b9000,0,c) at trap_fatal+0x2ce
> trap_pfault(d28c6b24,0,1c) at trap_pfault+0x1f7
> trap(a0510008,28,d28c0028,b0a7fa2c,4) at trap+0x325
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xa04f81e5, esp = 0xd28c6b64, ebp = 0xd28c6b6c ---
> _sx_xlock(b0a7fa2c,b0b35a45,d1,b0a7fb18,b0abb900) at _sx_xlock+0x79
> g_mirror_event_send(b0abb900,1,2,0,b0a7fa00) at g_mirror_event_send+0x1ce
> g_mirror_add_disk(b0a7fa00,b0aba180,d28c6c18,b0a7fa2c,b0b35a45) at g_mirror_add_disk+0x1ab
> g_mirror_taste(b0b37040,b0aba180,0) at g_mirror_taste+0x1d5
> g_load_class(b0ab4950,0) at g_load_class+0x143
> one_event(d28c6d10,a04bbb71,258,190,b07b7000) at one_event+0x188
> g_run_events(258,190,b07b7000,a04bbad8,d28c6d24) at g_run_events+0x9
> g_event_procbody(0,d28c6d38) at g_event_procbody+0x99
> fork_exit(a04bbad8,0,d28c6d38) at fork_exit+0x71
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xd28c6d6c, ebp = 0 ---
> Uptime: 3m8s
> Dumping 3327 MB (2 chunks)
>   chunk 0: 1MB (155 pages) ... ok
>   chunk 1: 3327MB (851568 pages) 3311panic: ahd_run_qoutfifo recursion
> cpuid = 0
> Uptime: 3m11s
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
> cpu_reset: Stopping other CPUs
> 
> 
> --8=----
> 
> 




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060721165024.12901.qmail>
