Date: Sun, 16 Oct 2005 19:25:29 -0700
From: Michael <elshar@cheekan.org>
To: freebsd-geom@freebsd.org
Subject: Issues with GEOM and raid5?
Message-ID: <43530B99.5090304@cheekan.org>
Hey guys,

I seem to be having a problem with my raid5 gvinum array causing my server to freeze and/or kernel panic. I've got a dual-core Opteron running 6rc1 (rebuilt world/kernel as of yesterday), with an LSI MegaRAID 300-8x with 8 drives attached. It's currently set up with 4 hardware raid1 arrays. The drives FBSD sees are actually something like amrd0-4. Just to clarify: 8 drives in 4 hardware raid1 arrays, which are then used as the 4 logical drives of a raid5 gvinum array.

I got gvinum working, the array shows up on boot, and everything's fine. But after I do a *lot* of writes to the array, interesting things start happening. There may even be symptoms during these transfers, as I've noticed the transfers occasionally stall for up to about 10 seconds.

The first time it did this I got a message about increasing PMAP_SHPGPERPROC. It also caused the raid card itself to think there was something wrong with two of the eight drives. I had to hotplug them while in the raid card's BIOS to get it to accept that the drives were not dead and allow a rebuild of the two affected raid1 arrays.

Tonight before the crash I got through writing approximately 45,000 files to it, about 310GB in total, using dd. All those went fine, but then any process that tried to read anything from the array started to become non-responsive, and then the machine froze. Unfortunately, I won't be able to physically get to it until tomorrow afternoon, but I was hoping someone might give me some things to try to coax more information out of what's going on.

I haven't tried increasing the PMAP value. It seemed to me that it would only hide the actual issue. As a side note, the first time the array did this I hadn't yet recompiled 6rc1 for SMP, so it was still running in UP mode. It was actually doing the buildworld for SMP when it decided to die. It is also a fresh install of 6rc1 plus whatever commits were made as of approximately Friday.

Any suggestions? Things I should look for? I'll reply back tomorrow with (hopefully) what it was complaining about and whatever debug info comes out of suggestions to this email.

Thanks,
Michael