Date: Fri, 12 Jan 2007 12:28:08 -0600
From: CyberLeo Kitsana <cyberleo@cyberleo.net>
To: "R. B. Riddick" <arne_woerner@yahoo.com>
Cc: FreeBSD Geom <freebsd-geom@freebsd.org>
Subject: geom_raid5 livelock?
Message-ID: <45A7D338.1090402@cyberleo.net>
Hi.

I've been making use of the geom_raid5 class for FreeBSD 6.2-PRERELEASE.
I've noticed an odd behavior lately with the latest sources provided (as
of 2007-01-10): with 'safeop' enabled, the raid5 will stop responding.

I have an 800MHz Celeron (i815 chipset) with 512MB RAM and 4x 400GB
PATA100 disks, two on a Promise PCI ATA controller, running FreeBSD
6.2-PRERELEASE (2007-01-10). The kernel is SMP, with DEVICE_POLLING and
HZ=100 set. The disks are configured with a 2GB slice 1 as a 4-disk
geom_mirror containing /, and the remainder in slice 2 as a 4-disk
geom_raid5 (32768 stripe size). The box is designed to receive daily
backups from production servers, so data integrity is preferred over
throughput or latency.

All works beautifully until several dozen gigabytes are transferred to
or from the filesystem with safeop enabled, at which point the
filesystem quickly grows less responsive and eventually ceases
responding entirely (processes stuck in diskwait). CPU usage is at 0%,
and all four members of the raid5 are being read at around 160kB/sec
(16kB/t, 10tps) constantly. It does not recover on its own within 72
hours. The mirror is unaffected by this behavior. When this occurs, the
moment safeop is disabled on the raid5 all the problems cease: the
filesystem begins responding and the stuck programs resume.

Is this intentional, an artifact of the hardware or layout I'm using,
or could this be indicative of an obscure bug somewhere? Can I provide
any additional information which would assist in tracking this down?

----
[cyberleo@mikayla ~]$ gmirror list
Geom name: root
State: COMPLETE
Components: 4
Balance: load
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 2
ID: 89087781
Providers:
1. Name: mirror/root
   Mediasize: 1610563584 (1.5G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: ad0s1a
   Mediasize: 1610564096 (1.5G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 2
   ID: 3326083319
2. Name: ad2s1a
   Mediasize: 1610564096 (1.5G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 2
   ID: 1957052293
3. Name: ad4s1a
   Mediasize: 1610564096 (1.5G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 2
   ID: 3131999117
4. Name: ad6s1a
   Mediasize: 1610564096 (1.5G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 2
   ID: 2209607005

[cyberleo@mikayla ~]$ graid5 list
Geom name: raid5
State: COMPLETE CALM
Status: Total=4, Online=4
Type: AUTOMATIC
Pending: (wqp 0 // 0)
Stripesize: 32768
MemUse: 3467264 (msl 138)
Newest: -1
ID: 3906282509
Providers:
1. Name: raid5/raid5
   Mediasize: 1193822846976 (1.1T)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: ad6s2
   Mediasize: 397940981760 (371G)
   Sectorsize: 512
   Mode: r2w2e2
   DiskNo: 3
   Error: No
2. Name: ad4s2
   Mediasize: 397940981760 (371G)
   Sectorsize: 512
   Mode: r2w2e2
   DiskNo: 2
   Error: No
3. Name: ad2s2
   Mediasize: 397940981760 (371G)
   Sectorsize: 512
   Mode: r2w2e2
   DiskNo: 1
   Error: No
4. Name: ad0s2
   Mediasize: 397940981760 (371G)
   Sectorsize: 512
   Mode: r2w2e2
   DiskNo: 0
   Error: No
----

--
Fuzzy love,
-CyberLeo
Technical Administrator
CyberLeo.Net Webhosting
http://www.CyberLeo.Net <CyberLeo@CyberLeo.Net>

Furry Peace! - http://www.fur.com/peace/
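P.S. In case it helps reproduce this, the arrays were created roughly
along these lines. I'm reconstructing the commands from my notes; the
gmirror invocation matches the output above, but the graid5 flags (in
particular -s for the stripe size) are from memory and may not match
the current userland exactly:

----
# 4-disk mirror for /, load-balanced reads (Balance: load above)
gmirror label -v -b load root ad0s1a ad2s1a ad4s1a ad6s1a

# 4-disk raid5 over the second slices, 32768-byte stripe
# (-s for stripe size is how I recall the graid5 userland working)
graid5 label -v -s 32768 raid5 ad0s2 ad2s2 ad4s2 ad6s2
----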
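P.P.S. The 160kB/sec figure above comes from watching the four members
with iostat while the box is wedged; the KB/t and tps columns sit at a
steady ~16kB/t and ~10tps on each disk:

----
# one-second samples of the four raid5 member disks
iostat -w 1 ad0 ad2 ad4 ad6
----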