Date:      Fri, 12 Jan 2007 12:28:08 -0600
From:      CyberLeo Kitsana <cyberleo@cyberleo.net>
To:        "R. B. Riddick" <arne_woerner@yahoo.com>
Cc:        FreeBSD Geom <freebsd-geom@freebsd.org>
Subject:   geom_raid5 livelock?
Message-ID:  <45A7D338.1090402@cyberleo.net>

Hi.

I've been making use of the geom_raid5 class on FreeBSD 6.2-PRERELEASE. 
Lately I've noticed an odd behavior with the latest sources provided 
(as of 2007-01-10): with 'safeop' enabled, the raid5 will stop 
responding.

I have an 800MHz Celeron (i815 chipset) with 512MB RAM and 4x 400GB 
PATA100 disks, two on a Promise PCI ATA controller, running FreeBSD 
6.2-PRERELEASE (2007-01-10). The kernel is SMP, with DEVICE_POLLING and 
HZ=100 set. The disks are configured with a 2GB slice 1 as a 4-disk 
geom_mirror containing /, and the remainder in slice 2 as a 4-disk 
geom_raid5 (32768 stripe size). The box is designed to receive daily 
backups from production servers, so data integrity is preferred over 
throughput or latency.
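For reference, a layout like the one above could be assembled roughly as 
follows. This is only a sketch: it assumes the disks are already sliced 
(s1 for the mirror, s2 for the raid5), and the exact graid5 label options 
may differ between versions of the geom_raid5 class.

```shell
# Mirror for / across all four disks' first slices, load-balanced reads:
gmirror label -v -b load root ad0s1a ad2s1a ad4s1a ad6s1a

# 4-disk geom_raid5 over the second slices, 32768-byte stripe size
# (the -s stripe-size option is assumed here):
graid5 label -v -s 32768 raid5 ad0s2 ad2s2 ad4s2 ad6s2

# New filesystem on the resulting provider:
newfs /dev/raid5/raid5
```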

All works beautifully, until several dozen gigabytes are transferred to 
or from the filesystem with safeop enabled, at which point the 
filesystem will quickly grow less responsive, and eventually cease 
responding entirely (processes stuck in diskwait). CPU usage is at 0%, 
and all four members of the raid5 are being read at around 160kB/sec 
(16kB/t, 10tps) constantly. It does not naturally recover within 72 
hours. The mirror is unaffected by this behavior.
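For anyone wanting to reproduce the observation, the constant ~160kB/sec 
per-member reads and the diskwait processes can be watched with stock 
FreeBSD tools; nothing geom_raid5-specific is assumed here:

```shell
# Per-provider throughput and tps, refreshed each second;
# -f takes a regex restricting which providers are shown.
gstat -I 1s -f '^ad[0246]$'

# Processes blocked in disk wait show state "D":
ps -axo pid,state,wchan,command | grep ' D'
```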

When this occurs, the moment safeop is disabled on the raid5, all the 
problems cease: the filesystem begins responding and the stuck programs resume.

Is this intentional, an artifact of the hardware or layout I'm using, or 
could this be indicative of an obscure bug somewhere? Can I provide any 
additional information which would assist in tracking this down?

----
[cyberleo@mikayla ~]$ gmirror list
Geom name: root
State: COMPLETE
Components: 4
Balance: load
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 2
ID: 89087781
Providers:
1. Name: mirror/root
    Mediasize: 1610563584 (1.5G)
    Sectorsize: 512
    Mode: r1w1e1
Consumers:
1. Name: ad0s1a
    Mediasize: 1610564096 (1.5G)
    Sectorsize: 512
    Mode: r1w1e1
    State: ACTIVE
    Priority: 0
    Flags: DIRTY
    GenID: 0
    SyncID: 2
    ID: 3326083319
2. Name: ad2s1a
    Mediasize: 1610564096 (1.5G)
    Sectorsize: 512
    Mode: r1w1e1
    State: ACTIVE
    Priority: 0
    Flags: DIRTY
    GenID: 0
    SyncID: 2
    ID: 1957052293
3. Name: ad4s1a
    Mediasize: 1610564096 (1.5G)
    Sectorsize: 512
    Mode: r1w1e1
    State: ACTIVE
    Priority: 0
    Flags: DIRTY
    GenID: 0
    SyncID: 2
    ID: 3131999117
4. Name: ad6s1a
    Mediasize: 1610564096 (1.5G)
    Sectorsize: 512
    Mode: r1w1e1
    State: ACTIVE
    Priority: 0
    Flags: DIRTY
    GenID: 0
    SyncID: 2
    ID: 2209607005

[cyberleo@mikayla ~]$ graid5 list
Geom name: raid5
State: COMPLETE CALM
Status: Total=4, Online=4
Type: AUTOMATIC
Pending: (wqp 0 // 0)
Stripesize: 32768
MemUse: 3467264 (msl 138)
Newest: -1
ID: 3906282509
Providers:
1. Name: raid5/raid5
    Mediasize: 1193822846976 (1.1T)
    Sectorsize: 512
    Mode: r1w1e1
Consumers:
1. Name: ad6s2
    Mediasize: 397940981760 (371G)
    Sectorsize: 512
    Mode: r2w2e2
    DiskNo: 3
    Error: No
2. Name: ad4s2
    Mediasize: 397940981760 (371G)
    Sectorsize: 512
    Mode: r2w2e2
    DiskNo: 2
    Error: No
3. Name: ad2s2
    Mediasize: 397940981760 (371G)
    Sectorsize: 512
    Mode: r2w2e2
    DiskNo: 1
    Error: No
4. Name: ad0s2
    Mediasize: 397940981760 (371G)
    Sectorsize: 512
    Mode: r2w2e2
    DiskNo: 0
    Error: No
----

--
Fuzzy love,
-CyberLeo
Technical Administrator
CyberLeo.Net Webhosting
http://www.CyberLeo.Net
<CyberLeo@CyberLeo.Net>

Furry Peace! - http://www.fur.com/peace/


