From owner-freebsd-geom@FreeBSD.ORG Fri Jan 12 18:55:32 2007
Message-ID: <45A7D338.1090402@cyberleo.net>
Date: Fri, 12 Jan 2007 12:28:08 -0600
From: CyberLeo Kitsana <cyberleo@cyberleo.net>
To: "R. B. Riddick"
Cc: FreeBSD Geom <freebsd-geom@freebsd.org>
Subject: geom_raid5 livelock?

Hi.

I've been making use of the geom_raid5 class on FreeBSD 6.2-PRERELEASE. Lately, with the latest provided sources (as of 2007-01-10), I've noticed an odd behavior: with 'safeop' enabled, the raid5 will stop responding.

The box is an 800MHz Celeron (i815 chipset) with 512MB RAM and 4x 400GB PATA100 disks, two of them on a Promise PCI ATA controller, running FreeBSD 6.2-PRERELEASE (2007-01-10). The kernel is SMP, with DEVICE_POLLING and HZ=100 set. Each disk has a 2GB slice 1, combined into a 4-disk geom_mirror holding /, with the remainder in slice 2 combined into a 4-disk geom_raid5 (32768-byte stripe size; rough creation commands in the P.S. below). The box is designed to receive daily backups from production servers, so data integrity is preferred over throughput or latency.

All works beautifully until several dozen gigabytes are transferred to or from the filesystem with safeop enabled, at which point the filesystem quickly grows less responsive and eventually ceases responding entirely (processes stuck in diskwait). CPU usage is at 0%, and all four members of the raid5 are being read at a constant ~160kB/sec (16kB/t, 10 tps). It does not recover on its own within 72 hours. The mirror is unaffected by this behavior. When this occurs, the moment safeop is disabled on the raid5 all the problems cease: the filesystem begins responding again and the stuck programs resume.

Is this intentional, an artifact of the hardware or layout I'm using, or could this be indicative of an obscure bug somewhere? Can I provide any additional information which would assist in tracking this down?
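While it is wedged, this is roughly how I watch the array (a minimal sketch; gstat(8) and ps(1) are in the base system, and the filter expression is only illustrative):

  # Per-provider I/O statistics: the four raid5 members sit at ~160kB/sec of reads
  gstat -f 'ad[0246]s2|raid5'
  # The stuck processes show up in diskwait; check the STAT/WCHAN columns
  ps axlww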
----
[cyberleo@mikayla ~]$ gmirror list
Geom name: root
State: COMPLETE
Components: 4
Balance: load
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 2
ID: 89087781
Providers:
1. Name: mirror/root
   Mediasize: 1610563584 (1.5G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: ad0s1a
   Mediasize: 1610564096 (1.5G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 2
   ID: 3326083319
2. Name: ad2s1a
   Mediasize: 1610564096 (1.5G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 2
   ID: 1957052293
3. Name: ad4s1a
   Mediasize: 1610564096 (1.5G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 2
   ID: 3131999117
4. Name: ad6s1a
   Mediasize: 1610564096 (1.5G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 2
   ID: 2209607005

[cyberleo@mikayla ~]$ graid5 list
Geom name: raid5
State: COMPLETE CALM
Status: Total=4, Online=4
Type: AUTOMATIC
Pending: (wqp 0 // 0)
Stripesize: 32768
MemUse: 3467264 (msl 138)
Newest: -1
ID: 3906282509
Providers:
1. Name: raid5/raid5
   Mediasize: 1193822846976 (1.1T)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: ad6s2
   Mediasize: 397940981760 (371G)
   Sectorsize: 512
   Mode: r2w2e2
   DiskNo: 3
   Error: No
2. Name: ad4s2
   Mediasize: 397940981760 (371G)
   Sectorsize: 512
   Mode: r2w2e2
   DiskNo: 2
   Error: No
3. Name: ad2s2
   Mediasize: 397940981760 (371G)
   Sectorsize: 512
   Mode: r2w2e2
   DiskNo: 1
   Error: No
4. Name: ad0s2
   Mediasize: 397940981760 (371G)
   Sectorsize: 512
   Mode: r2w2e2
   DiskNo: 0
   Error: No
----

--
Fuzzy love,
-CyberLeo
Technical Administrator
CyberLeo.Net Webhosting
http://www.CyberLeo.Net

Furry Peace! - http://www.fur.com/peace/
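P.S. For completeness, this is roughly how the two arrays were created (a sketch from memory; gmirror(8) is standard, and I am assuming graid5's label syntax matches the other geom class utilities such as gstripe(8)):

  # 4-way mirror for / on the slice-1 'a' partitions, load-balanced reads
  gmirror label -b load root ad0s1a ad2s1a ad4s1a ad6s1a
  # 4-disk raid5 on slice 2 with a 32768-byte stripe size
  graid5 label -s 32768 raid5 ad0s2 ad2s2 ad4s2 ad6s2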