From: Marcel Moolenaar
Date: Thu, 5 May 2011 20:11:59 -0700
To: Marcel Moolenaar
Cc: freebsd-fs@freebsd.org
Subject: Re: "gpart show" stuck in loop
Message-Id: <6B7B3E48-08D5-47D1-85B4-FAA1EEE6764C@xcllnt.net>

On May 5, 2011, at 2:25 PM, Marcel Moolenaar wrote:

>
> On May 5, 2011, at 10:25 AM, Kevin Day wrote:
>
>>
>> We've had one of our boxes getting stuck with "gpart show" (called from rc startup scripts) consuming 100% cpu after each reboot. Manually running "gpart show" gives me:
>
> Can you send me a binary image of the first sector of da0 privately
> and also tell me what FreeBSD version you're using.

(after receiving the dump)

Hi Kevin,

I reproduced the problem:

ns1% sudo mdconfig -a -t malloc -s 5860573173
md0
ns1% sudo gpart create -s mbr md0
md0 created
ns1% gpart show md0
=>        63  4294967229  md0  MBR  (2.7T)
          63  4294967229       - free -  (2.0T)

ns1% sudo dd if=kevin-day.mbr of=/dev/md0
8+0 records in
8+0 records out
4096 bytes transferred in 0.006988 secs (586144 bytes/sec)
ns1% gpart show md0
=>        63  5860573110  md0  MBR  (2.7T)
          63  2147472747    1  freebsd  [active]  (1.0T)
  2147472810  2147472810    2  freebsd  [active]  (1.0T)
  4294945620 -2729352721    3  freebsd  [active]  ()
  1565592899   581879911       - free -  (277G)
  2147472810  2147472810    2  freebsd  [active]  (1.0T)
  4294945620 -2729352721    3  freebsd  [active]  ()
  1565592899   581879911       - free -  (277G)
  2147472810  2147472810    2  freebsd  [active]  (1.0T)
  4294945620 -2729352721    3  freebsd  [active]  ()
  1565592899   581879911       - free -  (277G)
^C

The first problem you have is that the MBR has overflowed. As you
can see from my freshly created MBR, only 2.0T out of the 2.7T can
be addressed (the MBR start and size fields are only 32 bits wide),
whereas yours addresses the whole 2.7T. There must be an overflow
condition.

The second problem is that more than 1 slice is marked active.

Now, on to the infinite loop in gpart. The XML has the following
pertaining to the slices:

	<mode>r0w0e0</mode>
	<name>md0s3</name>
	<mediasize>-1397428593152</mediasize>
	<sectorsize>512</sectorsize>
	<config>
	  <start>4294945620</start>
	  <end>1565592898</end>
	  <index>3</index>
	  <type>freebsd</type>
	  <offset>2199012157440</offset>
	  <length>18446742676280958464</length>
	  <rawtype>165</rawtype>
	  <attrib>active</attrib>
	</config>

Notice how mediasize is negative. This is a bug in the kernel.
This is also what leads to the loop in gpart, because gpart looks
up the next partition on the disk, given the LBA of the next sector
following the partition just processed. This allows gpart to detect
free space (the next partition found doesn't start at the given LBA)
and it allows gpart to print the partitions in the order in which
they appear on the disk.

In any case: since the end of slice 3 is before the start of slice 3
and even before the start of slice 2, due to its negative size, gpart
will continuously find the same partitions:
1.  After partition 3 the "cursor" is at 1565592899,
2.  The next partition found is partition 2, at 2147472810
3.  Therefore, 1565592899-2147472810 is free space
4.  Partition 2 is printed, and partition 3 is found next
5.  Partition 3 is printed and due to the negative size: goto 1

I think we should do two things:
1.  Protect the gpart tool against this,
2.  Fix the kernel to simply reject partitions that fall outside
    of the addressable space (as determined by the limitations
    of the scheme). In your case it would mean that slice 3
    becomes inaccessible.

Given that you've been hit by this: do you feel that such a change
would be a better failure mode?

-- 
Marcel Moolenaar
marcel@xcllnt.net
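
For illustration only, the cursor walk in steps 1 to 5 above can be modelled in a few lines of Python. This is a rough sketch, not gpart's actual code: the names PARTS, next_partition and walk are hypothetical, the partition table is copied from the "gpart show" output in this message, and the guard shown (stop on a non-positive size) is just one possible way of protecting the tool.

```python
# Hypothetical model of the walk described in this message; not gpart's code.
# Keys are start LBAs, values are (slice index, size in sectors), copied from
# the "gpart show md0" output above.
PARTS = {
    63: (1, 2147472747),
    2147472810: (2, 2147472810),
    4294945620: (3, -2729352721),  # negative size, as reported by the kernel
}

FIRST = 63                       # first usable LBA from the "=>" line
LAST = 63 + 5860573110 - 1       # last usable LBA from the "=>" line


def next_partition(cursor):
    """Return the start LBA of the first partition at or after cursor."""
    starts = [s for s in PARTS if s >= cursor]
    return min(starts) if starts else None


def walk(first, last, guard, max_rows=12):
    """Collect (start, size, label) rows the way the message describes.

    max_rows only exists so the unguarded walk terminates in this sketch.
    """
    rows = []
    cursor = first
    while cursor <= last and len(rows) < max_rows:
        start = next_partition(cursor)
        if start is None:
            # Nothing left on the disk: the remainder is free space.
            rows.append((cursor, last - cursor + 1, "free"))
            break
        if start > cursor:
            # Gap between the cursor and the next partition is free space.
            rows.append((cursor, start - cursor, "free"))
        index, size = PARTS[start]
        if guard and size <= 0:
            # Assumed protection: refuse to move the cursor backwards.
            rows.append((start, size, "bogus"))
            break
        rows.append((start, size, index))
        cursor = start + size    # a negative size moves the cursor backwards
    return rows


if __name__ == "__main__":
    for row in walk(FIRST, LAST, guard=False):
        print(row)
    print("with guard:")
    for row in walk(FIRST, LAST, guard=True):
        print(row)
```

Without the guard the rows repeat in the same free/2/3 pattern as the output above until max_rows cuts the walk off. With the guard the walk stops as soon as it reaches slice 3.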