Date: Sun, 21 Aug 2011 10:35:25 -0700
From: perryh@pluto.rain.com
To: freebsd@jdc.parodius.com
Cc: freebsd-stable@freebsd.org, ml@os2.kiev.ua, dan@langille.org,
    utisoft@gmail.com
In-Reply-To: <20110821050051.GA47415@icarus.home.lan>
Subject: Re: bad sector in gmirror HDD

Jeremy Chadwick wrote:

> On Sun, Aug 21, 2011 at 02:00:33AM -0700, perryh@pluto.rain.com
> wrote:
> > Jeremy Chadwick wrote:
> > > ... using dd to find the bad LBAs is the only choice he has.
> >
> > or sysutils/diskcheckd ...
>
> That software has a major problem where it runs constantly, rather
> than periodically.

Even in light of the discussion below, I would not think that a
problem for the particular purpose under discussion, where it's
presumably going to be terminated after completing a single pass.
The "dd" approach is also going to soak the drive for the duration.
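(For reference, that dd soak typically looks something along these
lines -- the device name and block size here are only examples:

    dd if=/dev/ada0 of=/dev/null bs=64k conv=noerror

conv=noerror lets the pass continue past a bad spot, and the failing
LBAs show up in the console/dmesg output as the drive reports the
errors.)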
> I know because I'm the one who opened the PR on it:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/115853
>
> There's a discussion about this port/issue from a few days ago
> (how sweet!):
> http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069276.html
>
> With comments from you stating that the software is behaving as
> designed and that I misread the man page, but also stating point
> blank that "either way the software runs continuously" (which is
> what the PR was about in the first place):
> http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069321.html
> ...
> Back to my PR.
>
> I state that I set up diskcheckd.conf using the option you
> describe as "a length of time over which to spread each pass",
> yet what happened was that it did as much I/O as it could
> (read the entire disk in 45 minutes) then proceeded to do
> it again (no sleep()) ...

Agreed, that is not what is supposed to happen.

What I see as a misreading of the manpage is reflected in your
assertion, in the closing comment on 7/1/2008, that "the code does
not do what the manpage says (or vice-versa)."  Having looked at
both the code and the manpage, I don't agree with that assessment.
As I read it, the manpage sentence

    Naturally, it would be contradictory to specify both the
    frequency and the rate, so only one of these should be
    specified.

has to mean that the "days" (frequency) setting is simply an
alternative way of specifying the rate (one pass over a 100GB disk
every seven days, for example, works out to a rate of roughly
165KB/sec).  Is there some other interpretation that I'm missing?

Based on the code, it looks to me as if diskcheckd is supposed to
read 64KB, check it for errors, then sleep for a calculated length
of time before reading the next 64KB, so that the reads average out
to the (directly or indirectly) specified rate.  Thus it is intended
to run "continuously" in the sense that its I/O load is supposed to
be as uniform as possible, consistent with reading 64KB at a time,
rather than imposing a heavier load for some period of time and then
pausing for the balance of the specified number of days.  This is
entirely consistent with my understanding of the manpage.

Given that 115853 was closed (which AFAIK is supposed to mean "no
longer considered a problem"), and seemed to have involved a
misunderstanding of how diskcheckd was intended to operate, I
decided to investigate the open 143566 instead -- and 143566
explicitly stated that "diskcheckd runs fine when gmirror is not
involved ..."  So I've been running diskcheckd on a gmirrored
system, and it seems to be working.

As to what is actually going on: earlier this evening I started
looking into the failure to call updateproctitle() mentioned in
115853's closing comment, which I had also noticed in my own
testing, and it seems that this _is_ related to the now-clarified
problem of diskcheckd running flat-out instead of pausing between
each 64KB read.

When the specified or calculated rate exceeds 64KB/sec, the required
sleep interval between 64KB chunks is less than one second.  Since
diskcheckd calculates the interval in whole seconds -- because it
calls sleep() rather than usleep() or nanosleep() -- any interval of
less than one second is truncated to zero.  That zero "interval"
gets passed to sleep(), which dutifully returns immediately, and the
same zero is also used to "increment" the counter that is supposed
to cause updateproctitle() to be called every 300 seconds, so the
process title never gets updated either.  I suspect the fix will be
to calculate the interval in microseconds and call usleep() instead
of sleep().  And yes, I am planning to fix it -- and clarify the
manpage -- but not tonight.
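To make the failure mode concrete, here is a minimal standalone
sketch of the arithmetic -- my own illustrative code, not the actual
diskcheckd source, using the 100GB disk and 7-day pass above as
example figures:

    #include <stdio.h>
    #include <unistd.h>

    #define CHUNK   65536   /* diskcheckd reads 64KB at a time */

    int
    main(void)
    {
            /* "days" is just an indirect rate: disk size / pass time */
            long long disksize = 100LL * 1000 * 1000 * 1000;  /* 100GB */
            long rate = (long)(disksize / (7 * 86400LL));     /* ~165KB/s */

            /*
             * Broken: whole-second arithmetic.  For any rate above
             * 64KB/sec the quotient truncates to zero, so sleep(0)
             * returns immediately and the pass runs flat out -- and
             * the same zero never advances the counter that should
             * trigger updateproctitle() every 300 seconds.
             */
            unsigned int secs = (unsigned int)(CHUNK / rate);
            printf("sleep(%u) between chunks\n", secs);       /* sleep(0) */
            sleep(secs);

            /*
             * Fixed: microsecond arithmetic.  64KB at ~165KB/sec
             * comes to roughly 396000 usec between chunks.
             */
            useconds_t usecs = (useconds_t)(CHUNK * 1000000LL / rate);
            printf("usleep(%lu) between chunks\n", (unsigned long)usecs);
            usleep(usecs);

            return 0;
    }

Note that any rate at or below 64KB/sec yields a whole-second
quotient of at least one, which masks the bug -- presumably why not
everyone runs into it.  (For such slow rates the fixed version
computes more than a million microseconds, which FreeBSD's usleep()
accepts, though nanosleep() would be the more portable choice.)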
> ... and besides, such a utility really shouldn't be a daemon
> anyway but a periodic(8)-called utility with appropriate locks put
> in place to ensure more than one instance can't be run at once.

I suppose that can be argued either way.  It's not obvious to me
that using, say, 7x as much bandwidth for one day and then taking
six days off is somehow better than spreading the testing over an
entire week.

Furthermore, using periodic(8) could get _really_ messy when
checking multiple drives at different frequencies -- unless one
wanted to run a separate instance of the program for each drive, and
then we would need to prevent multiple simultaneous instances for
any one drive while still allowing different drives to be checked at
the same time (a sketch of that follows).
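If someone did take the periodic(8) route, that per-drive locking
part, at least, looks manageable with lockf(1).  Something like the
following -- the lock-file path and device name are purely
illustrative, and the dd stands in for whatever one-shot checker gets
used -- fails immediately if a scan of the same drive is already
running, while scans of other drives proceed in parallel under their
own lock files:

    lockf -t 0 /var/run/surfacescan.ada0.lock \
        dd if=/dev/ada0 of=/dev/null bs=64k conv=noerror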