From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 9 17:00:47 2004
From: "Mikhail P."
To: freebsd-hackers@freebsd.org
Cc: Dag-Erling Smørgrav
Date: Sat, 9 Oct 2004 17:01:01 +0000
Subject: Re: ad0: FAILURE - WRITE_DMA
Message-Id: <200410091701.01987.miha@ghuug.org>
References: <200410081937.15068.miha@ghuug.org> <200410091617.26794.miha@ghuug.org>
Reply-To: miha@ghuug.org
Organization: Ghana Unix Users Group
MIME-Version: 1.0
Content-Type: Multipart/Mixed; boundary="Boundary-00=_NlBaBLZGX4OeKse"

--Boundary-00=_NlBaBLZGX4OeKse
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

On Saturday 09 October 2004 16:23, Dag-Erling Smørgrav wrote:
> "Mikhail P."
> writes:
> > On Saturday 09 October 2004 15:01, Dag-Erling Smørgrav wrote:
> > > A lot of them, or just one or two? Some ATA drives will spin down at
> > > regular intervals to recalibrate, and you'll get a harmless timeout if
> > > you try to write to the disk while it's doing that.
> >
> > Unfortunately, all the drives (so far - four 200GB drives).
>
> I meant "a lot of timeouts", not "a lot of drives". If you only get
> one or two timeouts per drive at regular intervals (say, once a
> month), they're just recalibrating and there's nothing to worry about.
>

Well, there is no pattern. Often it just happens by itself: the system runs
fine for 3-10 days (no warnings, no timeouts), and after that I start seeing
lots of these. To be more specific: I have a user whose home directory is
/home/user, and he uses FTP to upload and download files under that
directory. Say he has 5k files in total (ranging in size from 1 KB to
20 MB). When he tries to access certain files (either to resume an upload
or to resume a download), the system spews lots of these timeouts and the
transfer basically fails with "input/output error". For example, yesterday
it logged 360 of these messages over a 12-hour period, and unfortunately
the system locked up while I was asleep; the last message in
/var/log/messages was about the ad0 failure.

I'm not sure exactly which files it timed out on yesterday, but I do know
which directory it happened under: that directory has 20k files in it (not
in a single directory, but including subdirectories). Maybe someone knows a
quick way to open every file under that directory; that would probably help
identify exactly which files the timeouts happen on.

Before replacing the drives, I had that server up for 120 days, and it did
spew these messages (more and more every day, starting around day 90 of
uptime).
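
Regarding the question above about a quick way to open every file under the affected directory: a minimal sketch, assuming Python 3 is available on the machine, that walks the tree and reads each regular file to EOF, reporting every file whose open or read fails (an unreadable sector would typically surface as EIO). The function names `scan_tree` and `report`, the script name and the 1 MiB chunk size are illustrative choices, not anything from this thread.

```python
import errno
import os
import sys

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large files never need to fit in memory


def scan_tree(root):
    """Read every regular file under root to EOF.

    Returns a list of (path, OSError) pairs for every directory that could
    not be listed and every file that could not be opened or fully read.
    """
    failures = []

    def record_dir_error(exc):
        # os.walk silently skips unlistable directories unless onerror records them.
        failures.append((exc.filename, exc))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record_dir_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip FIFOs, sockets, device nodes and dangling symlinks (a FIFO would block).
            if not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while f.read(CHUNK_SIZE):
                        pass
            except OSError as exc:
                failures.append((path, exc))
    return failures


def report(root):
    """Print one line per failure plus a total; return the failure list."""
    failures = scan_tree(root)
    for path, exc in failures:
        marker = "  <-- EIO, possibly a bad sector" if exc.errno == errno.EIO else ""
        print(f"{path}: {exc.strerror}{marker}")
    print(f"{len(failures)} problem(s) found under {root}")
    return failures


if __name__ == "__main__":
    # Hypothetical usage: python3 scanfiles.py /home/user
    for top in sys.argv[1:]:
        report(top)
```

Files that are already in the buffer cache may be served from memory without touching the disk, so a clean run is more meaningful right after a reboot. Note also that the script only exercises reads; it can locate files whose downloads fail with "input/output error", but it will not reproduce WRITE_DMA failures directly.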

After rebooting, the system asked for fsck, which I ran, but it reported
some soft updates inconsistencies and refused to mount /home read-write.

By the way, I just ran fsck on the read-write mounted /home (that's where
those timeouts occurred yesterday), and I have attached its output.

I also got a message off-list whose author suggested playing with the UDMA
mode. I switched from UDMA100 to UDMA66. The system has now been up for
12 hours with no timeouts so far, but I'm quite sure they will come back in
a few days.

> BTW, are you using ataidle or anything similar?

Nope, nothing.

>
> DES

Regards,
M.

--Boundary-00=_NlBaBLZGX4OeKse
Content-Type: text/plain; charset="iso-8859-1"; name="fsck.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="fsck.txt"

[root]@[beer]:/usr/local/etc/rc.d> fsck /home
** /dev/ad0s1g (NO WRITE)
** Last Mounted on /home
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
LINK COUNT FILE I=8715003 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715004 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715005 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715006 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715007 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715008 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715009 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715010 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715016 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715017 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715080 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715086 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715087 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715093 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715094 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715100 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715101 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715107 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715129 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715142 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715143 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715156 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715157 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
LINK COUNT FILE I=8715163 OWNER=noc MODE=0 SIZE=0 MTIME=Oct 9 09:50 2004 COUNT 0 SHOULD BE -1 ADJUST? no
** Phase 5 - Check Cyl groups
SUMMARY INFORMATION BAD SALVAGE? no
BLK(S) MISSING IN BIT MAPS SALVAGE? no
ALLOCATED FRAGS 34852132-34852134 MARKED FREE
ALLOCATED FRAGS 34852264-34852268 MARKED FREE
ALLOCATED FRAGS 34852344-34852347 MARKED FREE
ALLOCATED FRAGS 34852376-34852380 MARKED FREE
ALLOCATED FRAGS 34852452-34852453 MARKED FREE
ALLOCATED FRAGS 34852512-34852513 MARKED FREE
ALLOCATED FRAGS 34852536-34852540 MARKED FREE
ALLOCATED FRAGS 34852544-34852545 MARKED FREE
ALLOCATED FRAGS 34852548-34852549 MARKED FREE
ALLOCATED FRAG 34852567 MARKED FREE
ALLOCATED FRAG 34852583 MARKED FREE
ALLOCATED FRAGS 34852594-34852599 MARKED FREE
ALLOCATED FRAGS 34852616-34852620 MARKED FREE
ALLOCATED FRAGS 34852757-34852758 MARKED FREE
ALLOCATED FRAGS 34852818-34852820 MARKED FREE
ALLOCATED FRAGS 34852824-34852827 MARKED FREE
ALLOCATED FRAG 34852906 MARKED FREE
ALLOCATED FRAGS 34852925-34852927 MARKED FREE
ALLOCATED FRAGS 34853136-34853140 MARKED FREE
ALLOCATED FRAGS 34853144-34853148 MARKED FREE
ALLOCATED FRAGS 34853152-34853156 MARKED FREE
ALLOCATED FRAGS 34853160-34853164 MARKED FREE
ALLOCATED FRAGS 34853168-34853172 MARKED FREE
ALLOCATED FRAGS 34853245-34853246 MARKED FREE
ALLOCATED FRAGS 34853280-34853284 MARKED FREE
ALLOCATED FRAGS 34853288-34853292 MARKED FREE
ALLOCATED FRAGS 34853304-34853308 MARKED FREE
ALLOCATED FRAGS 34853352-34853356 MARKED FREE
ALLOCATED FRAGS 34853365-34853366 MARKED FREE
ALLOCATED FRAGS 34853368-34853372 MARKED FREE
ALLOCATED FRAGS 34853400-34853404 MARKED FREE
ALLOCATED FRAGS 34853490-34853494 MARKED FREE
ALLOCATED FRAGS 34853496-34853500 MARKED FREE
ALLOCATED FRAGS 34853536-34853545 MARKED FREE
ALLOCATED FRAGS 34853568-34853572 MARKED FREE
ALLOCATED FRAGS 34853868-34853870 MARKED FREE
ALLOCATED FRAGS 34853949-34853951 MARKED FREE
ALLOCATED FRAGS 34854074-34854075 MARKED FREE
ALLOCATED FRAGS 34854934-34854935 MARKED FREE
ALLOCATED FRAGS 34855504-34855508 MARKED FREE
ALLOCATED FRAGS 34855776-34855777 MARKED FREE
ALLOCATED FRAGS 34855920-34855924 MARKED FREE
ALLOCATED FRAGS 34856856-34856857 MARKED FREE
ALLOCATED FRAGS 34857067-34857068 MARKED FREE
ALLOCATED FRAGS 34871843-34871847 MARKED FREE
ALLOCATED FRAGS 34879373-34879374 MARKED FREE
ALLOCATED FRAGS 37584536-37584551 MARKED FREE
ALLOCATED FRAGS 37601008-37601014 MARKED FREE
471717 files, 47373681 used, 38091807 free (33239 frags, 4757321 blocks, 0.0% fragmentation)
[root]@[beer]:/usr/local/etc/rc.d>

--Boundary-00=_NlBaBLZGX4OeKse--