From: Miroslav Lachman <000.fbsd@quip.cz>
Date: Wed, 25 Apr 2007 14:16:07 +0200
To: Johan Ström
Cc: freebsd-stable@freebsd.org
Subject: Re: ATA driver/gmirror problems, multiple boxes...

Johan Ström wrote:
> Hello
>
> I have a few boxes, elfi, crus and gw-1, running gmirror. These are three
> completely different boxes, but all are running 6.1.
> They all have multiple disks which are gmirrored: two of them are SATA-only,
> and one has a mirror between one SATA and one ATA disk.
> Every now and then they all have problems with the mirrors, all three in
> different ways. elfi crashes most often, but it also has the most disk I/O,
> so that might be "expected" (not that it crashes, but that it is the one
> crashing most often).
> First, some HW spec:
[...]
> Yes, it fails and then the whole box totally HANGS. No input is possible
> at all; I had to hard-reboot it with the button. Not good at all. I have
> run the disks that are now in elfi in this machine before, and at that
> time I had the same problem: disk problems -> total hang. That was with
> SATA only; this appears to be a problem with the ATA disk too?
>
> I have never managed to force these crashes. They appear now and then,
> at no regular intervals, and I can never produce them on demand. For elfi:
> Apr 24 05:20:27 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6 disconnected.
> (I actually can't find any other entry in the logs, but judging from IRC
> logs: March 28, March 12, Feb 13, Jan 22, Jan 18)
>
> For crus:
> Apr 23 13:46:14 crus kernel: GEOM_MIRROR: Device gm1: provider ad8 disconnected.
> Apr 13 09:57:49 crus kernel: GEOM_MIRROR: Device gm1: provider ad8 disconnected.
> I think it has happened once more, but that's it.
>
> For gw-1 it has luckily happened only once so far, at least with the
> current install. It had problems when the Maxtor disks were running in it
> (and I think it was 6.0 back then).
>
> So: three different boxes, with three different chipsets and three
> different crash scenarios, but they all have problems. So where is the
> actual problem? The HW? The chipset drivers? The gmirror code? I have run
> SMART tests on the crashing disks, no errors.
> I ran PowerMax (Maxtor's own test program) on the Maxtor disks a while
> back, no problems. I have tried changing SATA cables on some of the disks,
> no difference.
>
> Does anyone have any clue about what can be causing this? What is most
> likely? How do we hunt this down?

I have had the same problems for a long time (you can find my posts on this list from last year). From my point of view, this is a hardware problem. For example, I have four identical Sun Fire X2100 machines, and one of them has this problem (always on the same ATA channel) while the others do not. Hardware keeps getting cheaper at the cost of lower quality.

Miroslav Lachman