From owner-freebsd-stable@FreeBSD.ORG  Fri Jul 16 19:40:56 2004
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id A576E16A4CE; Fri, 16 Jul 2004 19:40:56 +0000 (GMT)
Received: from hartley.mintel.co.uk (hartley.mintel.com [213.206.147.162])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 829FC43D53; Fri, 16 Jul 2004 19:40:55 +0000 (GMT)
	(envelope-from jason.thomson@mintel.com)
Received: from [10.0.62.5] ([10.0.62.5])i6GJePoV079024;
	Fri, 16 Jul 2004 20:40:25 +0100 (BST)
	(envelope-from jason.thomson@mintel.com)
Message-ID: <40F82F29.9040006@mintel.com>
Date: Fri, 16 Jul 2004 20:40:25 +0100
From: Jason Thomson <jason.thomson@mintel.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
	rv:1.7) Gecko/20040616
X-Accept-Language: en, en-us
MIME-Version: 1.0
To: freebsd-hardware@freebsd.org, freebsd-stable@freebsd.org
References: <1088701228.2638.86.camel@host-83-146-2-180.bulldogdsl.com>
	<20040701215131.GA83112@elvis.mu.org>
	<1088722694.2554.48.camel@host-83-146-2-180.bulldogdsl.com>
	<20040701230015.GA87635@elvis.mu.org>
	<1088724938.2879.17.camel@host-83-146-2-180.bulldogdsl.com>
	<20040701233811.GA89536@elvis.mu.org>
	<1088725862.2879.22.camel@host-83-146-2-180.bulldogdsl.com>
	<40E52725.1060409@mintel.com>
In-Reply-To: <40E52725.1060409@mintel.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.28 (www . roaringpenguin . com / mimedefang)
cc: Paul Saab <ps@mu.org>
cc: vkayshap@amcc.com
cc: Jason Thomson <jason.thomson@mintel.com>
cc: Ken Smith <kensmith@cse.Buffalo.EDU>
cc: Alasdair Lumsden <enquiries@alivewww.com>
Subject: Reproducible FreeBSD 4.10-STABLE (Jul 7) ,  3ware 7506-4 lockup.
 
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
	<freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 16 Jul 2004 19:40:56 -0000

We can now reproduce the lockup we have been experiencing.  We have not
been able to get a crash dump.  I'm not sure if it's something we're
doing wrong,  or if there's some other reason it's not saving the core
to the swap device.

Next week sometime we can make the server available on the internet if
there is someone willing and able to help us debug this.  We can
probably provide a serial console hookup from another machine if that
would help.  (We have to migrate the data from this production machine
before we can make it available).

We are very keen to resolve this problem;  we have ~20 machines running
FreeBSD 4.x with 7506-4 cards,  and so far three of them have exhibited
this problem.  (Only one is causing problems now - we replaced disks on
the other two).


Recap on problem:

Hardware / OS:

+ FreeBSD 4.x (Various -STABLE versions from 21/01/04 until 07/07/04)

+ Dell 1600SC  (UP and SMP).

+ 7506-4 cards

+ 300 / 320 GB Maxtor Maxline II hard drives.  (Only these disks*).


* We have many machines with WD2000JB / WD2500JB that do not exhibit
this problem.


To reproduce the problem on the machine in question I run this
command:

    # dd if=/dev/twed0s1h iseek=137510 bs=1m of=/dev/null

The card then locks up hard within 10 seconds - no further I/O succeeds,
but anything that is already cached by the VM can be read / invoked.
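
For anyone who wants to relate the iseek value above to a position on the
partition, here is a minimal sketch of the arithmetic.  It assumes
512-byte sectors and that bs=1m means 1048576 bytes; the function names
are made up for illustration, and probe_cmd only prints a dd command
line, it does not run it.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any FreeBSD tool.

BLOCK_SIZE=1048576   # bs=1m, i.e. 1 MiB per dd block
SECTOR_SIZE=512      # assumed sector size

# sector_offset BLOCK_INDEX BLOCK_SIZE
# Prints the sector number (relative to the start of the partition) at
# which the given dd block begins. Dividing first keeps the intermediate
# value small enough for 32-bit shell arithmetic.
sector_offset() {
    echo $(( $1 * ($2 / SECTOR_SIZE) ))
}

# probe_cmd DEVICE BLOCK_INDEX COUNT
# Prints (does not execute) a dd command that reads COUNT blocks starting
# at BLOCK_INDEX, so the failing region can be narrowed down by hand.
probe_cmd() {
    echo "dd if=$1 iseek=$2 bs=1m count=$3 of=/dev/null"
}

sector_offset 137510 "$BLOCK_SIZE"
probe_cmd /dev/twed0s1h 137510 16
```

The offset is relative to the start of twed0s1h,  not to the start of
the array,  so the slice and partition offsets would have to be added to
get an absolute position on the unit.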


Crash dumps are enabled.  We have swap (and the dumpdev) configured on a
SCSI disk in the same machine.  CTRL-ALT-ESC does drop to the debugger.


    ddb> panic

followed by

    ddb> call boot(0)

does reboot the machine,  but savecore does NOT find a kernel core dump
on reboot.

It is possible that we have something configured wrongly,  but I can't
see what it is.
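
For reference,  a sketch of the kind of crash dump setup described above,
written as an /etc/rc.conf fragment.  The device name /dev/da0s1b is an
assumption used for illustration (a swap partition on the SCSI disk);
the actual device on this machine is not given here.

```shell
# /etc/rc.conf fragment (plain sh variable assignments).
# The device name below is hypothetical; substitute the real swap device.
dumpdev="/dev/da0s1b"   # swap partition the kernel should dump to
dumpdir="/var/crash"    # directory savecore writes the dump into

# Manual equivalents, run as root on the FreeBSD machine:
#   dumpon -v /dev/da0s1b    # select the dump device immediately
#   savecore /var/crash      # after reboot, extract a dump if one exists
```

The dump partition needs to be at least as large as physical memory,  and
the dump directory needs enough free space to hold the extracted core.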


Another data point:

In one previous instance of this problem,  we resolved the symptoms by
checking the disks with Maxtor's PowerMax tools.  One disk was found to
have errors,  and repairing / replacing that disk resolved the
errors.  (However,  if the disk has errors,  I would expect the RAID
card to deal with it!).