From: Jason Thomson
Date: Fri, 16 Jul 2004 20:40:25 +0100
To: freebsd-hardware@freebsd.org, freebsd-stable@freebsd.org
Cc: Paul Saab, vkayshap@amcc.com, Jason Thomson, Ken Smith, Alasdair Lumsden
Subject: Reproducible FreeBSD 4.10-STABLE (Jul 7), 3ware 7506-4 lockup.
Message-ID: <40F82F29.9040006@mintel.com>
In-Reply-To: <40E52725.1060409@mintel.com>

We can now reproduce the lockup we have been experiencing.
We have not been able to get a crash dump. I'm not sure whether we're doing something wrong, or whether there's some other reason the core isn't being saved to the swap device.

Sometime next week we can make the server available on the internet if someone is willing and able to help us debug this. We can probably also provide a serial console hookup from another machine if that would help. (We have to migrate the data off this production machine before we can make it available.)

We are very keen to resolve this problem; we have ~20 machines running FreeBSD 4.x with 7506-4 cards, and so far three of them have exhibited it. (Only one is causing problems now - we replaced disks on the other two.)

Recap of the problem:

Hardware / OS:
+ FreeBSD 4.x (various -STABLE versions from 21/01/04 until 07/07/04)
+ Dell 1600SC (UP and SMP)
+ 7506-4 cards
+ 300 / 320 GB Maxtor Maxline II hard drives (only these disks*)

* We have many machines with WD2000JB / WD2500JB that do not exhibit this problem.

To reproduce the problem on the machine in question I run this command:

# dd if=/dev/twed0s1h iseek=137510 bs=1m of=/dev/null

The card then locks up hard within 10 seconds - no further I/O succeeds, but anything already cached by the VM can still be read / invoked.

Crash dumps are enabled. We have swap (and the dumpdev) configured on a SCSI disk in the same machine. CTRL-ALT-ESC does drop to the debugger.

ddb> panic

followed by

ddb> call boot(0)

does reboot the machine, but savecore does NOT find a kernel core dump on reboot. It is possible that we have something configured wrongly, but I can't see what it is.

Another data point: in one previous instance of this problem, we resolved the symptoms by checking the disks with Maxtor's PowerMax tools. One disk was found to have errors, and repairing / replacing that disk resolved them. (However, if a disk has errors, I would expect the RAID card to deal with it!)
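
For anyone wanting to correlate the failing read with a position on the array, the dd parameters above can be turned into a byte offset and a 512-byte sector number. This is only a rough sketch: the function names are made up for illustration, the 512-byte sector size is an assumption, and the result is relative to the start of the twed0s1h partition, not to the whole unit or to any individual disk behind the RAID.

```python
# Sketch: convert dd's iseek/bs into the offset where reading starts.
# Assumptions (not from the original post): 512-byte sectors, and that
# dd's "m" suffix means 1048576 bytes, as documented for FreeBSD dd(1).

MIB = 1024 * 1024          # bs=1m
SECTOR_SIZE = 512          # assumed sector size


def start_byte_offset(iseek_blocks: int, block_size: int = MIB) -> int:
    """Byte offset, relative to the start of the partition, of the first read."""
    return iseek_blocks * block_size


def start_sector(iseek_blocks: int, block_size: int = MIB,
                 sector_size: int = SECTOR_SIZE) -> int:
    """Sector number, relative to the start of the partition, of the first read."""
    return start_byte_offset(iseek_blocks, block_size) // sector_size


if __name__ == "__main__":
    iseek = 137510  # value from the reproduction command
    print("byte offset:", start_byte_offset(iseek))
    print("sector     :", start_sector(iseek))
    print("GiB        :", round(start_byte_offset(iseek) / 2**30, 2))
```

Since the lockup happens within about 10 seconds of starting the read, the problem area should lie somewhere shortly after this offset rather than exactly at it.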