Date:      Fri, 16 Jul 2004 20:40:25 +0100
From:      Jason Thomson <jason.thomson@mintel.com>
To:        freebsd-hardware@freebsd.org, freebsd-stable@freebsd.org
Cc:        Alasdair Lumsden <enquiries@alivewww.com>
Subject:   Reproducible FreeBSD 4.10-STABLE (Jul 7) ,  3ware 7506-4 lockup. 
Message-ID:  <40F82F29.9040006@mintel.com>
In-Reply-To: <40E52725.1060409@mintel.com>
References:  <1088701228.2638.86.camel@host-83-146-2-180.bulldogdsl.com> <20040701215131.GA83112@elvis.mu.org> <1088722694.2554.48.camel@host-83-146-2-180.bulldogdsl.com> <20040701230015.GA87635@elvis.mu.org> <1088724938.2879.17.camel@host-83-146-2-180.bulldogdsl.com> <20040701233811.GA89536@elvis.mu.org> <1088725862.2879.22.camel@host-83-146-2-180.bulldogdsl.com> <40E52725.1060409@mintel.com>

We can now reproduce the lockup we have been experiencing.  We have not
been able to get a crash dump.  I'm not sure if it's something we're
doing wrong,  or if there's some other reason it's not saving the core
to the swap device.

Next week sometime we can make the server available on the internet if
there is someone willing and able to help us debug this.  We can
probably provide a serial console hookup from another machine if that
would help.  (We have to migrate the data from this production machine
before we can make it available).

We are very keen to resolve this problem;  we have ~20 machines running
FreeBSD 4.x with 7506-4 cards,  and so far three of them have exhibited
this problem.  (Only one is causing problems now - we replaced disks on
the other two).


Recap on problem:

Hardware / OS:

+ FreeBSD 4.x (Various -STABLE versions from 21/01/04 until 07/07/04)

+ Dell 1600SC  (UP and SMP).

+ 7506-4 cards

+ 300 / 320 GB Maxtor Maxline II hard drives.  (Only these disks*).


* We have many machines with WD2000JB / WD2500JB that do not exhibit
this problem.


To reproduce the problem on the machine in question I run this
command:

    # dd if=/dev/twed0s1h iseek=137510 bs=1m of=/dev/null

The card then locks up hard within 10 seconds - no further I/O succeeds,
but anything that is already cached by the VM can be read / invoked.
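
For reference, the starting offset of that read can be worked out from
the dd arguments.  This is only a sketch: it assumes bs=1m means 1 MiB
(1048576 bytes) blocks, and the offset is relative to the start of
twed0s1h, not to the whole array.  The variable names are just for
illustration.

```shell
# Assumption: bs=1m is 1048576 bytes per block.
blocks=137510        # value passed to iseek=
bs=1048576           # bytes per block
offset=$((blocks * bs))
echo "read starts at byte offset $offset of the partition"
```

That puts the start of the read roughly 134 GiB into the partition.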


Crash dumps are enabled.  We have swap (and the dumpdev) configured on a
SCSI disk in the same machine.  CTRL-ALT-ESC does drop to the debugger.


    ddb> panic

followed by

    ddb> call boot(0)

does reboot the machine,  but savecore does NOT find a kernel core dump
on reboot.

It is possible that we have something configured wrongly,  but I can't
see what it is.




Another data point:

In one previous instance of this problem,  we resolved the symptoms by
checking the disks with Maxtor's PowerMax tools.  One disk was found to
have errors,  and repairing / replacing that disk resolved the
errors.  (However,  if the disk has errors,  I would expect the RAID
card to deal with it!).





