Date:      Fri, 21 Feb 1997 12:51:11 -0800
From:      pete helme <pete@ebay.com>
To:        "'freebsd-stable@freebsd.org'" <freebsd-stable@freebsd.org>
Subject:   SCSI idle hangs and reboots with 2.1.7
Message-ID:  <01BC1FF5.EB538860@pete.ebay.com>

We're running a heavily loaded web server that ran fine on 2.1.5. When we upgraded to 2.1.6 and then 2.1.7, things went sour, and we're now getting frequent hangs and reboots.

The only evidence we have of what's happening is the following, which would be scrolling by on the console when the machine could no longer be accessed on the net:

Feb 18 10:18:07 calculus /kernel: sd0(ahc0:0:0): timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
Feb 18 10:35:28 calculus /kernel: SEQADDR == 0xd
Feb 18 10:35:28 calculus /kernel: Clearing bus reset
Feb 18 10:35:28 calculus /kernel: Clearing 'in-reset' flag
Feb 18 10:35:28 calculus /kernel: sd1(ahc0:1:0): timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
Feb 18 10:35:28 calculus /kernel: SEQADDR == 0x10
Feb 18 10:35:28 calculus /kernel: sd1(ahc0:1:0): timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
Feb 18 10:35:28 calculus /kernel: SEQADDR == 0xc
Feb 18 10:35:28 calculus /kernel: ahc0: Issued Channel A Bus Reset. 2 SCBs aborted
Feb 18 10:35:28 calculus /kernel: Clearing bus reset
Feb 18 10:35:28 calculus /kernel: Clearing 'in-reset' flag
Feb 18 10:35:29 calculus /kernel: sd1(ahc0:1:0): timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
Feb 18 10:38:48 calculus /kernel: SEQADDR == 0xd
Feb 18 10:38:49 calculus /kernel: sd1(ahc0:1:0): timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
Feb 18 10:38:49 calculus /kernel: SEQADDR == 0x8
Feb 18 10:38:49 calculus /kernel: ahc0: Issued Channel A Bus Reset. 2 SCBs aborted
Feb 18 10:57:59 calculus /kernel: Clearing bus reset

When we first upgraded to 2.1.6 we were getting hangs a couple of times a day. When we'd run in and look at the server, we'd see stuff like the above scrolling on the console. The server could be pinged, but we couldn't telnet into it. It was essentially dead and we'd have to reboot. Usually 12 hours later the same thing would happen and we'd have to reboot the machine again. On one occasion we couldn't even ping the machine and it was completely stuck.

We've upgraded to the "final" 2.1.7 and things appear to be better, but we still get reboots at least once a day. Now it doesn't get stuck, but usually reboots itself a few minutes after the idle timeout. Here's the latest syslog snippet:

Feb 21 02:12:51 calculus /kernel: sd0(ahc0:0:0): timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
Feb 21 02:12:51 calculus /kernel: SEQADDR == 0x8
Feb 21 02:15:58 calculus /kernel: FreeBSD 2.1.7-RELEASE #0: Thu Feb 20 18:07:36 PST 1997

As you can see, we got the idle report again and it rebooted itself a couple of minutes later.

We thought maybe it was the SCSI chain, so we swapped the Adaptec 2940 for one from a different machine, but that made no difference. We also checked the cable and it seems fine. The idle timeouts are not always on the same device: both drives 0 and 1 have reported them, so we doubt it's the drives themselves. Again, it was working fine last week before we upgraded the OS.
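
To back up the observation that the timeouts are spread across devices, the log can be tallied per device. The following is only a rough sketch, not something from the original report: the regular expression and the function name `count_idle_timeouts` are assumptions made for illustration, and the sample lines are copied from the console output quoted above.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   sd0(ahc0:0:0): timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
# The pattern is an assumption based only on the log lines quoted in this message.
TIMEOUT_RE = re.compile(r"(sd\d+)\((ahc\d+):\d+:\d+\): timed out while idle")


def count_idle_timeouts(lines):
    """Return a Counter mapping (device, controller) to its number of idle timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# A few lines taken verbatim from the Feb 18 console output.
SAMPLE = [
    "Feb 18 10:18:07 calculus /kernel: sd0(ahc0:0:0): timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0",
    "Feb 18 10:35:28 calculus /kernel: SEQADDR == 0xd",
    "Feb 18 10:35:28 calculus /kernel: sd1(ahc0:1:0): timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0",
    "Feb 18 10:35:28 calculus /kernel: sd1(ahc0:1:0): timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0",
]

if __name__ == "__main__":
    for (device, controller), n in sorted(count_idle_timeouts(SAMPLE).items()):
        print(f"{device} on {controller}: {n}")
```

If every affected device sits behind the same controller (here `ahc0`), that is consistent with suspecting the controller, driver, or bus rather than an individual drive, though it does not prove it.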

We've been getting a lot of these rtq_reallyold messages too:

Feb 20 19:22:11 calculus /kernel: in_rtqtimo: adjusted rtq_reallyold to 10

...but we've heard they are innocuous. We did see at least one instance, though, where one of these messages appeared and then, in the same second in the log, the SCSI idle timeouts started.
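
Whether that coincidence is a one-off or a pattern could be checked across the whole log. This is a minimal sketch under stated assumptions: it assumes the classic 15-character syslog timestamp seen in the excerpts above, the function name `coinciding_seconds` is hypothetical, and the log file path is supplied by whoever runs it.

```python
import sys
from collections import defaultdict


def coinciding_seconds(lines):
    """Return timestamps (sorted as strings) at which both message types were logged."""
    seen = defaultdict(set)
    for line in lines:
        stamp = line[:15]  # classic syslog timestamp, e.g. "Feb 20 19:22:11"
        if "in_rtqtimo: adjusted rtq_reallyold" in line:
            seen[stamp].add("rtq")
        elif "timed out while idle" in line:
            seen[stamp].add("scsi")
    return sorted(s for s, kinds in seen.items() if kinds == {"rtq", "scsi"})


# Usage (the path is whatever file syslog writes kernel messages to):
#   python coincide.py /path/to/messages
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for stamp in coinciding_seconds(log):
            print(stamp)
```

A match only shows that the two messages share a one-second log timestamp; it says nothing about which came first within that second or whether one causes the other.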

We've tried running Apache 1.1.3, 1.2b4, and 1.2b6, and that hasn't made any difference to the crashes. We also removed the kernel changes we had made to SOMAXCONN and the TCP options, to get back to a more generic kernel, but that hasn't gotten rid of the SCSI idle timeouts. This is running on a Pentium Pro 200 machine with 128 MB of RAM. A couple of our other machines that have had the 2.1.7 upgrade appear to be OK.

If anyone has any ideas what's going on, please let us know!

Thanks.

pete@ebay.com



