Date:      Tue, 2 Aug 2011 16:11:43 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Mike Tancsa <mike@sentex.net>
Cc:        FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: ATA_IDENTIFY requeued due to channel reset LBA=0
Message-ID:  <20110802231143.GA5450@icarus.home.lan>
In-Reply-To: <4E38640E.9020005@sentex.net>
References:  <4E38640E.9020005@sentex.net>

On Tue, Aug 02, 2011 at 04:54:38PM -0400, Mike Tancsa wrote:
> I upgraded a RELENG_8 box from a kernel from ~ June 15th to one today to
> get some of the zfs and bind updates and on reboot, the box panic'd
> twice, and booted fine the third time.  I have not seen this error
> before and not sure if its a hardware issue, or some odd timing issue I
> ran into ? smartctl shows no errors on any of the disks recorded in
> their logs
> 
> From the serial console, this is what I saw
> 
> atapci0: <Intel ICH9 SATA300 controller> port
> 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x3410-0x341f,0x3400-0x340f irq 21
> at device 31.2 on pci0
> ata0: <ATA channel 0> on atapci0
> ata0: [ITHREAD]
> ata1: <ATA channel 1> on atapci0
> ata1: [ITHREAD]
> pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
> atapci1: <Intel ICH9 SATA300 controller> port
> 0x3428-0x342f,0x3444-0x3447,0x3420-0x3427,0x3440-0x3443,0x30f0-0x30ff,0x30e0-0x30ef
> irq 21 at device 31.5 on pci0
> atapci1: [ITHREAD]
> ata2: <ATA channel 0> on atapci1
> ata2: [ITHREAD]
> ata3: <ATA channel 1> on atapci1
> ata3: [ITHREAD]
> acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on
> acpi0
> .
> .
> .
> ugen1.1: <Intel> at usbus1
> uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
> ugen2.1: <Intel> at usbus2
> uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
> ugen3.1: <Intel> at usbus3
> uhub3: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
> ugen4.1: <Intel> at usbus4
> uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
> ugen5.1: <Intel> at usbus5
> uhub5: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5
> ugen6.1: <Intel> at usbus6
> uhub6: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus6
> ugen7.1: <Intel> at usbus7
> uhub7: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus7
> uhub0: 2 ports with 2 removable, self powered
> uhub1: 2 ports with 2 removable, self powered
> uhub2: 2 ports with 2 removable, self powered
> uhub4: 2 ports with 2 removable, self powered
> uhub5: 2 ports with 2 removable, self powered
> uhub6: 2 ports with 2 removable, self powered
> ad0: 1430799MB <Seagate ST31500341AS CC1H> at ata0-master UDMA100 SATA 3Gb/s
> uhub3: 6 ports with 6 removable, self powered
> uhub7: 6 ports with 6 removable, self powered
> unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
> ...
> unknown: WARNING - ATA_IDENTIFY taskqueue timeout - completing request directly
> unknown: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0
> ad4: 1430799MB <Seagate ST31500341AS CC1H> at ata2-master UDMA100 SATA 3Gb/s
> subdisk4: WARNING - ATA_IDENTIFY requeued due to channel reset LBA=0

This looks like a failure on ata1, both -master and -slave.  I can tell
based on later output:

> ad0: 1430799MB <Seagate ST31500341AS CC1H> at ata0-master UDMA100 SATA 3Gb/s
> ad2: 76319MB <Seagate ST380811AS 3.AAE> at ata1-master UDMA100 SATA 1.5Gb/s
> ad3: 1907729MB <WDC WD2001FASS-00U0B0 01.00101> at ata1-slave UDMA100 SATA 3Gb/s
> GEOM: ad2s1: geometry does not match label (255h,63s != 16h,63s).
> ad4: 1430799MB <Seagate ST31500341AS CC1H> at ata2-master UDMA100 SATA 3Gb/s
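
As a rough illustration of that reasoning, here is a minimal Python sketch
that groups the ata(4) probe lines by channel. The function name, the
regular expression, and the embedded sample text are my own assumptions for
illustration; they are not part of any FreeBSD tool.

```python
import re
from collections import defaultdict

# Matches ata(4) disk probe lines such as:
#   ad2: 76319MB <Seagate ST380811AS 3.AAE> at ata1-master UDMA100 SATA 1.5Gb/s
# An optional leading "> " (mail quoting) is tolerated.
PROBE_RE = re.compile(
    r"^(?:>\s*)?(?P<dev>ad\d+): (?P<size_mb>\d+)MB <(?P<model>[^>]+)> "
    r"at (?P<channel>ata\d+)-(?P<role>master|slave)\b"
)

def disks_by_channel(dmesg_text):
    """Return {channel: [(device, role), ...]} from ata(4) probe lines."""
    channels = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = PROBE_RE.match(line.strip())
        if not m:
            continue  # not a disk probe line (e.g. GEOM or uhub output)
        entry = (m.group("dev"), m.group("role"))
        # The same probe line can appear more than once in pasted output.
        if entry not in channels[m.group("channel")]:
            channels[m.group("channel")].append(entry)
    return dict(channels)

# Probe lines copied from the boot output quoted above.
SAMPLE = """\
ad0: 1430799MB <Seagate ST31500341AS CC1H> at ata0-master UDMA100 SATA 3Gb/s
ad2: 76319MB <Seagate ST380811AS 3.AAE> at ata1-master UDMA100 SATA 1.5Gb/s
ad3: 1907729MB <WDC WD2001FASS-00U0B0 01.00101> at ata1-slave UDMA100 SATA 3Gb/s
GEOM: ad2s1: geometry does not match label (255h,63s != 16h,63s).
ad4: 1430799MB <Seagate ST31500341AS CC1H> at ata2-master UDMA100 SATA 3Gb/s
"""

if __name__ == "__main__":
    for channel, disks in sorted(disks_by_channel(SAMPLE).items()):
        print(channel, disks)
```

ata1 is the only channel here with both a master and a slave, and the
warnings appeared between the ad0 (ata0) and ad4 (ata2) probes, which is
why I point at ata1.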

> smartctl -a /dev/ad3
> ...
> smartctl -a /dev/ad2
> ....

These disks look to be in fine shape.

The only things I can think of that might have happened: either the ata1
channel went wonky, or one of the two disks (ad2 or ad3) has an
intermittent problem within its HPA (which, to my knowledge, wouldn't
show up in SMART but would manifest itself when ATA_IDENTIFY/0xec is
issued, since some of the data that command returns comes from the HPA),
or there's a strange quirk/bug in the older ata(4) code.

Does your motherboard support AHCI?  I see it's ICH9 and you're running
it in Compatible mode (which makes SATA devices appear as classic PATA).

Alexander (mav@) would be a more authoritative source for this.
However, I can tell you up front that booting verbose is your best
option here, as it gives significant controller status/state debug
information which is useful when troubleshooting things like this.
Since the problem isn't reliably reproducible, though, I'm not sure
whether you're willing to boot verbose every time.
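
For reference, a minimal sketch of making verbose boot persistent, assuming
a stock /boot/loader.conf (for a one-off, typing "boot -v" at the loader
prompt does the same thing for that boot only):

```
# /boot/loader.conf
# Boot the kernel verbosely on every boot.
boot_verbose="YES"
```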

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
