Date:      Mon, 16 Feb 2009 16:21:01 -0800
From:      Marcel Moolenaar <xcllnt@mac.com>
To:        usb@freebsd.org
Subject:   USB2+umass: timing related bug (machine check abort)
Message-ID:  <B5BADABE-8E3E-4248-85FA-A16DFA175B3E@mac.com>

Context: MACHINE=ia64, CPU=Montecito

I'm running into a timing-related MCA (machine check abort). In short:
	...
umass0: <HEWLETT PACKARD INTEGRITY SERVER, class 0/0, rev 2.00/0.a1, addr 2> on usbus2
umass0:  SCSI over Bulk-Only; quirks = 0x0000
umass0:2:0:-1: Attached to scbus2
	*** machine check abort ***
***********************************************************
* ROM Version : 01.05
* ROM Date    : 11/06/2006
* BMC Version :  05.06
***********************************************************
	...

When I enable EHCI debugging (level 99), the MCA does not happen,
and in between the debug output I see:

	...
(probe0:umass-sim0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
(probe0:umass-sim0:0:0:0): CAM Status: SCSI Status Error
(probe0:umass-sim0:0:0:0): SCSI Status: Check Condition
(probe0:umass-sim0:0:0:0): UNIT ATTENTION asc:29,0
(probe0:umass-sim0:0:0:0): Power on, reset, or bus device reset occurred
(probe0:umass-sim0:0:0:0): Retrying Command (per Sense Data)
	...
(probe0:umass-sim0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
(probe0:umass-sim0:0:0:0): CAM Status: SCSI Status Error
(probe0:umass-sim0:0:0:0): SCSI Status: Check Condition
(probe0:umass-sim0:0:0:0): NOT READY asc:3a,0
(probe0:umass-sim0:0:0:0): Medium not present
(probe0:umass-sim0:0:0:0): Unretryable error
	...
ehcd0 at umass-sim0 bus 0 target 0 lun 0
cd0: <TEAC DV-28E-N C.6B> Removable CD-ROM SCSI-0 device
cd0: 40.000MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present
	...
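
The two check conditions above can be read as (sense key, ASC, ASCQ) triples. A minimal sketch of that decoding follows, covering only the two conditions that appear in this log. The struct and function names are hypothetical and are not part of the CAM API. The retry flag simply records what the probe did in the output above ("Retrying Command" versus "Unretryable error").

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded sense condition. 'retry' mirrors the probe's action in the log. */
struct sense_entry {
	uint8_t key;		/* SCSI sense key */
	uint8_t asc;		/* additional sense code */
	uint8_t ascq;		/* additional sense code qualifier */
	const char *key_name;
	const char *text;
	int retry;		/* 1 = command was retried, 0 = unretryable */
};

/* Only the two conditions seen in the debug output; not a complete table. */
static const struct sense_entry sense_table[] = {
	{ 0x06, 0x29, 0x00, "UNIT ATTENTION",
	  "Power on, reset, or bus device reset occurred", 1 },
	{ 0x02, 0x3a, 0x00, "NOT READY",
	  "Medium not present", 0 },
};

/* Return the matching entry, or NULL if the triple is not in the table. */
static const struct sense_entry *
sense_lookup(uint8_t key, uint8_t asc, uint8_t ascq)
{
	size_t i;

	for (i = 0; i < sizeof(sense_table) / sizeof(sense_table[0]); i++) {
		const struct sense_entry *e = &sense_table[i];

		if (e->key == key && e->asc == asc && e->ascq == ascq)
			return (e);
	}
	return (NULL);
}
```

Calling sense_lookup(0x02, 0x3a, 0x00) returns the "Medium not present" entry with retry set to 0, which is consistent with cd0 attaching but failing the size query.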

The MCA error dump tells me that it's PCI related. I
suspect it's a race condition caused by the HCD (the
host controller driver) updating operational state at
the same time that the HC (the host controller) is
accessing it.

I have two instruction pointers. The first is
where an interrupt last occurred: IP=0xe0000000041cf810

(gdb) l *0xe0000000041cf810
0xe0000000041cf810 is in ehci_root_ctrl_done (/nfs/freebsd/base/head/sys/dev/usb2/controller/ehci2.c:3307).
3302                            std->err = USB_ERR_IOERROR;
3303                            goto done;
3304                    }
3305                    v = EOREAD4(sc, EHCI_PORTSC(index));
3306                    DPRINTFN(9, "port status=0x%04x\n", v);
3307                    if (sc->sc_flags & EHCI_SCFLG_FORCESPEED) {
3308                            if ((v & 0xc000000) == 0x8000000)
3309                                    i = UPS_HIGH_SPEED;
3310                            else if ((v & 0xc000000) == 0x4000000)
3311                                    i = UPS_LOW_SPEED;

The second is where the MCA happened: IP=0xe00000000420d8b0

(gdb) l *0xe00000000420d8b0
0xe00000000420d8b0 is in usb2_transfer_start (/nfs/freebsd/base/head/sys/dev/usb2/core/usb2_transfer.c:1577).
1572    {
1573            if (xfer == NULL) {
1574                    /* transfer is gone */
1575                    return;
1576            }
1577            USB_XFER_LOCK_ASSERT(xfer, MA_OWNED);
1578
1579            /* mark the USB transfer started */
1580
1581            if (!xfer->flags_int.started) {

The last access to the EHCI registers was through register 0x6c,
which corresponds to PORTSC(3). This matches the first IP.
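
The mapping from 0x6c to PORTSC(3) can be checked with a little arithmetic. In EHCI the operational registers start CAPLENGTH bytes into the memory BAR, and PORTSC for port 1 sits at operational offset 0x44, with one 32-bit register per port. The sketch below assumes CAPLENGTH is 0x20 on this controller; that value is not stated in this report, it is only the value for which 0x6c lands on port 3. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* PORTSC for port 1, relative to the operational register base (EHCI 1.0). */
#define OP_PORTSC_FIRST		0x44u

/* ASSUMPTION: capability register length of this controller. */
#define ASSUMED_CAPLENGTH	0x20u

/* BAR offset of PORTSC(port); ports are numbered from 1. */
static uint32_t
portsc_bar_offset(uint32_t caplength, uint32_t port)
{
	assert(port >= 1);
	return (caplength + OP_PORTSC_FIRST + 4u * (port - 1u));
}

/* Inverse: port number for a BAR offset, or 0 if it is not a PORTSC slot. */
static uint32_t
portsc_port_from_offset(uint32_t caplength, uint32_t offset)
{
	uint32_t first = caplength + OP_PORTSC_FIRST;

	if (offset < first || ((offset - first) & 3u) != 0)
		return (0);
	return ((offset - first) / 4u + 1u);
}
```

Under that assumption, portsc_bar_offset(ASSUMED_CAPLENGTH, 3) evaluates to 0x20 + 0x44 + 8 = 0x6c, and port 3 is within the 5 ports reported for the EHCI root hub.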

The MCA is caused by an error on the PCI bus, most likely
an invalid inbound address:

**** MEMORY ERROR STRUCTURE ****
MEM_ERR_STRUCT_VALID            0x0000000000000201

**** PLATFORM_SPECIFIC_ERROR_INFO ****
VALIDATION_BITS                 0x000000000000007b
PLATFORM_ERROR_STATUS           0x0000000000421200
PLATFORM_REQUESTOR_ID           0x0000000000000000
PLATFORM_RESPONDER_ID           0x0000000000000000
PLATFORM_TARGET_ID              0x000000003fde6000
PLATFORM_BUS_SPECIFIC_DATA      0x0000000000107628
PLATFORM_OEM_COMPONENT_ID[0]    0x000000004033103c
PLATFORM_OEM_COMPONENT_ID[1]    0x0000000000000000
PLATFORM_OEM_DEVICE_PATH        0x0000000000000000

.... HP_TITAN_PLATFORM_DATA .....
ERROR_LOG_EN                    0x0000008000003dff
ERROR_SIG_EN                    0x0000200000002117
ERROR_STATUS                    0x0000000000001000
ERROR_OVFL                      0x0000000000001000
ERROR_FIRST                     0x0000000000000000
AP_ADDRa                        0x0000000000000000
AP_ADDRb                        0x0000000000000000
ST_ADDRa                        0x0000000000000000
ST_ADDRb                        0x0000000000000000
RT_ADDRa                        0x0000000000000000
RT_ADDRb                        0x0000000000000000
RP_ADDRa                        0x0000000000000000
RP_ADDRb                        0x0000000000000000
LE_ADDRa                        0x503800003fde6000
LE_ADDRb                        0xc020000000030118
ST_TO                           0x00000000fffffff3
PT_TO                           0x00000000ffffffff
RT_TO                           0x000000009e8c6100


**** PCI BUS REGISTERS ****
PCI_BUS_ERROR_VALID             0x0000000000000001


**** PLATFORM_PCI_BUS_ERROR_INFO ****
VALIDATION_BITS                    0x00000000000007cf
PCI_BUS_ERROR_STATUS               0x0000000000091200
PCI_BUS_ERROR_TYPE                 0x0000000000000000
PCI_BUS_ID                         0x0000000000000000
PCI_BUS_ADDRESS                    0x00000000fc2fa5d0
PCI_BUS_DATA                       0x0000000000000000
PCI_BUS_CMD                        0x0000000000000000
PCI_BUS_REQUESTOR_ID               0x0000000000001000
PCI_BUS_COMPLETER_ID               0x00000000fed20000
PCI_BUS_TARGET_ID                  0x00000000fc2fa5d0
PCI_BUS_OEM_ID[0]                  0x000000000000122e
PCI_BUS_OEM_ID[1]                  0x0000000000000000

.... HP_MERCURY_DATA ....
CELL_NUMBER                        0x0000000000000000
SBA_NUMBER                         0x0000000000000000
ROPE_NUMBER                        0x0000000000000000
ERROR_STATUS                       0x000000010000021a
ERROR_MASTER_ID_LOG                0x0000000000000008
INBOUND_ERR_ADDRESS                0x00000000fc2fa5d0
INBOUND_ERR_ATTRIBUTE              0x2000000000000000
COMPLETION_MESSAGE_LOG             0x0000000000000000
OUTBOUND_ERR_ADDRESS               0x0000000000000000
ERROR_CONFIG                       0x0000000000001d50
STATUS_INFO_CONTROL                0x0000000000000048
FUNC_ID                            0x0ab00146122e103c
CAPABILITIES_LIST                  0x0f00023700200002
AGP_COMMAND                        0x0000000000000000
PCIX_CAPABILITIES                  0x0013ff0000010007
OLR_CONTROL                        0x00023e1b00032403
CLOCK_CONTROL                      0x0000000000000048
BUS_MODE                           0x9da874ae36d58460

Some more background information:

\begin{log}
	...
FreeBSD 8.0-CURRENT #28 r188699M: Mon Feb 16 14:51:49 PST 2009
     marcel@hob.lan.xcllnt.net:/usr/obj/nfs/freebsd/base/head/sys/HOB
	...
CPU: Montecito (1594.66-Mhz Itanium 2)
	...
ohci0: <NEC uPD 9210 USB controller> mem 0x88032000-0x88032fff irq 17 at device 2.0 on pci0
ohci0: [ITHREAD]
usbus0: <NEC uPD 9210 USB controller> on ohci0
ohci1: <NEC uPD 9210 USB controller> mem 0x88031000-0x88031fff irq 18 at device 2.1 on pci0
ohci1: [ITHREAD]
usbus1: <NEC uPD 9210 USB controller> on ohci1
ehci0: <NEC uPD 720100 USB 2.0 controller> mem 0x88030000-0x880300ff irq 19 at device 2.2 on pci0
ehci0: [ITHREAD]
usbus2: EHCI version 1.0
usbus2: <NEC uPD 720100 USB 2.0 controller> on ehci0
	...
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 480Mbps High Speed USB v2.0
ugen0.1: <NEC> at usbus0
ushub0: <NEC OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <NEC> at usbus1
ushub1: <NEC OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <NEC> at usbus2
ushub2: <NEC EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2
ushub1: 2 ports with 2 removable, self powered
ushub0: 3 ports with 3 removable, self powered
	...
ushub2: 5 ports with 5 removable, self powered
ugen0.2: <HP> at usbus0
uhid0: <Virtual Keyboard> on usbus0
Symlink: uhid0 -> usb0.2.0.16
ums0: <Virtual Mouse> on usbus0
ugen2.2: <HEWLETT PACKARD> at usbus2
ums0: 3 buttons and [] coordinates
Symlink: ums0 -> usb0.2.1.17
	...
\end{log}

-- 
Marcel Moolenaar
xcllnt@mac.com





