Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 21 Aug 2014 22:58:41 -0700
From:      Neel Natu <neelnatu@gmail.com>
To:        Warner Losh <imp@bsdimp.com>
Cc:        "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, Warner Losh <imp@freebsd.org>
Subject:   Re: svn commit: r270249 - head/sys/cam/ata
Message-ID:  <CAFgRE9HA29K3NFwRoTRWf-UHq-NWSt=jGN0hX=aAwrdV5Qbqkg@mail.gmail.com>
In-Reply-To: <0DAF2357-4BBA-4D5B-8F17-D61845BACDA5@bsdimp.com>
References:  <201408202258.s7KMwDh3073409@svn.freebsd.org> <CAFgRE9GvOMp4EmryGVJvPdZiUcKU0cZ3aajTfrEu8TJkAk2d-g@mail.gmail.com> <0DAF2357-4BBA-4D5B-8F17-D61845BACDA5@bsdimp.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Hi Warner,

On Thu, Aug 21, 2014 at 10:34 PM, Warner Losh <imp@bsdimp.com> wrote:
>
> On Aug 21, 2014, at 10:31 PM, Neel Natu <neelnatu@gmail.com> wrote:
>
>> Hi Warner,
>>
>> On Wed, Aug 20, 2014 at 3:58 PM, Warner Losh <imp@freebsd.org> wrote:
>>> Author: imp
>>> Date: Wed Aug 20 22:58:12 2014
>>> New Revision: 270249
>>> URL: http://svnweb.freebsd.org/changeset/base/270249
>>>
>>> Log:
>>>  Turns out that IDENTIFY DEVICE and IDENTIFY PACKET DEVICE return data
>>>  that's only mostly similar. Specifically word 78 bits are defined for
>>>  IDENTIFY DEVICE as
>>>        5 Supports Hardware Feature Control
>>>  while a IDENTIFY PACKET DEVICE defines them as
>>>        5 Asynchronous notification supported
>>>  Therefore, only pay attention to bit 5 when we're talking to ATAPI
>>>  devices (we don't use the hardware feature control at this time).
>>>  Ignore it for ATA devices. Remove kludge that papered over this issue
>>>  for Samsung SATA SSDs, since Micron drives also have the bit set and
>>>  the error was caused by this bad interpretation of the spec (which is
>>>  quite easy to do, since bits aren't normally overlapping like this).
>>>
>>> Modified:
>>>  head/sys/cam/ata/ata_xpt.c
>>>
>>> Modified: head/sys/cam/ata/ata_xpt.c
>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D
>>> --- head/sys/cam/ata/ata_xpt.c  Wed Aug 20 22:39:26 2014        (r27024=
8)
>>> +++ head/sys/cam/ata/ata_xpt.c  Wed Aug 20 22:58:12 2014        (r27024=
9)
>>> @@ -458,12 +458,18 @@ negotiate:
>>>                    0, 0x02);
>>>                break;
>>>        case PROBE_SETAN:
>>> -               /* Remember what transport thinks about AEN. */
>>> -               if (softc->caps & CTS_SATA_CAPS_H_AN)
>>> +               /*
>>> +                * Only ATAPI defines this bit to mean AEN, but remembe=
r
>>> +                * what transport thinks about AEN.
>>> +                */
>>> +               if ((softc->caps & CTS_SATA_CAPS_H_AN) &&
>>> +                   periph->path->device->protocol =3D=3D PROTO_ATAPI)
>>>                        path->device->inq_flags |=3D SID_AEN;
>>>                else
>>>                        path->device->inq_flags &=3D ~SID_AEN;
>>>                xpt_async(AC_GETDEV_CHANGED, path, NULL);
>>> +               if (periph->path->device->protocol !=3D PROTO_ATAPI)
>>> +                       break;
>>>                cam_fill_ataio(ataio,
>>>                    1,
>>>                    probedone,
>>> @@ -750,14 +756,6 @@ out:
>>>                        goto noerror;
>>>
>>>                /*
>>> -                * Some Samsung SSDs report supported Asynchronous Noti=
fication,
>>> -                * but return ABORT on attempt to enable it.
>>> -                */
>>> -               } else if (softc->action =3D=3D PROBE_SETAN &&
>>> -                   status =3D=3D CAM_ATA_STATUS_ERROR) {
>>> -                       goto noerror;
>>> -
>>> -               /*
>>>                 * SES and SAF-TE SEPs have different IDENTIFY commands,
>>>                 * but SATA specification doesn't tell how to identify t=
hem.
>>>                 * Until better way found, just try another if first fai=
l.
>>>
>>
>> This change causes a panic for me on boot. Here is the boot log:
>>
>> ahci0: <Intel Patsburg AHCI SATA controller> port
>> 0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f
>> mem 0xfbb21000-0xfbb217ff irq 18 at device 31.2 on pci0
>> ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
>> ahcich0: <AHCI channel> at channel 0 on ahci0
>> ahcich1: <AHCI channel> at channel 1 on ahci0
>> ahcich2: <AHCI channel> at channel 2 on ahci0
>> ahcich3: <AHCI channel> at channel 3 on ahci0
>> ahcich4: <AHCI channel> at channel 4 on ahci0
>> ahcich5: <AHCI channel> at channel 5 on ahci0
>> ahciem0: <AHCI enclosure management bridge> on ahci0
>> ...
>> xpt_action_default: CCB type 0xdeadc0de not supported
>> ...
>> run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_confi=
g
>> run_interrupt_driven_hooks: still waiting after 120 seconds for xpt_conf=
ig
>> run_interrupt_driven_hooks: still waiting after 180 seconds for xpt_conf=
ig
>> run_interrupt_driven_hooks: still waiting after 240 seconds for xpt_conf=
ig
>> run_interrupt_driven_hooks: still waiting after 300 seconds for xpt_conf=
ig
>> panic: run_interrupt_driven_config_hooks: waited too long
>> cpuid =3D 0
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff81=
d92920
>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffffff81d929d0
>> vpanic() at vpanic+0x189/frame 0xffffffff81d92a50
>> kassert_panic() at kassert_panic+0x139/frame 0xffffffff81d92ac0
>> boot_run_interrupt_driven_config_hooks() at
>> boot_run_interrupt_driven_config_hooks+0x111/frame 0xffffffff81d92b50
>> mi_startup()fffff81d92b70
>> btext() at btext+0x2c
>> KDB: enter: panic
>> [ thread pid 0 tid 100000 ]
>> Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
>> db>
>>
>> The peripheral in question is a SATA attached CDROM:
>>
>> % camcontrol devlist
>> <INTEL SSDSC2CW240A3 400i>         at scbus0 target 0 lun 0 (pass0,ada0)
>> <ATAPI iHAS524   C LL23>           at scbus2 target 0 lun 0 (cd0,pass1)
>> <WDC WD1000CHTZ-04JCPV0 04.06A00>  at scbus3 target 0 lun 0 (pass2,ada1)
>> <Corsair Neutron GTX SSD M306>     at scbus4 target 0 lun 0 (pass3,ada2)
>> <AHCI SGPIO Enclosure 1.00 0001>   at scbus6 target 0 lun 0 (ses0,pass4)
>>
>> pass1 at ahcich2 bus 0 scbus2 target 0 lun 0
>> pass1: <ATAPI iHAS524   C LL23> Removable CD-ROM SCSI-0 device
>> pass1: Serial Number 3524472 2N8225501140
>> pass1: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192by=
tes)
>>
>> The following patch fixes the panic.
>>
>> Index: sys/cam/ata/ata_xpt.c
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> --- sys/cam/ata/ata_xpt.c       (revision 270249)
>> +++ sys/cam/ata/ata_xpt.c       (working copy)
>> @@ -468,7 +468,8 @@
>>                else
>>                        path->device->inq_flags &=3D ~SID_AEN;
>>                xpt_async(AC_GETDEV_CHANGED, path, NULL);
>> -               if (periph->path->device->protocol !=3D PROTO_ATAPI)
>> +               if (periph->path->device->protocol !=3D PROTO_ATAPI &&
>> +                   periph->path->device->protocol !=3D PROTO_SCSI)
>>                        break;
>>                cam_fill_ataio(ataio,
>>                    1,
>
> I think the more proper test is =3D=3D PROTO_ATA elsewhere, since that=E2=
=80=99s what
> distinguishes the ATA_IDENTIFY from the ATAPI_IDENTIFY.
>
>> However, there seem to be a couple of issues with the original patch:
>>
>> 1. The 'periph->path->device->protocol' is not initialized to
>> PROTO_ATAPI anywhere in the tree so the not-equal-to test is  a no-op.
>
> We test here to determine which identify command to send:
>
>                 if (periph->path->device->protocol =3D=3D PROTO_ATA)
>                         ata_28bit_cmd(ataio, ATA_ATA_IDENTIFY, 0, 0, 0);
>                 else
>                         ata_28bit_cmd(ataio, ATA_ATAPI_IDENTIFY, 0, 0, 0)=
;
>
> and that is working to send the right command.
>

Yes, but PROTO_ATA !=3D PROTO_ATAPI :-)

Since we never initialize 'periph->path->device->protocol' to
'PROTO_ATAPI' in -current:

if (protocol !=3D PROTO_ATAPI) equates to if (1)
if (protocol =3D=3D PROTO_ATAPI) equates to if (0)

I was trying to say that any code that compares 'protocol' to
PROTO_ATAPI probably deserves a second look (e.g., the original patch
that triggered this panic).

>> 2. It seems not right to break out of switch in 'probestart()' without
>> providing a way for 'probedone()' to be called. I believe that this
>> stops the state machine from making forward progress and results in
>> 'xpt_config()' not completing.
>
> That=E2=80=99s a problem, you=E2=80=99re right. Let me rework.
>
>> If you need more information to debug this some more or test a proper
>> fix then I am happy to help.
>
> Please try the one included here. I think it will address things. I=E2=80=
=99ve tried it on one system, and am trying it on others in parallel to sen=
ding this.
>

Yup, works fine. Thanks for the quick fix!

best
Neel

> Warner
>
>
>
>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAFgRE9HA29K3NFwRoTRWf-UHq-NWSt=jGN0hX=aAwrdV5Qbqkg>