Date:      Thu, 21 Aug 2014 21:31:01 -0700
From:      Neel Natu <neelnatu@gmail.com>
To:        Warner Losh <imp@freebsd.org>
Cc:        "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>
Subject:   Re: svn commit: r270249 - head/sys/cam/ata
Message-ID:  <CAFgRE9GvOMp4EmryGVJvPdZiUcKU0cZ3aajTfrEu8TJkAk2d-g@mail.gmail.com>
In-Reply-To: <201408202258.s7KMwDh3073409@svn.freebsd.org>
References:  <201408202258.s7KMwDh3073409@svn.freebsd.org>

Hi Warner,

On Wed, Aug 20, 2014 at 3:58 PM, Warner Losh <imp@freebsd.org> wrote:
> Author: imp
> Date: Wed Aug 20 22:58:12 2014
> New Revision: 270249
> URL: http://svnweb.freebsd.org/changeset/base/270249
>
> Log:
>   Turns out that IDENTIFY DEVICE and IDENTIFY PACKET DEVICE return data
>   that's only mostly similar. Specifically word 78 bits are defined for
>   IDENTIFY DEVICE as
>         5 Supports Hardware Feature Control
>   while a IDENTIFY PACKET DEVICE defines them as
>         5 Asynchronous notification supported
>   Therefore, only pay attention to bit 5 when we're talking to ATAPI
>   devices (we don't use the hardware feature control at this time).
>   Ignore it for ATA devices. Remove kludge that papered over this issue
>   for Samsung SATA SSDs, since Micron drives also have the bit set and
>   the error was caused by this bad interpretation of the spec (which is
>   quite easy to do, since bits aren't normally overlapping like this).
>
> Modified:
>   head/sys/cam/ata/ata_xpt.c
>
> Modified: head/sys/cam/ata/ata_xpt.c
> ==============================================================================
> --- head/sys/cam/ata/ata_xpt.c  Wed Aug 20 22:39:26 2014        (r270248)
> +++ head/sys/cam/ata/ata_xpt.c  Wed Aug 20 22:58:12 2014        (r270249)
> @@ -458,12 +458,18 @@ negotiate:
>                     0, 0x02);
>                 break;
>         case PROBE_SETAN:
> -               /* Remember what transport thinks about AEN. */
> -               if (softc->caps & CTS_SATA_CAPS_H_AN)
> +               /*
> +                * Only ATAPI defines this bit to mean AEN, but remember
> +                * what transport thinks about AEN.
> +                */
> +               if ((softc->caps & CTS_SATA_CAPS_H_AN) &&
> +                   periph->path->device->protocol == PROTO_ATAPI)
>                         path->device->inq_flags |= SID_AEN;
>                 else
>                         path->device->inq_flags &= ~SID_AEN;
>                 xpt_async(AC_GETDEV_CHANGED, path, NULL);
> +               if (periph->path->device->protocol != PROTO_ATAPI)
> +                       break;
>                 cam_fill_ataio(ataio,
>                     1,
>                     probedone,
> @@ -750,14 +756,6 @@ out:
>                         goto noerror;
>
>                 /*
> -                * Some Samsung SSDs report supported Asynchronous Notification,
> -                * but return ABORT on attempt to enable it.
> -                */
> -               } else if (softc->action == PROBE_SETAN &&
> -                   status == CAM_ATA_STATUS_ERROR) {
> -                       goto noerror;
> -
> -               /*
>                  * SES and SAF-TE SEPs have different IDENTIFY commands,
>                  * but SATA specification doesn't tell how to identify them.
>                  * Until better way found, just try another if first fail.
>
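
As an aside on the log message above, the overlap it describes can be sketched in C. This is only an illustration under stated assumptions: the enum, macro, and function names are hypothetical and are not the identifiers used in ata_xpt.c. The only facts taken from the log are that bit 5 of word 78 means "asynchronous notification supported" for IDENTIFY PACKET DEVICE and "supports hardware feature control" for IDENTIFY DEVICE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the FreeBSD tree. */
enum ident_kind {
	IDENT_DEVICE,		/* data came from IDENTIFY DEVICE (ATA) */
	IDENT_PACKET_DEVICE	/* data came from IDENTIFY PACKET DEVICE (ATAPI) */
};

#define WORD78_BIT5	(1u << 5)

/*
 * Return true only when bit 5 of word 78 can be read as
 * "asynchronous notification supported".
 */
static bool
supports_async_notification(enum ident_kind kind, uint16_t word78)
{
	/* For IDENTIFY DEVICE the same bit means hardware feature control. */
	if (kind != IDENT_PACKET_DEVICE)
		return (false);
	return ((word78 & WORD78_BIT5) != 0);
}
```

With this helper, the same word 78 value of 0x0020 reports notification support for IDENT_PACKET_DEVICE and no support for IDENT_DEVICE.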

This change causes a panic for me on boot. Here is the boot log:

ahci0: <Intel Patsburg AHCI SATA controller> port
0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f
mem 0xfbb21000-0xfbb217ff irq 18 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ahciem0: <AHCI enclosure management bridge> on ahci0
...
xpt_action_default: CCB type 0xdeadc0de not supported
...
run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config
run_interrupt_driven_hooks: still waiting after 120 seconds for xpt_config
run_interrupt_driven_hooks: still waiting after 180 seconds for xpt_config
run_interrupt_driven_hooks: still waiting after 240 seconds for xpt_config
run_interrupt_driven_hooks: still waiting after 300 seconds for xpt_config
panic: run_interrupt_driven_config_hooks: waited too long
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff81d92920
kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffffff81d929d0
vpanic() at vpanic+0x189/frame 0xffffffff81d92a50
kassert_panic() at kassert_panic+0x139/frame 0xffffffff81d92ac0
boot_run_interrupt_driven_config_hooks() at boot_run_interrupt_driven_config_hooks+0x111/frame 0xffffffff81d92b50
mi_startup()fffff81d92b70
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
db>

The peripheral in question is a SATA-attached CD-ROM drive:

% camcontrol devlist
<INTEL SSDSC2CW240A3 400i>         at scbus0 target 0 lun 0 (pass0,ada0)
<ATAPI iHAS524   C LL23>           at scbus2 target 0 lun 0 (cd0,pass1)
<WDC WD1000CHTZ-04JCPV0 04.06A00>  at scbus3 target 0 lun 0 (pass2,ada1)
<Corsair Neutron GTX SSD M306>     at scbus4 target 0 lun 0 (pass3,ada2)
<AHCI SGPIO Enclosure 1.00 0001>   at scbus6 target 0 lun 0 (ses0,pass4)

pass1 at ahcich2 bus 0 scbus2 target 0 lun 0
pass1: <ATAPI iHAS524   C LL23> Removable CD-ROM SCSI-0 device
pass1: Serial Number 3524472 2N8225501140
pass1: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)

The following patch fixes the panic.

Index: sys/cam/ata/ata_xpt.c
===================================================================
--- sys/cam/ata/ata_xpt.c       (revision 270249)
+++ sys/cam/ata/ata_xpt.c       (working copy)
@@ -468,7 +468,8 @@
                else
                        path->device->inq_flags &= ~SID_AEN;
                xpt_async(AC_GETDEV_CHANGED, path, NULL);
-               if (periph->path->device->protocol != PROTO_ATAPI)
+               if (periph->path->device->protocol != PROTO_ATAPI &&
+                   periph->path->device->protocol != PROTO_SCSI)
                        break;
                cam_fill_ataio(ataio,
                    1,

However, there seem to be a couple of issues with the original patch:

1. 'periph->path->device->protocol' is not set to PROTO_ATAPI
anywhere in the tree, so the not-equal-to test is always true and
does not single out ATAPI devices.

2. It does not seem right to break out of the switch in 'probestart()'
without providing a way for 'probedone()' to be called. I believe this
stops the state machine from making forward progress, so
'xpt_config()' never completes.

If you need more information to debug this further, or would like me
to test a proper fix, I am happy to help.

best
Neel


