From: Warner Losh
Date: Fri, 22 Aug 2014 12:56:36 -0600
Subject: Re: svn commit: r270249 - head/sys/cam/ata
To: Neel Natu
Cc: "svn-src-head@freebsd.org", "svn-src-all@freebsd.org", "src-committers@freebsd.org", Warner Losh
References: <201408202258.s7KMwDh3073409@svn.freebsd.org> <0DAF2357-4BBA-4D5B-8F17-D61845BACDA5@bsdimp.com> <118A680A-E4E4-4FEF-9C9C-44771F89A2D7@bsdimp.com>
List-Id: SVN commit messages for the src tree for head/-current

On Aug 22, 2014, at 12:07 PM, Neel Natu wrote:

> Hi Warner,
>
> On Fri, Aug 22, 2014 at 6:13 AM, Warner Losh wrote:
>>
>> On Aug 21, 2014, at 11:58 PM, Neel Natu wrote:
>>
>>> Hi Warner,
>>>
>>> On Thu, Aug 21, 2014 at 10:34 PM, Warner Losh wrote:
>>>>
>>>> On Aug 21, 2014, at 10:31 PM, Neel Natu wrote:
>>>>
>>>>> Hi Warner,
>>>>>
>>>>> On Wed, Aug 20, 2014 at 3:58 PM, Warner Losh wrote:
>>>>>> Author: imp
>>>>>> Date: Wed Aug 20 22:58:12 2014
>>>>>> New Revision: 270249
>>>>>> URL: http://svnweb.freebsd.org/changeset/base/270249
>>>>>>
>>>>>> Log:
>>>>>>   Turns out that IDENTIFY DEVICE and IDENTIFY PACKET DEVICE return data
>>>>>>   that's only mostly similar.
>>>>>>   Specifically word 78 bits are defined for
>>>>>>   IDENTIFY DEVICE as
>>>>>>         5 Supports Hardware Feature Control
>>>>>>   while a IDENTIFY PACKET DEVICE defines them as
>>>>>>         5 Asynchronous notification supported
>>>>>>   Therefore, only pay attention to bit 5 when we're talking to ATAPI
>>>>>>   devices (we don't use the hardware feature control at this time).
>>>>>>   Ignore it for ATA devices. Remove kludge that papered over this issue
>>>>>>   for Samsung SATA SSDs, since Micron drives also have the bit set and
>>>>>>   the error was caused by this bad interpretation of the spec (which is
>>>>>>   quite easy to do, since bits aren't normally overlapping like this).
>>>>>>
>>>>>> Modified:
>>>>>>   head/sys/cam/ata/ata_xpt.c
>>>>>>
>>>>>> Modified: head/sys/cam/ata/ata_xpt.c
>>>>>> ==============================================================================
>>>>>> --- head/sys/cam/ata/ata_xpt.c	Wed Aug 20 22:39:26 2014	(r270248)
>>>>>> +++ head/sys/cam/ata/ata_xpt.c	Wed Aug 20 22:58:12 2014	(r270249)
>>>>>> @@ -458,12 +458,18 @@ negotiate:
>>>>>>  		    0, 0x02);
>>>>>>  		break;
>>>>>>  	case PROBE_SETAN:
>>>>>> -		/* Remember what transport thinks about AEN. */
>>>>>> -		if (softc->caps & CTS_SATA_CAPS_H_AN)
>>>>>> +		/*
>>>>>> +		 * Only ATAPI defines this bit to mean AEN, but remember
>>>>>> +		 * what transport thinks about AEN.
>>>>>> +		 */
>>>>>> +		if ((softc->caps & CTS_SATA_CAPS_H_AN) &&
>>>>>> +		    periph->path->device->protocol == PROTO_ATAPI)
>>>>>>  			path->device->inq_flags |= SID_AEN;
>>>>>>  		else
>>>>>>  			path->device->inq_flags &= ~SID_AEN;
>>>>>>  		xpt_async(AC_GETDEV_CHANGED, path, NULL);
>>>>>> +		if (periph->path->device->protocol != PROTO_ATAPI)
>>>>>> +			break;
>>>>>>  		cam_fill_ataio(ataio,
>>>>>>  		      1,
>>>>>>  		      probedone,
>>>>>> @@ -750,14 +756,6 @@ out:
>>>>>>  			goto noerror;
>>>>>>
>>>>>>  		/*
>>>>>> -		 * Some Samsung SSDs report supported Asynchronous Notification,
>>>>>> -		 * but return ABORT on attempt to enable it.
>>>>>> -		 */
>>>>>> -		} else if (softc->action == PROBE_SETAN &&
>>>>>> -		    status == CAM_ATA_STATUS_ERROR) {
>>>>>> -			goto noerror;
>>>>>> -
>>>>>> -		/*
>>>>>>  		 * SES and SAF-TE SEPs have different IDENTIFY commands,
>>>>>>  		 * but SATA specification doesn't tell how to identify them.
>>>>>>  		 * Until better way found, just try another if first fail.
>>>>>>
>>>>>
>>>>> This change causes a panic for me on boot. Here is the boot log:
>>>>>
>>>>> ahci0: port 0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f mem 0xfbb21000-0xfbb217ff irq 18 at device 31.2 on pci0
>>>>> ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
>>>>> ahcich0: at channel 0 on ahci0
>>>>> ahcich1: at channel 1 on ahci0
>>>>> ahcich2: at channel 2 on ahci0
>>>>> ahcich3: at channel 3 on ahci0
>>>>> ahcich4: at channel 4 on ahci0
>>>>> ahcich5: at channel 5 on ahci0
>>>>> ahciem0: on ahci0
>>>>> ...
>>>>> xpt_action_default: CCB type 0xdeadc0de not supported
>>>>> ...
>>>>> run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config
>>>>> run_interrupt_driven_hooks: still waiting after 120 seconds for xpt_config
>>>>> run_interrupt_driven_hooks: still waiting after 180 seconds for xpt_config
>>>>> run_interrupt_driven_hooks: still waiting after 240 seconds for xpt_config
>>>>> run_interrupt_driven_hooks: still waiting after 300 seconds for xpt_config
>>>>> panic: run_interrupt_driven_config_hooks: waited too long
>>>>> cpuid = 0
>>>>> KDB: stack backtrace:
>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff81d92920
>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffffff81d929d0
>>>>> vpanic() at vpanic+0x189/frame 0xffffffff81d92a50
>>>>> kassert_panic() at kassert_panic+0x139/frame 0xffffffff81d92ac0
>>>>> boot_run_interrupt_driven_config_hooks() at boot_run_interrupt_driven_config_hooks+0x111/frame 0xffffffff81d92b50
>>>>> mi_startup()fffff81d92b70
>>>>> btext() at btext+0x2c
>>>>> KDB: enter: panic
>>>>> [ thread pid 0 tid 100000 ]
>>>>> Stopped at kdb_enter+0x3e: movq $0,kdb_why
>>>>> db>
>>>>>
>>>>> The peripheral in question is a SATA attached CDROM:
>>>>>
>>>>> % camcontrol devlist
>>>>> at scbus0 target 0 lun 0 (pass0,ada0)
>>>>> at scbus2 target 0 lun 0 (cd0,pass1)
>>>>> at scbus3 target 0 lun 0 (pass2,ada1)
>>>>> at scbus4 target 0 lun 0 (pass3,ada2)
>>>>> at scbus6 target 0 lun 0 (ses0,pass4)
>>>>>
>>>>> pass1 at ahcich2 bus 0 scbus2 target 0 lun 0
>>>>> pass1: Removable CD-ROM SCSI-0 device
>>>>> pass1: Serial Number 3524472 2N8225501140
>>>>> pass1: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
>>>>>
>>>>> The following patch fixes the panic.
>>>>>
>>>>> Index: sys/cam/ata/ata_xpt.c
>>>>> ===================================================================
>>>>> --- sys/cam/ata/ata_xpt.c	(revision 270249)
>>>>> +++ sys/cam/ata/ata_xpt.c	(working copy)
>>>>> @@ -468,7 +468,8 @@
>>>>>  		else
>>>>>  			path->device->inq_flags &= ~SID_AEN;
>>>>>  		xpt_async(AC_GETDEV_CHANGED, path, NULL);
>>>>> -		if (periph->path->device->protocol != PROTO_ATAPI)
>>>>> +		if (periph->path->device->protocol != PROTO_ATAPI &&
>>>>> +		    periph->path->device->protocol != PROTO_SCSI)
>>>>>  			break;
>>>>>  		cam_fill_ataio(ataio,
>>>>>  		      1,
>>>>
>>>> I think the more proper test is == PROTO_ATA elsewhere, since that's what
>>>> distinguishes the ATA_IDENTIFY from the ATAPI_IDENTIFY.
>>>>
>>>>> However, there seem to be a couple of issues with the original patch:
>>>>>
>>>>> 1. The 'periph->path->device->protocol' is not initialized to
>>>>> PROTO_ATAPI anywhere in the tree so the not-equal-to test is a no-op.
>>>>
>>>> We test here to determine which identify command to send:
>>>>
>>>>	if (periph->path->device->protocol == PROTO_ATA)
>>>>		ata_28bit_cmd(ataio, ATA_ATA_IDENTIFY, 0, 0, 0);
>>>>	else
>>>>		ata_28bit_cmd(ataio, ATA_ATAPI_IDENTIFY, 0, 0, 0);
>>>>
>>>> and that is working to send the right command.
>>>>
>>>
>>> Yes, but PROTO_ATA != PROTO_ATAPI :-)
>>>
>>> Since we never initialize 'periph->path->device->protocol' to
>>> 'PROTO_ATAPI' in -current:
>>
>> But this code appears to:
>>
>>	case PROBE_RESET:
>>	{
>>		int sign = (done_ccb->ataio.res.lba_high << 8) +
>>		    done_ccb->ataio.res.lba_mid;
>>		CAM_DEBUG(path, CAM_DEBUG_PROBE,
>>		    ("SIGNATURE: %04x\n", sign));
>>		if (sign == 0x0000 &&
>>		    done_ccb->ccb_h.target_id != 15) {
>>			path->device->protocol = PROTO_ATA;
>>			PROBE_SET_ACTION(softc, PROBE_IDENTIFY);
>>		} else if (sign == 0x9669 &&
>>		    done_ccb->ccb_h.target_id == 15) {
>>			/* Report SIM that PM is present. */
>>			bzero(&cts, sizeof(cts));
>>			xpt_setup_ccb(&cts.ccb_h, path, CAM_PRIORITY_NONE);
>>			cts.ccb_h.func_code = XPT_SET_TRAN_SETTINGS;
>>			cts.type = CTS_TYPE_CURRENT_SETTINGS;
>>			cts.xport_specific.sata.pm_present = 1;
>>			cts.xport_specific.sata.valid = CTS_SATA_VALID_PM;
>>			xpt_action((union ccb *)&cts);
>>			path->device->protocol = PROTO_SATAPM;
>>			PROBE_SET_ACTION(softc, PROBE_PM_PID);
>>		} else if (sign == 0xc33c &&
>>		    done_ccb->ccb_h.target_id != 15) {
>>			path->device->protocol = PROTO_SEMB;
>>			PROBE_SET_ACTION(softc, PROBE_IDENTIFY_SES);
>>		} else if (sign == 0xeb14 &&
>>		    done_ccb->ccb_h.target_id != 15) {
>>			path->device->protocol = PROTO_SCSI;
>>			PROBE_SET_ACTION(softc, PROBE_IDENTIFY);
>>		} else {
>>			if (done_ccb->ccb_h.target_id != 15) {
>>				xpt_print(path,
>>				    "Unexpected signature 0x%04x\n", sign);
>>			}
>>			goto device_fail;
>>		}
>>
>> what am I missing?
>>
>
> In the snippet above 'protocol' is set to one of PROTO_ATA,
> PROTO_SATAPM, PROTO_SEMB or PROTO_SCSI - none of which is PROTO_ATAPI
> :-)
>
> $ find sys -type f -exec grep -nH -w PROTO_ATAPI {} \;
> sys/cam/scsi/scsi_pass.c:355: if (cgd->protocol == PROTO_SCSI || cgd->protocol == PROTO_ATAPI)
> sys/cam/cam_ccb.h:249: PROTO_ATAPI, /* AT Attachment Packetized Interface */

Uggg. OK. I'll look...
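
To make the PROTO_ATAPI point concrete, here is a minimal sketch in C. It assumes a simplified stand-in for the signature checks quoted above: the enum values, proto_from_signature() and count_atapi_hits() are hypothetical names for illustration only, not the CAM definitions. Under that assumption, no signature and target combination decodes to PROTO_ATAPI, so any comparison against it is constant.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the protocol values named in the thread. */
enum proto {
	PROTO_UNKNOWN,
	PROTO_ATA,
	PROTO_ATAPI,
	PROTO_SCSI,
	PROTO_SATAPM,
	PROTO_SEMB
};

/* Mirrors only the signature tests quoted from the PROBE_RESET case. */
static enum proto
proto_from_signature(int sign, int target_id)
{
	if (sign == 0x0000 && target_id != 15)
		return PROTO_ATA;
	if (sign == 0x9669 && target_id == 15)
		return PROTO_SATAPM;
	if (sign == 0xc33c && target_id != 15)
		return PROTO_SEMB;
	if (sign == 0xeb14 && target_id != 15)
		return PROTO_SCSI;	/* packet devices land here */
	return PROTO_UNKNOWN;		/* the "device_fail" path in the quote */
}

/* Count how many (signature, target) pairs decode to PROTO_ATAPI. */
static int
count_atapi_hits(void)
{
	static const int signs[] = { 0x0000, 0x9669, 0xc33c, 0xeb14, 0x1234 };
	int hits = 0;

	for (size_t i = 0; i < sizeof(signs) / sizeof(signs[0]); i++)
		for (int target = 0; target <= 15; target++)
			if (proto_from_signature(signs[i], target) == PROTO_ATAPI)
				hits++;
	return hits;
}
```

In this sketch a device with signature 0xeb14, such as the CD-ROM in the report, comes out as PROTO_SCSI, and count_atapi_hits() finds no pair that yields PROTO_ATAPI.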
> best
> Neel
>
>>> if (protocol != PROTO_ATAPI) equates to if (1)
>>> if (protocol == PROTO_ATAPI) equates to if (0)
>>>
>>> I was trying to say that any code that compares 'protocol' to
>>> PROTO_ATAPI probably deserves a second look (e.g., the original patch
>>> that triggered this panic).
>>
>> Yes, but I think your analysis was incorrect on this point :)
>>
>>>>> 2. It seems not right to break out of switch in 'probestart()' without
>>>>> providing a way for 'probedone()' to be called. I believe that this
>>>>> stops the state machine from making forward progress and results in
>>>>> 'xpt_config()' not completing.
>>>>
>>>> That's a problem, you're right. Let me rework.
>>>>
>>>>> If you need more information to debug this some more or test a proper
>>>>> fix then I am happy to help.
>>>>
>>>> Please try the one included here. I think it will address things. I've tried it on one system, and am trying it on others in parallel to sending this.
>>>>
>>>
>>> Yup, works fine. Thanks for the quick fix!
>>
>> Will push it in. Thanks.
>>
>> Warner
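
The second issue raised in the thread, breaking out of probestart() with no path to probedone(), can be illustrated with a toy state machine. This is a sketch under stated assumptions and not the committed fix: the states, struct probe, and the probe_start_*() and run_probe() helpers are hypothetical, and command completion is modeled as a direct call rather than an asynchronous callback.

```c
#include <stdbool.h>

/* Hypothetical probe states; names are illustrative, not the CAM ones. */
enum probe_state { ST_IDENTIFY, ST_SETAN, ST_DONE };

struct probe {
	enum probe_state state;
	bool is_packet_device;	/* true if the optional step applies */
	int commands_issued;
};

/* Completion handler: the normal place where the state advances. */
static void
probe_done(struct probe *p)
{
	if (p->state == ST_IDENTIFY)
		p->state = ST_SETAN;
	else if (p->state == ST_SETAN)
		p->state = ST_DONE;
}

/* Buggy start: skipping the step returns without scheduling completion. */
static void
probe_start_buggy(struct probe *p)
{
	if (p->state == ST_SETAN && !p->is_packet_device)
		return;		/* nothing will ever call probe_done() */
	p->commands_issued++;
	probe_done(p);		/* stands in for the asynchronous completion */
}

/* One possible repair: a skipped step still advances the machine. */
static void
probe_start_fixed(struct probe *p)
{
	if (p->state == ST_SETAN && !p->is_packet_device) {
		p->state = ST_DONE;
		return;
	}
	p->commands_issued++;
	probe_done(p);
}

/* Drive the machine for at most max_steps; report whether it finished. */
static bool
run_probe(struct probe *p, void (*start)(struct probe *), int max_steps)
{
	for (int i = 0; i < max_steps && p->state != ST_DONE; i++)
		start(p);
	return p->state == ST_DONE;
}
```

With the buggy start routine a non-packet device stays in ST_SETAN however many times it is driven, which is the toy analogue of xpt_config never completing. With the fixed routine the same device reaches ST_DONE after issuing a single command.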