Date: Fri, 22 Feb 2013 19:49:20 +0200
From: Alexander Motin <mav@FreeBSD.org>
To: Joel Dahl <joel@freebsd.org>
Cc: freebsd-current@freebsd.org, Hans Petter Selasky <hselasky@c2i.net>
Subject: Re: HEAD memsticks broken? [USB/CAM Problems?]
Message-ID: <5127AFA0.4000008@FreeBSD.org>
In-Reply-To: <5124EF38.7080302@FreeBSD.org>
References: <20130209073241.GN21730@jd.benders.se> <20130209230939.GQ21730@jd.benders.se> <20130211222105.GC838@jd.benders.se> <201302120851.18810.hselasky@c2i.net> <20130214193707.GD84888@jd.benders.se> <20130216100719.GB47553@jd.benders.se> <5124EF38.7080302@FreeBSD.org>
On 20.02.2013 17:43, Alexander Motin wrote:
> On 16.02.2013 12:07, Joel Dahl wrote:
>> On 14-02-2013 20:37, Joel Dahl wrote:
>>> On 12-02-2013 8:51, Hans Petter Selasky wrote:
>>>> On Monday 11 February 2013 23:21:05 Joel Dahl wrote:
>>>>> On 10-02-2013 0:09, Joel Dahl wrote:
>>>>>> On 09-02-2013 20:28, Alexander Motin wrote:
>>>>>>> How long ago that HEAD was built? Could you get full dmesg? I don't
>>>>>>> think that "PREVENT ALLOW MEDIUM REMOVAL" should cause device drop. "No
>>>>>>> sense data present" also doesn't look right.
>>>>>>
>>>>>> As I mentioned earlier, I've tried several HEAD snapshots.
>>>>>
>>>>> Just a quick update on this: I've built quite a few releases now and
>>>>> managed to track down the problem to somewhere between r235789 and
>>>>> r237855. It'll probably take me another day or two before I know which
>>>>> commit actually broke it.
>>>>
>>>> Hi,
>>>>
>>>> I don't see any relevant USB+UMASS patches for your issue in this interval,
>>>> but many patches in the SCSI/CAM area.
>>>
>>> I finally found it. A r237477 memstick boots fine. A r237478 memstick does not.
>>>
>>> 237478 is the following commit by mav@:
>>>
>>> ------------------------------------------------------------------------
>>> r237478 | mav | 2012-06-23 14:32:53 +0200 (Sat, 23 Jun 2012) | 3 lines
>>>
>>> Add scsi_extract_sense_ccb() -- wrapper around scsi_extract_sense_len().
>>> It allows to remove number of duplicate checks from several places.
>>>
>>> ------------------------------------------------------------------------
>>
>> So, mav@ haven't replied yet so I did some more investigation. I collected
>> all the USB sticks I had in the office (5 in total, all Kingston but different
>> size and models) and tried a memstick installation with each stick.
>> Turns out r237478 only breaks memstick installation in combination with
>> certain USB sticks:
>>
>> # Works:
>>
>> da0: <Kingston DataTraveler 2.0 1.00> Removable Direct Access SCSI-2 device
>> da0: 40.000MB/s transfers
>> da0: 7664MB (15695872 512 byte sectors: 255H 63S/T 977C)
>>
>> da0: <Kingston DataTraveler 2.0 PMAP> Removable Direct Access SCSI-0 device
>> da0: 40.000MB/s transfers
>> da0: 1906MB (3903488 512 byte sectors: 255H 63S/T 242C)
>>
>> # Does not work:
>>
>> da0: <Kingston DataTraveler G3 1.00> Removable Direct Access SCSI-2 device
>> da0: 40.000MB/s transfers
>> da0: 15295MB (31324160 512 byte sectors: 255H 63S/T 1949C)
>>
>> da0: <Kingston DataTraveler G3 1.00> Removable Direct Access SCSI-0 device
>> da0: 40.000MB/s transfers
>> da0: 3690MB (7557704 512 byte sectors: 255H 63S/T 470C)
>>
>> da0: <Kingston DataTraveler G3 1.00> Removable Direct Access SCSI-2 device
>> da0: 40.000MB/s transfers
>> da0: 1905MB (3903264 512 byte sectors: 255H 63S/T 242C)
>>
>> It seems that only USB sticks labeled as "Kingston DataTraveler G3"
>> are affected by r237478 (in my limited testing, at least). This particular
>> model is what you get if you buy the cheapest Kingston model on the market
>> right now.
>
> I've reviewed that change once more and I see no flaws in it. My only
> guess is that it changes something innocent or unrelated in request
> order that confuses flash firmware, making it stuck and return errors
> without SCSI sense information. In log provided I see that when device
> first detected, it normally reports its size. But later, possibly after
> some command (SYNCHRONIZE CACHE?, PREVENT ALLOW MEDIUM REMOVAL?), it
> starts to behave wrong. Wrong answer to another READ CAPACITY request
> causes "got CAM status 0xXX" message and following device loss.
>
> Unfortunately I can't reproduce the problem. All USB sticks I have are
> working fine without any problems with HEAD system.
> If I could, I would
> try to log all commands sent to the stick to find one after which
> problem begins. Commands could be logged either on CAM layer by running
> `camcontrol debug -IPpc all` before plugging stick in and `camcontrol
> debug off` after (you may want to do it in single-user mode or without
> syslog running to avoid logging activity on other CAM disks), or
> probably somehow on umass layer, or with usbdump on raw USB layer (in
> last case some more knowledge will be needed to interpret result).

I've analyzed the stick's behavior on your system and came to the
conclusion that the problem is not in the mentioned revision r237478
itself. That revision tightens some checks for sense data that were too
relaxed. At r237477, when umass reported an error on the PREVENT ALLOW
MEDIUM REMOVAL command, it also falsely reported the presence of sense
data. That command was sent by daprevent(), trying to lock the "tray" of
the "removable" device. Because of the relaxed check, the driver handled
those fake responses as successful completion, and tried to unlock the
"tray" on device close. That unlock command somehow revived the device
and let it keep working.

After r237478 the error is no longer hidden, and the unlock command is
not sent (because the lock command has failed). After that, both
SYNCHRONIZE CACHE(10) and READ CAPACITY(10) commands return only errors.
While the SYNCHRONIZE CACHE(10) errors are not significant, errors on
READ CAPACITY(10) cause device destruction.

An experiment showed that enabling the DA_Q_NO_PREVENT quirk for this
stick fixes all the problems. I've committed it to HEAD in r247154.

-- 
Alexander Motin
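
To make the sequence of events above easier to follow, here is a toy
simulation of it. It is only a sketch and not the actual CAM/da code:
ToyStick, open_close_reopen() and the strict_sense_check and
no_prevent_quirk parameters are all hypothetical names, and the device
model simply assumes what the analysis observed (a lock request fails
and leaves the stick returning errors, an unlock request revives it).

```python
from dataclasses import dataclass, field


@dataclass
class ToyStick:
    """Hypothetical model of the affected stick, not real firmware."""
    wedged: bool = False
    log: list = field(default_factory=list)

    def prevent_allow(self, prevent: bool) -> bool:
        """PREVENT ALLOW MEDIUM REMOVAL. Returns True on success."""
        self.log.append("PREVENT" if prevent else "ALLOW")
        if prevent:
            # Assumption: the lock request fails and the stick then
            # answers later commands with errors only.
            self.wedged = True
            return False
        # Assumption: the unlock request brings the stick back.
        self.wedged = False
        return True

    def read_capacity(self) -> bool:
        """READ CAPACITY(10). Returns True on success."""
        self.log.append("READ CAPACITY(10)")
        return not self.wedged


def open_close_reopen(stick, strict_sense_check, no_prevent_quirk=False):
    """Lock on open, unlock on close if locked, then probe capacity."""
    locked = False
    if not no_prevent_quirk:
        ok = stick.prevent_allow(prevent=True)
        # With the relaxed check, the bogus "sense data present" report
        # makes the failed lock look like a successful one.
        locked = ok or not strict_sense_check
    if locked:
        stick.prevent_allow(prevent=False)
    return stick.read_capacity()


if __name__ == "__main__":
    print("relaxed check:", open_close_reopen(ToyStick(), False))
    print("strict check:", open_close_reopen(ToyStick(), True))
    print("strict + quirk:", open_close_reopen(ToyStick(), True, True))
```

In this model the relaxed check survives only because the stray unlock
is sent, the strict check leaves the stick wedged so the capacity probe
fails, and skipping the lock request altogether (the role played by
DA_Q_NO_PREVENT) avoids the problem without relying on that accident.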