Date: Wed, 13 Apr 2022 05:14:57 +0000 (UTC)
From: mahesh mv <maheshm_v@yahoo.com>
To: Chris <bsd-lists@bsdforge.com>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: xhci USB transaction error and subsequent recovery mechanism on Freebsd stable/12
Message-ID: <1601830847.251013.1649826897803@mail.yahoo.com>
In-Reply-To: <5fefe57150f9efab867a775722b9d71b@bsdforge.com>
References: <1524993805.98701.1649776236883.ref@mail.yahoo.com>
 <1524993805.98701.1649776236883@mail.yahoo.com>
 <5fefe57150f9efab867a775722b9d71b@bsdforge.com>
Hi,

Thank you for the input. The drive is formatted as GPT with ESP and UFS partitions.

Thanks,
Mahesh

On Wednesday, 13 April 2022, 02:57:45 GMT+5:30, Chris <bsd-lists@bsdforge.com> wrote:

On 2022-04-12 08:10, mahesh mv wrote:
> Hi all,
>
> We need your help with an urgent issue we are observing on FreeBSD
> stable/12. DATA0/DATA1 are out of sync with respect to the EP, and the
> system experiences READ(10) errors. The READ(10) error recovers within a
> couple of retries most of the time, but in a few cases we have observed
> that the read retries get exhausted and the system moves to an unusable
> state (continuous g_vfs_done() errors). We are using Junos, but the xhci
> driver etc. are all pristine stable/12 drivers with no Juniper-specific
> changes. This issue was never observed with Linux kernel 5.4.2 on the
> same HW.
>
> Errors seen on console:
>
> (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 28 cf 28 00 00 40 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
> (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 28 cf 28 00 00 40 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
>
> FreeBSD/arm (Amnesiac) (ttyu0)
>
> login:
>
> I can share the USB traces taken at the USB device if required.
>
> Thanks,
> Mahesh

I just replaced a drive 2 days ago that exhibited the same behavior. I
haven't yet checked the replaced drive for the cause. But what I chose to
do was as follows.

Get a new (known dependable) drive. Add it to the system and dump the data
on the failing disk to the new drive. At least you'll have a safe copy of
it.

You didn't say how the drive(s) are formatted/laid out. Are you using
UFS/GPT or ZFS? How you proceed after making a safe copy will depend on
how you manage your disks.

UFS/GPT: simply remove the failing disk and change the entry in fstab(5)
to point to the new disk.

ZFS: it should be enough to simply replace the failing disk with one at
least the size of the failing one and resilver.

HTH

--Chris
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1601830847.251013.1649826897803>