Date: Fri, 23 Feb 2024 12:16:20 -0700 From: Warner Losh <imp@bsdimp.com> To: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net> Cc: FreeBSD FS <freebsd-fs@freebsd.org> Subject: Re: really slow problem with nvme Message-ID: <CANCZdfrbZ5X5QoV=JpWENNCwwq4_2-oaPksbMLpJBuOM%2B7VH=Q@mail.gmail.com> In-Reply-To: <rn497rrr-2893-7460-r39q-29sq3snnop0r@yvfgf.mnoonqbm.arg> References: <5n117579-8259-3on3-90pn-79o8n52q4q32@yvfgf.mnoonqbm.arg> <CANCZdfoBc7bW6so8H4=dD-XogPp0uQKr1pZjRvSH5%2B8HEGMTCg@mail.gmail.com> <rn497rrr-2893-7460-r39q-29sq3snnop0r@yvfgf.mnoonqbm.arg>
next in thread | previous in thread | raw e-mail | index | archive | help
--000000000000cd2a3f06121165e0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, Feb 23, 2024 at 12:03=E2=80=AFPM Bjoern A. Zeeb < bzeeb-lists@lists.zabbadoz.net> wrote: > On Fri, 23 Feb 2024, Warner Losh wrote: > > > On Fri, Feb 23, 2024, 10:46=E2=80=AFAM Bjoern A. Zeeb < > > bzeeb-lists@lists.zabbadoz.net> wrote: > > > >> Hi, > >> > >> this is a Samsung SSD 970 EVO Plus 1TB nvme and gpart and newfs > >> were already slow (it took like two hours for newfs). > >> > >> Here's another example now: > >> > >> # /usr/bin/time mkdir foo > >> 1.82 real 0.00 user 0.00 sys > >> > >> How does one debug this? > >> > > > > What filesystem? Sounds like UFS but just making sure. . > > yes, ufs > > > So what's the link speed and number of lanes? If it's bad i might rese= at > > (though that might not help) that looks good... > > pciconf I had checked: > > nvme0@pci4:1:0:0: class=3D0x010802 rev=3D0x00 hdr=3D0x00 vendor=3D0= x144d > device=3D0xa808 subvendor=3D0x144d subdevice=3D0xa801 > class =3D mass storage > subclass =3D NVM > bar [10] =3D type Memory, range 64, base 0x40000000, size 16384, > enabled > cap 01[40] =3D powerspec 3 supports D0 D3 current D0 > cap 05[50] =3D MSI supports 1 message, 64 bit > cap 10[70] =3D PCI-Express 2 endpoint max data 128(256) FLR RO NS > max read 512 > link x2(x4) speed 8.0(8.0) ASPM disabled(L1) ClockPM > disabled > cap 11[b0] =3D MSI-X supports 33 messages, enabled > Table in map 0x10[0x3000], PBA in map 0x10[0x2000] > ecap 0001[100] =3D AER 2 0 fatal 0 non-fatal 0 corrected > ecap 0003[148] =3D Serial 1 0000000000000000 > ecap 0004[158] =3D Power Budgeting 1 > ecap 0019[168] =3D PCIe Sec 1 lane errors 0 > ecap 0018[188] =3D LTR 1 > ecap 001e[190] =3D L1 PM Substates 1 > x4 card in a x2 slot. If that's intentional, then this looks good. > > > Though I'd bet money that this is an interrupt issue. I'd do a vmstat. = -i > > to watch how quickly they accumulate... > > That I am waiting for a full world to get onto it. I wish I could have > netbooted but not possible there currently. > > Only took 15 minutes to extract the tar now. Should have used ddb... > hadn't thought of that before... > > # vmstat -ai | grep nvme > its0,0: nvme0:admin 0 0 > its0,1: nvme0:io0 0 0 > its0,2: nvme0:io1 0 0 > its0,3: nvme0:io2 0 0 > its0,4: nvme0:io3 0 0 > its0,5: nvme0:io4 0 0 > its0,6: nvme0:io5 0 0 > its0,7: nvme0:io6 0 0 > its0,8: nvme0:io7 0 0 > > How does this even work? Do we poll? > Yes. We poll, and poll slowly. You have an interrupt problem. On an ARM platform. Fun. ITS and I are old.... foes? Friends? frenemies? As for why, I don't know. I've been fortunate never to have to chase interrupts not working on arm problems.... > And before you ask: > > [1.000407] nvme0: <Generic NVMe Device> mem 0x40000000-0x40003fff at > device 0.0 on pci5 > [1.000409] nvme0: attempting to allocate 9 MSI-X vectors (33 supported) > [1.000410] nvme0: using IRQs 106-114 for MSI-X > [1.000411] nvme0: CapLo: 0x3c033fff: MQES 16383, CQR, AMS WRRwUPC, TO 60 > [1.000412] nvme0: CapHi: 0x00000030: DSTRD 0, NSSRS, CSS 1, CPS 0, MPSMIN > 0, MPSMAX 0 > [1.000413] nvme0: Version: 0x00010300: 1.3 > Yea, that's what I'd expect. > > How old is the drive? Fresh install? Do other drives have this same iss= ue > > in the same slot? Dies this drive have issues in other maxhines or slot= s? > > The drive is a few months old but only in the box until it went on this > board. > > I checked nvmecontrol for anything obvious but didn't see. > OK. So not 'super old nand in its death throes being slow" > > Oh, and what's its temperature? Any message in dmesg? > > Nothing in dmesg, temp seems not too bad. Took a while to get > smartmontools; > we have no way to see this in nvmecontrol in human readable form, do we? > > Temperature Sensor 1: 51 Celsius > Temperature Sensor 2: 48 Celsius > A little warm, but not terrible. 50 is where I start to worry a bit, but the card won't thermal throttle until more like 60. We don't currently have a nvmecontrol identify field to tell you this (I should add it, this is the second time in as many weeks I've wanted it). Ok I got a 2nd identical machine netbooted remotely (pressue with > problems often helps) -- slightly different freebsd version and kernel, > same baord, same type of nvme bought together: > > # /usr/bin/time dd if=3D/dev/zero of=3D/dev/nda0 bs=3D1M count=3D1024 > 1024+0 records in > 1024+0 records out > 1073741824 bytes transferred in 1.657316 secs (647879880 bytes/sec) > 1.66 real 0.00 user 0.94 sys > > and ddb> show intrcnt > .. > its0,0: nvme0:admin 24 > its0,1: nvme0:io0 126 > its0,2: nvme0:io1 143 > its0,3: nvme0:io2 131 > its0,4: nvme0:io3 128 > its0,5: nvme0:io4 135 > its0,6: nvme0:io5 147 > its0,7: nvme0:io6 143 > its0,8: nvme0:io7 144 Yea, that's what I'd expect/ Dozens to hundreds of interrupts. I'll try to make sure I can safely access both over the weekend remotely > from a more comforting place and I know where to start looking now... > > Thanks! > No problem! Warner --000000000000cd2a3f06121165e0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable <div dir=3D"ltr"><div dir=3D"ltr"><br></div><br><div class=3D"gmail_quote">= <div dir=3D"ltr" class=3D"gmail_attr">On Fri, Feb 23, 2024 at 12:03=E2=80= =AFPM Bjoern A. Zeeb <<a href=3D"mailto:bzeeb-lists@lists.zabbadoz.net">= bzeeb-lists@lists.zabbadoz.net</a>> wrote:<br></div><blockquote class=3D= "gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(2= 04,204,204);padding-left:1ex">On Fri, 23 Feb 2024, Warner Losh wrote:<br> <br> > On Fri, Feb 23, 2024, 10:46=E2=80=AFAM Bjoern A. Zeeb <<br> > <a href=3D"mailto:bzeeb-lists@lists.zabbadoz.net" target=3D"_blank">bz= eeb-lists@lists.zabbadoz.net</a>> wrote:<br> ><br> >> Hi,<br> >><br> >> this is a Samsung SSD 970 EVO Plus 1TB nvme and gpart and newfs<br= > >> were already slow (it took like two hours for newfs).<br> >><br> >> Here's another example now:<br> >><br> >> # /usr/bin/time mkdir foo<br> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1.82 real=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A00.00 user=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00.00 sys<br> >><br> >> How does one debug this?<br> >><br> ><br> > What filesystem? Sounds like UFS but just making sure. .<br> <br> yes, ufs<br> <br> > So what's the link speed and number of lanes?=C2=A0 If it's ba= d i might reseat<br> > (though that might not help) that looks good...<br> <br> pciconf I had checked:<br> <br> nvme0@pci4:1:0:0:=C2=A0 =C2=A0 =C2=A0 =C2=A0class=3D0x010802 rev=3D0x00 hdr= =3D0x00 vendor=3D0x144d device=3D0xa808 subvendor=3D0x144d subdevice=3D0xa8= 01<br> =C2=A0 =C2=A0 =C2=A0class=C2=A0 =C2=A0 =C2=A0 =3D mass storage<br> =C2=A0 =C2=A0 =C2=A0subclass=C2=A0 =C2=A0=3D NVM<br> =C2=A0 =C2=A0 =C2=A0bar=C2=A0 =C2=A0[10] =3D type Memory, range 64, base 0x= 40000000, size 16384, enabled<br> =C2=A0 =C2=A0 =C2=A0cap 01[40] =3D powerspec 3=C2=A0 supports D0 D3=C2=A0 c= urrent D0<br> =C2=A0 =C2=A0 =C2=A0cap 05[50] =3D MSI supports 1 message, 64 bit<br> =C2=A0 =C2=A0 =C2=A0cap 10[70] =3D PCI-Express 2 endpoint max data 128(256)= FLR RO NS<br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 max read 512= <br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 link x2(x4) = speed 8.0(8.0) ASPM disabled(L1) ClockPM disabled<br> =C2=A0 =C2=A0 =C2=A0cap 11[b0] =3D MSI-X supports 33 messages, enabled<br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Table in map= 0x10[0x3000], PBA in map 0x10[0x2000]<br> =C2=A0 =C2=A0 =C2=A0ecap 0001[100] =3D AER 2 0 fatal 0 non-fatal 0 correcte= d<br> =C2=A0 =C2=A0 =C2=A0ecap 0003[148] =3D Serial 1 0000000000000000<br> =C2=A0 =C2=A0 =C2=A0ecap 0004[158] =3D Power Budgeting 1<br> =C2=A0 =C2=A0 =C2=A0ecap 0019[168] =3D PCIe Sec 1 lane errors 0<br> =C2=A0 =C2=A0 =C2=A0ecap 0018[188] =3D LTR 1<br> =C2=A0 =C2=A0 =C2=A0ecap 001e[190] =3D L1 PM Substates 1<br></blockquote><d= iv><br></div><div>x4 card in a x2 slot. If that's intentional, then thi= s looks good.</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style= =3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding= -left:1ex"> <br> > Though I'd bet money that this is an interrupt issue. I'd do a= vmstat. -i<br> > to watch how quickly they accumulate...<br> <br> That I am waiting for a full world to get onto it.=C2=A0 I wish I could hav= e<br> netbooted but not possible there currently.<br> <br> Only took 15 minutes to extract the tar now.=C2=A0 Should have used ddb...<= br> hadn't thought of that before...<br> <br> # vmstat -ai | grep nvme<br> its0,0: nvme0:admin=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A00=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0<br> its0,1: nvme0:io0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0<br> its0,2: nvme0:io1=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0<br> its0,3: nvme0:io2=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0<br> its0,4: nvme0:io3=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0<br> its0,5: nvme0:io4=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0<br> its0,6: nvme0:io5=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0<br> its0,7: nvme0:io6=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0<br> its0,8: nvme0:io7=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A00=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0<br> <br> How does this even work?=C2=A0 Do we poll?<br></blockquote><div><br></div><= div>Yes. We poll, and poll slowly.=C2=A0 You have an interrupt problem.</di= v><div><br></div><div>On an ARM platform. Fun. ITS and I are old.... foes? = Friends? frenemies?</div><div><br></div><div>As for why, I don't know. = I've been fortunate never to have to chase</div><div>interrupts not wor= king on arm problems....</div><div>=C2=A0</div><blockquote class=3D"gmail_q= uote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,2= 04);padding-left:1ex"> And before you ask:<br> <br> [1.000407] nvme0: <Generic NVMe Device> mem 0x40000000-0x40003fff at = device 0.0 on pci5<br> [1.000409] nvme0: attempting to allocate 9 MSI-X vectors (33 supported)<br> [1.000410] nvme0: using IRQs 106-114 for MSI-X<br> [1.000411] nvme0: CapLo: 0x3c033fff: MQES 16383, CQR, AMS WRRwUPC, TO 60<br= > [1.000412] nvme0: CapHi: 0x00000030: DSTRD 0, NSSRS, CSS 1, CPS 0, MPSMIN 0= , MPSMAX 0<br> [1.000413] nvme0: Version: 0x00010300: 1.3<br></blockquote><div><br></div><= div>Yea, that's what I'd expect.</div><div>=C2=A0</div><blockquote = class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px sol= id rgb(204,204,204);padding-left:1ex"> > How old is the drive? Fresh install? Do other drives have this same is= sue<br> > in the same slot? Dies this drive have issues in other maxhines or slo= ts?<br> <br> The drive is a few months old but only in the box until it went on this<br> board.<br> <br> I checked nvmecontrol for anything obvious but didn't see.<br></blockqu= ote><div><br></div><div>OK. So not 'super old nand in its death throes = being slow"</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" st= yle=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padd= ing-left:1ex"> > Oh, and what's its temperature? Any message in dmesg?<br> <br> Nothing in dmesg, temp seems not too bad.=C2=A0 Took a while to get smartmo= ntools;<br> we have no way to see this in nvmecontrol in human readable form, do we?<br= > <br> Temperature Sensor 1:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A051 Celsius<br> Temperature Sensor 2:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A048 Celsius<br></blockquote><div><br></div><div>A little warm, but not te= rrible. 50 is where I start to worry a bit, but the card won't thermal<= /div><div>throttle until more like 60. We don't currently have a nvmeco= ntrol identify field to tell you this</div><div>(I should add it, this is t= he second time in as many weeks I've wanted it).</div><div><br></div><b= lockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-le= ft:1px solid rgb(204,204,204);padding-left:1ex"> Ok I got a 2nd identical machine netbooted remotely (pressue with<br> problems often helps) -- slightly different freebsd version and kernel,<br> same baord, same type of nvme bought together:<br> <br> # /usr/bin/time dd if=3D/dev/zero of=3D/dev/nda0 bs=3D1M count=3D1024<br> 1024+0 records in<br> 1024+0 records out<br> 1073741824 bytes transferred in 1.657316 secs (647879880 bytes/sec)<br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01.66 real=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A00.00 user=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00.94 sys<br> <br> and ddb> show intrcnt<br> ..<br> its0,0: nvme0:admin=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A024<br> its0,1: nvme0:io0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0126<br> its0,2: nvme0:io1=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0143<br> its0,3: nvme0:io2=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0131<br> its0,4: nvme0:io3=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0128<br> its0,5: nvme0:io4=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0135<br> its0,6: nvme0:io5=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0147<br> its0,7: nvme0:io6=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0143<br> its0,8: nvme0:io7=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0144</blockquote><div><br></div><div>Yea, that= 9;s what I'd expect/ Dozens to hundreds of interrupts.</div><div><br></= div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;bor= der-left:1px solid rgb(204,204,204);padding-left:1ex"> I'll try to make sure I can safely access both over the weekend remotel= y<br> from a more comforting place and I know where to start looking now...<br> <br> Thanks!<br></blockquote><div><br></div><div>No problem!</div><div><br></div= ><div>Warner</div></div></div> --000000000000cd2a3f06121165e0--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrbZ5X5QoV=JpWENNCwwq4_2-oaPksbMLpJBuOM%2B7VH=Q>