Date: Wed, 06 Apr 2022 14:02:57 +0200
From: egoitz@ramattack.net
To: Eugene Grosbein <eugen@grosbein.net>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Desperate with 870 QVO and ZFS
Message-ID: <109127fb4e43e70cd548fecde2c1f755@ramattack.net>
In-Reply-To: <15a86fae-90fd-951d-50e0-48f9be8b4bbc@grosbein.net>
References: <6cf6c03c5a4aa8128575ec4e2f70b168@ramattack.net> <15a86fae-90fd-951d-50e0-48f9be8b4bbc@grosbein.net>
Hi Eugene,

No, I normally don't have many delete operations; the vast majority of them are left for the night, and run at 2, 3, or 4 am.

We may have 600 deletes/sec at busy times (according to what I see in gstat, and calculating for when we have two masters).

I don't think it's a trim issue, because we're removing snapshots on the same disk models, but in different machines (of another service), and there are no issues there. Those hold virtual machines rather than mail, but I assume something should be visible there too if trims were the issue.

I honestly think it could have something to do with concurrency: the disks have issues when you have perhaps 2200 users, and only in peak hours. But how could a disk suffer from concurrency? The controller should only be able to do one operation at a time, so there is no parallelism there. I can't really understand what happens.

Regards,

On 2022-04-06 13:42, Eugene Grosbein wrote:

> 06.04.2022 18:18, egoitz@ramattack.net wrote:
>
>> Good morning,
>>
>> I write this post with the expectation that perhaps someone could help me :)
>>
>> I am running some mail servers with FreeBSD and ZFS. They use 870 QVO disks (not EVO or other Samsung SSDs) as storage. They can easily have from 1500 to 2000 concurrent connections. The machines have 128 GB of RAM and the CPU is almost completely idle. The disk IO is normally at 30 or 40% at most.
>>
>> The problem I'm facing is that they could be running just fine and suddenly, at some peak hour,
>> the IO goes to 60 or 70% and the machine becomes extremely slow.
>
> You should run: gstat -adpI3s
> And monitor all values, especially "deletes": d/s, next KBps and ms/d.
>
> If you have many delete operations (including ZFS snapshot destroying),
> it may result in massive chunks of TRIM operations sent to SSD.
> Some SSD products have abysmal TRIM performance.
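Eugene's gstat suggestion can be scripted. The sketch below (plain /bin/sh plus awk) flags sampling intervals where deletes/sec is high and estimates the TRIM bandwidth that 600 deletes/s would imply. The sample row figures, the column layout, and the 128 KiB average extent size are assumptions for illustration, not values taken from this thread; verify the field positions against real `gstat -adpI3s` output before reusing it.

```shell
# Hypothetical gstat -adpI3s batch output piped through awk to flag
# intervals with heavy delete activity. With -d, gstat adds the delete
# columns (d/s, kBps, ms/d); here d/s is field 9, ms/d field 11, and
# the device name is the last field -- check this on your system.
printf '%s\n' \
  ' L(q) ops/s  r/s kBps ms/r  w/s kBps ms/w  d/s  kBps ms/d %busy name' \
  '    0   650   10  120  0.4   40  900  1.2  600 76800 35.0  71.3 ada0' |
awk 'NR > 1 && $9 >= 500 { print $13 ": " $9 " deletes/s, " $11 " ms/d" }'

# Back-of-the-envelope TRIM load: 600 deletes/s at an assumed average
# extent of 128 KiB works out to roughly 75 MiB/s of TRIM traffic.
deletes_per_sec=600
avg_trim_kib=128
echo "$(( deletes_per_sec * avg_trim_kib / 1024 )) MiB/s of implied TRIM load"
```

On recent FreeBSD with OpenZFS, `zpool get autotrim <pool>` and `zpool status -t <pool>` show whether and how the pool is issuing TRIMs, which would help confirm or rule out Eugene's theory.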