Date:      Wed, 06 Apr 2022 14:02:57 +0200
From:      egoitz@ramattack.net
To:        Eugene Grosbein <eugen@grosbein.net>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Desperate with 870 QVO and ZFS
Message-ID:  <109127fb4e43e70cd548fecde2c1f755@ramattack.net>
In-Reply-To: <15a86fae-90fd-951d-50e0-48f9be8b4bbc@grosbein.net>
References:  <6cf6c03c5a4aa8128575ec4e2f70b168@ramattack.net> <15a86fae-90fd-951d-50e0-48f9be8b4bbc@grosbein.net>


Hi Eugene, 

No... I normally don't have many delete operations... in fact the vast
majority of them are left for the night; they run at 2, 3, or 4 in the
morning...

We may have around 600 deletes/sec at busy times (according to what I see
in gstat, adding up the two masters)....
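Just to make that arithmetic explicit: summing the d/s column of `gstat -bd`
output across devices gives the aggregate deletes/sec. The device names and
numbers below are invented for illustration, and the field positions assume
gstat's layout when -d is added (d/s as the 9th column, device name last);
adjust the field number if your output differs.

```shell
#!/bin/sh
# Hypothetical single interval of `gstat -bd` output (numbers invented).
sample='   0    820     10    120    0.4    510   6400    1.2    310   9800    4.0   55.1  ada0
   0    790      8     96    0.3    480   6100    1.1    290   9100    3.8   52.7  ada1'

# Sum the d/s column (9th field with -d) across all devices.
printf '%s\n' "$sample" | awk '{ total += $9 } END { printf "total d/s: %d\n", total }'
```

On a live system you would pipe `gstat -bdI 3s` into the same awk filter
instead of the canned sample.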

I don't think it's a trim issue: we remove snapshots on the same model of
disks in other machines (for a different service) and there are no issues
there... those hold virtual machines rather than mail, but I assume
something similar would show up there too if trims were the issue...

I honestly think it could have something to do with concurrency... the
disks only have issues at peak hours, with perhaps 2200 users. But how
could a disk suffer from concurrency? The controller should only be able
to do one operation at a time, so there is no real parallelism there... I
can't really understand what is happening....

Regards, 

El 2022-04-06 13:42, Eugene Grosbein escribió:

> 06.04.2022 18:18, egoitz@ramattack.net wrote:
> 
>> Good morning,
>> 
>> I write this post with the expectation that perhaps someone could help me :)
>> 
>> I am running some mail servers with FreeBSD and ZFS. They use 870 QVO disks (not EVO or other Samsung SSD models) as storage. They can easily have from 1500 to 2000 concurrent connections. The machines have 128GB of RAM and the CPU is almost completely idle. Disk IO is normally at 30 or 40% at most.
>> 
>> The problem I'm facing is that they could be running just fine and suddenly at some peak hour,
>> the IO goes to 60 or 70% and the machine becomes extremely slow.
> 
> You should run: gstat -adpI3s
> And monitor all values, especially "deletes": d/s, next KBps and ms/d.
> 
> If you have many delete operations (including ZFS snapshot destroying),
> it may result in massive chunks of TRIM operations sent to SSD.
> Some SSD products have abysmal TRIM performance.
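Eugene's suggestion above can be sketched as a tiny filter: watch the ms/d
column of gstat's batch output and flag devices whose delete latency looks
pathological. The field numbers assume the -d column layout (ms/d as the
11th field, device name last) and the 50 ms cutoff is an arbitrary example,
not a recommendation.

```shell
#!/bin/sh
# Flag devices whose delete (TRIM) latency exceeds a threshold.
# $11+0 coerces headers like "ms/d" to 0, so only data rows match.
flag_slow_trim() {
    awk '$11+0 > 50 { printf "slow TRIM on %s: %.1f ms/d\n", $NF, $11 }'
}

# On a live system:  gstat -bdI 3s | flag_slow_trim
# Invented sample line so the sketch is self-contained:
printf '   0 820 10 120 0.4 510 6400 1.2 310 9800 74.0 55.1 ada0\n' | flag_slow_trim
```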


