Date:      Wed, 6 Apr 2022 23:06:15 +0200
From:      Jan Bramkamp <crest@rlwinm.de>
To:        performance@freebsd.org
Subject:   Re: {* 05.00 *}Re: Desperate with 870 QVO and ZFS
Message-ID:  <803f008d-b91a-2a8d-88f9-3d2d091149df@rlwinm.de>
In-Reply-To: <ce51660b5f83f92aa9772d764ae12dff@ramattack.net>
References:  <4e98275152e23141eae40dbe7ba5571f@ramattack.net> <665236B1-8F61-4B0E-BD9B-7B501B8BD617@ultra-secure.de> <0ef282aee34b441f1991334e2edbcaec@ramattack.net> <dd9a55ac-053d-7802-169d-04c95c045ed2@FreeBSD.org> <ce51660b5f83f92aa9772d764ae12dff@ramattack.net>


On 06.04.22 18:34, egoitz@ramattack.net wrote:
>
> Hi Stefan!
>
>
> Thank you so much for your answer! I reply below in green bold, for 
> better distinction.
>
>
> Very thankful for all your comments Stefan!!! :) :) :)
>
>
> Cheers!!
>
>
> On 2022-04-06 17:43, Stefan Esser wrote:
>
>> ATTENTION
>> ATTENTION
>> ATTENTION!!! This e-mail was sent from outside the organization. 
>> Do not click on links or open attachments unless you recognize the 
>> sender and know that the content is safe.
>>
>> On 06.04.22 at 16:36, egoitz@ramattack.net wrote:
>>> Hi Rainer!
>>>
>>> Thank you so much for your help :) :)
>>>
>>> Well, I assume they are in a datacenter, so there should not be a 
>>> power outage...
>>>
>>> About dataset size... yes, ours are big... each dataset can easily be 
>>> 3-4 TB...
>>>
>>> We bought them because they are for mailboxes, and mailboxes grow and 
>>> grow... so we needed the space to host them...
>>
>> Which mailbox format (e.g. mbox, maildir, ...) do you use?
>> *I'm running Cyrus IMAP, so sort of Maildir... normally lots of little 
>> files... sometimes directories with tons of little files...*
>>
>>> We knew they had some speed issues, but we thought (as Samsung 
>>> explains on the QVO site) those issues only start after exceeding the 
>>> write buffer these disks have. We thought that as long as you don't 
>>> exceed its capacity (the capacity of that write buffer), no speed 
>>> problem arises. Perhaps we were wrong?
>>
>> These drives are meant for light loads in a typical PC use case:
>> occasional software installations in the few-GB range, otherwise only
>> files of a few MB being written, perhaps an import of media files
>> ranging from tens to a few hundred MB at a time, but less often than
>> once a day.
>> *We move, you know, lots of little files... and lots of different 
>> concurrent modifications from the 1500-2000 concurrent IMAP connections 
>> we have...*
>>
>> As the SSD fills, the space available for the single level write
>> cache gets smaller
>> *Is the single level write cache the cache these SSD drives have to 
>> compensate for the speed issues caused by using QLC memory? Is that 
>> what you are referring to? Sorry, I don't understand this paragraph 
>> well.*

A single flash cell can be thought of as a software-adjustable resistor 
forming a voltage divider with a fixed resistor. Storing just a single bit 
per flash cell allows very fast writes and a long lifetime for each flash 
cell, at the cost of low data density. You cheaped out and bought the 
crappiest type of consumer SSD. These SSDs are optimized for one thing: 
price per capacity (at reasonable read performance). They accomplish this 
by exploiting the expected user behavior of modifying only small subsets 
of the stored data in short bursts, and of buying a lot more capacity than 
is actually used. You deployed them in a mail server facing near-continuous 
writes for hours on end most days of the week. As the average load 
increases and the cheap SSDs fill up, less and less unallocated flash can 
be used as cache, and the fast SLC cache fills up.
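
A back-of-the-envelope illustration of why that cache shrinks, as I 
understand Samsung's dynamic SLC caching scheme (the exact cache sizes are 
in their data sheet, which I have not checked here):

    - QLC stores 4 bits per cell; the SLC cache stores 1 bit per cell.
    - Every GB held in the SLC cache therefore occupies flash cells that
      could otherwise hold 4 GB of QLC data.
    - The cache is mostly carved out of unallocated flash, so on a nearly
      full drive only a small fixed region remains, and sustained writes
      soon drop to raw QLC programming speed.
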
Once the SLC cache is full, the SSD firmware has to stop accepting new 
requests from the SATA port, and because only ~30 operations can be queued 
per SATA disk, and because of the ordering requirements between those 
operations, not even reads can be satisfied while the cache is slowly 
written out at four bits per flash cell instead of one. To the user this 
appears as the system almost hanging, because every uncached read and sync 
write takes tens to hundreds of milliseconds instead of less than 3 ms. No 
amount of file system or driver tuning can truly fix this design 
flaw/compromise without severely limiting the write throughput in software 
to stay below the sustained drain rate of the SLC cache. If you want to 
invest the time, pain and suffering to squeeze the most out of this 
hardware, look into the ~2015 CAM I/O scheduler work Netflix upstreamed 
back to FreeBSD; a sketch follows below. Enabling it requires at least 
building and installing your own kernel with this feature enabled, setting 
acceptable latency targets and defining the read/write mix the scheduler 
should maintain.
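
A minimal sketch of what enabling it could look like. The kernel option is 
CAM_IOSCHED_DYNAMIC; the per-disk sysctl names below are from memory and 
differ between releases, so verify them on your own system:

    # /usr/src/sys/amd64/conf/IOSCHED -- custom kernel configuration
    include GENERIC
    ident   IOSCHED
    options CAM_IOSCHED_DYNAMIC    # adaptive CAM I/O scheduler

    # build, install and boot the new kernel
    cd /usr/src
    make -j8 buildkernel KERNCONF=IOSCHED
    make installkernel KERNCONF=IOSCHED
    shutdown -r now

    # after reboot, list the per-disk scheduler knobs (ada0 is just an
    # example device; the exact sysctl tree depends on the FreeBSD version)
    sysctl kern.cam.ada.0.iosched

From there you still have to choose latency targets and a read/write mix 
that fit your workload, which takes experimentation.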

I don't expect you'll get satisfactory results out of those disks even 
with lots of experimentation. If you want to experiment with I/O 
scheduling on cheap SSDs, start by *migrating all production workloads* 
out of your lab environment. The only safe and quick way out of this mess 
is to replace all QVO SSDs with SSDs of at least the same capacity that 
are designed for sustained write workloads.
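
If you do experiment, measure first how bad the latency actually gets. 
These are standard FreeBSD/OpenZFS tools; take the exact flags as a 
suggestion to check against the versions you have installed:

    # per-provider I/O latency (ms/r, ms/w) and queue depth, 1s refresh
    gstat -p -I 1s

    # latency statistics and histograms as seen by ZFS (OpenZFS 2.x)
    zpool iostat -l <pool> 5
    zpool iostat -w <pool>

If ms/w regularly jumps from a few milliseconds into the tens or hundreds 
while the disks stay busy, you are watching the SLC cache drain at QLC 
speed.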
