Date:      Mon, 10 Aug 2015 16:12:11 +0800
From:      Julian Elischer <julian@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>, Willem Jan Withagen <wjw@digiware.nl>
Cc:        fs@freebsd.org
Subject:   Re: Using SSDs as swap
Message-ID:  <55C85CDB.80600@freebsd.org>
In-Reply-To: <20150808114107.GD2072@kib.kiev.ua>
References:  <55C5D48E.6010605@digiware.nl> <20150808102900.GA2072@kib.kiev.ua> <20150808103810.GB2072@kib.kiev.ua> <55C5E697.4080102@digiware.nl> <20150808114107.GD2072@kib.kiev.ua>

On 8/8/15 7:41 PM, Konstantin Belousov wrote:
> On Sat, Aug 08, 2015 at 01:23:03PM +0200, Willem Jan Withagen wrote:
>> On 8-8-2015 12:38, Konstantin Belousov wrote:
>>> On Sat, Aug 08, 2015 at 01:29:00PM +0300, Konstantin Belousov wrote:
>>>> On Sat, Aug 08, 2015 at 12:06:06PM +0200, Willem Jan Withagen wrote:
>>>>> One of the following commits just passed with this in the log, and it
>>>>> raised again a question I have had for some time already.
>>>>>
>>>>> ----
>>>>> Log:
>>>>>    Enable BIO_DELETE passthru in GELI, so TRIM/UNMAP can work as expected
>>>>>    when GELI is used on a SSD or inside virtual machine, so that guest can
>>>>>    tell host that it is no longer using some of the storage.
>>>>> -----
>>>>>
>>>>> In ZFS I slice my SSDs into log and cache devices, but on a server with
>>>>> little memory (which can't be grown) I use a partition on each SSD as
>>>>> swap as well. That way swapping does not have to seek and has faster
>>>>> load times. Allocating a few GB on an SSD to swap is not really all that
>>>>> painful, given current sizes, but the speed difference compared with
>>>>> regular spindles is impressive.
>>>>>
>>>>> But the questions are:
>>>>> 1) Does the swap driver understand that backing-store needs a TRIM?
>>>> No.
>>>>
>>>>> 1a) If not, would it be useful, and what would it take to implement?
>>>> One good thing is that it is simply a question of coding: the VM
>>>> already has a place where it informs the swap pager that the page copy
>>>> in swap is no longer needed. This is the vm_pager_page_unswapped() call
>>>> and the swap pager method swap_pager_unswapped(). swp_pager_meta_ctl()
>>>> would need to issue BIO_DELETE to the backing storage.
>>>>
>>>> On the other hand, note that this would increase the amount of work
>>>> performed, even for swap volumes located on rotating media, which is
>>>> the more typical and reasonable setup.
>>>>
>>>> I think an implementation and a knob to turn it off, or configure per
>>>> swap partition, would be reasonable.
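A minimal userland sketch of the proposal above: a hook called when the VM drops the swap copy of a page, gated by a per-partition knob. The struct, the `trim_on_free` flag and `issue_delete()` are hypothetical stand-ins, not FreeBSD kernel interfaces; in the real kernel the hook would sit behind swap_pager_unswapped() and swp_pager_meta_ctl().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one swap partition; not a kernel structure. */
struct swdev_model {
	bool trim_on_free;	/* hypothetical per-partition knob */
	size_t nfree;		/* blocks immediately reusable */
	size_t ndelete_issued;	/* BIO_DELETE-like requests sent */
};

/* Hypothetical stand-in for queueing a BIO_DELETE for one swap block. */
static void
issue_delete(struct swdev_model *sd, size_t blk)
{
	(void)blk;	/* a real driver would put blk's offset in the bio */
	sd->ndelete_issued++;
}

/*
 * Called when the swap copy of a page is no longer needed.
 * Knob off (e.g. rotating media): free the block at once, as today,
 * with no extra I/O. Knob on (SSD): hand the block to the device for
 * trimming; it is NOT counted as free here, freeing waits for completion.
 */
static void
page_unswapped(struct swdev_model *sd, size_t blk)
{
	if (sd->trim_on_free)
		issue_delete(sd, blk);
	else
		sd->nfree++;
}
```

Keeping the knob per partition lets a system with both a spindle and an SSD swap device pay the extra request only where it helps.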
>>> One additional thing: while BIO_DELETE is in progress, the swap block
>>> cannot be marked free, since otherwise we could write another page to
>>> it and get that page obliterated by the TRIM. This can be done async,
>>> but the consequence is that swap space would be released and usable
>>> only some time after the page-in. This will affect loads which are
>>> close to OOM.
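To make the hazard concrete, here is a toy single-threaded model (all names hypothetical; a trimmed block is shown as reading back 0, which is a simplification, since real devices need not return zeroes). Freeing the block when the TRIM is issued lets the allocator hand it to a new page that the late TRIM then destroys; freeing it only in the completion step avoids that, at the cost of the block being unavailable in between.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_NBLK 4

/* Toy swap device: each block holds one int "page"; 0 means trimmed. */
struct toy_disk {
	int data[TOY_NBLK];
	bool free[TOY_NBLK];
	int pending_trim;	/* block with a TRIM in flight, or -1 */
	bool freed_early;	/* was it (wrongly) freed before completion? */
};

static void
toy_init(struct toy_disk *d)
{
	memset(d->data, 0, sizeof(d->data));
	for (int i = 0; i < TOY_NBLK; i++)
		d->free[i] = true;
	d->pending_trim = -1;
	d->freed_early = false;
}

/* Allocate the lowest-numbered free block, or return -1 if none. */
static int
toy_alloc(struct toy_disk *d)
{
	for (int i = 0; i < TOY_NBLK; i++) {
		if (d->free[i]) {
			d->free[i] = false;
			return i;
		}
	}
	return -1;
}

/* The page copy is no longer needed: start an asynchronous TRIM. */
static void
toy_unswap(struct toy_disk *d, int blk, bool free_early)
{
	d->pending_trim = blk;
	d->freed_early = free_early;
	if (free_early)
		d->free[blk] = true;	/* the bug: reusable before TRIM ends */
}

/* The device completes the TRIM and discards the block's contents. */
static void
toy_trim_done(struct toy_disk *d)
{
	int blk = d->pending_trim;

	d->data[blk] = 0;
	if (!d->freed_early)
		d->free[blk] = true;	/* safe point to release the block */
	d->pending_trim = -1;
}
```

With `free_early` set, a page written right after the unswap lands in the same block and reads back as 0 once the TRIM completes; without it, the new page goes to a different block and survives.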
>> Sort of makes sense to me...
>>
>> I take it that BIO_DELETE fires and returns before the TRIM is completed?
>> But then the SSD accepts writes to a TRIMmed block and mixes this up,
>> possibly deleting a write to a to-be-trimmed block? This strikes me as
>> odd, but then I do not know the full intricate details of TRIM on SSDs.
>>
>> Would it be possible to be notified that a TRIM has completed, only then
>> to actually free the swap sectors?
> This is exactly what I wrote above.
From my time on the other side of the dotted line at an SSD vendor: TRIM
can be both easy and hard, depending on many details of how a vendor
implements its SSD. Part of what makes it hard is that TRIM needs to be
persistent, except when it doesn't. In the case of swap, it could be
non-persistent if the system were to do a huge TRIM on the whole of swap
when it starts up. We tried to make our TRIM implementable in a single
fast action, so there wouldn't need to be any notification of
'completion' as such (well, there would be at the lowest level, but we
would be talking about a couple of microseconds). The persistence is hard
because you want the drive to remember the TRIM if power is lost
immediately after it happens. That requires a write, which uses space and
time, which leads to the odd situation of being unable to trim because
you are short of space. So you need to keep reserves you can use to
trim, in order to free space.
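The space problem described above can be shown with a toy model of a drive's persistent metadata log, assuming (purely for illustration) that every write and every persistent TRIM appends one record. The structure and policy are hypothetical, not any vendor's design: a few slots are held back that only TRIM may use.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy persistent metadata log. Every write and every persistent TRIM
 * appends one record. 'reserve' slots are held back so that TRIM, the
 * operation that eventually frees space, can still be recorded when the
 * log is otherwise full. Sizes and policy are hypothetical.
 */
struct toy_log {
	size_t capacity;	/* total record slots */
	size_t used;		/* slots consumed */
	size_t reserve;		/* slots only TRIM may use */
};

/* A user write may not dip into the reserve. */
static bool
log_write(struct toy_log *l)
{
	if (l->used + l->reserve >= l->capacity)
		return false;
	l->used++;
	return true;
}

/* A persistent TRIM must itself be logged, but may use the reserve. */
static bool
log_trim(struct toy_log *l)
{
	if (l->used >= l->capacity)
		return false;
	l->used++;
	return true;
}
```

With a reserve of zero, a full log refuses the very TRIM that would let the drive reclaim space; with one reserved slot, writes stop first and the TRIM can still be recorded.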

I was often pushing for a "non-persistent trim", which in fact would have
been really easy for us to implement and incredibly fast. It would have
been great for things like swap.
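Why a non-persistent TRIM is acceptable for swap can be sketched with a toy flash translation layer (hypothetical structure, not a real controller): the TRIM only updates the controller's RAM copy of the map, so a power loss brings the old mappings back. Swap contents are meaningless after a restart, so a single whole-range TRIM issued when swap is enabled at boot discards them again.

```c
#include <stdbool.h>
#include <string.h>

#define FTL_NBLK 8

/* Toy FTL map; true means the block holds live data. */
struct toy_ftl {
	bool live[FTL_NBLK];	/* controller RAM copy */
	bool saved[FTL_NBLK];	/* copy persisted on flash */
};

/* A write maps the block; for simplicity the mapping is persisted at once. */
static void
ftl_write(struct toy_ftl *f, int blk)
{
	f->live[blk] = true;
	f->saved[blk] = true;
}

/* Non-persistent TRIM: only RAM changes, so no flash write or space needed. */
static void
ftl_trim_volatile(struct toy_ftl *f, int first, int count)
{
	for (int i = first; i < first + count; i++)
		f->live[i] = false;
}

/* Power loss: the RAM map is rebuilt from flash; volatile TRIMs are lost. */
static void
ftl_power_cycle(struct toy_ftl *f)
{
	memcpy(f->live, f->saved, sizeof(f->live));
}

/* Number of blocks the controller currently treats as live. */
static int
ftl_mapped(const struct toy_ftl *f)
{
	int n = 0;

	for (int i = 0; i < FTL_NBLK; i++)
		n += f->live[i];
	return n;
}
```

The boot-time trim is itself volatile in this model; that is fine, because the host would simply repeat it on every boot.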
>
>> And then perhaps the swap bookkeeping does not yet accommodate a
>> possible extra state?
> It does not need to. The in-flight BIO_DELETE remembers the intermediate
> state; the swap block should be freed only after the storage reports the
> BIO_DELETE as finished. This is exactly how UFS handles trimming of free
> blocks: the bitmap of used/free blocks is only updated after the
> BIO_DELETE is finished, not when the inode drops its reference to
> the block.
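The bookkeeping described above, sketched as a toy bitmap allocator (hypothetical names, not the swap pager's or UFS's actual code). The bitmap has only "free" and "not free"; the in-flight request is the sole record that a range is being trimmed, and the bits are set only in the completion callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SW_NBLK 64

/* Toy swap allocation bitmap: a set bit means the block is free. */
struct swmap {
	uint64_t free;
};

/*
 * An in-flight delete request. While it exists it is the only record
 * that these blocks are being trimmed; the bitmap needs no extra state.
 */
struct delete_req {
	unsigned first;
	unsigned count;
};

static bool
blk_is_free(const struct swmap *m, unsigned b)
{
	return (m->free >> b) & 1;
}

/* The VM dropped these blocks: describe the delete, leave the bitmap alone. */
static struct delete_req
start_delete(unsigned first, unsigned count)
{
	struct delete_req r = { first, count };

	assert(count > 0 && first + count <= SW_NBLK);
	return r;
}

/* Completion callback: only now do the blocks become allocatable again. */
static void
delete_done(struct swmap *m, const struct delete_req *r)
{
	for (unsigned b = r->first; b < r->first + r->count; b++)
		m->free |= UINT64_C(1) << b;
}
```

Between start_delete() and delete_done() the blocks are neither allocatable nor in use, which is the delayed release Konstantin warns about for loads close to OOM.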
>> Speaking of blocks... Does swap take into account that disks could
>> have a sector size other than 512 bytes? I would guess so, since we
>> could have a 4K disk as a swap disk, and doing read-modify-write for
>> swap would surely kill performance.
> Swap performs I/O in at least page-sized chunks, which are at least 4K
> on all supported platforms (even on ARM, where we do not support smaller
> pages AFAIK).
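The arithmetic behind that answer: a device with larger physical sectors must read-modify-write only when a transfer's start or length is not a multiple of its sector size. Page-sized, page-aligned swap I/O therefore avoids it on 4K-sector disks, assuming the swap partition itself starts on a 4K boundary. The helper below is a hypothetical illustration, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a transfer would force a read-modify-write on a device with
 * the given physical sector size: its starting offset or its length is
 * not a whole multiple of that sector size.
 */
static bool
needs_rmw(uint64_t offset, uint64_t length, uint64_t sector)
{
	return offset % sector != 0 || length % sector != 0;
}
```

For example, a 4096-byte page at offset 3 * 4096 needs no RMW on a 4096-byte-sector disk, while a 512-byte write at offset 512 on the same disk would.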
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>



