Date:      Sat, 07 May 2016 16:38:21 +0000
From:      "Poul-Henning Kamp" <phk@phk.freebsd.dk>
To:        "George Neville-Neil" <gnn@neville-neil.com>
Cc:        fs@freebsd.org
Subject:   Re: Fwd: The Morning Paper: NOVA - A log-structured file system for hybrid volatile/non-volatile main memories
Message-ID:  <59877.1462639101@critter.freebsd.dk>
In-Reply-To: <2BE88161-D83A-4265-9EC3-C2F7F7033E93@neville-neil.com>
References:  <4188b6afbe9e5d43111fef4d4ae5e599a57.20160506051425@mail23.atl91.mcsv.net> <2BE88161-D83A-4265-9EC3-C2F7F7033E93@neville-neil.com>

That's a pretty obvious idea, all things considered, but not
necessarily a good idea and certainly not the best idea. 

Hybrid "disk with SSD cache" is a transitional phenomenon; it's
probably not going to be relevant in five years, which means
that it is almost already too late to develop a new filesystem
for it:  By the time the code is trustworthy, nobody will need
it any more.

That is not to say that there are no relevant improvements to make.

Many years ago we removed the rotational optimizations in FFS in
response to zoned drives, and given the properties of SSDs it is
no longer evident that journaling, softupdates or even supergroups
are good ideas anymore.

The design choice to make metadata updates single-sector modifications
should be revisited as well.

While LFS seems an obvious storage strategy for SSDs, it's surplus
to requirements because SSD devices already contain an LFS.  Only
they call files "a logical sector (extent)" and the LFS itself a
"Flash Adaptation Layer".

LFS also has some well-documented drawbacks, in particular WRT
cleaning, and the optimizations they are a tradeoff for are utterly
pointless on media with effectively no access time.
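
To put a rough number on the cleaning drawback, here is a minimal sketch
assuming the write-cost model from the Sprite LFS paper (Rosenblum and
Ousterhout): cleaning a segment whose live fraction is u costs about
2/(1-u) units of I/O per unit of new data written. The function name
lfs_write_cost is illustrative only and does not come from any real
filesystem code.

```python
def lfs_write_cost(u: float) -> float:
    """Approximate LFS write cost for segments with live fraction u.

    Assumed model: to free a segment the cleaner reads it whole (1),
    rewrites the live part (u) and the freed space takes new data (1 - u),
    so cost = (1 + u + (1 - u)) / (1 - u) = 2 / (1 - u).
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("live fraction must satisfy 0 <= u < 1")
    if u == 0.0:
        # A completely dead segment does not need to be read before reuse.
        return 1.0
    return 2.0 / (1.0 - u)


if __name__ == "__main__":
    for u in (0.0, 0.5, 0.75, 0.9):
        print(f"live fraction {u:.2f}: write cost {lfs_write_cost(u):.1f}")
```

Under this model the cost climbs steeply as segments fill up, and that
overhead is paid for sequential-write behaviour that matters little when
the medium has effectively no access time.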

It would probably be smarter to focus on reducing the number of,
and increasing the size of, media I/O transactions, with a side order
of general scalability, so that small files have small metadata and
bigger files trade metadata size for performance.
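
As a back-of-the-envelope illustration of "fewer, larger transactions",
the sketch below simply counts how many media transactions it takes to
move a file at a given transfer size. The function name and the sizes
are hypothetical, chosen only for the example; this is not a model of
any particular filesystem.

```python
def io_transactions(file_bytes: int, transfer_bytes: int) -> int:
    """Number of media transactions needed to move file_bytes when each
    transaction carries at most transfer_bytes (ceiling division)."""
    if file_bytes < 0 or transfer_bytes <= 0:
        raise ValueError("file_bytes must be >= 0 and transfer_bytes > 0")
    return -(-file_bytes // transfer_bytes)


if __name__ == "__main__":
    one_mib = 1 << 20
    # Hypothetical transfer sizes: 4 KiB, 64 KiB and 1 MiB.
    for transfer in (4096, 65536, one_mib):
        count = io_transactions(one_mib, transfer)
        print(f"{transfer:>8} byte transfers: {count} transactions")
```

A 1 MiB file takes 256 transactions at 4 KiB but a single one at 1 MiB,
which is the direction argued for above.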

There is also an interesting space between per-partition and
per-inode keying of encryption which is ripe for study.

The double or even triple work overlap between today's filesystems
and the FALs in SSDs could be avoided if a more expressive set of
verbs than Read/Write/Erase(=TRIM) were exported upwards (expose
the extents as "inodes"?) but given the patent-minefield and the
heavy-duty NIH-attitudes, that is probably not going to happen.

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


