Date: Sat, 07 May 2016 16:38:21 +0000
From: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
To: "George Neville-Neil" <gnn@neville-neil.com>
Cc: fs@freebsd.org
Subject: Re: Fwd: The Morning Paper: NOVA - A log-structured file system for hybrid volatile/non-volatile main memories
Message-ID: <59877.1462639101@critter.freebsd.dk>
In-Reply-To: <2BE88161-D83A-4265-9EC3-C2F7F7033E93@neville-neil.com>
References: <4188b6afbe9e5d43111fef4d4ae5e599a57.20160506051425@mail23.atl91.mcsv.net> <2BE88161-D83A-4265-9EC3-C2F7F7033E93@neville-neil.com>
That's a pretty obvious idea, all things considered, but not necessarily a good idea and certainly not the best idea.

Hybrid "disk with SSD cache" is a transitional phenomenon. It's probably not going to be relevant in five years, which means that it is almost already too late to develop a new filesystem for it: by the time the code is trustworthy, nobody will need it any more.

That is not to say that there are no relevant improvements to make. Many years ago we removed the rotational optimizations in FFS in response to zoned drives, and given the properties of SSDs it is no longer evident that journaling, softupdates or even supergroups are good ideas. The design choices made so that metadata updates are single-sector modifications should be revisited as well.

While LFS seems an obvious storage strategy for SSDs, it's surplus to requirements because SSD devices already contain an LFS. Only they call files "a logical sector (extent)" and the LFS itself a "Flash Adaptation Layer".

LFS also has some well-documented drawbacks, in particular with respect to cleaning, and the optimizations those drawbacks are traded for are utterly pointless on media with effectively no access time.

It would probably be smarter to focus on reducing the number of, and increasing the size of, media I/O transactions, with a side order of general scalability, so that small files have small metadata and bigger files trade metadata size for performance.

There is also an interesting space between per-partition and per-inode keying of encryption which is ripe for study.

The double or even triple work overlap between today's filesystems and the FALs in SSDs could be avoided if a more expressive set of verbs than Read/Write/Erase(=TRIM) were exported upwards (expose the extents as "inodes"?), but given the patent minefield and the heavy-duty NIH attitudes, that is probably not going to happen.
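To make the "fewer, bigger media I/O transactions" point concrete, here is a rough sketch of write coalescing. It is not taken from FFS or any real filesystem: the class name CoalescingWriter, the batch size and the use of an in-memory io.BytesIO as the "device" are all assumptions made purely for illustration.

```python
import io


class CoalescingWriter:
    """Hypothetical helper: batches small appends into large device writes."""

    def __init__(self, device, batch_size=64 * 1024):
        self.device = device          # any object with a write(bytes) method
        self.batch_size = batch_size  # target size of one media transaction
        self.pending = bytearray()    # small writes waiting to be flushed
        self.transactions = 0         # number of writes issued to the device

    def write(self, data):
        self.pending += data
        # Issue full-sized transactions as long as enough data is queued.
        while len(self.pending) >= self.batch_size:
            self._issue(self.batch_size)

    def flush(self):
        # Push out whatever is left as one final (possibly short) transaction.
        if self.pending:
            self._issue(len(self.pending))

    def _issue(self, nbytes):
        self.device.write(bytes(self.pending[:nbytes]))
        del self.pending[:nbytes]
        self.transactions += 1


def demo():
    device = io.BytesIO()             # stand-in for the real medium (assumption)
    writer = CoalescingWriter(device, batch_size=4096)
    for _ in range(1000):
        writer.write(b"x" * 100)      # 1000 small 100-byte updates
    writer.flush()
    return writer.transactions, len(device.getvalue())
```

In this toy setup the 1000 small updates reach the device as a couple of dozen 4 KiB transactions plus one short tail, instead of 1000 separate writes. A real filesystem would of course also have to deal with ordering and crash consistency, which this sketch ignores.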
Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?59877.1462639101>