Date:      Sat, 07 May 2016 10:55:31 -0400
From:      "George Neville-Neil" <gnn@neville-neil.com>
To:        fs@freebsd.org
Subject:   Fwd: The Morning Paper: NOVA - A log-structured file system for hybrid volatile/non-volatile main memories
Message-ID:  <2BE88161-D83A-4265-9EC3-C2F7F7033E93@neville-neil.com>
References:  <4188b6afbe9e5d43111fef4d4ae5e599a57.20160506051425@mail23.atl91.mcsv.net>

It's time for the project to start thinking about these issues IMHO.

Best,
George


Forwarded message:

> From: The Morning Paper <the-morning-paper@onelanday.co.uk>
> To: gnn@neville-neil.com
> Subject: The Morning Paper: NOVA - A log-structured file system for hybrid volatile/non-volatile main memories
> Date: Fri, 6 May 2016 05:14:40 +0000
>
> The implications of combined DRAM and NVMM memory for file system
> design.
> This paper write-up is also available online at The Morning Paper.
> (http://blog.acolyer.org/2016/05/06/nova-a-log-structured-file-system-for-hybrid-volatilenon-volatile-main-memories)
>
>
> ** the morning paper
> ------------------------------------------------------------
>
>
> ** NOVA: A log-structured file system for hybrid volatile/non-volatile main memories
> ------------------------------------------------------------
>
> NOVA: A Log-structured file system for hybrid volatile/non-volatile
> main memories
> (http://cseweb.ucsd.edu/~swanson/papers/FAST2016NOVA.pdf) - Xu &
> Swanson 2016
>
> Another paper looking at the design implications of mixed DRAM and
> NVMM systems (it’s the future!), this time in the context of file
> systems. (NVMM = Non-volatile Main Memory).
>
> Hybrid DRAM/NVMM storage systems present a host of opportunities and
> challenges for system designers. These systems need to minimize
> software overhead if they are to fully exploit NVMM’s high
> performance and efficiently support more flexible access patterns, and
> at the same time they must provide the strong consistency guarantees
> that applications require and respect the limitations of emerging
> memories (e.g. limited program cycles).
>
> Why can’t we just take an existing file system and run it on top of
> a hybrid memory system? These file systems were built for the
> performance characteristics of disks (spinning or SSDs), whereas NVMM
> and DRAM provide vastly improved performance. They were also built to
> rely on the consistency guarantees of disks (e.g. atomic sector
> updates), but memory provides different consistency guarantees from
> disks. One of the central issues here is the under-the-covers
> reordering of memory stores, and the need to explicitly flush data
> from CPU caches to compensate
> (https://blog.acolyer.org/2016/01/21/blurred-persistence/). This can
> easily destroy any performance gains from NVMM if you’re not
> careful.
>
> To overcome all these limitations, we present the NOn-Volatile memory
> Accelerated (NOVA) log-structured file system. NOVA adapts
> conventional log-structured file system techniques to exploit the fast
> random access provided by hybrid memory systems. This allows NOVA to
> support massive concurrency, reduce log size, and minimize garbage
> collection costs while providing strong consistency guarantees for
> conventional file operations and mmap-based load/store accesses.
>
> All of this hard work pays off: “We find that NOVA is significantly
> faster than existing file systems in a wide range of applications and
> outperforms file systems that provide the same data consistency
> guarantees by between 3.1x and 13.5x in write-intensive workloads.”
>
> There is a lot of detailed information about NOVA’s implementation
> in the paper. Here I want to focus on the authors’ excellent
> discussion of what’s different about hybrid memory systems, and how
> they approached the high-level design of NOVA as a consequence.
>
>
> ** Challenges in designing for hybrid memory systems
> ------------------------------------------------------------
>
> Xu & Swanson outline three fundamental challenges when designing for
> hybrid memory systems:
> 1. Realising the performance potential of the hardware
> 2. Write reordering and its impact on consistency
> 3. Providing atomicity for operations
>
>
> ** Performance
> ------------------------------------------------------------
>
> The low latencies of NVMMs alter the trade-offs between hardware and
> software latency. In conventional storage systems, the latency of slow
> storage devices (e.g., disks) dominates access latency, so software
> efficiency is not critical. Previous work has shown that with fast
> NVMM, software costs can quickly dominate memory latency, squandering
> the performance that NVMMs could provide…
>
> It is possible to bypass the DRAM page cache and access NVMM directly
> using a technique called Direct Access (DAX), or eXecute In Place
> (XIP), avoiding extra copies between NVMM and DRAM in the storage
> stack.
>
> NOVA is a DAX file system, and we expect that all NVMM file systems
> will provide for these (or similar) features.
>
>
> ** Write re-ordering
> ------------------------------------------------------------
>
> Modern processors and their caching hierarchies may reorder store
> operations to improve performance. The CPU’s memory consistency
> protocol makes guarantees about the ordering of memory updates, but
> existing models (with the exception of research proposals [20, 46]) do
> not provide guarantees on when updates will reach NVMMs. As a result,
> a power failure may leave the data in an inconsistent state.
>
> It’s possible to explicitly flush caches and issue memory barriers
> to enforce write ordering. However, while an mfence will enforce order
> on memory operations before and after the barrier, it only guarantees
> all CPUs have the same view of the memory. It does not impose any
> constraints on the order of data writebacks to the NVMM.
>
> Intel has proposed new instructions to fix these problems, which
> include clflushopt, clwb and pcommit. “NOVA is built with these
> instructions in mind…”
>
>
> ** Atomicity
> ------------------------------------------------------------
>
> Existing file systems use a variety of techniques like journaling,
> shadow paging, or log-structuring to provide atomicity guarantees.
>
> A journaling (WAL) system records all updates to a journal before
> applying them, and in the case of a power failure replays the journal
> to restore the system to a consistent state. Shadow paging is a
> copy-on-write mechanism in which a new copy of affected pages is
> written to storage on a write, before swapping out any references to
> the old pages for the new ones. Log-structured file systems (LFS)
> buffer random writes in memory and then convert them into larger
> sequential writes to the disk. This requires a steady supply of
> contiguous free regions of disk, which in turn entails frequent
> cleaning and compacting of the log to reclaim space.
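
As a rough illustration of the journaling idea described above (my own toy sketch, not code from the paper or the write-up), here is a write-ahead journal in Python. The class and method names (JournaledStore, begin, apply, recover) and the crash_after parameter are made up for the example, and plain Python containers stand in for persistent storage.

```python
class JournaledStore:
    """Toy write-ahead journal: record intended updates, then apply them."""

    def __init__(self):
        self.journal = []   # stands in for the persistent journal area
        self.data = {}      # stands in for the persistent home locations

    def begin(self, updates):
        # Step 1: record every intended update in the journal before
        # touching the home locations.
        txn = dict(updates)
        self.journal.append(txn)
        return txn

    def apply(self, txn, crash_after=None):
        # Step 2: copy the updates to their home locations. `crash_after`
        # simulates a power failure after that many updates were applied.
        for i, (key, value) in enumerate(txn.items()):
            if crash_after is not None and i >= crash_after:
                return False          # journal entry is still present
            self.data[key] = value
        self.journal.remove(txn)      # fully applied, journal entry retired
        return True

    def recover(self):
        # After a failure, replay whatever is still in the journal.
        # Replaying is idempotent: re-writing a value twice is harmless.
        for txn in list(self.journal):
            self.apply(txn)


store = JournaledStore()
txn = store.begin({"a": 1, "b": 2})
store.apply(txn, crash_after=1)   # "power failure" after the first update
store.recover()                   # replay restores a consistent state
```

Note that every update is written twice, once to the journal and once to its home location. That duplicate write is the overhead the paper says NOVA's log structure avoids.
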
>
> RAMCloud (https://blog.acolyer.org/2016/01/18/ramcloud/) is an example
> of a DRAM based storage system that keeps all its data in DRAM to
> service reads, and keeps a persistent version on disk. It uses log
> structure for both DRAM and disk.
>
>
> ** NOVA design principles
> ------------------------------------------------------------
>
> NOVA is a log-structured, POSIX file system that builds on the
> strengths of LFS and adapts them to take advantage of hybrid memory
> systems. Because it targets a different storage technology, NOVA looks
> very different from conventional log-structured file systems that are
> built to maximize disk bandwidth.
>
> Three observations influenced the design:
> 1. Logs that support atomic updates are easy to implement in NVMM, but
> are not efficient for search operations (e.g. directory lookup and
> file random access). Data structures that support fast search (e.g.
> trees) are more difficult to implement correctly and efficiently in
> NVMM.
> 2. The complexity of log cleaning in LFS comes from the need for
> contiguous free regions of storage. In NVMM however, random access is
> cheap and therefore we don’t need to write in contiguous regions and
> hence don’t need such complex cleaning protocols.
> 3. NVMMs support fast, highly concurrent random accesses, and
> therefore using multiple logs does not negatively impact performance.
>
> Based on this, NOVA:
> * Keeps logs in NVMM, and indexes (radix trees) in DRAM.
> * Gives each inode its own log, which allows concurrent updates across
> files without synchronization. During recovery, NOVA can replay
> multiple logs simultaneously.
> * Uses logging and lightweight journaling for complex atomic updates.
> NOVA’s log-structure provides cheaper atomic updates than journaling
> or shadow paging. “To atomically write to a log, NOVA first appends
> data to the log, and then atomically updates the log tail to commit
> the updates, thus avoiding both the duplicate writes overhead of
> journaling file systems and the cascading update costs of shadow
> paging systems.”
> * Implements the log as a singly linked list! The locality benefits of
> sequential logs are less important in NVMM, so NOVA uses a linked list
> of 4KB NVMM pages.
>
> Allowing for non-sequential log storage provides three advantages.
> First, allocating log space is easy since NOVA does not need to
> allocate large, contiguous regions for the log. Second, NOVA can
> perform log cleaning at fine-grained page-size granularity. Third,
> reclaiming log pages that contain only stale entries requires just a
> few pointer assignments.
>
> * Finally, NOVA does not log file data. NOVA uses copy-on-write for
> modified pages, and appends metadata about the write to the log.
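
To make the design points in the list above more concrete, here is a small Python model of a per-inode log. It is kept as a singly linked list of pages, commits each write by advancing a tail, and uses copy-on-write for data. This is only a sketch under my own assumptions. The names (LogPage, InodeLog, PAGE_ENTRIES, crash_before_commit) are invented, the "pages" hold four entries instead of 4KB, and a dict plays the role of the DRAM index where NOVA uses radix trees.

```python
PAGE_ENTRIES = 4  # entries per log page; tiny on purpose (NOVA uses 4KB pages)


class LogPage:
    """One page of an inode's log; pages are chained into a singly linked list."""

    def __init__(self):
        self.entries = []   # (file page offset, data block id) records
        self.next = None


class InodeLog:
    def __init__(self):
        self.head = LogPage()
        self.last = self.head
        self.tail = 0        # count of committed entries, the "log tail"
        self.blocks = {}     # stands in for NVMM data pages
        self.index = {}      # stands in for the DRAM index: offset -> block id
        self.next_block = 0

    def write(self, offset, data, crash_before_commit=False):
        # Copy-on-write: the new data goes to a fresh block, never in place.
        block_id = self.next_block
        self.next_block += 1
        self.blocks[block_id] = data
        # Append the metadata entry, growing the list by one page if needed.
        if len(self.last.entries) == PAGE_ENTRIES:
            self.last.next = LogPage()
            self.last = self.last.next
        self.last.entries.append((offset, block_id))
        if crash_before_commit:
            return           # simulated power loss: the tail never moved
        self.tail += 1       # the single update that commits the write
        stale = self.index.get(offset)
        self.index[offset] = block_id
        if stale is not None:
            del self.blocks[stale]   # old version is recycled only now

    def read(self, offset):
        return self.blocks[self.index[offset]]

    def recover(self):
        """Rebuild the index from committed entries; discard everything else."""
        self.index = {}
        seen = 0
        page = self.head
        while page is not None:
            keep = min(len(page.entries), self.tail - seen)
            del page.entries[keep:]          # drop uncommitted entries
            for offset, block_id in page.entries:
                self.index[offset] = block_id
            seen += keep
            if seen == self.tail:
                page.next = None             # unlink pages past the tail
                self.last = page
                break
            page = page.next
        live = set(self.index.values())
        self.blocks = {b: d for b, d in self.blocks.items() if b in live}


log = InodeLog()
for i in range(5):
    log.write(i, b"v1")              # spills onto a second log page
log.write(0, b"v2")
log.write(0, b"v3", crash_before_commit=True)
log.recover()                        # offset 0 still reads b"v2"
```

Because a write only becomes visible when the tail moves, an interrupted write leaves the old data intact, and recovery needs nothing more than a replay of the committed entries.
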
>
> The high-level layout of the NOVA data structures looks like this:
> [figure not reproduced in this plain-text message]
>
> NOVA=E2=80=99s atomicity comes from a combination of:
> * 64-bit atomic updates - NOVA exploits processor support for 64-bit
> atomic writes to memory to directly modify metadata for some
> operations (e.g. a file’s atime for reads), and to commit updates to
> the log by updating the inode’s log tail pointer.
> * Logging in the inode’s log to record operations that modify a
> single inode.
> * Lightweight journaling for directory operations that require changes
> to multiple inodes.
> * Enforced write ordering by: (1) committing data and log entries to
> NVMM before updating the log tail; (2) committing journal data to NVMM
> before propagating updates; and (3) committing new versions of data
> pages to NVMM before recycling stale ones. If NOVA is running on a
> system that supports the new clflushopt, clwb, and pcommit
> instructions, it will use these to enforce the write ordering;
> otherwise it uses movntq, “a non-temporal move instruction that
> bypasses the CPU cache hierarchy to perform direct writes to NVMM,”
> and a combination of clflush and mfence.
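
The first ordering rule above (log entries must reach NVMM before the log tail is updated) can be checked with a toy model. Python cannot issue clflush or clwb, so the sketch below simulates a write-back cache instead: any subset of dirty lines may or may not have reached NVMM at the moment of a power failure. Everything here (SimulatedNVMM, crash_states, is_consistent, append) is a hypothetical name of mine, not anything from NOVA.

```python
from itertools import combinations


class SimulatedNVMM:
    def __init__(self):
        self.cache = {}     # dirty lines, not yet guaranteed durable
        self.durable = {}   # contents guaranteed to survive power loss

    def store(self, addr, value):
        self.cache[addr] = value

    def flush(self, addr):
        # Stands in for a cache-line flush followed by a fence.
        if addr in self.cache:
            self.durable[addr] = self.cache.pop(addr)

    def crash_states(self):
        """Yield every durable state a power failure could leave behind:
        the hardware may have written back any subset of the dirty lines."""
        dirty = list(self.cache.items())
        for r in range(len(dirty) + 1):
            for subset in combinations(dirty, r):
                state = dict(self.durable)
                state.update(subset)
                yield state


def is_consistent(state):
    # The log tail must never point past entries that are durable.
    tail = state.get("tail", 0)
    return all(("entry", i) in state for i in range(tail))


def append(mem, i, entry, ordered):
    """Append log entry i; report whether every possible crash point is safe."""
    safe = True
    mem.store(("entry", i), entry)
    safe &= all(map(is_consistent, mem.crash_states()))
    if ordered:
        mem.flush(("entry", i))       # entry is durable before the tail moves
    mem.store("tail", i + 1)
    safe &= all(map(is_consistent, mem.crash_states()))
    mem.flush(("entry", i))
    mem.flush("tail")
    return safe


ordered_ok = append(SimulatedNVMM(), 0, "create a", ordered=True)
unordered_ok = append(SimulatedNVMM(), 0, "create a", ordered=False)
```

In the unordered case there is a reachable crash state in which the tail has been written back but the entry has not, so unordered_ok comes out False. Flushing the entry before storing the tail removes that state.
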
>
>
> ** Evaluation
> ------------------------------------------------------------
>
> Figure 6 shows how NOVA file system operation latency compares to
> other file systems across different NVMM configurations.
>
> Note that NOVA is more sensitive to NVMM performance than the other
> file systems because NOVA’s software overheads are lower, and so
> overall performance more directly reflects the underlying memory
> performance.
>
> Figure 7 shows how NOVA compares to other file systems across four
> Filebench workloads: a file server, web proxy, web server, and varmail
> (emulates an email server).
>
> Overall, NOVA achieves the best performance in almost all cases, and
> provides data consistency guarantees that are as strong or stronger
> than the other file systems. The performance advantages of NOVA are
> largest on write-intensive workloads with large numbers of files.
>
> This email was brought to you by #themorningpaper
> (http://blog.acolyer.org): an interesting/influential/important paper
> from the world of CS every weekday morning, as selected by Adrian
> Colyer
>
> Copyright © 2016 One L and a Y Ltd, All rights reserved.


