From: "George Neville-Neil"
To: fs@freebsd.org
Subject: Fwd: The Morning Paper: NOVA - A log-structured file system for hybrid volatile/non-volatile main memories
Date: Sat, 07 May 2016 10:55:31 -0400

It's time for the project to start thinking about these issues IMHO.

Best,
George

Forwarded message:

> From: The Morning Paper
> To: gnn@neville-neil.com
> Subject: The Morning Paper: NOVA - A log-structured file system for hybrid volatile/non-volatile main memories
> Date: Fri, 6 May 2016 05:14:40 +0000
>
> The implications of combined DRAM and NVMM memory for file system design.
> This paper write-up is also available online at The Morning Paper.
> (http://blog.acolyer.org/2016/05/06/nova-a-log-structured-file-system-for-hybrid-volatilenon-volatile-main-memories)
>
>
> ** the morning paper
> ------------------------------------------------------------
>
>
> ** NOVA: A log-structured file system for hybrid volatile/non-volatile main memories
> ------------------------------------------------------------
>
> NOVA: A Log-structured file system for hybrid volatile/non-volatile main memories (http://cseweb.ucsd.edu/~swanson/papers/FAST2016NOVA.pdf) - Xu & Swanson 2016
>
> Another paper looking at the design implications of mixed DRAM and NVMM systems (it’s the future!), this time in the context of file systems. (NVMM = Non-volatile Main Memory).
>
> Hybrid DRAM/NVMM storage systems present a host of opportunities and challenges for system designers. These systems need to minimize software overhead if they are to fully exploit NVMM’s high performance and efficiently support more flexible access patterns, and at the same time they must provide the strong consistency guarantees that applications require and respect the limitations of emerging memories (e.g. limited program cycles).
>
> Why can’t we just take an existing file system and run it on top of a hybrid memory system? These file systems were built for the performance characteristics of disks (spinning or SSDs), whereas NVMM and DRAM provide vastly improved performance. They were also built to rely on the consistency guarantees of disks (e.g. atomic sector updates), but memory provides different consistency guarantees from disks. One of the central issues here is the under-the-covers reordering of memory stores, and the need to explicitly flush data from CPU caches to compensate (https://blog.acolyer.org/2016/01/21/blurred-persistence/). This can easily destroy any performance gains from NVMM if you’re not careful.
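To make the store-reordering problem concrete, here is a toy Python model. It is a hypothetical sketch, not a description of any real cache hierarchy: stores land in a volatile cache, the hardware may write any dirty line back at any moment before power fails, and `flush()` stands in for clflush/clwb followed by a fence. It assumes the payload (`"data"`) and the commit flag (`"valid"`) sit on different cache lines, then enumerates every NVMM image a crash could leave behind and checks the invariant "a durable commit flag implies a durable payload".

```python
from itertools import combinations


class ToyPersistentMemory:
    """Hypothetical model of a volatile CPU cache in front of NVMM.

    store() only updates the cache. flush() writes one line back to NVMM,
    standing in for clflush/clwb followed by a fence. Until a line is
    flushed, the hardware may or may not have written it back when power
    fails, independently of program order.
    """

    def __init__(self):
        self.nvmm = {}    # survives power failure
        self.cache = {}   # dirty lines, lost on power failure

    def store(self, addr, value):
        self.cache[addr] = value

    def flush(self, addr):
        if addr in self.cache:
            self.nvmm[addr] = self.cache.pop(addr)

    def crash(self, written_back):
        """NVMM contents after a power failure, given the subset of dirty
        lines the hardware happened to write back before losing power."""
        image = dict(self.nvmm)
        for addr in written_back:
            image[addr] = self.cache[addr]
        return image


# Each step is (operation, address, value). "data" and "valid" are assumed
# to sit on different cache lines.
def unordered_steps(payload):
    return [("store", "data", payload), ("store", "valid", 1)]


def ordered_steps(payload):
    return [("store", "data", payload), ("flush", "data", None),
            ("store", "valid", 1), ("flush", "valid", None)]


def crash_images(steps):
    """Every NVMM image reachable by crashing after any prefix of `steps`,
    with any subset of the still-dirty lines written back early."""
    images = []
    for k in range(len(steps) + 1):
        m = ToyPersistentMemory()
        for op, addr, value in steps[:k]:
            if op == "store":
                m.store(addr, value)
            else:
                m.flush(addr)
        dirty = list(m.cache)
        for r in range(len(dirty) + 1):
            for subset in combinations(dirty, r):
                images.append(m.crash(subset))
    return images


def consistent(image):
    # Invariant: a durable commit flag implies a durable payload.
    return image.get("valid") != 1 or image.get("data") is not None


print([img for img in crash_images(unordered_steps(b"hello")) if not consistent(img)])
# [{'valid': 1}]
print(all(consistent(img) for img in crash_images(ordered_steps(b"hello"))))
# True
```

Without flushes, the flag can reach NVMM before the payload it guards. Flushing the payload before the flag is even stored removes that image, at the cost of an explicit write-back on the critical path, which is the overhead the paragraph above warns about.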
>
> To overcome all these limitations, we present the NOn-Volatile memory Accelerated (NOVA) log-structured file system. NOVA adapts conventional log-structured file system techniques to exploit the fast random access provided by hybrid memory systems. This allows NOVA to support massive concurrency, reduce log size, and minimize garbage collection costs while providing strong consistency guarantees for conventional file operations and mmap-based load/store accesses.
>
> All of this hard work pays off: “We find that NOVA is significantly faster than existing file systems in a wide range of applications and outperforms file systems that provide the same data consistency guarantees by between 3.1x and 13.5x in write-intensive workloads.”
>
> There is a lot of detailed information about NOVA’s implementation in the paper. Here I want to focus on the authors’ excellent discussion of what’s different about hybrid memory systems, and how they approached the high-level design of NOVA as a consequence.
>
>
> ** Challenges in designing for hybrid memory systems
> ------------------------------------------------------------
>
> Xu & Swanson outline three fundamental challenges when designing for hybrid memory systems:
> 1. Realising the performance potential of the hardware
> 2. Write reordering and its impact on consistency
> 3. Providing atomicity for operations
>
>
> ** Performance
> ------------------------------------------------------------
>
> The low latencies of NVMMs alters the trade-offs between hardware and software latency. In conventional storage systems, the latency of slow storage devices (e.g., disks) dominates access latency, so software efficiency is not critical.
> Previous work has shown that with fast NVMM, software costs can quickly dominate memory latency, squandering the performance that NVMMs could provide…
>
> It is possible to bypass the DRAM page cache and access NVMM directly using a technique called Direct Access (DAX), or eXecute In Place (XIP), avoiding extra copies between NVMM and DRAM in the storage stack.
>
> NOVA is a DAX file system, and we expect that all NVMM file systems will provide for these (or similar) features.
>
>
> ** Write re-ordering
> ------------------------------------------------------------
>
> Modern processors and their caching hierarchies may reorder store operations to improve performance. The CPU’s memory consistency protocol makes guarantees about the ordering of memory updates, but existing models (with the exception of research proposals [20, 46]) do not provide guarantees on when updates will reach NVMMs. As a result, a power failure may leave the data in an inconsistent state.
>
> It’s possible to explicitly flush caches and issue memory barriers to enforce write ordering. However, while an mfence will enforce order on memory operations before and after the barrier, it only guarantees all CPUs have the same view of the memory. It does not impose any constraints on the order of data writebacks to the NVMM.
>
> Intel has proposed new instructions to fix these problems, which include clflushopt, clwb and pcommit. “NOVA is built with these instructions in mind…”
>
>
> ** Atomicity
> ------------------------------------------------------------
>
> Existing file systems use a variety of techniques like journaling, shadow paging, or log-structuring to provide atomicity guarantees.
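Of these, journaling is the easiest to sketch. Below is a minimal redo-journal model in Python, a hypothetical illustration rather than any real file system's journal: dicts and lists stand in for on-disk blocks and the journal (both treated as durable), and a crash is simulated by stopping a transaction part-way. New block contents are logged first, a commit record makes the transaction durable, and a checkpoint copies the blocks to their home locations. The `block_writes` counter makes the duplicate-write cost visible.

```python
class RedoJournalStore:
    """Hypothetical toy model of write-ahead (redo) journaling.

    `blocks` stands in for data at its home location and `journal` for the
    on-disk journal; both are treated as durable. A crash is simulated by
    stopping a transaction part-way through and calling replay() later.
    """

    def __init__(self):
        self.blocks = {}
        self.journal = []
        self.block_writes = 0   # block-sized writes: journal copies + home copies
        self.next_tx = 0

    def transaction(self, updates, crash_at=None):
        tx = self.next_tx
        self.next_tx += 1
        # 1. Log the new contents of every block the transaction touches.
        for addr, value in updates.items():
            self.journal.append(("block", tx, addr, value))
            self.block_writes += 1
        if crash_at == "before_commit":
            return
        # 2. The commit record makes the whole transaction durable at once.
        self.journal.append(("commit", tx, None, None))
        if crash_at == "before_checkpoint":
            return
        # 3. Checkpoint: copy the logged blocks to their home locations.
        self.replay()

    def replay(self):
        """Checkpoint and crash recovery: redo committed transactions only."""
        committed = {tx for kind, tx, _, _ in self.journal if kind == "commit"}
        for kind, tx, addr, value in self.journal:
            if kind == "block" and tx in committed:
                self.blocks[addr] = value
                self.block_writes += 1
        self.journal.clear()


store = RedoJournalStore()
store.transaction({"inode 7": "size=8192", "bitmap": "block 42 in use"})
print(store.block_writes)   # 4: each of the 2 blocks is written twice

crashed = RedoJournalStore()
crashed.transaction({"a": 1}, crash_at="before_checkpoint")
crashed.replay()            # recovery after reboot
print(crashed.blocks)       # {'a': 1}
```

A crash before the commit record lands leaves only orphaned block records, which `replay()` discards; a crash after it is repaired by `replay()`. Either way the update is all-or-nothing, at the price of writing every block twice.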
>
> A journaling (WAL) system records all updates to a journal before applying them, and in the case of a power failure replays the journal to restore the system to a consistent state. Shadow paging is a copy-on-write mechanism in which a new copy of affected pages is written to storage on a write, before swapping out any references to the old pages for the new ones. Log-structured file systems (LFS) buffer random writes in memory and then convert them into larger sequential writes to the disk. This requires a steady supply of contiguous free regions of disk, which in turn entails frequent cleaning and compacting of the log to reclaim space.
>
> RAMCloud (https://blog.acolyer.org/2016/01/18/ramcloud/) is an example of a DRAM-based storage system that keeps all its data in DRAM to service reads, and keeps a persistent version on disk. It uses log structure for both DRAM and disk.
>
>
> ** NOVA design principles
> ------------------------------------------------------------
>
> NOVA is a log-structured, POSIX file system that builds on the strengths of LFS and adapts them to take advantage of hybrid memory systems. Because it targets a different storage technology, NOVA looks very different from conventional log-structured file systems that are built to maximize disk bandwidth.
>
> Three observations influenced the design:
> 1. Logs that support atomic updates are easy to implement in NVMM, but are not efficient for search operations (e.g. directory lookup and file random access). Data structures that support fast search (e.g. trees) are more difficult to implement correctly and efficiently in NVMM.
> 2. The complexity of log cleaning in LFS comes from the need for contiguous free regions of storage. In NVMM, however, random access is cheap, so there is no need to write in contiguous regions and hence no need for such complex cleaning protocols.
> 3. NVMMs support fast, highly concurrent random accesses, and therefore using multiple logs does not negatively impact performance.
>
> Based on this, NOVA:
> * Keeps logs in NVMM, and indexes (radix trees) in DRAM.
> * Gives each inode its own log, which allows concurrent updates across files without synchronization. During recovery, NOVA can replay multiple logs simultaneously.
> * Uses logging and lightweight journaling for complex atomic updates. NOVA’s log-structure provides cheaper atomic updates than journaling or shadow paging. “To atomically write to a log, NOVA first appends data to the log, and then atomically updates the log tail to commit the updates, thus avoiding both the duplicate writes overhead of journaling file systems and the cascading update costs of shadow paging systems.”
> * Implements the log as a singly linked list! The locality benefits of sequential logs are less important in NVMM, so NOVA uses a linked list of 4KB NVMM pages.
>
> Allowing for non-sequential log storage provides three advantages. First, allocating log space is easy since NOVA does not need to allocate large, contiguous regions for the log. Second, NOVA can perform log cleaning at fine-grained page-size granularity. Third, reclaiming log pages that contain only stale entries requires just a few pointer assignments.
>
> * Finally, NOVA does not log file data. NOVA uses copy-on-write for modified pages, and appends metadata about the write to the log.
>
> [Figure: high-level layout of the NOVA data structures]
>
> NOVA’s atomicity comes from a combination of:
> * 64-bit atomic updates: NOVA exploits processor support for 64-bit atomic writes to memory to directly modify metadata for some operations (e.g. a file’s atime for reads), and to commit updates to the log by updating the inode’s log tail pointer.
> * Logging in the inode’s log to record operations that modify a single inode.
> * Lightweight journaling for directory operations that require changes to multiple inodes.
> * Enforced write ordering by: (1) committing data and log entries to NVMM before updating the log tail; (2) committing journal data to NVMM before propagating updates; and (3) committing new versions of data pages to NVMM before recycling stale ones. If NOVA is running on a system that supports the new clflushopt, clwb, and pcommit instructions, it will use these to enforce the write ordering; otherwise it uses movntq, “a non-temporal move instruction that bypasses the CPU cache hierarchy to perform direct writes to NVMM,” and a combination of clflush and mfence.
>
>
> ** Evaluation
> ------------------------------------------------------------
>
> Figure 6 shows how NOVA file system operation latency compares to other file systems across different NVMM configurations.
>
> Note that NOVA is more sensitive to NVMM performance than the other file systems because NOVA’s software overheads are lower, and so overall performance more directly reflects the underlying memory performance.
>
> Figure 7 shows how NOVA compares to other file systems across four Filebench workloads: a file server, web proxy, web server, and varmail (emulates an email server).
>
> Overall, NOVA achieves the best performance in almost all cases, and provides data consistency guarantees that are as strong as or stronger than those of the other file systems. The performance advantages of NOVA are largest on write-intensive workloads with large numbers of files.
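The per-inode log, copy-on-write data pages, and tail-pointer commit described in the design principles fit together roughly as in the toy Python model below. It is a minimal sketch under stated assumptions, not NOVA's implementation: dicts stand in for NVMM and for the DRAM radix tree, a log page holds four entries instead of filling 4KB, rebinding `inode.tail` stands in for the 64-bit atomic store, and cache flushes are omitted (every store is treated as durable). `ToyNova`, `LogPage`, and `ENTRIES_PER_LOG_PAGE` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

ENTRIES_PER_LOG_PAGE = 4   # hypothetical; NOVA's log pages are 4KB


@dataclass
class LogPage:
    entries: List[Tuple[int, int]] = field(default_factory=list)  # (file page offset, data page id)
    next: Optional[int] = None   # log pages form a singly linked list


@dataclass
class Inode:
    head: int                    # first log page
    tail: Tuple[int, int]        # (log page id, entry count): the commit point


class ToyNova:
    """Hypothetical toy model of a per-inode log; not NOVA's code."""

    def __init__(self):
        self.nvmm_log: Dict[int, LogPage] = {}   # "NVMM": log pages
        self.nvmm_data: Dict[int, bytes] = {}    # "NVMM": file data pages
        self.next_page = 0
        self.free_data_pages: List[int] = []
        self.index: Dict[int, int] = {}          # "DRAM": file offset -> data page

    def _alloc(self) -> int:
        self.next_page += 1
        return self.next_page

    def create(self) -> Inode:
        head = self._alloc()
        self.nvmm_log[head] = LogPage()
        return Inode(head=head, tail=(head, 0))

    def write(self, inode: Inode, pgoff: int, data: bytes, crash_before_commit=False):
        # 1. Copy-on-write: new data goes to a fresh page; the old page is untouched.
        new_page = self._alloc()
        self.nvmm_data[new_page] = data
        # 2. Append a write entry just past the committed tail, linking in a
        #    new log page if the current one is full.
        page_id, count = inode.tail
        page = self.nvmm_log[page_id]
        if count == ENTRIES_PER_LOG_PAGE:
            next_id = self._alloc()
            self.nvmm_log[next_id] = LogPage()
            page.next = next_id
            page_id, count, page = next_id, 0, self.nvmm_log[next_id]
        del page.entries[count:]          # drop leftovers of an uncommitted write
        page.entries.append((pgoff, new_page))
        if crash_before_commit:
            return   # entry and data are in "NVMM", but the tail never moved
        # 3. Commit with one assignment (stands in for a 64-bit atomic store).
        inode.tail = (page_id, count + 1)
        # 4. Update the DRAM index, then mark the now-stale data page free.
        old = self.index.get(pgoff)
        self.index[pgoff] = new_page
        if old is not None:
            self.free_data_pages.append(old)

    def recover(self, inode: Inode) -> Dict[int, int]:
        """Rebuild the DRAM index by replaying the log from head to tail."""
        index: Dict[int, int] = {}
        page_id = inode.head
        tail_page, tail_count = inode.tail
        while True:
            page = self.nvmm_log[page_id]
            limit = tail_count if page_id == tail_page else len(page.entries)
            for pgoff, data_page in page.entries[:limit]:
                index[pgoff] = data_page
            if page_id == tail_page:
                return index
            page_id = page.next


fs = ToyNova()
ino = fs.create()
for i in range(6):                       # six entries spill into a second log page
    fs.write(ino, i % 2, b"v%d" % i)
fs.write(ino, 0, b"lost", crash_before_commit=True)
after = fs.recover(ino)
print(fs.nvmm_data[after[0]], fs.nvmm_data[after[1]])   # b'v4' b'v5'
```

Recovery trusts only entries up to the tail, so a crash between appending an entry and moving the tail simply loses that write. The previous data page is still intact because it is only marked free after the commit, mirroring ordering rule (3) above. In this toy the orphaned data page from the crashed write is just leaked.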
>
> This email was brought to you by #themorningpaper (http://blog.acolyer.org): an interesting/influential/important paper from the world of CS every weekday morning, as selected by Adrian Colyer
>
> Copyright © 2016 One L and a Y Ltd, All rights reserved.