From: Bruce Evans
Date: Fri, 21 Dec 2018 08:15:32 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r342297 - in head: sbin/mdconfig sys/dev/md sys/sys
Message-Id: <201812210815.wBL8FWOu015972@repo.freebsd.org>

Author: bde
Date: Fri Dec 21 08:15:31 2018
New Revision: 342297
URL: https://svnweb.freebsd.org/changeset/base/342297

Log:
  Use VOP_ADVISE() with POSIX_FADV_DONTNEED instead of IO_DIRECT to
  implement not double-caching for reads from vnode-backed md devices.
  Use VOP_ADVISE() similarly instead of !IO_DIRECT unsimilarly for writes.
  Add a "cache" option to mdconfig to allow changing the default of not
  caching.  This depends on a recent commit to fix VOP_ADVISE().

  A previous version had optimizations for sequential i/o's (merge the i/o's
  and only uncache for discontiguous i/o's and for full blocks), but
  optimizations and knowledge of block boundaries belong in VOP_ADVISE().
  Read-ahead should also be handled better, by supporting it in md and
  discarding it in VOP_ADVISE().

  POSIX_FADV_DONTNEED is ignored by zfs, but so is IO_DIRECT.

  POSIX_FADV_DONTNEED works better than IO_DIRECT if it is not ignored,
  since it only discards from the buffer cache immediately, while
  IO_DIRECT also discards from the page cache immediately.

  IO_DIRECT was not used for writes since it was claimed to be too slow,
  but most of the slowness for writes is from doing them synchronously by
  default.  Non-synchronous writes still deadlock in many cases.

  IO_DIRECT only has a special implementation for ffs reads with DIRECTIO
  configured.  Otherwise, if it is not ignored then it uses the buffer and
  page caches normally except for discarding everything after each i/o,
  and then it has much the same overheads as POSIX_FADV_DONTNEED.  The
  overheads for reading with ffs and DIRECTIO were similar in tests of md.

  Reviewed by:	kib

Modified:
  head/sbin/mdconfig/mdconfig.8
  head/sbin/mdconfig/mdconfig.c
  head/sys/dev/md/md.c
  head/sys/sys/mdioctl.h

Modified: head/sbin/mdconfig/mdconfig.8
==============================================================================
--- head/sbin/mdconfig/mdconfig.8	Fri Dec 21 06:38:13 2018	(r342296)
+++ head/sbin/mdconfig/mdconfig.8	Fri Dec 21 08:15:31 2018	(r342297)
@@ -37,7 +37,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd August 28, 2017
+.Dd December 21, 2018
 .Dt MDCONFIG 8
 .Os
 .Sh NAME
@@ -206,6 +206,32 @@ backed devices: avoid
 .Dv IO_SYNC
 for increased performance but at the risk of deadlocking the entire
 kernel.
+.It Oo Cm no Oc Ns Cm cache
+For
+.Cm vnode
+backed devices: enable/disable caching of data in system caches.
+The default is to not cache.
+.Pp
+Accesses via the device are converted to accesses via the vnode.
+The caching policy for the vnode is used initially.
+This is normally to cache.
+This caching policy is retained if the
+.Cm cache
+option is used.
+Otherwise, caching is limited
+by releasing data from caches soon after each access.
+The release has the same semantics as the
+.Dv POSIX_FADV_DONTNEED
+feature of
+.Xr posix_fadvise 2 .
+The result is that with normal (non-zfs) caching,
+buffers are released from the buffer cache soon after they are constructed,
+but their data is kept in the page cache at lower priority.
+.Pp
+The
+.Cm cache
+option tends to waste memory by giving unwanted double caching,
+but it saves time if there is memory to spare.
 .It Oo Cm no Oc Ns Cm reserve
 Allocate and reserve all needed storage from the start, rather than as needed.
 .It Oo Cm no Oc Ns Cm cluster

Modified: head/sbin/mdconfig/mdconfig.c
==============================================================================
--- head/sbin/mdconfig/mdconfig.c	Fri Dec 21 06:38:13 2018	(r342296)
+++ head/sbin/mdconfig/mdconfig.c	Fri Dec 21 08:15:31 2018	(r342297)
@@ -88,7 +88,7 @@ usage(void)
 "       mdconfig -l [-v] [-n] [-f file] [-u unit]\n"
 "       mdconfig file\n");
 	fprintf(stderr, "\t\ttype = {malloc, vnode, swap}\n");
-	fprintf(stderr, "\t\toption = {cluster, compress, force,\n");
+	fprintf(stderr, "\t\toption = {cache, cluster, compress, force,\n");
 	fprintf(stderr, "\t\t readonly, reserve, ro, verify}\n");
 	fprintf(stderr, "\t\tsize = %%d (512 byte blocks), %%db (B),\n");
 	fprintf(stderr, "\t\t %%dk (kB), %%dm (MB), %%dg (GB), \n");
@@ -178,6 +178,10 @@ main(int argc, char **argv)
 				mdio.md_options |= MD_ASYNC;
 			else if (!strcmp(optarg, "noasync"))
 				mdio.md_options &= ~MD_ASYNC;
+			else if (!strcmp(optarg, "cache"))
+				mdio.md_options |= MD_CACHE;
+			else if (!strcmp(optarg, "nocache"))
+				mdio.md_options &= ~MD_CACHE;
 			else if (!strcmp(optarg, "cluster"))
 				mdio.md_options |= MD_CLUSTER;
 			else if (!strcmp(optarg, "nocluster"))

Modified: head/sys/dev/md/md.c
==============================================================================
--- head/sys/dev/md/md.c	Fri Dec 21 06:38:13 2018	(r342296)
+++ head/sys/dev/md/md.c	Fri Dec 21 08:15:31 2018	(r342297)
@@ -880,7 +880,7 @@ mdstart_vnode(struct md_s *sc, struct bio *bp)
 	struct buf *pb;
 	bus_dma_segment_t *vlist;
 	struct thread *td;
-	off_t iolen, len, zerosize;
+	off_t iolen, iostart, len, zerosize;
 	int ma_offs, npages;
 
 	switch (bp->bio_cmd) {
@@ -983,13 +983,10 @@ unmapped_step:
 		auio.uio_iov = &aiov;
 		auio.uio_iovcnt = 1;
 	}
-	/*
-	 * When reading set IO_DIRECT to try to avoid double-caching
-	 * the data.  When writing IO_DIRECT is not optimal.
-	 */
+	iostart = auio.uio_offset;
 	if (auio.uio_rw == UIO_READ) {
 		vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
-		error = VOP_READ(vp, &auio, IO_DIRECT, sc->cred);
+		error = VOP_READ(vp, &auio, 0, sc->cred);
 		VOP_UNLOCK(vp, 0);
 	} else {
 		(void) vn_start_write(vp, &mp, V_WAIT);
@@ -1002,6 +999,11 @@ unmapped_step:
 			sc->flags &= ~MD_VERIFY;
 	}
 
+	/* When MD_CACHE is set, try to avoid double-caching the data. */
+	if (error == 0 && (sc->flags & MD_CACHE) == 0)
+		VOP_ADVISE(vp, iostart, auio.uio_offset - 1,
+		    POSIX_FADV_DONTNEED);
+
 	if (pb != NULL) {
 		pmap_qremove((vm_offset_t)pb->b_data, npages);
 		if (error == 0) {
@@ -1464,7 +1466,8 @@ mdcreate_vnode(struct md_s *sc, struct md_req *mdr, st
 	sc->fwheads = mdr->md_fwheads;
 	snprintf(sc->ident, sizeof(sc->ident), "MD-DEV%ju-INO%ju",
 	    (uintmax_t)vattr.va_fsid, (uintmax_t)vattr.va_fileid);
-	sc->flags = mdr->md_options & (MD_FORCE | MD_ASYNC | MD_VERIFY);
+	sc->flags = mdr->md_options & (MD_ASYNC | MD_CACHE | MD_FORCE |
+	    MD_VERIFY);
 	if (!(flags & FWRITE))
 		sc->flags |= MD_READONLY;
 	sc->vnode = nd.ni_vp;
@@ -2184,6 +2187,9 @@ g_md_dumpconf(struct sbuf *sb, const char *indent, str
 			g_conf_printf_escaped(sb, "%s", mp->file);
 			sbuf_printf(sb, "</file>\n");
 		}
+		if (mp->type == MD_VNODE)
+			sbuf_printf(sb, "%s<cache>%s</cache>\n", indent,
+			    (mp->flags & MD_CACHE) == 0 ? "off": "on");
 		sbuf_printf(sb, "%s<label>", indent);
 		g_conf_printf_escaped(sb, "%s", mp->label);
 		sbuf_printf(sb, "</label>\n");

Modified: head/sys/sys/mdioctl.h
==============================================================================
--- head/sys/sys/mdioctl.h	Fri Dec 21 06:38:13 2018	(r342296)
+++ head/sys/sys/mdioctl.h	Fri Dec 21 08:15:31 2018	(r342297)
@@ -92,5 +92,6 @@ struct md_ioctl {
 #define MD_FORCE	0x20	/* Don't try to prevent foot-shooting */
 #define MD_ASYNC	0x40	/* Asynchronous mode */
 #define MD_VERIFY	0x80	/* Open file with O_VERIFY (vnode only) */
+#define MD_CACHE	0x100	/* Cache vnode data */
 
 #endif	/* _SYS_MDIOCTL_H_*/
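
Illustration (not part of the commit): the read path in mdstart_vnode()
does a normal cached read and then, when MD_CACHE is clear, advises the
filesystem that the range just transferred is no longer needed.  A rough
userland analogue is sketched below, assuming a regular file descriptor.
read_and_release() is a hypothetical helper name; it uses pread(2) and
posix_fadvise(2) where md uses VOP_READ() and VOP_ADVISE().  Note that
posix_fadvise() takes an (offset, length) pair, while the VOP_ADVISE()
call in the diff passes an inclusive (start, end) pair, hence its
"auio.uio_offset - 1".

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for pread() and posix_fadvise() */
#endif

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from md(4): read up to len bytes at offset
 * start through the normal caches, then hint that the bytes actually
 * read may be released.  Returns the pread() result unchanged.
 */
ssize_t
read_and_release(int fd, void *buf, size_t len, off_t start)
{
	ssize_t n;

	n = pread(fd, buf, len, start);
	if (n > 0) {
		/*
		 * The advice is only a hint, so its return value is
		 * ignored here, as the VOP_ADVISE() result is in the diff.
		 */
		(void)posix_fadvise(fd, start, (off_t)n, POSIX_FADV_DONTNEED);
	}
	return (n);
}
```

How much is actually released after the hint depends on the filesystem;
as the log notes, zfs ignores POSIX_FADV_DONTNEED.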