From: Alan Somers
Date: Tue, 13 Nov 2018 15:09:24 -0700
Subject: Hole-punching, TRIM, etc
To: freebsd-arch@freebsd.org, freebsd-fs, FreeBSD CURRENT

Hole-punching has been discussed on these lists before[1]. It means turning a dense file into a sparse file by deallocating storage for some of the blocks in the middle. There's no standard API for it. Linux uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).

A related concept is telling a block device that some blocks are no longer used. SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it "Deallocate", and ZBC and ZAC call it "Reset Write Pointer". They all do basically the same thing, and it's analogous to hole-punching for regular files. They are also all inaccessible from FreeBSD's userland except by using pass(4), which is inconvenient and protocol-specific.
Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland, but it's totally undocumented and doesn't work on regular files.

I propose adding support for all of these things using the fcntl(2) API. Using the same syntax that Solaris defined, you would be able to punch a hole in a regular file or TRIM blocks from an SSD. ZFS already supports it (though FreeBSD's port never did, and the code was deleted in r303763). Here's what I would do:

1) Add the F_FREESP command to fcntl(2).
2) Add a .fo_space field to struct fileops.
3) Add a devfs_space method that implements .fo_space.
4) Add a .d_space field to struct cdevsw.
5) Add a g_dev_space method for GEOM that implements .d_space using BIO_DELETE.
6) Add a VOP_SPACE vop.
7) Implement VOP_SPACE for tmpfs.
8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).

The greatest beneficiaries of this work would be type 2 hypervisors like QEMU and VirtualBox with guests that use TRIM, and userland filesystems such as fusefs-ext2 and fusefs-exfat. High-performance storage systems using SPDK would also benefit. The last item, aio_freesp(2), may seem unnecessary but it would really benefit my application.

Questions, objections, flames?

-Alan

[1] https://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010881.html
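
For illustration only, here is a rough sketch of what hole-punching looks like from userland today and how the proposed interface might be called. It is not part of the proposal itself. The F_FREESP branch assumes the Solaris convention of passing a struct flock that describes the range, and it only compiles where the system headers already define F_FREESP. The Linux branch uses fallocate(2) with FALLOC_FL_PUNCH_HOLE. The names punch_hole and punch_hole_selftest are hypothetical helpers invented for this example.

```c
#define _GNU_SOURCE		/* exposes fallocate(2) on Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Deallocate the byte range [off, off + len) of an open regular file.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int
punch_hole(int fd, off_t off, off_t len)
{
	assert(off >= 0 && len > 0);
#if defined(F_FREESP)
	/* Assumed Solaris-style call: the range is described by a struct flock. */
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_whence = SEEK_SET;
	fl.l_start = off;
	fl.l_len = len;
	return (fcntl(fd, F_FREESP, &fl));
#elif defined(__linux__)
	/* Linux: PUNCH_HOLE must be combined with KEEP_SIZE. */
	return (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
	    off, len));
#else
	/* No hole-punching interface known to this sketch. */
	(void)fd;
	errno = ENOTSUP;
	return (-1);
#endif
}

/*
 * Write 64 KiB of 0xAB, punch out 16 KiB in the middle, then check that
 * the file size is unchanged and that the hole reads back as zeros.
 * Returns 0 if verified, 1 if hole-punching is unsupported here,
 * and -1 on any other error.
 */
int
punch_hole_selftest(const char *path)
{
	enum { FILE_SIZE = 65536, HOLE_OFF = 16384, HOLE_LEN = 16384 };
	static unsigned char buf[FILE_SIZE];
	int fd, rv = -1;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	memset(buf, 0xAB, sizeof(buf));
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		goto out;
	if (punch_hole(fd, HOLE_OFF, HOLE_LEN) != 0) {
		if (errno == ENOTSUP || errno == EOPNOTSUPP || errno == ENOSYS)
			rv = 1;
		goto out;
	}
	if (lseek(fd, 0, SEEK_END) != FILE_SIZE)
		goto out;
	if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;
	rv = 0;
	for (size_t i = 0; i < FILE_SIZE; i++) {
		unsigned char want =
		    (i >= HOLE_OFF && i < HOLE_OFF + HOLE_LEN) ? 0 : 0xAB;
		if (buf[i] != want) {
			rv = -1;
			break;
		}
	}
out:
	close(fd);
	unlink(path);
	return (rv);
}
```

Calling punch_hole_selftest() with a scratch path on the filesystem of interest gives a quick indication of whether that filesystem supports hole-punching through whichever interface the build selected. Under the proposal, the same F_FREESP call on a device node would presumably turn into a BIO_DELETE, though that part cannot be exercised by this sketch.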