Date: Mon, 16 Jan 2017 13:15:36 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Aijaz Baig
Cc: Jan Bramkamp, freebsd-scsi@freebsd.org
Subject: Re: Understanding the rationale behind dropping of "block devices"

On Mon, Jan 16, 2017 at 04:19:59PM +0530, Aijaz Baig wrote:
> I must add that I am getting confused specifically between two different
> things here:
> From the replies above it appears that all disk accesses have to go through
> the VM subsystem now (so no raw disk accesses); however, the arch handbook
> says raw interfaces are the way to go for disks (
> https://www.freebsd.org/doc/en/books/arch-handbook/driverbasics-block.html)?

Do not mix up the concept of raw disk access with the use of some VM code
to implement that access.  See my other reply for more explanation of raw
disk access; in kernel source terminology this is physio, see
sys/kern/kern_physio.c.  (A tiny userland sketch of this path follows
below.)

> Secondly, I presume that the VM subsystem has its own caching and
> buffering mechanism that is independent of the file system, so an IO can
> choose to skip buffering at the file-system layer; however, it will still
> be served by the VM cache irrespective of whatever the VM object maps to.
> Is that true? I believe this is what is meant by 'caching' at the VM layer.

First, the term 'page cache' has a different meaning in the kernel code,
and that page cache was removed from the kernel very recently.  A more
correct, but much longer, term is 'page queue of the vm object'.
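Returning to the first question, a minimal userland sketch of the raw
(physio) path could look like the following.  The device name /dev/ada0
and the 512-byte sector size are placeholders, not something taken from
this thread; raw transfers should be sector-sized (and it is safest to
keep the buffer sector-aligned as well), since they bypass the buffer
cache entirely:

/*
 * Illustration only: read one sector directly from a disk character
 * device.  The kernel services this read via physio
 * (sys/kern/kern_physio.c), not through the buffer cache.
 */
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	const size_t secsize = 512;	/* assumed sector size */
	void *buf;
	ssize_t n;
	int fd;

	fd = open("/dev/ada0", O_RDONLY);	/* placeholder device name */
	if (fd == -1)
		err(1, "open");
	/* Sector-aligned buffer, to be safe for raw transfers. */
	if (posix_memalign(&buf, secsize, secsize) != 0)
		errx(1, "posix_memalign");
	n = read(fd, buf, secsize);		/* handled by physio */
	if (n == -1)
		err(1, "read");
	printf("read %zd bytes from the raw device\n", n);
	free(buf);
	close(fd);
	return (0);
}

Reads and writes on such a disk character device are turned into bio
requests by physio and handed to the storage layer directly, with no
filesystem or buffer cache in between.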
If a given vnode has a vm object associated with it, the buffer cache
ensures that buffers for a given chunk of the vnode data range are built
from the appropriately indexed pages of that object's page queue.  This
way the buffer cache stays consistent with the page queue.  The vm object
is created on the first open of the vnode by filesystem-specific code, at
least for UFS/devfs/msdosfs/cd9660 etc.  The caching policy for buffers is
determined both by the buffer cache and by (somewhat strong) hints from
the filesystems interfacing with the cache.  The pages constituting a
buffer are wired, i.e. the buffer cache tells the VM subsystem not to
reclaim those pages while the buffer is alive.

VM page caching, i.e. keeping pages in the vnode page queue, is only
independent of the buffer cache when the VM must (or can) handle something
that does not involve the buffer cache.  E.g. on a page fault in a region
backed by a file, the VM allocates the necessary fresh pages (i.e. pages
without valid content) and issues a read request to the filesystem which
owns the vnode.  It is up to the filesystem to implement that read in any
reasonable way.

Until recently, UFS and other local filesystems provided raw disk block
indexes to the generic VM code, which then read the content of those disk
blocks into the pages.  This had its own share of problems (but not the
consistency issue, since the pages are allocated in the vnode's vm object
page queue).  I changed that path to go through the buffer cache
explicitly several months ago.  But all of this is highly dependent on the
filesystem.

At the opposite pole, tmpfs reuses the swap-backed object which holds the
file data as the vnode's vm object.  As a result, paging requests for a
mapped tmpfs file are handled as if it were swap-backed anonymous memory.

ZFS cannot reuse the vm object page queue for its own very special cache,
the ARC.  So it keeps writes and mmaps consistent by copying the data on
write(2) both into the ARC buffer and into the pages of the vm object.

Hope this helps.
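As a concrete illustration of the write(2)/mmap(2) consistency described
above, here is a minimal sketch; the file name, the assumed 4096-byte page
size and the message written are arbitrary, and error handling is kept
minimal:

#include <sys/mman.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	char *p;
	int fd;

	fd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		err(1, "open");
	/* Give the file one page of backing store (4096 assumed). */
	if (ftruncate(fd, 4096) == -1)
		err(1, "ftruncate");
	/* Map the pages of the vnode's vm object. */
	p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		err(1, "mmap");
	/* Write through the buffer cache... */
	if (pwrite(fd, "hello", 5, 0) != 5)
		err(1, "pwrite");
	/* ...and observe the same bytes through the mapping. */
	printf("%.5s\n", p);	/* prints "hello" */
	munmap(p, 4096);
	close(fd);
	return (0);
}

The pwrite() goes through the buffer cache, but because the buffers are
constructed from the pages of the vnode's vm object, the shared mapping
observes the new bytes immediately.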