Date: Fri, 27 Jan 2017 10:25:32 +0800
From: Julian Elischer <julian@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: Aijaz Baig <aijazbaig1@gmail.com>, "Greg 'groggy' Lehey" <grog@freebsd.org>,
    FreeBSD Hackers <freebsd-hackers@freebsd.org>, freebsd-scsi@freebsd.org
Subject: Re: Understanding the rationale behind dropping of "block devices"
Message-ID: <7cf12959-5c1e-2be8-5974-69a96f2cd9d7@freebsd.org>
In-Reply-To: <20170116110009.GN2349@kib.kiev.ua>
References: <CAHB2L%2BdRbX=E9NxGLd_eHsEeD0ZVYDYAx2k9h17BR0Lc=xu5HA@mail.gmail.com>
    <20170116071105.GB4560@eureka.lemis.com>
    <CAHB2L%2Bd9=rBBo48qR%2BPXgy%2BJDa=VRk5cM%2B9hAKDCPW%2BrqFgZAQ@mail.gmail.com>
    <a86ad6f5-954d-62f0-fdb3-9480a13dc1c3@freebsd.org>
    <20170116110009.GN2349@kib.kiev.ua>
On 16/1/17 7:00 pm, Konstantin Belousov wrote:
> On Mon, Jan 16, 2017 at 05:20:25PM +0800, Julian Elischer wrote:
>> On 16/01/2017 4:49 PM, Aijaz Baig wrote:
>>> Oh yes I was actually running an old release inside a VM and yes I had
>>> changed the device names myself while jotting down notes (to give it a more
>>> descriptive name like what the OSX does). So now I've checked it on a
>>> recent release and yes there is indeed no block device.
>>>
>>> root@bsd-client:/dev # gpart show
>>> =>        34  83886013  da0  GPT  (40G)
>>>           34      1024    1  freebsd-boot  (512K)
>>>         1058  58719232    2  freebsd-ufs  (28G)
>>>     58720290   3145728    3  freebsd-swap  (1.5G)
>>>     61866018  22020029       - free -  (10G)
>>>
>>> root@bsd-client:/dev # ls -lrt da*
>>> crw-r-----  1 root  operator  0x4d Dec 19 17:49 da0p1
>>> crw-r-----  1 root  operator  0x4b Dec 19 17:49 da0
>>> crw-r-----  1 root  operator  0x4f Dec 19 23:19 da0p3
>>> crw-r-----  1 root  operator  0x4e Dec 19 23:19 da0p2
>>>
>>> So this shows that I have a single SATA or SAS drive and there are
>>> apparently 3 partitions (or is it four? Why does it show unused space
>>> when I had used the entire disk?)
>>>
>>> Nevertheless my question still holds. What does 'removing support for block
>>> devices' mean in this context? Was what I mentioned earlier with regards to
>>> my understanding correct? Viz. all disk devices now have a character (or
>>> raw) interface and are no longer served via the "page cache" but rather the
>>> "buffer cache". Does that mean all disk accesses are now direct, bypassing
>>> the file system?
>>
>> Basically, FreeBSD never really buffered/cached by device.
>>
>> Buffering and caching is done by vnode in the filesystem.
>> We have no device-based block cache. If you want file X at offset Y,
>> then we can satisfy that from cache.
>> VM objects map closely to vnode objects, so the VM system IS the file
>> system buffer cache.
>
> This is not true.
>
> We do have a buffer cache of the blocks read through the device (special)
> vnode.
> This is how, typically, the metadata for filesystems which are
> clients of the buffer cache is handled, i.e. UFS, msdosfs, cd9660 etc.
> It is up to the filesystem to not create aliased cached copies of the
> blocks both in the device vnode buffer list and in the filesystem vnode.
>
> In fact, sometimes filesystems, e.g. UFS, consciously break this rule
> and read blocks of the user vnode through the disk cache. For instance,
> this happens for the SU truncation of the indirect blocks.

Yes, this caches blocks as an offset into a device, but it is still
really a part of the system which provides caching services to vnodes.
(At least that is how it was last time I looked.)

>
>> If you want device M, at offset N we will fetch it for you from the
>> device, DMA'd directly into your address space,
>> but there is no cached copy.
>> Having said that, it would be trivial to add a 'caching' geom layer to
>> the system but that has never been needed.
>
> The useful interpretation of the claim that FreeBSD does not cache
> disk blocks is that the cache is not accessible over the user-initiated
> i/o (read(2) and write(2)) through the opened devfs nodes. If a program
> issues such a request, it indeed goes directly to/from the disk driver,
> which is supplied a kernel buffer formed by remapped user pages. Note that
> if this device was or is mounted and the filesystem kept some metadata in
> the buffer cache, then the devfs i/o would make the cache inconsistent.
>
>> The added complexity of carrying around two alternate interfaces to
>> the same devices was judged by those who did the work to be not worth
>> the small gain available to the very few people who used raw devices.
>> Interestingly, since that time ZFS has implemented a block-layer cache
>> for itself which is of course not integrated with the non-existent
>> block-level cache in the system :-).
>
> We do carry two interfaces in the cdev drivers, which are lumped into
> one.
> In particular, it is not easy to implement mapping of the block
> devices exactly because the interfaces are mixed. If a cdev disk device
> is mapped, VM would try to use the cdevsw d_mmap or later mapping
> interfaces to handle user page faults, which is incorrect for the
> purpose of disk block mapping.
>