Date:      Fri, 27 Jan 2017 10:25:32 +0800
From:      Julian Elischer <julian@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Aijaz Baig <aijazbaig1@gmail.com>, "Greg 'groggy' Lehey" <grog@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, freebsd-scsi@freebsd.org
Subject:   Re: Understanding the rationale behind dropping of "block devices"
Message-ID:  <7cf12959-5c1e-2be8-5974-69a96f2cd9d7@freebsd.org>
In-Reply-To: <20170116110009.GN2349@kib.kiev.ua>
References:  <CAHB2L+dRbX=E9NxGLd_eHsEeD0ZVYDYAx2k9h17BR0Lc=xu5HA@mail.gmail.com> <20170116071105.GB4560@eureka.lemis.com> <CAHB2L+d9=rBBo48qR+PXgy+JDa=VRk5cM+9hAKDCPW+rqFgZAQ@mail.gmail.com> <a86ad6f5-954d-62f0-fdb3-9480a13dc1c3@freebsd.org> <20170116110009.GN2349@kib.kiev.ua>

On 16/1/17 7:00 pm, Konstantin Belousov wrote:
> On Mon, Jan 16, 2017 at 05:20:25PM +0800, Julian Elischer wrote:
>> On 16/01/2017 4:49 PM, Aijaz Baig wrote:
>>> Oh yes I was actually running an old release inside a VM and yes I had
>>> changed the device names myself while jotting down notes (to give it a more
>>> descriptive name like what the OSX does). So now I've checked it on a
>>> recent release and yes there is indeed no block device.
>>>
>>> root@bsd-client:/dev # gpart show
>>> =>      34  83886013  da0  GPT  (40G)
>>>           34      1024    1  freebsd-boot  (512K)
>>>         1058  58719232    2  freebsd-ufs  (28G)
>>>     58720290   3145728    3  freebsd-swap  (1.5G)
>>>     61866018  22020029       - free -  (10G)
>>>
>>> root@bsd-client:/dev # ls -lrt da*
>>> crw-r-----  1 root  operator  0x4d Dec 19 17:49 da0p1
>>> crw-r-----  1 root  operator  0x4b Dec 19 17:49 da0
>>> crw-r-----  1 root  operator  0x4f Dec 19 23:19 da0p3
>>> crw-r-----  1 root  operator  0x4e Dec 19 23:19 da0p2
>>>
>>> So this shows that I have a single SATA or SAS drive and there are
>>> apparently 3 partitions (or is it four? And why does it show unused
>>> space when I had used the entire disk?)
>>>
>>> Nevertheless my question still holds. What does 'removing support for
>>> block devices' mean in this context? Was my earlier understanding
>>> correct, viz. that all disk devices now have only a character (or raw)
>>> interface and are no longer served via the "page cache" but rather the
>>> "buffer cache"? Does that mean all disk accesses now go directly to the
>>> device, bypassing the file system?
>> Basically, FreeBSD never really buffered/cached by device.
>>
>> Buffering and caching are done per vnode in the filesystem.
>> We have no device-based block cache.  If you want file X at offset Y,
>> then we can satisfy that from cache.
>> VM objects map closely to vnode objects so the VM system IS the file
>> system buffer cache.
> This is not true.
>
> We do have a buffer cache of blocks read through the device (special)
> vnode.  This is typically how metadata is handled for filesystems that
> are clients of the buffer cache, e.g. UFS, msdosfs, cd9660, etc.
> It is up to the filesystem not to create aliased cached copies of the
> same blocks both in the device vnode's buffer list and in the
> filesystem vnode's.
>
> In fact, sometimes filesystems, e.g. UFS, consciously break this rule
> and read blocks of a user vnode through the disk cache.  For instance,
> this happens for the SU (soft updates) truncation of indirect blocks.
Yes, this caches blocks at offsets into a device, but it is still really
part of the system which provides caching services to vnodes
(at least that is how it was the last time I looked).
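
To make that concrete, here is a minimal kernel-context sketch (it only
builds as part of a kernel or module, and the function and variable names
are illustrative rather than lifted from UFS) of how a buffer-cache client
filesystem reads a block of metadata through its device vnode with bread(9):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/buf.h>
#include <sys/ucred.h>
#include <sys/vnode.h>

/*
 * Illustrative only: read one metadata block through the buffer cache
 * attached to the filesystem's device vnode ("devvp", saved at mount
 * time).  bread(9) returns a cached buffer on a hit and only goes down
 * to the disk driver on a miss.
 */
static int
read_metadata_block(struct vnode *devvp, daddr_t blkno, int size, void *dst)
{
        struct buf *bp;
        int error;

        error = bread(devvp, blkno, size, NOCRED, &bp);
        if (error != 0)
                return (error);
        bcopy(bp->b_data, dst, size);
        brelse(bp);             /* hand the buffer back to the cache */
        return (0);
}

The cache here is keyed by (device vnode, block number), which is exactly
the aliasing hazard mentioned above when the same data is also cached
under a regular file vnode.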
>
>> If you want device M at offset N, we will fetch it for you from the
>> device, DMA'd directly into your address space,
>> but there is no cached copy.
>> Having said that, it would be trivial to add a 'caching' geom layer to
>> the system, but that has never been needed.
> The useful interpretation of the claim that FreeBSD does not cache
> disk blocks is that the cache is not accessible via user-initiated
> i/o (read(2) and write(2)) through the opened devfs nodes.  If a program
> issues such a request, it indeed goes directly to/from the disk driver,
> which is supplied a kernel buffer formed from remapped user pages.  Note
> that if the device was or is mounted and the filesystem kept some
> metadata in the buffer cache, then the devfs i/o would make the cache
> inconsistent.
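
For contrast, the "uncached" path looks roughly like this from userland.
A minimal sketch (the device path is only an example, and the offset and
length must be multiples of the sector size for raw disk i/o) that reads a
single sector straight from the disk character device:

#include <sys/types.h>
#include <sys/disk.h>           /* DIOCGSECTORSIZE */
#include <sys/ioctl.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
        const char *dev = "/dev/da0";   /* example device node */
        u_int secsize;
        char *buf;
        ssize_t n;
        int fd;

        if ((fd = open(dev, O_RDONLY)) == -1)
                err(1, "open %s", dev);
        if (ioctl(fd, DIOCGSECTORSIZE, &secsize) == -1)
                err(1, "DIOCGSECTORSIZE");

        /* A sector-sized, sector-aligned buffer keeps physio happy. */
        if ((buf = aligned_alloc(secsize, secsize)) == NULL)
                err(1, "aligned_alloc");

        /* Read sector 0; nothing is retained in any kernel cache. */
        if ((n = pread(fd, buf, secsize, 0)) == -1)
                err(1, "pread");

        printf("read %zd bytes from %s (sector size %u)\n", n, dev, secsize);
        free(buf);
        close(fd);
        return (0);
}

And, as noted above, doing this against a device that is currently mounted
can disagree with metadata the filesystem is holding in the buffer cache.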
>
>> The added complexity of carrying around two alternate interfaces to
>> the same devices was judged by those who did the work to be not worth
>> the small gain available to the very few people who used raw devices.
>> Interestingly, since that time ZFS has implemented a block-layer cache
>> for itself, which is of course not integrated with the non-existent
>> block-level cache in the system :-).
> We do carry two interfaces in the cdev drivers, which are lumped into
> one.  In particular, it is not easy to implement mapping of block
> devices precisely because the interfaces are mixed.  If a cdev disk
> device is mapped, the VM would try to use the cdevsw d_mmap or the newer
> mapping interfaces to handle user page faults, which is incorrect for
> the purpose of disk block mapping.
>
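
For completeness, a tiny sketch of the mapping case being described: on a
disk cdev that provides no usable mapping interface the mmap(2) below is
expected to fail (again, the device path is only an example):

#include <sys/mman.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
        const char *dev = "/dev/da0";   /* example device node */
        size_t len = 65536;
        void *p;
        int fd;

        if ((fd = open(dev, O_RDONLY)) == -1)
                err(1, "open %s", dev);

        p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
                warn("mmap of %s failed (expected for a disk cdev)", dev);
        else {
                printf("mmap of %s unexpectedly succeeded\n", dev);
                munmap(p, len);
        }
        close(fd);
        return (0);
}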



