From owner-freebsd-scsi@freebsd.org Mon Jan 16 02:40:20 2017
From: Aijaz Baig <aijazbaig1@gmail.com>
Date: Mon, 16 Jan 2017 08:10:16 +0530
Subject: Understanding the rationale behind dropping of "block devices"
To: FreeBSD Hackers, freebsd-scsi@freebsd.org

I am a relative noob to the storage world in general and FreeBSD in
particular. From what I have been learning of late, I have become
somewhat familiar with concepts like disk queuing, IOPS, latencies and
the like. I am also reading the classic 'The Design and Implementation
of the FreeBSD Operating System'. From what I am reading in that book
and elsewhere, it appears that FreeBSD has done away with "block
devices" altogether: they have been dropped from the architecture. With
my (still) very limited knowledge of storage, I understand this to mean
that there are no drivers in FreeBSD that deal with blocks of data.
But when I check the disk nodes under /dev I get this:

    ls -l /dev/*disk0
    brw-r-----  1 root  operator   14,   0 Jan  2 09:39 /dev/disk0
    crw-r-----  1 root  operator   14,   0 Jan  2 09:39 /dev/rdisk0

where 'b' means block interface and 'c' means char (raw) interface. How
do I reconcile this with what I read about block devices being gone?
What does 'block' mean here? From what I know, a block device would be
served through the "page cache" (a place where the file system caches
its data and metadata), whereas the raw device would be served via the
"buffer cache", where disk blocks are cached by the OS. Thus a block
device would be served via the file system whereas the raw device would
not. Is this correct? If yes, then what does 'block' above signify? Or,
rephrasing the question: what was there in FreeBSD before 'block device
support' was dropped?

I am sure seasoned storage veterans have a lot more to add. I would be
much obliged if someone could elaborate and add more context.

--
Best Regards,
Aijaz Baig

From owner-freebsd-scsi@freebsd.org Mon Jan 16 07:20:33 2017
From: Greg 'groggy' Lehey <grog@lemis.com>
Date: Mon, 16 Jan 2017 18:11:05 +1100
Subject: Re: Understanding the rationale behind dropping of "block devices"
To: Aijaz Baig
Cc: FreeBSD Hackers, freebsd-scsi@freebsd.org

On Monday, 16 January 2017 at 8:10:16 +0530, Aijaz Baig wrote:
>
> But when I check the disk nodes under /dev I get this
>     ls -l /dev/*disk0
>     brw-r-----  1 root  operator   14,   0 Jan  2 09:39 /dev/disk0
>     crw-r-----  1 root  operator   14,   0 Jan  2 09:39 /dev/rdisk0

Are you sure that this is FreeBSD?  The naming convention looks more
like Mac OS, though the major device number doesn't match.  FreeBSD has
been through a number of disk naming conventions, but I'm pretty sure
that we never had anything as straightforward as 'disk'.

> what was there earlier in FreeBSD before 'block device support' was
> dropped?

Apart from the name, things used to look similar.
Here's a quote from "The Complete FreeBSD", written some time at the end
of the last century:

    crw-r-----  1 root  operator    3, 131072 Oct 31 19:59 /dev/rwd0s1a
    brw-r-----  1 root  operator    0, 131072 Oct 31 19:59 /dev/wd0s1a

The minor number included the partition encoding, thus the large number.

Greg
--
Sent from my desktop computer.
Finger grog@FreeBSD.org for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA
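A short aside on that minor number, as a hedged sketch: the decoding
below follows the old dkmakeminor() convention from 2.x-era FreeBSD
(partition in the low bits, unit above it, slice in bits 16 and up);
the exact bit positions are from memory and should be treated as an
assumption, not a reference.

    /*
     * Decode an old-style FreeBSD disk minor number.
     * Sketch only: the bit layout below is an assumption based on the
     * 2.x-era dkmakeminor() macro; verify against a real disklabel.h.
     */
    #include <stdio.h>

    int
    main(void)
    {
        unsigned minor = 131072;            /* from /dev/wd0s1a above */
        unsigned part  = minor & 0x7;       /* 0 -> partition 'a' */
        unsigned unit  = (minor >> 3) & 0x1f;
        unsigned slice = minor >> 16;       /* 2 -> "s1": slice numbers
                                               were offset by one, with
                                               slice 0 reserved for
                                               compatibility */

        printf("unit %u, slice field %u, partition '%c'\n",
            unit, slice, 'a' + part);       /* 0, 2, 'a' => wd0s1a */
        return 0;
    }

Under those assumptions, 131072 (0x20000) is just "slice field 2, unit
0, partition a", which matches the wd0s1a name in Greg's listing.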
From owner-freebsd-scsi@freebsd.org Mon Jan 16 08:49:21 2017
From: Aijaz Baig <aijazbaig1@gmail.com>
Date: Mon, 16 Jan 2017 14:19:18 +0530
Subject: Re: Understanding the rationale behind dropping of "block devices"
To: Greg 'groggy' Lehey
Cc: FreeBSD Hackers, freebsd-scsi@freebsd.org

Oh yes, I was actually running an old release inside a VM, and I had
changed the device names myself while jotting down notes (to give them
more descriptive names, like OS X does). I have now checked on a recent
release, and there is indeed no block device:

    root@bsd-client:/dev # gpart show
    =>      34  83886013  da0  GPT  (40G)
            34      1024    1  freebsd-boot  (512K)
          1058  58719232    2  freebsd-ufs  (28G)
      58720290   3145728    3  freebsd-swap  (1.5G)
      61866018  22020029       - free -  (10G)

    root@bsd-client:/dev # ls -lrt da*
    crw-r-----  1 root  operator  0x4d Dec 19 17:49 da0p1
    crw-r-----  1 root  operator  0x4b Dec 19 17:49 da0
    crw-r-----  1 root  operator  0x4f Dec 19 23:19 da0p3
    crw-r-----  1 root  operator  0x4e Dec 19 23:19 da0p2

So this shows that I have a single SATA or SAS drive with apparently
three partitions (or is it four? And why does it show unused space when
I had used the entire disk?).

Nevertheless my question still holds. What does 'removing support for
block devices' mean in this context? Was my earlier understanding
correct, viz. that all disk devices now have a character (raw) interface
and are no longer served via the "page cache" but rather the "buffer
cache"? Does that mean all disk accesses are now direct, bypassing the
file system?

--
Best Regards,
Aijaz Baig
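An easy way to confirm the char-only world from a program: a minimal
sketch using stat(2) (the device path /dev/da0 is an assumption; any
node under /dev works).

    /*
     * Check whether a device node is a character or a block device.
     * Minimal sketch; /dev/da0 is an assumed path, substitute your own.
     */
    #include <sys/stat.h>
    #include <stdio.h>

    int
    main(void)
    {
        struct stat sb;

        if (stat("/dev/da0", &sb) == -1) {
            perror("stat");
            return 1;
        }
        if (S_ISCHR(sb.st_mode))
            printf("character device\n");  /* what modern FreeBSD shows */
        else if (S_ISBLK(sb.st_mode))
            printf("block device\n");      /* no longer seen for disks */
        return 0;
    }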
From owner-freebsd-scsi@freebsd.org Mon Jan 16 09:20:41 2017
From: Julian Elischer <julian@freebsd.org>
Date: Mon, 16 Jan 2017 17:20:25 +0800
Subject: Re: Understanding the rationale behind dropping of "block devices"
To: Aijaz Baig, Greg 'groggy' Lehey
Cc: FreeBSD Hackers, freebsd-scsi@freebsd.org

On 16/01/2017 4:49 PM, Aijaz Baig wrote:
> [...]
> Nevertheless my question still holds. What does 'removing support for
> block devices' mean in this context? [...] Does that mean all disk
> accesses are now direct, bypassing the file system?

Basically, FreeBSD never really buffered/cached by device.

Buffering and caching is done by vnode in the filesystem.
We have no device-based block cache.  If you want file X at offset Y,
then we can satisfy that from cache.
VM objects map closely to vnode objects, so the VM system IS the file
system buffer cache.  If you want device M at offset N, we will fetch it
for you from the device, DMA'd directly into your address space, but
there is no cached copy.

Having said that, it would be trivial to add a 'caching' geom layer to
the system, but that has never been needed.

The added complexity of carrying around two alternate interfaces to the
same devices was judged, by those who did the work, to be not worth the
small gain available to the very few people who used raw devices.
Interestingly, since that time ZFS has implemented a block-layer cache
for itself, which is of course not integrated with the non-existent
block-level cache in the system :-).
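To make the uncached path concrete, here is a minimal, hedged sketch of
a direct read(2) from a disk's devfs node (the device path is an
assumption, root privileges are required, and transfers must be
sector-sized and sector-aligned):

    /*
     * Read one sector straight from a disk device node.  This goes
     * through physio: the user buffer is handed to the driver for DMA;
     * no block cache is involved.  /dev/da0 is an assumed device path.
     */
    #include <sys/types.h>
    #include <sys/disk.h>   /* DIOCGSECTORSIZE */
    #include <sys/ioctl.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(void)
    {
        int fd = open("/dev/da0", O_RDONLY);
        if (fd == -1)
            err(1, "open");

        u_int secsize;
        if (ioctl(fd, DIOCGSECTORSIZE, &secsize) == -1)
            err(1, "DIOCGSECTORSIZE");

        void *buf = aligned_alloc(secsize, secsize);
        if (buf == NULL)
            err(1, "aligned_alloc");

        /* Disk devices accept only sector-granular transfers. */
        if (read(fd, buf, secsize) != (ssize_t)secsize)
            err(1, "read");

        printf("read one %u-byte sector, uncached\n", secsize);
        free(buf);
        close(fd);
        return 0;
    }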
From owner-freebsd-scsi@freebsd.org Mon Jan 16 09:31:21 2017
From: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
Date: Mon, 16 Jan 2017 09:31:12 +0000
Subject: Re: Understanding the rationale behind dropping of "block devices"
To: Julian Elischer
Cc: Aijaz Baig, Greg 'groggy' Lehey, FreeBSD Hackers, freebsd-scsi@freebsd.org

--------
In message, Julian Elischer writes:

> Having said that, it would be trivial to add a 'caching' geom layer to
> the system but that has never been needed.

A tinker-toy cache like that would be architecturally disgusting.

The right solution would be to enable mmap(2)'ing of disk(-like)
devices, leveraging the VM system's existing code for caching and
optimistic prefetch/clustering, including the very primitive
cache-control/visibility offered by madvise(2), mincore(2), mprotect(2),
msync(2) etc.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
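FreeBSD does not let you mmap(2) a disk device today; that is exactly
what is being proposed here.  As a hedged illustration of the interface
phk has in mind, the same pattern applied to a regular file (the path is
an assumption) looks like this:

    /*
     * Sketch of the mmap(2)-based access pattern proposed for disk
     * devices, demonstrated on a regular file.  Mapping a raw disk
     * this way is NOT currently supported; that is the point under
     * discussion.  /tmp/backing.img is an assumed, pre-existing file.
     */
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <err.h>
    #include <fcntl.h>
    #include <unistd.h>

    int
    main(void)
    {
        int fd = open("/tmp/backing.img", O_RDWR);
        if (fd == -1)
            err(1, "open");

        struct stat sb;
        if (fstat(fd, &sb) == -1)
            err(1, "fstat");

        char *p = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE,
            MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            err(1, "mmap");

        madvise(p, sb.st_size, MADV_SEQUENTIAL); /* hint: prefetch ahead */
        p[0] ^= 0xff;                    /* touch a "block" via the VM cache */
        msync(p, sb.st_size, MS_SYNC);   /* write the dirty page back */

        munmap(p, sb.st_size);
        close(fd);
        return 0;
    }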
From owner-freebsd-scsi@freebsd.org Mon Jan 16 10:26:18 2017
From: Jan Bramkamp <crest@rlwinm.de>
Date: Mon, 16 Jan 2017 11:26:07 +0100
Subject: Re: Understanding the rationale behind dropping of "block devices"
To: freebsd-scsi@freebsd.org

On 16/01/2017 10:31, Poul-Henning Kamp wrote:
> The right solution would be to enable mmap(2)'ing of disk(-like)
> devices, leveraging the VM system's existing code for caching and
> optimistic prefetch/clustering, including the very primitive
> cache-control/visibility offered by madvise(2), mincore(2),
> mprotect(2), msync(2) etc.

Enabling mmap(2) on devices would be nice, but it would also create
problems with revoke(2).  The revoke(2) syscall allows revoking access
to open devices (e.g. a serial console); this is required to securely
log out users.  The existing file descriptors are marked as revoked and
will return EIO on every access.  How would you implement gracefully
revoking mapped device memory?  Killing all those processes with
SIGBUS/SIGSEGV would keep the system secure, but it would be far from
elegant.
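For readers who haven't met it: revoke(2) is a real FreeBSD syscall,
int revoke(const char *path).  A minimal sketch of the descriptor
revocation Jan describes (the tty path is an assumption and appropriate
privileges are required; the exact post-revoke behaviour is worth
verifying against revoke(2)):

    /*
     * Demonstrate revoke(2): after revocation, previously opened
     * descriptors for the device stop working.  /dev/ttyv1 is an
     * assumed device path.
     */
    #include <err.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        const char *path = "/dev/ttyv1";
        int fd = open(path, O_RDWR | O_NONBLOCK);
        if (fd == -1)
            err(1, "open");

        if (revoke(path) == -1)     /* invalidate ALL open descriptors */
            err(1, "revoke");

        char c;
        ssize_t n = read(fd, &c, 1);
        if (n == -1)
            printf("read after revoke failed: %s\n", strerror(errno));
        else
            printf("read after revoke returned %zd\n", n);
        /* A descriptor can fail cleanly like this; a memory mapping has
         * no equally graceful failure mode, which is Jan's point. */
        close(fd);
        return 0;
    }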
From owner-freebsd-scsi@freebsd.org Mon Jan 16 10:39:07 2017
From: Aijaz Baig <aijazbaig1@gmail.com>
Date: Mon, 16 Jan 2017 16:09:03 +0530
Subject: Re: Understanding the rationale behind dropping of "block devices"
To: Jan Bramkamp
Cc: freebsd-scsi@freebsd.org

Oh, thank you everyone for clearing the air a bit. Although for a noob
like myself, that was mighty concise!

Nevertheless, let me re-iterate what was summarized in the last two
mails so I know I understood exactly what was being said.

Let me begin by saying that I come from the Linux world, where there
have traditionally been two separate caches, the "buffer cache" and the
"page cache", although almost all IO is now driven through the "page
cache".
The buffer cache still remains, however it now only caches disk blocks
(https://www.quora.com/What-is-the-difference-between-Buffers-and-Cached-columns-in-proc-meminfo-output).
So 'read' and 'write' were satisfied through the buffer cache, whereas
'fwrite/fread' and 'mmap' went through the page cache (which was
actually populated by reading the buffer cache, thereby wasting almost
twice the memory and compute cycles). Hence the merging.

Nevertheless, as Julian mentioned, it appears that there is no "buffer
cache" so to speak (is that correct, Julian?):

> If you want device M, at offset N we will fetch it for you from the
> device, DMA'd directly into your address space, but there is no cached
> copy.

Instead it appears FreeBSD has a generic 'VM object' that is used to
represent myriad entities including disks, and as such all operations
now go through the VM subsystem. Does that also mean that there is no
way an application can directly use raw disks? At least it appears so:

> The added complexity of carrying around two alternate interfaces to
> the same devices was judged by those who did the work to be not worth
> the small gain available to the very few people who used raw devices.

Thank you for all your inputs, and waiting to hear more! Although a bit
more context would really help noobs (both to enterprise storage and
FreeBSD) like me!
--
Best Regards,
Aijaz Baig

From owner-freebsd-scsi@freebsd.org Mon Jan 16 10:50:02 2017
From: Aijaz Baig <aijazbaig1@gmail.com>
Date: Mon, 16 Jan 2017 16:19:59 +0530
Subject: Re: Understanding the rationale behind dropping of "block devices"
To: Jan Bramkamp
Cc: freebsd-scsi@freebsd.org

I must add that I am getting confused specifically between two different
things here:

From the replies above it appears that all disk accesses have to go
through the VM subsystem now (so no raw disk accesses), yet the
architecture handbook says raw interfaces are the way to go for disks
(https://www.freebsd.org/doc/en/books/arch-handbook/driverbasics-block.html)?

Secondly, I presume that the VM subsystem has its own caching and
buffering mechanism that is independent of the file system, so an IO can
choose to skip buffering at the file-system layer but will still be
served by the VM cache, irrespective of whatever the VM object maps to.
Is that true? I believe this is what is meant by 'caching' at the VM
layer. Any comments?
--
Best Regards,
Aijaz Baig

From owner-freebsd-scsi@freebsd.org Mon Jan 16 11:00:16 2017
From: Konstantin Belousov <kostikbel@gmail.com>
Date: Mon, 16 Jan 2017 13:00:09 +0200
Subject: Re: Understanding the rationale behind dropping of "block devices"
To: Julian Elischer
Cc: Aijaz Baig, Greg 'groggy' Lehey, FreeBSD Hackers, freebsd-scsi@freebsd.org

On Mon, Jan 16, 2017 at 05:20:25PM +0800, Julian Elischer wrote:
> Basically, FreeBSD never really buffered/cached by device.
>
> Buffering and caching is done by vnode in the filesystem.
> We have no device-based block cache.  If you want file X at offset Y,
> then we can satisfy that from cache.
> VM objects map closely to vnode objects so the VM system IS the file
> system buffer cache.

This is not true.

We do have a buffer cache of the blocks read through the device
(special) vnode.  This is how, typically, the metadata for filesystems
which are clients of the buffer cache is handled, i.e. UFS, msdosfs,
cd9660 etc.  It is up to the filesystem to not create aliased cached
copies of the blocks, both in the device vnode buffer list and in the
filesystem vnode.

In fact, sometimes filesystems, e.g. UFS, consciously break this rule
and read blocks of the user vnode through the disk cache.  For instance,
this happens for the SU truncation of the indirect blocks.

> If you want device M, at offset N we will fetch it for you from the
> device, DMA'd directly into your address space, but there is no cached
> copy.
> Having said that, it would be trivial to add a 'caching' geom layer to
> the system but that has never been needed.

The useful interpretation of the claim that FreeBSD does not cache disk
blocks is that the cache is not accessible over the user-initiated i/o
(read(2) and write(2)) through the opened devfs nodes.  If a program
issues such a request, it indeed goes directly to/from the disk driver,
which is supplied a kernel buffer formed by remapped user pages.

Note that if this device was or is mounted and the filesystem kept some
metadata in the buffer cache, then the devfs i/o would make the cache
inconsistent.

> The added complexity of carrying around two alternate interfaces to
> the same devices was judged by those who did the work to be not worth
> the small gain available to the very few people who used raw devices.
> Interestingly, since that time ZFS has implemented a block-layer cache
> for itself which is of course not integrated with the non-existing
> block level cache in the system :-).

We do carry two interfaces in the cdev drivers, which are lumped into
one.  In particular, it is not easy to implement mapping of the block
devices exactly because the interfaces are mixed.  If a cdev disk device
is mapped, VM would try to use cdevsw d_mmap or later mapping interfaces
to handle user page faults, which is incorrect for the purpose of disk
block mapping.
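A small, hedged way to observe the physio path described above: disk
devfs nodes accept only sector-granular transfers, so a one-byte read(2)
fails while a sector-sized one succeeds.  The device path, the 512-byte
sector size, and the expected errno (EINVAL) are all assumptions to
verify on your own system.

    /*
     * Observe the raw (physio) i/o constraints on a disk devfs node:
     * transfers must be sector-granular.  /dev/da0 and 512-byte
     * sectors are assumptions.
     */
    #include <sys/types.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        int fd = open("/dev/da0", O_RDONLY);
        if (fd == -1) {
            perror("open");
            return 1;
        }

        char onebyte;
        if (read(fd, &onebyte, 1) == -1)        /* sub-sector read */
            printf("1-byte read failed: %s\n", strerror(errno));

        void *sector = aligned_alloc(512, 512); /* assumed sector size */
        if (sector != NULL && pread(fd, sector, 512, 0) == 512)
            printf("sector-sized read went straight to the driver\n");

        free(sector);
        close(fd);
        return 0;
    }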
From owner-freebsd-scsi@freebsd.org Mon Jan 16 11:04:48 2017
From: Eugene Grosbein <eugen@grosbein.net>
Date: Mon, 16 Jan 2017 18:04:28 +0700
Subject: Re: Understanding the rationale behind dropping of "block devices"
To: Julian Elischer, Aijaz Baig, Greg 'groggy' Lehey
Cc: FreeBSD Hackers, freebsd-scsi@freebsd.org

16.01.2017 16:20, Julian Elischer wrote:

> If you want device M, at offset N we will fetch it for you from the
> device, DMA'd directly into your address space, but there is no cached
> copy.  Having said that, it would be trivial to add a 'caching' geom
> layer to the system but that has never been needed.

In fact, FreeBSD has had geom_cache/gcache(8) for a long time.  It is a
block-level read cache that passes write requests through transparently.
It is unmaintained, though, and there have been reports that it is
suspected of causing kernel panics when more than one active GEOM_CACHE
instance exists in a system.
From owner-freebsd-scsi@freebsd.org Mon Jan 16 11:15:43 2017
From: Konstantin Belousov <kostikbel@gmail.com>
Date: Mon, 16 Jan 2017 13:15:36 +0200
Subject: Re: Understanding the rationale behind dropping of "block devices"
To: Aijaz Baig
Cc: Jan Bramkamp, freebsd-scsi@freebsd.org

On Mon, Jan 16, 2017 at 04:19:59PM +0530, Aijaz Baig wrote:
> I must add that I am getting confused specifically between two
> different things here:
> From the replies above it appears that all disk accesses have to go
> through the VM subsystem now (so no raw disk accesses) however the
> arch handbook says raw interfaces are the way to go for disks
> (https://www.freebsd.org/doc/en/books/arch-handbook/driverbasics-block.html)?

Do not mix the concept of raw disk access and using some VM code to
implement this access.  See my other reply for some more explanation of
raw disk access; physio, in the terminology of the kernel source files,
sys/kern/kern_physio.c.

> Secondly, I presume that the VM subsystem has its own caching and
> buffering mechanism that is independent of the file system, so an IO
> can choose to skip buffering at the file-system layer but will still
> be served by the VM cache, irrespective of whatever the VM object maps
> to. Is that true? I believe this is what is meant by 'caching' at the
> VM layer.

First, the term page cache has a different meaning in the kernel code,
and that page cache was removed from the kernel very recently.
The more correct but much longer term is 'page queue of the vm object'.
If a given vnode has a vm object associated with it, then the buffer
cache ensures that buffers for a given chunk of the vnode data range are
created from appropriately indexed pages from the queue.  This way, the
buffer cache becomes consistent with the page queue.  The vm object is
created on the first vnode open by filesystem-specific code, at least
for UFS/devfs/msdosfs/cd9660 etc.

Caching policy for buffers is determined both by the buffer cache and by
(somewhat strong) hints from the filesystems interfacing with the cache.
The pages constituting the buffer are wired, i.e. the VM subsystem is
informed by the buffer cache not to reclaim pages while the buffer is
alive.

VM page caching, i.e. storing pages in the vnode page queue, is only
independent from the buffer cache when VM needs to, and can, handle
something that does not involve the buffer cache.  E.g. on a page fault
in a region backed by a file, VM allocates the necessary fresh (AKA
without valid content) pages and issues a read request into the
filesystem which owns the vnode.  It is up to the filesystem to
implement the read in any reasonable way.

Until recently, UFS and other local filesystems provided raw disk block
indexes to the generic VM code, which then read content from the disk
blocks into the pages.  This had its own share of problems (but not the
consistency issue, since pages are allocated in the vnode vm object page
queue).  I changed that path to go through the buffer cache explicitly
several months ago.

But all this is highly dependent on the filesystem.  As the polar case,
tmpfs reuses the swap-backed object, which holds the file data, as the
vnode's vm object.  The result is that paging requests for a tmpfs
mapped file are handled as if it were swap-backed anonymous memory.

ZFS cannot reuse the vm object page queue for its very special cache,
ARC.  So it keeps the consistency between writes and mmaps by copying
the data on write(2) both into the ARC buffer and into the pages from
the vm object.

Hope this helps.
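The write(2)/mmap coherence described above (the property ZFS has to
emulate by double-copying) can be seen from userland with a minimal
sketch, assuming a scratch file on a unified-buffer-cache filesystem
such as UFS:

    /*
     * Sketch: a MAP_SHARED mapping and read(2)/write(2) on the same
     * file stay coherent because both go through the vnode's vm object
     * page queue.  /tmp/coherence.dat is an assumed scratch path.
     */
    #include <sys/mman.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        int fd = open("/tmp/coherence.dat",
            O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd == -1)
            err(1, "open");
        if (ftruncate(fd, 4096) == -1)
            err(1, "ftruncate");

        char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
            MAP_SHARED, fd, 0);
        if (map == MAP_FAILED)
            err(1, "mmap");

        map[0] = 'Z';                   /* store through the mapping... */

        char c;
        if (pread(fd, &c, 1, 0) != 1)   /* ...read back through the fd */
            err(1, "pread");
        printf("read(2) sees '%c' written via mmap\n", c);

        munmap(map, 4096);
        close(fd);
        return 0;
    }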
From owner-freebsd-scsi@freebsd.org Tue Jan 17 11:45:41 2017
From: Aijaz Baig <aijazbaig1@gmail.com>
Date: Tue, 17 Jan 2017 17:15:38 +0530
Subject: Re: Understanding the rationale behind dropping of "block devices"
To: Konstantin Belousov
Cc: Julian Elischer, Greg 'groggy' Lehey, FreeBSD Hackers, freebsd-scsi@freebsd.org

First of all, a very big thank you to Konstantin for such a detailed
reply! It has been a pleasure to go through. My replies are inline.

On Mon, Jan 16, 2017 at 4:30 PM, Konstantin Belousov wrote:

> This is not true.
>
> We do have a buffer cache of the blocks read through the device
> (special) vnode.  This is how, typically, the metadata for filesystems
> which are clients of the buffer cache is handled, i.e. UFS, msdosfs,
> cd9660 etc.
> It is up to the filesystem to not create aliased cached copies of the
> blocks, both in the device vnode buffer list and in the filesystem
> vnode.
>
> In fact, sometimes filesystems, e.g. UFS, consciously break this rule
> and read blocks of the user vnode through the disk cache.  For
> instance, this happens for the SU truncation of the indirect blocks.

This makes a lot more sense now. It basically means that no matter the
underlying entity behind the vnode, be it a device special file or a
remote location, access always goes through the VFS layer. As you
clearly said, the filesystem has sole discretion over what to do with
those blocks; as an example, UFS breaks this rule by caching disk blocks
that have already been cached via the device vnode.

> The useful interpretation of the claim that FreeBSD does not cache
> disk blocks is that the cache is not accessible over the
> user-initiated i/o (read(2) and write(2)) through the opened devfs
> nodes.  If a program issues such a request, it indeed goes directly
> to/from the disk driver, which is supplied a kernel buffer formed by
> remapped user pages.

So read(2) and write(2) calls on device nodes bypass the VFS buffer
cache as well, and the IO goes directly through kernel memory pages
(which, as you said, are in fact remapped user-land pages). Does that
mean only the filesystem code now uses the disk buffer cache?

> Note that if this device was or is mounted and the filesystem kept
> some metadata in the buffer cache, then the devfs i/o would make the
> cache inconsistent.

Device being mounted as a filesystem, you mean? Could you please
elaborate?

> We do carry two interfaces in the cdev drivers, which are lumped into
> one.  In particular, it is not easy to implement mapping of the block
> devices exactly because the interfaces are mixed.

By "mapping" the block devices, you mean serving the IO intended for the
said disk blocks, right? So, as you said, we can either serve the IO via
the VFS directly using the buffer cache, or we can do that via the
filesystem cache.

> If a cdev disk device is mapped, VM would try to use cdevsw d_mmap or
> later mapping interfaces to handle user page faults, which is
> incorrect for the purpose of disk block mapping.

Could you please elaborate?

> Do not mix the concept of raw disk access and using some VM code to
> implement this access.
> Note that if this device was or is mounted and the filesystem kept some
> metadata in the buffer cache, then the devfs i/o would make the
> cache inconsistent.
>
You mean the device being mounted as a file system? Could you please
elaborate?

> > The added complexity of carrying around two alternate interfaces to
> > the same devices was judged by those who did the work to be not worth
> > the small gain available to the very few people who used raw devices.
> > Interestingly, since that time ZFS has implemented a block-layer cache
> > for itself which is of course not integrated with the non-existing
> > block level cache in the system :-).
> We do carry two interfaces in the cdev drivers, which are lumped into
> one. In particular, it is not easy to implement mapping of the block
> devices exactly because the interfaces are mixed.
By "mapping" of the block devices, you mean serving the IO intended for
the said disk blocks, right? So, as you said, we can either serve the IO
via the VFS directly using the buffer cache, or we could do that via the
file system cache.

> If a cdev disk device is mapped, VM would try to use cdevsw's d_mmap
> or later mapping interfaces to handle user page faults, which is incorrect
> for the purpose of the disk block mapping.
Could you please elaborate?

> > I must add that I am getting confused specifically between two different
> > things here:
> > From the replies above it appears that all disk accesses have to go
> > through the VM subsystem now (so no raw disk accesses) however the arch
> > handbook says raw interfaces are the way to go for disks
> > (https://www.freebsd.org/doc/en/books/arch-handbook/driverbasics-block.html)?
> Do not mix the concept of raw disk access and using some VM code to
> implement this access. See my other reply for some more explanation of
> the raw disk access, physio in the kernel source files terminology,
> sys/kern/kern_physio.c.
>
Yes, I have taken note of your earlier replies (thank you for being so
elaborate) and, as I have reiterated earlier, I now understand that raw
disk access is direct between kernel memory and the underlying device.
So, as you mentioned, the file system code (and perhaps only a few other
entities) uses the disk buffer cache that the VM implements. So an end
user cannot interact with the buffer cache in any way; is that what it is?
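For reference, the physio(9) path mentioned above is small; a hypothetical
character device's read routine going through it could be sketched as below
(the driver name is made up, and disk cdevs get an equivalent entry point
from the GEOM dev code rather than writing their own):

[CODE]
/*
 * Sketch: a cdev read routine handing the request to physio(9)
 * (sys/kern/kern_physio.c).  physio() wires and remaps the user
 * pages behind 'uio' and calls the device's d_strategy on them,
 * so data is transferred to/from the caller's memory with no
 * cached copy left behind.  'mydev' is hypothetical.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/conf.h>
#include <sys/uio.h>

static int
mydev_read(struct cdev *dev, struct uio *uio, int ioflag)
{
	return (physio(dev, uio, ioflag));
}
[/CODE]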
> > Secondly, I presume that the VM subsystem has its own caching and
> > buffering mechanism that is independent of the file system, so an IO
> > can choose to skip buffering at the file-system layer; however it will
> > still be served by the VM cache irrespective of whatever the VM object
> > maps to. Is that true? I believe this is what is meant by 'caching' at
> > the VM layer.
> First, the term page cache has a different meaning in the kernel code,
> and that page cache was removed from the kernel very recently.
> A more correct but much longer term is 'page queue of the vm object'. If
> a given vnode has a vm object associated with it, then the buffer cache
> ensures that buffers for the given chunk of the vnode data range are
> created from appropriately indexed pages from the queue. This way, the
> buffer cache becomes consistent with the page queue.
> The vm object is created on the first vnode open by filesystem-specific
> code, at least for UFS/devfs/msdosfs/cd9660 etc.
I understand page cache as a cache implemented by the file system to speed
up IO access (at least this is what Linux defines it as). Does it have a
different meaning in FreeBSD?

So a vnode is a VFS entity, right? And I presume a VM object is any object
from the perspective of the virtual memory subsystem. Since we no longer
run in real mode, isn't every vnode actually supposed to have an entity in
the VM subsystem? Maybe I am not understanding what 'page cache' means in
FreeBSD? I mean, every vnode in the VFS layer must have a backing VM
object, right? Maybe only mmap(2)'ed device nodes don't have a backing VM
object, or do they? If this assumption is correct then I cannot get my
mind around what you mentioned regarding buffer caches coming into play
for vnodes *only* if they have a backing vm object.

> Caching policy for buffers is determined both by the buffer cache and by
> (somewhat strong) hints from the filesystems interfacing with the cache.
> The pages constituting the buffer are wired, i.e. the VM subsystem is
> informed by the buffer cache to not reclaim pages while the buffer is
> alive.
>
> VM page caching, i.e. storing them in the vnode page queue, is only
> independent from the buffer cache when VM need/can handle something
> that does not involve the buffer cache. E.g. on a page fault in a
> region backed by a file, VM allocates the necessary fresh (AKA without
> valid content) pages and issues a read request into the filesystem which
> owns the vnode. It is up to the filesystem to implement read in any
> reasonable way.
>
> Until recently, UFS and other local filesystems provided raw disk block
> indexes for the generic VM code, which then read content from the disk
> blocks into the pages. This has its own share of problems (but not
> the consistency issue, since pages are allocated in the vnode vm
> object page queue). I changed that path to go through the buffer cache
> explicitly several months ago.
>
> But all this is highly dependent on the filesystem. As the polar case,
> tmpfs reuses the swap-backed object, which holds the file data, as the
> vnode's vm object. The result is that paging requests for the tmpfs
> mapped file are handled as if it were swap-backed anonymous memory.
>
> ZFS cannot reuse the vm object page queue for its very special cache,
> ARC. So it keeps the consistency between writes and mmaps by copying the
> data on write(2) both into the ARC buffer and into the pages from the
> vm object.
Well, this is rather confusing (sorry again); maybe too much detail for a
noob like myself to appreciate at this stage of my journey.

Nevertheless, to summarize this: raw disk block access bypasses the
buffer cache (as you had so painstakingly explained about read(2) and
write(2) above) but is still cached by the VFS in the page queue. However,
this is also at the sole discretion of the file-system, right?

To summarize, the page cache (or rather the page queue for a given vnode)
and the buffer cache are in fact separate entities, although they are very
tightly coupled for the most part, except in cases like what you mentioned
(about file-backed data).

So if we think of these as vertical layers, how would they look? From what
you say about page faults, it appears that the VM subsystem is placed
above, or perhaps adjacent to, the VFS layer. Is that correct? Also, about
these caches: the buffer cache is a global cache available to both the VM
subsystem as well as the VFS layer, whereas the page queue for the vnode
is the responsibility of the underlying file-system. Is that true?

> Hope this helps.
Of course this has helped. Although it has raised a lot more questions, as
you can see, at least it has got me thinking in (hopefully) the right
direction. Once again, a very big thank you!!

Best Regards,
Aijaz Baig

From owner-freebsd-scsi@freebsd.org Tue Jan 17 17:36:38 2017
From: bugzilla-noreply@freebsd.org
To: freebsd-scsi@FreeBSD.org
Subject: [Bug 204614] LOR In mpr(4)
Date: Tue, 17 Jan 2017 17:36:37 +0000
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204614

pete@nomadlogic.org changed:

           What    |Removed          |Added
----------------------------------------------------------------------
         Resolution|---              |Unable to Reproduce
             Status|New              |Closed

--- Comment #3 from pete@nomadlogic.org ---
No longer have access to this system so closing to clean up queue.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-scsi@freebsd.org Wed Jan 18 11:46:32 2017
From: Konstantin Belousov <kostikbel@gmail.com>
Date: Wed, 18 Jan 2017 13:46:23 +0200
Subject: Re: Understanding the rationale behind dropping of "block devices"
To: Aijaz Baig
Cc: Julian Elischer, "Greg 'groggy' Lehey", FreeBSD Hackers, freebsd-scsi@freebsd.org
Message-ID: <20170118114623.GF2349@kib.kiev.ua>
References: <20170116071105.GB4560@eureka.lemis.com> <20170116110009.GN2349@kib.kiev.ua>

On Tue, Jan 17, 2017 at 05:15:38PM +0530, Aijaz Baig wrote:
> First of all, a very big thank you to Konstantin for such a detailed
> reply! It has been a pleasure to go through. My replies are inline.
>
> On Mon, Jan 16, 2017 at 4:30 PM, Konstantin Belousov wrote:
>
> > This is not true.
> >
> > We do have a buffer cache of the blocks read through the device (special)
> > vnode. This is how, typically, the metadata for filesystems which are
> > clients of the buffer cache is handled, i.e. UFS, msdosfs, cd9660, etc.
> > It is up to the filesystem to not create aliased cached copies of the
> > blocks both in the device vnode buffer list and in the filesystem vnode.
> >
> > In fact, sometimes filesystems, e.g. UFS, consciously break this rule
> > and read blocks of the user vnode through the disk cache. For instance,
> > this happens for the SU truncation of the indirect blocks.
> >
> This makes a lot more sense now. This basically means that no matter what
> the underlying entity for the vnode is, be it a device special file or a
> remote location, it always has to go through the VFS layer. So, as you
> clearly said, the file-system has the sole discretion what to do with
> them. And as an example, UFS does break this rule by caching disk blocks
> which have already been cached by the VFS.

I do not understand what the 'it' is that has to go through the VFS layer.

> > > If you want device M, at offset N we will fetch it for you from the
> > > device, DMA'd directly into your address space,
> > > but there is no cached copy.
> > > Having said that, it would be trivial to add a 'caching' geom layer to
> > > the system but that has never been needed.
> > The useful interpretation of the claim that FreeBSD does not cache
> > disk blocks is that the cache is not accessible over the user-initiated
> > i/o (read(2) and write(2)) through the opened devfs nodes. If a program
> > issues such a request, it indeed goes directly to/from the disk driver,
> > which is supplied a kernel buffer formed by remapped user pages.
> So basically read(2) and write(2) calls on device nodes bypass the VFS
> buffer cache as well, and the IO indeed goes directly through kernel
> memory pages (which, as you said, are in fact remapped user-land pages).
> So does that mean only the file-system code now uses the disk buffer
> cache?

Right now, in the tree, only filesystems call into vfs_bio.c.

> > Note that if this device was or is mounted and the filesystem kept some
> > metadata in the buffer cache, then the devfs i/o would make the
> > cache inconsistent.
> >
> You mean the device being mounted as a file system? Could you please
> elaborate?

Yes, a device which carries a volume, and the volume is mounted.

> > > The added complexity of carrying around two alternate interfaces to
> > > the same devices was judged by those who did the work to be not worth
> > > the small gain available to the very few people who used raw devices.
> > > Interestingly, since that time ZFS has implemented a block-layer cache
> > > for itself which is of course not integrated with the non-existing
> > > block level cache in the system :-).
> > We do carry two interfaces in the cdev drivers, which are lumped into
> > one. In particular, it is not easy to implement mapping of the block
> > devices exactly because the interfaces are mixed.
> By "mapping" of the block devices, you mean serving the IO intended for
> the said disk blocks, right?

I mean, using the mmap(2) interface on the file which references the
device special node.

> So, as you said, we can either serve the IO via the VFS directly using
> the buffer cache, or we could do that via the file system cache.

> > If a cdev disk device is mapped, VM would try to use cdevsw's d_mmap
> > or later mapping interfaces to handle user page faults, which is
> > incorrect for the purpose of the disk block mapping.
> Could you please elaborate?

Read the code, I do not see much sense in rewording things that are stated
in the code.
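For readers following along, the interface in question is the cdevsw
d_mmap hook: on each user page fault the driver names the physical page
backing a given offset. A hypothetical sketch (device name and addresses
made up) shows why this fits a frame-buffer-like device but not pageable
disk blocks:

[CODE]
/*
 * Sketch: the cdevsw d_mmap interface.  The VM system asks the
 * driver which physical page backs the faulting offset.  That model
 * suits devices with fixed physical memory (frame buffers, register
 * windows); disk blocks live in pageable buffer-cache pages and
 * have no such stable physical address.  MYDEV_* are hypothetical.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/conf.h>
#include <sys/errno.h>
#include <vm/vm.h>

#define	MYDEV_PHYS_BASE	0xd0000000UL	/* hypothetical device memory */
#define	MYDEV_PHYS_SIZE	(4UL * 1024 * 1024)

static int
mydev_mmap(struct cdev *dev, vm_ooffset_t offset, vm_paddr_t *paddr,
    int nprot, vm_memattr_t *memattr)
{
	if (offset >= MYDEV_PHYS_SIZE)
		return (EINVAL);
	*paddr = MYDEV_PHYS_BASE + offset;	/* one fixed page per fault */
	return (0);
}
[/CODE]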
> > > I must add that I am getting confused specifically between two
> > > different things here:
> > > From the replies above it appears that all disk accesses have to go
> > > through the VM subsystem now (so no raw disk accesses) however the
> > > arch handbook says raw interfaces are the way to go for disks
> > > (https://www.freebsd.org/doc/en/books/arch-handbook/driverbasics-block.html)?
> > Do not mix the concept of raw disk access and using some VM code to
> > implement this access. See my other reply for some more explanation of
> > the raw disk access, physio in the kernel source files terminology,
> > sys/kern/kern_physio.c.
> >
> Yes, I have taken note of your earlier replies (thank you for being so
> elaborate) and, as I have reiterated earlier, I now understand that raw
> disk access is direct between kernel memory and the underlying device.

Such io is always direct between memory and device. The difference is in
who owns the memory used for io, and how this is interpreted by the system.

> So, as you mentioned, the file system code (and perhaps only a few other
> entities) uses the disk buffer cache that the VM implements. So an end
> user cannot interact with the buffer cache in any way; is that what it
> is?

This question does not make any sense. The buffer cache is a kernel
subsystem, used as a library for other parts of the kernel.

> > > Secondly, I presume that the VM subsystem has its own caching and
> > > buffering mechanism that is independent of the file system, so an IO
> > > can choose to skip buffering at the file-system layer; however it
> > > will still be served by the VM cache irrespective of whatever the VM
> > > object maps to. Is that true? I believe this is what is meant by
> > > 'caching' at the VM layer.
> > First, the term page cache has a different meaning in the kernel code,
> > and that page cache was removed from the kernel very recently.
> > A more correct but much longer term is 'page queue of the vm object'.
> > If a given vnode has a vm object associated with it, then the buffer
> > cache ensures that buffers for the given chunk of the vnode data range
> > are created from appropriately indexed pages from the queue. This way,
> > the buffer cache becomes consistent with the page queue.
> > The vm object is created on the first vnode open by filesystem-specific
> > code, at least for UFS/devfs/msdosfs/cd9660 etc.
> I understand page cache as a cache implemented by the file system to
> speed up IO access (at least this is what Linux defines it as). Does it
> have a different meaning in FreeBSD?

I explicitly answered this question in advance, above.

> So a vnode is a VFS entity, right? And I presume a VM object is any
> object from the perspective of the virtual memory subsystem.

No, a vm object is a struct vm_object.

> Since we no longer run in real mode, isn't every vnode actually supposed
> to have an entity in the VM subsystem? Maybe I am not understanding what
> 'page cache' means in FreeBSD?

At this point, I am not able to add any more information. Unless you read
the code, any further explanations would not provide any useful sense.

> I mean, every vnode in the VFS layer must have a backing VM object,
> right?

No.

> Maybe only mmap(2)'ed device nodes don't have a backing VM object, or do
> they?

Device vnodes do have a backing VM object, but they cannot be mapped.
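The "cannot be mapped" part is easy to observe from userland; a minimal
sketch, with /dev/ada0 again an assumption:

[CODE]
/*
 * Sketch: mmap(2) on a FreeBSD disk device node is expected to fail,
 * unlike on systems that route block-device pages through a page
 * cache.  /dev/ada0 is an assumption.
 */
#include <sys/mman.h>
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
	void *p;
	int fd;

	fd = open("/dev/ada0", O_RDONLY);
	if (fd < 0)
		err(1, "open");
	p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		warn("mmap");	/* the expected outcome on a disk cdev */
	else
		munmap(p, 4096);
	close(fd);
	return (0);
}
[/CODE]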
> If this assumption is correct then I cannot get my mind around what you
> mentioned regarding buffer caches coming into play for vnodes *only* if
> they have a backing vm object.

I never said this.

> > Caching policy for buffers is determined both by the buffer cache and
> > by (somewhat strong) hints from the filesystems interfacing with the
> > cache. The pages constituting the buffer are wired, i.e. the VM
> > subsystem is informed by the buffer cache to not reclaim pages while
> > the buffer is alive.
> >
> > VM page caching, i.e. storing them in the vnode page queue, is only
> > independent from the buffer cache when VM need/can handle something
> > that does not involve the buffer cache. E.g. on a page fault in a
> > region backed by a file, VM allocates the necessary fresh (AKA without
> > valid content) pages and issues a read request into the filesystem
> > which owns the vnode. It is up to the filesystem to implement read in
> > any reasonable way.
> >
> > Until recently, UFS and other local filesystems provided raw disk block
> > indexes for the generic VM code, which then read content from the disk
> > blocks into the pages. This has its own share of problems (but not
> > the consistency issue, since pages are allocated in the vnode vm
> > object page queue). I changed that path to go through the buffer cache
> > explicitly several months ago.
> >
> > But all this is highly dependent on the filesystem. As the polar case,
> > tmpfs reuses the swap-backed object, which holds the file data, as the
> > vnode's vm object. The result is that paging requests for the tmpfs
> > mapped file are handled as if it were swap-backed anonymous memory.
> >
> > ZFS cannot reuse the vm object page queue for its very special cache,
> > ARC. So it keeps the consistency between writes and mmaps by copying
> > the data on write(2) both into the ARC buffer and into the pages from
> > the vm object.
> Well, this is rather confusing (sorry again); maybe too much detail for
> a noob like myself to appreciate at this stage of my journey.
>
> Nevertheless, to summarize this: raw disk block access bypasses the
> buffer cache (as you had so painstakingly explained about read(2) and
> write(2) above) but is still cached by the VFS in the page queue.
> However, this is also at the sole discretion of the file-system, right?
>
> To summarize, the page cache (or rather the page queue for a given
> vnode) and the buffer cache are in fact separate entities, although they
> are very tightly coupled for the most part, except in cases like what
> you mentioned (about file-backed data).
>
> So if we think of these as vertical layers, how would they look? From
> what you say about page faults, it appears that the VM subsystem is
> placed above, or perhaps adjacent to, the VFS layer. Is that correct?
> Also, about these caches: the buffer cache is a global cache available
> to both the VM subsystem as well as the VFS layer, whereas the page
> queue for the vnode is the responsibility of the underlying file-system.
> Is that true?
>
> > Hope this helps.
> Of course this has helped. Although it has raised a lot more questions,
> as you can see, at least it has got me thinking in (hopefully) the right
> direction. Once again, a very big thank you!!
>
> Best Regards,
> Aijaz Baig

From owner-freebsd-scsi@freebsd.org Sat Jan 21 03:04:20 2017
From: Aijaz Baig <aijazbaig1@gmail.com>
Date: Sat, 21 Jan 2017 08:34:15 +0530
Subject: Re: Understanding the rationale behind dropping of "block devices"
To: Konstantin Belousov
Cc: Julian Elischer, Greg 'groggy' Lehey, FreeBSD Hackers, freebsd-scsi@freebsd.org
Message-ID: <20170121030415.5111889.13690.2248@gmail.com>