From owner-freebsd-fs@freebsd.org Sun May 15 10:48:47 2016
From: "Niall Douglas" <s_sourceforge@nedprod.com>
To: "freebsd-fs@FreeBSD.org"
Date: Sun, 15 May 2016 11:45:42 +0100
Subject: Re: State of native encryption in ZFS
Message-ID: <57385356.4525.E728971@s_sourceforge.nedprod.com>
References: <5736E7B4.1000409@gmail.com>, <57378707.19425.B54772B@s_sourceforge.nedprod.com>

On 14 May 2016 at 16:09, K. Macy wrote:

> >> It's not even clear how that encryption would be implemented or exposed.
> >> Per pool? Per dataset? Per folder? Per file? There have been
> >> requests for all of the above at one time or another, and the key
> >> management challenges for each are different. They can also be
> >> implemented at a layer above ZFS, given sufficient interest.
> >
> > If FreeBSD had a bigger PATH_MAX then stackable encryption layers
> > like ecryptfs (encfs?) would be viable choices. Because encrypted
> > path components are so long, one runs very rapidly into the maximum
> > path on the system when PATH_MAX is so low.
> >
> > I ended up actually installing ZFS on Linux with ecryptfs on top to
> > solve this. Every 15 minutes it ZFS-snapshot-syncs with the FreeBSD
> > edition. This works very well, apart from the poor performance of
> > ZFS on Linux.
> >
> > ZFS handles long paths with ease. FreeBSD currently does not :(
>
> AFAICT that's a 1 line patch. Have you tried patching that and
> rebuilding kernel, world, and any vulnerable ports?
The problem is apparently kernel structure bloat, and that they want to
remove fixed maximum paths altogether so the limit would be boot-time
modifiable:

http://freebsd.1045724.n5.nabble.com/misc-184340-PATH-MAX-not-interoperable-with-Linux-td5864469.html

As laudable as the latter goal is, unfortunately OS X and Linux hard
code theirs, and much POSIX software will use whatever PATH_MAX is set
to. I'm therefore not sure the implementation cost is worth it.

In any case, a 1024 byte path limit is potentially just 256 Unicode
characters. That's worse than Windows 95 :(

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
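[To put numbers on the encrypted-component problem, here is a
back-of-the-envelope sketch in C. The header and padding overheads are
assumed eCryptfs-style figures for illustration, not measured values.]

#include <stdio.h>
#include <limits.h>     /* PATH_MAX, NAME_MAX */

int
main(void)
{
        int plain = 40;                                 /* assumed typical component length */
        int header = 24;                                /* assumed marker + metadata bytes */
        int padded = ((plain + 15) / 16) * 16;          /* pad to a 16-byte cipher block */
        int encoded = ((header + padded + 2) / 3) * 4;  /* base64 grows data by 4/3 */

        printf("PATH_MAX=%d NAME_MAX=%d\n", PATH_MAX, NAME_MAX);
        printf("a %d-byte name becomes ~%d bytes once encrypted\n",
            plain, encoded);
        printf("maximum depth falls from ~%d to ~%d components\n",
            PATH_MAX / (plain + 1), PATH_MAX / (encoded + 1));
        return (0);
}

[With these assumptions a 40-byte name grows to roughly 96 bytes, so a
1024-byte PATH_MAX that allowed ~24 components of plaintext allows only
~10 encrypted ones.]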
From owner-freebsd-fs@freebsd.org Sun May 15 13:42:51 2016
From: Andriy Gapon <avg@FreeBSD.org>
To: freebsd-arch@FreeBSD.org, freebsd-fs
Subject: mount / unmount and mountcheckdirs()
Message-ID: <5c01bf62-b7b2-2e1d-bca5-859e6bf1f0e5@FreeBSD.org>
Date: Sun, 15 May 2016 16:37:05 +0300
I am curious about the purpose of mountcheckdirs() called when mounting
and unmounting a filesystem.

The function is described as such:

/*
 * Scan all active processes and prisons to see if any of them have a current
 * or root directory of `olddp'. If so, replace them with the new mount point.
 */

and it seems to be used to "lift" processes and jails to the root of a
new filesystem when it is mounted and to "lower" them onto a covered
vnode (if any) when a filesystem is unmounted.

What's the purpose of those actions?
It's strange that the machinations are done at all, but it is stranger
still that they are applied only to processes and jails at exactly a
covered vnode and a root vnode. Anything below in a filesystem's tree
is left alone. Is there anything so very special about being at exactly
those points?

IMO, the machinations can have unexpected security consequences.

A little bit of history. mountcheckdirs() was first added in r22521
(circa 1997) as checkdirs() with a rather non-specific commit message.
Initially it was used only when a filesystem was mounted. Then, in
r73241 (circa 2002), the function was added to dounmount():

    The checkdirs() function is called at mount time to find any
    process fd_cdir or fd_rdir pointers referencing the covered
    mountpoint vnode. It transfers these to point at the root of the
    new filesystem. However, this process was not reversed at unmount
    time, so processes with a cwd/root at a mount point would
    unexpectedly lose their cwd/root following a mount-unmount cycle
    at that mountpoint.
    ...
    Dounmount() now undoes the actions taken by checkdirs() at mount
    time; any process cdir/rdir pointers that reference the root vnode
    of the unmounted filesystem are transferred to the now-uncovered
    vnode.

-- 
Andriy Gapon
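[The "lift/lower" behaviour the comment describes, distilled into a
minimal standalone C sketch. This is an illustration only: locking,
prisons, and vnode reference counting are omitted, and it is not the
actual kernel code.]

struct vnode;                           /* opaque for this sketch */

struct dirs_sketch {
        struct vnode *cdir;             /* a process's current directory */
        struct vnode *rdir;             /* a process's root directory */
};

/*
 * On mount, olddp is the covered vnode and newdp the new filesystem's
 * root; on unmount the roles are reversed.
 */
static void
checkdirs_sketch(struct dirs_sketch *p, int nprocs,
    struct vnode *olddp, struct vnode *newdp)
{
        for (int i = 0; i < nprocs; i++) {
                /*
                 * Only a cwd/root sitting exactly on olddp is moved;
                 * anything deeper in the tree is left alone, which is
                 * the asymmetry questioned above.
                 */
                if (p[i].cdir == olddp)
                        p[i].cdir = newdp;      /* vref()/vrele() elided */
                if (p[i].rdir == olddp)
                        p[i].rdir = newdp;
        }
}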
From owner-freebsd-fs@freebsd.org Sun May 15 16:53:37 2016
From: Mateusz Guzik <mjguzik@gmail.com>
To: Andriy Gapon <avg@FreeBSD.org>
Cc: freebsd-arch@FreeBSD.org, freebsd-fs
Date: Sun, 15 May 2016 18:53:32 +0200
Subject: Re: mount / unmount and mountcheckdirs()
Message-ID: <20160515165332.GA27836@dft-labs.eu>
In-Reply-To: <5c01bf62-b7b2-2e1d-bca5-859e6bf1f0e5@FreeBSD.org>

On Sun, May 15, 2016 at 04:37:05PM +0300, Andriy Gapon wrote:
>
> I am curious about the purpose of mountcheckdirs() called when
> mounting and unmounting a filesystem.
>
> [...]
>
> What's the purpose of those actions?
> It's strange that the machinations are done at all, but it is
> stranger still that they are applied only to processes and jails at
> exactly a covered vnode and a root vnode. Anything below in a
> filesystem's tree is left alone. Is there anything so very special
> about being at exactly those points?
>
> IMO, the machinations can have unexpected security consequences.
>

I don't know why this was implemented. It is also done in NetBSD. It is
not done in Solaris nor Linux.

The replacement is buggy in at least 2 ways:

1. the process vs jail vnode replacement leaves a time window where
   these 2 don't match, which screws up the lookup

2. on fork we can have a 'struct filedesc' object copied but not yet
   assigned to the new process, so it ends up with the old vnode

And indeed, interested parties still have access to old vnodes by means
of having a file descriptor.

That said, this likely needs to be simply changed to /deny/ mount
operations which would alter jail roots.

-- 
Mateusz Guzik
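[The first bug is easy to picture: the process pointer and the jail
pointer are swapped in two separate steps. A minimal illustration, with
invented names and an assumed simplified layout, not kernel code:]

struct vnode;

struct jail_sketch { struct vnode *root; };

struct proc_sketch {
        struct vnode *cdir;
        struct jail_sketch *jail;
};

static void
replace_roots_racy(struct proc_sketch *p, struct vnode *olddp,
    struct vnode *newdp)
{
        if (p->cdir == olddp)
                p->cdir = newdp;        /* step 1: process updated */
        /*
         * A concurrent lookup here sees p->cdir == newdp while
         * p->jail->root is still olddp: the two no longer match.
         */
        if (p->jail->root == olddp)
                p->jail->root = newdp;  /* step 2: jail updated later */
}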
From owner-freebsd-fs@freebsd.org Sun May 15 21:00:05 2016
From: bugzilla-noreply@FreeBSD.org
To: freebsd-fs@FreeBSD.org
Subject: Problem reports for freebsd-fs@FreeBSD.org that need special attention
Date: Sun, 15 May 2016 21:00:04 +0000

To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD
users, which need special attention. These represent problem reports
covering all versions including experimental development code and
obsolete releases.

Status      | Bug Id    | Description
------------+-----------+---------------------------------------------------
New         | 203492    | mount_unionfs -o below causes panic
Open        | 136470    | [nfs] Cannot mount / in read-only, over NFS
Open        | 139651    | [nfs] mount(8): read-only remount of NFS volume d
Open        | 144447    | [zfs] sharenfs fsunshare() & fsshare_main() non f

4 problems total for which you should take action.
From owner-freebsd-fs@freebsd.org Mon May 16 05:02:37 2016
From: "K. Macy" <kmacybsd@gmail.com>
To: Hans Petter Selasky, "freebsd-fs@FreeBSD.org"
Date: Sun, 15 May 2016 22:02:36 -0700
Subject: bug in umass?

I'm not able to complete a coredump in i915 to a USB key. The backtrace
in the log looks like a bug in umass.
May 15 21:57:10 beastie kernel: cmap[0]=0 cmap[1]=7f0000 cmap[2]=7f00 cmap[3]=c4a000
May 15 21:57:10 beastie kernel: end FB_INFO
May 15 21:57:10 beastie kernel: drmn0: fb0: inteldrmfb frame buffer device
May 15 21:57:10 beastie kernel: ..3%
May 15 21:58:16 beastie syslogd: kernel boot file is /boot/kernel/kernel
May 15 21:58:16 beastie kernel: trap_fatal() at trap_fatal+0x2d/frame 0xfffffe01e2fd6350
May 15 21:58:16 beastie kernel: trap() at trap+0xc48/frame 0xfffffe01e2fd6690
May 15 21:58:16 beastie kernel: trap_check() at trap_check+0x4a/frame 0xfffffe01e2fd66b0
May 15 21:58:16 beastie kernel: calltrap() at calltrap+0x8/frame 0xfffffe01e2fd66b0
May 15 21:58:16 beastie kernel: --- trap 0x9, rip = 0xffffffff80f5a950, rsp = 0xfffffe01e2fd6780, rbp = 0xfffffe01e2fd6810 ---
May 15 21:58:16 beastie kernel: __mtx_lock_flags() at __mtx_lock_flags+0xd0/frame 0xfffffe01e2fd6810
May 15 21:58:16 beastie kernel: xpt_done_process() at xpt_done_process+0x495/frame 0xfffffe01e2fd68c0
May 15 21:58:16 beastie kernel: xpt_done_td() at xpt_done_td+0x1c0/frame 0xfffffe01e2fd6930
May 15 21:58:16 beastie kernel: fork_exit() at fork_exit+0x13b/frame 0xfffffe01e2fd69b0
May 15 21:58:16 beastie kernel: fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01e2fd69b0
May 15 21:58:16 beastie kernel: --- trap 0, rip = 0, rsp = 0, rbp = 0 ---

From owner-freebsd-fs@freebsd.org Mon May 16 06:24:43 2016
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 207464] Panic when destroying ZFS snapshot
Date: Mon, 16 May 2016 06:24:41 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207464

--- Comment #24 from commit-hook@freebsd.org ---
A commit references this bug:

Author: avg
Date: Mon May 16 06:24:05 UTC 2016
New revision: 299900
URL: https://svnweb.freebsd.org/changeset/base/299900

Log:
  zfsctl: fix several problems with reference counts

  * Remove excessive references on a snapshot mountpoint vnode.
    zfsctl_snapdir_lookup() called VN_HOLD() on a vnode returned from
    zfsctl_snapshot_mknode(), and the latter also had a call to
    VN_HOLD() on the same vnode. On top of that, gfs_dir_create()
    already returns the vnode with a use count of 1 (set in
    getnewvnode). So there were three references on the vnode.
  * mount_snapshot() should keep a reference to a covered vnode.
    That reference is owned by the mountpoint (the mounted snapshot
    filesystem).
  * Remove cryptic manipulations of a covered vnode in zfs_umount().
    FreeBSD's dounmount() already does the right thing and releases
    the covered vnode.

  PR:           207464
  Reported by:  dustinwenz@ebureau.com
  Tested by:    Howard Powell
  MFC after:    3 weeks

Changes:
  head/sys/cddl/compat/opensolaris/kern/opensolaris_vfs.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-fs@freebsd.org Mon May 16 06:25:11 2016
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 207464] Panic when destroying ZFS snapshot
Date: Mon, 16 May 2016 06:25:10 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207464

Andriy Gapon changed:

           What    |Removed     |Added
--------------------------------------------------
           Status  |Open        |In Progress

-- 
You are receiving this mail because:
You are on the CC list for the bug.
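[The commit log above is essentially reference-count bookkeeping. A toy
model of the leak it describes, with VN_HOLD()-style counting reduced
to a plain integer and all names invented for illustration:]

struct vn_sketch { int usecount; };

static void
vhold_sketch(struct vn_sketch *vp)
{
        vp->usecount++;
}

static void
dir_create_sketch(struct vn_sketch *vp)
{
        vp->usecount = 1;       /* getnewvnode() hands the vnode back held once */
}

static void
mknode_sketch(struct vn_sketch *vp)
{
        dir_create_sketch(vp);
        vhold_sketch(vp);       /* second reference: the first excess one */
}

static void
lookup_sketch(struct vn_sketch *vp)
{
        mknode_sketch(vp);
        vhold_sketch(vp);       /* third reference: usecount is now 3, but the
                                   caller only expects, and later drops, one */
}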
From owner-freebsd-fs@freebsd.org Mon May 16 07:03:20 2016
From: Edward Tomasz Napierała <etnapierala@gmail.com>
To: Andriy Gapon <avg@FreeBSD.org>
Cc: freebsd-arch@FreeBSD.org, freebsd-fs
Date: Mon, 16 May 2016 09:03:14 +0200
Subject: Re: mount / unmount and mountcheckdirs()
Message-ID: <20160516070314.GA3029@brick>
In-Reply-To: <5c01bf62-b7b2-2e1d-bca5-859e6bf1f0e5@FreeBSD.org>

On 0515T1637, Andriy Gapon wrote:
>
> I am curious about the purpose of mountcheckdirs() called when
> mounting and unmounting a filesystem.

[..]

Whatever you do, please make sure you don't break autofs, and reroot,
esp. firmware(9) loading after reroot. I'll happily test patches, just
mail them to me.
Thanks :-)

From owner-freebsd-fs@freebsd.org Mon May 16 07:37:45 2016
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 207464] Panic when destroying ZFS snapshot
Date: Mon, 16 May 2016 07:37:45 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207464

--- Comment #25 from Andriy Gapon ---
Created attachment 170343
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=170343&action=edit
add-on patch

If you are testing the first patch, could you please test this patch on
top of the first patch as well?
-- 
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-fs@freebsd.org Mon May 16 07:43:49 2016
From: Andriy Gapon <avg@FreeBSD.org>
To: freebsd-arch@FreeBSD.org, freebsd-fs
Subject: Re: mount / unmount and mountcheckdirs()
Date: Mon, 16 May 2016 10:43:08 +0300
In-Reply-To: <20160516070314.GA3029@brick>

On 16/05/2016 10:03, Edward Tomasz Napierała wrote:
> On 0515T1637, Andriy Gapon wrote:
>>
>> I am curious about the purpose of mountcheckdirs() called when
>> mounting and unmounting a filesystem.
>
> [..]
>
> Whatever you do, please make sure you don't break autofs, and reroot,
> esp. firmware(9) loading after reroot. I'll happily test patches,
> just mail them to me. Thanks :-)
>

Well, the only patch I had in mind (besides
https://svnweb.freebsd.org/changeset/base/299913) is completely
removing mountcheckdirs(). But now that you mentioned autofs and
reroot, I am not sure that it could be that simple...
-- 
Andriy Gapon

From owner-freebsd-fs@freebsd.org Mon May 16 07:47:40 2016
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 207464] Panic when destroying ZFS snapshot
Date: Mon, 16 May 2016 07:47:40 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207464

Xin LI changed:

           What    |Removed     |Added
--------------------------------------------------
           CC      |            |delphij@FreeBSD.org,
                   |            |re@FreeBSD.org

--- Comment #26 from Xin LI ---
EN candidate?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-fs@freebsd.org Mon May 16 07:47:44 2016
From: Hans Petter Selasky <hps@selasky.org>
To: "K. Macy", "freebsd-fs@FreeBSD.org", Alexander Motin
Date: Mon, 16 May 2016 09:51:01 +0200
Subject: Re: bug in umass?

Hi Alexander,

Does dumping core on a USB stick from KDB require any threads?

From the USB point of view we are doing polling in "umass_cam_poll()".

--HPS

On 05/16/16 07:02, K. Macy wrote:
> I'm not able to complete a coredump in i915 to a USB key. The
> backtrace in the log looks like a bug in umass.
>
> [...]
From owner-freebsd-fs@freebsd.org Mon May 16 08:43:48 2016
From: Alexander Motin
To: Hans Petter Selasky, "K. Macy", "freebsd-fs@FreeBSD.org"
Date: Mon, 16 May 2016 11:43:44 +0300
Subject: Re: bug in umass?
Message-ID: <57398840.6010700@FreeBSD.org>

On 16.05.16 10:51, Hans Petter Selasky wrote:
> Hi Alexander,
>
> Does dumping core on a USB stick from KDB require any threads?
>
> From the USB point of view we are doing polling in "umass_cam_poll()".

CAM does not differentiate USB from others. Kernel dumping completely
bypasses GEOM and its threads, manually pushes CAM queues without
requiring context switches, does polling for CAM HBA drivers via the
respective method call, and processes completion queues without
depending on completion threads.

> On 05/16/16 07:02, K. Macy wrote:
>> I'm not able to complete a coredump in i915 to a USB key. The
>> backtrace in the log looks like a bug in umass.
>>
>> [...]

-- 
Alexander Motin
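[In other words, the dump path is fully polled. A schematic sketch of
that pattern, an assumed simplification with invented names rather than
the real CAM interfaces:]

struct hba_sketch {
        void (*start)(struct hba_sketch *, void *buf, int len);
        void (*poll)(struct hba_sketch *);      /* a umass_cam_poll()-like hook */
        volatile int done;                      /* set by completion processing */
};

/*
 * Write one chunk of the dump with no GEOM and no completion threads:
 * start the I/O by hand, then spin on the driver's poll method until
 * completion processing marks the request done.
 */
static void
polled_dump_write(struct hba_sketch *hba, void *buf, int len)
{
        hba->done = 0;
        hba->start(hba, buf, len);
        while (!hba->done)
                hba->poll(hba);
}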
From owner-freebsd-fs@freebsd.org Mon May 16 10:15:15 2016
From: Palle Girgensohn <girgen@FreeBSD.org>
To: freebsd-fs@freebsd.org
Date: Mon, 16 May 2016 12:08:38 +0200
Subject: Best practice for high availability ZFS pool
Message-Id: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org>

Hi,

We need to set up a ZFS pool with redundancy. The main goal is high
availability - uptime.

I can see a few paths to follow.

1. HAST + ZFS
2. Some sort of shared storage, two machines sharing a JBOD box.
3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive)
4. using something else than ZFS, even a different OS if required.

My main concern with HAST+ZFS is performance. Google offers some
insights here; I find mainly unsolved problems.
Please share any success stories or other experiences.

Shared storage still has a single point of failure, the JBOD box. Apart
from that, is there even any support for the kind of storage PCI cards
that support dual head for a storage box? I cannot find any.

We are running with ZFS replication today, but it is just too slow for
the amount of data.

We prefer to keep ZFS, as we already have a rather big (~30 TB) pool
and also tools, scripts and backup all using ZFS, but if there is no
solution using ZFS, we're open to alternatives. Nexenta springs to
mind, but I believe it is using shared storage for redundancy, so it
does have single points of failure?

Any other suggestions? Please share your experience. :)

Palle

From owner-freebsd-fs@freebsd.org Mon May 16 13:18:30 2016
From: Willem Jan Withagen <wjw@digiware.nl>
To: Niall Douglas, "freebsd-fs@FreeBSD.org"
Date: Mon, 16 May 2016 15:18:17 +0200
Subject: Bigger MAX_PATH (Was: Re: State of native encryption in ZFS)
Message-ID: <9ead4b28-9711-5e38-483f-ef9eaf0bc583@digiware.nl>
In-Reply-To: <57385356.4525.E728971@s_sourceforge.nedprod.com>
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 May 2016 13:18:30 -0000

On 15-5-2016 12:45, Niall Douglas via freebsd-fs wrote:
> On 14 May 2016 at 16:09, K. Macy wrote:
>
>>>> It’s not even clear how that encryption would be implemented or exposed.
>>>> Per pool? Per dataset? Per folder? Per file? There have been
>>>> requests for all of the above at one time or another, and the key
>>>> management challenges for each are different. They can also be
>>>> implemented at a layer above ZFS, given sufficient interest.
>>>
>>> If FreeBSD had a bigger PATH_MAX then stackable encryptions layers
>>> like ecryptfs (encfs?) would be viable choices. Because encrypted
>>> path components are so long, one runs very rapidly into the maximum
>>> path on the system when PATH_MAX is so low.
>>>
>>> I ended up actually installing ZFS on Linux with ecryptfs on top to
>>> solve this. Every 15 minutes it ZFS snapshot syncs with the FreeBSD
>>> edition. This works very well, apart from the poor performance of ZFS
>>> on Linux.
>>>
>>> ZFS handles long paths with ease. FreeBSD currently does not :(
>>
>> AFAICT that's a 1 line patch. Have you tried patching that and
>> rebuilding kernel, world, and any vulnerable ports?
>
> The problem is apparently kernel structure bloat and that they want
> to remove fixed maximum paths altogether so it would be boot
> modifiable.
>
> http://freebsd.1045724.n5.nabble.com/misc-184340-PATH-MAX-not-interope
> rable-with-Linux-td5864469.html
>
> As laudable as the latter goal is, unfortunately OS X and Linux hard
> code theirs, and much POSIX software will use whatever PATH_MAX is
> set to. I'm therefore not sure the implementation cost is worth it.
>
> In any case, a 1024 byte path limit is just 256 unicode characters
> potentially. That's worse than Windows 95 :(

I'm pretty sure that just about everybody that runs a somewhat bigger
ZFS installation runs into this at one point or another. The weekly
locate database build has nagged me (after every fresh install) for
about 4 years already that it needs a larger path than 1024. And then
I just dig into the source to up the value; the locate.db does not
really care.

I think I got a reply from Jilles around that time that changing the
defines might cause unwanted compatibility fallout. That was answer
enough to keep my hands from just doing the 1-line patch.

Trying to port Ceph is also running into the limit in:
/usr/include/sys/syslimits.h:
#define NAME_MAX 255 /* max bytes in a file name */

but I also found:
/usr/include/stdio.h:
#define FILENAME_MAX 1024 /* must be <= PATH_MAX */

So take a pick??
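FWIW, the effective values are easy to compare on a live system, since
getconf can query both limits per file system at run time, e.g.:

    getconf NAME_MAX /usr
    getconf PATH_MAX /usr

(/usr is just an arbitrary mount point here.)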
--WjW

From owner-freebsd-fs@freebsd.org Mon May 16 13:56:38 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 781DAB3DF25 for ; Mon, 16 May 2016 13:56:38 +0000 (UTC) (envelope-from borjam@sarenet.es)
Received: from cu01176a.smtpx.saremail.com (cu01176a.smtpx.saremail.com [195.16.150.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 36C3A135D; Mon, 16 May 2016 13:56:37 +0000 (UTC) (envelope-from borjam@sarenet.es)
Received: from [172.16.8.36] (izaro.sarenet.es [192.148.167.11]) by proxypop03.sare.net (Postfix) with ESMTPSA id 35EB89DD37C; Mon, 16 May 2016 15:51:03 +0200 (CEST)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Subject: Re: Best practice for high availability ZFS pool
From: Borja Marcos
In-Reply-To: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org>
Date: Mon, 16 May 2016 15:51:02 +0200
Cc: freebsd-fs@freebsd.org
Content-Transfer-Encoding: quoted-printable
Message-Id:
References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org>
To: Palle Girgensohn
X-Mailer: Apple Mail (2.3124)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 May 2016 13:56:38 -0000

> On 16 May 2016, at 12:08, Palle Girgensohn wrote:
>
> Hi,
>
> We need to set up a ZFS pool with redundancy. The main goal is high availability - uptime.
>
> I can see a few paths to follow.
>
> 1. HAST + ZFS

Which means that a possible corruption-causing bug in ZFS would vaporize the data of both replicas.

> 3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive)

If you don’t have a hard requirement for synchronous replication (and, in that case, I would opt for a more application-aware approach) it’s the best method in my opinion.

Borja.
From owner-freebsd-fs@freebsd.org Mon May 16 14:52:37 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AE6F6B3D121 for ; Mon, 16 May 2016 14:52:37 +0000 (UTC) (envelope-from rainer@ultra-secure.de)
Received: from connect.ultra-secure.de (connect.ultra-secure.de [88.198.71.201]) by mx1.freebsd.org (Postfix) with ESMTP id CA45419F7; Mon, 16 May 2016 14:52:36 +0000 (UTC) (envelope-from rainer@ultra-secure.de)
Received: (Haraka outbound); Mon, 16 May 2016 16:52:29 +0200
Authentication-Results: connect.ultra-secure.de; iprev=pass; auth=pass (plain); spf=none smtp.mailfrom=ultra-secure.de
Received-SPF: None (connect.ultra-secure.de: domain of ultra-secure.de does not designate 217.71.83.52 as permitted sender) receiver=connect.ultra-secure.de; identity=mailfrom; client-ip=217.71.83.52; helo=[192.168.1.200]; envelope-from=
Received: from [192.168.1.200] (217-071-083-052.ip-tech.ch [217.71.83.52]) by connect.ultra-secure.de (Haraka/2.6.2-toaster) with ESMTPSA id 9062EEE8-5B4C-48E7-B021-F8137F8512A3.1 envelope-from (authenticated bits=0) (version=TLSv1/SSLv3 cipher=AES256-SHA verify=NO); Mon, 16 May 2016 16:52:26 +0200
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Subject: Re: Best practice for high availability ZFS pool
From: Rainer Duffner
In-Reply-To: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org>
Date: Mon, 16 May 2016 16:52:24 +0200
Cc: freebsd-fs@freebsd.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <284D58D1-1C62-4519-A46B-7D0E8326B86B@ultra-secure.de>
References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org>
To: Palle Girgensohn
X-Mailer: Apple Mail (2.3124)
X-Haraka-GeoIP: EU, CH, 451km
X-Haraka-ASN: 24951
X-Haraka-GeoIP-Received:
X-Haraka-ASN: 24951 217.71.80.0/20
X-Haraka-ASN-CYMRU: asn=24951 net=217.71.80.0/20 country=CH assignor=ripencc date=2003-08-07
X-Haraka-FCrDNS: 217-071-083-052.ip-tech.ch
X-Haraka-p0f: os="Mac OS X " link_type="DSL" distance=13 total_conn=1 shared_ip=N
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on spamassassin
X-Spam-Level:
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham autolearn_force=no version=3.4.1
X-Haraka-Karma: score: 6, good: 166, bad: 0, connections: 326, history: 166, asn_score: 100, asn_connections: 111, asn_good: 100, asn_bad: 0, pass:all_good, asn, asn_all_good, relaying
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 May 2016 14:52:37 -0000

> On 16.05.2016 at 12:08, Palle Girgensohn wrote:
>
> Hi,
>
> We need to set up a ZFS pool with redundancy. The main goal is high availability - uptime.
>
> I can see a few paths to follow.
>
> 1. HAST + ZFS
>
> 2. Some sort of shared storage, two machines sharing a JBOD box.
>
> 3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive)
>
> 4. Using something other than ZFS, even a different OS if required.

There’s always GlusterFS.
Recently ported to FreeBSD and available as net/glusterfs (10.3 recommended, AFAIK).

At work, we use it on Ubuntu - but not with so much data.
On Linux, I’d use it on top of XFS.

For our Cloud-Storage, we went with ScaleIO (which is Linux only).
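Going back to Gluster for a moment: for reference, a replicated volume is created roughly like this (host names and brick paths are placeholders, and the two-node layout is only an illustration):

    gluster volume create vol0 replica 2 host1:/data/brick0 host2:/data/brick0
    gluster volume start vol0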
You need more than two nodes with Gluster, though (for production use); I think my co-worker said at least four.

If you have the money and don’t mind Linux, ScaleIO is probably the best you can buy at the moment.
While licensed at the GByte level (yeah, EMC…) it can be used free of charge, unsupported.

From owner-freebsd-fs@freebsd.org Mon May 16 15:38:20 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C9FCAB3DF5D for ; Mon, 16 May 2016 15:38:20 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org)
Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BA8E41BCD for ; Mon, 16 May 2016 15:38:20 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org)
Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u4GFcKBB050055 for ; Mon, 16 May 2016 15:38:20 GMT (envelope-from bugzilla-noreply@freebsd.org)
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 209093] ZFS snapshot rename : .zfs/snapshot messes up
Date: Mon, 16 May 2016 15:38:20 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.3-RELEASE
X-Bugzilla-Keywords:
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: commit-hook@freebsd.org
X-Bugzilla-Status: New
X-Bugzilla-Resolution:
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 May 2016 15:38:20 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209093

--- Comment #1 from commit-hook@freebsd.org ---
A commit references this bug:

Author: avg
Date: Mon May 16 15:37:41 UTC 2016
New revision: 299949
URL: https://svnweb.freebsd.org/changeset/base/299949

Log:
try to recycle "snap" vnodes as soon as possible

Those vnodes should not linger. "Stale" nodes may get out of
synchronization with actual snapshots. For example if we destroy
a snapshot and create a new one with the same name. Or when we
rename a snapshot.

While there, fix the argument type for zfsctl_snapshot_reclaim().
Also, its original argument can be passed to gfs_vop_reclaim()
directly.

Bug 209093 could be related although I have not specifically
verified that. Referencing just in case.
PR: 209093
MFC after: 5 weeks
Changes:
head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c

--
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-fs@freebsd.org Mon May 16 16:32:52 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A8DAFB3D2E4 for ; Mon, 16 May 2016 16:32:52 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org)
Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 99A1E1ACD for ; Mon, 16 May 2016 16:32:52 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org)
Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u4GGWqnn004439 for ; Mon, 16 May 2016 16:32:52 GMT (envelope-from bugzilla-noreply@freebsd.org)
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 209093] ZFS snapshot rename : .zfs/snapshot messes up
Date: Mon, 16 May 2016 16:32:52 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.3-RELEASE
X-Bugzilla-Keywords:
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: avg@FreeBSD.org
X-Bugzilla-Status: New
X-Bugzilla-Resolution:
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: attachments.created
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 May 2016 16:32:52 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209093

--- Comment #2 from Andriy Gapon ---
Created attachment 170370
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=170370&action=edit
proposed patch for testing

I think I've found the cause of this problem.
I can't believe it but it seems that the 'allow_mounted' check was reversed for 3 years since it was introduced in 2013.
--
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-fs@freebsd.org Mon May 16 16:37:07 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EF46AB3D3AB for ; Mon, 16 May 2016 16:37:07 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org)
Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E02F51BE8 for ; Mon, 16 May 2016 16:37:07 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org)
Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u4GGb7H5010506 for ; Mon, 16 May 2016 16:37:07 GMT (envelope-from bugzilla-noreply@freebsd.org)
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 209093] ZFS snapshot rename : .zfs/snapshot messes up
Date: Mon, 16 May 2016 16:37:08 +0000
X-Bugzilla-Reason: CC AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.3-RELEASE
X-Bugzilla-Keywords:
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: avg@FreeBSD.org
X-Bugzilla-Status: Open
X-Bugzilla-Resolution:
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: avg@FreeBSD.org
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc assigned_to bug_status
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 May 2016 16:37:08 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209093

Andriy Gapon changed:

          What      |Removed                 |Added
----------------------------------------------------------------------------
                 CC |                        |freebsd-fs@FreeBSD.org
           Assignee |freebsd-fs@FreeBSD.org  |avg@FreeBSD.org
             Status |New                     |Open

--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

From owner-freebsd-fs@freebsd.org Mon May 16 16:45:23 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A75C5B3D760 for ; Mon, 16 May 2016 16:45:23 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org)
Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 96EED13A6 for ; Mon, 16 May 2016 16:45:23 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org)
Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u4GGjNp5031824 for ; Mon, 16 May 2016 16:45:23 GMT (envelope-from bugzilla-noreply@freebsd.org)
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 207464] Panic when destroying ZFS snapshot
Date: Mon, 16 May 2016 16:45:23 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.2-STABLE
X-Bugzilla-Keywords:
X-Bugzilla-Severity: Affects Many People
X-Bugzilla-Who: karl@denninger.net
X-Bugzilla-Status: In Progress
X-Bugzilla-Resolution:
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: avg@FreeBSD.org
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 May 2016 16:45:23 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207464

--- Comment #27 from karl@denninger.net ---
Comment on attachment 170343
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=170343
add-on patch

Rebuilding kernel to include this as well....
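For anyone following along, the usual test cycle is roughly the
following, assuming the attachment is saved as /tmp/zfs.patch and
applies from /usr/src with the default -p level:

    cd /usr/src
    patch < /tmp/zfs.patch
    make -j8 buildkernel
    make installkernel
    shutdown -r now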
--
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-fs@freebsd.org Mon May 16 19:22:33 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 30B36B3DE9B for ; Mon, 16 May 2016 19:22:33 +0000 (UTC) (envelope-from truckman@FreeBSD.org)
Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 0025510B2 for ; Mon, 16 May 2016 19:22:32 +0000 (UTC) (envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id u4GJMQNr072510 for ; Mon, 16 May 2016 12:22:30 -0700 (PDT) (envelope-from truckman@FreeBSD.org)
Message-Id: <201605161922.u4GJMQNr072510@gw.catspoiler.org>
Date: Mon, 16 May 2016 12:22:26 -0700 (PDT)
From: Don Lewis
Subject: patch to fix Coverity CIDs in rpc.statd find_host()
To: freebsd-fs@FreeBSD.org
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 May 2016 19:22:33 -0000

Coverity barfed all over find_host() in rpc.statd. I put a patch up
for review here: . I'd like to get some other eyeballs on it before I
commit it.

From owner-freebsd-fs@freebsd.org Mon May 16 20:06:15 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9C80EB3EF31 for ; Mon, 16 May 2016 20:06:15 +0000 (UTC) (envelope-from peter@rulingia.com)
Received: from vps.rulingia.com (vps.rulingia.com [103.243.244.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "rulingia.com", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 492601AAD for ; Mon, 16 May 2016 20:06:14 +0000 (UTC) (envelope-from peter@rulingia.com)
Received: from server.rulingia.com (ppp59-167-167-3.static.internode.on.net [59.167.167.3]) by vps.rulingia.com (8.15.2/8.15.2) with ESMTPS id u4GK5nPV000925 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 17 May 2016 06:05:56 +1000 (AEST) (envelope-from peter@rulingia.com)
X-Bogosity: Ham, spamicity=0.000000
Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.15.2/8.15.2) with ESMTPS id u4GK5iid028001 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 17 May 2016 06:05:44 +1000 (AEST) (envelope-from peter@server.rulingia.com)
Received: (from peter@localhost) by server.rulingia.com (8.15.2/8.15.2/Submit) id u4GK5h0w028000; Tue, 17 May 2016 06:05:43 +1000 (AEST) (envelope-from peter)
Date: Tue, 17 May 2016 06:05:43 +1000
From: Peter Jeremy
To: Willem Jan Withagen
Cc: "freebsd-fs@FreeBSD.org"
Subject: Re: Bigger MAX_PATH (Was: Re: State of native encryption in ZFS)
Message-ID: <20160516200543.GC42426@server.rulingia.com>
References: <5736E7B4.1000409@gmail.com> <57378707.19425.B54772B@s_sourceforge.nedprod.com> <57385356.4525.E728971@s_sourceforge.nedprod.com>
<9ead4b28-9711-5e38-483f-ef9eaf0bc583@digiware.nl>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="8t9RHnE3ZwKMSgU+"
Content-Disposition: inline
In-Reply-To: <9ead4b28-9711-5e38-483f-ef9eaf0bc583@digiware.nl>
X-PGP-Key: http://www.rulingia.com/keys/peter.pgp
User-Agent: Mutt/1.6.1 (2016-04-27)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 May 2016 20:06:15 -0000

--8t9RHnE3ZwKMSgU+
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2016-May-16 15:18:17 +0200, Willem Jan Withagen wrote:
>Trying to port Ceph is also running into the limit in:
>/usr/include/sys/syslimits.h:
>#define NAME_MAX 255 /* max bytes in a file name */
>
>but I also found:
>/usr/include/stdio.h:
>#define FILENAME_MAX 1024 /* must be <= PATH_MAX */
>
>So take a pick??

There are two distinct limits: The maximum number of characters in a
pathname component (ie the name seen in a directory entry): For UFS,
this is 255 because the length is stored on disk in a uint8_t (I don't
know the limit for ZFS). The other limit is the maximum number of
characters in a pathname - PATH_MAX. This is used to dimension various
buffers but isn't persistent on disk so you should be able to increase
it by changing the relevant #defines and rebuilding everything.

--
Peter Jeremy

--8t9RHnE3ZwKMSgU+
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQJ8BAEBCgBmBQJXOigXXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRFRUIyOTg2QzMwNjcxRTc0RTY1QzIyN0Ux
NkE1OTdBMEU0QTIwQjM0AAoJEBall6Dkogs0JXsP/3kcHzh+YSnEjcbMb2eY7qSZ
0U5XU/DiK9ko2VpKELqw+y3cQoeyu0YT7IlvIVWqSDtb4NdJ5o/AcAtP7JO6c4ot
JyMwOu1VvyFm9ZZ7cR9AGJ7GH0/YtcXBYTlXkrHqwi1vg18AhL0kFH+VD3uAQn/o
9bkLvJKxGaf5MSQyBoHY4jjBCHU2wN3+nu/ZS7ZZMJ27qYEyX1CCpqSoV4wpJIFC
1JUGz4lhhk+J1qdqN94AbnoD3iYos1HBIiFo8gzVCEngnzfFhSE9DIbTRH7HUQit
EBmpi8fb3gCeOLTj1qmc0qE5MGLz2Y4m/GWoqMgkPpHq+957LYIihUEktfuviHdC
6xKDVuQBFqv3lrt1DaboRmobnEVBephKlTgpNoYM2z/n8oEgkEukQUGui+pArqFK
RjN+pnLOzMyUoK1I39eRR1WN120KV7RdOEvIYdKZEZFKhtJ95yN4lxXseQrAAQ2C
SA0NXoNW3VU5ZZGur0m7yRj8YbxHxCdB4rqAX0ppoPngo5nrTXJHtGTp1N4lRbvl
Qzqk1Wq1CbejtY9i3VCSEWXK/d3tXnGGkDFu4Faq2rSoJwy2f7Mr2Kv88eq+dkwE
JJlB555ZhdkIlCy+Ypt8N5TSEgxwm7pXVjXbr87YTL+hrG5nhJGSXhUSX3ZJvz4R
6ALzojz8wNTEh527wvXR
=43s8
-----END PGP SIGNATURE-----

--8t9RHnE3ZwKMSgU+--

From owner-freebsd-fs@freebsd.org Mon May 16 20:19:17 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 35D4CB3D40A for ; Mon, 16 May 2016 20:19:17 +0000 (UTC) (envelope-from opticz7g__toypmoypru__r4@rayman.beget.ru)
Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 245A71363 for ; Mon, 16 May 2016 20:19:17 +0000 (UTC) (envelope-from opticz7g__toypmoypru__r4@rayman.beget.ru)
Received: by mailman.ysv.freebsd.org (Postfix) id 1FD96B3D409; Mon, 16 May 2016 20:19:17 +0000 (UTC)
Delivered-To: fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 06F7FB3D405 for ; Mon, 16 May 2016 20:19:17 +0000 (UTC) (envelope-from
opticz7g__toypmoypru__r4@rayman.beget.ru) Received: from m2.rayman.beget.ru (m2.rayman.beget.ru [87.236.19.11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6CD001362 for ; Mon, 16 May 2016 20:19:16 +0000 (UTC) (envelope-from opticz7g__toypmoypru__r4@rayman.beget.ru) Received: from opticz7g (Authenticated sender opticz7g@rayman.beget.ru) by rayman.beget.ru with local (Exim 4.76) (envelope-from ) id 1b2Oyu-0007wC-Qe for fs@freebsd.org; Mon, 16 May 2016 23:19:12 +0300 To: fs@freebsd.org Subject: Notice of appearance in Court #00630654 Date: Mon, 16 May 2016 23:19:12 +0300 From: "County Court" Reply-To: "County Court" Message-ID: <69adcbc1da1b1e8dfee1445db35757b6@toy-moy.ru> X-Priority: 3 MIME-Version: 1.0 Precedence: bulk Content-Type: text/plain; charset=us-ascii X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 May 2016 20:19:17 -0000 Notice to Appear, You have to appear in the Court on the May 23. Please, prepare all the documents relating to the case and bring them to Court on the specified date. Note: The case may be heard by the judge in your absence if you do not come. You can review complete details of the Court Notice in the attachment. Kind regards, Rick Finley, Clerk of Court. From owner-freebsd-fs@freebsd.org Mon May 16 21:14:04 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AED72B3EAED for ; Mon, 16 May 2016 21:14:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 9E3DD1F1B for ; Mon, 16 May 2016 21:14:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 99E50B3EAEC; Mon, 16 May 2016 21:14:04 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9988CB3EAEB for ; Mon, 16 May 2016 21:14:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id 2F6BF1F19; Mon, 16 May 2016 21:14:03 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 823C57837EC; Tue, 17 May 2016 07:13:56 +1000 (AEST) Date: Tue, 17 May 2016 07:13:55 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: fs@freebsd.org cc: rmacklem@freebsd.org Subject: fixes for i/o counting in nfs Message-ID: <20160517063058.E2021@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=TuMb/2jh c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=ao2s4AhymvbEqQ6vx_0A:9 a=8qM5l07LXi6s6vM6:21 a=vbcHcmFXaA1j43Cz:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 May 2016 21:14:04 -0000 nfs doesn't count block inputs in resource usage. It seems to count block outputs well enough (not very well, since buffering and threading causes some i/o's to be done by other threads where the counts are hard to see and harder to associate with the actual user). nfs doesn't support per-mount i/o counts for either input and output. These patches are for an old version of oldnfs. They apply cleanly to oldnfs in FreeBSD-10. I'm not sure if I found all the i/o's and don't trust my thread and mount pointer handling, but they seem to work reasonably and never find a null pointer. The per-mount i/o counts were easier to do than for most file systems since nfs isn't handicapped by using geom. The corresponding code in g_vfs_strategy() has a harder time finding the mount point and often fails, so must check for null pointers and not work when it can't find the mount point. This code doesn't even exist in the version that this patch is for (except I patch it in). X Index: nfs_bio.c X =================================================================== X --- nfs_bio.c (revision 181737) X +++ nfs_bio.c (working copy) X @@ -1568,6 +1581,14 @@ X case VREG: X uiop->uio_offset = ((off_t)bp->b_blkno) * DEV_BSIZE; X nfsstats.read_bios++; X + if (td == NULL) X + curthread->td_ru.ru_inblock++; /* XXX */ X + else X + td->td_ru.ru_inblock++; /* XXX? */ These are XXX'ed since I don't know if td is ever null or always right when it is non-null. But this seems to work right -- some counts go to normal threads and some to nfsiod's. X + if (LK_HOLDER(bp->b_lock.lk_lock) == LK_KERNPROC) X + vp->v_mount->mnt_stat.f_asyncreads++; /* XXX */ X + else X + vp->v_mount->mnt_stat.f_syncreads++; X error = (nmp->nm_rpcops->nr_readrpc)(vp, uiop, cr); This is XXX'ed since I don't trust the LK_KERNPROC check at all. This was blindly copied from g_vfs_strategy(). A separate count for async _reads_ is not very useful anyway. It is mostly for read-ahead. Most reads should be ahead, but complicated buffering in hardware and software makes them hard to count and the counts not very useful. X X if (!error) { X @@ -1674,10 +1695,16 @@ X io.iov_base = (char *)bp->b_data + bp->b_dirtyoff; X uiop->uio_rw = UIO_WRITE; X nfsstats.write_bios++; X + if (td == NULL) X + curthread->td_ru.ru_oublock++; /* XXX */ X + else X + td->td_ru.ru_oublock++; /* XXX? */ As above. X X if ((bp->b_flags & (B_ASYNC | B_NEEDCOMMIT | B_NOCACHE | B_CLUSTER)) == B_ASYNC) X + vp->v_mount->mnt_stat.f_asyncwrites++, X iomode = NFSV3WRITE_UNSTABLE; X else X + vp->v_mount->mnt_stat.f_syncwrites++, X iomode = NFSV3WRITE_FILESYNC; Here the sync/async decision is easy to make correctly. The patch uses a comma splice hack to keep the patch small. X X error = (nmp->nm_rpcops->nr_writerpc)(vp, uiop, cr, &iomode, &must_commit); X Index: nfs_vnops.c X =================================================================== X --- nfs_vnops.c (revision 181737) X +++ nfs_vnops.c (working copy) X @@ -3138,7 +3290,6 @@ X bp->b_iocmd = BIO_WRITE; X X bufobj_wref(bp->b_bufobj); X - curthread->td_ru.ru_oublock++; X splx(s); X X /* This is now counted in nfs_bio.c, and the results are much the same. Apparently it makes little difference to always use curthread. 
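(A quick way to see the effect from userland: /usr/bin/time -l prints
the rusage fields that these patches update. Assuming /mnt is an nfs
mount:

    /usr/bin/time -l dd if=/mnt/some-large-file of=/dev/null bs=64k

"block input operations" should then be nonzero for reads that actually
go to the server, instead of always 0 as before.)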
In file systems generally, we pass around td's and they are usually useless, but if they are good for anything at all then it is to record the (first) originator of the i/o so as to charge the originator and not a daemon. I don't know if they are used for that. Bruce From owner-freebsd-fs@freebsd.org Mon May 16 21:26:19 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6AF39B3EF13 for ; Mon, 16 May 2016 21:26:19 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 5AE8516F0 for ; Mon, 16 May 2016 21:26:19 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 56973B3EF12; Mon, 16 May 2016 21:26:19 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 563D4B3EF11 for ; Mon, 16 May 2016 21:26:19 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 2500716EF for ; Mon, 16 May 2016 21:26:18 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 338E2D6691C for ; Tue, 17 May 2016 07:26:09 +1000 (AEST) Date: Tue, 17 May 2016 07:26:08 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: fs@freebsd.org Subject: fix for per-mount i/o counting in ffs Message-ID: <20160517072104.I2137@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=EfU1O6SC c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=5YkQZLojSFcQydPC5FAA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 May 2016 21:26:19 -0000 Counting of i/o's in g_vfs_strategy() requires the fs to initialize devvp->v_rdev->si_mountpt to non-null. This seems to be done correctly in ext2fs and msdosfs, but in ffs it is not done for ro mounts, or for rw mounts that started as ro. The bug is most obvious for the root file system since it always starts as ro. The patch fixes 2 unrelated style bugs in comments. X Index: ffs_vfsops.c X =================================================================== X --- ffs_vfsops.c (revision 299263) X +++ ffs_vfsops.c (working copy) X @@ -512,7 +512,7 @@ X * We need the name for the mount point (also used for X * "last mounted on") copied in. If an error occurs, X * the mount point is discarded by the upper level code. X - * Note that vfs_mount() populates f_mntonname for us. X + * Note that vfs_mount_alloc() populates f_mntonname for us. 
X */ X if ((error = ffs_mountfs(devvp, mp, td)) != 0) { X vrele(devvp); X @@ -1049,8 +1049,6 @@ X ffs_flushfiles(mp, FORCECLOSE, td); X goto out; X } X - if (devvp->v_type == VCHR && devvp->v_rdev != NULL) X - devvp->v_rdev->si_mountpt = mp; X if (fs->fs_snapinum[0] != 0) X ffs_snapshot_mount(mp); X fs->fs_fmod = 1; X @@ -1057,8 +1055,10 @@ X fs->fs_clean = 0; X (void) ffs_sbupdate(ump, MNT_WAIT, 0); X } X + if (devvp->v_type == VCHR && devvp->v_rdev != NULL) X + devvp->v_rdev->si_mountpt = mp; X /* X - * Initialize filesystem stat information in mount struct. X + * Initialize filesystem state information in mount struct. X */ X MNT_ILOCK(mp); X mp->mnt_kern_flag |= MNTK_LOOKUP_SHARED | MNTK_EXTENDED_SHARED | Bruce From owner-freebsd-fs@freebsd.org Mon May 16 21:54:37 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D47D5B387D4 for ; Mon, 16 May 2016 21:54:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id C46E518E0 for ; Mon, 16 May 2016 21:54:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id C008EB387D3; Mon, 16 May 2016 21:54:37 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BFAECB387D2 for ; Mon, 16 May 2016 21:54:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 8D6FA18DF for ; Mon, 16 May 2016 21:54:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id E24C64271DC for ; Tue, 17 May 2016 07:54:28 +1000 (AEST) Date: Tue, 17 May 2016 07:54:27 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: fs@freebsd.org Subject: quick fix for slow directory shrinking in ffs Message-ID: <20160517072705.F2157@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=TuMb/2jh c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=pubc52WGR5en7ZIXB40A:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 May 2016 21:54:37 -0000 ffs does very slow shrinking of directories after removing some files leaves unused blocks at the end, by always doing synchronous truncation. This often happens in my normal usage: medium size builds expand /tmp from 512 to 1024 to hold a few more hundred bytes of file names; expansion is async and fast, but shrinking is sync and slow, and with a certain size of build the boundary is crossed back and forth very often. My /tmp directory is always on an async-mounted file system, so this quick fix of always doing an async truncation for async mounts works for me. Using IO_SYNC when not asked to is a bug for async mounts in all cases anyway. 
The file system has block size 8192 and frag size 1024, so it is also wrong to shrink to size DIRBLKSIZE = 512. The shrinkage seems to be considered at every DIRBLKSIZE boundary, so not only small directories are affected. The patch fixes an unrelated typo in a message. X Index: ufs_lookup.c X =================================================================== X --- ufs_lookup.c (revision 299263) X +++ ufs_lookup.c (working copy) X @@ -1131,9 +1131,9 @@ X if (tvp != NULL) X VOP_UNLOCK(tvp, 0); X error = UFS_TRUNCATE(dvp, (off_t)dp->i_endoff, X - IO_NORMAL | IO_SYNC, cr); X + IO_NORMAL | (DOINGASYNC(dvp) ? 0 : IO_SYNC), cr); X if (error != 0) X - vprint("ufs_direnter: failted to truncate", dvp); X + vprint("ufs_direnter: failed to truncate", dvp); X #ifdef UFS_DIRHASH X if (error == 0 && dp->i_dirhash != NULL) X ufsdirhash_dirtrunc(dp, dp->i_endoff); Bruce From owner-freebsd-fs@freebsd.org Mon May 16 22:36:54 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CA742B3D6FC for ; Mon, 16 May 2016 22:36:54 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 92F0014E6 for ; Mon, 16 May 2016 22:36:54 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from [10.0.1.11] (h-155-4-128-242.na.cust.bahnhof.se [155.4.128.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 3120F16B2C; Tue, 17 May 2016 00:36:52 +0200 (CEST) Subject: Re: Best practice for high availability ZFS pool Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Content-Type: multipart/signed; boundary="Apple-Mail=_E981C4D9-6449-4B12-B476-356B0F43A9DD"; protocol="application/pgp-signature"; micalg=pgp-sha256 X-Pgp-Agent: GPGMail 2.6b2 From: Palle Girgensohn In-Reply-To: Date: Tue, 17 May 2016 00:36:51 +0200 Cc: freebsd-fs@freebsd.org Message-Id: <89D73122-FAC7-4449-AAB3-C4BBE74B960A@FreeBSD.org> References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> To: Borja Marcos X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 May 2016 22:36:54 -0000 --Apple-Mail=_E981C4D9-6449-4B12-B476-356B0F43A9DD Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > 16 maj 2016 kl. 15:51 skrev Borja Marcos : >=20 >=20 >> On 16 May 2016, at 12:08, Palle Girgensohn = wrote: >>=20 >> Hi, >>=20 >> We need to set up a ZFS pool with redundance. The main goal is high = availability - uptime. >>=20 >> I can see a few of paths to follow. >>=20 >> 1. HAST + ZFS >=20 > Which means that a possible corruption causing bug in ZFS would = vaporize the data of both replicas. >=20 >> 3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive) >=20 > If you don=E2=80=99t have a hard requirement for synchronous = replication (and, in that case, I would opt for a more application > aware approach) it=E2=80=99s the best method in my opinion. That was exactly my thought 18 months ago, and we set up two systems = with zfs snapshot + zfs send | ssh | zfs receive. It works, but the = problem is it just too slow and a complete sync takes like 10 minutes = for all the file systems. 
We are forced to sync the file systems one at a time to get the kind of control and separation we need. Even if we could speed that up somehow, we are really looking for a more resilient system. Also, constant snapshotting and writing makes scrub very slow, so we need to tune down the amount of syncing every fourth weekend to scrub. It's OK but not optimal, so we're pondering something better.

My first choice is really HAST at the moment, but I also don't find much written in the last couple of years, apart from some articles about setting it up in very minimal testbeds or posts about performance and stability troubles. This makes me wonder, is HAST actively maintained? Is it stable, used and loved by the community? I'd love to hear some success stories with fairly large installations of at least 20 TB or so.

Palle

--Apple-Mail=_E981C4D9-6449-4B12-B476-356B0F43A9DD
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=signature.asc
Content-Type: application/pgp-signature; name=signature.asc
Content-Description: Message signed with OpenPGP using GPGMail

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org

iQEcBAEBCAAGBQJXOkuDAAoJEDQn0sf36Uls++IIAIGX1yPZt2BdPB9rly71u+TV
9jap9c0ZtUagYcwUNnUbKuShoEKr1FCyIv5trIB13CC7UieBV3f8AAprCa7fohb3
Hc5nENqjyqaG2udppYg7J5mXs1so5W6F9SdmSuIh2RSCvtV+aKm5ofmF+Ef7ZiEo
zvR8jJzVcLEHm5RnpzQm1oU17U0eHwfF5fdWtaw69roHCWMk08MkQcJBocXORAh5
/+L7zzPxezQh4YeYfDnj9rC7vaerU8iyEQsw8MV6tY6gD+JiW1dfjZK6p0AwwkKk
W876vHi+rbxpWt4bLYDBPbRsnRGYaL9AuX1bGSgAvXlhZS2Rod5DdnpoX5ez/+E=
=kVC+
-----END PGP SIGNATURE-----

--Apple-Mail=_E981C4D9-6449-4B12-B476-356B0F43A9DD--

From owner-freebsd-fs@freebsd.org Mon May 16 22:44:19 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AFF7DB3D9FC for ; Mon, 16 May 2016 22:44:19 +0000 (UTC) (envelope-from girgen@FreeBSD.org)
Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 76AF41D37 for ; Mon, 16 May 2016 22:44:19 +0000 (UTC) (envelope-from girgen@FreeBSD.org)
Received: from [10.0.1.11] (h-155-4-128-242.na.cust.bahnhof.se [155.4.128.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 6737716B4D; Tue, 17 May 2016 00:44:18 +0200 (CEST)
Subject: Re: Best practice for high availability ZFS pool
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Content-Type: multipart/signed; boundary="Apple-Mail=_7E2DA032-0BE8-495C-95AE-5A80E8AB857A"; protocol="application/pgp-signature"; micalg=pgp-sha256
X-Pgp-Agent: GPGMail 2.6b2
From: Palle Girgensohn
In-Reply-To: <284D58D1-1C62-4519-A46B-7D0E8326B86B@ultra-secure.de>
Date: Tue, 17 May 2016 00:44:18 +0200
Cc: freebsd-fs@freebsd.org
Message-Id:
References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> <284D58D1-1C62-4519-A46B-7D0E8326B86B@ultra-secure.de>
To: Rainer Duffner
X-Mailer: Apple Mail (2.3124)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 May 2016 22:44:19 -0000

--Apple-Mail=_7E2DA032-0BE8-495C-95AE-5A80E8AB857A
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=utf-8
> On 16 May 2016 at 16:52, Rainer Duffner wrote:
>
>
>> On 16.05.2016 at 12:08, Palle Girgensohn wrote:
>>
>> Hi,
>>
>> We need to set up a ZFS pool with redundancy. The main goal is high availability - uptime.
>>
>> I can see a few paths to follow.
>>
>> 1. HAST + ZFS
>>
>> 2. Some sort of shared storage, two machines sharing a JBOD box.
>>
>> 3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive)
>>
>> 4. Using something other than ZFS, even a different OS if required.
>
>
>
> There’s always GlusterFS.
> Recently ported to FreeBSD and available as net/glusterfs (10.3 recommended, AFAIK).
>
> At work, we use it on Ubuntu - but not with so much data.
> On Linux, I’d use it on top of XFS.
>
> For our Cloud-Storage, we went with ScaleIO (which is Linux only).
>
> You need more than two nodes with Gluster, though (for production use);
> I think my co-worker said at least four.

Yeah, it is interesting, but as you say, you really create a RAID5 setup at least.

>
> If you have the money and don’t mind Linux, ScaleIO is probably the best you can buy at the moment.
> While licensed at the GByte level (yeah, EMC…) it can be used free of charge, unsupported.

Yeah, that is definitely an option.

We already have an infrastructure based on ZFS, and I am not sure I trust ZFS on Linux?

Palle

--Apple-Mail=_7E2DA032-0BE8-495C-95AE-5A80E8AB857A
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=signature.asc
Content-Type: application/pgp-signature; name=signature.asc
Content-Description: Message signed with OpenPGP using GPGMail

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org

iQEcBAEBCAAGBQJXOk1CAAoJEDQn0sf36UlsT6MIALIVeD599KCAcpNS0ogBGxK3
a9SOuA/eUUTrsuMqrbvxdBWlvzmOF3IkalgjRpVzuCrup2Ukaq7qxpMPmxVBXylM
dNDiLi6aVU++vlfBbnTJRBrY8HNG2ZhCKBd+r83gCyo6SAPOoYtHEUjLZC/OYhVv
MmBCarS41VD1c/VvildV0inJtLwPeK/ltQb4V39DBuMGKoDYq//cPJqzw4PoPns1
i+M8JS7r70AAanh6QO73gUHr3cTvztwVVNPgjROWispmboZ/Hh6im+dyWwmegDSW
JMV8nlUPG2urq1HukcY1poV3OdY/sWVuO8X8t4F1thEOCeLF5wsf8aL1PhGaXCA=
=0v7N
-----END PGP SIGNATURE-----

--Apple-Mail=_7E2DA032-0BE8-495C-95AE-5A80E8AB857A--

From owner-freebsd-fs@freebsd.org Mon May 16 22:47:13 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9328EB3DAD8 for ; Mon, 16 May 2016 22:47:13 +0000 (UTC) (envelope-from rainer@ultra-secure.de)
Received: from connect.ultra-secure.de (connect.ultra-secure.de [88.198.71.201]) by mx1.freebsd.org (Postfix) with ESMTP id C32CD1F6B; Mon, 16 May 2016 22:47:12 +0000 (UTC) (envelope-from rainer@ultra-secure.de)
Received: (Haraka outbound); Tue, 17 May 2016 00:47:10 +0200
Authentication-Results: connect.ultra-secure.de; iprev=pass; auth=pass (plain); spf=none smtp.mailfrom=ultra-secure.de
Received-SPF: None (connect.ultra-secure.de: domain of ultra-secure.de does not designate 217.71.83.52 as permitted sender) receiver=connect.ultra-secure.de; identity=mailfrom; client-ip=217.71.83.52; helo=[192.168.1.200]; envelope-from=
Received: from [192.168.1.200] (217-071-083-052.ip-tech.ch [217.71.83.52]) by connect.ultra-secure.de (Haraka/2.6.2-toaster) with ESMTPSA id D4AC038E-61DD-4A34-A05C-6796C46862BF.1 envelope-from (authenticated bits=0) (version=TLSv1/SSLv3 cipher=AES256-SHA verify=NO); Tue, 17 May 2016 00:47:08 +0200
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Subject: Re: Best practice for high availability ZFS pool
From: Rainer Duffner
In-Reply-To:
Date: Tue, 17 May 2016 00:47:07 +0200
Cc: freebsd-fs@freebsd.org
Message-Id:
References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> <284D58D1-1C62-4519-A46B-7D0E8326B86B@ultra-secure.de>
To: Palle Girgensohn
X-Mailer: Apple Mail (2.3124)
X-Haraka-GeoIP: EU, CH, 451km
X-Haraka-ASN: 24951
X-Haraka-GeoIP-Received:
X-Haraka-ASN: 24951 217.71.80.0/20
X-Haraka-ASN-CYMRU: asn=24951 net=217.71.80.0/20 country=CH assignor=ripencc date=2003-08-07
X-Haraka-FCrDNS: 217-071-083-052.ip-tech.ch
X-Haraka-p0f: os="Mac OS X " link_type="DSL" distance=13 total_conn=1 shared_ip=N
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on spamassassin
X-Spam-Level:
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00, HTML_MESSAGE autolearn=ham autolearn_force=no version=3.4.1
X-Haraka-Karma: score: 6, good: 167, bad: 0, connections: 327, history: 167, asn_score: 101, asn_connections: 112, asn_good: 101, asn_bad: 0, pass:all_good, asn, asn_all_good, relaying
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 May 2016 22:47:13 -0000

> On 17.05.2016 at 00:44, Palle Girgensohn wrote:
>
>>
>
> We already have an infrastructure based on ZFS, and I am not sure I trust ZFS on Linux?

Wouldn’t start with a 20T pool on that one, TBH ;-)

There are probably a lot of quirks and workarounds needed that only those who’ve run it for a long time are aware of (if they’re actually aware of them at all).

That said, I’ve run into my own problems with zfs send now… but only on 10.3.

Rainer

From owner-freebsd-fs@freebsd.org Mon May 16 22:50:08 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D956AB3DBC4 for ; Mon, 16 May 2016 22:50:08 +0000 (UTC) (envelope-from girgen@pingpong.net)
Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 6BF39109F; Mon, 16 May 2016 22:50:07 +0000 (UTC) (envelope-from girgen@pingpong.net)
Received: from [10.0.1.14] (h-155-4-128-242.na.cust.bahnhof.se [155.4.128.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 15F0816B6F; Tue, 17 May 2016 00:50:07 +0200 (CEST)
Mime-Version: 1.0 (1.0)
Subject: Re: Best practice for high availability ZFS pool
From: Palle Girgensohn
X-Mailer: iPhone Mail (13E238)
In-Reply-To:
Date: Tue, 17 May 2016 00:50:06 +0200
Cc: Palle Girgensohn , freebsd-fs@freebsd.org
Message-Id: <726D88E6-A1DF-4E5A-ACFF-8A11E6EB3916@pingpong.net>
References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> <284D58D1-1C62-4519-A46B-7D0E8326B86B@ultra-secure.de>
To: Rainer Duffner
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 May 2016 22:50:08 -0000

> On 17 May 2016 at 00:47, Rainer Duffner wrote:
>
>
>>> On 17.05.2016 at 00:44, Palle Girgensohn wrote:
>>>
>>
>> We already have an infrastructure based on ZFS, and I am not sure I trust ZFS on Linux?
>
>
>
>
> Wouldn’t start with a 20T pool on that one, TBH ;-)
>
> There are probably a lot of quirks and workarounds needed that only those who’ve run it for a long time are aware of (if they’re actually aware of them at all).
>
>
> That said, I’ve run into my own problems with zfs send now… but only on 10.3.
>

We are still 10.2. Are there, in your opinion, regressions in 10.3 for zfs send?

From owner-freebsd-fs@freebsd.org Mon May 16 23:02:35 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8EED1B3E04B for ; Mon, 16 May 2016 23:02:35 +0000 (UTC) (envelope-from nonesuch@longcount.org)
Received: from mail-io0-x231.google.com (mail-io0-x231.google.com [IPv6:2607:f8b0:4001:c06::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5247A1EE3 for ; Mon, 16 May 2016 23:02:35 +0000 (UTC) (envelope-from nonesuch@longcount.org)
Received: by mail-io0-x231.google.com with SMTP id f89so2240066ioi.0 for ; Mon, 16 May 2016 16:02:35 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=longcount-org.20150623.gappssmtp.com; s=20150623; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=qXvAPASJ80O1BKmGkDkUgKc4WpmA27h4JISfn/yv4J8=; b=LoVMnaqMDjfHLShd1TI0C/cNkD3RlfyclnCzIppSm2DR84bKibuPtEord088D2Bl+4 WE+mHvNsMqWLQqfCvz5tMjIgkFPBge4ZBPAecCLJ+IewqL3RMu59/6NrFWDPDhVhT7bp gHqK+QC+9+6tAp1wFVfGBWLtVsy7bcr9tjp3qfRYX2zDpQku6D3JaHh4hBF+YUz2Mz9J YatnbHCRqHvmtEcRbDJO8gnYgyJxpXzPgxt6ZC3P2HeTO8i7CCep67m8LeEklmCPx7PX 6Ed0w9XLlIqGQr6EJJGtm9CoBJ8Di08KnMfITKgaE2K1sLTna8WpB147elFsiSD2wC+h PFCg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=qXvAPASJ80O1BKmGkDkUgKc4WpmA27h4JISfn/yv4J8=; b=gPhyAagvN+v6ux1lFxu11yTgSzFrjWfpMWQASSmjH39QGosJHn+IecSSpVStQB35Qg XGYAq7GNW3nhykjbZlHVqg1iAkAPt1Vt9IncjdN/ISl+hgzEOK5tTphhjRYorhRERkNh YtGS+qPVrFdGnwu8+uzdyJMiF03jC0v0Em5TOCWkuxC/KKaRtRLgfKGfLrVv1OUuT68J 51S1xpr/LZsjkpTKsmeIkgK8L14dEIShVq5zrt8WZkUntzFjEbTFUmdS9juB2KbY2xBo eeHLOLi3J6+GSA/MhEwVHC0cJJar/tRjkqRkwh1Q8kjfKNBLSbiGp/coDVEj3dXgOmh0 4D7A==
X-Gm-Message-State: AOPr4FX1F97w9iGv7lZG+aZzCeVRRnrcAZSFiWLxhq9KUesA9eQmnUshMMiLCG2Hu3IX9g==
X-Received: by 10.107.10.208 with SMTP id 77mr22206726iok.51.1463439754659; Mon, 16 May 2016 16:02:34 -0700 (PDT)
Received: from [100.85.18.225] (153.sub-70-214-103.myvzw.com.
[70.214.103.153]) by smtp.gmail.com with ESMTPSA id uh3sm115304igb.3.2016.05.16.16.02.33 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 16 May 2016 16:02:33 -0700 (PDT)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (1.0)
Subject: Re: Best practice for high availability ZFS pool
From: Mark Saad
X-Mailer: iPhone Mail (13E238)
In-Reply-To: <726D88E6-A1DF-4E5A-ACFF-8A11E6EB3916@pingpong.net>
Date: Mon, 16 May 2016 19:02:32 -0400
Cc: Rainer Duffner , freebsd-fs@freebsd.org, Palle Girgensohn
Content-Transfer-Encoding: quoted-printable
Message-Id: <2135669E-BA6F-4D1F-B865-33D40E74CF51@longcount.org>
References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> <284D58D1-1C62-4519-A46B-7D0E8326B86B@ultra-secure.de> <726D88E6-A1DF-4E5A-ACFF-8A11E6EB3916@pingpong.net>
To: Palle Girgensohn
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 May 2016 23:02:35 -0000

> On May 16, 2016, at 6:50 PM, Palle Girgensohn wrote:
>
>
>> On 17 May 2016 at 00:47, Rainer Duffner wrote:
>>
>>
>>>> On 17.05.2016 at 00:44, Palle Girgensohn wrote:
>>>
>>> We already have an infrastructure based on ZFS, and I am not sure I trust ZFS on Linux?
>>
>>
>>
>>
>> Wouldn’t start with a 20T pool on that one, TBH ;-)
>>
>> There are probably a lot of quirks and workarounds needed that only those who’ve run it for a long time are aware of (if they’re actually aware of them at all).
>>
>>
>> That said, I’ve run into my own problems with zfs send now… but only on 10.3.
>
> We are still 10.2. Are there, in your opinion, regressions in 10.3 for zfs send?

Hi Palle
  Two questions: how is your zpool set up? Are you using a dedicated slog and/or l2arc? What level of ZFS raid are you using?

At work we use leofs on top of zfs.
It works well and has good replication and speed, but it's an S3 work-alike, not a general-purpose FS.

---
Mark Saad | nonesuch@longcount.org

> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@freebsd.org Mon May 16 23:06:37 2016
Return-Path: 
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 48C8BB3E1A9 for ; Mon, 16 May 2016 23:06:37 +0000 (UTC) (envelope-from 000.fbsd@quip.cz)
Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0DEBB1111; Mon, 16 May 2016 23:06:36 +0000 (UTC) (envelope-from 000.fbsd@quip.cz)
Received: from elsa.codelab.cz (localhost [127.0.0.1]) by elsa.codelab.cz (Postfix) with ESMTP id 2E8CC28426; Tue, 17 May 2016 01:00:40 +0200 (CEST)
Received: from illbsd.quip.test (ip-86-49-16-209.net.upcbroadband.cz [86.49.16.209]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by elsa.codelab.cz (Postfix) with ESMTPSA id 29EC528412; Tue, 17 May 2016 01:00:39 +0200 (CEST)
Message-ID: <573A5116.3090302@quip.cz>
Date: Tue, 17 May 2016 01:00:38 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:35.0) Gecko/20100101 Firefox/35.0 SeaMonkey/2.32
MIME-Version: 1.0
To: Palle Girgensohn , Borja Marcos 
CC: freebsd-fs@freebsd.org
Subject: Re: Best practice for high availability ZFS pool
References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> <89D73122-FAC7-4449-AAB3-C4BBE74B960A@FreeBSD.org>
In-Reply-To: <89D73122-FAC7-4449-AAB3-C4BBE74B960A@FreeBSD.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems 
List-Unsubscribe: , 
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: , 
X-List-Received-Date: Mon, 16 May 2016 23:06:37 -0000

Palle Girgensohn wrote on 05/17/2016 00:36:
>
>> 16 maj 2016 kl. 15:51 skrev Borja Marcos :
>>
>>
>>> On 16 May 2016, at 12:08, Palle Girgensohn wrote:
>>>
>>> Hi,
>>>
>>> We need to set up a ZFS pool with redundancy. The main goal is high availability - uptime.
>>>
>>> I can see a few paths to follow.
>>>
>>> 1. HAST + ZFS
>>
>> Which means that a possible corruption-causing bug in ZFS would vaporize the data of both replicas.
>>
>>> 3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive)
>>
>> If you don’t have a hard requirement for synchronous replication (and, in that case, I would opt for a more application-aware approach) it’s the best method in my opinion.
>
> That was exactly my thought 18 months ago, and we set up two systems with zfs snapshot + zfs send | ssh | zfs receive. It works, but the problem is it is just too slow, and a complete sync takes like 10 minutes for all the file systems. We are forced to sync the file systems one at a time to get the kind of control and separation we need. Even if we could speed that up somehow, we are really looking for a more resilient system. Also, constant snapshotting and writing makes scrub very slow, so we need to tune down the amount of syncing every fourth weekend to scrub.
> It's OK but not optimal, so we're pondering something better.
>
> My first choice is really HAST at the moment, but I also don't find much written about it in the last couple of years, apart from some articles about setting it up in very minimal testbeds, or posts about performance and stability troubles. This makes me wonder: is HAST actively maintained? Is it stable, used and loved by the community? I'd love to hear some success stories with fairly large installations of at least 20 TB or so.

I am not using HAST personally, but I read about success with HAST and ZFS somewhere in the FreeBSD mailing lists. I don't have a direct link / bookmark for it. Maybe you will find it through a search engine.

Miroslav Lachman

From owner-freebsd-fs@freebsd.org Mon May 16 23:07:30 2016
Return-Path: 
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8A86CB3E23B for ; Mon, 16 May 2016 23:07:30 +0000 (UTC) (envelope-from rainer@ultra-secure.de)
Received: from connect.ultra-secure.de (connect.ultra-secure.de [88.198.71.201]) by mx1.freebsd.org (Postfix) with ESMTP id E79E511DE for ; Mon, 16 May 2016 23:07:29 +0000 (UTC) (envelope-from rainer@ultra-secure.de)
Received: (Haraka outbound); Tue, 17 May 2016 01:07:28 +0200
Authentication-Results: connect.ultra-secure.de; iprev=pass; auth=pass (plain); spf=none smtp.mailfrom=ultra-secure.de
Received-SPF: None (connect.ultra-secure.de: domain of ultra-secure.de does not designate 217.71.83.52 as permitted sender) receiver=connect.ultra-secure.de; identity=mailfrom; client-ip=217.71.83.52; helo=[192.168.1.200]; envelope-from=
Received: from [192.168.1.200] (217-071-083-052.ip-tech.ch [217.71.83.52]) by connect.ultra-secure.de (Haraka/2.6.2-toaster) with ESMTPSA id D0846A73-60AD-4F3A-841F-6946D77246BB.1 envelope-from (authenticated bits=0) (version=TLSv1/SSLv3 cipher=AES256-SHA verify=NO); Tue, 17 May 2016 01:07:26 +0200
From: Rainer Duffner 
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: zfs receive stalls whole system
Message-Id: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de>
Date: Tue, 17 May 2016 01:07:24 +0200
To: FreeBSD Filesystems 
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
X-Mailer: Apple Mail (2.3124)
X-Haraka-GeoIP: EU, CH, 451km
X-Haraka-ASN: 24951
X-Haraka-GeoIP-Received: 
X-Haraka-ASN: 24951 217.71.80.0/20
X-Haraka-ASN-CYMRU: asn=24951 net=217.71.80.0/20 country=CH assignor=ripencc date=2003-08-07
X-Haraka-FCrDNS: 217-071-083-052.ip-tech.ch
X-Haraka-p0f: os="Mac OS X " link_type="DSL" distance=13 total_conn=2 shared_ip=N
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on spamassassin
X-Spam-Level: 
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham autolearn_force=no version=3.4.1
X-Haraka-Karma: score: 6, good: 168, bad: 0, connections: 328, history: 168, asn_score: 102, asn_connections: 113, asn_good: 102, asn_bad: 0, pass:all_good, asn, asn_all_good, relaying
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems 
List-Unsubscribe: , 
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: , 
X-List-Received-Date: Mon, 16 May 2016 23:07:30 -0000

Hi,

I have two servers that were running FreeBSD 10.1-AMD64 for a long time, one zfs-sending to the other (via zxfer).
Both are NFS-servers and MySQL-slaves; the sender is actively used as NFS-server, the recipient is just a warm standby, in case something serious happens and we don’t want to wait for a day until the restore is back in place. The MySQL-slaves are actively used as read-only servers (at the application level; Python’s SQLAlchemy does that, apparently).

They are HP DL380G8 (one CPU, hexacore) with over 128 GB RAM (I think one has 144, the other has 192).
While they were running 10.1, they used HP P420 RAID-controllers with 12 individual RAID0 volumes that I pooled into 6-disk RAIDZ2 vdevs.
I use zfsnap to do hourly, daily and weekly snapshots.

Sending worked well, especially after updating to 10.1.

Because the storage was over 90% full (and I really hate this RAID0 business we have with the HP RAID controllers), I rebuilt the servers with HP’s OEMed H220/221 controllers (LSI 2308 in disguise), an external disk shelf hosting 12 additional disks was added, and I upgraded to FreeBSD 10.3.

Because we didn’t want to throw out the original disks, but increase available space a lot, the new disks are double the size of the original disks (600 vs. 1200 GB SAS).
I also created GPT partitions on the disks, labeled them according to the disk’s position in the cages/shelf, and created the pools with the GPT partition names instead of the daX names.

Now, when I do a zxfer, sometimes the whole system stalls while the data is sent over, especially if the delta is large or if something else is reading from the disk at the same time (backup agent).

I had this before, on 10.0 (I believe; we didn’t have this in 9.1 either, IIRC) and it went away in 10.1.

It’s very difficult (well, impossible) to debug, because the system totally hangs and doesn’t accept any keypresses.

Would a ZIL help in this case?
I always thought that NFS was the only thing that did SYNC writes…

From owner-freebsd-fs@freebsd.org Mon May 16 23:14:25 2016
Return-Path: 
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 86A07B3E393 for ; Mon, 16 May 2016 23:14:25 +0000 (UTC) (envelope-from rainer@ultra-secure.de)
Received: from connect.ultra-secure.de (connect.ultra-secure.de [88.198.71.201]) by mx1.freebsd.org (Postfix) with ESMTP id E65151647 for ; Mon, 16 May 2016 23:14:24 +0000 (UTC) (envelope-from rainer@ultra-secure.de)
Received: (Haraka outbound); Tue, 17 May 2016 01:14:23 +0200
Authentication-Results: connect.ultra-secure.de; iprev=pass; auth=pass (plain); spf=none smtp.mailfrom=ultra-secure.de
Received-SPF: None (connect.ultra-secure.de: domain of ultra-secure.de does not designate 217.71.83.52 as permitted sender) receiver=connect.ultra-secure.de; identity=mailfrom; client-ip=217.71.83.52; helo=[192.168.1.200]; envelope-from=
Received: from [192.168.1.200] (217-071-083-052.ip-tech.ch [217.71.83.52]) by connect.ultra-secure.de (Haraka/2.6.2-toaster) with ESMTPSA id C3F38CD4-6758-439C-B896-0AEB07043CA5.1 envelope-from (authenticated bits=0) (version=TLSv1/SSLv3 cipher=AES256-SHA verify=NO); Tue, 17 May 2016 01:14:21 +0200
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Subject: Re: zfs receive stalls whole system
From: Rainer Duffner 
In-Reply-To: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de>
Date: Tue, 17 May 2016 01:14:19 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <1513A05F-1DA7-4765-A67C-360555C97CF0@ultra-secure.de>
References: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de>
To: FreeBSD Filesystems 
X-Mailer: Apple Mail (2.3124)
X-Haraka-GeoIP: EU, CH, 451km
X-Haraka-ASN: 24951
X-Haraka-GeoIP-Received: 
X-Haraka-ASN: 24951 217.71.80.0/20
X-Haraka-ASN-CYMRU: asn=24951 net=217.71.80.0/20 country=CH assignor=ripencc date=2003-08-07
X-Haraka-FCrDNS: 217-071-083-052.ip-tech.ch
X-Haraka-p0f: os="Mac OS X " link_type="DSL" distance=13 total_conn=3 shared_ip=N
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on spamassassin
X-Spam-Level: 
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham autolearn_force=no version=3.4.1
X-Haraka-Karma: score: 6, good: 169, bad: 0, connections: 329, history: 169, asn_score: 103, asn_connections: 114, asn_good: 103, asn_bad: 0, pass:all_good, asn, asn_all_good, relaying
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems 
List-Unsubscribe: , 
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: , 
X-List-Received-Date: Mon, 16 May 2016 23:14:25 -0000

>
> Would a ZIL help in this case?
> I always thought that NFS was the only thing that did SYNC writes…
>

I mean an SSD-based SLOG device, for the record.

Because I’ve already maxed out the three PCIe slots I have with the single CPU, my only option would be a DC S3710, it seems.
Only problem is I have to mount it into an HP disk-tray somehow, which I’ve never tried.
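One way to narrow down whether sync writes matter here at all, before buying hardware (a diagnostic sketch only; "tank/backup" is a placeholder for the receiving dataset, and only standard zfs(8) commands are assumed):

# note the current value, then disable sync writes temporarily
zfs get sync tank/backup
zfs set sync=disabled tank/backup
# run a zxfer and watch per-disk latency with gstat in another terminal
# WARNING: sync=disabled can lose the last few seconds of writes on power loss
zfs inherit sync tank/backup

If the stalls look identical with sync disabled, an SLOG would not help anyway.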
Also, it would sit on the same SAS-backplane as the other disks, which may or may not be a good thing...

From owner-freebsd-fs@freebsd.org Tue May 17 01:36:13 2016
Return-Path: 
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E94A6B3DBE4 for ; Tue, 17 May 2016 01:36:13 +0000 (UTC) (envelope-from m.e.sanliturk@gmail.com)
Received: from mail-oi0-x22e.google.com (mail-oi0-x22e.google.com [IPv6:2607:f8b0:4003:c06::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BAFF71067; Tue, 17 May 2016 01:36:13 +0000 (UTC) (envelope-from m.e.sanliturk@gmail.com)
Received: by mail-oi0-x22e.google.com with SMTP id v145so2863560oie.0; Mon, 16 May 2016 18:36:13 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to:cc; bh=1JFEatSlMO0Hp7F9c31GY/oDeD0LzVNEFL5AthZjwSU=; b=nEIK1S5SqHjIBDlPaaJI4lgJC9n6GcFNtMGco7vikDTf+XJ6oo/VYIy/Ev6pNY6xXP lGtKRFbg8/zhHWfCx5iieX9NmZUWkKO49gJMeze31tp/q5P5h5RmX4Bu32Tw9cwrCymI O5n+IorMpA3OrnQMs02tOiBMynuKDdvG7tQp9qGtRHBPEtAHng89nfhdLHurKZH42gOB IxWPL0d/HPoRP5jOWeoSVuGXuhDB3TajB4t63uzQWIYhRhaffez+B1R4p4ozlF3XZCqU dEQc5to/hlB9j6CVEIzhtGcJDZhnDrUhvKFoSdX7+a19MM3Inu0qxPwl1WQ9Mj4kx0Iy +oag==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date:message-id:subject:from:to:cc; bh=1JFEatSlMO0Hp7F9c31GY/oDeD0LzVNEFL5AthZjwSU=; b=nHa3ywsmK0wD+HUckxYV/PiVEepQOx0fASmRusxFELsZDV4OBmaf6it0dudeDm0GLY /DoGlj0JwbvICBbyAVgGs32REJ0F0Irzh609nzI00r1UHtfnmNT+6tnL34k806FiRKec OiAUgdC7tzKu2y7fd+ytEwmkiJIZTRpWpuwH+1kOrEdXIeG+JXkvRVhnzZFhItSP3GSY l3anbjxhjwFW0MRxXTQA408+y+FZ65kJFf3E8uoAe49BWD4474kFYSBYKf0gZpGKpeFD 9jUf34Z+HKwdspYOJz10hvbGss4SNvwYxeMeL4oI6iR2yem17s+rvoUubp/tDaXG7mNc hjXQ==
X-Gm-Message-State: AOPr4FWQo+LAzjsjggiUjMkDniystn4TU/ev8k46y1cWAJNftsCN18cd/9fZAUpA9oWsOPMGxzx1mDqsSfGctQ==
MIME-Version: 1.0
X-Received: by 10.202.222.197 with SMTP id v188mr16403551oig.82.1463448972892; Mon, 16 May 2016 18:36:12 -0700 (PDT)
Received: by 10.157.45.131 with HTTP; Mon, 16 May 2016 18:36:12 -0700 (PDT)
In-Reply-To: <573A5116.3090302@quip.cz>
References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> <89D73122-FAC7-4449-AAB3-C4BBE74B960A@FreeBSD.org> <573A5116.3090302@quip.cz>
Date: Mon, 16 May 2016 18:36:12 -0700
Message-ID: 
Subject: Re: Best practice for high availability ZFS pool
From: Mehmet Erol Sanliturk 
To: Miroslav Lachman <000.fbsd@quip.cz>
Cc: Palle Girgensohn , Borja Marcos , freebsd-fs@freebsd.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems 
List-Unsubscribe: , 
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: , 
X-List-Received-Date: Tue, 17 May 2016 01:36:14 -0000

On Mon, May 16, 2016 at 4:00 PM, Miroslav Lachman <000.fbsd@quip.cz> wrote:

> Palle Girgensohn wrote on 05/17/2016 00:36:
>
>> 16 maj 2016 kl. 15:51 skrev Borja Marcos :
>>>
>>>
>>> On 16 May 2016, at 12:08, Palle Girgensohn wrote:
>>>>
>>>> Hi,
>>>>
>>>> We need to set up a ZFS pool with redundancy. The main goal is high
>>>> availability - uptime.
>>>>
>>>> I can see a few paths to follow.
>>>>
>>>> 1. HAST + ZFS
>>>>
>>>
>>> Which means that a possible corruption-causing bug in ZFS would vaporize
>>> the data of both replicas.
>>>
>>>> 3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive)
>>>>
>>>
>>> If you don’t have a hard requirement for synchronous replication (and,
>>> in that case, I would opt for a more application-aware approach)
>>> it’s the best method in my opinion.
>>>
>>
>> That was exactly my thought 18 months ago, and we set up two systems with
>> zfs snapshot + zfs send | ssh | zfs receive. It works, but the problem is
>> it is just too slow, and a complete sync takes like 10 minutes for all the file
>> systems. We are forced to sync the file systems one at a time to get the
>> kind of control and separation we need. Even if we could speed that up
>> somehow, we are really looking for a more resilient system. Also, constant
>> snapshotting and writing makes scrub very slow, so we need to tune down the
>> amount of syncing every fourth weekend to scrub. It's OK but not optimal,
>> so we're pondering something better.
>>
>> My first choice is really HAST at the moment, but I also don't find much
>> written about it in the last couple of years, apart from some articles about setting
>> it up in very minimal testbeds, or posts about performance and stability
>> troubles. This makes me wonder: is HAST actively maintained? Is it stable,
>> used and loved by the community? I'd love to hear some success stories with
>> fairly large installations of at least 20 TB or so.
>>
>
> I am not using HAST personally, but I read about success with HAST and ZFS
> somewhere in the FreeBSD mailing lists. I don't have a direct link / bookmark
> for it. Maybe you will find it through a search engine.
>
> Miroslav Lachman
> _______________________________________________
> f
>

If you search for HAST and ZFS in Google, it will provide a long list of possibly related pages.
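For the archives, the basic shape of a HAST-backed pool along the lines of the FreeBSD Handbook (a minimal sketch; the hostnames, addresses and disk names are invented for the example):

# /etc/hast.conf, identical on both nodes; nodeA/nodeB must match hostnames
resource disk0 {
        on nodeA {
                local /dev/da0
                remote 10.0.0.2
        }
        on nodeB {
                local /dev/da0
                remote 10.0.0.1
        }
}

# on both nodes:
hastctl create disk0
service hastd onestart
# on the node chosen as primary (the provider appears as /dev/hast/disk0):
hastctl role primary disk0
zpool create tank /dev/hast/disk0
# on the standby:
hastctl role secondary disk0

Failover is then a matter of demoting one node, promoting the other, and importing the pool there.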
Mehmet Erol Sanliturk From owner-freebsd-fs@freebsd.org Tue May 17 01:48:14 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5F369B3DFC1 for ; Tue, 17 May 2016 01:48:14 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from smtp.simplesystems.org (smtp.simplesystems.org [65.66.246.90]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2F58B1744; Tue, 17 May 2016 01:48:13 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by smtp.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id u4H1hnHC008304; Mon, 16 May 2016 20:43:49 -0500 (CDT) Date: Mon, 16 May 2016 20:43:49 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Palle Girgensohn cc: freebsd-fs@freebsd.org Subject: Re: Best practice for high availability ZFS pool In-Reply-To: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> Message-ID: References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> User-Agent: Alpine 2.20 (GSO 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (smtp.simplesystems.org [65.66.246.90]); Mon, 16 May 2016 20:43:49 -0500 (CDT) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 01:48:14 -0000 On Mon, 16 May 2016, Palle Girgensohn wrote: > > Shared storage still has a single point of failure, the JBOD box. > Apart from that, is there even any support for the kind of storage > PCI cards that support dual head for a storage box? I cannot find > any. Use two (or three) JBOD boxes and do simple zfs mirroring across them so you can unplug a JBOD and the pool still works. Or use a bunch of JBOD boxes and use zfs raidz2 (or raidz3) across them with careful LUN selection so there is total storage redundancy and you can unplug a JBOD and the pool still works. Fiber channel (or FCoE) or iSCSI allows putting the hardware at some distance. Without completely isolated systems there is always the risk of total failure. Even with zfs send there is the risk of total failure if the sent data results in corruption on the receiving side. Decide if you really want to optimize for maximum availability or you want to minimize the duration of the outage if something goes wrong. There is a difference. 
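To make the mirrored-across-JBODs idea concrete (a sketch; the daX names are invented, the point being that each mirror vdev takes one disk from each JBOD):

# JBOD 1 exposes da0-da2, JBOD 2 exposes da10-da12 (names assumed)
zpool create tank \
    mirror da0 da10 \
    mirror da1 da11 \
    mirror da2 da12
# either JBOD can now be unplugged and every vdev keeps one healthy side
zpool status tank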
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@freebsd.org Tue May 17 01:51:28 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C7908B3E164 for ; Tue, 17 May 2016 01:51:28 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from smtp.simplesystems.org (smtp.simplesystems.org [65.66.246.90]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9986719AF for ; Tue, 17 May 2016 01:51:28 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by smtp.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id u4H1pQeY008619; Mon, 16 May 2016 20:51:27 -0500 (CDT) Date: Mon, 16 May 2016 20:51:26 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Rainer Duffner cc: FreeBSD Filesystems Subject: Re: zfs receive stalls whole system In-Reply-To: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de> Message-ID: References: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de> User-Agent: Alpine 2.20 (GSO 67 2015-01-07) MIME-Version: 1.0 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (smtp.simplesystems.org [65.66.246.90]); Mon, 16 May 2016 20:51:27 -0500 (CDT) Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8BIT X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 01:51:28 -0000 On Tue, 17 May 2016, Rainer Duffner wrote: > > It’s very difficult (well, impossible) to debug, because the system > totally hangs and doesn’t accept any keypresses. > > Would a ZIL help in this case? > I always thought that NFS was the only thing that did SYNC writes
 This sounds like a hardware or driver problem. A dedicated ZIL won't help a system which entirely hangs. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@freebsd.org Tue May 17 03:59:57 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 34572B3EDD1 for ; Tue, 17 May 2016 03:59:57 +0000 (UTC) (envelope-from rainer@ultra-secure.de) Received: from connect.ultra-secure.de (connect.ultra-secure.de [88.198.71.201]) by mx1.freebsd.org (Postfix) with ESMTP id 91AB4115F for ; Tue, 17 May 2016 03:59:56 +0000 (UTC) (envelope-from rainer@ultra-secure.de) Received: (Haraka outbound); Tue, 17 May 2016 05:59:54 +0200 Authentication-Results: connect.ultra-secure.de; iprev=pass; auth=pass (plain); spf=none smtp.mailfrom=ultra-secure.de Received-SPF: None (connect.ultra-secure.de: domain of ultra-secure.de does not designate 217.71.83.52 as permitted sender) receiver=connect.ultra-secure.de; identity=mailfrom; client-ip=217.71.83.52; helo=[192.168.1.200]; envelope-from= Received: from [192.168.1.200] (217-071-083-052.ip-tech.ch [217.71.83.52]) by connect.ultra-secure.de (Haraka/2.6.2-toaster) with ESMTPSA id DBC21CD3-3F3C-4C27-B0CE-EA8A86995E60.1 envelope-from (authenticated bits=0) (version=TLSv1/SSLv3 cipher=AES256-SHA verify=NO); Tue, 17 May 2016 05:59:47 +0200 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: zfs receive stalls whole system From: Rainer Duffner In-Reply-To: Date: Tue, 17 May 2016 05:59:45 +0200 Cc: FreeBSD Filesystems Content-Transfer-Encoding: quoted-printable Message-Id: <3E271E07-F60E-4181-B8B0-9ED2CFCDF5A0@ultra-secure.de> References: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de> To: Bob Friesenhahn X-Mailer: Apple Mail (2.3124) X-Haraka-GeoIP: EU, CH, 451km X-Haraka-ASN: 24951 X-Haraka-GeoIP-Received: X-Haraka-ASN: 24951 217.71.80.0/20 X-Haraka-ASN-CYMRU: asn=24951 net=217.71.80.0/20 country=CH assignor=ripencc date=2003-08-07 X-Haraka-FCrDNS: 217-071-083-052.ip-tech.ch X-Haraka-p0f: os="Mac OS X " link_type="DSL" distance=13 total_conn=1 shared_ip=N X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on spamassassin X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.1 X-Haraka-Karma: score: 6, good: 170, bad: 0, connections: 330, history: 170, asn_score: 104, asn_connections: 115, asn_good: 104, asn_bad: 0, pass:all_good, asn, asn_all_good, relaying X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 03:59:57 -0000 > Am 17.05.2016 um 03:51 schrieb Bob Friesenhahn = : >=20 > On Tue, 17 May 2016, Rainer Duffner wrote: >>=20 >> It=E2=80=99s very difficult (well, impossible) to debug, because the = system totally hangs and doesn=E2=80=99t accept any keypresses. >>=20 >> Would a ZIL help in this case? >> I always thought that NFS was the only thing that did SYNC writes=E2=80= =A6 >=20 > This sounds like a hardware or driver problem. A dedicated ZIL won't = help a system which entirely hangs. When I rebuilt these systems, I started with the 2nd one, the = standby-system. 
I zfs sent 5 or 6T worth of data from the original system to it and it was very fast. I got 600 MBit flat out of it.
Then, I made that system master while I rebuilt the other one.
When I synced back, I got maybe 500 MBit on the zfs sends.
And I started to see these stalls on sending updates.

Could this be a problem:

(nfs2-prod) 1 # sysctl -a | grep mps | grep "driver_version\|firmware_version"
dev.mps.2.driver_version: 20.00.00.00-fbsd
dev.mps.2.firmware_version: 15.10.01.00
dev.mps.1.driver_version: 20.00.00.00-fbsd
dev.mps.1.firmware_version: 13.10.53.00
dev.mps.0.driver_version: 20.00.00.00-fbsd
dev.mps.0.firmware_version: 13.10.53.00

As per this thread:
https://forums.freenas.org/index.php?threads/9-3-1-update-with-alert-firmware-version-16-does-not-match-driver-version-20-for-dev-mps0.36536/page-4

I will have to ask HP for a newer firmware then…

From owner-freebsd-fs@freebsd.org Tue May 17 07:47:17 2016
Return-Path: 
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D4384B3E79D for ; Tue, 17 May 2016 07:47:17 +0000 (UTC) (envelope-from jg@internetx.com)
Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) by mx1.freebsd.org (Postfix) with ESMTP id 98EAB1DA6; Tue, 17 May 2016 07:47:17 +0000 (UTC) (envelope-from jg@internetx.com)
Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id A015F45FC0CD; Tue, 17 May 2016 09:41:50 +0200 (CEST)
X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de
Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id zDk391QzatOV; Tue, 17 May 2016 09:41:48 +0200 (CEST)
Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id 602374C4C754; Tue, 17 May 2016 09:41:48 +0200 (CEST)
Subject: Re: Best practice for high availability ZFS pool
References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org>
To: Palle Girgensohn , freebsd-fs@freebsd.org
From: InterNetX - Juergen Gotteswinter 
Reply-To: jg@internetx.com
Message-ID: 
Date: Tue, 17 May 2016 09:41:44 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0
MIME-Version: 1.0
In-Reply-To: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems 
List-Unsubscribe: , 
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: , 
X-List-Received-Date: Tue, 17 May 2016 07:47:17 -0000

Hi,

Am 5/16/2016 um 12:08 PM schrieb Palle Girgensohn:
> Hi,
>
> We need to set up a ZFS pool with redundancy. The main goal is high availability - uptime.
>
> I can see a few paths to follow.
>
> 1. HAST + ZFS

Don't do this; it has already been discussed some time ago, and AFAIK nothing has changed since then:

https://lists.freebsd.org/pipermail/freebsd-fs/2014-October/020084.html

>
> 2. Some sort of shared storage, two machines sharing a JBOD box.

Take care when choosing SAS HBAs and expanders; avoid SATA behind SAS. With dual-expander JBODs you will be able to build an HA setup, but I highly recommend avoiding any home-brew solutions. Go for RSF-1.

>
> 3.
ZFS replication (zfs snapshot + zfs send | ssh | zfs receive) > > 4. using something else than ZFS, even a different OS if required. > > My main concern with HAST+ZFS is performance. Google offer some insights here, I find mainly unsolved problems. Please share any success stories or other experiences. > performance isnt the real problem, check the older discussion mentioned above. > Shared storage still has a single point of failure, the JBOD box. Apart from that, is there even any support for the kind of storage PCI cards that support dual head for a storage box? I cannot find any. > the jbods are just a dumb piece of metal with an expander mounted. so far, i never had a broken one. > We are running with ZFS replication today, but it is just too slow for the amount of data. > replicate more often to keep the delta between each snapshot as small as possible? maybe even 10G crosslink if possible? > We prefer to keep ZFS as we already have a rather big (~30 TB) pool and also tools, scripts, backup all is using ZFS, but if there is no solution using ZFS, we're open to alternatives. Nexenta springs to mind, but I believe it is using shared storage for redundance, so it does have single points of failure? > > Any other suggestions? Please share your experience. :) > > Palle > From owner-freebsd-fs@freebsd.org Tue May 17 07:56:28 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F0982B3EBEA for ; Tue, 17 May 2016 07:56:28 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E1A3E1953 for ; Tue, 17 May 2016 07:56:28 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u4H7uSBY018300 for ; Tue, 17 May 2016 07:56:28 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 209093] ZFS snapshot rename : .zfs/snapshot messes up Date: Tue, 17 May 2016 07:56:28 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.3-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: commit-hook@freebsd.org X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: avg@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 07:56:29 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D209093 --- Comment #3 from commit-hook@freebsd.org --- A commit references this bug: Author: avg Date: Tue May 17 07:56:05 UTC 2016 New revision: 300024 URL: https://svnweb.freebsd.org/changeset/base/300024 Log: zfs_ioc_rename: fix a reversed condition FreeBSD 
zfs_ioc_rename() has an option, not present upstream, that allows to rename snapshots without unmounting them first. I am not sure what is a rationale for that option, but its actual behavior was the opposite of the intended behavior. That is, by default the snapshots were not unmounted. The option was introduced as part of a large update from upstream in r248498. One of the consequences was a havoc under .zfs/snapshot after the rename. The snapshots got new names but were mounted on top of directories with old names, so readdir would list the new names, but lookup would still find the old mounts. PR: 209093 Reported by: Fr?d?ric VANNI?RE MFC after: 5 days Changes: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c --=20 You are receiving this mail because: You are on the CC list for the bug.= From owner-freebsd-fs@freebsd.org Tue May 17 07:58:25 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0C77FB3EDAC for ; Tue, 17 May 2016 07:58:25 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F1A771B98 for ; Tue, 17 May 2016 07:58:24 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u4H7wObv021387 for ; Tue, 17 May 2016 07:58:24 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 209093] ZFS snapshot rename : .zfs/snapshot messes up Date: Tue, 17 May 2016 07:58:25 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.3-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: avg@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: avg@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_status Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 07:58:25 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D209093 Andriy Gapon changed: What |Removed |Added ---------------------------------------------------------------------------- Status|Open |In Progress --=20 You are receiving this mail because: You are on the CC list for the bug.= From owner-freebsd-fs@freebsd.org Tue May 17 07:59:44 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AA3D4B3EE1F for ; Tue, 17 May 2016 07:59:44 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 9A5841C97 for ; Tue, 17 May 2016 07:59:44 +0000 (UTC) 
(envelope-from borjam@sarenet.es)
Received: by mailman.ysv.freebsd.org (Postfix) id 99C27B3EE1D; Tue, 17 May 2016 07:59:44 +0000 (UTC)
Delivered-To: fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 994FDB3EE1B; Tue, 17 May 2016 07:59:44 +0000 (UTC) (envelope-from borjam@sarenet.es)
Received: from cu01176b.smtpx.saremail.com (cu01176b.smtpx.saremail.com [195.16.151.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6038B1C96; Tue, 17 May 2016 07:59:43 +0000 (UTC) (envelope-from borjam@sarenet.es)
Received: from [172.16.8.36] (izaro.sarenet.es [192.148.167.11]) by proxypop01.sare.net (Postfix) with ESMTPSA id E7CBC9DD374; Tue, 17 May 2016 09:49:46 +0200 (CEST)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Subject: Re: ZFS and NVMe, trim caused stalling
From: Borja Marcos 
In-Reply-To: <5E710EA5-C9B0-4521-85F1-3FE87555B0AF@bsdimp.com>
Date: Tue, 17 May 2016 09:49:46 +0200
Cc: fs@freebsd.org, FreeBSD-STABLE Mailing List 
Content-Transfer-Encoding: quoted-printable
Message-Id: 
References: <5E710EA5-C9B0-4521-85F1-3FE87555B0AF@bsdimp.com> 
To: Warner Losh 
X-Mailer: Apple Mail (2.3124)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems 
List-Unsubscribe: , 
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: , 
X-List-Received-Date: Tue, 17 May 2016 07:59:44 -0000

> On 05 May 2016, at 16:39, Warner Losh wrote:
>
>> What do you think? In some cases it’s clear that TRIM can do more harm than good.
>
> I think it’s best we not overreact.

I agree. But with this issue the system is almost unusable for now.

> This particular case is caused by the nvd driver, not the Intel P3500 NVMe drive. You need
> a solution (3): Fix the driver.
>
> Specifically, ZFS is pushing down a boatload of BIO_DELETE requests. In ata/da land, these
> requests are queued up, then collapsed together as much as makes sense (or is possible).
> This vastly helps performance (even with the extra sorting that I forced to be in there that I
> need to fix before 11). The nvd driver needs to do the same thing.

I understand that, but I don’t think it’s good that ZFS depends blindly on a driver feature such as that. Of course, it’s great to exploit it.

I have also noticed that ZFS has a good throttling mechanism for write operations. A similar mechanism should throttle trim requests so that they don’t clog the whole system.

> I’d be extremely hesitant to toss away TRIMs. They are actually quite important for
> the FTL in the drive’s firmware to properly manage the NAND wear. More free space always
> reduces write amplification. It tends to go as 1 / freespace, so simply dropping them on
> the floor should be done with great reluctance.

I understand. I was wondering about choosing the lesser of two evils: a 15-minute I/O stall (I deleted 2 TB of data, which is a lot, but not so unrealistic) or setting trims aside during peak activity. I see that I was wrong on that, as a throttling mechanism would probably be more than enough, unless the system is close to running out of space.

I’ve filed a bug report anyway. And copying to -stable.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209571

Thanks!

Borja.
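For reference, the TRIM backlog can at least be watched from userland while this happens. A sketch; the sysctl names below are from the 10.x ZFS TRIM code and are worth double-checking on a given revision:

# cumulative TRIM zio counters: bytes, success, failed, unsupported
sysctl kstat.zfs.misc.zio_trim
# global switch; 0 disables TRIM entirely (a last resort, per the above)
sysctl vfs.zfs.trim.enabled
# how many txgs freed blocks are held back before being trimmed
sysctl vfs.zfs.trim.txg_delay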
From owner-freebsd-fs@freebsd.org Tue May 17 08:20:57 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B0D63B3E992 for ; Tue, 17 May 2016 08:20:57 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 9BF3E1D7C for ; Tue, 17 May 2016 08:20:57 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 9B34EB3E991; Tue, 17 May 2016 08:20:57 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9AD48B3E990 for ; Tue, 17 May 2016 08:20:57 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 33C901D7B for ; Tue, 17 May 2016 08:20:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u4H8Kohw013266 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 17 May 2016 11:20:51 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u4H8Kohw013266 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u4H8Ko7G013262; Tue, 17 May 2016 11:20:50 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 17 May 2016 11:20:50 +0300 From: Konstantin Belousov To: Bruce Evans Cc: fs@freebsd.org Subject: Re: quick fix for slow directory shrinking in ffs Message-ID: <20160517082050.GX89104@kib.kiev.ua> References: <20160517072705.F2157@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160517072705.F2157@besplex.bde.org> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 08:20:57 -0000 On Tue, May 17, 2016 at 07:54:27AM +1000, Bruce Evans wrote: > ffs does very slow shrinking of directories after removing some files > leaves unused blocks at the end, by always doing synchronous truncation. > > This often happens in my normal usage: medium size builds expand /tmp > from 512 to 1024 to hold a few more hundred bytes of file names; > expansion is async and fast, but shrinking is sync and slow, and > with a certain size of build the boundary is crossed back and forth > very often. > > My /tmp directory is always on an async-mounted file system, so this > quick fix of always doing an async truncation for async mounts works > for me. Using IO_SYNC when not asked to is a bug for async mounts > in all cases anyway. > > The file system has block size 8192 and frag size 1024, so it is also > wrong to shrink to size DIRBLKSIZE = 512. 
The shrinkage seems to be > considered at every DIRBLKSIZE boundary, so not only small directories > are affected. > > The patch fixes an unrelated typo in a message. > > X Index: ufs_lookup.c > X =================================================================== > X --- ufs_lookup.c (revision 299263) > X +++ ufs_lookup.c (working copy) > X @@ -1131,9 +1131,9 @@ > X if (tvp != NULL) > X VOP_UNLOCK(tvp, 0); > X error = UFS_TRUNCATE(dvp, (off_t)dp->i_endoff, > X - IO_NORMAL | IO_SYNC, cr); > X + IO_NORMAL | (DOINGASYNC(dvp) ? 0 : IO_SYNC), cr); > X if (error != 0) > X - vprint("ufs_direnter: failted to truncate", dvp); > X + vprint("ufs_direnter: failed to truncate", dvp); > X #ifdef UFS_DIRHASH > X if (error == 0 && dp->i_dirhash != NULL) > X ufsdirhash_dirtrunc(dp, dp->i_endoff); > The IO_SYNC flag, for non-journaled SU and any kind of non-SU mounts, only affects the new blocks allocation mode, and write-out mode for the last fragment. The truncation itself (for -J) is performed in the context of the truncating thread. The cg blocks, after the bits are set to free, are marked for delayed write (with the background write hack). The inode block is written according to the mount mode, ignoring IO_SYNC. That is, for always fully populated directory files, I do not see how anything is changed by the patch. I committed the typo fix. From owner-freebsd-fs@freebsd.org Tue May 17 08:33:28 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 67E1EB3EF9F for ; Tue, 17 May 2016 08:33:28 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay05.ispgateway.de (smtprelay05.ispgateway.de [80.67.31.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2FCAF1F8A for ; Tue, 17 May 2016 08:33:27 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from [78.35.176.77] (helo=fabiankeil.de) by smtprelay05.ispgateway.de with esmtpsa (TLSv1.2:AES128-GCM-SHA256:128) (Exim 4.84) (envelope-from ) id 1b2aPy-0002dW-TL for freebsd-fs@freebsd.org; Tue, 17 May 2016 10:31:54 +0200 Date: Tue, 17 May 2016 10:27:57 +0200 From: Fabian Keil To: FreeBSD Filesystems Subject: Re: zfs receive stalls whole system Message-ID: <20160517102757.135c1468@fabiankeil.de> In-Reply-To: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de> References: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; boundary="Sig_/ZojT=4SLUeXeJZEf2IdOajl"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 08:33:28 -0000 --Sig_/ZojT=4SLUeXeJZEf2IdOajl Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Rainer Duffner wrote: > I have two servers, that were running FreeBSD 10.1-AMD64 for a long time,= one zfs-sending to the other (via zxfer). Both are NFS-servers and MySQL-s= laves, the sender is actively used as NFS-server, the recipient is just a w= arm-standby, in case something serious happens and we don=E2=80=99t want to= wait for a day until the restore is back in place. 
The MySQL-Slaves are ac= tively used as read-only servers (at the application level, Python=E2=80=99= s SQL-Alchemy does that, apparently). >=20 > They are HP DL380G8 (one CPU, hexacore) with over 128 GB RAM (I think one= has 144, the other has 192). > While they were running 10.1, they used HP P420 RAID-controllers with ind= ividual 12 RAID0 volumes that I pooled into 6-disk RAIDZ2 vdevs. > I use zfsnap to do hourly, daily and weekly snapshots. [...] > Now, when I do a zxfer, sometimes the whole system stalls while the data = is sent over, especially if the delta is large or if something else is read= ing from the disk at the same time (backup agent). >=20 > I had this before, on 10.0 (I believe, we didn=E2=80=99t have this in 9.1= either, IIRC) and it went away in 10.1. Do you use geli for swap device(s)? > It=E2=80=99s very difficult (well, impossible) to debug, because the syst= em totally hangs and doesn=E2=80=99t accept any keypresses. You could try reducing ZFS's deadman timeout to get a panic. On systems with local disks I usually use: vfs.zfs.deadman_enabled: 1 vfs.zfs.deadman_checktime_ms: 5000 vfs.zfs.deadman_synctime_ms: 10000 Fabian --Sig_/ZojT=4SLUeXeJZEf2IdOajl Content-Type: application/pgp-signature Content-Description: OpenPGP digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEARECAAYFAlc61g4ACgkQBYqIVf93VJ0shgCaA2wnHQq+AKX3XK7yt5jWKHZ/ rUEAn1IMBjKGvRcA9ZljB/Qy7cY0gLAk =TR3y -----END PGP SIGNATURE----- --Sig_/ZojT=4SLUeXeJZEf2IdOajl-- From owner-freebsd-fs@freebsd.org Tue May 17 08:42:47 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 782ABB3D422 for ; Tue, 17 May 2016 08:42:47 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 62F8E18BF for ; Tue, 17 May 2016 08:42:47 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 61FF9B3D41F; Tue, 17 May 2016 08:42:47 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 61A2DB3D41D for ; Tue, 17 May 2016 08:42:47 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D57BF18BE for ; Tue, 17 May 2016 08:42:46 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u4H8gfBE018503 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 17 May 2016 11:42:42 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u4H8gfBE018503 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u4H8gfYj018502; Tue, 17 May 2016 11:42:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 17 May 2016 11:42:41 +0300 From: Konstantin Belousov To: Bruce Evans Cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs Message-ID: <20160517084241.GY89104@kib.kiev.ua> References: 
<20160517072104.I2137@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160517072104.I2137@besplex.bde.org> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 08:42:47 -0000 On Tue, May 17, 2016 at 07:26:08AM +1000, Bruce Evans wrote: > Counting of i/o's in g_vfs_strategy() requires the fs to initialize > devvp->v_rdev->si_mountpt to non-null. This seems to be done correctly > in ext2fs and msdosfs, but in ffs it is not done for ro mounts, or for > rw mounts that started as ro. The bug is most obvious for the root > file system since it always starts as ro. I committed the comments updates. For the accounting patch, don't we want to account for all io, including the mount-time metadata reads and initial superblock update ? diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c index 9776554..712fc21 100644 --- a/sys/ufs/ffs/ffs_vfsops.c +++ b/sys/ufs/ffs/ffs_vfsops.c @@ -780,6 +780,8 @@ ffs_mountfs(devvp, mp, td) mp->mnt_iosize_max = MAXPHYS; devvp->v_bufobj.bo_ops = &ffs_ops; + if (devvp->v_type == VCHR) + devvp->v_rdev->si_mountpt = mp; fs = NULL; sblockloc = 0; @@ -1049,8 +1051,6 @@ ffs_mountfs(devvp, mp, td) ffs_flushfiles(mp, FORCECLOSE, td); goto out; } - if (devvp->v_type == VCHR && devvp->v_rdev != NULL) - devvp->v_rdev->si_mountpt = mp; if (fs->fs_snapinum[0] != 0) ffs_snapshot_mount(mp); fs->fs_fmod = 1; @@ -1083,6 +1083,8 @@ ffs_mountfs(devvp, mp, td) out: if (bp) brelse(bp); + if (devvp->v_type == VCHR && devvp->v_rdev != NULL) + devvp->v_rdev->si_mountpt = NULL; if (cp != NULL) { DROP_GIANT(); g_topology_lock(); From owner-freebsd-fs@freebsd.org Tue May 17 08:42:48 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1D322B3D426 for ; Tue, 17 May 2016 08:42:48 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DB15418C0 for ; Tue, 17 May 2016 08:42:47 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (unknown [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id 24D29153402; Tue, 17 May 2016 10:42:45 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id LKXrGJgCqvNe; Tue, 17 May 2016 10:42:44 +0200 (CEST) Received: from [IPv6:2001:4cb8:3:1:8c49:9de7:acf1:6a1f] (unknown [IPv6:2001:4cb8:3:1:8c49:9de7:acf1:6a1f]) by smtp.digiware.nl (Postfix) with ESMTP id 301D415340A; Tue, 17 May 2016 10:42:44 +0200 (CEST) Subject: Re: Bigger MAX_PATH (Was: Re: State of native encryption in ZFS) To: Peter Jeremy References: <5736E7B4.1000409@gmail.com> <57378707.19425.B54772B@s_sourceforge.nedprod.com> <57385356.4525.E728971@s_sourceforge.nedprod.com> 
<9ead4b28-9711-5e38-483f-ef9eaf0bc583@digiware.nl> <20160516200543.GC42426@server.rulingia.com>
Cc: "freebsd-fs@FreeBSD.org" 
From: Willem Jan Withagen 
Organization: Digiware Management b.v.
Message-ID: 
Date: Tue, 17 May 2016 10:42:32 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0
MIME-Version: 1.0
In-Reply-To: <20160516200543.GC42426@server.rulingia.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems 
List-Unsubscribe: , 
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: , 
X-List-Received-Date: Tue, 17 May 2016 08:42:48 -0000

On 16-5-2016 22:05, Peter Jeremy wrote:
> On 2016-May-16 15:18:17 +0200, Willem Jan Withagen wrote:
>> Trying to port Ceph is also running into the limit in:
>> /usr/include/sys/syslimits.h:
>> #define NAME_MAX 255 /* max bytes in a file name */
>>
>> but I also found:
>> /usr/include/stdio.h:
>> #define FILENAME_MAX 1024 /* must be <= PATH_MAX */
>>
>> So take a pick??
>
> There are two distinct limits: The maximum number of characters in a
> pathname component (ie the name seen in a directory entry): For UFS,
> this is 255 because the length is stored on disk in a uint8_t (I don't
> know the limit for ZFS). The other limit is the maximum number of
> characters in a pathname - PATH_MAX. This is used to dimension various
> buffers but isn't persistent on disk so you should be able to increase
> it by changing the relevant #defines and rebuilding everything.

Don't remember if I did such an experiment. Got to talk to the local
engineer on duty here to see if I can get a few more VMs to go compile
and blow up. :)

Getting the NAME_MAX size per fs is something I'm going to need in the
long run for Ceph to make optimal use of its capabilities.
I think that Linux is now at 1024, and the underlying store for Ceph is
going to 4096.....
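Querying the effective limits per mount point already works through pathconf(2); a quick sketch using getconf(1), where the mount point is just an example:

getconf NAME_MAX /tmp    # longest single name component the fs accepts (255 on UFS)
getconf PATH_MAX /tmp    # longest full pathname (1024 on FreeBSD today)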
--WjW From owner-freebsd-fs@freebsd.org Tue May 17 09:08:21 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 41F3FB3DBBE; Tue, 17 May 2016 09:08:21 +0000 (UTC) (envelope-from rainer@ultra-secure.de) Received: from connect.ultra-secure.de (connect.ultra-secure.de [88.198.71.201]) by mx1.freebsd.org (Postfix) with ESMTP id 676BE13C1; Tue, 17 May 2016 09:08:19 +0000 (UTC) (envelope-from rainer@ultra-secure.de) Received: (Haraka outbound); Tue, 17 May 2016 11:08:18 +0200 Authentication-Results: connect.ultra-secure.de; auth=pass (login); spf=none smtp.mailfrom=ultra-secure.de Received-SPF: None (connect.ultra-secure.de: domain of ultra-secure.de does not designate 127.0.0.16 as permitted sender) receiver=connect.ultra-secure.de; identity=mailfrom; client-ip=127.0.0.16; helo=connect.ultra-secure.de; envelope-from= Received: from connect.ultra-secure.de (expwebmail [127.0.0.16]) by connect.ultra-secure.de (Haraka/2.6.2-toaster) with ESMTPSA id 6E9A37E4-94A9-49FA-B13F-28674A2778A6.1 envelope-from (authenticated bits=0) (version=TLSv1/SSLv3 cipher=AES128-GCM-SHA256 verify=NO); Tue, 17 May 2016 11:08:15 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Date: Tue, 17 May 2016 11:08:14 +0200 From: rainer@ultra-secure.de To: Fabian Keil Cc: FreeBSD Filesystems , owner-freebsd-fs@freebsd.org Subject: Re: zfs receive stalls whole system In-Reply-To: <20160517102757.135c1468@fabiankeil.de> References: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de> <20160517102757.135c1468@fabiankeil.de> Message-ID: X-Sender: rainer@ultra-secure.de User-Agent: Roundcube Webmail/1.1.4 X-Haraka-GeoIP: --, , NaNkm X-Haraka-GeoIP-Received: X-Haraka-p0f: os="undefined undefined" link_type="undefined" distance=undefined total_conn=undefined shared_ip=Y X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on spamassassin X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.1 X-Haraka-Karma: score: 6, good: 42, bad: 0, connections: 57, history: 42, pass:all_good, relaying X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 09:08:21 -0000 Am 2016-05-17 10:27, schrieb Fabian Keil: > Rainer Duffner wrote: > >> I have two servers, that were running FreeBSD 10.1-AMD64 for a long >> time, one zfs-sending to the other (via zxfer). Both are NFS-servers >> and MySQL-slaves, the sender is actively used as NFS-server, the >> recipient is just a warm-standby, in case something serious happens >> and we don’t want to wait for a day until the restore is back in >> place. The MySQL-Slaves are actively used as read-only servers (at the >> application level, Python’s SQL-Alchemy does that, apparently). >> >> They are HP DL380G8 (one CPU, hexacore) with over 128 GB RAM (I think >> one has 144, the other has 192). >> While they were running 10.1, they used HP P420 RAID-controllers with >> individual 12 RAID0 volumes that I pooled into 6-disk RAIDZ2 vdevs. >> I use zfsnap to do hourly, daily and weekly snapshots. > [...] 
>> Now, when I do a zxfer, sometimes the whole system stalls while the
>> data is sent over, especially if the delta is large or if something
>> else is reading from the disk at the same time (backup agent).
>>
>> I had this before, on 10.0 (I believe, we didn't have this in 9.1
>> either, IIRC) and it went away in 10.1.
>
> Do you use geli for swap device(s)?

Yes, I do.
/dev/mirror/swap.eli    none    swap    sw    0    0

Bad idea?

>> It's very difficult (well, impossible) to debug, because the system
>> totally hangs and doesn't accept any keypresses.
>
> You could try reducing ZFS's deadman timeout to get a panic.
> On systems with local disks I usually use:
>
> vfs.zfs.deadman_enabled: 1
> vfs.zfs.deadman_checktime_ms: 5000
> vfs.zfs.deadman_synctime_ms: 10000

Too bad I don't have a spare system I could use to test this ;-)

From owner-freebsd-fs@freebsd.org Tue May 17 09:56:41 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 10D24B3E7FB for ; Tue, 17 May 2016 09:56:41 +0000 (UTC) (envelope-from crest@rlwinm.de)
Received: from smtp.rlwinm.de (smtp.rlwinm.de [IPv6:2a01:4f8:201:31ef::e]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D01361EE6 for ; Tue, 17 May 2016 09:56:40 +0000 (UTC) (envelope-from crest@rlwinm.de)
Received: from crest.local (unknown [87.253.189.132]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.rlwinm.de (Postfix) with ESMTPSA id 35B6A86E0 for ; Tue, 17 May 2016 11:56:29 +0200 (CEST)
Subject: Re: Best practice for high availability ZFS pool
To: freebsd-fs@freebsd.org
References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org>
From: Jan Bramkamp
Message-ID: <84e3b485-d8bd-0f2f-47a4-85a64678d286@rlwinm.de>
Date: Tue, 17 May 2016 11:56:28 +0200
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.0
MIME-Version: 1.0
In-Reply-To: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 May 2016 09:56:41 -0000

On 16/05/16 12:08, Palle Girgensohn wrote:
> Hi,
>
> We need to set up a ZFS pool with redundancy. The main goal is high
> availability - uptime.
>
> I can see a few paths to follow.
>
> 1. HAST + ZFS
>
> 2. Some sort of shared storage, two machines sharing a JBOD box.

If you're willing to put your disks into JBODs you can use JBODs with
two upstream ports per SAS expander and hook up one port to each head
node. Now you can access all the disks on both head nodes. The next
step you require is reliable master election. Two nodes alone can't
form the required consensus. In theory you could use SCSI persistent
reservations, but afaik FreeBSD lacks the tooling unless you want to
send raw SCSI commands through camcontrol. The easier solution is to
run a master election using one or three additional nodes for a total
of three or five. Both consul and etcd are available as ports and are
designed for reliable master election without special hardware. If you
go down this path you still need some kind of fencing (maybe via IPMI
or PDUs).
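To make the election part concrete: consul ships a small lock wrapper
that runs a command only while the node holds the lock. A rough,
untested sketch - the lock prefix, pool and service names below are
made up, and fencing the loser (IPMI power-off) still has to happen
before the import:

  # the child is started once the lock is acquired and is killed
  # again if the lock/session is lost
  consul lock ha/zfs-head sh -c '
      zpool import -f tank &&
      service mountd onestart && service nfsd onestart &&
      while :; do sleep 60; done'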
Now the JBOD is your SPoF, so get yourself at least two or, better,
three JBODs. For optimal performance and reliability use three JBODs
with 3-way mirrors spread over all JBODs. In this setup no hardware
protects your disks from the hot standby. If it falls out of sync you
have to keep it from writing to the shared direct-attached storage. One
way to achieve this would be to load the SAS HBA kernel module only
after the role (primary, backup) has been elected, and disable the HBA
option ROM in the UEFI/BIOS. I tried this once out of curiosity and it
performed well, but good luck finding any support for such a setup.

The same kind of setup should be possible with iSCSI instead of SAS
disks connected to dual-ported expanders, but I can't say anything
about the performance you can expect from the FreeBSD iSCSI target and
initiator. At least it would simplify fencing a lot, because the
fencing could be moved from the SCSI initiators into the SCSI targets.

Jan Bramkamp

From owner-freebsd-fs@freebsd.org Tue May 17 09:58:56 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 590C9B3E8B6 for ; Tue, 17 May 2016 09:58:56 +0000 (UTC) (envelope-from ben.rubson@gmail.com)
Received: from mail-wm0-x233.google.com (mail-wm0-x233.google.com [IPv6:2a00:1450:400c:c09::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 0E0BF1FCF for ; Tue, 17 May 2016 09:58:56 +0000 (UTC) (envelope-from ben.rubson@gmail.com)
Received: by mail-wm0-x233.google.com with SMTP id e201so132560545wme.0 for ; Tue, 17 May 2016 02:58:55 -0700 (PDT)
X-Received: by 10.28.86.10 with SMTP id k10mr458623wmb.96.1463479134591; Tue, 17 May 2016 02:58:54 -0700 (PDT)
Received: from [192.168.1.16] (210.236.26.109.rev.sfr.net.
[109.26.236.210]) by smtp.gmail.com with ESMTPSA id jp2sm2183352wjc.16.2016.05.17.02.58.53 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Tue, 17 May 2016 02:58:53 -0700 (PDT)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Subject: Re: Bigger MAX_PATH (Was: Re: State of native encryption in ZFS)
From: Ben RUBSON
In-Reply-To: <9ead4b28-9711-5e38-483f-ef9eaf0bc583@digiware.nl>
Date: Tue, 17 May 2016 11:58:52 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <9F057D48-5413-437B-A612-64D47E95C846@gmail.com>
References: <5736E7B4.1000409@gmail.com> <57378707.19425.B54772B@s_sourceforge.nedprod.com> <57385356.4525.E728971@s_sourceforge.nedprod.com> <9ead4b28-9711-5e38-483f-ef9eaf0bc583@digiware.nl>
To: "freebsd-fs@FreeBSD.org"
X-Mailer: Apple Mail (2.3124)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 May 2016 09:58:56 -0000

> On 15 may 2016 at 12:45, Niall Douglas wrote:
>
>>> If FreeBSD had a bigger PATH_MAX then stackable encryptions layers
>>> like ecryptfs (encfs?) would be viable choices. Because encrypted
>>> path components are so long, one runs very rapidly into the maximum
>>> path on the system when PATH_MAX is so low.

Could you give us some examples where PATH_MAX was too low for you
using ecryptfs ?
I (for the moment) do not run into troubles using EncFS.

> http://freebsd.1045724.n5.nabble.com/misc-184340-PATH-MAX-not-interoperable-with-Linux-td5864469.html

And examples where PATH_MAX is too low using Rsync ?
Is it too low when we want to sync from Linux to FreeBSD ? Or from
FreeBSD to Linux ?
Using Rsync over SSH ? Or using the Rsync daemon on the receiving side ?

Thank you very much !
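One self-contained way to see where the limit bites, independent of
rsync (an untested sketch; the 200-byte component length is arbitrary):

  c=$(jot -s '' -b x 200)           # one 200-character name component
  p=/tmp/pmax; mkdir -p "$p"
  while mkdir "$p/$c" 2>/dev/null; do p="$p/$c"; done
  echo "mkdir gave up at path length ${#p}"   # ~1024 here, ~4096 on Linux
  rm -rf /tmp/pmax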
Ben From owner-freebsd-fs@freebsd.org Tue May 17 10:26:43 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 730D8B3EF91 for ; Tue, 17 May 2016 10:26:43 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 60D271E00 for ; Tue, 17 May 2016 10:26:43 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 601CDB3EF8F; Tue, 17 May 2016 10:26:43 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5FC29B3EF8D for ; Tue, 17 May 2016 10:26:43 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail108.syd.optusnet.com.au (mail108.syd.optusnet.com.au [211.29.132.59]) by mx1.freebsd.org (Postfix) with ESMTP id 10C9E1DFF for ; Tue, 17 May 2016 10:26:42 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id A332D1A3E13; Tue, 17 May 2016 20:26:33 +1000 (AEST) Date: Tue, 17 May 2016 20:26:26 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: fs@freebsd.org Subject: Re: quick fix for slow directory shrinking in ffs In-Reply-To: <20160517082050.GX89104@kib.kiev.ua> Message-ID: <20160517192933.U4573@besplex.bde.org> References: <20160517072705.F2157@besplex.bde.org> <20160517082050.GX89104@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=TuMb/2jh c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=DFMq5MnWpUiX_LZUeQ4A:9 a=l0SOHounc31-8c1A:21 a=KhYsnkSFTUpZV8AS:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 10:26:43 -0000 On Tue, 17 May 2016, Konstantin Belousov wrote: > On Tue, May 17, 2016 at 07:54:27AM +1000, Bruce Evans wrote: >> ffs does very slow shrinking of directories after removing some files >> leaves unused blocks at the end, by always doing synchronous truncation. >> ... >> X Index: ufs_lookup.c >> X =================================================================== >> X --- ufs_lookup.c (revision 299263) >> X +++ ufs_lookup.c (working copy) >> X @@ -1131,9 +1131,9 @@ >> X if (tvp != NULL) >> X VOP_UNLOCK(tvp, 0); >> X error = UFS_TRUNCATE(dvp, (off_t)dp->i_endoff, >> X - IO_NORMAL | IO_SYNC, cr); >> X + IO_NORMAL | (DOINGASYNC(dvp) ? 0 : IO_SYNC), cr); >> X if (error != 0) >> X - vprint("ufs_direnter: failted to truncate", dvp); >> X + vprint("ufs_direnter: failed to truncate", dvp); >> X #ifdef UFS_DIRHASH >> X if (error == 0 && dp->i_dirhash != NULL) >> X ufsdirhash_dirtrunc(dp, dp->i_endoff); > > The IO_SYNC flag, for non-journaled SU and any kind of non-SU mounts, > only affects the new blocks allocation mode, and write-out mode for > the last fragment. The truncation itself (for -J) is performed in the > context of the truncating thread. 
> The cg blocks, after the bits are
> set to free, are marked for delayed write (with the background write
> hack). The inode block is written according to the mount mode, ignoring
> IO_SYNC.

I don't see why you think that.  ffs_truncate() clearly honors IO_SYNC,
and testing shows that ffs with soft updates does precisely 7 extra
sync writes for directory compaction (where some of the 7 are probably
to sync previous activity).

I think it would be wrong to ignore IO_SYNC and use the mount mode for
inodes.  Async mounts still have that bug IIRC (I fixed it locally long
ago).  IO_SYNC is set if the file is open with O_SYNC and the mount
mode must not override this.  I think ffs has no way of telling that
this particular IO_SYNC is not associated with O_SYNC.

> That is, for always fully populated directory files, I do not see how
> anything is changed by the patch.

This problem affects all 512-boundaries, which are rarely block or even
fragment boundaries.

Test program:

X mp=$(df . | grep -v Filesystem | sed 's/ .*//')
X echo $mp
X while :;
X do
X 	echo -n "start: $(/usr/bin/stat -f %5z .); "
X 	mount -v | grep $mp | sed -e 's/.*writes/writes/' -e 's/, reads.*//'
X
X 	touch $(jot 41 0)	# just over 512 bytes
X 	echo -n "touch: $(/usr/bin/stat -f %5z .); "
X 	mount -v | grep $mp | sed -e 's/.*writes/writes/' -e 's/, reads.*//'
X
X 	#
X 	# Async mounts are still broken in -current -- these rm's (but nothing
X 	# else here) cause sync writes (but just 1 for the 2 rm's).
X 	#
X 	# rm 39 40	# just under, but no truncation yet
X 	echo -n "rm: $(/usr/bin/stat -f %5z .); "
X 	mount -v | grep $mp | sed -e 's/.*writes/writes/' -e 's/, reads.*//'
X
X 	#
X 	# Another bug in async mounts makes the truncate for the compaction
X 	# triggered by this touch do an async write (with the fix to stop it
X 	# doing a sync write).
X 	#
X 	touch 39	# still under; this creation does the truncation
X 	echo -n "touch 39:$(/usr/bin/stat -f %5z .); "
X 	mount -v | grep $mp | sed -e 's/.*writes/writes/' -e 's/, reads.*//'
X
X 	sleep 10
X 	echo
X done

I hope this uses a portable enough way to find the mount point.  This
must be run in an empty directory (or you have to adjust the sizes).

Results:
- async mount with fix: 1 sync write per iteration.  A bogus one
  triggered by the rm.  I only fixed this locally.  Remove the rm line
  so that the size stays slightly above 1024 bytes and there are 0 sync
  writes.  There is also 1 async write triggered by the truncate.  This
  is another bug in async mounts which I have fixed locally.  All
  writes for async mounts should be delayed unless IO_SYNC forces them
  to be sync.
- soft updates: 7 sync writes per iteration, all triggered by the final
  touch (which triggers the compaction).  Remove the rm line and there
  are again 0 sync writes.  Sometimes there are 2-5 async writes
  between the loop iterations.  These might be for the loop too, since
  there are more of them than for async mounts.  (I left daemons
  running while testing this on the root file system.  Test on a
  completely idle fs to be sure.)
- no soft updates and no async mount: first touch does 3 sync writes,
  rm does 2 sync, last touch does 4 sync; 0 async writes.

The IO_SYNC for soft updates apparently turns all the previous writes
for the loop into sync ones.  It has to order them and wait for them
and there is no better way to wait than a sync write.  The ordering
makes an unnecessary sync write even more expensive for soft updates
than for other cases.

Some relevant code in ffs_truncate:

Y 	/*
Y 	 * Shorten the size of the file. If the file is not being
Y 	 * truncated to a block boundary, the contents of the
Y 	 * partial block following the end of the file must be
Y 	 * zero'ed in case it ever becomes accessible again because
Y 	 * of subsequent file growth. Directories however are not
Y 	 * zero'ed as they should grow back initialized to empty.
Y 	 */
Y 	offset = blkoff(fs, length);
Y 	if (offset == 0) {
Y 		ip->i_size = length;
Y 		DIP_SET(ip, i_size, length);
Y 	} else {
Y 		lbn = lblkno(fs, length);
Y 		flags |= BA_CLRBUF;
Y 		error = UFS_BALLOC(vp, length - 1, 1, cred, flags, &bp);
Y 		if (error) {
Y 			return (error);
Y 		}
Y 		/*
Y 		 * When we are doing soft updates and the UFS_BALLOC
Y 		 * above fills in a direct block hole with a full sized
Y 		 * block that will be truncated down to a fragment below,
Y 		 * we must flush out the block dependency with an FSYNC
Y 		 * so that we do not get a soft updates inconsistency
Y 		 * when we create the fragment below.
Y 		 */
Y 		if (DOINGSOFTDEP(vp) && lbn < NDADDR &&
Y 		    fragroundup(fs, blkoff(fs, length)) < fs->fs_bsize &&
Y 		    (error = ffs_syncvnode(vp, MNT_WAIT)) != 0)
Y 			return (error);
Y 		ip->i_size = length;
Y 		DIP_SET(ip, i_size, length);
Y 		size = blksize(fs, ip, lbn);
Y 		if (vp->v_type != VDIR)
Y 			bzero((char *)bp->b_data + offset,
Y 			    (u_int)(size - offset));
Y 		/* Kirk's code has reallocbuf(bp, size, 1) here */
Y 		allocbuf(bp, size);
Y 		if (bp->b_bufsize == fs->fs_bsize)
Y 			bp->b_flags |= B_CLUSTEROK;
Y 		if (flags & IO_SYNC)
Y 			bwrite(bp);
Y 		else
Y 			bawrite(bp);
Y 	}

I think we usually arrive here and honor the IO_SYNC flag.  This is
correct.  Otherwise, we always do an async write, but that is wrong for
async mounts.  Here is my old fix for this:

Z diff -u2 ffs_inode.c~ ffs_inode.c
Z --- ffs_inode.c~	Wed Apr  7 21:22:26 2004
Z +++ ffs_inode.c	Sat Mar 23 01:23:16 2013
Z @@ -345,4 +431,6 @@
Z  	if (flags & IO_SYNC)
Z  		bwrite(bp);
Z +	else if (DOINGASYNC(ovp))
Z +		bdwrite(bp);
Z  	else
Z  		bawrite(bp);

This fix must be sprinkled in most places where there is a
bwrite()/bawrite() decision.
Bruce

From owner-freebsd-fs@freebsd.org Tue May 17 10:30:16 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 83E92B3F0E9 for ; Tue, 17 May 2016 10:30:16 +0000 (UTC) (envelope-from ben.rubson@gmail.com)
Received: from mail-wm0-x233.google.com (mail-wm0-x233.google.com [IPv6:2a00:1450:400c:c09::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2C93B101F for ; Tue, 17 May 2016 10:30:16 +0000 (UTC) (envelope-from ben.rubson@gmail.com)
Received: by mail-wm0-x233.google.com with SMTP id a17so22804850wme.0 for ; Tue, 17 May 2016 03:30:16 -0700 (PDT)
X-Received: by 10.194.72.103 with SMTP id c7mr646622wjv.65.1463481014764; Tue, 17 May 2016 03:30:14 -0700 (PDT)
Received: from [192.168.1.16] (210.236.26.109.rev.sfr.net. [109.26.236.210]) by smtp.gmail.com with ESMTPSA id lr9sm2297152wjb.39.2016.05.17.03.30.13 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Tue, 17 May 2016 03:30:14 -0700 (PDT)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Subject: Re: Best practice for high availability ZFS pool
From: Ben RUBSON
In-Reply-To:
Date: Tue, 17 May 2016 12:30:13 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id:
References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org>
To: freebsd-fs@freebsd.org
X-Mailer: Apple Mail (2.3124)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 May 2016 10:30:16 -0000

> On 17 may 2016 at 03:43, Bob Friesenhahn wrote:
>
> On Mon, 16 May 2016, Palle Girgensohn wrote:
>>
>> Shared storage still has a single point of failure, the JBOD box.
>> Apart from that, is there even any support for the kind of storage
>> PCI cards that support dual head for a storage box? I cannot find any.
>
> Use two (or three) JBOD boxes and do simple zfs mirroring across them
> so you can unplug a JBOD and the pool still works. Or use a bunch of
> JBOD boxes and use zfs raidz2 (or raidz3) across them with careful LUN
> selection so there is total storage redundancy and you can unplug a
> JBOD and the pool still works.
>
> Fiber channel (or FCoE) or iSCSI allows putting the hardware at some
> distance.
>
> Without completely isolated systems there is always the risk of total
> failure. Even with zfs send there is the risk of total failure if the
> sent data results in corruption on the receiving side.

In this case rollback one of the previous snapshots on the receiving
side ?
Did you mean the sent data can totally break the receiving pool, making
it unusable / unable to import ? Did we already see this ?

Thank you,

Ben

From owner-freebsd-fs@freebsd.org Tue May 17 10:45:01 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A80CCB3F3D8 for ; Tue, 17 May 2016 10:45:01 +0000 (UTC) (envelope-from ronald-lists@klop.ws)
Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.81]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6D0321763 for ; Tue, 17 May 2016 10:45:00 +0000 (UTC) (envelope-from ronald-lists@klop.ws)
Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.72) (envelope-from ) id 1b2cUd-0001G6-Ak; Tue, 17 May 2016 12:44:51 +0200
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
To: "FreeBSD Filesystems", "Rainer Duffner"
Subject: Re: zfs receive stalls whole system
References: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de>
Date: Tue, 17 May 2016 12:44:50 +0200
MIME-Version: 1.0
Content-Transfer-Encoding: Quoted-Printable
From: "Ronald Klop"
Message-ID:
In-Reply-To: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de>
User-Agent: Opera Mail/1.0 (Win32)
X-Virus-Scanned: by clamav at smarthost1.samage.net
X-Spam-Level: /
X-Spam-Score: -0.2
X-Spam-Status: No, score=-0.2 required=5.0 tests=ALL_TRUSTED, BAYES_50 autolearn=disabled version=3.4.0
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 May 2016 10:45:01 -0000

On Tue, 17 May 2016 01:07:24 +0200, Rainer Duffner wrote:

> Hi,
>
> I have two servers, that were running FreeBSD 10.1-AMD64 for a long
> time, one zfs-sending to the other (via zxfer). Both are NFS-servers
> and MySQL-slaves, the sender is actively used as NFS-server, the
> recipient is just a warm-standby, in case something serious happens
> and we don't want to wait for a day until the restore is back in
> place. The MySQL-Slaves are actively used as read-only servers (at the
> application level, Python's SQL-Alchemy does that, apparently).
>
> They are HP DL380G8 (one CPU, hexacore) with over 128 GB RAM (I think
> one has 144, the other has 192).
> While they were running 10.1, they used HP P420 RAID-controllers with
> individual 12 RAID0 volumes that I pooled into 6-disk RAIDZ2 vdevs.
> I use zfsnap to do hourly, daily and weekly snapshots.
>
> Sending worked well, especially after updating to 10.1
>
> Because the storage was over 90% full (and I really hate this
> RAID0-business we have with the HP RAID controllers), I rebuilt the
> servers with HPs OEMed H220/221 controllers (LSI 2308 in disguise) and
> an external disk shelf, hosting 12 additional disks, was added - and I
> upgraded to FreeBSD 10.3.
> Because we didn't want to throw out the original disks, but increase
> available space a lot, the new disks are double the size of the
> original disks (600 vs. 1200 GB SAS).
> I also created GPT-partitions on the disks and labeled them according
> to the disk's position in the cages/shelf, and created the pools with
> the gpt-partition-names instead of the daX-names.
>
> Now, when I do a zxfer, sometimes the whole system stalls while the
> data is sent over, especially if the delta is large or if something
> else is reading from the disk at the same time (backup agent).
>
> I had this before, on 10.0 (I believe, we didn't have this in 9.1
> either, IIRC) and it went away in 10.1.
>
> It's very difficult (well, impossible) to debug, because the system
> totally hangs and doesn't accept any keypresses.
>
> Would a ZIL help in this case?
> I always thought that NFS was the only thing that did SYNC writes...

Databases love SYNC writes too. (But that doesn't say anything about
the unresponsive system.)
I think there is a statistic somewhere in FreeBSD to analyze the sync
vs async writes and decide if a ZIL will help or not. (But that doesn't
say anything about the unresponsive system either.)

Ronald.

From owner-freebsd-fs@freebsd.org Tue May 17 10:47:00 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 78943B3F475 for ; Tue, 17 May 2016 10:47:00 +0000 (UTC) (envelope-from ronald-lists@klop.ws)
Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.81]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 408231979 for ; Tue, 17 May 2016 10:47:00 +0000 (UTC) (envelope-from ronald-lists@klop.ws)
Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.72) (envelope-from ) id 1b2cWg-0001vj-1L; Tue, 17 May 2016 12:46:58 +0200
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
To: "FreeBSD Filesystems", "Rainer Duffner"
Subject: Re: zfs receive stalls whole system
References: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de>
Date: Tue, 17 May 2016 12:46:56 +0200
MIME-Version: 1.0
Content-Transfer-Encoding: Quoted-Printable
From: "Ronald Klop"
Message-ID:
In-Reply-To:
User-Agent: Opera Mail/1.0 (Win32)
X-Virus-Scanned: by clamav at smarthost1.samage.net
X-Spam-Level: /
X-Spam-Score: -0.2
X-Spam-Status: No, score=-0.2 required=5.0 tests=ALL_TRUSTED, BAYES_50 autolearn=disabled version=3.4.0
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 May 2016 10:47:00 -0000

On Tue, 17 May 2016 12:44:50 +0200, Ronald Klop wrote:

> On Tue, 17 May 2016 01:07:24 +0200, Rainer Duffner wrote:
> [...]
>> Would a ZIL help in this case?
>> I always thought that NFS was the only thing that did SYNC writes...
>
> Databases love SYNC writes too. (But that doesn't say anything about
> the unresponsive system.)
> I think there is a statistic somewhere in FreeBSD to analyze the sync
> vs async writes and decide if a ZIL will help or not. (But that
> doesn't say anything about the unresponsive system either.)
>
> Ronald.

One question. You did not enable dedup(lication)?

Ronald.
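P.S. A quick way to check, with a placeholder pool name:

  zpool list -o name,dedupratio tank      # 1.00x if dedup was never used
  zfs get -rH dedup tank | grep -vw off   # should print nothing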
From owner-freebsd-fs@freebsd.org Tue May 17 10:48:45 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9F72EB3F514 for ; Tue, 17 May 2016 10:48:45 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de)
Received: from smtprelay04.ispgateway.de (smtprelay04.ispgateway.de [80.67.18.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 65ABB1A3E for ; Tue, 17 May 2016 10:48:44 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de)
Received: from [78.35.176.77] (helo=fabiankeil.de) by smtprelay04.ispgateway.de with esmtpsa (TLSv1.2:AES128-GCM-SHA256:128) (Exim 4.84) (envelope-from ) id 1b2cUs-0000f2-Vw for freebsd-fs@freebsd.org; Tue, 17 May 2016 12:45:07 +0200
Date: Tue, 17 May 2016 12:36:27 +0200
From: Fabian Keil
To: FreeBSD Filesystems
Subject: Re: zfs receive stalls whole system
Message-ID: <20160517123627.699e2aa5@fabiankeil.de>
In-Reply-To:
References: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de> <20160517102757.135c1468@fabiankeil.de>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; boundary="Sig_/Fulk3QySoETDWNL8l4bPeY/"; protocol="application/pgp-signature"
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 May 2016 10:48:45 -0000

rainer@ultra-secure.de wrote:

> On 2016-05-17 10:27, Fabian Keil wrote:
> > Rainer Duffner wrote:
> > [...]
> >> Now, when I do a zxfer, sometimes the whole system stalls while
> >> the data is sent over, especially if the delta is large or if
> >> something else is reading from the disk at the same time (backup
> >> agent).
> >>
> >> I had this before, on 10.0 (I believe, we didn't have this in 9.1
> >> either, IIRC) and it went away in 10.1.
> >
> > Do you use geli for swap device(s)?
>
> Yes, I do.
> /dev/mirror/swap.eli    none    swap    sw    0    0
>
> Bad idea?

It can cause deadlocks and poor performance when paging.

This was recently fixed in ElectroBSD and I intend to submit
the patch in a couple of days after a bit more stress testing.
The patch is already available at:
https://www.fabiankeil.de/sourcecode/electrobsd/GELI-Use-a-dedicated-uma-zone-for-writes.diff

Fabian

From owner-freebsd-fs@freebsd.org Tue May 17 11:17:25 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 12609B3FDD9 for ; Tue, 17 May 2016 11:17:25 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id F067B1E4B for ; Tue, 17 May 2016 11:17:24 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: by mailman.ysv.freebsd.org (Postfix) id EFC38B3FDD8; Tue, 17 May 2016 11:17:24 +0000 (UTC)
Delivered-To: fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EF68AB3FDD7 for ; Tue, 17 May 2016 11:17:24 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7EC451E4A for ; Tue, 17 May 2016 11:17:24 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u4HBHFQo055669 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 17 May 2016 14:17:15 +0300 (EEST) (envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u4HBHFYF055668; Tue, 17 May 2016 14:17:15 +0300 (EEST) (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f
Date: Tue, 17 May 2016 14:17:15 +0300
From: Konstantin Belousov
To: Bruce Evans
Cc: fs@freebsd.org
Subject: Re: quick fix for slow directory shrinking in ffs
Message-ID: <20160517111715.GC89104@kib.kiev.ua>
References: <20160517072705.F2157@besplex.bde.org> <20160517082050.GX89104@kib.kiev.ua> <20160517192933.U4573@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160517192933.U4573@besplex.bde.org>
User-Agent: Mutt/1.6.1 (2016-04-27)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 May 2016 11:17:25 -0000

On Tue, May 17, 2016 at 08:26:26PM +1000, Bruce Evans wrote:
> On Tue, 17 May 2016, Konstantin Belousov wrote:
>
> > On Tue, May 17, 2016 at 07:54:27AM +1000, Bruce Evans wrote:
> >> ffs does very slow shrinking of directories after removing some files
> >> leaves unused blocks at the end, by always doing synchronous
truncation. > >> ... > >> X Index: ufs_lookup.c > >> X =================================================================== > >> X --- ufs_lookup.c (revision 299263) > >> X +++ ufs_lookup.c (working copy) > >> X @@ -1131,9 +1131,9 @@ > >> X if (tvp != NULL) > >> X VOP_UNLOCK(tvp, 0); > >> X error = UFS_TRUNCATE(dvp, (off_t)dp->i_endoff, > >> X - IO_NORMAL | IO_SYNC, cr); > >> X + IO_NORMAL | (DOINGASYNC(dvp) ? 0 : IO_SYNC), cr); > >> X if (error != 0) > >> X - vprint("ufs_direnter: failted to truncate", dvp); > >> X + vprint("ufs_direnter: failed to truncate", dvp); > >> X #ifdef UFS_DIRHASH > >> X if (error == 0 && dp->i_dirhash != NULL) > >> X ufsdirhash_dirtrunc(dp, dp->i_endoff); > > > > The IO_SYNC flag, for non-journaled SU and any kind of non-SU mounts, > > only affects the new blocks allocation mode, and write-out mode for > > the last fragment. The truncation itself (for -J) is performed in the > > context of the truncating thread. The cg blocks, after the bits are > > set to free, are marked for delayed write (with the background write > > hack). The inode block is written according to the mount mode, ignoring > > IO_SYNC. > > I don't see why you think that. ffs_truncate() clearly honors IO_SYNC, > and testing shows that ffs with soft updates does precisely 7 extra > sync writes for directory compaction (where some of the 7 are probably > to sync previous activity). ffs_truncate() completely syncs the vnode for non-J truncations. I enumerated bits which are written according to the flags, and it seems to be aligned with what you wrote below. > > I think it would be wrong to ignore IO_SYNC and use the mount mode for > inodes. Async mounts still have that bug IIRC (I fixed locally long > ago). IO_SYNC is set if the file is is open with O_SYNC and mount mode > must not override this. I think ffs has no way of telling that this > particular IO_SYNC is not associated with O_SYNC. > > > That is, for always fully populated directory files, I do not see how > > anything is changed by the patch. > > This problem affects all 512-boundaries, which are rarely block or > even fragment boundaries. Yes, the write-outs of the blocks or fragments at the new end of the file are not needed if the buffer is clear and not newly allocated. But they are performed unconditionally. > > The IO_SYNC for soft updates apparently turns all the previous writes for > the loop into sync ones. It has to order them and wait for them and there > is no better way to wait than a sync write. The ordering makes an > unnecessary sync write even more expensive for soft updates than for > other cases. > > Some relevant code in ffs_truncate: > > Y /* > Y * Shorten the size of the file. If the file is not being > Y * truncated to a block boundary, the contents of the > Y * partial block following the end of the file must be > Y * zero'ed in case it ever becomes accessible again because > Y * of subsequent file growth. Directories however are not > Y * zero'ed as they should grow back initialized to empty. 
> Y */ > Y offset = blkoff(fs, length); > Y if (offset == 0) { > Y ip->i_size = length; > Y DIP_SET(ip, i_size, length); > Y } else { > Y lbn = lblkno(fs, length); > Y flags |= BA_CLRBUF; > Y error = UFS_BALLOC(vp, length - 1, 1, cred, flags, &bp); > Y if (error) { > Y return (error); > Y } > Y /* > Y * When we are doing soft updates and the UFS_BALLOC > Y * above fills in a direct block hole with a full sized > Y * block that will be truncated down to a fragment below, > Y * we must flush out the block dependency with an FSYNC > Y * so that we do not get a soft updates inconsistency > Y * when we create the fragment below. > Y */ > Y if (DOINGSOFTDEP(vp) && lbn < NDADDR && > Y fragroundup(fs, blkoff(fs, length)) < fs->fs_bsize && > Y (error = ffs_syncvnode(vp, MNT_WAIT)) != 0) > Y return (error); > Y ip->i_size = length; > Y DIP_SET(ip, i_size, length); > Y size = blksize(fs, ip, lbn); > Y if (vp->v_type != VDIR) > Y bzero((char *)bp->b_data + offset, > Y (u_int)(size - offset)); > Y /* Kirk's code has reallocbuf(bp, size, 1) here */ > Y allocbuf(bp, size); > Y if (bp->b_bufsize == fs->fs_bsize) > Y bp->b_flags |= B_CLUSTEROK; > Y if (flags & IO_SYNC) > Y bwrite(bp); > Y else > Y bawrite(bp); > Y } > > I think we usually arrive here and honor the IO_SYNC flag. This is correct. > Otherwise, we always do an async write, but that is wrong for async mounts. > Here is my old fix for this: > > Z diff -u2 ffs_inode.c~ ffs_inode.c > Z --- ffs_inode.c~ Wed Apr 7 21:22:26 2004 > Z +++ ffs_inode.c Sat Mar 23 01:23:16 2013 > Z @@ -345,4 +431,6 @@ > Z if (flags & IO_SYNC) > Z bwrite(bp); > Z + else if (DOINGASYNC(ovp)) > Z + bdwrite(bp); > Z else > Z bawrite(bp); > > This fix must be sprinkled in most places where there is a bwrite()/ > bawrite() decision. No, I do not think that it would be correct for SU mounts. It is essential for the correct operation of e.g. ffs_indirtrunc() that writes for SU case are synchronous, since no dependencies on the indirect block updates are recorded. The fact that syncvnode() is done before is similarly important, because no existing dependencies are cleared. On the other hand, I agree with the note that the final ffs_update() must honour IO_SYNC requests. Anyway, my point was that your patch does not change the hardest source of sync writes, only the write of the final block. I will commit the following. diff --git a/sys/ufs/ffs/ffs_inode.c b/sys/ufs/ffs/ffs_inode.c index 0202820..50b456b 100644 --- a/sys/ufs/ffs/ffs_inode.c +++ b/sys/ufs/ffs/ffs_inode.c @@ -610,7 +610,7 @@ extclean: softdep_journal_freeblocks(ip, cred, length, IO_EXT); else softdep_setup_freeblocks(ip, length, IO_EXT); - return (ffs_update(vp, !DOINGASYNC(vp))); + return (ffs_update(vp, (flags & IO_SYNC) != 0 || !DOINGASYNC(vp))); } /* diff --git a/sys/ufs/ufs/ufs_lookup.c b/sys/ufs/ufs/ufs_lookup.c index 43b4e5c..53536ff 100644 --- a/sys/ufs/ufs/ufs_lookup.c +++ b/sys/ufs/ufs/ufs_lookup.c @@ -1131,7 +1131,7 @@ ufs_direnter(dvp, tvp, dirp, cnp, newdirbp, isrename) if (tvp != NULL) VOP_UNLOCK(tvp, 0); error = UFS_TRUNCATE(dvp, (off_t)dp->i_endoff, - IO_NORMAL | IO_SYNC, cr); + IO_NORMAL | (DOINGASYNC(dvp) ? 
0 : IO_SYNC), cr); if (error != 0) vprint("ufs_direnter: failed to truncate", dvp); #ifdef UFS_DIRHASH From owner-freebsd-fs@freebsd.org Tue May 17 11:59:58 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6E06FB3EE6C for ; Tue, 17 May 2016 11:59:58 +0000 (UTC) (envelope-from rainer@ultra-secure.de) Received: from connect.ultra-secure.de (connect.ultra-secure.de [88.198.71.201]) by mx1.freebsd.org (Postfix) with ESMTP id CB5511BB2 for ; Tue, 17 May 2016 11:59:57 +0000 (UTC) (envelope-from rainer@ultra-secure.de) Received: (Haraka outbound); Tue, 17 May 2016 13:59:55 +0200 Authentication-Results: connect.ultra-secure.de; auth=pass (login); spf=none smtp.mailfrom=ultra-secure.de Received-SPF: None (connect.ultra-secure.de: domain of ultra-secure.de does not designate 127.0.0.16 as permitted sender) receiver=connect.ultra-secure.de; identity=mailfrom; client-ip=127.0.0.16; helo=connect.ultra-secure.de; envelope-from= Received: from connect.ultra-secure.de (expwebmail [127.0.0.16]) by connect.ultra-secure.de (Haraka/2.6.2-toaster) with ESMTPSA id 9E04EF33-5B1F-42B7-8D97-D227B86C463D.1 envelope-from (authenticated bits=0) (version=TLSv1/SSLv3 cipher=AES128-GCM-SHA256 verify=NO); Tue, 17 May 2016 13:59:53 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Tue, 17 May 2016 13:59:53 +0200 From: rainer@ultra-secure.de To: Ronald Klop Cc: FreeBSD Filesystems Subject: Re: zfs receive stalls whole system In-Reply-To: References: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de> Message-ID: X-Sender: rainer@ultra-secure.de User-Agent: Roundcube Webmail/1.1.4 X-Haraka-GeoIP: --, , NaNkm X-Haraka-GeoIP-Received: X-Haraka-p0f: os="undefined undefined" link_type="undefined" distance=undefined total_conn=undefined shared_ip=Y X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on spamassassin X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.1 X-Haraka-Karma: score: 6, good: 43, bad: 0, connections: 58, history: 43, pass:all_good, relaying X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 11:59:58 -0000 Am 2016-05-17 12:46, schrieb Ronald Klop: > On Tue, 17 May 2016 12:44:50 +0200, Ronald Klop > wrote: > One question. You did not enable dedup(lication)? No, certainly not. It's off on all the filesystems. I was sometimes toying with the idea of enabling it, because the dataset is structured in a way where it might actually benefit from dedup. But I didn't go through with it. 
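I guess I could estimate it first without turning it on - zdb can
simulate dedup against an existing pool (pool name below is a
placeholder; this can run for a long time and use a lot of RAM):

  zdb -S tank
  # ends with an estimated "dedup = N.NN" ratio; much below ~2x,
  # the DDT's memory cost usually outweighs the savings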
Thanks
Rainer

From owner-freebsd-fs@freebsd.org Tue May 17 12:08:57 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9B074B3F73D for ; Tue, 17 May 2016 12:08:57 +0000 (UTC) (envelope-from lexa@lexa.ru)
Received: from mx3.lexa.ru (ns503534.ip-198-27-68.net [198.27.68.102]) by mx1.freebsd.org (Postfix) with ESMTP id 808251480 for ; Tue, 17 May 2016 12:08:56 +0000 (UTC) (envelope-from lexa@lexa.ru)
Received: by mx3.lexa.ru (Postfix, from userid 66) id C8433224A5E; Tue, 17 May 2016 08:00:07 -0400 (EDT)
Received: from [193.124.130.166] (unknown [193.124.130.166]) by home-gw.lexa.ru (Postfix) with ESMTP id 6F73616DE for ; Tue, 17 May 2016 15:00:03 +0300 (MSK)
To: freebsd-fs@freebsd.org
From: Alex Tutubalin
Subject: ZFS performance bottlenecks: CPU or RAM or anything else?
Message-ID: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru>
Date: Tue, 17 May 2016 15:00:03 +0300
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 May 2016 12:08:57 -0000

Hi,

I'm new to the list; sorry if the subject has been discussed before
(many times) - just point me to the archives.

I'm building a new storage server for linear-read/linear-write
performance with a limited number of parallel data streams (a load like
reading/writing multi-gigabyte Photoshop files, or reading many large
raw photo files).
The target is to saturate a 10G link using SMB or iSCSI.

Several years ago I tested a small zpool (5x 3TB 7200rpm drives in
RAIDZ) with different CPU/memory combos and got these results for
linear write speed with big chunks:

440 MB/sec with Core i3-2120/DDR3-1600 RAM (2 channel)
360 MB/sec with Core i7-920/DDR3-1333 (3 channel RAM)
280 MB/sec with Core 2Q Q9300/DDR2-800 (2 channel)

Mixed thoughts: the i7-920 is the fastest of the three, and its linear
RAM access is also the fastest, yet it is beaten by the i3-2120 with
lower-latency memory.

Also, I've found this link:
https://calomel.org/zfs_raid_speed_capacity.html
For 6x SSD and 10x SSD in RAIDZ2, read speed is very similar (1.7
GB/sec) and write speed is very close (721/806 MB/sec for 6/10 drives).

Assuming HBA/PCIe performance is much the same for read and write
operations, write speed is not limited by the HBA/bus... so what is it
limited by? CPU or RAM or ...?

So, my question is: what CPU/memory is optimal for ZFS performance?

In particular:
- DDR3 or DDR4 (twice the bandwidth)?
- a limited number of cores at a high clock rate (e.g. i3-6xxx), or
many cores at a slower clock?

No plans to use compression or deduplication, only raidz2 with 8-10 HDD
spindles and 3-4-5 SSDs for L2ARC.
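A crude way to measure the pool-side sequential limits first, with the
network out of the picture (paths and sizes are placeholders; with
compression off, zeroes are usable test data):

  # write ~32 GiB so the ARC cannot hide the disks
  dd if=/dev/zero of=/tank/bench/bigfile bs=1m count=32768
  # read it back (export/import the pool or reboot first to empty the ARC)
  dd if=/tank/bench/bigfile of=/dev/null bs=1m
  # watch per-disk utilization meanwhile to see what saturates
  gstat -p

If the disks sit well below 100% busy while one core is pegged, a
faster CPU should help; if the disks are pegged, more or faster
spindles will.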
Alex Tutubalin lexa@lexa.ru From owner-freebsd-fs@freebsd.org Tue May 17 12:11:16 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7E1E5B3F7A5 for ; Tue, 17 May 2016 12:11:16 +0000 (UTC) (envelope-from jg@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3CDCB1773 for ; Tue, 17 May 2016 12:11:16 +0000 (UTC) (envelope-from jg@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 4117445FC0D8; Tue, 17 May 2016 14:11:13 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id HSvJnI2+xBvQ; Tue, 17 May 2016 14:11:09 +0200 (CEST) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id 49B5C45FC0D6; Tue, 17 May 2016 14:11:08 +0200 (CEST) Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> To: Alex Tutubalin , freebsd-fs@freebsd.org Reply-To: jg@internetx.com From: InterNetX - Juergen Gotteswinter Message-ID: Date: Tue, 17 May 2016 14:11:07 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 12:11:16 -0000 Raidz is your Problem, go for Mirrors Am 5/17/2016 um 2:00 PM schrieb Alex Tutubalin: > Hi, > > I'm new to the list, sorry if the subject was discussed earlier (for > many times), just point to archives.... > > I'm building new storage server for 'linear read/linear write' > performance with limited number of parallel data streams (load like > read/write multi-gigabyte photoshop files, or read many large raw photo > files). > Target is to saturate 10G link using SMB or iSCSI. > > Several years ago I've tested small zpool (5x3Tb 7200rpm drives in > RAIDZ) with different CPU/memory combos and have got these results for > linear write speed by big chunks: > > 440 Mb/sec with Core i3-2120/DDR3-1600 ram (2 channel) > 360 Mb/sec with core i7-920/DDR3-1333 (3 channel RAM) > 280 Mb/sec with Core 2Q Q9300 /DDR2-800 (2 channel) > > Mixed thoughts: i7-920 is fastest of the three, RAM linear access also > fastest, but beaten by i3-2120 with lower latency memory. > > Also, I've found this link: > https://calomel.org/zfs_raid_speed_capacity.html > For 6x SSD and 10x SSD in RAIDZ2, there is very similar read speed > (1.7Gb/sec) and very close in write speed (721/806 Mb/sec for 6/10 drives). > > Assuming HBA/PCIe performance to be very same for read and write > operations, write speed is not limited by HBA/bus... so it is limited by > what? CPU or RAM or ...? > > So, my question is 'what CPU/memory is optimal for ZFS performance'? 
>
> In particular:
> - DDR3 or DDR4 (twice the bandwidth) ?
> - limited number of cores and high clock rate (e.g. i3-6xxxx) or many
> cores/slower clock ?
>
> No plans to use compression or deduplication, only raidz2 with 8-10 HDD
> spindles and 3-4-5 SSDs for L2ARC.
>
> Alex Tutubalin
> lexa@lexa.ru
>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@freebsd.org Tue May 17 12:14:24 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 65FCCB3F8A9 for ; Tue, 17 May 2016 12:14:24 +0000 (UTC) (envelope-from maurizio.vairani@cloverinformatica.it)
Received: from host202-129-static.10-188-b.business.telecomitalia.it (host202-129-static.10-188-b.business.telecomitalia.it [188.10.129.202]) by mx1.freebsd.org (Postfix) with ESMTP id 21F391984; Tue, 17 May 2016 12:14:23 +0000 (UTC) (envelope-from maurizio.vairani@cloverinformatica.it)
Received: from [192.168.0.60] (MAURIZIO-PC [192.168.0.60]) by host202-129-static.10-188-b.business.telecomitalia.it (Postfix) with ESMTP id C35822C6EC; Tue, 17 May 2016 14:04:28 +0200 (CEST)
Subject: Re: Best practice for high availability ZFS pool
To: Palle Girgensohn, freebsd-fs@freebsd.org
References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org>
From: Maurizio Vairani
Message-ID: <625e2776-a97f-9ee7-a1cb-c1a053804f6c@cloverinformatica.it>
Date: Tue, 17 May 2016 14:04:28 +0200
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.0
MIME-Version: 1.0
In-Reply-To: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 May 2016 12:14:24 -0000

On 16/05/2016 12:08, Palle Girgensohn wrote:
> Hi,
>
> We need to set up a ZFS pool with redundancy. The main goal is high
> availability - uptime.
>
> I can see a few paths to follow.
>
> 1. HAST + ZFS
>
> 2. Some sort of shared storage, two machines sharing a JBOD box.
>
> 3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive)

Hi,
have you tried compression? Something like:

zfs snapshot + zfs send | lzop | ssh | lzop -d | zfs receive

I am successfully using this method, with a modified version of
sysutils/zrep, but my pools are only a few TB in size.
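Spelled out a bit more, with made-up pool/host/snapshot names (and -i
for the incremental case):

  SNAP="tank/data@$(date +%Y%m%d%H%M)"
  zfs snapshot "$SNAP"
  zfs send -i tank/data@previous "$SNAP" \
    | lzop \
    | ssh standby 'lzop -d | zfs receive -F backup/data'

lzop is archivers/lzop in ports; putting mbuffer between the stages
also helps smooth out the bursty send stream.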
--
Maurizio

From owner-freebsd-fs@freebsd.org Tue May 17 12:24:10 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4D385B3FC11 for ; Tue, 17 May 2016 12:24:10 +0000 (UTC) (envelope-from lexa@lexa.ru)
Received: from mx3.lexa.ru (ns503534.ip-198-27-68.net [198.27.68.102]) by mx1.freebsd.org (Postfix) with ESMTP id 3193D1FF2 for ; Tue, 17 May 2016 12:24:09 +0000 (UTC) (envelope-from lexa@lexa.ru)
Received: by mx3.lexa.ru (Postfix, from userid 66) id 4750C224A66; Tue, 17 May 2016 08:24:09 -0400 (EDT)
Received: from [193.124.130.166] (unknown [193.124.130.166]) by home-gw.lexa.ru (Postfix) with ESMTP id DC9661790 for ; Tue, 17 May 2016 15:21:07 +0300 (MSK)
Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else?
To: freebsd-fs@freebsd.org
References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru>
From: Alex Tutubalin
Message-ID: <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru>
Date: Tue, 17 May 2016 15:21:08 +0300
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 May 2016 12:24:10 -0000

On 5/17/2016 3:11 PM, InterNetX - Juergen Gotteswinter wrote:
> Raidz is your Problem, go for Mirrors

Raidz2 will survive two (any) drives failure, while mirrored stripe
will not.

So, if it is possible to increase raidz2 performance by faster CPU or
RAM I'll go this route

Alex

From owner-freebsd-fs@freebsd.org Tue May 17 12:31:52 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 01DCDB3FD93 for ; Tue, 17 May 2016 12:31:52 +0000 (UTC) (envelope-from killing@multiplay.co.uk)
Received: from mail-wm0-x235.google.com (mail-wm0-x235.google.com [IPv6:2a00:1450:400c:c09::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 93BAE1289 for ; Tue, 17 May 2016 12:31:51 +0000 (UTC) (envelope-from killing@multiplay.co.uk)
Received: by mail-wm0-x235.google.com with SMTP id g17so28215353wme.1 for ; Tue, 17 May 2016 05:31:51 -0700 (PDT)
X-Received: by 10.194.223.41 with SMTP id qr9mr1226271wjc.61.1463488308915; Tue, 17 May 2016 05:31:48 -0700 (PDT)
Received: from [10.10.1.58] (liv3d.labs.multiplay.co.uk. [82.69.141.171]) by smtp.gmail.com with ESMTPSA id e16sm23879833wmc.3.2016.05.17.05.31.47 for (version=TLSv1/SSLv3 cipher=OTHER); Tue, 17 May 2016 05:31:47 -0700 (PDT)
Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else?
To: freebsd-fs@freebsd.org
References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru>
From: Steven Hartland
Message-ID: <884c4558-c207-596a-3e3e-45a6f579b666@multiplay.co.uk>
Date: Tue, 17 May 2016 13:31:53 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0
MIME-Version: 1.0
In-Reply-To: <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 May 2016 12:31:52 -0000

There's been some recent commits which help sequential reads IIRC, so
might be worth checking on CURRENT.

On 17/05/2016 13:21, Alex Tutubalin wrote:
> On 5/17/2016 3:11 PM, InterNetX - Juergen Gotteswinter wrote:
>> Raidz is your Problem, go for Mirrors
>
> Raidz2 will survive two (any) drives failure, while mirrored stripe
> will not.
>
> So, if it is possible to increase raidz2 performance by faster CPU or
> RAM I'll go this route
>
> Alex
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@freebsd.org Tue May 17 12:36:09 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 155B8B3FF6D for ; Tue, 17 May 2016 12:36:09 +0000 (UTC) (envelope-from lexa@lexa.ru)
Received: from mx3.lexa.ru (ns503534.ip-198-27-68.net [198.27.68.102]) by mx1.freebsd.org (Postfix) with ESMTP id ED1321735 for ; Tue, 17 May 2016 12:36:08 +0000 (UTC) (envelope-from lexa@lexa.ru)
Received: by mx3.lexa.ru (Postfix, from userid 66) id EFFA2224A5C; Tue, 17 May 2016 08:36:07 -0400 (EDT)
Received: from [193.124.130.166] (unknown [193.124.130.166]) by home-gw.lexa.ru (Postfix) with ESMTP id 7B5881827 for ; Tue, 17 May 2016 15:35:22 +0300 (MSK)
Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else?
To: freebsd-fs@freebsd.org References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> From: Alex Tutubalin Message-ID: <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> Date: Tue, 17 May 2016 15:35:22 +0300 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 12:36:09 -0000 On 5/17/2016 3:29 PM, Daniel Kalchev wrote: > Not true. You can have N-way mirror and it will survive N-1 drive failures. I agree, but a 3-way mirror does not look economical compared to raidz2. > The limitations of RAIDZ performance do not come from CPU or RAM limitations, but from the underlying hardware. RAIDZ is limited to the performance of a single disk IOPS. > > CPU/RAM these days are so much faster than spinning disks or SSDs. OK. But then why did I get different results in my 2012 testing (an i3-2120 was 1.5 times faster than a Q9300 on the same HDDs)? Alex From owner-freebsd-fs@freebsd.org Tue May 17 12:41:11 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7094CB3FFFC for ; Tue, 17 May 2016 12:41:11 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "smtp-sofia.digsys.bg", Issuer "Digital Systems Operational CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 0188718D1 for ; Tue, 17 May 2016 12:41:10 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from [193.68.6.100] ([193.68.6.100]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.9/8.14.9) with ESMTP id u4HCTKAQ053293 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 17 May 2016 15:29:20 +0300 (EEST) (envelope-from daniel@digsys.bg) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? From: Daniel Kalchev In-Reply-To: <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> Date: Tue, 17 May 2016 15:29:20 +0300 Cc: freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> To: Alex Tutubalin X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 12:41:11 -0000 > On 17.05.2016 =D0=B3., at 15:21, Alex Tutubalin wrote: >=20 > On 5/17/2016 3:11 PM, InterNetX - Juergen Gotteswinter wrote: >> Raidz is your Problem, go for Mirrors >=20 > Raidz2 will survive two (any) drives failure, while mirrored stripe = will not. >=20 Not true. You can have N-way mirror and it will survive N-1 drive = failures. > So, if it is possible to increase raidz2 performance by faster CPU or = RAM I'll go this route The limitations of RAIDZ performance do not come from CPU or RAM = limitations, but from the underlying hardware.
RAIDZ is limited to the = performance of a single disk IOPS.=20 CPU/RAM these days are so much faster than spinning disks or SSDs. Daniel= From owner-freebsd-fs@freebsd.org Tue May 17 13:24:25 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 19CBEB3D08C for ; Tue, 17 May 2016 13:24:25 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from smtp.simplesystems.org (smtp.simplesystems.org [65.66.246.90]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DFA0F2E43 for ; Tue, 17 May 2016 13:24:24 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by smtp.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id u4HDOMxW023751; Tue, 17 May 2016 08:24:22 -0500 (CDT) Date: Tue, 17 May 2016 08:24:22 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Ben RUBSON cc: freebsd-fs@freebsd.org Subject: Re: Best practice for high availability ZFS pool In-Reply-To: Message-ID: References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> User-Agent: Alpine 2.20 (GSO 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (smtp.simplesystems.org [65.66.246.90]); Tue, 17 May 2016 08:24:22 -0500 (CDT) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 13:24:25 -0000 On Tue, 17 May 2016, Ben RUBSON wrote: >> >> Without completely isolated systems there is always the risk of total failure. Even with zfs send there is the risk of total failure if the sent data results in corruption on the receiving side. > > In this case rollback one of the previous snapshots on the receiving side ? > Did you mean the sent data can totally brake the receiving pool making it unusable / unable to import ? Did we already see this ? There is at least one case of zfs send propagating a problem into the receiving pool. I don't know if it broke the pool. Corrupt data may be sent from one pool to another if it passes checksums. With any solution, there is the possibility of software bugs. Adding more parallel hardware decreases the chance of data loss but it increases the chance of hardware failure. 
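One cheap safeguard on the receiving side is to keep several snapshots back, and to sanity-check a stream in transit; zstreamdump(8) walks a send stream and verifies the checksums embedded in it. A rough sketch, with illustrative dataset and file names:

  zfs send -i tank/data@mon tank/data@tue > /backup/incr.zfs
  zstreamdump < /backup/incr.zfs    # prints a record summary, complains about bad checksums

That only catches damage to the stream itself, though — data that is corrupt but correctly checksummed in the source pool, as described above, passes straight through.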
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@freebsd.org Tue May 17 14:17:05 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BFC61B3DE7D for ; Tue, 17 May 2016 14:17:05 +0000 (UTC) (envelope-from s_sourceforge@nedprod.com) Received: from mail.nedprod.com (europe4.nedproductions.biz [213.251.186.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5EFA82DE5 for ; Tue, 17 May 2016 14:17:04 +0000 (UTC) (envelope-from s_sourceforge@nedprod.com) Received: from authenticated-user (mail.nedprod.com [213.251.186.177]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.nedprod.com (Postfix) with ESMTPSA id 451B614D78 for ; Tue, 17 May 2016 15:16:57 +0100 (BST) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=nedprod.com; s=mail; t=1463494617; bh=AV2fSLMfb78I/zkiilNlcTW+c851ayn1D9nWAc/litQ=; h=Resent-from:Resent-to:Resent-date:From:To:Subject:Date:From; b=btrXNiby2SihQYmsQsSMuvDFOY9rbGHAXYUCvp7y9Kge4GSUFGUlgNEZY5d9dmjA/ mHEKPLIv6i5GEuNV1AT4ArGTM5no9vpf0q3gqXbr40YtkdIuhQiMlahjLweeLQv6lP qGpnK4UXRxvPoGpqu1Ia9U6yL0pffGEx8OlEGzqTDAfqsVqGpEBmbY/mnRdWa1QceI DF48fMstbwiA3PIbWB01lKIsWez0S+4HiHw8JaG6BtescM2HSQkCqIpN1wqT43ERpI RFEX/GniZiBb98FFIoMRi9WTK8O9yHH/nRpcaQPB82YuZQ0X3b7UCANxvLVezTvWj3 CsD6xooYs3LCw== Resent-from: "Niall Douglas" Resent-to: freebsd-fs@FreeBSD.org Resent-date: Tue, 17 May 2016 15:17:01 +0100 X-cs: R X-CS-Version: 1.0 From: Niall Douglas X-RS-ID: s_sourceforge X-RS-Flags: 0,0,1,1,0,0,0 X-RS-Header: In-reply-to: <9F057D48-5413-437B-A612-64D47E95C846@gmail.com> X-RS-Header: References: <5736E7B4.1000409@gmail.com>, <9ead4b28-9711-5e38-483f-ef9eaf0bc583@digiware.nl>, <9F057D48-5413-437B-A612-64D47E95C846@gmail.com> X-RS-Sigset: 1 To: freebsd-fs@FreeBSD.org Subject: Re: Bigger MAX_PATH (Was: Re: State of native encryption in ZFS) MIME-Version: 1.0 Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 8BIT Date: Tue, 17 May 2016 11:37:54 +0100 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 14:17:05 -0000 On 17 May 2016 at 11:58, Ben RUBSON wrote: > >>> If FreeBSD had a bigger PATH_MAX then stackable encryptions layers > >>> like ecryptfs (encfs?) would be viable choices. Because encrypted > >>> path components are so long, one runs very rapidly into the maximum > >>> path on the system when PATH_MAX is so low. > > Could you give us some examples where PATH_MAX was too low for you using ecryptfs ? > I (for the moment) do not run into troubles using EncFS. Sure. I typed this command into my encrypted store to find all paths and sort them by length: find . 
| awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2- And this was the last (longest) path returned: ./ECRYPTFS_FNEK_ENCRYPTED.FWbfg.wnsu2EnUQFbyMTM6advEpfCnjqSMVUiW5.LgoJrMb2-r t6c61qRU--/ECRYPTFS_FNEK_ENCRYPTED.FWbfg.wnsu2EnUQFbyMTM6advEpfCnjqSMVULVNQa FNaPoVWHcEh8FJ8mE--/ECRYPTFS_FNEK_ENCRYPTED.FWbfg.wnsu2EnUQFbyMTM6advEpfCnjq SMVU7LmQnMhHfh0u5yByHsE6r---/ECRYPTFS_FNEK_ENCRYPTED.FWbfg.wnsu2EnUQFbyMTM6a dvEpfCnjqSMVUyFs1x3YH5TrEDJn4uOR7qk--/ECRYPTFS_FNEK_ENCRYPTED.FXbfg.wnsu2EnU QFbyMTM6advEpfCnjqSMVU4ghdTluFURviDBNaKn5dqiV0xCDj5Ikg1JCyAoTTJN6-/ECRYPTFS_ FNEK_ENCRYPTED.FXbfg.wnsu2EnUQFbyMTM6advEpfCnjqSMVU-jycWae440W7yMwmiyP3Y2kL7 WCaoKGKU66C7Cxvk.c-/ECRYPTFS_FNEK_ENCRYPTED.FXbfg.wnsu2EnUQFbyMTM6advEpfCnjq SMVUhi1aG2eEb2eWm.A0HVk-wDsIJHSIpRFFKCNvTGLuRog-/ECRYPTFS_FNEK_ENCRYPTED.FWb fg.wnsu2EnUQFbyMTM6advEpfCnjqSMVUQSSHq93LQdjeusuoEfcYl---/ECRYPTFS_FNEK_ENCR YPTED.FWbfg.wnsu2EnUQFbyMTM6advEpfCnjqSMVUtcq-q29SemW-IOdIxu-WME--/ECRYPTFS_ FNEK_ENCRYPTED.FWbfg.wnsu2EnUQFbyMTM6advEpfCnjqSMVU8IbibKeFBd7fIPHPjbXAUU--/ ECRYPTFS_FNEK_ENCRYPTED.FWbfg.wnsu2EnUQFbyMTM6advEpfCnjqSMVUIRoDoGSCaes2geXo .1ofyE--/ECRYPTFS_FNEK_ENCRYPTED.FWbfg.wnsu2EnUQFbyMTM6advEpfCnjqSMVUhJMdWTL zJ5ZmOdxFdiH61E--/ECRYPTFS_FNEK_ENCRYPTED.FWbfg.wnsu2EnUQFbyMTM6advEpfCnjqSM VUa8r9YMTUfWQ4jQCNcJSCfE--/ECRYPTFS_FNEK_ENCRYPTED.FWbfg.wnsu2EnUQFbyMTM6adv EpfCnjqSMVUX0etb6f3dEb6LYy1ZX6uFk--/ECRYPTFS_FNEK_ENCRYPTED.FXbfg.wnsu2EnUQF byMTM6advEpfCnjqSMVUegHOnwkmJTzIaxaWQLEC-4bmxptKCfKKtzQ3vS4I4Mc- (which is 1356 characters not including the mount point of the encrypted drive) This isn't a particularly crazy encrypted drive. It contains a few backups, accounts, keys and so on. I'm not deliberately storing deep directory trees or anything. > > http://freebsd.1045724.n5.nabble.com/misc-184340-PATH-MAX-not-interope > > rable-with-Linux-td5864469.html > > And examples where PATH_MAX is too low using Rsync ? I've run into this when rsyncing Jenkins workspaces to FreeBSD (Jenkins matrix builder generates long long paths). It isn't just rsync though, it affects extracting tar archives on FreeBSD too. 
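For reference, the limit being hit can be read straight off the system; on a stock FreeBSD install this reports 1024:

  $ getconf PATH_MAX /
  1024

so a single encrypted path like the one above already exceeds it, independent of the mount point prefix.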
Niall From owner-freebsd-fs@freebsd.org Tue May 17 14:47:02 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 41F30B3E4CA for ; Tue, 17 May 2016 14:47:02 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: from mail-wm0-x22a.google.com (mail-wm0-x22a.google.com [IPv6:2a00:1450:400c:c09::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CC7013F06 for ; Tue, 17 May 2016 14:47:01 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: by mail-wm0-x22a.google.com with SMTP id e201so143233208wme.0 for ; Tue, 17 May 2016 07:47:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=ZDrpdpEa/I64pn3vvDxSgzgSd0T/PlKMZX/lrkOBJZU=; b=FBAMTorGbZu3zxxQJXRsKVyZMy2tLtRQZ0qn3D/jKmShhVKhHxfdX3zaKXfRxA/bzd J9A9KOKoEUI1sk7lO8rt4vyrwNVkHK9o4Knuhmj/6FJDeghvCEXh/ZO6j2B+uhUMqM+e lbbZ/0i6p10A+Mv2MCGxwrgKw+IDeGj/dTs8butGBC70jxUb8ocoej29gma2oBidCIZI ZMUUyZvOCnF3avd4g9w7CO/KyBR8Tcj6JCWyo6ifuRmEVDu1RQQF2eVGZFpLvfK0qXbl 3LxIRZXoZoqZ8RFQXnilaGTWYiw/S6AV5L41OjQpSygRPOTNAgq/bgmx1NF94AOygLCj cnfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=ZDrpdpEa/I64pn3vvDxSgzgSd0T/PlKMZX/lrkOBJZU=; b=jz4i6YUFwAWcmSpQbDl6NfYSfMvcnSCSsmrfjbHPGuXXmQ+w4B8EbTJsEuwHd16XFu 7b0qyMMgmB67Y/FLqsu3MAODw+ccq96Jextj0ufGkkUcZ8OicuizNwnv93oQ6fWkEu4m 0NrZNEUlHMvAu9BWAJo57pq+obo+EcmAXL8rakKrrE6keOqB3MnU2URWScwBLelYcCTO Kr3On5iYZ4IO/vW5zQmTQqzUpmdTYqBwQvm0o5ee+zk6yuaaO4VtwS+r3vybLb3vRn/E XE0nZMYAzE1zV1A9ajxe7p+4pvIZs/ZCDvOC79d+GZs2Sqq5w23AvU0z08nPXlDA+i7j LbjA== X-Gm-Message-State: AOPr4FVCX47sJBvZBouZorX14muX8m4eP6njloMqHQkPq8fJDpMNFjZWP9YylhLPecf82w== X-Received: by 10.28.31.6 with SMTP id f6mr23254524wmf.69.1463496420394; Tue, 17 May 2016 07:47:00 -0700 (PDT) Received: from [192.168.1.16] (210.236.26.109.rev.sfr.net. [109.26.236.210]) by smtp.gmail.com with ESMTPSA id g3sm3485949wjb.47.2016.05.17.07.46.58 for (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Tue, 17 May 2016 07:46:59 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: Best practice for high availability ZFS pool From: Ben RUBSON In-Reply-To: Date: Tue, 17 May 2016 16:46:53 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <40C35566-B7FB-4F59-BB41-D43BC0362C26@gmail.com> References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> To: freebsd-fs@freebsd.org X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 14:47:02 -0000 > On 17 may 2016 at 15:24, Bob Friesenhahn = wrote: >=20 > On Tue, 17 May 2016, Ben RUBSON wrote: >>>=20 >>> Without completely isolated systems there is always the risk of = total failure. Even with zfs send there is the risk of total failure if = the sent data results in corruption on the receiving side. >>=20 >> In this case rollback one of the previous snapshots on the receiving = side ? 
>> Did you mean the sent data can totally brake the receiving pool = making it unusable / unable to import ? Did we already see this ? >=20 > There is at least one case of zfs send propagating a problem into the = receiving pool. I don't know if it broke the pool. Corrupt data may be = sent from one pool to another if it passes checksums. Do you have any link to this problem ? Would be interesting to know if = it was possible to come-back to a previous snapshot / consistent pool. I think that making ZFS send/receive has a higher security level than = mirroring to a second (or third) JBOD box. With mirroring you will still have only one ZFS pool. With send/receive, you have a second / different ZFS pool / data = "envelope", which could (I think) mitigate the "chance" of a broken / = dead pool. Mirror over 2 different JBOD boxes, and send/receive to a third one, is = I think a nice solution. However, if send/receive makes the receiving pool the exact 1:1 copy of = the sending pool, then the thing which made the sending pool to corrupt = could reach (and corrupt) the receiving pool... I don't know whether or not this could occur, and if ever it occurs, if = we have the chance to revert to a previous snapshot, at least on the = receiving side... Ben= From owner-freebsd-fs@freebsd.org Tue May 17 15:35:29 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8B01BB3F561 for ; Tue, 17 May 2016 15:35:29 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: from mail-wm0-x234.google.com (mail-wm0-x234.google.com [IPv6:2a00:1450:400c:c09::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 11B1D65FCC for ; Tue, 17 May 2016 15:35:29 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: by mail-wm0-x234.google.com with SMTP id a17so37870716wme.0 for ; Tue, 17 May 2016 08:35:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=B1h9c6mb6sLZC5yI1K+4gySFzHY8bBFLjTP8dgn/CKU=; b=jdSZfqJ3u9293aWr9MMTq/DYAzCVa15/OXav6etiwP8hEwwmJOuIo7Eo8e1MXN7eul 5ka7T9FUJosKfCvvaK7lqPo7bwDlomM2bkFOFniZwTL8b/EgEwIpjSmMSTstiORF3Ll6 gxwsa5GFnNzRpvOUVkTEPG1L11BvDNSbBz8/FkDGP7QIrlv9f97kdRaEqkn9Uicnf3+a kIEnaqdS9InTmcuAyFSCVGGriXFhRPfZt8oAAWc08BKb5bixTKBcYdf+SM+tIGyd5uHq 6iUdl767JGvHAf+jMSQQkcnWjCbYEJX3QriMTHpI5OTA0ZO45b+pyQgrxt8dIg4z7DIz /Uwg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=B1h9c6mb6sLZC5yI1K+4gySFzHY8bBFLjTP8dgn/CKU=; b=NKyHbcehTOKmC6EXrBY2RBiALh1FcPyax/uG22NWoUdMmsAZAsQnndMpXjG0WuF00f 8cX4Ze/yxHcb+0q+YMgXXJfILYzjCQf29rJ6KavwVmvu7niYCOuL5U7hYslEKS515AL6 YzKQg0otZfvgBZqXC07V76JkOQiG82Wmp3KxURbNwmLz1XNXNly3arzcDCkNF/YXy/GJ i3Jsr1teip5fwvzqo6gf79JDQmsLieeJGAaCImTBC8Qysh11Mbe48+D96YdL3X2ThGW1 CKFmVVeaKP+4ZRSs/kkSP0MHUa1ZRbmGXfR440+Bng9Q5aqjB+Um8fVeFlr4GpvJd63v x62Q== X-Gm-Message-State: AOPr4FVE4Oo0en/VfKoV0sJnQuX7qTrFwwWScZMexeVq/H28giphk8RpBrVLjqKrySCNOQ== X-Received: by 10.28.39.196 with SMTP id n187mr2204377wmn.4.1463499327370; Tue, 17 May 2016 08:35:27 -0700 (PDT) Received: from [192.168.1.16] 
(210.236.26.109.rev.sfr.net. [109.26.236.210]) by smtp.gmail.com with ESMTPSA id kz1sm3705899wjc.46.2016.05.17.08.35.26 for (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Tue, 17 May 2016 08:35:26 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: Bigger MAX_PATH (Was: Re: State of native encryption in ZFS) From: Ben RUBSON In-Reply-To: <573b27e8.0604620a.3a15c.ffffe914SMTPIN_ADDED_MISSING@mx.google.com> Date: Tue, 17 May 2016 17:35:25 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <546E5477-E636-49D4-A137-16FDA2CA1E7B@gmail.com> References: <573b27e8.0604620a.3a15c.ffffe914SMTPIN_ADDED_MISSING@mx.google.com> To: "freebsd-fs@FreeBSD.org" X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 15:35:29 -0000 > On 17 may 2016 at 12:37, Niall Douglas wrote: >=20 > On 17 May 2016 at 11:58, Ben RUBSON wrote: >=20 >>>>> If FreeBSD had a bigger PATH_MAX then stackable encryptions layers >>>>> like ecryptfs (encfs?) would be viable choices. Because encrypted >>>>> path components are so long, one runs very rapidly into the = maximum >>>>> path on the system when PATH_MAX is so low. >>=20 >> Could you give us some examples where PATH_MAX was too low for you = using ecryptfs ? >> I (for the moment) do not run into troubles using EncFS. >=20 > Sure. >=20 > I typed this command into my encrypted store to find all paths and = sort=20 > them by length: >=20 > find . | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2- >=20 > And this was the last (longest) path returned: >=20 > (...) >=20 > (which is 1356 characters not including the mount point of the = encrypted=20 > drive) >=20 > This isn't a particularly crazy encrypted drive. It contains a few = backups,=20 > accounts, keys and so on. I'm not deliberately storing deep directory = trees=20 > or anything. >=20 >>> = http://freebsd.1045724.n5.nabble.com/misc-184340-PATH-MAX-not-interope >>> rable-with-Linux-td5864469.html >>=20 >> And examples where PATH_MAX is too low using Rsync ? >=20 > I've run into this when rsyncing Jenkins workspaces to FreeBSD = (Jenkins=20 > matrix builder generates long long paths). It isn't just rsync though, = it=20 > affects extracting tar archives on FreeBSD too. Thank you Niall for your answer. I managed to reproduce the issue creating a 900 characters path (9 = subfolders of 100 characters each) and Rsyncing it to an EncFS remote = folder. No problem Rsyncing to EncFS on Linux, but it fails on FreeBSD, it only = created 4 of the 9 subfolders. So yes PATH_MAX could really be a limitation. Why did not you make the choice to rebuild the kernel using a higher = PATH_MAX (instead of using ZOL) ? 
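For anyone wanting to try that, the constant itself lives in one header — the rebuild is the expensive part. An untested sketch (4096 is an arbitrary choice; anything compiled against the old limit would need rebuilding, which is the ABI concern raised in the PR):

  # In /usr/src/sys/sys/syslimits.h, change
  #   #define PATH_MAX  1024  /* max bytes in pathname */
  # to e.g.
  #   #define PATH_MAX  4096
  # (MAXPATHLEN in sys/param.h is defined as PATH_MAX, so it follows along.)
  cd /usr/src && make buildworld buildkernel
  make installkernel
  # reboot into the new kernel, then:
  make installworld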
Ben From owner-freebsd-fs@freebsd.org Tue May 17 15:48:18 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F0BE3B3F825 for ; Tue, 17 May 2016 15:48:18 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: from mail-wm0-x22e.google.com (mail-wm0-x22e.google.com [IPv6:2a00:1450:400c:c09::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 873F86795C for ; Tue, 17 May 2016 15:48:18 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: by mail-wm0-x22e.google.com with SMTP id g17so38443557wme.1 for ; Tue, 17 May 2016 08:48:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=C+LWWZlWj7EdsooAVmplL5kc3DR7eCEuWRKTw7DK1yM=; b=dw0FcIRHHTrYdggM9JNIs1fc0ujz+zsNwpJMktbFh7kqoLZipXYpAfDm041RBGORE5 0JobYYDMJro90oQhOlsHSxM+w1MjQVemOXRGt1uyEh1PfaI2RilJtsJ4On86aaOIqV3r nfv70tu4CzfHtRFsfG9fiSLSZ+uR1+f6DrkNGIkdwL7KVYf//+o/S3Od+NspKi7Ffnu7 1MAfS2DOmCtUx6ORik37uV7ZpH5UUwRGV8mkKwGx8cX+S57F0x7bes9MgMO5/GfqgovQ MzTJYuIYoYtLFDqyn+mU+C4EQoujW23awKdhShsMOsaRqI8Be9pAiJSIX5o4FAIr0mrd qMtw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=C+LWWZlWj7EdsooAVmplL5kc3DR7eCEuWRKTw7DK1yM=; b=RraUwOJQNNiXfkeqlyhcCxBUo1PrkhTA0w1mGxuWmvwb943DQJ8KciNWPgKMNt/y2G L17/3J9e7iTu6qgTdo9arIAYbvx8+HF310yUPu95v3TqYAu+gBntKCXTpb04heVm2N7i +5dhEyy2Qjo4wszn5/cT5Rfn0qNcuj0JflMi9sVWGlIH94FPe41JD7JRNg7ZsiGAgRcD JIPE3G2GxacQUebjqn2g3TLlPpq9MihPTg21tXNV6IsrRY4bQc2GTkFPwFkeGww6m5c+ FmBwnuAfDFB2dPBaBzpp5psmaVjgPZTOsUiz1+rq0vfnrWX4+m5yfyX9NxOxA5i3HOlt QUZQ== X-Gm-Message-State: AOPr4FWsv7z/wE3k1dZthdajQ5G9IiCOFTB7yfmbT87iVJ8UzmtNnxFhsOwW5BmIwqY+vQ== X-Received: by 10.194.166.3 with SMTP id zc3mr2300110wjb.104.1463500097109; Tue, 17 May 2016 08:48:17 -0700 (PDT) Received: from [192.168.1.16] (210.236.26.109.rev.sfr.net. 
[109.26.236.210]) by smtp.gmail.com with ESMTPSA id m140sm24636025wma.24.2016.05.17.08.48.16 for (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Tue, 17 May 2016 08:48:16 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: Bigger MAX_PATH (Was: Re: State of native encryption in ZFS) From: Ben RUBSON In-Reply-To: <546E5477-E636-49D4-A137-16FDA2CA1E7B@gmail.com> Date: Tue, 17 May 2016 17:48:15 +0200 Content-Transfer-Encoding: 7bit Message-Id: References: <573b27e8.0604620a.3a15c.ffffe914SMTPIN_ADDED_MISSING@mx.google.com> <546E5477-E636-49D4-A137-16FDA2CA1E7B@gmail.com> To: "freebsd-fs@FreeBSD.org" X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 15:48:19 -0000 For reference, bug report is here : https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=184340 From owner-freebsd-fs@freebsd.org Tue May 17 16:13:26 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 718DBB3FE99 for ; Tue, 17 May 2016 16:13:26 +0000 (UTC) (envelope-from joe@getsomewhere.net) Received: from prak.gameowls.com (prak.gameowls.com [IPv6:2001:19f0:5c00:950b:5400:ff:fe14:46b7]) by mx1.freebsd.org (Postfix) with ESMTP id 4CEF13FB1; Tue, 17 May 2016 16:13:26 +0000 (UTC) (envelope-from joe@getsomewhere.net) Received: from [IPv6:2001:470:c412:beef:135:c8df:2d0e:4ea6] (unknown [IPv6:2001:470:c412:beef:135:c8df:2d0e:4ea6]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by prak.gameowls.com (Postfix) with ESMTPSA id 1BC5118C3D; Tue, 17 May 2016 11:13:18 -0500 (CDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: Best practice for high availability ZFS pool From: Joe Love In-Reply-To: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> Date: Tue, 17 May 2016 11:13:18 -0500 Cc: freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <5DA13472-F575-4D3D-80B7-1BE371237CE5@getsomewhere.net> References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> To: Palle Girgensohn X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 16:13:26 -0000 > On May 16, 2016, at 5:08 AM, Palle Girgensohn = wrote: >=20 > Hi, >=20 > We need to set up a ZFS pool with redundance. The main goal is high = availability - uptime. >=20 > I can see a few of paths to follow. >=20 > 1. HAST + ZFS >=20 > 2. Some sort of shared storage, two machines sharing a JBOD box. >=20 > 3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive) >=20 > 4. using something else than ZFS, even a different OS if required. >=20 > My main concern with HAST+ZFS is performance. Google offer some = insights here, I find mainly unsolved problems. Please share any success = stories or other experiences. >=20 > Shared storage still has a single point of failure, the JBOD box. = Apart from that, is there even any support for the kind of storage PCI = cards that support dual head for a storage box? I cannot find any. 
>=20 > We are running with ZFS replication today, but it is just too slow for = the amount of data. >=20 > We prefer to keep ZFS as we already have a rather big (~30 TB) pool = and also tools, scripts, backup all is using ZFS, but if there is no = solution using ZFS, we're open to alternatives. Nexenta springs to mind, = but I believe it is using shared storage for redundance, so it does have = single points of failure? >=20 > Any other suggestions? Please share your experience. :) >=20 > Palle >=20 I don=E2=80=99t know if this falls into the realm of what you want, but = BSDMag just released an issue with an article entitled =E2=80=9CAdding = ZFS to the FreeBSD dual-controller storage concept.=E2=80=9D https://bsdmag.org/download/reusing_openbsd/ My understanding in this setup is that the only single point of failure = for this model is the backplanes that the drives would connect to. = Depending on your controller cards, this could be alleviated by simply = using multiple drive shelves, and only using one drive/shelf as part of = a vdev (then stripe or whatnot over your vdevs). It might not be what you=E2=80=99re after, as it=E2=80=99s basically two = systems with their own controllers, with a shared set of drives. Some = expansion from the virtual world to real physical systems will probably = need additional variations. I think the TrueNAS system (with HA) is setup similar to this, only = without the split between the drives being primarily handled by separate = controllers, but someone with more in-depth knowledge would need to = confirm/deny this. -Joe From owner-freebsd-fs@freebsd.org Tue May 17 16:20:04 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 93633B3F04A for ; Tue, 17 May 2016 16:20:04 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 257DC6454D for ; Tue, 17 May 2016 16:20:03 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from [172.16.0.5] (citron.pingpong.net [195.178.173.66]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 4821716BE8; Tue, 17 May 2016 18:19:55 +0200 (CEST) Subject: Re: Best practice for high availability ZFS pool Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Content-Type: multipart/signed; boundary="Apple-Mail=_7087E43E-6579-48E7-BFA5-610E1B270D42"; protocol="application/pgp-signature"; micalg=pgp-sha256 X-Pgp-Agent: GPGMail 2.6b2 From: Palle Girgensohn In-Reply-To: <5DA13472-F575-4D3D-80B7-1BE371237CE5@getsomewhere.net> Date: Tue, 17 May 2016 18:19:54 +0200 Cc: freebsd-fs@freebsd.org, Julian Akehurst Message-Id: <7D4449E9-5875-45EB-8559-3B43F2E5E3B0@FreeBSD.org> References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> <5DA13472-F575-4D3D-80B7-1BE371237CE5@getsomewhere.net> To: Joe Love X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 16:20:04 -0000 --Apple-Mail=_7087E43E-6579-48E7-BFA5-610E1B270D42 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > 17 maj 2016 kl. 
18:13 skrev Joe Love : >=20 >=20 >> On May 16, 2016, at 5:08 AM, Palle Girgensohn = wrote: >>=20 >> Hi, >>=20 >> We need to set up a ZFS pool with redundance. The main goal is high = availability - uptime. >>=20 >> I can see a few of paths to follow. >>=20 >> 1. HAST + ZFS >>=20 >> 2. Some sort of shared storage, two machines sharing a JBOD box. >>=20 >> 3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive) >>=20 >> 4. using something else than ZFS, even a different OS if required. >>=20 >> My main concern with HAST+ZFS is performance. Google offer some = insights here, I find mainly unsolved problems. Please share any success = stories or other experiences. >>=20 >> Shared storage still has a single point of failure, the JBOD box. = Apart from that, is there even any support for the kind of storage PCI = cards that support dual head for a storage box? I cannot find any. >>=20 >> We are running with ZFS replication today, but it is just too slow = for the amount of data. >>=20 >> We prefer to keep ZFS as we already have a rather big (~30 TB) pool = and also tools, scripts, backup all is using ZFS, but if there is no = solution using ZFS, we're open to alternatives. Nexenta springs to mind, = but I believe it is using shared storage for redundance, so it does have = single points of failure? >>=20 >> Any other suggestions? Please share your experience. :) >>=20 >> Palle >>=20 >=20 > I don=E2=80=99t know if this falls into the realm of what you want, = but BSDMag just released an issue with an article entitled =E2=80=9CAdding= ZFS to the FreeBSD dual-controller storage concept.=E2=80=9D > https://bsdmag.org/download/reusing_openbsd/ >=20 > My understanding in this setup is that the only single point of = failure for this model is the backplanes that the drives would connect = to. Depending on your controller cards, this could be alleviated by = simply using multiple drive shelves, and only using one drive/shelf as = part of a vdev (then stripe or whatnot over your vdevs). >=20 > It might not be what you=E2=80=99re after, as it=E2=80=99s basically = two systems with their own controllers, with a shared set of drives. = Some expansion from the virtual world to real physical systems will = probably need additional variations. > I think the TrueNAS system (with HA) is setup similar to this, only = without the split between the drives being primarily handled by separate = controllers, but someone with more in-depth knowledge would need to = confirm/deny this. >=20 > -Joe >=20 This is actually very interesting IMO. It is simple and easy to understand. Problem is I didn't find any proper = controller cards for it. I think this is what Nexenta does as well as = TrueNAS, with their HA versions. I'll check out the article, thanks! 
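On the controller side, FreeBSD's gmultipath(8) can at least collapse two HBA paths to the same disk into a single device — a sketch with illustrative device names (this handles path/controller failover inside one host; it does not arbitrate two hosts importing the same pool):

  # each disk shows up once per HBA; glue the two paths together
  gmultipath label -v disk01 /dev/da0 /dev/da8
  gmultipath label -v disk02 /dev/da1 /dev/da9
  zpool create tank mirror multipath/disk01 multipath/disk02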
Palle --Apple-Mail=_7087E43E-6579-48E7-BFA5-610E1B270D42 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Comment: GPGTools - http://gpgtools.org iQEcBAEBCAAGBQJXO0SrAAoJEDQn0sf36UlsInYIAMPaUZ8Fw5YYy0Zqk3/1JpL0 q4KLG8+iCMuagZWJyarF5EdmEJAw+hEuWRbG8uAH1gr7XS8BEN58QutxI6zKKdVm LSKgpXCxlOQdR3M/fJuE09t+YWepcs+MmAbR8ns5YoceURZU1rXNdjTwGdhqPvk3 PfFVhPX6CiFG3YlqsGcfAKfqVBbhkzmh5bvg7rHGH+TIZDx3qTsOhnW97j86Rr5V rV2Egf6vEOCuJN8GvzQAmE4E7X2+o+kS2EugUtbWCAurmK0/kM3qTC2+7BpQW2vn dgUmXX6wElNSTIOyBzksLMlq7L4fxi0Gdv2p1EOWP1LU9AKDInNMDrfpkogN/Ls= =4d2V -----END PGP SIGNATURE----- --Apple-Mail=_7087E43E-6579-48E7-BFA5-610E1B270D42-- From owner-freebsd-fs@freebsd.org Tue May 17 17:06:44 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E98EBB3F35A for ; Tue, 17 May 2016 17:06:44 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from smtp.simplesystems.org (smtp.simplesystems.org [65.66.246.90]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B88661105 for ; Tue, 17 May 2016 17:06:44 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by smtp.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id u4HH6fv9028372; Tue, 17 May 2016 12:06:42 -0500 (CDT) Date: Tue, 17 May 2016 12:06:41 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Ben RUBSON cc: freebsd-fs@freebsd.org Subject: Re: Best practice for high availability ZFS pool In-Reply-To: <40C35566-B7FB-4F59-BB41-D43BC0362C26@gmail.com> Message-ID: References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> <40C35566-B7FB-4F59-BB41-D43BC0362C26@gmail.com> User-Agent: Alpine 2.20 (GSO 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (smtp.simplesystems.org [65.66.246.90]); Tue, 17 May 2016 12:06:42 -0500 (CDT) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 17:06:45 -0000 On Tue, 17 May 2016, Ben RUBSON wrote: >> On 17 may 2016 at 15:24, Bob Friesenhahn wrote: >> >> There is at least one case of zfs send propagating a problem into the receiving pool. I don't know if it broke the pool. Corrupt data may be sent from one pool to another if it passes checksums. > > Do you have any link to this problem ? Would be interesting to know if it was possible to come-back to a previous snapshot / consistent pool. I don't have a link but I recall that it had something to do with the ability to send file 'holes' in the stream. > I think that making ZFS send/receive has a higher security level than mirroring to a second (or third) JBOD box. > With mirroring you will still have only one ZFS pool. This is a reasonable assumption. > However, if send/receive makes the receiving pool the exact 1:1 copy > of the sending pool, then the thing which made the sending pool to > corrupt could reach (and corrupt) the receiving pool... 
I don't know > whether or not this could occur, and if ever it occurs, if we have > the chance to revert to a previous snapshot, at least on the > receiving side... Zfs receive does not result in a 1:1 copy. The underlying data organization can be completely different and compression or other options can be changed. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@freebsd.org Tue May 17 18:00:45 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D0516B3F2BB for ; Tue, 17 May 2016 18:00:45 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-io0-x22f.google.com (mail-io0-x22f.google.com [IPv6:2607:f8b0:4001:c06::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A5FFB1C1F for ; Tue, 17 May 2016 18:00:45 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by mail-io0-x22f.google.com with SMTP id i75so33759545ioa.3 for ; Tue, 17 May 2016 11:00:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kateley-com.20150623.gappssmtp.com; s=20150623; h=reply-to:subject:references:to:from:organization:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=wP2UykMcSjQT9UMzYWI3gy6Q2O3oGjERkEmxjNa9Mk8=; b=jvjoRUiURtJav+iXlTT9V8az+nURg55tF/X9oqY5+rJIRdvIUrwdBFY4c2upNluZFN TZTtIgvhXMywMoU5au4dpBLbBOFX3ucjOyg5Vh75aT4eegh8iwFCYHkGF/IG5K83vZtz dO2piTvjdvU76wixOG0qn/xe8yslZw6FXnJyVi4WF+xFXoHE41iVinGZ1oblx1/f03ag xX1vKrSwYYEYndQUvPkv6G9SYZe8vxW0dsuktWT6xVObwZOtyiamHbW8+HBi7LgZvWzc kRhCBA2dYWc/JbOTa/yZi+mcgNhF9urNcWhxsyutrZt9vNnXaUcV+kihDn88Qbd7t9NL IuOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:reply-to:subject:references:to:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-transfer-encoding; bh=wP2UykMcSjQT9UMzYWI3gy6Q2O3oGjERkEmxjNa9Mk8=; b=ZgfdViujFJQcwxR+F3Jql8exIgAO+asYCggEjax1AqerwGM0l0LmgmZdQZ4zoiom7o ppWypqYnh4o4wi/JhOqmT0t/5gp2xC/ZmqmM6wIu9dV2JKBobg6mxC4JjQBJ3zMmgxVp NFMbEi+VeExbs1Up7hv0D+Nnl3NAJKHNAxfu34ILRyNS3gvM6t2X/yYLkAGyNVfx/ShA YcUDDTfO/7Wwl0mMqD07+DWcbMDynABU5p56Ph30d4dL4ZhtNsgUkbv7dU9izRuR6OAp MGSDGGxAD9KYv1AV2yBOSIpojPG1AYTmNRlduBRCI0mkMqNedIYU4IG9zipHUxGlqMvv 3WIA== X-Gm-Message-State: AOPr4FWoZ7l+qZCwWdFvDrXb+/BOKz5Esq8I7ddJjDhkAZpTJwkyDyIRLa0nKB6CGLHmxA== X-Received: by 10.36.83.20 with SMTP id n20mr2175221itb.61.1463508045094; Tue, 17 May 2016 11:00:45 -0700 (PDT) Received: from [192.168.0.4] ([63.231.252.189]) by smtp.googlemail.com with ESMTPSA id j188sm1399272ita.8.2016.05.17.11.00.43 for (version=TLSv1/SSLv3 cipher=OTHER); Tue, 17 May 2016 11:00:44 -0700 (PDT) Reply-To: linda@kateley.com Subject: Re: Best practice for high availability ZFS pool References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> <5DA13472-F575-4D3D-80B7-1BE371237CE5@getsomewhere.net> To: freebsd-fs@freebsd.org From: Linda Kateley Organization: Kateley Company Message-ID: <573B5C4B.80406@kateley.com> Date: Tue, 17 May 2016 13:00:43 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.7.2 MIME-Version: 1.0 In-Reply-To: <5DA13472-F575-4D3D-80B7-1BE371237CE5@getsomewhere.net> 
Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 18:00:45 -0000 On 5/17/16 11:13 AM, Joe Love wrote: >> On May 16, 2016, at 5:08 AM, Palle Girgensohn wrote: >> >> Hi, >> >> We need to set up a ZFS pool with redundance. The main goal is high availability - uptime. >> >> I can see a few of paths to follow. >> >> 1. HAST + ZFS >> >> 2. Some sort of shared storage, two machines sharing a JBOD box. >> >> 3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive) >> >> 4. using something else than ZFS, even a different OS if required. >> >> My main concern with HAST+ZFS is performance. Google offer some insights here, I find mainly unsolved problems. Please share any success stories or other experiences. >> >> Shared storage still has a single point of failure, the JBOD box. Apart from that, is there even any support for the kind of storage PCI cards that support dual head for a storage box? I cannot find any. >> >> We are running with ZFS replication today, but it is just too slow for the amount of data. >> >> We prefer to keep ZFS as we already have a rather big (~30 TB) pool and also tools, scripts, backup all is using ZFS, but if there is no solution using ZFS, we're open to alternatives. Nexenta springs to mind, but I believe it is using shared storage for redundance, so it does have single points of failure? >> >> Any other suggestions? Please share your experience. :) For true high availability there is an application RSF-1 that can get full HA. I am not sure the exact failover times, but the last time I talked to them, it was very low. They also run higher up in ZFS. >> >> Palle >> > I don’t know if this falls into the realm of what you want, but BSDMag just released an issue with an article entitled “Adding ZFS to the FreeBSD dual-controller storage concept.” > https://bsdmag.org/download/reusing_openbsd/ > > My understanding in this setup is that the only single point of failure for this model is the backplanes that the drives would connect to. Most of the jbods you can buy also have the ability to have dual backplanes also > Depending on your controller cards, this could be alleviated by simply using multiple drive shelves, and only using one drive/shelf as part of a vdev (then stripe or whatnot over your vdevs). > > It might not be what you’re after, as it’s basically two systems with their own controllers, with a shared set of drives. Some expansion from the virtual world to real physical systems will probably need additional variations. > I think the TrueNAS system (with HA) is setup similar to this, only without the split between the drives being primarily handled by separate controllers, but someone with more in-depth knowledge would need to confirm/deny this. 
> > -Joe > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Tue May 17 19:04:27 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C5A8BB3E25C for ; Tue, 17 May 2016 19:04:27 +0000 (UTC) (envelope-from brandon.wandersee@gmail.com) Received: from mail-ig0-f171.google.com (mail-ig0-f171.google.com [209.85.213.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9D71918CF for ; Tue, 17 May 2016 19:04:27 +0000 (UTC) (envelope-from brandon.wandersee@gmail.com) Received: by mail-ig0-f171.google.com with SMTP id bi2so80216550igb.0 for ; Tue, 17 May 2016 12:04:27 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:references:user-agent:from:to:cc:subject :in-reply-to:date:message-id:mime-version; bh=qXUBYXlS1mLmujjk3kms9ENcUdPNe0C8oGEQ3MbMq98=; b=Zg9rue/4lvyDuFeNQ0FGZ/ixCw6ridrbh/suV4sMFRo8kVoEZ/HLTvyhfURAIIZpCq kbwz1DerbeXM1dQrgrMqLKwmnt59wUe1/UZg8dAFHLO4G+OHs0D+sv9k6Cz48irLmXs/ WcDeTyQr/vNEAxaPlD0/HNCjjDV9Y8WBAbMq3HT7VbQGoVGHThAnFZs3yIHOrT6vretO h+hqogtAI3tiCgf6kGICcui6oR5nnwNk/euQA4FqoDvMjVlulzNRXe4zLNaUAxcERBU2 +IC/a7gOguN1B0C1FqRTBskUNEzvpX2bR1NwF14Z05G5DQJhXUahz4sAurqZr5H11JT7 9pgg== X-Gm-Message-State: AOPr4FUYo2MOQpD7EUwUYvYARaODmRIs1hD35K4MNJDXZxh3abFttmTuZ622nk4Lj8Yc3g== X-Received: by 10.50.140.193 with SMTP id ri1mr15275060igb.60.1463511861711; Tue, 17 May 2016 12:04:21 -0700 (PDT) Received: from WorkBox.Home.gmail.com (97-116-8-66.mpls.qwest.net. [97.116.8.66]) by smtp.gmail.com with ESMTPSA id g186sm1478399iof.27.2016.05.17.12.04.19 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 17 May 2016 12:04:20 -0700 (PDT) References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> User-agent: mu4e 0.9.16; emacs 24.5.1 From: Brandon J. Wandersee To: Alex Tutubalin Cc: freebsd-fs@freebsd.org Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? In-reply-to: <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> Date: Tue, 17 May 2016 14:04:18 -0500 Message-ID: <86shxgsdzh.fsf@WorkBox.Home> MIME-Version: 1.0 Content-Type: text/plain X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 19:04:27 -0000 Alex Tutubalin writes: > On 5/17/2016 3:29 PM, Daniel Kalchev wrote: > >> Not true. You can have N-way mirror and it will survive N-1 drive failures. > I agree, but 3-way mirror does not looks economical compared to raidz2. If you're already planning for multiple simultaneous drive failures, "economical" isn't really a factor, is it? Those disks have to get replaced regardless of the redundancy scheme you assign to them. ;) Whether the concern is performance or capacity, mirrors will offer the most flexibility. 
Increasing either the performance or capacity of a RAIDZ pool necessitates either replacing every disk in the pool or doubling the number of disks in the pool, all at once. Mirrors allow you to grow a pool and increase/decrease redundancy asymmetrically. True, four disks in a two-mirror stripe will see you restoring a backup if one disk from each mirror dies, but (arguably) six disks in a two-mirror stripe offer both better redundancy and better performance. Speaking strictly about performance, RAIDZ performance is pretty much fixed, while mirrored performance will (I believe) increase slightly as you add disks and increase greatly as you add vdevs. -- :: Brandon J. Wandersee :: brandon.wandersee@gmail.com :: -------------------------------------------------- :: 'The best design is as little design as possible.' :: --- Dieter Rams ---------------------------------- From owner-freebsd-fs@freebsd.org Tue May 17 19:11:06 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6E7C3B3E397 for ; Tue, 17 May 2016 19:11:06 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 5B6101B79 for ; Tue, 17 May 2016 19:11:06 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 571DDB3E396; Tue, 17 May 2016 19:11:06 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 54778B3E394 for ; Tue, 17 May 2016 19:11:06 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id EF8D71B76 for ; Tue, 17 May 2016 19:11:05 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 9F912780BD4; Wed, 18 May 2016 05:11:01 +1000 (AEST) Date: Wed, 18 May 2016 05:11:01 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: fs@freebsd.org Subject: Re: quick fix for slow directory shrinking in ffs In-Reply-To: <20160517111715.GC89104@kib.kiev.ua> Message-ID: <20160518035413.L4357@besplex.bde.org> References: <20160517072705.F2157@besplex.bde.org> <20160517082050.GX89104@kib.kiev.ua> <20160517192933.U4573@besplex.bde.org> <20160517111715.GC89104@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=c+ZWOkJl c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=m90GG2ySlDWwqfHBogYA:9 a=5Ij-lXQwDHgBn59Q:21 a=qLoGEIkQGsovWur8:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 19:11:06 -0000 On Tue, 17 May 2016, Konstantin Belousov wrote: > On Tue, May 17, 2016 at 08:26:26PM +1000, Bruce Evans wrote: >> On Tue, 17 May 2016, Konstantin Belousov wrote: >> >>> On Tue, May 17, 2016 at 07:54:27AM +1000, Bruce Evans wrote: >>>> ffs does 
very slow shrinking of directories after removing some files >>>> leaves unused blocks at the end, by always doing synchronous truncation. >>>> ... >>>> X Index: ufs_lookup.c >>>> X =================================================================== >>>> X --- ufs_lookup.c (revision 299263) >>>> X +++ ufs_lookup.c (working copy) >>>> X @@ -1131,9 +1131,9 @@ >>>> X if (tvp != NULL) >>>> X VOP_UNLOCK(tvp, 0); >>>> X error = UFS_TRUNCATE(dvp, (off_t)dp->i_endoff, >>>> X - IO_NORMAL | IO_SYNC, cr); >>>> X + IO_NORMAL | (DOINGASYNC(dvp) ? 0 : IO_SYNC), cr); >>>> X if (error != 0) >>>> X - vprint("ufs_direnter: failted to truncate", dvp); >>>> X + vprint("ufs_direnter: failed to truncate", dvp); I keep looking at wrong versions. I checked the old version and now see another problem with this "failted" message (which you fixed). It is debugging code and shouldn't be printed at all. Old versions ignored errors from the truncation since the truncation is supposed to be optional but that was broken for dirhash so r262812 added error handling. If the error handling actually works, then this becomes a non-error. >> Some relevant code in ffs_truncate: This was from an old versions. Perhaps r181717. FreeBSD-8 is similar, but FreeBSD-9+ has most of my DOINGASYNC() additions and -current has just 2 more of them than FreeBSD-9. >> Y /* >> Y * Shorten the size of the file. If the file is not being >> Y * truncated to a block boundary, the contents of the >> Y * partial block following the end of the file must be >> Y * zero'ed in case it ever becomes accessible again because >> Y * of subsequent file growth. Directories however are not >> Y * zero'ed as they should grow back initialized to empty. >> Y */ >> Y offset = blkoff(fs, length); >> Y if (offset == 0) { >> Y ip->i_size = length; >> Y DIP_SET(ip, i_size, length); >> Y } else { >> Y lbn = lblkno(fs, length); >> Y flags |= BA_CLRBUF; >> Y error = UFS_BALLOC(vp, length - 1, 1, cred, flags, &bp); >> Y if (error) { >> Y return (error); >> Y } >> Y /* >> Y * When we are doing soft updates and the UFS_BALLOC >> Y * above fills in a direct block hole with a full sized >> Y * block that will be truncated down to a fragment below, >> Y * we must flush out the block dependency with an FSYNC >> Y * so that we do not get a soft updates inconsistency >> Y * when we create the fragment below. >> Y */ >> Y if (DOINGSOFTDEP(vp) && lbn < NDADDR && >> Y fragroundup(fs, blkoff(fs, length)) < fs->fs_bsize && >> Y (error = ffs_syncvnode(vp, MNT_WAIT)) != 0) >> Y return (error); >> Y ip->i_size = length; >> Y DIP_SET(ip, i_size, length); >> Y size = blksize(fs, ip, lbn); >> Y if (vp->v_type != VDIR) >> Y bzero((char *)bp->b_data + offset, >> Y (u_int)(size - offset)); >> Y /* Kirk's code has reallocbuf(bp, size, 1) here */ >> Y allocbuf(bp, size); >> Y if (bp->b_bufsize == fs->fs_bsize) >> Y bp->b_flags |= B_CLUSTEROK; >> Y if (flags & IO_SYNC) >> Y bwrite(bp); >> Y else >> Y bawrite(bp); FreeBSD-9+ already has my DOINGASYNC() fix here. However, an async write is still done when DOINGASYNC(). It is done by vtruncbuf() 50 lines after here. vtruncbuf() doesn't know about DOINGASYNC(). It turns delayed writes into unconditional async ones. >> Y } >> >> I think we usually arrive here and honor the IO_SYNC flag. This is correct. >> Otherwise, we always do an async write, but that is wrong for async mounts. 
>> Here is my old fix for this: >> >> Z diff -u2 ffs_inode.c~ ffs_inode.c >> Z --- ffs_inode.c~ Wed Apr 7 21:22:26 2004 >> Z +++ ffs_inode.c Sat Mar 23 01:23:16 2013 >> Z @@ -345,4 +431,6 @@ >> Z if (flags & IO_SYNC) >> Z bwrite(bp); >> Z + else if (DOINGASYNC(ovp)) >> Z + bdwrite(bp); >> Z else >> Z bawrite(bp); >> >> This fix must be sprinkled in most places where there is a bwrite()/ >> bawrite() decision. > No, I do not think that it would be correct for SU mounts. It is essential SU silently ignores the async mount flag (by killing it instead of ignoring it later), so the DOINGASYNC() checks don't affect it. > for the correct operation of e.g. ffs_indirtrunc() that writes for SU > case are synchronous, since no dependencies on the indirect block updates > are recorded. The fact that syncvnode() is done before is similarly > important, because no existing dependencies are cleared. > > On the other hand, I agree with the note that the final ffs_update() > must honour IO_SYNC requests. > > Anyway, my point was that your patch does not change the hardest source > of sync writes, only the write of the final block. I will commit the > following. Er, it fixes all cases of directory shrinking for async mounts. All cases should probably use watermarks and shrink at block or frag boundaries instead of 512-boundaries. E.g., for small directories, shrink if size - endoff >= fs_fsize && . With fs_fsize = 2K, this gives for example: - size <= 2K: never shrink - size nearly 4K but endoff between 1K and 2K: don't shrink, because shrinking would free a frag but not leave much space for expansion. > diff --git a/sys/ufs/ffs/ffs_inode.c b/sys/ufs/ffs/ffs_inode.c > index 0202820..50b456b 100644 > --- a/sys/ufs/ffs/ffs_inode.c > +++ b/sys/ufs/ffs/ffs_inode.c > @@ -610,7 +610,7 @@ extclean: > softdep_journal_freeblocks(ip, cred, length, IO_EXT); > else > softdep_setup_freeblocks(ip, length, IO_EXT); > - return (ffs_update(vp, !DOINGASYNC(vp))); > + return (ffs_update(vp, (flags & IO_SYNC) != 0 || !DOINGASYNC(vp))); > } > > /* Oops, this needs fixing in my version, but in -current the fix has little effect since in -current ffs_update() still dishonors the waitfor flag for its bwrite()/bdwrite() decision if DOINGASYNC(). This is essentially the same as dishonoring the IO_SYNC flag here. ffs_update() needs the same fix in 4 more places. > diff --git a/sys/ufs/ufs/ufs_lookup.c b/sys/ufs/ufs/ufs_lookup.c > index 43b4e5c..53536ff 100644 > --- a/sys/ufs/ufs/ufs_lookup.c > +++ b/sys/ufs/ufs/ufs_lookup.c > @@ -1131,7 +1131,7 @@ ufs_direnter(dvp, tvp, dirp, cnp, newdirbp, isrename) > if (tvp != NULL) > VOP_UNLOCK(tvp, 0); > error = UFS_TRUNCATE(dvp, (off_t)dp->i_endoff, > - IO_NORMAL | IO_SYNC, cr); > + IO_NORMAL | (DOINGASYNC(dvp) ? 0 : IO_SYNC), cr); > if (error != 0) > vprint("ufs_direnter: failed to truncate", dvp); > #ifdef UFS_DIRHASH > OK. I want this to avoid _any_ sync writes here for async mounts even after the excessive truncations are fixed. Perhaps vtruncbuf() should just check the async mount flag to avoid async writes (except possibly when buf_dirty_count_severe()).
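(For readers collating these patches: the bwrite()/bawrite() decision this thread keeps patching collapses to one three-way pattern. The helper below is only an illustrative sketch of that pattern; the name ffs_buf_write_policy() is invented here and is not part of any posted patch:)

/*
 * Sketch: the write-policy decision discussed above, in one place.
 * IO_SYNC wins; an async ("-o async") mount gets a delayed write;
 * everything else gets an immediate async write.
 */
static void
ffs_buf_write_policy(struct vnode *vp, struct buf *bp, int ioflag)
{

	if ((ioflag & IO_SYNC) != 0)
		(void)bwrite(bp);	/* caller required synchronous i/o */
	else if (DOINGASYNC(vp))
		bdwrite(bp);		/* async mount: delay the write */
	else
		bawrite(bp);		/* default: start an async write now */
}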
Bruce From owner-freebsd-fs@freebsd.org Tue May 17 19:28:10 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B2E36B3EB1E for ; Tue, 17 May 2016 19:28:10 +0000 (UTC) (envelope-from lexa@lexa.ru) Received: from mx3.lexa.ru (ns503534.ip-198-27-68.net [198.27.68.102]) by mx1.freebsd.org (Postfix) with ESMTP id 95EB01383 for ; Tue, 17 May 2016 19:28:09 +0000 (UTC) (envelope-from lexa@lexa.ru) Received: by mx3.lexa.ru (Postfix, from userid 66) id 1F830224A61; Tue, 17 May 2016 15:28:08 -0400 (EDT) Received: from [193.124.130.166] (unknown [193.124.130.166]) by home-gw.lexa.ru (Postfix) with ESMTP id 38382CA5 for ; Tue, 17 May 2016 22:24:19 +0300 (MSK) Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> <86shxgsdzh.fsf@WorkBox.Home> To: freebsd-fs@freebsd.org From: Alex Tutubalin Message-ID: Date: Tue, 17 May 2016 22:24:19 +0300 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: <86shxgsdzh.fsf@WorkBox.Home> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 19:28:10 -0000 > If you're already planning for multiple simultaneous drive failures, > "economical" isn't really a factor, is it? Those disks have to get > replaced regardless of the redundancy scheme you assign to them. ;) I do not plan for failures, but there is always a chance of a bad drive model. I've survived the 3 TB Seagates without data loss; that would not have been possible in my case without hardware RAID6. > Speaking strictly about performance, RAIDZ performance is pretty much > fixed, Anyway, my thread-starting question is different: I see a great performance difference on the same pool connected to a different CPU/RAM combo. I do not know what caused this difference: CPU speed, RAM bandwidth, or RAM latency. Maybe someone on this list has benchmarked ZFS RAIDZ for performance and knows what the bottleneck is?
Alex Tutubalin From owner-freebsd-fs@freebsd.org Tue May 17 19:40:19 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EC41CB3D061 for ; Tue, 17 May 2016 19:40:19 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id D91BE1D0B for ; Tue, 17 May 2016 19:40:19 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id D4FC3B3D05F; Tue, 17 May 2016 19:40:19 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D2582B3D05E for ; Tue, 17 May 2016 19:40:19 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 8F5851D0A for ; Tue, 17 May 2016 19:40:19 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 4E38C1046239; Wed, 18 May 2016 05:40:10 +1000 (AEST) Date: Wed, 18 May 2016 05:40:07 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Konstantin Belousov , fs@freebsd.org Subject: Re: quick fix for slow directory shrinking in ffs In-Reply-To: <20160518035413.L4357@besplex.bde.org> Message-ID: <20160518052656.R5764@besplex.bde.org> References: <20160517072705.F2157@besplex.bde.org> <20160517082050.GX89104@kib.kiev.ua> <20160517192933.U4573@besplex.bde.org> <20160517111715.GC89104@kib.kiev.ua> <20160518035413.L4357@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=c+ZWOkJl c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=7sixqL4dHYFS359oKKQA:9 a=91Z7TcaMQPEi1bVU:21 a=sc9wKKxEHgg0U6Hn:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 19:40:20 -0000 On Wed, 18 May 2016, Bruce Evans wrote: > On Tue, 17 May 2016, Konstantin Belousov wrote: >> diff --git a/sys/ufs/ffs/ffs_inode.c b/sys/ufs/ffs/ffs_inode.c >> index 0202820..50b456b 100644 >> --- a/sys/ufs/ffs/ffs_inode.c >> +++ b/sys/ufs/ffs/ffs_inode.c >> @@ -610,7 +610,7 @@ extclean: >> softdep_journal_freeblocks(ip, cred, length, IO_EXT); >> else >> softdep_setup_freeblocks(ip, length, IO_EXT); >> - return (ffs_update(vp, !DOINGASYNC(vp))); >> + return (ffs_update(vp, (flags & IO_SYNC) != 0 || !DOINGASYNC(vp))); >> } >> >> /* > > Oops, this needs fixing in my version, but in -current the fix has > little effect since in -current ffs_update() still dishonors the waitfor > flag for its bwrite()/bdwrite() decision if DOINGASYNC(). This is > essentially the same as dishonoring the IO_SYNC flag here. > > ffs_update() needs the same fix in 4 more places. Also, ftruncate() seems to be broken. POSIX doesn't seem to require it to honor O_SYNC, but POLA requires this. 
But there is no VOP_TRUNCATE(); truncation is done using VOP_SETATTR() and there is no way to pass down the O_SYNC flag to it; in practice, ffs just does UFS_TRUNCATE() without IO_SYNC. This makes a difference mainly for async mounts with my fixes to honor IO_SYNC in ffs_update(). With async mounts, consistency of the file system is not guaranteed but O_SYNC for a file should at least cause all of the file data and most of its metadata to be written. Not syncing for ftruncate() unnecessarily loses metadata writes. With !async mounts, consistency of the file system is partly guaranteed and lost metadata writes for ftruncate() shouldn't affect this -- they should just lose the ftruncate() atomically. vfs could do an fsync() after VOP_SETATTR() for the O_SYNC case. This reduces the race window. Bruce From owner-freebsd-fs@freebsd.org Tue May 17 20:39:56 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7D24CB3FAB6 for ; Tue, 17 May 2016 20:39:56 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 6B4E214EE for ; Tue, 17 May 2016 20:39:56 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 6A8DDB3FAB5; Tue, 17 May 2016 20:39:56 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6A310B3FAB3 for ; Tue, 17 May 2016 20:39:56 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 37A8314ED for ; Tue, 17 May 2016 20:39:55 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id AE2BA428F79; Wed, 18 May 2016 06:39:52 +1000 (AEST) Date: Wed, 18 May 2016 06:39:49 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs In-Reply-To: <20160517084241.GY89104@kib.kiev.ua> Message-ID: <20160518061040.D5948@besplex.bde.org> References: <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=EfU1O6SC c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=QJha7pIZtpZdJjQaBnUA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 20:39:56 -0000 On Tue, 17 May 2016, Konstantin Belousov wrote: > On Tue, May 17, 2016 at 07:26:08AM +1000, Bruce Evans wrote: >> Counting of i/o's in g_vfs_strategy() requires the fs to initialize >> devvp->v_rdev->si_mountpt to non-null. This seems to be done correctly >> in ext2fs and msdosfs, but in ffs it is not done for ro mounts, or for >> rw mounts that started as ro.
The bug is most obvious for the root >> file system since it always starts as ro. > > I committed the comments updates. > > For the accounting patch, don't we want to account for all io, including > the mount-time metadata reads and initial superblock update ? > > diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c > index 9776554..712fc21 100644 > --- a/sys/ufs/ffs/ffs_vfsops.c > +++ b/sys/ufs/ffs/ffs_vfsops.c > @@ -780,6 +780,8 @@ ffs_mountfs(devvp, mp, td) > mp->mnt_iosize_max = MAXPHYS; > > devvp->v_bufobj.bo_ops = &ffs_ops; > + if (devvp->v_type == VCHR) > + devvp->v_rdev->si_mountpt = mp; > > fs = NULL; > sblockloc = 0; > @@ -1049,8 +1051,6 @@ ffs_mountfs(devvp, mp, td) > ffs_flushfiles(mp, FORCECLOSE, td); > goto out; > } > - if (devvp->v_type == VCHR && devvp->v_rdev != NULL) > - devvp->v_rdev->si_mountpt = mp; > if (fs->fs_snapinum[0] != 0) > ffs_snapshot_mount(mp); > fs->fs_fmod = 1; > @@ -1083,6 +1083,8 @@ ffs_mountfs(devvp, mp, td) > out: > if (bp) > brelse(bp); > + if (devvp->v_type == VCHR && devvp->v_rdev != NULL) > + devvp->v_rdev->si_mountpt = NULL; > if (cp != NULL) { > DROP_GIANT(); > g_topology_lock(); Yes, that looks better. The other file systems that support the counters (ext2fs and msdosfs) need a similar change. Grepping for si_mountpoint shows no other file systems that support this. The recently axed reiserfs sets si_mountpt, but only if si_mountpt is #defined. This only works in old versions: - in old versions, si_mountpt is #defined. GEOM broke this, and the #define was removed. The ifdef kept reiserfs compiling. History for reiserfs was broken by repo-copying after the ifdef was added. - mckusick fixed the counting for ffs and restored si_mountpt, but it is now not #define'd. The following file systems are something like ffs so they should set si_mountpt, but don't: cd9660, fuse, nandfs (?), udf, zfs. I only understand cd9660 and udf. The following file systems used to set si_mountpt but now don't: hpfs (axed), ntfs (axed), udf (but not cd9660). Counters for the ro file systems are only moderately useful. They tell you if the block size is too small and/or if the clustering is bad. No version seems to be as careful as the above -- they don't set si_mountpt until near the end of a successful mount. This takes just 1 statement in mount() and 1 in umount(). I'd like vfs to do this setting so that leaf file systems can't forget to do it, but never figured out the plumbing to tell upper layers of vfs about devvp. g_vfs_open() can't do it since it knows devvp but not mp.
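(To make the "1 statement in mount() and 1 in umount()" concrete: condensed from the patch quoted above, a leaf file system enables the per-mount counters roughly like this -- a sketch only, with locking and error paths omitted:)

/* In the fs mount path, once the device vnode devvp is open: */
if (devvp->v_type == VCHR)
	devvp->v_rdev->si_mountpt = mp;

/* In the fs unmount path, after i/o to the device has drained: */
if (devvp->v_type == VCHR && devvp->v_rdev != NULL)
	devvp->v_rdev->si_mountpt = NULL;

g_vfs_strategy() then charges each buf against the mnt_stat counters of the mount hanging off si_mountpt, which is where the per-mount sync/async read and write counts come from.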
Bruce From owner-freebsd-fs@freebsd.org Tue May 17 21:11:25 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8A665B3F2D6 for ; Tue, 17 May 2016 21:11:25 +0000 (UTC) (envelope-from steven@multiplay.co.uk) Received: from mail-wm0-x22f.google.com (mail-wm0-x22f.google.com [IPv6:2a00:1450:400c:c09::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 15BB019FD for ; Tue, 17 May 2016 21:11:24 +0000 (UTC) (envelope-from steven@multiplay.co.uk) Received: by mail-wm0-x22f.google.com with SMTP id g17so50917814wme.1 for ; Tue, 17 May 2016 14:11:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=multiplay-co-uk.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=18HaE95cOAcNXvLcgKRPFkuzI+tn9EgBFp0adEt+bLM=; b=YS/lDuk7Jwu3N2EsuJ4/VFu60UCK20GpgHV9LTqiw/AhwGwJnbNGW9Q7eTpFxbVHyy tV+tzTNYJGG1duBa2R3QQunJwgP2WSytUmWSjLyt2KTHbbzsxEYZbPFaicnFbhTBB7C2 0Nxxz7JTgyLzAZyO382/5cP9IkAmutNg3fhj3yZZxbCzgUDouUj0esrPCK08pmiwLY+x 7iCgd73li5/H8I1+Unr8NvSdgyq6Z8v9hcnqGDteu8I3Y3eDy6mPAJFhVYkDLb/a5mMO fDxKDKeY+iqe+EIII7HiOf5WIlF+T3QXuWACDuqgqkVwyOLhs/ZA3AQ1LRgXkALZ7Yv8 2sqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=18HaE95cOAcNXvLcgKRPFkuzI+tn9EgBFp0adEt+bLM=; b=Kcg7Rql9vGP9LQwwtuPWcSrS7lu0RIwnJacIrH54jUmoMZ2pn4sVfcBCV7p+oUt0rX v2cvzaxH4KW2iqKe5oqDbcGK92AZSyhvTqsHCKRr+ZHvqQVq8oCZhVYmEC7wYtzTe3E3 3knms84pJ2IYFJN6LD3Ty2EyWyjtCplnX+3mDQ+Ny/weFiNfuKCEHq8zx7z2dI+Gnk8r Gd0OC6liQaLkGTDcSxJEpKmzHFv1VMfx/VKiFQg0K6vio6tkJBDQ/UrqxO4lOPVHzsQ4 4VZqGS9X3BVXKU9di1YJ2NRAJJIvQwuZwAvbPcW5K26DlYQofPCBMKP2AKZRCJTEG2c5 4Qkg== X-Gm-Message-State: AOPr4FWICj2RBIai881B0XxcNPvt4doGJEtSoW3ooHJOIkRWBzOQx+cD+Njx8p8i7xVHRFHA6HPopTv9Tr/IYlwa MIME-Version: 1.0 X-Received: by 10.194.139.104 with SMTP id qx8mr3425725wjb.14.1463519483422; Tue, 17 May 2016 14:11:23 -0700 (PDT) Received: by 10.28.93.203 with HTTP; Tue, 17 May 2016 14:11:23 -0700 (PDT) In-Reply-To: <86shxgsdzh.fsf@WorkBox.Home> References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> <86shxgsdzh.fsf@WorkBox.Home> Date: Tue, 17 May 2016 22:11:23 +0100 Message-ID: Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? From: Steven Hartland To: "Brandon J. Wandersee" Cc: Alex Tutubalin , "freebsd-fs@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 21:11:25 -0000 Raidz is essentially limited to a single drive's performance per vdev for read and write, while a mirror is a single drive's performance for write and the number of drives for read. Don't forget a mirror is not limited to two drives; it can be three, four or more, so if you need more read throughput you can add drives to the mirror. To increase raidz performance you need to add more vdevs. While this doesn't have to be double, i.e. the same vdev config as the first, it is generally a good idea. Don't forget that while it rebalances, write performance of a multi-vdev raidz will be limited to the added vdev. On Tuesday, 17 May 2016, Brandon J. Wandersee wrote: > > Alex Tutubalin writes: > > > On 5/17/2016 3:29 PM, Daniel Kalchev wrote: > > > >> Not true. You can have N-way mirror and it will survive N-1 drive > failures. > > I agree, but 3-way mirror does not look economical compared to raidz2. > > If you're already planning for multiple simultaneous drive failures, > "economical" isn't really a factor, is it? Those disks have to get > replaced regardless of the redundancy scheme you assign to them. ;) > > Whether the concern is performance or capacity, mirrors will offer the > most flexibility. Increasing either the performance or capacity of a > RAIDZ pool necessitates either replacing every disk in the pool or > doubling the number of disks in the pool, all at once. Mirrors allow you > to grow a pool and increase/decrease redundancy asymmetrically. True, > four disks in a two-mirror stripe will see you restoring a backup if one > disk from each mirror dies, but (arguably) six disks in a two-mirror > stripe offer both better redundancy and better performance. > > Speaking strictly about performance, RAIDZ performance is pretty much > fixed, while mirrored performance will (I believe) increase slightly as > you add disks and increase greatly as you add vdevs. > > -- > > :: Brandon J. Wandersee > :: brandon.wandersee@gmail.com > :: -------------------------------------------------- > :: 'The best design is as little design as possible.' > :: --- Dieter Rams ---------------------------------- > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org > " > From owner-freebsd-fs@freebsd.org Tue May 17 21:16:17 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 75680B3F456 for ; Tue, 17 May 2016 21:16:17 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-io0-x232.google.com (mail-io0-x232.google.com [IPv6:2607:f8b0:4001:c06::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3D6FA1CE1 for ; Tue, 17 May 2016 21:16:17 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: by mail-io0-x232.google.com with SMTP id f89so40661181ioi.0 for ; Tue, 17 May 2016 14:16:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=kkj1+UCBiYndbNrtv0GY52CycSb/yBIhTS5KmeN8Hno=; b=TIZhrcngAZYRJ6cGCDKupKT/vFj3s9R7uB56jomdtKCUgC8dGe9L9fuGF2VnedX2DT wEkocW6M/+DJKGNg/rVIw9PqRUd8F/mg38bDt/a5Q11S8I5XgfV19N0e8fcKu8V10jsz OQ0lZX5iLn7en0jcVVrS05F9n3TiMm+SCtn34NaAuFXCXiy7O6+LT5/S9FMHX/j5ynG8 FuIyEFLc7ZP5dr/8ACVQvRcbkdAPy3XwJ1uBrBfQWtzEMjmpO+D2jBWugLYlEmPiM91x OlZbsIgV6Q/SHwOPlkZQVvX/oj7lCpeqFCFt5czoS9wV82wmIo2fpA+9XCdw+hZCciWn r1OA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=kkj1+UCBiYndbNrtv0GY52CycSb/yBIhTS5KmeN8Hno=;
b=WDiT5D60qP/ukDGmfVfiADk6Plcxkycc12455MsACSU09zQY7iY2PRFUnLbJTR87VF 1C5L+bk6jvn/nSEk9OoRiiSF4mgpubyaeKGfOd7LOOoYNvoc/qBFgT08UV2i5Zlyvm4j AVvvE0mpg31Vt5lrBRA1/SN086q4WFsbiV/+0ujb0f00CtbwKQKcpy+JHoZquIqbE1El w40q2jBAOxBt/0fMTFiKWby58/OF6VZBA5xMaoc0AWNyLSt08jrsjo2UQDnZA9obGAUk jr/Qz5FtQLpLeJexXvZHkY6IB8ZHrTV9z4NJDhvV5xvOT2AAVlLDeugaW8gCrdHiYFps SRAw== X-Gm-Message-State: AOPr4FWVagn6ObPY5Fj7nNlVx/yeZbQ7/o+V7bKWLypXloDUY9PTizUyECHbFAh8C9LJreIsVLie13BjM8lmGA== MIME-Version: 1.0 X-Received: by 10.107.134.24 with SMTP id i24mr2716422iod.130.1463519776605; Tue, 17 May 2016 14:16:16 -0700 (PDT) Received: by 10.107.173.79 with HTTP; Tue, 17 May 2016 14:16:16 -0700 (PDT) In-Reply-To: References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> <86shxgsdzh.fsf@WorkBox.Home> Date: Tue, 17 May 2016 14:16:16 -0700 Message-ID: Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? From: Freddie Cash To: Steven Hartland Cc: "Brandon J. Wandersee" , "freebsd-fs@freebsd.org" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 21:16:17 -0000 On Tue, May 17, 2016 at 2:11 PM, Steven Hartland wrote: > Raidz is essentially limited to a single drive's performance > per vdev for read and write, while a mirror is a single drive's performance for > write and the number of drives for read. Don't forget a mirror is not limited to > two drives; it can be three, four or more, so if you need more read throughput you > can add drives to the mirror. > > To increase raidz performance you need to add more vdevs. While this > doesn't have to be double, i.e. the same vdev config as the first, it > is generally a good idea. > > Don't forget that while it rebalances, write performance of a multi-vdev > raidz will be limited to the added vdev. > Everybody is missing the point of the OP. They're not asking for ways to improve the performance of a raidz-based pool; they're asking why they get different performance metrics from the exact same pool when they change the CPU and RAM. And, more importantly, why a Core-i3-based system shows better performance than a Core-i7-based system. Is there something inherent to the way ZFS works that favours one setup over another (lower CPU core counts running at higher speeds is better/worse than higher CPU core counts running at lower speeds; more RAM channels is better/worse; things like that).
-- Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@freebsd.org Tue May 17 21:22:37 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B74E4B3F62F for ; Tue, 17 May 2016 21:22:37 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id A06B21099 for ; Tue, 17 May 2016 21:22:37 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 9BDD0B3F62C; Tue, 17 May 2016 21:22:37 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9B85DB3F62B for ; Tue, 17 May 2016 21:22:37 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4560C1098 for ; Tue, 17 May 2016 21:22:37 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u4HLMRFu006554 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 18 May 2016 00:22:27 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u4HLMRFu006554 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u4HLMR6Z006553; Wed, 18 May 2016 00:22:27 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 18 May 2016 00:22:27 +0300 From: Konstantin Belousov To: Bruce Evans Cc: fs@freebsd.org Subject: Re: quick fix for slow directory shrinking in ffs Message-ID: <20160517212227.GE89104@kib.kiev.ua> References: <20160517072705.F2157@besplex.bde.org> <20160517082050.GX89104@kib.kiev.ua> <20160517192933.U4573@besplex.bde.org> <20160517111715.GC89104@kib.kiev.ua> <20160518035413.L4357@besplex.bde.org> <20160518052656.R5764@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160518052656.R5764@besplex.bde.org> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 21:22:37 -0000 On Wed, May 18, 2016 at 05:40:07AM +1000, Bruce Evans wrote: > Also, ftruncate() seems to be broken. POSIX doesn't seem to require it > to honor O_SYNC, but POLA requires this. But there is no VOP_TRUNCATE(); > truncation is done using VOP_SETATTR() and there is no way to pass down > the O_SYNC flag to it; in practice, ffs just does UFS_TRUNCATE() without > IO_SYNC. > > This makes a difference mainly for async mounts with my fixes to honor > IO_SYNC in ffs_update().
With async mounts, consistency of the file > system is not guaranteed but O_SYNC for a file should at least cause > all of the file data and most of its metadata to be written. Not syncing > for ftruncate() unnecessarily loses metadata writes. With !async mounts, > consistency of the file system is partly guaranteed and lost metadata > writes for ftruncate() shouldn't affect this -- they should just lose > the ftruncate() atomically. > > vfs could do an fsync() after VOP_SETATTR() for the O_SYNC case. This > reduces the race window. vattr already has the va_vaflags field. It is trivial to add a flag there requesting O_SYNC behaviour. Of course, other updates could also honour VA_SYNC, but this is for later. Like this: diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c index 0a3a88a..1e42a3d 100644 --- a/sys/kern/vfs_vnops.c +++ b/sys/kern/vfs_vnops.c @@ -1314,6 +1314,8 @@ vn_truncate(struct file *fp, off_t length, struct ucred *active_cred, if (error == 0) { VATTR_NULL(&vattr); vattr.va_size = length; + if ((fp->f_flag & O_FSYNC) != 0) + vattr.va_vaflags |= VA_SYNC; error = VOP_SETATTR(vp, &vattr, fp->f_cred); } out: diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h index e82f6ee..41ec7f7 100644 --- a/sys/sys/vnode.h +++ b/sys/sys/vnode.h @@ -286,6 +286,7 @@ struct vattr { */ #define VA_UTIMES_NULL 0x01 /* utimes argument was NULL */ #define VA_EXCLUSIVE 0x02 /* exclusive create request */ +#define VA_SYNC 0x04 /* O_SYNC truncation */ /* * Flags for ioflag. (high 16 bits used to ask for read-ahead and diff --git a/sys/ufs/ufs/ufs_vnops.c b/sys/ufs/ufs/ufs_vnops.c index c0729f8..83df347 100644 --- a/sys/ufs/ufs/ufs_vnops.c +++ b/sys/ufs/ufs/ufs_vnops.c @@ -625,7 +625,8 @@ ufs_setattr(ap) */ return (0); } - if ((error = UFS_TRUNCATE(vp, vap->va_size, IO_NORMAL, + if ((error = UFS_TRUNCATE(vp, vap->va_size, IO_NORMAL | + ((vap->va_vaflags & VA_SYNC) != 0 ?
IO_SYNC : 0), cred)) != 0) return (error); } From owner-freebsd-fs@freebsd.org Tue May 17 21:28:24 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EA9A2B3F906 for ; Tue, 17 May 2016 21:28:24 +0000 (UTC) (envelope-from steven@multiplay.co.uk) Received: from mail-wm0-x232.google.com (mail-wm0-x232.google.com [IPv6:2a00:1450:400c:c09::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A664C1A2B for ; Tue, 17 May 2016 21:28:24 +0000 (UTC) (envelope-from steven@multiplay.co.uk) Received: by mail-wm0-x232.google.com with SMTP id r12so7433440wme.0 for ; Tue, 17 May 2016 14:28:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=multiplay-co-uk.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=z7vstwTQiUNHIdnrUIAx7ViqJDOqm6/6JMO4H7a+tYQ=; b=AAdvZCT1eg2AgFpWM6/1yLfVKmwRGPAIyLDV5vppFlDbpsypakG/Qc5YyjGBOfAm2O 4+bjR82+v77Q7ixDolp0+cL6ho0HEsZS6/BKnbSGDUmBsqIPw4YntSWJy9trTAdEzKlc lsbef3u6bpNcD8IEQ4Q1SOKv/kGvLH4Y5cb4bYHO4TVHovSrKHoNhn8/5yXgU1XE5A56 BZU4hMmn7EpiUMmd4vOivHHk4p71mWdiGalhpHHt4LBnVnA/rUmIsNmKLc5O8pD1Jg3C /2oY1o2CH2p/55NNLr2QsAFqtERLuGYWHPeRbAd7FKYqgMcd74aMSDiV1bliFTfN+TIZ P5fA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=z7vstwTQiUNHIdnrUIAx7ViqJDOqm6/6JMO4H7a+tYQ=; b=ipZR72+TlMbikBQ1tGMYBJayjW+HUGrMYaxP8QUxh3jW3T++2IlmNhEc5VM5jS8/Zo wuRpxGaTUx+n43O49oSs2oSa3gVPKcACRm72vfL2+LhYQ3eDrN9efFu2z3VU2Xen+olo /aabXnJPjJfK1VNTURQ7MF3VDL/a4/yxfsy8Xlgn7+C+vmb5e5PCN0hYT928KUML6h8L uksf9UhlytvdgYcL7DuOQILmUrFknfa3dvSOiCP16iW9vn69+pD4zt5Vu3oMlWsIEztJ A0iRrZd+xOda60CrPOjUTiD6H9Q4h3uev+IxpGfbXrjZweWmICfFdNOEuuJgzGANi4rF xsvg== X-Gm-Message-State: AOPr4FUKSlKvD0j2neQYKuun6V3pti+KkHKGGHvuZ9B41tvnO9MuNwfxnSL6RuMZ3OnulrUDY/+kfEEnDoauHU4i MIME-Version: 1.0 X-Received: by 10.194.163.229 with SMTP id yl5mr3582806wjb.6.1463520503074; Tue, 17 May 2016 14:28:23 -0700 (PDT) Received: by 10.28.93.203 with HTTP; Tue, 17 May 2016 14:28:22 -0700 (PDT) In-Reply-To: References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> <86shxgsdzh.fsf@WorkBox.Home> Date: Tue, 17 May 2016 22:28:22 +0100 Message-ID: Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? From: Steven Hartland To: Freddie Cash Cc: "Brandon J. Wandersee" , "freebsd-fs@freebsd.org" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 21:28:25 -0000 Tbh if the results were from more than 6 months ago they are likely quite out of date as things have changed quite significantly in that period, so retesting would be advised. 
On Tuesday, 17 May 2016, Freddie Cash wrote: > On Tue, May 17, 2016 at 2:11 PM, Steven Hartland > wrote: > >> Raidz is essentially limited to a single drive's performance >> per vdev for read and write, while a mirror is a single drive's performance for >> write and the number of drives for read. Don't forget a mirror is not limited to >> two drives; it can be three, four or more, so if you need more read throughput you >> can add drives to the mirror. >> >> To increase raidz performance you need to add more vdevs. While this >> doesn't have to be double, i.e. the same vdev config as the first, it >> is generally a good idea. >> >> Don't forget that while it rebalances, write performance of a multi-vdev >> raidz will be limited to the added vdev. >> > > Everybody is missing the point of the OP. > > They're not asking for ways to improve the performance of a raidz-based > pool; they're asking why they get different performance metrics from the > exact same pool when they change the CPU and RAM. > > And, more importantly, why a Core-i3-based system shows better performance > than a Core-i7-based system. Is there something inherent to the way ZFS > works that favours one setup over another (lower CPU core counts running at > higher speeds is better/worse than higher CPU core counts running at lower > speeds; more RAM channels is better/worse; things like that). > > > -- > Freddie Cash > fjwcash@gmail.com > From owner-freebsd-fs@freebsd.org Tue May 17 21:30:31 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 82A85B3F9D9 for ; Tue, 17 May 2016 21:30:31 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 70B661C8C for ; Tue, 17 May 2016 21:30:31 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 6C66FB3F9D7; Tue, 17 May 2016 21:30:31 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6C128B3F9D6 for ; Tue, 17 May 2016 21:30:31 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 37D5D1C8B for ; Tue, 17 May 2016 21:30:30 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id BA0F3429C03; Wed, 18 May 2016 07:30:28 +1000 (AEST) Date: Wed, 18 May 2016 07:30:25 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Konstantin Belousov , fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs In-Reply-To: <20160518061040.D5948@besplex.bde.org> Message-ID: <20160518070252.F6121@besplex.bde.org> References: <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=EfU1O6SC c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=JdS-s63oAcdDXkgrDngA:9 a=CjuIK1q_8ugA:10 X-BeenThere:
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 21:30:31 -0000 On Wed, 18 May 2016, Bruce Evans wrote: > On Tue, 17 May 2016, Konstantin Belousov wrote: >> ... >> For the accounting patch, don't we want to account for all io, including >> the mount-time metadata reads and initial superblock update ? >> >> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c >> index 9776554..712fc21 100644 >> --- a/sys/ufs/ffs/ffs_vfsops.c >> +++ b/sys/ufs/ffs/ffs_vfsops.c >> @@ -780,6 +780,8 @@ ffs_mountfs(devvp, mp, td) >> mp->mnt_iosize_max = MAXPHYS; >> >> devvp->v_bufobj.bo_ops = &ffs_ops; >> + if (devvp->v_type == VCHR) >> + devvp->v_rdev->si_mountpt = mp; >> >> fs = NULL; >> sblockloc = 0; >> @@ -1049,8 +1051,6 @@ ffs_mountfs(devvp, mp, td) >> ffs_flushfiles(mp, FORCECLOSE, td); >> goto out; >> } >> - if (devvp->v_type == VCHR && devvp->v_rdev != NULL) >> - devvp->v_rdev->si_mountpt = mp; >> if (fs->fs_snapinum[0] != 0) >> ffs_snapshot_mount(mp); >> fs->fs_fmod = 1; >> @@ -1083,6 +1083,8 @@ ffs_mountfs(devvp, mp, td) >> out: >> if (bp) >> brelse(bp); >> + if (devvp->v_type == VCHR && devvp->v_rdev != NULL) >> + devvp->v_rdev->si_mountpt = NULL; >> if (cp != NULL) { >> DROP_GIANT(); >> g_topology_lock(); > > Yes, that looks better. Further cleanups: - the null pointer check is bogus since we already dereferenced devvp->v_rdev. We also assigned devvp->v_rdev to the variable dev but spelled out devvp->v_rdev in a couple of other places. - the VCHR check is bogus since we only work for VCHR and have already checked for VCHR in vn_isdisk(). Similarly in ffs_umount() except there is no dev variable there. Similarly in msdosfs. NOT similarly in ext2fs. I was looking at the wrong tree again. Only 1 of my trees has the patch to do this in ext2fs. The patch for ffs applies almost verbatim. Bruce From owner-freebsd-fs@freebsd.org Tue May 17 21:35:58 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A3D43B3FBEA for ; Tue, 17 May 2016 21:35:58 +0000 (UTC) (envelope-from fullermd@over-yonder.net) Received: from mail.infocus-llc.com (mail.infocus-llc.com [199.15.120.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8249F125C for ; Tue, 17 May 2016 21:35:58 +0000 (UTC) (envelope-from fullermd@over-yonder.net) Received: from draco.over-yonder.net (c-75-65-60-66.hsd1.ms.comcast.net [75.65.60.66]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.tarragon.infocus-llc.com (Postfix) with ESMTPSA id 3r8Vxf4z8LzX2; Tue, 17 May 2016 16:35:50 -0500 (CDT) Received: by draco.over-yonder.net (Postfix, from userid 100) id 3r8Vxd6xw5z1mp; Tue, 17 May 2016 16:35:49 -0500 (CDT) Date: Tue, 17 May 2016 16:35:49 -0500 From: "Matthew D. Fuller" To: Freddie Cash Cc: Steven Hartland , "freebsd-fs@freebsd.org" Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? 
Message-ID: <20160517213549.GK24656@over-yonder.net> References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> <86shxgsdzh.fsf@WorkBox.Home> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Editor: vi X-OS: FreeBSD X-Virus-Scanned: clamav-milter 0.99 at mail.tarragon.infocus-llc.com X-Virus-Status: Clean User-Agent: Mutt/1.6.0-fullermd.4 (2016-04-01) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 21:35:58 -0000 On Tue, May 17, 2016 at 02:16:16PM -0700 I heard the voice of Freddie Cash, and lo! it spake thus: > > They're not asking for ways to improve the performance of a > raidz-based pool; they're asking why they get different performance > metrics from the exact same pool when they change the CPU and RAM. More specifically, as I read it, different performance in a very specific metric; single-thread linear bulk writes. That doesn't seem like it would benefit heavily from a lot of cores available, or from RAM bandwidth or size above a pretty low threshold. Of course, it's not just changing the CPU and RAM; it's also the motherboard, and possibly the HBA (at least the bus the HBA is on, if it's a card being transplanted with the pool). And the Core 2 would be back in the plain-old FSB era, so RAM access would be competing with the disk IO on the bus. -- Matthew Fuller (MF4839) | fullermd@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ On the Internet, nobody can hear you scream. From owner-freebsd-fs@freebsd.org Tue May 17 21:46:47 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5BD93B3FEA0 for ; Tue, 17 May 2016 21:46:47 +0000 (UTC) (envelope-from steven@multiplay.co.uk) Received: from mail-wm0-x22e.google.com (mail-wm0-x22e.google.com [IPv6:2a00:1450:400c:c09::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DF2561777 for ; Tue, 17 May 2016 21:46:46 +0000 (UTC) (envelope-from steven@multiplay.co.uk) Received: by mail-wm0-x22e.google.com with SMTP id n129so158118761wmn.1 for ; Tue, 17 May 2016 14:46:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=multiplay-co-uk.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=HbfemyyWjoDRZ6770FQPy4PY02rYdxvUX21JKyZnTvE=; b=GBmlXcYH54XMqkc88djwby+crSG/B+62kv1kFDresvyb3AzZN1b/qMKSr2uMZvJwJL YxlHEU5fq/egqQaK4yZwyWW57eJ2hC38kNUnbwIacDwv9STRj5rB+Iphof+Ql2PmhuHK UCLEsyYuHSJ7T/PlRq0ZPhltu+6nTXKtIzMVgfCgticqe3bcLPPPXQKqgCTM7YN/UPbA tQyOxD27jAaJj3meff9faAi7qXiG5uz/GHhKJPNRE6u5GkmqO6OdaI8S9LLiZidaV7rH 4ZZAJKg0fn0+mNTLl67hjlJhQgRyesYoBRaSX0UccYrhVAXhqP29TnJchb0gyVN5iccV gKBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=HbfemyyWjoDRZ6770FQPy4PY02rYdxvUX21JKyZnTvE=; b=bfRb0nqGwUUwi9aXA3Sofo5GDIdn0g41b9mziHbUtvUxe7BexUDfMKHVBHKLfqUCFB 
qXjptfESSGHB3/0ULr810pUVMWVeBREmafHv4GR3nIsSdNjXrRlRHMNgVFuuGhcJgs49 bihNITJyLa2VKa1vC/DZmRjtorDBrhep0a5gTk+3fh0SXqj+74JrnY86N/7Es8vEOFjz +6R8f/oZBqYfcZ0ZT++lGcnxu0yOCP9u7aZx0EIB/xDJmtZ1b6Dx5kpdipK6ccQcgeAD tMsDbYe60NLlp048EtRsigzYYfd+puL6c3CaR9gntwUslX6ZdRU3nsa5IxPvsmT2S21a Yw8g== X-Gm-Message-State: AOPr4FVmJMTjRjVUon74W+pMIx7aq310aFzp1+2lgvgS7RJSJEb0/Nns/ljgHHFOwJvsrwTcv47K3mzSx1s0KCHv MIME-Version: 1.0 X-Received: by 10.28.6.138 with SMTP id 132mr25022114wmg.60.1463521605006; Tue, 17 May 2016 14:46:45 -0700 (PDT) Received: by 10.28.93.203 with HTTP; Tue, 17 May 2016 14:46:44 -0700 (PDT) In-Reply-To: <20160517213549.GK24656@over-yonder.net> References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> <86shxgsdzh.fsf@WorkBox.Home> <20160517213549.GK24656@over-yonder.net> Date: Tue, 17 May 2016 22:46:44 +0100 Message-ID: Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? From: Steven Hartland To: "Matthew D. Fuller" Cc: Freddie Cash , "freebsd-fs@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 21:46:47 -0000 Not to mention it's so easy to cripple performance with a bad BIOS setup that this could easily be a simple setup issue. I had an issue the other day where a 4 GHz Intel CPU couldn't transcode video in real time, which turned out to be a power-saving option in the BIOS that was utterly destroying performance by running the CPU at 800 MHz instead of 4 GHz. Everything else seemed fine; nothing was using more than a few percent of CPU. Disabling power saving fixed the issue. This issue was not present on a much lower power / older box simply because it didn't have the advanced power saving options. I'm not saying this was the case in these tests but simply providing a concrete example that it's sometimes hard to get like-for-like comparisons even for what should be simple tests. On Tuesday, 17 May 2016, Matthew D. Fuller wrote: > On Tue, May 17, 2016 at 02:16:16PM -0700 I heard the voice of > Freddie Cash, and lo! it spake thus: > > > > They're not asking for ways to improve the performance of a > > raidz-based pool; they're asking why they get different performance > > metrics from the exact same pool when they change the CPU and RAM. > > More specifically, as I read it, different performance in a very > specific metric; single-thread linear bulk writes. That doesn't seem > like it would benefit heavily from a lot of cores available, or from > RAM bandwidth or size above a pretty low threshold. > > Of course, it's not just changing the CPU and RAM; it's also the > motherboard, and possibly the HBA (at least the bus the HBA is on, if > it's a card being transplanted with the pool). And the Core 2 would > be back in the plain-old FSB era, so RAM access would be competing > with the disk IO on the bus. > > > -- > Matthew Fuller (MF4839) | fullermd@over-yonder.net > Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ > On the Internet, nobody can hear you scream.
> From owner-freebsd-fs@freebsd.org Tue May 17 22:01:05 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DAE69B401E6 for ; Tue, 17 May 2016 22:01:05 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id C4AD21D50 for ; Tue, 17 May 2016 22:01:05 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id C3FEAB401E5; Tue, 17 May 2016 22:01:05 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C39D5B401E4 for ; Tue, 17 May 2016 22:01:05 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6D5F21D4F for ; Tue, 17 May 2016 22:01:05 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u4HM0tdA016113 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 18 May 2016 01:00:56 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u4HM0tdA016113 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u4HM0t7E016110; Wed, 18 May 2016 01:00:55 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 18 May 2016 01:00:55 +0300 From: Konstantin Belousov To: Bruce Evans Cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs Message-ID: <20160517220055.GF89104@kib.kiev.ua> References: <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160518070252.F6121@besplex.bde.org> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 22:01:06 -0000 On Wed, May 18, 2016 at 07:30:25AM +1000, Bruce Evans wrote: > Further cleanups: > - the null pointer check is bogus since we already dereferenced > devvp->v_rdev. We also assigned devvp->v_rdev to the variable > dev but spelled out devvp->v_rdev in a couple of other places. > - the VCHR check is bogus since we only work for VCHR and have > already checked for VCHR in vn_isdisk(). No, these are not bogus. The checks are incorrect because they are racy, but they are needed with the proper locking. I intended to look at this tomorrow, since the fixes are not related to the current changes, but you forced me. VCHR check ensures that the devvp vnode is not reclaimed. 
I do not want to remove the check and rely on the caller of ffs_mountfs() to always do the right thing for it without unlocking devvp; this is too subtle. We are safe from devvp being reclaimed when io is in progress, since our reference prevents the cdev memory from being freed, which ensures that v_rdev is valid if non-NULL. Unmount is not supposed to finish until all io is finished (but we had bugs there). > > Similarly in ffs_umount() except there is no dev variable there. There is ump->um_dev. diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c index 712fc21..da61c15 100644 --- a/sys/ufs/ffs/ffs_vfsops.c +++ b/sys/ufs/ffs/ffs_vfsops.c @@ -771,17 +771,18 @@ ffs_mountfs(devvp, mp, td) error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); g_topology_unlock(); PICKUP_GIANT(); - VOP_UNLOCK(devvp, 0); - if (error) + if (error) { + VOP_UNLOCK(devvp, 0); goto out; - if (devvp->v_rdev->si_iosize_max != 0) + } + if (dev->si_iosize_max != 0) mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; if (mp->mnt_iosize_max > MAXPHYS) mp->mnt_iosize_max = MAXPHYS; - devvp->v_bufobj.bo_ops = &ffs_ops; if (devvp->v_type == VCHR) - devvp->v_rdev->si_mountpt = mp; + dev->si_mountpt = mp; + VOP_UNLOCK(devvp, 0); fs = NULL; sblockloc = 0; @@ -1083,8 +1084,10 @@ ffs_mountfs(devvp, mp, td) out: if (bp) brelse(bp); + VOP_LOCK(devvp, LK_EXCLUSIVE | LK_RETRY); if (devvp->v_type == VCHR && devvp->v_rdev != NULL) devvp->v_rdev->si_mountpt = NULL; + VOP_UNLOCK(devvp, 0); if (cp != NULL) { DROP_GIANT(); g_topology_lock(); @@ -1287,9 +1290,11 @@ ffs_unmount(mp, mntflags) g_vfs_close(ump->um_cp); g_topology_unlock(); PICKUP_GIANT(); - if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL) - ump->um_devvp->v_rdev->si_mountpt = NULL; - vrele(ump->um_devvp); + VOP_LOCK(ump->um_devvp, LK_EXCLUSIVE | LK_RETRY); + if (ump->um_devvp->v_type == VCHR && + ump->um_devvp->v_rdev == ump->um_dev) + ump->um_dev->si_mountpt = NULL; + vput(ump->um_devvp); dev_rel(ump->um_dev); mtx_destroy(UFS_MTX(ump)); if (mp->mnt_gjprovider != NULL) { From owner-freebsd-fs@freebsd.org Tue May 17 22:39:20 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C0AC2B409DB for ; Tue, 17 May 2016 22:39:20 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id ABFAD100E for ; Tue, 17 May 2016 22:39:20 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id A6FABB409D8; Tue, 17 May 2016 22:39:20 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A6AB3B409D7 for ; Tue, 17 May 2016 22:39:20 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 3FFCE1005 for ; Tue, 17 May 2016 22:39:19 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 76183D400F2; Wed, 18 May 2016 08:39:08 +1000 (AEST) Date: Wed, 18 May 2016 08:39:08 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: fs@freebsd.org Subject:
Re: quick fix for slow directory shrinking in ffs In-Reply-To: <20160517212227.GE89104@kib.kiev.ua> Message-ID: <20160518081302.X6396@besplex.bde.org> References: <20160517072705.F2157@besplex.bde.org> <20160517082050.GX89104@kib.kiev.ua> <20160517192933.U4573@besplex.bde.org> <20160517111715.GC89104@kib.kiev.ua> <20160518035413.L4357@besplex.bde.org> <20160518052656.R5764@besplex.bde.org> <20160517212227.GE89104@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=TuMb/2jh c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=LHc1-K_XCWJFMkeID04A:9 a=AnwALAnBAE9uY5ff:21 a=85e_Wl8BSmVFEIn7:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 22:39:20 -0000 On Wed, 18 May 2016, Konstantin Belousov wrote: > On Wed, May 18, 2016 at 05:40:07AM +1000, Bruce Evans wrote: >> Also, ftruncate() seems to be broken. POSIX doesn't seem to require it >> to honor O_SYNC, but POLA requires this. But there is no VOP_TRUNCATE(); >> truncation is done using VOP_SETATTR() and there is no way to pass down >> the O_SYNC flag to it; in practice, ffs just does UFS_TRUNCATE() without >> IO_SYNC. >> >> This makes a difference mainly for async mounts with my fixes to honor >> IO_SYNC in ffs_update(). With async mounts, consistency of the file >> system is not guaranteed but O_SYNC for a file should at least cause >> all of the file data and most of its metdata to be written. Not syncing >> for ftruncate() unnecessarily loses metadata writes. With !async mounts, >> consistency of the file system is partly guaranteed and lost metadata >> writes for ftruncate() shouldn't affect this -- they should just lose >> the ftruncate() atomically. >> >> vfs could do an fsync() after VOP_SETATTR() for the O_SYNC case. This >> reduces the race window. > > vattr already has the va_vaflags field. It is trivial to add flag there > requesting O_SYNC behaviour. Of course, other updates could also > honour VA_SYNC, but this is for later. Like this: > > diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c > index 0a3a88a..1e42a3d 100644 > --- a/sys/kern/vfs_vnops.c > +++ b/sys/kern/vfs_vnops.c > @@ -1314,6 +1314,8 @@ vn_truncate(struct file *fp, off_t length, struct ucred *active_cred, > if (error == 0) { > VATTR_NULL(&vattr); > vattr.va_size = length; > + if ((fp->f_flag & O_FSYNC) != 0) > + vattr.va_vaflags |= VA_SYNC; > error = VOP_SETATTR(vp, &vattr, fp->f_cred); > } > out: > diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h > index e82f6ee..41ec7f7 100644 > --- a/sys/sys/vnode.h > +++ b/sys/sys/vnode.h > @@ -286,6 +286,7 @@ struct vattr { > */ > #define VA_UTIMES_NULL 0x01 /* utimes argument was NULL */ > #define VA_EXCLUSIVE 0x02 /* exclusive create request */ > +#define VA_SYNC 0x04 /* O_SYNC truncation */ > > /* > * Flags for ioflag. (high 16 bits used to ask for read-ahead and > diff --git a/sys/ufs/ufs/ufs_vnops.c b/sys/ufs/ufs/ufs_vnops.c > index c0729f8..83df347 100644 > --- a/sys/ufs/ufs/ufs_vnops.c > +++ b/sys/ufs/ufs/ufs_vnops.c > @@ -625,7 +625,8 @@ ufs_setattr(ap) > */ > return (0); > } > - if ((error = UFS_TRUNCATE(vp, vap->va_size, IO_NORMAL, > + if ((error = UFS_TRUNCATE(vp, vap->va_size, IO_NORMAL | > + ((vap->va_vaflags & VA_SYNC) != 0 ? 
IO_SYNC : 0),
> 	    cred)) != 0)
> 		return (error);
> 	}

Looks good.

O_SYNC is actually spelled O_FSYNC in FreeBSD. You spelled it correctly in the above, but about two places in the kernel use the POSIX spelling. It is confusing enough to also have the spellings FFSYNC and IO_SYNC for this flag in different layers. FFSYNC is for fcntl and must equal O_FSYNC since the layers are not clearly separated. IO_SYNC is for a clearly separated layer, and O_FSYNC is supposed to be translated to it; but since it has the same value as O_FSYNC, an untranslated O_FSYNC might work accidentally.

This should probably also be done for truncations with O_TRUNC at open time. There are a couple of these in vfs_syscalls.c. O_TRUNC is used much more than ftruncate() so the extra overhead from this would be more noticeable. I think the implementation is not very good. If open() with O_TRUNC or truncate with O_FSYNC or fsync() fails, then the file contents might be garbage. So it would be better to do large truncations mostly async and only sync at the end. *fs_truncate() could operate like that, but I think it takes the IO_SYNC flag as a directive to do the whole operation synchronously. A non-sync truncation followed by fsync() is likely to work better for ffs and just work for all fs's.

Bruce

From owner-freebsd-fs@freebsd.org Tue May 17 23:11:57 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 150EDB40E1A for ; Tue, 17 May 2016 23:11:57 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id F209A1D52 for ; Tue, 17 May 2016 23:11:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id F1559B40E19; Tue, 17 May 2016 23:11:56 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F0F75B40E18 for ; Tue, 17 May 2016 23:11:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 71FD11D51 for ; Tue, 17 May 2016 23:11:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u4HNBpVC033011 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 18 May 2016 02:11:51 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u4HNBpVC033011 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u4HNBoea033010; Wed, 18 May 2016 02:11:50 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 18 May 2016 02:11:50 +0300 From: Konstantin Belousov To: Bruce Evans Cc: fs@freebsd.org Subject: Re: quick fix for slow directory shrinking in ffs Message-ID: <20160517231150.GG89104@kib.kiev.ua> References: <20160517072705.F2157@besplex.bde.org> <20160517082050.GX89104@kib.kiev.ua> <20160517192933.U4573@besplex.bde.org> <20160517111715.GC89104@kib.kiev.ua> <20160518035413.L4357@besplex.bde.org> <20160518052656.R5764@besplex.bde.org> <20160517212227.GE89104@kib.kiev.ua> 
<20160518081302.X6396@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160518081302.X6396@besplex.bde.org> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2016 23:11:57 -0000

On Wed, May 18, 2016 at 08:39:08AM +1000, Bruce Evans wrote:
> Looks good.
...
> This should probably also be done for truncations with O_TRUNC at open
> time. There are a couple of these in vfs_syscalls.c. O_TRUNC is used
> much more than ftruncate() so the extra overhead from this would be
> more noticeable. I think the implementation is not very good. If
> open() with O_TRUNC or truncate with O_FSYNC or fsync() fails, then
> the file contents might be garbage. So it would be better to do
> large truncations mostly async and only sync at the end. *fs_truncate()
> could operate like that, but I think it takes the IO_SYNC flag as a
> directive to do the whole operation synchronously. A non-sync truncation
> followed by fsync() is likely to work better for ffs and just work for
> all fs's.

I see only two places which call fo_truncate() in vfs_syscalls.c, after the O_TRUNC test. Both cases are after some kind of open, and the mechanism from my patch does synchronous truncation automatically for the callers.

Of course, truncation errors from O_TRUNC in open are fatal, and the precious file (otherwise O_SYNC would not be specified at all) is in an undefined and damaged state if that happens. From this point of view, O_TRUNC was a bad idea.

I looked at the POSIX text, and while ftruncate(2) is allowed to return e.g. EIO, for open(2) EIO is not listed in case of truncation problems. I am not sure if the generic rules of POSIX allow saying that the condition is undefined. The implementation cannot handle that case without loss. 
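[Editor's illustration: a minimal userspace sketch of the "non-sync truncation followed by fsync()" ordering discussed in this thread. The file path is a made-up placeholder and the sketch is untested; O_FSYNC is FreeBSD's spelling of POSIX O_SYNC, so even if ftruncate() does not honor the flag, the explicit fsync() reports any failure instead of silently leaving damaged contents.]

#include <err.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
	int fd;

	/* Placeholder path; O_FSYNC asks for synchronous writes. */
	fd = open("/tmp/precious.dat", O_RDWR | O_CREAT | O_FSYNC, 0644);
	if (fd == -1)
		err(1, "open");

	/*
	 * The truncation itself may run asynchronously in the kernel;
	 * the explicit fsync() afterwards surfaces any error here.
	 */
	if (ftruncate(fd, 0) == -1)
		err(1, "ftruncate");
	if (fsync(fd) == -1)
		err(1, "fsync");
	close(fd);
	return (0);
}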
From owner-freebsd-fs@freebsd.org Wed May 18 00:00:42 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C5721B3FD81 for ; Wed, 18 May 2016 00:00:42 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id AF019149C for ; Wed, 18 May 2016 00:00:42 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id A98A8B3FD7B; Wed, 18 May 2016 00:00:42 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A91AEB3FD78 for ; Wed, 18 May 2016 00:00:42 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 991681475 for ; Wed, 18 May 2016 00:00:40 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id C347F428F06; Wed, 18 May 2016 10:00:31 +1000 (AEST) Date: Wed, 18 May 2016 10:00:09 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs In-Reply-To: <20160517220055.GF89104@kib.kiev.ua> Message-ID: <20160518084931.T6534@besplex.bde.org> References: <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> <20160517220055.GF89104@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=c+ZWOkJl c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=_FYEsZlHC8cxAzq_7eoA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 00:00:42 -0000 On Wed, 18 May 2016, Konstantin Belousov wrote: > On Wed, May 18, 2016 at 07:30:25AM +1000, Bruce Evans wrote: >> Further cleanups: >> - the null pointer check is bogus since we already dereferenced >> devvp->v_rdev. We also assigned devvp->v_rdev to the variable >> dev but spelled out devvp->v_rdev in a couple of other places. >> - the VCHR check is bogus since we only work for VCHR and have >> already checked for VCHR in vn_isdisk(). > No, these are not bogus. The checks are incorrect because they are > racy, but they are needed with the proper locking. I intended to look > at this tomorrow, since the fixes are not related to the current changes, > but you forced me. You are too efficient :-). > VCHR check ensures that the devvp vnode is not reclaimed. I do not want > to remove the check and rely on the caller of ffs_mountfs() to always do > the right thing for it without unlocking devvp, this is too subtle. Surely the caller must lock devvp? Otherwise none of the uses of devvp can be trusted, and there are several others. 
> We are safe from devvp being reclaimed when io is in progress, since
> our reference prevents the cdev memory from being freed, which ensures
> that v_rdev is valid if non-NULL. Unmount is not supposed to finish
> until all io is finished (but we had bugs there).
>> Similarly in ffs_umount() except there is no dev variable there.
> There is ump->um_dev.

There is also ump->um_devvp, but this seems to be unusable since it might go away. So using devvp->v_rdev instead of the dev variable is not just a style bug.

> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
> index 712fc21..da61c15 100644
> --- a/sys/ufs/ffs/ffs_vfsops.c
> +++ b/sys/ufs/ffs/ffs_vfsops.c
> @@ -771,17 +771,18 @@ ffs_mountfs(devvp, mp, td)
> 	error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
> 	g_topology_unlock();
> 	PICKUP_GIANT();
> -	VOP_UNLOCK(devvp, 0);
> -	if (error)
> +	if (error) {
> +		VOP_UNLOCK(devvp, 0);
> 		goto out;
> -	if (devvp->v_rdev->si_iosize_max != 0)
> +	}
> +	if (dev->si_iosize_max != 0)
> 		mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max;
> 	if (mp->mnt_iosize_max > MAXPHYS)
> 		mp->mnt_iosize_max = MAXPHYS;
> -
> 	devvp->v_bufobj.bo_ops = &ffs_ops;
> 	if (devvp->v_type == VCHR)

devvp must still be VCHR since this is now under the vnode lock, and we depend on dev remaining a character device for the disk described by devvp at the time of the vn_isdisk() check.

> -		devvp->v_rdev->si_mountpt = mp;
> +		dev->si_mountpt = mp;
> +	VOP_UNLOCK(devvp, 0);

The unlocking could be a little earlier since dev is still for a disk even if devvp went away and you changed this to not use devvp->v_rdev.

>
> 	fs = NULL;
> 	sblockloc = 0;

Unlocking and then using devvp sure looks like a race.

You only needed to move the unlocking to fix devvp->v_bufobj. How does that work? The write is now locked, but if devvp goes away, then don't we lose its bufobj?

> @@ -1083,8 +1084,10 @@ ffs_mountfs(devvp, mp, td)
> out:
> 	if (bp)
> 		brelse(bp);
> +	VOP_LOCK(devvp, LK_EXCLUSIVE | LK_RETRY);
> 	if (devvp->v_type == VCHR && devvp->v_rdev != NULL)
> 		devvp->v_rdev->si_mountpt = NULL;
> +	VOP_UNLOCK(devvp, 0);
> 	if (cp != NULL) {
> 		DROP_GIANT();
> 		g_topology_lock();

Why not just dev->si_mountpt = NULL unconditionally? We must do this even if devvp went away, and we can easily do it using dev alone, as above.

> @@ -1287,9 +1290,11 @@ ffs_unmount(mp, mntflags)
> 	g_vfs_close(ump->um_cp);
> 	g_topology_unlock();
> 	PICKUP_GIANT();
> -	if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL)
> -		ump->um_devvp->v_rdev->si_mountpt = NULL;
> -	vrele(ump->um_devvp);
> +	VOP_LOCK(ump->um_devvp, LK_EXCLUSIVE | LK_RETRY);
> +	if (ump->um_devvp->v_type == VCHR &&
> +	    ump->um_devvp->v_rdev == ump->um_dev)
> +		ump->um_dev->si_mountpt = NULL;
> +	vput(ump->um_devvp);

As above. We don't care if um_devvp went away, at least for clearing si_mountpt, and must use ump->um_dev to clear si_mountpt.

> 	dev_rel(ump->um_dev);

Presumably ump->um_dev was referenced throughout until here, and this is the only thing keeping the device from going away too.

> 	mtx_destroy(UFS_MTX(ump));
> 	if (mp->mnt_gjprovider != NULL) {
>

How does any use of ump->um_devvp work?

I tried revoke(2) on the devvp of a mounted file system. This worked to give v_type = VBAD and v_rdev = NULL, but didn't crash. ffs_unmount() checked for the bad vnode, unlike most places, and failed to clear si_mountpt.

Normal use doesn't have revokes, but if the vnode is reclaimed instead of just becoming bad, then worse things probably happen. 
I think vnode cache resizing gives very unstable storage, so the pointer becomes very invalid. But even revoke followed by setting kern.maxvnodes to 1 didn't crash (15 vnodes remained). So devvp must be referenced throughout. It seems to have reference count 2, since umounting reduced kern.numvnodes from 15 to 13. (It is surprising how much works with kern.maxvnodes=1. I was able to run revoke, sysctl and umount.)

It is still a mystery that the VBAD vnode doesn't crash soon.

Bruce

From owner-freebsd-fs@freebsd.org Wed May 18 01:54:16 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DC984B40DD7 for ; Wed, 18 May 2016 01:54:16 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id C979B1E07 for ; Wed, 18 May 2016 01:54:16 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id C8CB4B40DD6; Wed, 18 May 2016 01:54:16 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C6308B40DD5 for ; Wed, 18 May 2016 01:54:16 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail108.syd.optusnet.com.au (mail108.syd.optusnet.com.au [211.29.132.59]) by mx1.freebsd.org (Postfix) with ESMTP id 871B01E06 for ; Wed, 18 May 2016 01:54:15 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id 2CF791A6B44; Wed, 18 May 2016 11:54:11 +1000 (AEST) Date: Wed, 18 May 2016 11:54:02 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Konstantin Belousov , fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs In-Reply-To: <20160518084931.T6534@besplex.bde.org> Message-ID: <20160518110928.Q6900@besplex.bde.org> References: <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=c+ZWOkJl c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=WiiwgCuxqNMNQUgj__EA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 01:54:17 -0000

On Wed, 18 May 2016, Bruce Evans wrote:
> ...
> How does any use of ump->um_devvp work?
>
> I tried revoke(2) on the devvp of a mounted file system. This worked
> to give v_type = VBAD and v_rdev = NULL, but didn't crash. ffs_unmount()
> checked for the bad vnode, unlike most places, and failed to clear
> si_mountpt.
>
> Normal use doesn't have revokes, but if the vnode is reclaimed instead
> of just becoming bad, then worse things probably happen. I think vnode
> ...
I still haven't generated a crash, but revoke certainly does one bad thing: it breaks detection of busy devices so that the same device can be mounted more than once. GEOM was supposed to allow multiple mounts for ro mounts, but this gave garbage pointers and is turned off. To turn it back on, use:

% mount -o ro /dev/ad4s4a /i	# my normal mount
% mount -o ro /dev/ad4s4a /i	# fails with EBUSY
% revoke /dev/ad4s4a
% ls /i				# seems to work
% mount -o ro /dev/ad4s4a /i	# doesn't fail; clobbers ptrs
% ls /i				# seems to work
% umount /i			# seems to work, but clobbers
% ls /i				# top of stack still there
% umount /i			# seems to work

Crashes can probably be arranged by writing to the device after it is revoked. The device is supposed to be exclusive access or at least ro, but revoke breaks that. Or just put two independent valid file systems on the same device in advance or by writing, so as to clobber the pointers better.

The exclusive access can also be broken using separate devfs instances:

% mount -o ro /dev/ad4s4a /i
% mkdir /tmp/dev
% mount -t devfs devfs /tmp/dev		# normal sort of use for jails?
% mount -o rw /tmp/dev/ad4s4a /i	# doesn't fail; can even be rw

Perhaps this doesn't clobber pointers near bufobj as badly as the turned-off code, but it certainly clobbers si_mountpt. Each new mount sets si_mountpt in the shared cdev struct. The first unmount sets this to NULL so I think it never points to garbage. It just points to the wrong mount struct or is turned off. The case of multiple devfs instances has a chance of working since devvp is separate, so assignments to devvp->v_bufobj don't clobber previous mounts.

I now remember that this prevented me from finding a fix for the i/o counting. Multiple mounts were supposed to work, but obviously a single pointer in the cdev cannot work for multiple mounts. I think it was removed (breaking the i/o counting) because it was too hard to fix it to work even for a single mount (since allowing multiple mounts gives pointer clobbering problems).

Bruce

From owner-freebsd-fs@freebsd.org Wed May 18 04:24:09 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 71B54B402DE for ; Wed, 18 May 2016 04:24:09 +0000 (UTC) (envelope-from lexa@lexa.ru) Received: from mx3.lexa.ru (ns503534.ip-198-27-68.net [198.27.68.102]) by mx1.freebsd.org (Postfix) with ESMTP id 53A8516EB for ; Wed, 18 May 2016 04:24:08 +0000 (UTC) (envelope-from lexa@lexa.ru) Received: by mx3.lexa.ru (Postfix, from userid 66) id 9733C224A5D; Wed, 18 May 2016 00:24:07 -0400 (EDT) Received: from [193.124.130.166] (unknown [193.124.130.166]) by home-gw.lexa.ru (Postfix) with ESMTP id A176D1801 for ; Wed, 18 May 2016 07:21:46 +0300 (MSK) Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? 
To: "freebsd-fs@freebsd.org" References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> <86shxgsdzh.fsf@WorkBox.Home> From: Alex Tutubalin Message-ID: Date: Wed, 18 May 2016 07:21:46 +0300 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 04:24:09 -0000 On 5/18/2016 12:11 AM, Steven Hartland wrote: > Raidz is limited essential limited to a single drive performance > per dev for read and write while mirror is single drive performance > for write its number of drives for read. Don't forget mirror is not > limited to two it can be three, four or more; so if you need more read > throughput you can add drives to the mirror. Do I understand it correctly: - single write of one large file (or singe local write to zvol shared via iSCSI) will be local: single or only several metaslabs - for RAIDZ each disk will get only part of throughput - for mirror, each disk included in write will receive full data size (and for single local write only limited number of disks to be included in write) If so, raidz will have huge write performance benefit in my case: single write of one large file. As for read speed, I hope to deal with it with large enough L2ARC on SSDs. > > To increase raidz performance you need to add more vdevs. While this > doesn't have to be double i.e. the same vdev config as the first it > generally a good idea. Again, multiple vdevs will help for multiple parallel writes, but not for single one? 
Alex Tutubalin From owner-freebsd-fs@freebsd.org Wed May 18 06:10:24 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 93165B40B0D for ; Wed, 18 May 2016 06:10:24 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: from mail-wm0-x22c.google.com (mail-wm0-x22c.google.com [IPv6:2a00:1450:400c:c09::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2A5411D8C for ; Wed, 18 May 2016 06:10:24 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: by mail-wm0-x22c.google.com with SMTP id a17so62290187wme.0 for ; Tue, 17 May 2016 23:10:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=Qv5tOdb3utavt048KxW1XyVrzocu+YCEfUcA8NHz4g4=; b=aRx2z1VZ+tv8ftgMKdOwQnv1/CO2OMxGqmS8IAnBZcMPIyMp+iGzbVYlGAhm2DbBlS xFm5GuwiFzSkKoEMk1uflK8/8VuRF4n89zpoP7DBJ1QPVJBLKhY8HwfJ8kVTP8OvFGsq 5ItbzQes0HmYDOclafenJJngCEXuaN4n2ZCYAM49/dc8zgpOgI9lQWe4Xlf39T9tZPKO F0nDSt+eKiUyNwYxnfGhu+pHT/dz6hlb+2V6MM7k0ffg7TkjuBbeakl1yHurz79g1ECh 5xG6M7h1bxXNRAE7/xHR2Ny51R69RFGN0Pg7JToYnrHkhFLTtOfJSIIiVZOzMrCMrriX uJQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=Qv5tOdb3utavt048KxW1XyVrzocu+YCEfUcA8NHz4g4=; b=aEUB227kzOPHMChVgzF5D3pJq3sqvP1b2L1trzImc1AbUIhP5sliVV3jPS/favImL9 no2LhINsrARNfB8rEuDFLElu5MY63OOf9tN+DqlNf8+1tmPD2r3E/MojyCk5sVcsmovD iMA778enxgIECvV0TYZ07O9YBp2dCkuXPUvHwjf7knTMRtFSKRhQ2GvUS5Km6AtGndtT G3PD8/tIf6cqqCh0g70wIy7B+0p6XWVsaQjrO23VJM2h1SZ6BbxmS/vEIYZNqL2qOS7R zCHmy89sylx44+sSsA6a3rm0cj3glkK8pourn1U5Z2kVViReJ98jG0btLRif4rHBgidg S69Q== X-Gm-Message-State: AOPr4FWknu2MfbEsn3Wyu+SIpy2vcSZUoyPLsQak9ba/JPQF+lCUrMNEb+IMiRA0nmGY6Q== X-Received: by 10.194.203.227 with SMTP id kt3mr5054812wjc.73.1463551822537; Tue, 17 May 2016 23:10:22 -0700 (PDT) Received: from [192.168.0.1] (cag06-2-82-237-68-117.fbx.proxad.net. [82.237.68.117]) by smtp.gmail.com with ESMTPSA id b22sm7233476wmb.9.2016.05.17.23.10.21 for (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Tue, 17 May 2016 23:10:21 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: State of native encryption in ZFS From: Ben RUBSON In-Reply-To: <0CE6E456-CC25-4AED-A73E-F5BBE659F795@mail.turbofuzz.com> Date: Wed, 18 May 2016 08:10:20 +0200 Content-Transfer-Encoding: 7bit Message-Id: References: <5736E7B4.1000409@gmail.com> <0CE6E456-CC25-4AED-A73E-F5BBE659F795@mail.turbofuzz.com> To: freebsd-fs@freebsd.org X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 06:10:24 -0000 >> I wish to know somethign new about native encryption in ZFS for FreeBSD. >> Any works in this direction are conducted? > > Short and simple answer: No. However, look at this : https://github.com/zfsonlinux/zfs/pull/4329 Certainly something interesting ! 
Ben From owner-freebsd-fs@freebsd.org Wed May 18 06:43:36 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2AFC2B4046C for ; Wed, 18 May 2016 06:43:36 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (vps.rulingia.com [103.243.244.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "rulingia.com", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B1D6211EB for ; Wed, 18 May 2016 06:43:35 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from server.rulingia.com (ppp59-167-167-3.static.internode.on.net [59.167.167.3]) by vps.rulingia.com (8.15.2/8.15.2) with ESMTPS id u4I6hJ2X001483 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 18 May 2016 16:43:25 +1000 (AEST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.15.2/8.15.2) with ESMTPS id u4I6hDvZ069784 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 18 May 2016 16:43:13 +1000 (AEST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.15.2/8.15.2/Submit) id u4I6hBKq069783; Wed, 18 May 2016 16:43:11 +1000 (AEST) (envelope-from peter) Date: Wed, 18 May 2016 16:43:11 +1000 From: Peter Jeremy To: "Matthew D. Fuller" Cc: Freddie Cash , "freebsd-fs@freebsd.org" , Steven Hartland Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? Message-ID: <20160518064311.GA22800@server.rulingia.com> References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> <86shxgsdzh.fsf@WorkBox.Home> <20160517213549.GK24656@over-yonder.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="AqsLC8rIMeq19msA" Content-Disposition: inline In-Reply-To: <20160517213549.GK24656@over-yonder.net> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.6.1 (2016-04-27) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 06:43:36 -0000 --AqsLC8rIMeq19msA Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2016-May-17 16:35:49 -0500, "Matthew D. Fuller" wrote: >More specifically, as I read it, different performance in a very >specific metric; single-thread linear bulk writes. That doesn't seem >like it would benefit heavily from a lot of cores available, or from >RAM bandwidth or size above a pretty low threshold. Actually, whilst I presume the OP has compression disabled, ZFS can very effectively use multiple cores to compress data - even if it's only a single linear writer. >Of course, it's not just changing the CPU and RAM; it's also the >motherboard, and possibly the HBA (at least the bus the HBA is on, if >it's a card being transplanted with the pool). And the Core 2 would >be back in the plain-old FSB era, so RAM access would be competing >with the disk IO on the bus. 
Without knowing much more about the configuration of each system, it's impossible to identify where the bottleneck might be.

-- 
Peter Jeremy

From owner-freebsd-fs@freebsd.org Wed May 18 07:27:52 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5685FB403DA for ; Wed, 18 May 2016 07:27:52 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: from mail-wm0-x233.google.com (mail-wm0-x233.google.com [IPv6:2a00:1450:400c:c09::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id F20FE1FD0 for ; Wed, 18 May 2016 07:27:51 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: by mail-wm0-x233.google.com with SMTP id n129so170672230wmn.1 for ; Wed, 18 May 2016 00:27:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=ZEmanaacv3nlhLHS8gNaBDDsFjRhWPwJB5yfIkpZz9w=; b=hUTSlDFFHWJtyr8v9jKgc5HTqT77NusTNptpGAk+Td1BciSWPbz1+JDsGhF+n7bv8x 2ouXewcseO/i7/XTYvx9Br0TC0Bk0RHfwIv6DqU4wreTp5hZoNFWK/QQqlgx7EiMiFkI HpAPDO5tiXn342wOWAefrwlmv/8ZawFs7VigE8/E886E2+U2PPrjyeYDorNGYNE0FIt4 yq2MEu2fdD/30285G9PHM3TpUW029cZQbo6CE5VhsvxidKRHMiRKsgurxYKxxWQdYNoV sNFliHcKbyGhH1fLTaciLckmwf5E4wBV73N1IZcijZXfPkQbZzrKfcSHr2X2fY0RG0R3 S8MQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=ZEmanaacv3nlhLHS8gNaBDDsFjRhWPwJB5yfIkpZz9w=; b=JImszdUKiXyUVd87jUFlqiXgs/kJutvb9Yg7IDTHmlcKmoj0KKlb02fMEUHReGkJuU E4lq9Z0p56CeH9+tNvzSkZYFy5F/a58WYOu2tVJXowtcJkfvd3ENl5uPkvTNypfSTcoo q54j76oYdhUjDeU8drFCJ/REhJ6Tk+mXfF5I53jMo8PLISeQrdMhFz2OQIyaNFN9U+2w 7x/aBFWVvLua6FF5t3mjgX3OXa8uxaA3N/HcNpbCblPYf7Q6FcWhvhOpYrNgr4UOlqT8 jGUlf1OzG+L9djm4pAT+Jn9xZACyEP7hGqymJAlVntmQr9iQwLdOfVMjBHVuFO/k17Ev PzMg== X-Gm-Message-State: AOPr4FUujMPYDcHNWVBPKFP+ishS3t6YckI7csrrAJCdYRDYY5/WmVZU3zw0W8cNzG71Ow== X-Received: by 10.28.215.197 with SMTP id o188mr5967190wmg.14.1463556470508; Wed, 18 May 2016 00:27:50 -0700 (PDT) Received: from [192.168.0.1] (cag06-2-82-237-68-117.fbx.proxad.net. 
[82.237.68.117]) by smtp.gmail.com with ESMTPSA id a75sm7532098wme.18.2016.05.18.00.27.49 for (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 18 May 2016 00:27:49 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: Best practice for high availability ZFS pool From: Ben RUBSON In-Reply-To: Date: Wed, 18 May 2016 09:27:48 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <5F874CA9-A8D9-4A09-A4BD-95466AB7D165@gmail.com> References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> <40C35566-B7FB-4F59-BB41-D43BC0362C26@gmail.com> To: freebsd-fs@freebsd.org X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 07:27:52 -0000

> On 17 may 2016 at 19:06, Bob Friesenhahn wrote:
>
> On Tue, 17 May 2016, Ben RUBSON wrote:
>
>>> On 17 may 2016 at 15:24, Bob Friesenhahn wrote:
>>>
>>> There is at least one case of zfs send propagating a problem into the receiving pool. I don't know if it broke the pool. Corrupt data may be sent from one pool to another if it passes checksums.
>>
>> Do you have any link to this problem ? Would be interesting to know if it was possible to come-back to a previous snapshot / consistent pool.
>
> I don't have a link but I recall that it had something to do with the ability to send file 'holes' in the stream.

OK, just for reference : https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207714

>> I think that making ZFS send/receive has a higher security level than mirroring to a second (or third) JBOD box.
>> With mirroring you will still have only one ZFS pool.
>
> This is a reasonable assumption.
>
>> However, if send/receive makes the receiving pool the exact 1:1 copy of the sending pool, then the thing which made the sending pool to corrupt could reach (and corrupt) the receiving pool... I don't know whether or not this could occur, and if ever it occurs, if we have the chance to revert to a previous snapshot, at least on the receiving side...
>
> Zfs receive does not result in a 1:1 copy. The underlying data organization can be completely different and compression or other options can be changed.

Yes, so if we assume ZFS send/receive is bug-free, having a second pool which receives the data of the first one (mirrored to different JBOD boxes) makes sense.

For the first pool, we could think about the following :
- server1 with its JBOD as an iSCSI target ;
- server2 with the exact same JBOD, iSCSI initiator, hosts a ZFS pool which mirrors each of server2's disks with one of server1's disks.

If ever server2 fails, server1 imports the pool and brings the service back up.
When server2 comes back, it acts as the new iSCSI target and gives its disks to server1, which reconstructs the mirror.
Disk redundancy, and hardware redundancy.

And regularly, this pool is sent/received to a different pool on server3, we never know... 
Sounds good (to me at least :)

Ben

From owner-freebsd-fs@freebsd.org Wed May 18 07:53:29 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D03D2B40331 for ; Wed, 18 May 2016 07:53:29 +0000 (UTC) (envelope-from girgen@pingpong.net) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 9B87D1491; Wed, 18 May 2016 07:53:29 +0000 (UTC) (envelope-from girgen@pingpong.net) Received: from [10.22.157.15] (unknown [94.234.170.60]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 5DC1C16878; Wed, 18 May 2016 09:53:27 +0200 (CEST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (1.0) Subject: Re: Best practice for high availability ZFS pool From: Palle Girgensohn X-Mailer: iPhone Mail (13E238) In-Reply-To: <5DA13472-F575-4D3D-80B7-1BE371237CE5@getsomewhere.net> Date: Wed, 18 May 2016 09:53:26 +0200 Cc: Palle Girgensohn , freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <8E674522-17F0-46AC-B494-F0053D87D2B0@pingpong.net> References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> <5DA13472-F575-4D3D-80B7-1BE371237CE5@getsomewhere.net> To: Joe Love X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 07:53:29 -0000

> 17 maj 2016 kl. 18:13 skrev Joe Love :
>
>> On May 16, 2016, at 5:08 AM, Palle Girgensohn wrote:
>>
>> Hi,
>>
>> We need to set up a ZFS pool with redundance. The main goal is high availability - uptime.
>>
>> I can see a few of paths to follow.
>>
>> 1. HAST + ZFS
>>
>> 2. Some sort of shared storage, two machines sharing a JBOD box.
>>
>> 3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive)
>>
>> 4. using something else than ZFS, even a different OS if required.
>>
>> My main concern with HAST+ZFS is performance. Google offer some insights here, I find mainly unsolved problems. Please share any success stories or other experiences.
>>
>> Shared storage still has a single point of failure, the JBOD box. Apart from that, is there even any support for the kind of storage PCI cards that support dual head for a storage box? I cannot find any.
>>
>> We are running with ZFS replication today, but it is just too slow for the amount of data.
>>
>> We prefer to keep ZFS as we already have a rather big (~30 TB) pool and also tools, scripts, backup all is using ZFS, but if there is no solution using ZFS, we're open to alternatives. Nexenta springs to mind, but I believe it is using shared storage for redundance, so it does have single points of failure?
>>
>> Any other suggestions? Please share your experience. :)
>>
>> Palle
>
> I don’t know if this falls into the realm of what you want, but BSDMag just released an issue with an article entitled “Adding ZFS to the FreeBSD dual-controller storage concept.”
> https://bsdmag.org/download/reusing_openbsd/
>
> My understanding in this setup is that the only single point of failure for this model is the backplanes that the drives would connect to. 
> Depending on your controller cards, this could be alleviated by simply using multiple drive shelves, and only using one drive/shelf as part of a vdev (then stripe or whatnot over your vdevs).
>
> It might not be what you’re after, as it’s basically two systems with their own controllers, with a shared set of drives. Some expansion from the virtual world to real physical systems will probably need additional variations.
> I think the TrueNAS system (with HA) is setup similar to this, only without the split between the drives being primarily handled by separate controllers, but someone with more in-depth knowledge would need to confirm/deny this.
>
> -Jo

Hi,

Do you know any specific controllers that work with dual head?

Thanks,
Palle

From owner-freebsd-fs@freebsd.org Wed May 18 08:02:01 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5AE76B40896 for ; Wed, 18 May 2016 08:02:01 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: from mail-wm0-x233.google.com (mail-wm0-x233.google.com [IPv6:2a00:1450:400c:c09::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E057D18E7 for ; Wed, 18 May 2016 08:02:00 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: by mail-wm0-x233.google.com with SMTP id r12so21754004wme.0 for ; Wed, 18 May 2016 01:02:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=multiplay-co-uk.20150623.gappssmtp.com; s=20150623; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to; bh=26hb0lp5WcDOWlKZd+27ZAvGaHKfswna5DuPsFPGHko=; b=q0Jxo0h4mtYVNfQr2pPmuc7vw+/biv32ehLgHGJHz+5+FdN79JW5N2DeodqKdtaU72 OX8pHXRG8uNYz9BYohtyzKxSLir+pPS2rLZtbG/LsCjJUSROY34a2wCmO7u3NVw9yLcn DCMArqjco6bhVYKen3A0AWaiMThJV/BFbm1qTuMPMI92rZNOjArmyjfD3NyluL+p/bpk 8XnaVvtsvmtgoD82BCJmTwyMi2qvFZcB0ylwWBoIbhC9TvIok3R6v6ftdi6sz9Odckoa Vvpw9R1sD/C4sLK6kJlZ04pZPex8M1P5LPDGpIKD2RcWIBQriralpERP7fJc9X4Q5Np7 gxUw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to; bh=26hb0lp5WcDOWlKZd+27ZAvGaHKfswna5DuPsFPGHko=; b=CbuUgs0GUHflFXCZH+19zhciUvQc+vBXHDr7hvxvJM5q2DwxpXd62jwwT/QBn+OIgB URxwwhzgFA/YmVlVqM93AUBL+gF7rGjW+NHfAv8wgx6Shy3hhlIA6hqW6HSgur4RdLEA sAfkJ2V3PV7t6MaAnsgF7xTT3r6KIaoYic9USLcHt+UNqIcE43l/PZ2LtdzdGDvWGe8S FljCAY1Yq0IkVujzGXQZwlNzlBxvYB1gODp0ZeAtgd2g/+q/Q8iZbdL6vJviMNmn6l1B erDTArXGlf4mTLeB2MkqnZxLX5DiOFdfyct6mZXhES1SYb+23fbJV5IteJnIw+vbreWr e1wA== X-Gm-Message-State: AOPr4FUVrakDx+YHxEtJqDTbk98DawlbRHJXG9Fo9r3+azF7od5IB2O+htPkgo6IOQodqec9 X-Received: by 10.28.111.14 with SMTP id k14mr6000521wmc.32.1463558518998; Wed, 18 May 2016 01:01:58 -0700 (PDT) Received: from [10.10.1.58] (liv3d.labs.multiplay.co.uk. [82.69.141.171]) by smtp.gmail.com with ESMTPSA id w9sm28180175wme.19.2016.05.18.01.01.57 for (version=TLSv1/SSLv3 cipher=OTHER); Wed, 18 May 2016 01:01:57 -0700 (PDT) Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? 
To: freebsd-fs@freebsd.org References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> <86shxgsdzh.fsf@WorkBox.Home> From: Steven Hartland Message-ID: <39be913e-32a5-2120-fee5-4521b8b95d80@multiplay.co.uk> Date: Wed, 18 May 2016 09:02:03 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 08:02:01 -0000 My comment was targeted under the assumption of random IOPs workload, which is typically the case, where each RAIDZ group (vdev) will give approximately a single drive performance. For a pretty definitive guide / answer see: http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/ There's also some useful practical test results here: https://calomel.org/zfs_raid_speed_capacity.html On 18/05/2016 05:21, Alex Tutubalin wrote: > On 5/18/2016 12:11 AM, Steven Hartland wrote: >> Raidz is limited essential limited to a single drive performance per >> dev for read and write while mirror is single drive performance for >> write its number of drives for read. Don't forget mirror is not >> limited to two it can be three, four or more; so if you need more >> read throughput you can add drives to the mirror. > > Do I understand it correctly: > > - single write of one large file (or singe local write to zvol shared > via iSCSI) will be local: single or only several metaslabs > > - for RAIDZ each disk will get only part of throughput > > - for mirror, each disk included in write will receive full data size > (and for single local write only limited number of disks to be > included in write) > > If so, raidz will have huge write performance benefit in my case: > single write of one large file. > > As for read speed, I hope to deal with it with large enough L2ARC on > SSDs. > > >> >> To increase raidz performance you need to add more vdevs. While this >> doesn't have to be double i.e. the same vdev config as the first it >> generally a good idea. > > Again, multiple vdevs will help for multiple parallel writes, but not > for single one? 
> > Alex Tutubalin > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Wed May 18 08:02:13 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E88C2B408B3 for ; Wed, 18 May 2016 08:02:13 +0000 (UTC) (envelope-from jg@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 76A29199F for ; Wed, 18 May 2016 08:02:13 +0000 (UTC) (envelope-from jg@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id BD3F345FC0D8; Wed, 18 May 2016 10:02:04 +0200 (CEST) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id JYzL3Wdxyf6l; Wed, 18 May 2016 10:02:02 +0200 (CEST) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id 343BA4C4C5E9; Wed, 18 May 2016 10:02:02 +0200 (CEST) Subject: Re: Best practice for high availability ZFS pool References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> <5DA13472-F575-4D3D-80B7-1BE371237CE5@getsomewhere.net> <8E674522-17F0-46AC-B494-F0053D87D2B0@pingpong.net> To: Joe Love Cc: freebsd-fs@freebsd.org Reply-To: jg@internetx.com From: InterNetX - Juergen Gotteswinter Message-ID: <361f80cb-c7e2-18f6-ad62-f6f91aa7c293@internetx.com> Date: Wed, 18 May 2016 10:02:00 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: <8E674522-17F0-46AC-B494-F0053D87D2B0@pingpong.net> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 08:02:14 -0000 Am 5/18/2016 um 9:53 AM schrieb Palle Girgensohn: > > >> 17 maj 2016 kl. 18:13 skrev Joe Love : >> >> >>> On May 16, 2016, at 5:08 AM, Palle Girgensohn wrote: >>> >>> Hi, >>> >>> We need to set up a ZFS pool with redundance. The main goal is high availability - uptime. >>> >>> I can see a few of paths to follow. >>> >>> 1. HAST + ZFS >>> >>> 2. Some sort of shared storage, two machines sharing a JBOD box. >>> >>> 3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive) >>> >>> 4. using something else than ZFS, even a different OS if required. >>> >>> My main concern with HAST+ZFS is performance. Google offer some insights here, I find mainly unsolved problems. Please share any success stories or other experiences. >>> >>> Shared storage still has a single point of failure, the JBOD box. Apart from that, is there even any support for the kind of storage PCI cards that support dual head for a storage box? I cannot find any. >>> >>> We are running with ZFS replication today, but it is just too slow for the amount of data. 
>>> >>> We prefer to keep ZFS as we already have a rather big (~30 TB) pool and also tools, scripts, backup all is using ZFS, but if there is no solution using ZFS, we're open to alternatives. Nexenta springs to mind, but I believe it is using shared storage for redundance, so it does have single points of failure? >>> >>> Any other suggestions? Please share your experience. :) >>> >>> Palle >> >> I don’t know if this falls into the realm of what you want, but BSDMag just released an issue with an article entitled “Adding ZFS to the FreeBSD dual-controller storage concept.” >> https://bsdmag.org/download/reusing_openbsd/ >> >> My understanding in this setup is that the only single point of failure for this model is the backplanes that the drives would connect to. Depending on your controller cards, this could be alleviated by simply using multiple drive shelves, and only using one drive/shelf as part of a vdev (then stripe or whatnot over your vdevs). >> >> It might not be what you’re after, as it’s basically two systems with their own controllers, with a shared set of drives. Some expansion from the virtual world to real physical systems will probably need additional variations. >> I think the TrueNAS system (with HA) is setup similar to this, only without the split between the drives being primarily handled by separate controllers, but someone with more in-depth knowledge would need to confirm/deny this. >> >> -Jo > > Hi, > > Do you know any specific controllers that work with dual head? > > Thanks., > Palle go for lsi sas2008 based hba > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Wed May 18 08:28:11 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C8D19B3F207 for ; Wed, 18 May 2016 08:28:11 +0000 (UTC) (envelope-from lexa@lexa.ru) Received: from mx3.lexa.ru (ns503534.ip-198-27-68.net [198.27.68.102]) by mx1.freebsd.org (Postfix) with ESMTP id AA3561D80 for ; Wed, 18 May 2016 08:28:11 +0000 (UTC) (envelope-from lexa@lexa.ru) Received: by mx3.lexa.ru (Postfix, from userid 66) id 7AD4C224A5C; Wed, 18 May 2016 04:28:10 -0400 (EDT) Received: from [193.124.130.166] (unknown [193.124.130.166]) by home-gw.lexa.ru (Postfix) with ESMTP id 2C0D61CAF for ; Wed, 18 May 2016 11:26:06 +0300 (MSK) Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? 
To: freebsd-fs@freebsd.org References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> <86shxgsdzh.fsf@WorkBox.Home> <39be913e-32a5-2120-fee5-4521b8b95d80@multiplay.co.uk> From: Alex Tutubalin Message-ID: <411166e6-239f-0bf2-99df-e177f334270c@lexa.ru> Date: Wed, 18 May 2016 11:26:06 +0300 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: <39be913e-32a5-2120-fee5-4521b8b95d80@multiplay.co.uk> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 08:28:11 -0000

On 5/18/2016 11:02 AM, Steven Hartland wrote:
> My comment was targeted under the assumption of random IOPs workload,
> which is typically the case, where each RAIDZ group (vdev) will give
> approximately a single drive performance. For a pretty definitive
> guide / answer see:
> http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/

Thank you for the link.

In my workload (single write stream) the IOPs count is very low and disk write locality is good (each file is most likely to fit in a single metaslab), so bandwidth is not limited to single-drive bandwidth.

My current box (6x 7200rpm HDDs in raidz1) provides about 430 MB/s write bandwidth over an SMB link and about 500 MB/s for local writes. That is ~100 MB/s per spindle, close enough to what is expected. I hope I'll see 2x the bandwidth with 2x the spindle count if I do not hit another performance limiter. So, my initial question was 'is there any known raidz performance limiter, like CPU or RAM speed/latency'.

> There's also some useful practical test results here:
> https://calomel.org/zfs_raid_speed_capacity.html

I've already posted this link in my thread-starting message :)

And, yes, there is a very strange similarity in both read and write speed in the 6x and 10x SSD/raidz2 cases.

Unfortunately, this benchmark is not a real use case because of: "Since the disk cache can artificially inflate the results we choose to disable drive caches completely using Bonnie++ in synchronous test mode only."

Synchronous mode will result in double writes (ZIL, then data); without a separate ZIL device, the ZIL is written to the main pool. We do not know what will happen with real-life async writes on the same hardware.

Alex Tutubalin

From owner-freebsd-fs@freebsd.org Wed May 18 09:28:53 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D071EB3D72B for ; Wed, 18 May 2016 09:28:53 +0000 (UTC) (envelope-from crest@rlwinm.de) Received: from smtp.rlwinm.de (smtp.rlwinm.de [148.251.233.239]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8A2CD1BB4 for ; Wed, 18 May 2016 09:28:53 +0000 (UTC) (envelope-from crest@rlwinm.de) Received: from crest.local (unknown [87.253.189.132]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.rlwinm.de (Postfix) with ESMTPSA id AA7C86E14 for ; Wed, 18 May 2016 11:28:50 +0200 (CEST) Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? 
To: freebsd-fs@freebsd.org References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> From: Jan Bramkamp Message-ID: Date: Wed, 18 May 2016 11:28:49 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 09:28:53 -0000 On 17/05/16 14:00, Alex Tutubalin wrote: > Hi, > > I'm new to the list, sorry if the subject was discussed earlier (for > many times), just point to archives.... > > I'm building new storage server for 'linear read/linear write' > performance with limited number of parallel data streams (load like > read/write multi-gigabyte photoshop files, or read many large raw photo > files). > Target is to saturate 10G link using SMB or iSCSI. > > Several years ago I've tested small zpool (5x3Tb 7200rpm drives in > RAIDZ) with different CPU/memory combos and have got these results for > linear write speed by big chunks: > > 440 Mb/sec with Core i3-2120/DDR3-1600 ram (2 channel) > 360 Mb/sec with core i7-920/DDR3-1333 (3 channel RAM) > 280 Mb/sec with Core 2Q Q9300 /DDR2-800 (2 channel) > > Mixed thoughts: i7-920 is fastest of the three, RAM linear access also > fastest, but beaten by i3-2120 with lower latency memory. > > Also, I've found this link: > https://calomel.org/zfs_raid_speed_capacity.html > For 6x SSD and 10x SSD in RAIDZ2, there is very similar read speed > (1.7Gb/sec) and very close in write speed (721/806 Mb/sec for 6/10 drives). > > Assuming HBA/PCIe performance to be very same for read and write > operations, write speed is not limited by HBA/bus... so it is limited by > what? CPU or RAM or ...? > > So, my question is 'what CPU/memory is optimal for ZFS performance'? > > In particular: > - DDR3 or DDR4 (twice the bandwidth) ? > - limited number of cores and high clock rate (e.g. i3-6xxxx) or many > cores/slower clock ? > > No plans to use compression or deduplication, only raidz2 with 8-10 HDD > spindles and 3-4-5 SSDs for L2ARC. Don't forget that you're not just benchmarking CPUs. You're measuring whole systems with different disk controllers, memory controllers, interrupt routing etc. For example the Core 2 CPU is limited by its old design putting the memory controllers into the northbridge. Maybe you can reduce some of the differences by using the same PCI-e SAS HBA in each system. 
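[Editor's illustration: since this thread compares whole systems on single-stream sequential writes, a crude probe like the following removes the SMB/iSCSI variables and measures just pool write throughput. This is a sketch only: the target path, block size, and total size are arbitrary placeholders, it is untested, and compression should be disabled on the dataset for the number to be meaningful.]

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BLKSZ	(1024 * 1024)	/* 1 MiB per write() */
#define NBLKS	4096		/* 4 GiB total */

int
main(void)
{
	struct timespec t0, t1;
	double secs;
	char *buf;
	int fd, i;

	if ((buf = malloc(BLKSZ)) == NULL)
		err(1, "malloc");
	memset(buf, 0xa5, BLKSZ);	/* non-zero data defeats zero elision */

	/* Placeholder path on the pool under test. */
	fd = open("/pool/testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		err(1, "open");

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < NBLKS; i++)
		if (write(fd, buf, BLKSZ) != BLKSZ)
			err(1, "write");
	if (fsync(fd) == -1)	/* charge the final flush to the run */
		err(1, "fsync");
	clock_gettime(CLOCK_MONOTONIC, &t1);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%.1f MB/s\n", (double)NBLKS * BLKSZ / 1e6 / secs);
	close(fd);
	return (0);
}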
From owner-freebsd-fs@freebsd.org Wed May 18 10:38:22 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8925EB4026B for ; Wed, 18 May 2016 10:38:22 +0000 (UTC) (envelope-from girgen@pingpong.net) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 1D4231532 for ; Wed, 18 May 2016 10:38:21 +0000 (UTC) (envelope-from girgen@pingpong.net) Received: from [10.226.149.205] (80-254-69-13.dynamic.monzoon.net [80.254.69.13]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id F199316C12; Wed, 18 May 2016 12:38:20 +0200 (CEST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (1.0) Subject: Re: Best practice for high availability ZFS pool From: Palle Girgensohn X-Mailer: iPhone Mail (13E238) In-Reply-To: <5127A334-0805-46B8-9CD9-FD8585CB84F3@chittenden.org> Date: Wed, 18 May 2016 12:38:20 +0200 Cc: freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <5E69742D-D2E0-437F-B4A9-A71508C370F9@FreeBSD.org> <5DA13472-F575-4D3D-80B7-1BE371237CE5@getsomewhere.net> <8E674522-17F0-46AC-B494-F0053D87D2B0@pingpong.net> <5127A334-0805-46B8-9CD9-FD8585CB84F3@chittenden.org> To: Sean Chittenden X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 10:38:22 -0000

> 18 maj 2016 kl. 09:58 skrev Sean Chittenden :
>
> https://www.freebsdfoundation.org/wp-content/uploads/2015/12/vol2_no4_groupon.pdf
>
> mps(4) was good to us. What’s your workload? -sc

Have to check details for peaks but average is around 0.8 MByte/s. Not much. It will grow.

>
> --
> Sean Chittenden
> sean@chittenden.org
>
>
>> On May 18, 2016, at 03:53 , Palle Girgensohn wrote:
>>
>>
>>> 17 maj 2016 kl. 18:13 skrev Joe Love :
>>>
>>>> On May 16, 2016, at 5:08 AM, Palle Girgensohn wrote:
>>>>
>>>> Hi,
>>>>
>>>> We need to set up a ZFS pool with redundance. The main goal is high availability - uptime.
>>>>
>>>> I can see a few of paths to follow.
>>>>
>>>> 1. HAST + ZFS
>>>>
>>>> 2. Some sort of shared storage, two machines sharing a JBOD box.
>>>>
>>>> 3. ZFS replication (zfs snapshot + zfs send | ssh | zfs receive)
>>>>
>>>> 4. using something else than ZFS, even a different OS if required.
>>>>
>>>> My main concern with HAST+ZFS is performance. Google offer some insights here, I find mainly unsolved problems. Please share any success stories or other experiences.
>>>>
>>>> Shared storage still has a single point of failure, the JBOD box. Apart from that, is there even any support for the kind of storage PCI cards that support dual head for a storage box? I cannot find any.
>>>>
>>>> We are running with ZFS replication today, but it is just too slow for the amount of data.
>>>>
>>>> We prefer to keep ZFS as we already have a rather big (~30 TB) pool and also tools, scripts, backup all is using ZFS, but if there is no solution using ZFS, we're open to alternatives. Nexenta springs to mind, but I believe it is using shared storage for redundance, so it does have single points of failure?
>>>>
>>>> Any other suggestions? Please share your experience.
:) >>>>=20 >>>> Palle >>>=20 >>> I don=E2=80=99t know if this falls into the realm of what you want, but B= SDMag just released an issue with an article entitled =E2=80=9CAdding ZFS to= the FreeBSD dual-controller storage concept.=E2=80=9D >>> https://bsdmag.org/download/reusing_openbsd/ >>>=20 >>> My understanding in this setup is that the only single point of failure f= or this model is the backplanes that the drives would connect to. Depending= on your controller cards, this could be alleviated by simply using multiple= drive shelves, and only using one drive/shelf as part of a vdev (then strip= e or whatnot over your vdevs). >>>=20 >>> It might not be what you=E2=80=99re after, as it=E2=80=99s basically two= systems with their own controllers, with a shared set of drives. Some expa= nsion from the virtual world to real physical systems will probably need add= itional variations. >>> I think the TrueNAS system (with HA) is setup similar to this, only with= out the split between the drives being primarily handled by separate control= lers, but someone with more in-depth knowledge would need to confirm/deny th= is. >>>=20 >>> -Jo >>=20 >> Hi, >>=20 >> Do you know any specific controllers that work with dual head? >>=20 >> Thanks., >> Palle >>=20 >>=20 >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >=20 From owner-freebsd-fs@freebsd.org Wed May 18 11:08:41 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E12B9B40B8D for ; Wed, 18 May 2016 11:08:41 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id CA015189E for ; Wed, 18 May 2016 11:08:41 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id C9536B40B8C; Wed, 18 May 2016 11:08:41 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C6C93B40B8B for ; Wed, 18 May 2016 11:08:41 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6A90A189D for ; Wed, 18 May 2016 11:08:41 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u4IB8ZdA002871 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 18 May 2016 14:08:35 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u4IB8ZdA002871 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u4IB8YXl002870; Wed, 18 May 2016 14:08:34 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 18 May 2016 14:08:34 +0300 From: Konstantin Belousov To: Bruce Evans Cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs Message-ID: <20160518110834.GJ89104@kib.kiev.ua> References: 
<20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> In-Reply-To: <20160518084931.T6534@besplex.bde.org>

On Wed, May 18, 2016 at 10:00:09AM +1000, Bruce Evans wrote:
> On Wed, 18 May 2016, Konstantin Belousov wrote:
> > VCHR check ensures that the devvp vnode is not reclaimed. I do not want
> > to remove the check and rely on the caller of ffs_mountfs() to always do
> > the right thing for it without unlocking devvp, this is too subtle.
>
> Surely the caller must lock devvp? Otherwise none of the uses of devvp
> can be trusted, and there are several others.
It must lock, but the interface of ffs_mountfs() would then require that there is no relock between the vn_isdisk() check and the call. I think I know how to make a good compromise there. I converted the check for VCHR into an assert.
> There is also ump->um_devvvp, but this seems to be unusable since it
> might go away.
Go away as in being reclaimed, yes. The vnode itself is there, since we keep a reference.
> > So using the devvp->v_rdev instead of the dev variable is not just a
> > style bug.
Might be.
> > devvp->v_bufobj.bo_ops = &ffs_ops;
> > if (devvp->v_type == VCHR)
>
> devvp must be still VCHR since this is now under the vnode lock, and we
> depend on dev remaining a character device for the disk described by
> devvp at the time of the vn_isdisk() check.
Unless relocked.
> > - devvp->v_rdev->si_mountpt = mp;
> > + dev->si_mountpt = mp;
> > + VOP_UNLOCK(devvp, 0);
>
> The unlocking could be a little earlier since dev is still for a disk even
> if devvp went away and you changed this to not used devvp->v_rdev.
>
> > fs = NULL;
> > sblockloc = 0;
>
> Unlocking and then using devvp sure looks like a race.
>
> You only needed to move the unlocking to fix. devvp->v_bufobj. How does
> that work? The write is now locked, but if devvp goes away, then don't
> we lose its bufobj?
The buffer queues are flushed, and the BO_DEAD flag is set. But the flag does very little.
> How does any use of ump->um_devvp work?
>
> I tried revoke(2) on the devvp of a mounted file system. This worked
> to give v_type = VBAD and v_rdev = NULL, but didn't crash. ffs_unmount()
> checked for the bad vnode, unlike most places, and failed to clear
> si_mountpt.
>
> Normal use doesn't have revokes, but if the vnode is reclaimed instead
> of just becoming bad, then worse things probably happen. I think vnode
> cache resizing gives very unstable storage so the pointer becomes very
> invalid. But even revoke followed by setting kern.numvnodes to 1 didn't
> crash (15 vnodes remained). So devvp must be referenced throughout.
> It seems to have reference count 2, since umounting reduced kern.numvnodes
> from 15 to 13. (It is surprising how much works with kern.maxvnodes=1.
> I was able to run revoke, sysctl and umount.)
> It is still a mystery that the VBAD vnode doesn't crash soon.

I believe that the bo_ops assignment is the reason why UFS mounts survive the reclamation of the devvp vnode. Take a look at ffs_geom_strategy(), which is the place where UFS I/O is tunneled directly into geom. It does not pass I/O requests through devfs. As a result, revocation does not change much except doing an unnecessary buf queue flush.

It might be telling to try the same experiment, as conducted in your next message, on msdosfs instead of UFS.

Below is the simplified patch.

diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index 712fc21..412b000 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -764,6 +764,7 @@ ffs_mountfs(devvp, mp, td)
 	cred = td ? td->td_ucred : NOCRED;
 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
 
+	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
 	dev = devvp->v_rdev;
 	dev_ref(dev);
 	DROP_GIANT();
@@ -771,17 +772,17 @@
 	error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
 	g_topology_unlock();
 	PICKUP_GIANT();
-	VOP_UNLOCK(devvp, 0);
-	if (error)
+	if (error != 0) {
+		VOP_UNLOCK(devvp, 0);
 		goto out;
-	if (devvp->v_rdev->si_iosize_max != 0)
-		mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max;
+	}
+	if (dev->si_iosize_max != 0)
+		mp->mnt_iosize_max = dev->si_iosize_max;
 	if (mp->mnt_iosize_max > MAXPHYS)
 		mp->mnt_iosize_max = MAXPHYS;
-
 	devvp->v_bufobj.bo_ops = &ffs_ops;
-	if (devvp->v_type == VCHR)
-		devvp->v_rdev->si_mountpt = mp;
+	dev->si_mountpt = mp;
+	VOP_UNLOCK(devvp, 0);
 
 	fs = NULL;
 	sblockloc = 0;
@@ -1083,8 +1084,7 @@
 out:
 	if (bp)
 		brelse(bp);
-	if (devvp->v_type == VCHR && devvp->v_rdev != NULL)
-		devvp->v_rdev->si_mountpt = NULL;
+	dev->si_mountpt = NULL;
 	if (cp != NULL) {
 		DROP_GIANT();
 		g_topology_lock();
@@ -1287,8 +1287,7 @@ ffs_unmount(mp, mntflags)
 		g_vfs_close(ump->um_cp);
 		g_topology_unlock();
 		PICKUP_GIANT();
-	if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL)
-		ump->um_devvp->v_rdev->si_mountpt = NULL;
+	ump->um_dev->si_mountpt = NULL;
 	vrele(ump->um_devvp);
 	dev_rel(ump->um_dev);
 	mtx_destroy(UFS_MTX(ump));

From owner-freebsd-fs@freebsd.org Wed May 18 13:45:30 2016 Subject: Fwd: ZFS Encryption Implementation for Review To: freebsd-fs From: Andriy Gapon Message-ID: <1ea0d65f-fc7d-f472-ce0a-f3c74bf08d77@FreeBSD.org> Date: Wed, 18 May 2016 16:44:31 +0300

Just in case people overlooked this information in another thread here.

-------- Forwarded Message --------
Subject: [developer] ZFS Encryption Implementation for Review
Date: Tue, 17 May 2016 17:17:53 -0400
From: Thomas Caputi
To: developer@open-zfs.org

I have created an implementation for native encryption in ZFS. This implementation is currently available as a PR against ZoL (https://github.com/zfsonlinux/zfs/pull/4329). I would appreciate it if this PR could receive a review for consideration. For convenience, I have pasted the PR's description below.

Thanks,
Tom Caputi

Native encryption in zfsonlinux (see issue #494)

The change incorporates 3 major pieces:

The first is a port of the Illumos Crypto Framework to a Linux kernel module (found in module/icp). This is needed to do the actual encryption work. We cannot use the Linux kernel's built-in crypto API because it is only exported to GPL-licensed modules. Having the ICP also means the crypto code can run on any of the other kernels under OpenZFS. I ended up porting over most of the internals of the framework, which means that porting over other API calls (if we need them) should be fairly easy. Specifically, I have ported over the API functions related to encryption, digests, MACs, and crypto templates. The ICP is able to use assembly-accelerated encryption on amd64 machines and AES-NI instructions on Intel chips that support it. There are place-holder directories for similar assembly optimizations for other architectures (although they have not been written).

The second feature is a keystore that manages wrapping and encryption keys for encrypted datasets. It has feature parity with Solaris, but should be more predictable and consistent. It is fully integrated with the existing zfs create and zfs clone functions. It also exposes a new set of commands via zfs key for managing the keystore. For more info on the inconsistencies in Solaris see my comment (https://github.com/zfsonlinux/zfs/issues/494#issuecomment-178853634) on the issue page.

The keystore operates on a few rules: All wrapping keys are 32 bytes (256 bits), even for 128 and 192 bit encryption types. Encryption must be specified at dataset creation time. Specifying a keysource while creating a dataset causes the dataset to become the root of an encryption tree. All members of an encryption tree share the same wrapping key. Each dataset can have up to 1 keychain (if it is encrypted) that is not shared with anybody.

The last feature is the actual data and metadata encryption. All data in an encrypted dataset is stored encrypted on-disk. User-provided metadata is also encrypted, but metadata structures have been left plain so that scrubbing and resilvering still work without the keys loaded. Most of the design comes from this article (https://blogs.oracle.com/darren/entry/zfs_encryption_what_is_on). There are a few important distinctions, however. For instance, I store the encryption IV in the padding of blkptr_t instead of in its third DVA. I also have L2ARC encryption implemented, which Oracle did not have at the time.

Implementation details that should be looked at:

I created a new DMU_OT_* for keychain objects instead of using the DMU_OTN() macro. I did this mostly for the ability to register a name, which helped with debugging. The Keychain objects also seem like a core enough structure to warrant a new dedicated object type.

The crypto framework has some code bloat to it, particularly in the form of function stubs in header files that are never actually implemented. I figured it would be best to leave these in, in case more functions needed to be ported over.

The in-memory keystore is not the most efficient structure, since it zeros and frees encryption keys whenever they are not in use. This is intended as a security measure so that unwrapped keys do not exist in memory longer than they are needed.

Encrypting data going to disk requires creating a keychain_record_t during dsl_dataset_tryown(). I added a flag to this function for code that wishes to own the dataset but does not require encrypted data, such as the scrub functions. I did my best to confirm that all owners set this flag correctly, but someone should confirm them, just to be sure.

zfs send and zfs recv do not currently do anything special with regards to encryption. The format of the send file has not changed, and zfs send requires the keys to be loaded in order to work. At some point there should probably be a way to do encrypted sends.

I altered the prototype of lzc_create() and lzc_clone(). I understand that the purpose of libzfs_core is to have a stable API interacting with the ZFS ioctls. However, these functions need to accept wrapping keys separately from the rest of their parameters because they need to use the (new) hidden_args framework to support hiding arguments from the logs. Without this, the wrapping keys would get printed to the zpool history.

There is an extra local label that I needed to add to the top of the "global" function rijndael_key_setup_enc_intel() in module/icp/aes_intel.S. For some reason, I had to use the local label or the module would fail to link. If any assembly experts can tell me why this is required or a better way to fix it, I would appreciate it.

As part of the L2ARC changes, I added an 8-byte MAC field to l2arc_buf_hdr_t. I understand that there are a lot of reasons to keep this struct small since many of them may be allocated at once, but I do not seem to have another reasonable option here.

The icp is a kernel module that has a directory structure (unlike the other modules in zfs). There are a few reasons for this. First, the ICP has assembly code for different CPU architectures and I wanted to match the structure of libspl. The ICP also has headers that did not really need to belong in the global zfs headers, and so it made sense to make an includes directory for them locally. The directory structure also approximately mimics the structure of the Illumos Crypto Framework, which will be important for maintainability. As a result, I had to adjust the build systems to avoid flattening module directories. This shouldn't matter much, since the other modules were already flat.
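To make the keystore rules above concrete, here is a minimal C sketch of the relationships they imply. The names and layout below are hypothetical illustrations for this digest, not structures from the PR:

/*
 * Hypothetical illustration of the stated keystore rules; not code
 * from the PR.  One 256-bit wrapping key is shared by all members of
 * an encryption tree; each encrypted dataset has at most one keychain
 * whose entries hold the actual data-encryption keys, stored wrapped
 * (encrypted under the wrapping key) on disk.
 */
#include <stdint.h>
#include <stddef.h>

#define WRAPPING_KEY_LEN 32	/* always 32 bytes (256 bits), even for
				   AES-128/192 suites, per the rule above */

struct wrapping_key {		/* one per encryption tree root */
	uint8_t wk_data[WRAPPING_KEY_LEN];
};

struct keychain_entry {		/* per-dataset; at most one keychain */
	uint64_t ke_suite;	/* AES-128, AES-192 or AES-256 */
	size_t   ke_keylen;	/* 16, 24 or 32 bytes */
	uint8_t  ke_wrapped[WRAPPING_KEY_LEN];	/* data key, encrypted under
						   wk_data; toy fixed-size
						   buffer for illustration */
};

Even in this toy form the usual rationale for key wrapping is visible: changing the user's key material only requires re-wrapping the small keychain entries, not re-encrypting the bulk data they protect.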
-------------------------------------------
openzfs-developer archives: https://www.listbox.com/member/archive/274414/=now

From owner-freebsd-fs@freebsd.org Wed May 18 13:49:42 2016 Date: Wed, 18 May 2016 08:49:34 -0500 (CDT) From: Bob Friesenhahn To: Alex Tutubalin cc: "freebsd-fs@freebsd.org" Subject: Re: ZFS performance bottlenecks: CPU or RAM or anything else? References: <8441f4c0-f8d1-f540-b928-7ae60998ba8e@lexa.ru> <16e474da-6b20-2e51-9981-3c262eaff350@lexa.ru> <1e012e43-a49b-6923-3f0a-ee77a5c8fa70@lexa.ru> <86shxgsdzh.fsf@WorkBox.Home>

On Wed, 18 May 2016, Alex Tutubalin wrote:
>
> If so, raidz will have huge write performance benefit in my case: single
> write of one large file.

This is not proven in practice. With mirrors one typically has more vdevs, and each vdev gets a ZFS block-sized write in turn, using a round-robin algorithm (tuned for the available space on each vdev). Drive IOPS are saved since the blocks are not diced into smaller fragments (as raidzN requires). With raidz it is also necessary to pay the cost of the parity computations, which are not needed with mirroring.
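As a worked example of the fragmentation point above (illustrative numbers, not measurements from this thread), compare two six-disk pools writing one 128 KB record:

$128\,\mathrm{KB} \to 4 \times 32\,\mathrm{KB}\ \mathrm{data} + 2\ \mathrm{parity} = 6$ drive I/Os per record (6-disk raidz2)
$128\,\mathrm{KB} \to 2$ full-record writes to a single pair (pool of 3 mirror pairs)

Per record, the mirror pool issues a third of the drive I/Os and no parity math, with successive records rotating across the pairs; the trade-off is usable capacity (50% for the mirrors vs about 67% for the raidz2).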
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@freebsd.org Wed May 18 23:03:27 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AA0A0B41494 for ; Wed, 18 May 2016 23:03:27 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 95EC516D8 for ; Wed, 18 May 2016 23:03:27 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 953BCB41493; Wed, 18 May 2016 23:03:27 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 94DC5B41492 for ; Wed, 18 May 2016 23:03:27 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 43CEE16D5 for ; Wed, 18 May 2016 23:03:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id A41E6D6568E; Thu, 19 May 2016 09:03:21 +1000 (AEST) Date: Thu, 19 May 2016 09:03:19 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs In-Reply-To: <20160518110834.GJ89104@kib.kiev.ua> Message-ID: <20160519065714.H1393@besplex.bde.org> References: <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=TuMb/2jh c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=FF35ox4yeJKPj-cm5okA:9 a=ZiNstujT2j9NvtWX:21 a=EXmI02U5j4sSZ_lM:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2016 23:03:27 -0000 On Wed, 18 May 2016, Konstantin Belousov wrote: > On Wed, May 18, 2016 at 10:00:09AM +1000, Bruce Evans wrote: >> On Wed, 18 May 2016, Konstantin Belousov wrote: >>> VCHR check ensures that the devvp vnode is not reclaimed. I do not want >>> to remove the check and rely on the caller of ffs_mountfs() to always do >>> the right thing for it without unlocking devvp, this is too subtle. >> >> Surely the caller must lock devvp? Otherwise none of the uses of devvp >> can be trusted, and there are several others. > It must lock, but the interface of ffs_mountfs() would then require > that there is no relock between vn_isdisk() check and call. > > I think I know how to make a good compromise there. I converted the > check for VCHR into the assert. But it is very clear there is no re-lock, and that there must be no re-lock to work ("very clear" relative other complications). 
ffs_mountfs() is only called once and only exists to make the function more readable and debuggable (and auto-inlining it breaks debugging). Its nearby logic is: namei(); // lock vnode vn_isdisk(); // return if not if (MNT_UPDATE) fail_sometimes(); // locking problems -- see below else ffs_mountfs(); // clearly guaranteed still VCHR I found another locking problem for revoke. After mounting /i and revoking its device, mount -u fails. This is clearly because its rdev has gone away. This makes devvp->v_rdev != ump->um_devvp->v_rdev. The new devvp has the old rdev and the old devvp has a null rdev. This is not really a locking problem, but the correct behaviour. Most places just don't check. >> There is also ump->um_devvvp, but this seems to be unusable since it >> might go away. > Go away as in being reclaimed, yes. The vnode itself is there, since > we keep a reference. I think "reclaimed" is the wrong terminology. The reference prevents it being reclaimed by vnlrureclaim(), but doesn't prevent it being revoked (or vgone()d by a forced unmount of the devfs instance that it is on). The reference prevents it being reclaimed even if it is revoked. When it is revoked, some but apparently not all of the pointers in it are cleared or become garbage. None of them should be used, but some are. v_rdev is cleared and we are fairly careful not to follow it, but we depend on it being cleared and not garbage. Pointers that are not cleared include v_bufobj (apparently) and GEOM's hooks related to v_bufobj, and si_mountpt. si_mountpt is in the cdev and not in the vnode. >> So using the devvp->v_rdev instead of the dev variable is not just a >> style bug. > Might be. In some places. ump->um_devvp->v_rdev gives the old rdev, and devvp->v_rdev gives the current rdev provided devvp is locked. These can be compared to see if the old rdev was revoked. Otherwise, devvp->v_rdev is garbage and both ump->um_dev and ump->um_devvp are close to garbage -- they are both old and the only correct use of this is to check that they are still current, but then you have the current devvp (locked) and can use it instead. >> ... >> You only needed to move the unlocking to fix. devvp->v_bufobj. How does >> that work? The write is now locked, but if devvp goes away, then don't >> we lose its bufobj? > The buffer queues are flushed, and BO_DEAD flag is set. But the flag > does very little. > >> How does any use of ump->um_devvp work? The problems are similar to the ones with ttys that we are still working on. When the device is revoked, there may be many i/o's in progess on it. We don't want to block waiting for these, but they should be aborted before doing any more. But there are enough stale pointers to even allow new i/o's. Enough for tar cf of a complete small file system. >> I tried revoke(2) on the devvp of a mounted file system. This worked >> to give v_type = VBAD and v_rdev = NULL, but didn't crash. ffs_unmount() >> checked for the bad vnode, unlike most places, and failed to clear >> si_mountpt. >> >> Normal use doesn't have revokes, but if the vnode is reclaimed instead >> of just becoming bad, then worse things probably happen. I think vnode >> cache resizing gives very unstable storage so the pointer becomes very >> invalid. But even revoke followed by setting kern.numvnodes to 1 didn't >> crash (15 vnodes remained). So devvp must be referenced throughout. >> It seems to have reference count 2, since umounting reduced kern.numvnodes >> from 15 to 13. (It is surprising how much works with kern.maxvnodes=1. 
>> I was able to run revoke, sysctl and umount.) It is still a mystery
>> that the VBAD vnode doesn't crash soon.
>
> I believe that bo_ops assignment is the reason why UFS mounts survive the
> reclamation of the devvp vnode. Take a look at the ffs_geom_strategy(),
> which is the place where UFS io is tunneled directly into geom. It does
> not pass io requests through devfs. As result, revocation does not
> change much except doing unneccessary buf queue flush.
>
> It might be telling to try the same experiment, as conducted in your
> next message, on msdosfs instead of UFS.

Everything seems to work exactly the same for msdosfs. I retried:
- mount; tar; revoke; mount; tar # (2nd mount succeeds due to revoke)
- mount; tar; mount-another-devfs; mount-using-other-devfs; tar # (2nd mount succeeds due to separate devvp).
No crashes. I didn't risk any rw mounts.

> Below is the simplified patch.
>
> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
> index 712fc21..412b000 100644
> --- a/sys/ufs/ffs/ffs_vfsops.c
> +++ b/sys/ufs/ffs/ffs_vfsops.c
> @@ -764,6 +764,7 @@ ffs_mountfs(devvp, mp, td)
> 	cred = td ? td->td_ucred : NOCRED;
> 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
>
> +	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
> 	dev = devvp->v_rdev;
> 	dev_ref(dev);
> 	DROP_GIANT();

Not needed.

> @@ -771,17 +772,17 @@
> 	error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
> 	g_topology_unlock();
> 	PICKUP_GIANT();
> -	VOP_UNLOCK(devvp, 0);
> -	if (error)
> +	if (error != 0) {
> +		VOP_UNLOCK(devvp, 0);
> 		goto out;

Needed.

> -	if (devvp->v_rdev->si_iosize_max != 0)
> -		mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max;
> +	}
> +	if (dev->si_iosize_max != 0)
> +		mp->mnt_iosize_max = dev->si_iosize_max;
> 	if (mp->mnt_iosize_max > MAXPHYS)
> 		mp->mnt_iosize_max = MAXPHYS;
> -
> 	devvp->v_bufobj.bo_ops = &ffs_ops;
> -	if (devvp->v_type == VCHR)
> -		devvp->v_rdev->si_mountpt = mp;
> +	dev->si_mountpt = mp;
> +	VOP_UNLOCK(devvp, 0);
>
> 	fs = NULL;
> 	sblockloc = 0;

I would keep the unlock as early as possible. Just move the initialization of v_bufobj before it.

BTW, I don't like the fixup for > MAXPHYS. This is removed from all file systems in my version. dev->si_iosize_max should be clamped to MAXPHYS unless larger sizes work, and if larger sizes work then individual file systems don't know enough to kill using them. The check for si_iosize_max != 0 is bogus too, but not removed in my version. mp->mnt_iosize_max defaults to DFLTPHYS and the check avoids changing that, but if si_iosize_max remains at 0 then i/o won't actually work, and if some bug results in si_iosize_max being initialized later but early enough for some i/o to work, then the default of DFLTPHYS still won't work if it is larger than the driver size. g_dev_taste() actually defaults si_iosize_max to MAXPHYS, and I think GEOM hides the driver iosize_max from file systems, so I think si_iosize_max is actually always MAXPHYS here.

> @@ -1083,8 +1084,7 @@
> out:
> 	if (bp)
> 		brelse(bp);
> -	if (devvp->v_type == VCHR && devvp->v_rdev != NULL)
> -		devvp->v_rdev->si_mountpt = NULL;
> +	dev->si_mountpt = NULL;
> 	if (cp != NULL) {
> 		DROP_GIANT();
> 		g_topology_lock();

I think this is still racy, but the race is more harmless than most of the problems from revokes. I think the following can happen:
- after we unlock, another mount starts and clobbers our si_mountpt with a nonzero value. Then this clobbers the other mount's si_mountpt with a zero value.

The zero value is relatively harmless. It takes either a revoke, a separate devfs instance, or the old multiple-mount-allowing code for another mount to start.

The old code has a smaller race window:
- since the vnode is unlocked, it gives a null pointer panic if the v_rdev becomes null after it is tested to be non-null, or if it is still non-null then using it may clobber another mount's si_mountpt if the other mount's setting of si_mountpt races with us.
It takes a revoke to get the null pointer. Clobbering only takes a separate devfs instance or the old multiple-mount code.

> @@ -1287,8 +1287,7 @@ ffs_unmount(mp, mntflags)
> 		g_vfs_close(ump->um_cp);
> 		g_topology_unlock();
> 		PICKUP_GIANT();
> -	if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL)
> -		ump->um_devvp->v_rdev->si_mountpt = NULL;
> +	ump->um_dev->si_mountpt = NULL;
> 	vrele(ump->um_devvp);
> 	dev_rel(ump->um_dev);
> 	mtx_destroy(UFS_MTX(ump));

This has the same problems as cleaning up after an error in mount.

I think the following works to prevent multiple mounts via all of the known buggy paths: early in every fsmount():

	dev = devvp->v_rdev;
	if (dev->si_mountpt != NULL) {
		cleanup();
		return (EBUSY);
	}
	dev->si_mountpt = mp;

This also prevents other mounts racing with us before we complete. Too bad if we fail but the other mount would have succeeded. In fsunmount(), move clearing si_mountpt to near the end. I hope si_mountpt is locked by the device reference and that this makes si_mountpt robust enough to use as an exclusive access flag.

GEOM's exclusive access counters somehow don't prevent the multiple mounts. I think they are too closely associated with the vnode via v_bufobj.

Bruce

From owner-freebsd-fs@freebsd.org Thu May 19 00:24:04 2016 Date: Thu, 19 May 2016 10:23:54 +1000 (EST) From: Bruce Evans To: Bruce Evans cc: Konstantin Belousov , fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs In-Reply-To: <20160519065714.H1393@besplex.bde.org> Message-ID: <20160519094901.O1798@besplex.bde.org> References: <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org>
<20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> <20160519065714.H1393@besplex.bde.org>

On Thu, 19 May 2016, Bruce Evans wrote:
> On Thu, 19 May 2016, Bruce Evans wrote:
>
>> ...
>> I think the following works to prevent multiple mounts via all of the
>> known buggy paths: early in every fsmount():
>>
>> 	dev = devvp->v_rdev;
>> 	if (dev->si_mountpt != NULL) {
>> 		cleanup();
>> 		return (EBUSY);
>> 	}
>> 	dev->si_mountpt = mp;
>>
>> This also prevents other mounts racing with us before we complete. Too
>> bad if we fail but the other mount would have succeeded. In fsunmount(),
>> move clearing si_mountpt to near the end. I hope si_mountpt is locked
>> by the device reference and that this makes si_mountpt robust enough to
>> use as an exclusive access flag.

Nah, the reference is not a lock. This needs dev_lock() or similar to be robust. struct cdev has no documented locking, but dev_lock() should work and is probably needed for writes. It is never used for accesses to si_mountpt now. Reads are safe enough since they are of the form 'mp = dev->si_mountpt; if (mp == NULL) dont_use_mp();'.

Bruce

From owner-freebsd-fs@freebsd.org Thu May 19 02:26:17 2016 Date: Thu, 19 May 2016 12:20:19 +1000 (EST) From: Bruce Evans To: Bruce Evans cc: Konstantin Belousov , fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs In-Reply-To: <20160519094901.O1798@besplex.bde.org> Message-ID: <20160519120557.A2250@besplex.bde.org> References: <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org>
<20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> <20160519065714.H1393@besplex.bde.org> <20160519094901.O1798@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=TuMb/2jh c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=gX_jSoV2WuXo949cYewA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 May 2016 02:26:17 -0000 On Thu, 19 May 2016, Bruce Evans wrote: > On Thu, 19 May 2016, Bruce Evans wrote: > >> ... >> I think the following works to prevent multiple mounts via all of the >> known buggy paths: early in every fsmount(): Here is a lightly tested version: X Index: ffs_vfsops.c X =================================================================== X --- ffs_vfsops.c (revision 300160) X +++ ffs_vfsops.c (working copy) X @@ -771,18 +771,24 @@ X error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); X g_topology_unlock(); X PICKUP_GIANT(); X + /* XXX: v_bufobj is left set after errors. */ X + devvp->v_bufobj.bo_ops = &ffs_ops; X VOP_UNLOCK(devvp, 0); Since v_bufobj isn't cleaned after later errors, I didn't move the unlock to keep it clean for this error alone. X if (error) X - goto out; X - if (devvp->v_rdev->si_iosize_max != 0) X - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; X + goto out1; X + dev_lock(); X + if (dev->si_mountpt != NULL) { X + dev_unlock(); X + error = EBUSY; X + goto out1; X + } X + dev->si_mountpt = mp; X + dev_unlock(); X + if (dev->si_iosize_max != 0) X + mp->mnt_iosize_max = dev->si_iosize_max; X if (mp->mnt_iosize_max > MAXPHYS) X mp->mnt_iosize_max = MAXPHYS; X X - devvp->v_bufobj.bo_ops = &ffs_ops; X - if (devvp->v_type == VCHR) X - devvp->v_rdev->si_mountpt = mp; X - X fs = NULL; X sblockloc = 0; X /* X @@ -1081,10 +1087,14 @@ X #endif /* !UFS_EXTATTR */ X return (0); X out: X + dev_lock(); X + if (dev->si_mountpt == NULL) X + panic("lost si_mountpt in mount"); X + dev->si_mountpt = NULL; X + dev_unlock(); I don't want the debugging panics or KASSERTs in the final version. Explicit locking the stores of NULL is probably not needed. dev_rel() will soon make these stores visible and other locking and ordering makes it very unlikely that they become visible too early. 
X +out1: X if (bp) X brelse(bp); X - if (devvp->v_type == VCHR && devvp->v_rdev != NULL) X - devvp->v_rdev->si_mountpt = NULL; X if (cp != NULL) { X DROP_GIANT(); X g_topology_lock(); X @@ -1287,8 +1297,11 @@ X g_vfs_close(ump->um_cp); X g_topology_unlock(); X PICKUP_GIANT(); X - if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL) X - ump->um_devvp->v_rdev->si_mountpt = NULL; X + dev_lock(); X + if (ump->um_dev->si_mountpt == NULL) X + panic("lost si_mountpt in unmount"); X + ump->um_dev->si_mountpt = NULL; X + dev_unlock(); X vrele(ump->um_devvp); X dev_rel(ump->um_dev); X mtx_destroy(UFS_MTX(ump)); Bruce From owner-freebsd-fs@freebsd.org Thu May 19 10:41:36 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 103FCB42EE2 for ; Thu, 19 May 2016 10:41:36 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id ECACE1AB6 for ; Thu, 19 May 2016 10:41:35 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id EBFB7B42EE1; Thu, 19 May 2016 10:41:35 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EBA37B42EE0 for ; Thu, 19 May 2016 10:41:35 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9730B1AB3 for ; Thu, 19 May 2016 10:41:35 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u4JAfTdn056064 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 19 May 2016 13:41:29 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u4JAfTdn056064 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u4JAfS7R056063; Thu, 19 May 2016 13:41:28 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 19 May 2016 13:41:28 +0300 From: Konstantin Belousov To: Bruce Evans Cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs Message-ID: <20160519104128.GN89104@kib.kiev.ua> References: <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> <20160519065714.H1393@besplex.bde.org> <20160519094901.O1798@besplex.bde.org> <20160519120557.A2250@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160519120557.A2250@besplex.bde.org> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 May 2016 10:41:36 -0000

On Thu, May 19, 2016 at 12:20:19PM +1000, Bruce Evans wrote:
> On Thu, 19 May 2016, Bruce Evans wrote:
>
> > On Thu, 19 May 2016, Bruce Evans wrote:
> >
> >> ...
> >> I think the following works to prevent multiple mounts via all of the
> >> known buggy paths: early in every fsmount():
>
> Here is a lightly tested version:

There is no need to protect the si_mountpt with any locking; the field itself serves as a good enough lock, also preventing parallel mounts of the same device. I changed the assignment to atomic_cmpset, which is enough there. It is somewhat of a pity that this would reliably disable multiple ro mounts of the same volume.

There is no need to move the assignment of NULL to dev->si_mountpt later in ffs_unmount(); the moment where the assignment is performed is safe for another thread to start another mount.

I still want to keep devvp locked for long enough to cover the bufobj hacking, and I do not want to move the bufobj.bo_ops change before g_vfs_open() succeeds.

I also wanted to remove the GIANT dances, but this requires a geom patch, which I will mail separately.

diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index 712fc21..21425f5 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td)
 	cred = td ? td->td_ucred : NOCRED;
 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
 
+	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
 	dev = devvp->v_rdev;
 	dev_ref(dev);
+	if (!atomic_cmpset_ptr(&dev->si_mountpt, 0, mp)) {
+		dev_rel(dev);
+		VOP_UNLOCK(devvp, 0);
+		return (EBUSY);
+	}
 	DROP_GIANT();
 	g_topology_lock();
 	error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
 	g_topology_unlock();
 	PICKUP_GIANT();
-	VOP_UNLOCK(devvp, 0);
-	if (error)
+	if (error != 0) {
+		VOP_UNLOCK(devvp, 0);
 		goto out;
-	if (devvp->v_rdev->si_iosize_max != 0)
-		mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max;
+	}
+	if (dev->si_iosize_max != 0)
+		mp->mnt_iosize_max = dev->si_iosize_max;
 	if (mp->mnt_iosize_max > MAXPHYS)
 		mp->mnt_iosize_max = MAXPHYS;
-
 	devvp->v_bufobj.bo_ops = &ffs_ops;
-	if (devvp->v_type == VCHR)
-		devvp->v_rdev->si_mountpt = mp;
+	VOP_UNLOCK(devvp, 0);
 
 	fs = NULL;
 	sblockloc = 0;
@@ -1083,8 +1088,6 @@ ffs_mountfs(devvp, mp, td)
 out:
 	if (bp)
 		brelse(bp);
-	if (devvp->v_type == VCHR && devvp->v_rdev != NULL)
-		devvp->v_rdev->si_mountpt = NULL;
 	if (cp != NULL) {
 		DROP_GIANT();
 		g_topology_lock();
@@ -1102,6 +1105,7 @@ out:
 		free(ump, M_UFSMNT);
 		mp->mnt_data = NULL;
 	}
+	dev->si_mountpt = NULL;
 	dev_rel(dev);
 	return (error);
 }
@@ -1287,8 +1291,7 @@ ffs_unmount(mp, mntflags)
 		g_vfs_close(ump->um_cp);
 		g_topology_unlock();
 		PICKUP_GIANT();
-	if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL)
-		ump->um_devvp->v_rdev->si_mountpt = NULL;
+	ump->um_dev->si_mountpt = NULL;
 	vrele(ump->um_devvp);
 	dev_rel(ump->um_dev);
 	mtx_destroy(UFS_MTX(ump));

From owner-freebsd-fs@freebsd.org Thu May 19 23:27:49 2016 Received: by
mailman.ysv.freebsd.org (Postfix) id BAEA7B42381; Thu, 19 May 2016 23:27:49 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BA8A3B4237E for ; Thu, 19 May 2016 23:27:49 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 6BED21D42 for ; Thu, 19 May 2016 23:27:48 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c110-21-42-169.carlnfd1.nsw.optusnet.com.au [110.21.42.169]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 967091049C54; Fri, 20 May 2016 09:27:39 +1000 (AEST) Date: Fri, 20 May 2016 09:27:38 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs In-Reply-To: <20160519104128.GN89104@kib.kiev.ua> Message-ID: <20160520074427.W1151@besplex.bde.org> References: <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> <20160519065714.H1393@besplex.bde.org> <20160519094901.O1798@besplex.bde.org> <20160519120557.A2250@besplex.bde.org> <20160519104128.GN89104@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=TuMb/2jh c=1 sm=1 tr=0 a=kDyANCGC9fy361NNEb9EQQ==:117 a=kDyANCGC9fy361NNEb9EQQ==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=138hlgp83k9Wl2frAKkA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 May 2016 23:27:49 -0000 On Thu, 19 May 2016, Konstantin Belousov wrote: > On Thu, May 19, 2016 at 12:20:19PM +1000, Bruce Evans wrote: >> On Thu, 19 May 2016, Bruce Evans wrote: >> >>> On Thu, 19 May 2016, Bruce Evans wrote: >>> >>>> ... >>>> I think the following works to prevent multiple mounts via all of the >>>> known buggy paths: early in every fsmount(): >> >> Here is a lightly tested version: > > There is no need to protect the si_mountpt with any locking, the field > itself serves as a lock good enough, also preventing the parallel mounts > of the same devices. I changed the assignement to atomic_cmpset, which > is enough there. It is somewhat pity that this would reliably disable > multiple ro mounts of the same volume. I used a mutex since it is simpler. I think your version needs atomic ops for resetting the pointer, and maybe acquire/release too. It has locking that is very similar to a mutex. Mutexes use _mtx_obtain_lock = atomic_cmpset_acq_ptr and _mtx_release_lock = atomic_store_rel_ptr. This is already delicately weak -- full sequential consistency is not required. Then on x86, we (you) only recently finished optimizing atomic_store_rel so that it is as weak as possible (just a compiler membar before an ordinary store). Maybe even weaker locking is enough here, but this is too hard to understand. > There is no need to move assignment of NULL to dev->si_mountpt later > in ffs_unmount(), the moment where the assignment is performed is safe > for other thread to start another mount. 
I already noticed that it was almost as late as possible (could be moved 1 statement later) and not worth moving.

But to even reason about orders, you need atomic releases with acquire/release semantics. There are dev_ref() and dev_rel() calls nearby. The implementation of these probably has to, and in fact does, give some ordering. The details are too hard to understand. In ffs_unmount() I think it is actually ordering given by vrele() that makes things work:

> 	PICKUP_GIANT();
> -	if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL)
> -		ump->um_devvp->v_rdev->si_mountpt = NULL;
> +	ump->um_dev->si_mountpt = NULL;
> 	vrele(ump->um_devvp);
> 	dev_rel(ump->um_dev);

We want the store to si_mountpt to become visible before the vnode is unlocked. Otherwise, a new mount can lock the vnode and fail with EBUSY because it sees si_mountpt != NULL. We have to know implementation details of vrele() to know that this happens.

> I still want to keep devvp locked for long enough to cover the bufobj
> hacking, and I do not want to move bufobj.bo_ops change before
> g_vfs_open() succeed.

I didn't move it before g_vfs_open(), but before VOP_UNLOCK(). I think v_bufobj is never cleared, but garbage in it is harmless except in the multiple-mounts case, which is now disallowed.

> I also wanted to remove GIANT dances, but this requires geom patch,
> which I will mail separately.

OK. I saw the other mail.

> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
> index 712fc21..21425f5 100644
> --- a/sys/ufs/ffs/ffs_vfsops.c
> +++ b/sys/ufs/ffs/ffs_vfsops.c
> @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td)
> 	cred = td ? td->td_ucred : NOCRED;
> 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
>
> +	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
> 	dev = devvp->v_rdev;
> 	dev_ref(dev);
> +	if (!atomic_cmpset_ptr(&dev->si_mountpt, 0, mp)) {
> +		dev_rel(dev);
> +		VOP_UNLOCK(devvp, 0);
> +		return (EBUSY);
> +	}

This is cleaner and safer than my version.

> 	DROP_GIANT();
> 	g_topology_lock();
> 	error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);

g_vfs_open() already sets devvp->v_bufobj.bo_ops to g_vfs_bufops unless it fails. This clobbered our setting in the buggy multiple-mount case. But with multiple mounts not allowed, this cleans up any garbage in v_bufobj.

g_vfs_open() has 2 failures for non-exclusive access. It starts by checking v_bufobj.bo_private == devvp (this is after translating its pointers to the ones passed here). This is avg's fix for the multiple-mounts problem (r206130). It doesn't work in all cases. I think this is unnecessary now.

Later, g_vfs_open() does a g_access() check for exclusive-enough access. This is supposed to allow multiple mounts at least when all are ro. I thought that avg modified this, but he actually did something different. I think this check only failed in buggy cases where multiple mounts were allowed. Our changes should make it never fail. It still returns the wrong errno (some general one returned by g_access() instead of the one documented for mount() -- this is EBUSY).

> 	g_topology_unlock();
> 	PICKUP_GIANT();
> -	VOP_UNLOCK(devvp, 0);

I don't like moving this below devvp accesses. It locks devvp, not dev.

> -	if (error)
> +	if (error != 0) {
> +		VOP_UNLOCK(devvp, 0);
> 		goto out;
> -	if (devvp->v_rdev->si_iosize_max != 0)
> -		mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max;
> +	}
> +	if (dev->si_iosize_max != 0)
> +		mp->mnt_iosize_max = dev->si_iosize_max;

dev->si_iosize_max is locked by its undocumented lifetime. It is invariant since some previous time.

> 	if (mp->mnt_iosize_max > MAXPHYS)
> 		mp->mnt_iosize_max = MAXPHYS;
> -
> 	devvp->v_bufobj.bo_ops = &ffs_ops;

This needs to be before the vnode unlock, of course. I don't like the complication to avoid setting this if g_vfs_open() fails, but this at least makes it obvious that we don't set it to garbage when g_vfs_open() fails. In other error cases, and even after unmount, I think v_bufobj is left as garbage.

I now see another cleanup: don't goto out when g_vfs_open() fails. This depends on it setting cp to NULL and leaving nothing to clean when it fails. It has no man page, and this detail is documented in its source code.

> -	if (devvp->v_type == VCHR)
> -		devvp->v_rdev->si_mountpt = mp;
> +	VOP_UNLOCK(devvp, 0);
>
> 	fs = NULL;
> 	sblockloc = 0;
> @@ -1083,8 +1088,6 @@ ffs_mountfs(devvp, mp, td)
> out:
> 	if (bp)
> 		brelse(bp);
> -	if (devvp->v_type == VCHR && devvp->v_rdev != NULL)
> -		devvp->v_rdev->si_mountpt = NULL;
> 	if (cp != NULL) {
> 		DROP_GIANT();
> 		g_topology_lock();
> @@ -1102,6 +1105,7 @@ out:
> 		free(ump, M_UFSMNT);
> 		mp->mnt_data = NULL;
> 	}
> +	dev->si_mountpt = NULL;

This should remain before the vnode unlock. Otherwise a new mount can fail unnecessarily.

> 	dev_rel(dev);
> 	return (error);
> }
> @@ -1287,8 +1291,7 @@ ffs_unmount(mp, mntflags)
> 		g_vfs_close(ump->um_cp);
> 		g_topology_unlock();
> 		PICKUP_GIANT();
> -	if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL)
> -		ump->um_devvp->v_rdev->si_mountpt = NULL;
> +	ump->um_dev->si_mountpt = NULL;
> 	vrele(ump->um_devvp);
> 	dev_rel(ump->um_dev);

This order is better for avoiding unnecessary failure for new mounts, but now I'm not sure if it is right. Anyway, it matters less to get unnecessary failures for a new mount after a long-lived old mount than after a failed mount, so the cleanup shouldn't be stricter than here.
> mtx_destroy(UFS_MTX(ump)); > Bruce From owner-freebsd-fs@freebsd.org Fri May 20 00:27:12 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5DF15B415A1 for ; Fri, 20 May 2016 00:27:12 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 48E3517E6 for ; Fri, 20 May 2016 00:27:12 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 482FCB4159F; Fri, 20 May 2016 00:27:12 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 47D62B4159E for ; Fri, 20 May 2016 00:27:12 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id F33AD17E5 for ; Fri, 20 May 2016 00:27:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c110-21-42-169.carlnfd1.nsw.optusnet.com.au [110.21.42.169]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 36948D6F26C; Fri, 20 May 2016 10:11:39 +1000 (AEST) Date: Fri, 20 May 2016 10:11:38 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Konstantin Belousov , fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs In-Reply-To: <20160520074427.W1151@besplex.bde.org> Message-ID: <20160520095504.X1527@besplex.bde.org> References: <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> <20160519065714.H1393@besplex.bde.org> <20160519094901.O1798@besplex.bde.org> <20160519120557.A2250@besplex.bde.org> <20160519104128.GN89104@kib.kiev.ua> <20160520074427.W1151@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=c+ZWOkJl c=1 sm=1 tr=0 a=kDyANCGC9fy361NNEb9EQQ==:117 a=kDyANCGC9fy361NNEb9EQQ==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=_iGiZFj0rCtMW9D2ktwA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 May 2016 00:27:12 -0000 PS: On Fri, 20 May 2016, Bruce Evans wrote: > On Thu, 19 May 2016, Konstantin Belousov wrote: >> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c >> index 712fc21..21425f5 100644 >> --- a/sys/ufs/ffs/ffs_vfsops.c >> +++ b/sys/ufs/ffs/ffs_vfsops.c >> @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td) >> cred = td ? td->td_ucred : NOCRED; >> ronly = (mp->mnt_flag & MNT_RDONLY) != 0; >> >> + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); I still don't like this. The source code tends to fill up with assertions (and comments) about simple things. >> dev = devvp->v_rdev; >> dev_ref(dev); >> + if (!atomic_cmpset_ptr(&dev->si_mountpt, 0, mp)) { I used != 0. 
>> @@ -1083,8 +1088,6 @@ ffs_mountfs(devvp, mp, td) >> out: >> if (bp) >> brelse(bp); >> - if (devvp->v_type == VCHR && devvp->v_rdev != NULL) >> - devvp->v_rdev->si_mountpt = NULL; >> if (cp != NULL) { >> DROP_GIANT(); >> g_topology_lock(); >> @@ -1102,6 +1105,7 @@ out: >> free(ump, M_UFSMNT); >> mp->mnt_data = NULL; >> } >> + dev->si_mountpt = NULL; > > This should remain before the vnode unlock. Otherwise a new mount can > fail unnecessarily. > >> dev_rel(dev); >> return (error); >> } Oops. The vnode lock is not held here, so the order in ffs_mount() cannot be duplicated. I moved the resetting of si_mountpt down to here too. Not locking here and elsewhere makes the locking for v_bufobj even more obscure. v_bufobj is a whole struct living in the vnode. It has no locking annotation, but has a style bug (a stray '*') where its locking annotation should be. g_vfs_open() sets sc->sc_bo to &vp->v_bufobj and I think most uses of this don't lock the vnode or check if has been revoked. Perhaps ro accesses are OK (revoke() must not clean v_bufobj). Cleaning v_bufobj on mount failure without the vnode lock would be a bug. I think it is just not cleaned or used until the next mount changes it. Bruce From owner-freebsd-fs@freebsd.org Fri May 20 03:22:22 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2FA42B43A04 for ; Fri, 20 May 2016 03:22:22 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 1A74E10B3 for ; Fri, 20 May 2016 03:22:22 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 19AEBB43A03; Fri, 20 May 2016 03:22:22 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 170E3B43A02 for ; Fri, 20 May 2016 03:22:22 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id B6B7110B2 for ; Fri, 20 May 2016 03:22:21 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c110-21-42-169.carlnfd1.nsw.optusnet.com.au [110.21.42.169]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 9587242AB98; Fri, 20 May 2016 13:22:13 +1000 (AEST) Date: Fri, 20 May 2016 13:22:09 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Konstantin Belousov , fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs In-Reply-To: <20160520095504.X1527@besplex.bde.org> Message-ID: <20160520120927.V2190@besplex.bde.org> References: <20160517072104.I2137@besplex.bde.org> <20160517084241.GY89104@kib.kiev.ua> <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> <20160519065714.H1393@besplex.bde.org> <20160519094901.O1798@besplex.bde.org> <20160519120557.A2250@besplex.bde.org> <20160519104128.GN89104@kib.kiev.ua> <20160520074427.W1151@besplex.bde.org> <20160520095504.X1527@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=TuMb/2jh c=1 sm=1 tr=0 a=kDyANCGC9fy361NNEb9EQQ==:117 
a=kDyANCGC9fy361NNEb9EQQ==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=QWTFWtZIVJHxjm3Yne0A:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 May 2016 03:22:22 -0000

PS2:

On Fri, 20 May 2016, Bruce Evans wrote:

> PS:
>
> On Fri, 20 May 2016, Bruce Evans wrote:
>
>> On Thu, 19 May 2016, Konstantin Belousov wrote:
>>> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
>>> index 712fc21..21425f5 100644
>>> --- a/sys/ufs/ffs/ffs_vfsops.c
>>> +++ b/sys/ufs/ffs/ffs_vfsops.c
>>> @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td)
>>> 	cred = td ? td->td_ucred : NOCRED;
>>> 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
>>>
>>> +	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
>
> I still don't like this.  The source code tends to fill up with
> assertions (and comments) about simple things.
>
>>> 	dev = devvp->v_rdev;
>>> 	dev_ref(dev);
>>> +	if (!atomic_cmpset_ptr(&dev->si_mountpt, 0, mp)) {
>
> I used != 0.

All file systems need this of course.  zfs doesn't use g_vfs_open(), so
how can it possibly work to give exclusive access to the device in
contention with other mount operations?  I think it doesn't even try.

Old code used vfs_mountedon() here.  vfs_mountedon() was just the above
cmp (but not set) in an extern function, with no obvious locking and a
differently bad name for si_mountpt (it was si_mountpoint; the correct
name is si_mp).  The vnode should be locked, but this was only enough if
the old aliasing code gave a unique vnode.

Old code also returned EBUSY if vcount(devvp) > 1 && devvp != rootvp.
Here rootvp is special to support some old hack involving abusing the
swap device for miniroot.  This was supposed to have been replaced by
g_access() checks in g_vfs_open(), but those aren't exclusive enough.
There is another check in g_vfs_open(), but that isn't exclusive enough
either, so we are trying to fix it now.

zfs_mount() seems to have no exclusivity check at all, except in the
illumos case it has the old vcount() check with the rootvp hack (spelled
differently as (v_flag & VROOT)).  zfs might support multiple mounts,
but it can only do that for itself, and the vcount() check normally
prevents this in the illumos case.

vfs_mountedon() is as good an interface as any for checking for
exclusive access in a shared way (except it probably can't support
multiple mounts like g_vfs_open() is supposed to).  It was in
4.4BSD-Lite.  4.4BSD-Lite doesn't have si_mountp[oin]t.  It used the
old alias stuff, which is relatively easy to understand there.  It
searches the list of aliases and skips ones whose type differs (these
are presumably revoked ones).  The aliases are vnodes with a common
rdev.  Each vnode has a V_MOUNTEDON flag.  I wonder if current bugs
affected this too -- after revoke, the device is still mounted but its
vnode is too messed up to show this.

vfs_mountedon() in FreeBSD-3 is similar to that in 4.4BSD-Lite.  In
FreeBSD-4, it is an in-between version that looks broken since it
doesn't have the alias loop; it depends on vp->v_specmountpoint being
the same for all aliases even with only 1 of the aliases locked.
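The difference between the old vfs_mountedon()-style check and a
compare-and-set claim, as a sketch (illustrative userspace C11, not the
historical kernel code; locking and vnode aliasing are elided):

	#include <errno.h>
	#include <stdatomic.h>
	#include <stdint.h>

	struct mount;
	struct cdev {
		_Atomic(uintptr_t) si_mountpt;
	};

	/*
	 * vfs_mountedon() style: compare only.  Without further
	 * locking, two racing mounts can both see 0 here and both
	 * proceed -- the multiple-mount hole being closed.
	 */
	static int
	mountedon_check(struct cdev *dev)
	{
		return (atomic_load(&dev->si_mountpt) != 0 ? EBUSY : 0);
	}

	/*
	 * cmpset style: the check and the claim are a single atomic
	 * step, so at most one of two racing mounts can win.
	 */
	static int
	cmpset_claim(struct cdev *dev, struct mount *mp)
	{
		uintptr_t zero = 0;

		return (atomic_compare_exchange_strong(&dev->si_mountpt,
		    &zero, (uintptr_t)mp) ? 0 : EBUSY);
	}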
Bruce From owner-freebsd-fs@freebsd.org Fri May 20 08:01:31 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E3A67B43D0F for ; Fri, 20 May 2016 08:01:31 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D406D1E3E for ; Fri, 20 May 2016 08:01:31 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u4K81Vsk000577 for ; Fri, 20 May 2016 08:01:31 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 209580] ZFS and geli broken with INVARIANTS enabled Date: Fri, 20 May 2016 08:01:31 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.3-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: linimon@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 May 2016 08:01:32 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D209580 Mark Linimon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-bugs@FreeBSD.org |freebsd-fs@FreeBSD.org --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Fri May 20 08:02:50 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2616DB43E2E for ; Fri, 20 May 2016 08:02:50 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1700010B5 for ; Fri, 20 May 2016 08:02:50 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u4K82nLL020453 for ; Fri, 20 May 2016 08:02:49 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 209571] ZFS and NVMe performing poorly. 
TRIM requests stall I/O activity Date: Fri, 20 May 2016 08:02:50 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.3-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: linimon@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 May 2016 08:02:50 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D209571 Mark Linimon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-bugs@FreeBSD.org |freebsd-fs@FreeBSD.org --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Fri May 20 08:07:28 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E7E54B420CF for ; Fri, 20 May 2016 08:07:28 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D8B3A13B3 for ; Fri, 20 May 2016 08:07:28 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u4K87SSP033592 for ; Fri, 20 May 2016 08:07:28 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 209508] zfs import assertion failed in avl_add() Date: Fri, 20 May 2016 08:07:29 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.3-RELEASE X-Bugzilla-Keywords: patch X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: linimon@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: keywords assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 May 2016 08:07:29 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D209508 Mark Linimon changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords| |patch Assignee|freebsd-bugs@FreeBSD.org |freebsd-fs@FreeBSD.org --=20 You are receiving this mail because: 
You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Fri May 20 08:12:28 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 76367B426CB for ; Fri, 20 May 2016 08:12:28 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 66F8F1CBB for ; Fri, 20 May 2016 08:12:28 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u4K8CS1R046912 for ; Fri, 20 May 2016 08:12:28 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 209396] ZFS primarycache attribute affects secondary cache as well Date: Fri, 20 May 2016 08:12:28 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.3-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: linimon@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 May 2016 08:12:28 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D209396 Mark Linimon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-bugs@FreeBSD.org |freebsd-fs@FreeBSD.org --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Fri May 20 09:23:56 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9476CB42511 for ; Fri, 20 May 2016 09:23:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 7BA6110CB for ; Fri, 20 May 2016 09:23:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 7AF17B42510; Fri, 20 May 2016 09:23:56 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7A8A1B4250E for ; Fri, 20 May 2016 09:23:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0C0EE10C9 for ; Fri, 20 May 2016 09:23:55 +0000 (UTC) (envelope-from 
kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u4K9Nnir092713 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 20 May 2016 12:23:49 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u4K9Nnir092713 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u4K9NmVK092712; Fri, 20 May 2016 12:23:48 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 20 May 2016 12:23:48 +0300 From: Konstantin Belousov To: Bruce Evans Cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs Message-ID: <20160520092348.GV89104@kib.kiev.ua> References: <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> <20160519065714.H1393@besplex.bde.org> <20160519094901.O1798@besplex.bde.org> <20160519120557.A2250@besplex.bde.org> <20160519104128.GN89104@kib.kiev.ua> <20160520074427.W1151@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160520074427.W1151@besplex.bde.org> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 May 2016 09:23:56 -0000 On Fri, May 20, 2016 at 09:27:38AM +1000, Bruce Evans wrote: > On Thu, 19 May 2016, Konstantin Belousov wrote: > > > On Thu, May 19, 2016 at 12:20:19PM +1000, Bruce Evans wrote: > >> On Thu, 19 May 2016, Bruce Evans wrote: > >> > >>> On Thu, 19 May 2016, Bruce Evans wrote: > >>> > >>>> ... > >>>> I think the following works to prevent multiple mounts via all of the > >>>> known buggy paths: early in every fsmount(): > >> > >> Here is a lightly tested version: > > > > There is no need to protect the si_mountpt with any locking, the field > > itself serves as a lock good enough, also preventing the parallel mounts > > of the same devices. I changed the assignement to atomic_cmpset, which > > is enough there. It is somewhat pity that this would reliably disable > > multiple ro mounts of the same volume. > > I used a mutex since it is simpler. > > I think your version needs atomic ops for resetting the pointer, and > maybe acquire/release too. It has locking that is very similar to a > mutex. Mutexes use _mtx_obtain_lock = atomic_cmpset_acq_ptr and > _mtx_release_lock = atomic_store_rel_ptr. This is already delicately > weak -- full sequential consistency is not required. Then on x86, > we (you) only recently finished optimizing atomic_store_rel so that > it is as weak as possible (just a compiler membar before an ordinary > store). > > Maybe even weaker locking is enough here, but this is too hard to > understand. Well, I do not think that barriers would add much there, since we really do not care about two almost parallel mounts to fail, and other locking provides enough synchronization points. On the other hand, having explicit barriers makes si_mountpt act as the real semaphore. 
Unlike mutex, it attributes the ownership of the device to a mount point, and not to the locking thread. So I added acq/rel. > > > There is no need to move assignment of NULL to dev->si_mountpt later > > in ffs_unmount(), the moment where the assignment is performed is safe > > for other thread to start another mount. > > I already noticed that it was almost as late as possible (could be moved > 1 statement later) and not worth moving. > > But to even reason about orders, you need atomic releases with acquire/ > release semantics. There are dev_ref() and dev_rel() calls nearby. The > implementation of these probably has to and in fact does give some ordering. > The details are too hard to understand. In ffs_unmount() I think it is > actually ordering given by vrele() that makes things work: > > > PICKUP_GIANT(); > > - if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL) > > - ump->um_devvp->v_rdev->si_mountpt = NULL; > > + ump->um_dev->si_mountpt = NULL; > > vrele(ump->um_devvp); > > dev_rel(ump->um_dev); > > We want the store to si_mountpt to become visible before the vnode is > unlocked. Otherwise, a new mount can lock the vnode and fail with > EBUSY because it sees si_mountpt != NULL. We have to know implementation > details of vrele() to know that this happens. Yes, we do not care about this window. For this to happen, mount request must be issued before the unmount request returned, and the mount caller is not able to prove that his mount attempt was started before the unmount progressed enough. > > > I still want to keep devvp locked for long enough to cover the bufobj > > hacking, and I do not want to move bufobj.bo_ops change before > > g_vfs_open() succeed. > > I didn't move it before g_vfs_open(), but before VOP_UNLOCK(). I think > v_bufobj is never cleared, but garbage in it is harmless except in the > multiple-mounts case which is now disallowed. So did I. > > > I also wanted to remove GIANT dances, but this requires geom patch, > > which I will mail separately. > > OK. I saw the other mail. > > > diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c > > index 712fc21..21425f5 100644 > > --- a/sys/ufs/ffs/ffs_vfsops.c > > +++ b/sys/ufs/ffs/ffs_vfsops.c > > @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td) > > cred = td ? td->td_ucred : NOCRED; > > ronly = (mp->mnt_flag & MNT_RDONLY) != 0; > > > > + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); > > dev = devvp->v_rdev; > > dev_ref(dev); > > + if (!atomic_cmpset_ptr(&dev->si_mountpt, 0, mp)) { > > + dev_rel(dev); > > + VOP_UNLOCK(devvp, 0); > > + return (EBUSY); > > + } > > This is cleaner and safer than my version. > > > DROP_GIANT(); > > g_topology_lock(); > > error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); > > g_vfs_open() already sets devvp->v_bufobj.bo_ops to g_vfs_bufops unless > it fails. This clobbered our setting in the buggy multiple-mount case. > But with multiple mounts not allowed, this cleans up any garbage in > v_bufobj. Yes, and this orders things. g_vfs_open() shoudl have devvp locked, both fo bo manipulations and for vnode_create_vobject() call. We can only assign to bo_ops after g_vfs_open() was done successfully. > > g_vfs_open() has 2 failures for non-exclusive access. It starts by > checking v_bufobj.bo_private == devvp (this is after translating its > pointers to the ones passed here). This is avg's fix for the multiple- > mounts problem (r206130). It doesn't work in all cases. I think this > is unecessary now. At least it weeds out other devfs mounts. 
> > Later, g_vfs_open() does a g_access() check for exclusive-enough access. > This is supposed to allow multiple mounts at least when all are ro. I > thought that avg modified this, but he actually did something different. > I think this check only failed in buggy cases where multiple mounts were > allowed. Our changes should make it never fail. It still returns the > wrong errno (some general one return by g_access() instead of the one > documented for mount() -- this is EBUSY). > > > g_topology_unlock(); > > PICKUP_GIANT(); > > - VOP_UNLOCK(devvp, 0); > > I don't like moving this below devvp accesses. It locks devvp, not dev. > > > - if (error) > > + if (error != 0) { > > + VOP_UNLOCK(devvp, 0); > > goto out; > > - if (devvp->v_rdev->si_iosize_max != 0) > > - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; > > + } > > + if (dev->si_iosize_max != 0) > > + mp->mnt_iosize_max = dev->si_iosize_max; > > dev->si_iosize_max is locked by its undocumented lifetime. It is invariant > since some previous time. > > > if (mp->mnt_iosize_max > MAXPHYS) > > mp->mnt_iosize_max = MAXPHYS; > > - > > devvp->v_bufobj.bo_ops = &ffs_ops; > > This needs to be before the vnode unlock of course. > > I don't like the complication to avoid setting this if we g_vfs_open_fails, > but this at least makes it obvious that we don't set it to garbage when > g_vfs_open_fails. In other error cases, and even after unmount, I think > v_bufobj is left as garbage. > > I now see another cleanup: don't goto out when g_vfs_open() fails. This > depends on it setting cp to NULL and leaving nothing to clean when it > fails. It has no man page and this detail is documented in its source > code. Then I would need to add another NULL assignment, VOP_UNLOCK etc. > > > - if (devvp->v_type == VCHR) > > - devvp->v_rdev->si_mountpt = mp; > > + VOP_UNLOCK(devvp, 0); > > > > fs = NULL; > > sblockloc = 0; > > @@ -1083,8 +1088,6 @@ ffs_mountfs(devvp, mp, td) > > out: > > if (bp) > > brelse(bp); > > - if (devvp->v_type == VCHR && devvp->v_rdev != NULL) > > - devvp->v_rdev->si_mountpt = NULL; > > if (cp != NULL) { > > DROP_GIANT(); > > g_topology_lock(); > > @@ -1102,6 +1105,7 @@ out: > > free(ump, M_UFSMNT); > > mp->mnt_data = NULL; > > } > > + dev->si_mountpt = NULL; > > This should remain before the vnode unlock. Otherwise a new mount can > fail unnecessarily. See above. > > > dev_rel(dev); > > return (error); > > } > > @@ -1287,8 +1291,7 @@ ffs_unmount(mp, mntflags) > > g_vfs_close(ump->um_cp); > > g_topology_unlock(); > > PICKUP_GIANT(); > > - if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL) > > - ump->um_devvp->v_rdev->si_mountpt = NULL; > > + ump->um_dev->si_mountpt = NULL; > > vrele(ump->um_devvp); > > dev_rel(ump->um_dev); > > This order is better for avoiding unnecessary failure for new mounts, but > now I'm not sure if it is right. Anyway, it matters less to get an > unnecessary failures for a new mount after a long-lived old mount than > after a failed mount, so the cleanup shouldn't be stricter than here. > > > mtx_destroy(UFS_MTX(ump)); > > Updated patch to add acq/rel. diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c index 712fc21..670bb15 100644 --- a/sys/ufs/ffs/ffs_vfsops.c +++ b/sys/ufs/ffs/ffs_vfsops.c @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td) cred = td ? 
td->td_ucred : NOCRED; ronly = (mp->mnt_flag & MNT_RDONLY) != 0; + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); dev = devvp->v_rdev; dev_ref(dev); + if (atomic_cmpset_acq_ptr(&dev->si_mountpt, 0, mp) != 0) { + dev_rel(dev); + VOP_UNLOCK(devvp, 0); + return (EBUSY); + } DROP_GIANT(); g_topology_lock(); error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); g_topology_unlock(); PICKUP_GIANT(); - VOP_UNLOCK(devvp, 0); - if (error) + if (error != 0) { + VOP_UNLOCK(devvp, 0); goto out; - if (devvp->v_rdev->si_iosize_max != 0) - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; + } + if (dev->si_iosize_max != 0) + mp->mnt_iosize_max = dev->si_iosize_max; if (mp->mnt_iosize_max > MAXPHYS) mp->mnt_iosize_max = MAXPHYS; - devvp->v_bufobj.bo_ops = &ffs_ops; - if (devvp->v_type == VCHR) - devvp->v_rdev->si_mountpt = mp; + VOP_UNLOCK(devvp, 0); fs = NULL; sblockloc = 0; @@ -1083,8 +1088,6 @@ ffs_mountfs(devvp, mp, td) out: if (bp) brelse(bp); - if (devvp->v_type == VCHR && devvp->v_rdev != NULL) - devvp->v_rdev->si_mountpt = NULL; if (cp != NULL) { DROP_GIANT(); g_topology_lock(); @@ -1102,6 +1105,7 @@ out: free(ump, M_UFSMNT); mp->mnt_data = NULL; } + atomic_store_rel_ptr(&dev->si_mountpt, 0); dev_rel(dev); return (error); } @@ -1287,8 +1291,7 @@ ffs_unmount(mp, mntflags) g_vfs_close(ump->um_cp); g_topology_unlock(); PICKUP_GIANT(); - if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL) - ump->um_devvp->v_rdev->si_mountpt = NULL; + atomic_store_rel_ptr(&ump->um_dev->si_mountpt, 0); vrele(ump->um_devvp); dev_rel(ump->um_dev); mtx_destroy(UFS_MTX(ump)); From owner-freebsd-fs@freebsd.org Fri May 20 11:22:11 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D416BB43A55 for ; Fri, 20 May 2016 11:22:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id BEFE21144 for ; Fri, 20 May 2016 11:22:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id BE28AB43A53; Fri, 20 May 2016 11:22:11 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BDC33B43A52 for ; Fri, 20 May 2016 11:22:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 6F68A1142 for ; Fri, 20 May 2016 11:22:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c110-21-42-169.carlnfd1.nsw.optusnet.com.au [110.21.42.169]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 036AE104A09F; Fri, 20 May 2016 21:22:08 +1000 (AEST) Date: Fri, 20 May 2016 21:22:08 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs In-Reply-To: <20160520092348.GV89104@kib.kiev.ua> Message-ID: <20160520194427.W1170@besplex.bde.org> References: <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> <20160519065714.H1393@besplex.bde.org> <20160519094901.O1798@besplex.bde.org> <20160519120557.A2250@besplex.bde.org> 
<20160519104128.GN89104@kib.kiev.ua> <20160520074427.W1151@besplex.bde.org> <20160520092348.GV89104@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=TuMb/2jh c=1 sm=1 tr=0 a=kDyANCGC9fy361NNEb9EQQ==:117 a=kDyANCGC9fy361NNEb9EQQ==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=oUMX8HkAuKKT6YiDj2EA:9 a=Yzrsocd7YFqDQDc-:21 a=wDFIkm-ngTmmU6Y6:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 May 2016 11:22:11 -0000

On Fri, 20 May 2016, Konstantin Belousov wrote:

> On Fri, May 20, 2016 at 09:27:38AM +1000, Bruce Evans wrote:
>> On Thu, 19 May 2016, Konstantin Belousov wrote:
>>
>>> On Thu, May 19, 2016 at 12:20:19PM +1000, Bruce Evans wrote:
>>>> On Thu, 19 May 2016, Bruce Evans wrote:
>>>>
>>>>> On Thu, 19 May 2016, Bruce Evans wrote:
>>>>>> ...
>>>>>> I think the following works to prevent multiple mounts via all of the
>>>>>> known buggy paths: early in every fsmount():
>>>>
>>>> Here is a lightly tested version:

I checked some details for r206130 again:
- r206130 claims to allow only 1 mount per device node.  It actually
  allows only 1 mount per vnode.
- the case of separate vnodes seems to actually work almost as intended.
  This is most easily reached using separate devfs mounts.  It gets the
  access counts right (they are combined).  It has a chance of working
  because the separate vnodes provide a place to attach separate bufobjs.
- the common si_mountpt of course can't work for multiple mounts, but
  clobbering it doesn't have to break anything more than the i/o counts.
- I think it was only intended to allow multiple ro mounts.  However,
  1 rw mount is allowed after any number of ro mounts (using separate
  vnodes after r206130).  ro after rw is also allowed, but then the ro
  mount prints the warning that the fs was not properly dismounted.
- I think the behaviour in the previous point is a side effect of
  allowing fsck to write on a device opened ro for a ro mount.  fsck
  reloads for just one of the ro mounts.

So multiple mounts are still too dangerous, and we should finish
r206130.

>> I think your version needs atomic ops for resetting the pointer, and
>> maybe acquire/release too.  It has locking that is very similar to a
>> ...
>> Maybe even weaker locking is enough here, but this is too hard to
>> understand.

> Well, I do not think that barriers would add much there, since we really
> do not care about two almost parallel mounts to fail, and other locking
> provides enough synchronization points.  On the other hand, having
> explicit barriers makes si_mountpt act as the real semaphore.  Unlike
> mutex, it attributes the ownership of the device to a mount point, and
> not to the locking thread.
>
> So I added acq/rel.

Thanks.

>>> ...
>>> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
>>> index 712fc21..21425f5 100644
>>> --- a/sys/ufs/ffs/ffs_vfsops.c
>>> +++ b/sys/ufs/ffs/ffs_vfsops.c
>>> @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td)
>>> 	cred = td ? td->td_ucred : NOCRED;
>>> 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
>>>
>>> +	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
>>> 	dev = devvp->v_rdev;
>>> 	dev_ref(dev);
>>> +	if (!atomic_cmpset_ptr(&dev->si_mountpt, 0, mp)) {
>>> +		dev_rel(dev);
>>> +		VOP_UNLOCK(devvp, 0);
>>> +		return (EBUSY);
>>> +	}
>>
>> This is cleaner and safer than my version.
>>
>>> 	DROP_GIANT();
>>> 	g_topology_lock();
>>> 	error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
>>
>> g_vfs_open() already sets devvp->v_bufobj.bo_ops to g_vfs_bufops unless
>> it fails.  This clobbered our setting in the buggy multiple-mount case.
>> But with multiple mounts not allowed, this cleans up any garbage in
>> v_bufobj.

> Yes, and this orders things.  g_vfs_open() should have devvp locked,
> both for bo manipulations and for the vnode_create_vobject() call.
> We can only assign to bo_ops after g_vfs_open() was done successfully.

The atomic cmpset now orders things too.  Is that enough?  It ensures
that an old mount cannot be active.  I don't know if v_bufobj is used
for non-mounts.

Except, for zfs there is no g_vfs_open() to order things, and for all
other file systems there is no atomic cmpset yet.

>> g_vfs_open() has 2 failures for non-exclusive access.  It starts by
>> checking v_bufobj.bo_private == devvp (this is after translating its
>> pointers to the ones passed here).  This is avg's fix for the multiple-
>> mounts problem (r206130).  It doesn't work in all cases.  I think this
>> is unnecessary now.

> At least it weeds out other devfs mounts.

Yes, we need it until everything is converted.

>> ...
>> I now see another cleanup: don't goto out when g_vfs_open() fails.  This
>> depends on it setting cp to NULL and leaving nothing to clean when it
>> fails.  It has no man page and this detail is documented in its source
>> code.

> Then I would need to add another NULL assignment, VOP_UNLOCK etc.

g_vfs_open() already sets cp to NULL when it fails, and the cleanup
depends on that now, but it is just as good to depend on no cleanup
being needed on failure.  You do need another dev_rel().

I thought about moving the dev_ref() later to simplify the early returns.
I thought that this didn't quite work.  Now I think it does work, for
obvious reasons:
- the device is attached to a vnode, so it is referenced to prevent it
  going away unless the device is revoked.  It seems to be referenced
  at least 3 times in FreeBSD-9.
- the vnode is locked, so the reference count remains > 0 until we unlock.

So we just need a dev_ref() before the unlock in the non-error case, to
keep the device from going away if it is revoked.

> Updated patch to add acq/rel.
>
> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
> index 712fc21..670bb15 100644
> --- a/sys/ufs/ffs/ffs_vfsops.c
> +++ b/sys/ufs/ffs/ffs_vfsops.c
> @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td)
> 	cred = td ? td->td_ucred : NOCRED;
> 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
>
> +	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));

Hrmph.

> 	dev = devvp->v_rdev;
> 	dev_ref(dev);

Move later...

> +	if (atomic_cmpset_acq_ptr(&dev->si_mountpt, 0, mp) != 0) {

I changed the first 0 to NULL, and this works on i386, but now I remember
that i386 has bogus casts which break detection of type mismatches --
the atomic ptr functions take a [u]intptr_t, not a pointer type, so
NULL won't work if it is ((void *)0).  At least amd64 is still missing
this bug.

> +		dev_rel(dev);

...then this dev_rel() is not needed.
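The cast pitfall described above, reduced to a compilable sketch
(standard C, not i386's atomic.h; take_uintptr is an illustrative
stand-in for the atomic ptr functions):

	#include <stdint.h>
	#include <stdio.h>

	/* Stand-in with the same parameter convention as the atomic
	 * ptr operations: it takes a uintptr_t, not a pointer type. */
	static int
	take_uintptr(uintptr_t old)
	{
		return (old == 0);
	}

	int
	main(void)
	{
		/*
		 * take_uintptr(0) compiles everywhere: 0 is an integer.
		 *
		 * take_uintptr(NULL) is a constraint violation when
		 * NULL is defined as ((void *)0), because C has no
		 * implicit conversion from a pointer type to an
		 * integer type.  An implementation that casts its
		 * arguments to an integer type internally (as i386's
		 * atomic_cmpset_ptr() did) hides exactly this kind of
		 * type mismatch.
		 */
		printf("%d\n", take_uintptr(0));
		return (0);
	}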
> + VOP_UNLOCK(devvp, 0); > + return (EBUSY); > + } > DROP_GIANT(); > g_topology_lock(); > error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); > g_topology_unlock(); > PICKUP_GIANT(); > - VOP_UNLOCK(devvp, 0); > - if (error) > + if (error != 0) { > + VOP_UNLOCK(devvp, 0); > goto out; This becomes: if (error != 0) { VOP_UNLOCK(devvp, 0); return (EBUSY); } Then assign v_bufobj. Then dev_ref(), just in time for unlocking. Then unlock. > - if (devvp->v_rdev->si_iosize_max != 0) > - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; > + } > + if (dev->si_iosize_max != 0) > + mp->mnt_iosize_max = dev->si_iosize_max; > if (mp->mnt_iosize_max > MAXPHYS) > mp->mnt_iosize_max = MAXPHYS; > - > devvp->v_bufobj.bo_ops = &ffs_ops; > - if (devvp->v_type == VCHR) > - devvp->v_rdev->si_mountpt = mp; > + VOP_UNLOCK(devvp, 0); This belongs earlier. > > fs = NULL; > sblockloc = 0; > ... We need this in a central function. g_vfs_open/close() can do it for all cases except zfs. This looks like: DROP_GIANT(); g_topology_lock(); // atomic_cmpset and its error = EBUSY moved to top of g_vfs_open() error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); g_topology_unlock(); PICKUP_GIANT(); if (error != 0) { VOP_UNLOCK(devvp, 0); return (error); } devvp->v_bufobj.bo_ops = &ffs_ops; dev_ref(dev); VOP_UNLOCK(devvp, 0); if (dev->si_iosize_max != 0) mp->mnt_iosize_max = dev->si_iosize_max; if (mp->mnt_iosize_max > MAXPHYS) mp->mnt_iosize_max = MAXPHYS; where 2 of 2 lines with GIANT and 3 of 4 lines with iosize_max remain to be cleaned up. Resetting si_mountpt in g_vfs_close() is even simpler. Oops, it also has to be reset in g_vfs_open() on a later failure there. Bruce From owner-freebsd-fs@freebsd.org Fri May 20 11:33:32 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 21112B43CC8 for ; Fri, 20 May 2016 11:33:32 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 0C7FD16F9 for ; Fri, 20 May 2016 11:33:32 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 0BC46B43CC6; Fri, 20 May 2016 11:33:32 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0B67CB43CC5 for ; Fri, 20 May 2016 11:33:32 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id C85E516F8 for ; Fri, 20 May 2016 11:33:31 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c110-21-42-169.carlnfd1.nsw.optusnet.com.au [110.21.42.169]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id EC3843C76BA; Fri, 20 May 2016 21:33:23 +1000 (AEST) Date: Fri, 20 May 2016 21:33:22 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Konstantin Belousov , fs@freebsd.org Subject: Re: fix for per-mount i/o counting in ffs In-Reply-To: <20160520194427.W1170@besplex.bde.org> Message-ID: <20160520212839.E1436@besplex.bde.org> References: <20160518061040.D5948@besplex.bde.org> <20160518070252.F6121@besplex.bde.org> <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> <20160519065714.H1393@besplex.bde.org> 
<20160519094901.O1798@besplex.bde.org> <20160519120557.A2250@besplex.bde.org> <20160519104128.GN89104@kib.kiev.ua> <20160520074427.W1151@besplex.bde.org> <20160520092348.GV89104@kib.kiev.ua> <20160520194427.W1170@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=EfU1O6SC c=1 sm=1 tr=0 a=kDyANCGC9fy361NNEb9EQQ==:117 a=kDyANCGC9fy361NNEb9EQQ==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=mxpNFrNJh-qTCabnfI8A:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 May 2016 11:33:32 -0000 PS (sigh): On Fri, 20 May 2016, Bruce Evans wrote: > On Fri, 20 May 2016, Konstantin Belousov wrote: > >> On Fri, May 20, 2016 at 09:27:38AM +1000, Bruce Evans wrote: >>> ... >>> I now see another cleanup: don't goto out when g_vfs_open() fails. This >>> depends on it setting cp to NULL and leaving nothing to clean when it >>> fails. It has no man page and this detail is documented in its source >>> code. >> Then I would need to add another NULL assignment, VOP_UNLOCK etc. > > g_vfs_open() already sets cp to NULL when it fails, and the cleanup > depends on that now, but it is just as good to depend on no cleanup > being needed on failure. You do need another dev_rel(). Oops, you mean another NULL assignment (atomic op) for cleaning up si_mountpt. I got that right at the end where I moved things to g_vfs_open(). Bruce From owner-freebsd-fs@freebsd.org Fri May 20 14:27:04 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 57278B426E4 for ; Fri, 20 May 2016 14:27:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 3E68F1F34 for ; Fri, 20 May 2016 14:27:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 3A065B426E2; Fri, 20 May 2016 14:27:04 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 39A6AB426E0 for ; Fri, 20 May 2016 14:27:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BE0F61F33 for ; Fri, 20 May 2016 14:27:03 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u4KEQt3e006945 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 20 May 2016 17:26:55 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u4KEQt3e006945 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u4KEQsxK006944; Fri, 20 May 2016 17:26:54 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 20 May 2016 17:26:54 +0300 From: Konstantin Belousov To: Bruce Evans Cc: fs@freebsd.org Subject: Re: fix for 
per-mount i/o counting in ffs Message-ID: <20160520142654.GW89104@kib.kiev.ua> References: <20160517220055.GF89104@kib.kiev.ua> <20160518084931.T6534@besplex.bde.org> <20160518110834.GJ89104@kib.kiev.ua> <20160519065714.H1393@besplex.bde.org> <20160519094901.O1798@besplex.bde.org> <20160519120557.A2250@besplex.bde.org> <20160519104128.GN89104@kib.kiev.ua> <20160520074427.W1151@besplex.bde.org> <20160520092348.GV89104@kib.kiev.ua> <20160520194427.W1170@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160520194427.W1170@besplex.bde.org> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 May 2016 14:27:04 -0000 On Fri, May 20, 2016 at 09:22:08PM +1000, Bruce Evans wrote: > On Fri, 20 May 2016, Konstantin Belousov wrote: > > > On Fri, May 20, 2016 at 09:27:38AM +1000, Bruce Evans wrote: > >>> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c > >>> index 712fc21..21425f5 100644 > >>> --- a/sys/ufs/ffs/ffs_vfsops.c > >>> +++ b/sys/ufs/ffs/ffs_vfsops.c > >>> @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td) > >>> cred = td ? td->td_ucred : NOCRED; > >>> ronly = (mp->mnt_flag & MNT_RDONLY) != 0; > >>> > >>> + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); > >>> dev = devvp->v_rdev; > >>> dev_ref(dev); > >>> + if (!atomic_cmpset_ptr(&dev->si_mountpt, 0, mp)) { > >>> + dev_rel(dev); > >>> + VOP_UNLOCK(devvp, 0); > >>> + return (EBUSY); > >>> + } > >> > >> This is cleaner and safer than my version. > >> > >>> DROP_GIANT(); > >>> g_topology_lock(); > >>> error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); > >> > >> g_vfs_open() already sets devvp->v_bufobj.bo_ops to g_vfs_bufops unless > >> it fails. This clobbered our setting in the buggy multiple-mount case. > >> But with multiple mounts not allowed, this cleans up any garbage in > >> v_bufobj. > > Yes, and this orders things. g_vfs_open() shoudl have devvp locked, > > both fo bo manipulations and for vnode_create_vobject() call. > > We can only assign to bo_ops after g_vfs_open() was done successfully. > > The atomic cmpset now orders things too. Is that enough? It ensures > that an old mount cannot be active. I don't know if v_bufobj is used > for non-mounts. v_bufobj is logically protected against modifications by the vnode lock. > > Except, for zfs there is no g_vfs_open() to order things, and for all > other file systems there is no atomic cmpset yet. > > >> g_vfs_open() has 2 failures for non-exclusive access. It starts by > >> checking v_bufobj.bo_private == devvp (this is after translating its > >> pointers to the ones passed here). This is avg's fix for the multiple- > >> mounts problem (r206130). It doesn't work in all cases. I think this > >> is unecessary now. > > At least it weeds out other devfs mounts. > > Yes, we need it until everything is converted. > > >> ... > >> I now see another cleanup: don't goto out when g_vfs_open() fails. This > >> depends on it setting cp to NULL and leaving nothing to clean when it > >> fails. It has no man page and this detail is documented in its source > >> code. 
> > Then I would need to add another NULL assignment, VOP_UNLOCK etc. > > g_vfs_open() already sets cp to NULL when it fails, and the cleanup > depends on that now, but it is just as good to depend on no cleanup > being needed on failure. You do need another dev_rel(). > > I thought about moving the dev_ref() later to simplify the early returns. > I thought that this didn't quite work. Now I think it does work, for > obvious reasons: > - the device is attached to a vnode, so it is referenced to prevent it > going away unless the device is revoked. It seems to be referenced > at least 3 times in FreeBSD-9. > - the vnode is locked, so the reference count remains > 0 until we unlock. > So we just need a dev_ref() before the unlock in the non-error case, to > keep the device from going away if it is revoked. Yes, and this is how the current patched code is structured. > > > Updated patch to add acq/rel. > > > > diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c > > index 712fc21..670bb15 100644 > > --- a/sys/ufs/ffs/ffs_vfsops.c > > +++ b/sys/ufs/ffs/ffs_vfsops.c > > @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td) > > cred = td ? td->td_ucred : NOCRED; > > ronly = (mp->mnt_flag & MNT_RDONLY) != 0; > > > > + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); > > Hrmph. I want this, it would remove amount of obvious questions. > > > dev = devvp->v_rdev; > > dev_ref(dev); > > Move later... > > > + if (atomic_cmpset_acq_ptr(&dev->si_mountpt, 0, mp) != 0) { > > I changed the first 0 to NULL, and this works on i386, but now I remember > that i386 has bogus casts which break detection of type mismatches -- > the atomic ptr functions take a [u]intptr_t, not a pointer type, so > NULL won't work if it is ((void *)0). At least amd64 is still missing > this bug. cmpset__ptr() on i386 has cast for old and new parameters to u_int. store_rel_ptr() on i386 does not cast value to u_int. As result, NULL is acceptable for cmpset, but not for store. I spelled it 0 in all cases. Hm, I also should add uintptr_t cast for cmpset, otherwise, I suspect, some arch might be broken. > > > + dev_rel(dev); > > ...then this dev_rel() is not needed. > > > + VOP_UNLOCK(devvp, 0); > > + return (EBUSY); > > + } > > DROP_GIANT(); > > g_topology_lock(); > > error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); > > g_topology_unlock(); > > PICKUP_GIANT(); > > - VOP_UNLOCK(devvp, 0); > > - if (error) > > + if (error != 0) { > > + VOP_UNLOCK(devvp, 0); > > goto out; > > This becomes: > > if (error != 0) { > VOP_UNLOCK(devvp, 0); > return (EBUSY); > } > > Then assign v_bufobj. > > Then dev_ref(), just in time for unlocking. > > Then unlock. Ok. > > > - if (devvp->v_rdev->si_iosize_max != 0) > > - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; > > + } > > + if (dev->si_iosize_max != 0) > > + mp->mnt_iosize_max = dev->si_iosize_max; > > if (mp->mnt_iosize_max > MAXPHYS) > > mp->mnt_iosize_max = MAXPHYS; > > - > > devvp->v_bufobj.bo_ops = &ffs_ops; > > - if (devvp->v_type == VCHR) > > - devvp->v_rdev->si_mountpt = mp; > > + VOP_UNLOCK(devvp, 0); > > This belongs earlier. > > > > > fs = NULL; > > sblockloc = 0; > > ... > > We need this in a central function. g_vfs_open/close() can do it for > all cases except zfs. This looks like: I might look at this later. > > DROP_GIANT(); > g_topology_lock(); > // atomic_cmpset and its error = EBUSY moved to top of g_vfs_open() > error = g_vfs_open(devvp, &cp, "ffs", ronly ? 
0 : 1); > g_topology_unlock(); > PICKUP_GIANT(); > if (error != 0) { > VOP_UNLOCK(devvp, 0); > return (error); > } > devvp->v_bufobj.bo_ops = &ffs_ops; > dev_ref(dev); > VOP_UNLOCK(devvp, 0); > if (dev->si_iosize_max != 0) > mp->mnt_iosize_max = dev->si_iosize_max; > if (mp->mnt_iosize_max > MAXPHYS) > mp->mnt_iosize_max = MAXPHYS; > > where 2 of 2 lines with GIANT and 3 of 4 lines with iosize_max remain to > be cleaned up. > > Resetting si_mountpt in g_vfs_close() is even simpler. Oops, it also has > to be reset in g_vfs_open() on a later failure there. diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c index 712fc21..65b1891 100644 --- a/sys/ufs/ffs/ffs_vfsops.c +++ b/sys/ufs/ffs/ffs_vfsops.c @@ -764,25 +764,30 @@ ffs_mountfs(devvp, mp, td) cred = td ? td->td_ucred : NOCRED; ronly = (mp->mnt_flag & MNT_RDONLY) != 0; + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp")); dev = devvp->v_rdev; - dev_ref(dev); + if (atomic_cmpset_acq_ptr(&dev->si_mountpt, 0, (uintptr_t)mp) != 0) { + VOP_UNLOCK(devvp, 0); + return (EBUSY); + } DROP_GIANT(); g_topology_lock(); error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1); g_topology_unlock(); PICKUP_GIANT(); + if (error != 0) { + VOP_UNLOCK(devvp, 0); + atomic_store_rel_ptr(&dev->si_mountpt, 0); + return (error); + } + dev_ref(dev); + devvp->v_bufobj.bo_ops = &ffs_ops; VOP_UNLOCK(devvp, 0); - if (error) - goto out; - if (devvp->v_rdev->si_iosize_max != 0) - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max; + if (dev->si_iosize_max != 0) + mp->mnt_iosize_max = dev->si_iosize_max; if (mp->mnt_iosize_max > MAXPHYS) mp->mnt_iosize_max = MAXPHYS; - devvp->v_bufobj.bo_ops = &ffs_ops; - if (devvp->v_type == VCHR) - devvp->v_rdev->si_mountpt = mp; - fs = NULL; sblockloc = 0; /* @@ -1083,8 +1088,6 @@ ffs_mountfs(devvp, mp, td) out: if (bp) brelse(bp); - if (devvp->v_type == VCHR && devvp->v_rdev != NULL) - devvp->v_rdev->si_mountpt = NULL; if (cp != NULL) { DROP_GIANT(); g_topology_lock(); @@ -1102,6 +1105,7 @@ out: free(ump, M_UFSMNT); mp->mnt_data = NULL; } + atomic_store_rel_ptr(&dev->si_mountpt, 0); dev_rel(dev); return (error); } @@ -1287,8 +1291,7 @@ ffs_unmount(mp, mntflags) g_vfs_close(ump->um_cp); g_topology_unlock(); PICKUP_GIANT(); - if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL) - ump->um_devvp->v_rdev->si_mountpt = NULL; + atomic_store_rel_ptr(&ump->um_dev->si_mountpt, 0); vrele(ump->um_devvp); dev_rel(ump->um_dev); mtx_destroy(UFS_MTX(ump)); From owner-freebsd-fs@freebsd.org Fri May 20 15:36:56 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C0498B43D92 for ; Fri, 20 May 2016 15:36:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id A76CA1062 for ; Fri, 20 May 2016 15:36:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id A67A4B43D90; Fri, 20 May 2016 15:36:56 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A621FB43D8F for ; Fri, 20 May 2016 15:36:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not 
From owner-freebsd-fs@freebsd.org Fri May 20 15:36:56 2016
Date: Fri, 20 May 2016 18:36:49 +0300
From: Konstantin Belousov
To: Bruce Evans
Cc: fs@freebsd.org
Subject: Re: fix for per-mount i/o counting in ffs
Message-ID: <20160520153649.GX89104@kib.kiev.ua>
In-Reply-To: <20160520142654.GW89104@kib.kiev.ua>

On Fri, May 20, 2016 at 05:26:54PM +0300, Konstantin Belousov wrote:
> On Fri, May 20, 2016 at 09:22:08PM +1000, Bruce Evans wrote:
> > On Fri, 20 May 2016, Konstantin Belousov wrote:
> > > On Fri, May 20, 2016 at 09:27:38AM +1000, Bruce Evans wrote:
> > >>> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
> > >>> index 712fc21..21425f5 100644
> > >>> --- a/sys/ufs/ffs/ffs_vfsops.c
> > >>> +++ b/sys/ufs/ffs/ffs_vfsops.c
> > >>> @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td)
> > >>> 	cred = td ? td->td_ucred : NOCRED;
> > >>> 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
> > >>>
> > >>> +	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
> > >>> 	dev = devvp->v_rdev;
> > >>> 	dev_ref(dev);
> > >>> +	if (!atomic_cmpset_ptr(&dev->si_mountpt, 0, mp)) {
> > >>> +		dev_rel(dev);
> > >>> +		VOP_UNLOCK(devvp, 0);
> > >>> +		return (EBUSY);
> > >>> +	}
> > >>
> > >> This is cleaner and safer than my version.
> > >>
> > >>> 	DROP_GIANT();
> > >>> 	g_topology_lock();
> > >>> 	error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
> > >>
> > >> g_vfs_open() already sets devvp->v_bufobj.bo_ops to g_vfs_bufops unless
> > >> it fails.  This clobbered our setting in the buggy multiple-mount case.
> > >> But with multiple mounts not allowed, this cleans up any garbage in
> > >> v_bufobj.
> > > Yes, and this orders things.  g_vfs_open() should have devvp locked,
> > > both for bo manipulations and for the vnode_create_vobject() call.
> > > We can only assign to bo_ops after g_vfs_open() was done successfully.
> >
> > The atomic cmpset now orders things too.  Is that enough?  It ensures
> > that an old mount cannot be active.  I don't know if v_bufobj is used
> > for non-mounts.
> v_bufobj is logically protected against modifications by the vnode lock.
> >
> > Except, for zfs there is no g_vfs_open() to order things, and for all
> > other file systems there is no atomic cmpset yet.
> >
> > >> g_vfs_open() has 2 failures for non-exclusive access.  It starts by
> > >> checking v_bufobj.bo_private == devvp (this is after translating its
> > >> pointers to the ones passed here).  This is avg's fix for the
> > >> multiple-mounts problem (r206130).  It doesn't work in all cases.  I
> > >> think this is unnecessary now.
> > > At least it weeds out other devfs mounts.
> >
> > Yes, we need it until everything is converted.
> >
> > >> ...
> > >> I now see another cleanup: don't goto out when g_vfs_open() fails.  This
> > >> depends on it setting cp to NULL and leaving nothing to clean when it
> > >> fails.  It has no man page and this detail is documented in its source
> > >> code.
> > > Then I would need to add another NULL assignment, VOP_UNLOCK etc.
> >
> > g_vfs_open() already sets cp to NULL when it fails, and the cleanup
> > depends on that now, but it is just as good to depend on no cleanup
> > being needed on failure.  You do need another dev_rel().
> >
> > I thought about moving the dev_ref() later to simplify the early returns.
> > I thought that this didn't quite work.  Now I think it does work, for
> > obvious reasons:
> > - the device is attached to a vnode, so it is referenced to prevent it
> >   going away unless the device is revoked.  It seems to be referenced
> >   at least 3 times in FreeBSD-9.
> > - the vnode is locked, so the reference count remains > 0 until we unlock.
> > So we just need a dev_ref() before the unlock in the non-error case, to
> > keep the device from going away if it is revoked.
> Yes, and this is how the current patched code is structured.
> >
> > > Updated patch to add acq/rel.
> > >
> > > diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
> > > index 712fc21..670bb15 100644
> > > --- a/sys/ufs/ffs/ffs_vfsops.c
> > > +++ b/sys/ufs/ffs/ffs_vfsops.c
> > > @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td)
> > > 	cred = td ? td->td_ucred : NOCRED;
> > > 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
> > >
> > > +	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
> >
> > Hrmph.
> I want this, it would remove a number of obvious questions.
> >
> > > 	dev = devvp->v_rdev;
> > > 	dev_ref(dev);
> >
> > Move later...
> >
> > > +	if (atomic_cmpset_acq_ptr(&dev->si_mountpt, 0, mp) == 0) {
> >
> > I changed the first 0 to NULL, and this works on i386, but now I remember
> > that i386 has bogus casts which break detection of type mismatches --
> > the atomic ptr functions take a [u]intptr_t, not a pointer type, so
> > NULL won't work if it is ((void *)0).  At least amd64 is still missing
> > this bug.
> cmpset_*_ptr() on i386 has casts for the old and new parameters to u_int.
> store_rel_ptr() on i386 does not cast the value to u_int.  As a result,
> NULL is acceptable for cmpset, but not for store.  I spelled it 0 in all
> cases.
>
> Hm, I also should add a uintptr_t cast for cmpset; otherwise, I suspect,
> some arch might be broken.
Even more casts are needed; the updated patch is below.
> >
> > > +		dev_rel(dev);
> >
> > ...then this dev_rel() is not needed.
> >
> > > +		VOP_UNLOCK(devvp, 0);
> > > +		return (EBUSY);
> > > +	}
> > > 	DROP_GIANT();
> > > 	g_topology_lock();
> > > 	error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
> > > 	g_topology_unlock();
> > > 	PICKUP_GIANT();
> > > -	VOP_UNLOCK(devvp, 0);
> > > -	if (error)
> > > +	if (error != 0) {
> > > +		VOP_UNLOCK(devvp, 0);
> > > 		goto out;
> >
> > This becomes:
> >
> > 	if (error != 0) {
> > 		VOP_UNLOCK(devvp, 0);
> > 		return (EBUSY);
> > 	}
> >
> > Then assign v_bufobj.
> >
> > Then dev_ref(), just in time for unlocking.
> >
> > Then unlock.
> Ok.
> >
> > > -	if (devvp->v_rdev->si_iosize_max != 0)
> > > -		mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max;
> > > +	}
> > > +	if (dev->si_iosize_max != 0)
> > > +		mp->mnt_iosize_max = dev->si_iosize_max;
> > > 	if (mp->mnt_iosize_max > MAXPHYS)
> > > 		mp->mnt_iosize_max = MAXPHYS;
> > > -
> > > 	devvp->v_bufobj.bo_ops = &ffs_ops;
> > > -	if (devvp->v_type == VCHR)
> > > -		devvp->v_rdev->si_mountpt = mp;
> > > +	VOP_UNLOCK(devvp, 0);
> >
> > This belongs earlier.
> >
> > >
> > > 	fs = NULL;
> > > 	sblockloc = 0;
> > > ...
> >
> > We need this in a central function.  g_vfs_open/close() can do it for
> > all cases except zfs.  This looks like:
> I might look at this later.
> >
> > 	DROP_GIANT();
> > 	g_topology_lock();
> > 	// atomic_cmpset and its error = EBUSY moved to top of g_vfs_open()
> > 	error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
> > 	g_topology_unlock();
> > 	PICKUP_GIANT();
> > 	if (error != 0) {
> > 		VOP_UNLOCK(devvp, 0);
> > 		return (error);
> > 	}
> > 	devvp->v_bufobj.bo_ops = &ffs_ops;
> > 	dev_ref(dev);
> > 	VOP_UNLOCK(devvp, 0);
> > 	if (dev->si_iosize_max != 0)
> > 		mp->mnt_iosize_max = dev->si_iosize_max;
> > 	if (mp->mnt_iosize_max > MAXPHYS)
> > 		mp->mnt_iosize_max = MAXPHYS;
> >
> > where 2 of 2 lines with GIANT and 3 of 4 lines with iosize_max remain to
> > be cleaned up.
> >
> > Resetting si_mountpt in g_vfs_close() is even simpler.  Oops, it also has
> > to be reset in g_vfs_open() on a later failure there.

diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index 712fc21..0487c2f 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -764,25 +764,31 @@ ffs_mountfs(devvp, mp, td)
 	cred = td ? td->td_ucred : NOCRED;
 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
 
+	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
 	dev = devvp->v_rdev;
-	dev_ref(dev);
+	if (atomic_cmpset_acq_ptr((uintptr_t *)&dev->si_mountpt, 0,
+	    (uintptr_t)mp) == 0) {
+		VOP_UNLOCK(devvp, 0);
+		return (EBUSY);
+	}
 	DROP_GIANT();
 	g_topology_lock();
 	error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
 	g_topology_unlock();
 	PICKUP_GIANT();
+	if (error != 0) {
+		VOP_UNLOCK(devvp, 0);
+		atomic_store_rel_ptr((uintptr_t *)&dev->si_mountpt, 0);
+		return (error);
+	}
+	dev_ref(dev);
+	devvp->v_bufobj.bo_ops = &ffs_ops;
 	VOP_UNLOCK(devvp, 0);
-	if (error)
-		goto out;
-	if (devvp->v_rdev->si_iosize_max != 0)
-		mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max;
+	if (dev->si_iosize_max != 0)
+		mp->mnt_iosize_max = dev->si_iosize_max;
 	if (mp->mnt_iosize_max > MAXPHYS)
 		mp->mnt_iosize_max = MAXPHYS;
 
-	devvp->v_bufobj.bo_ops = &ffs_ops;
-	if (devvp->v_type == VCHR)
-		devvp->v_rdev->si_mountpt = mp;
-
 	fs = NULL;
 	sblockloc = 0;
 	/*
@@ -1083,8 +1089,6 @@ ffs_mountfs(devvp, mp, td)
 out:
 	if (bp)
 		brelse(bp);
-	if (devvp->v_type == VCHR && devvp->v_rdev != NULL)
-		devvp->v_rdev->si_mountpt = NULL;
 	if (cp != NULL) {
 		DROP_GIANT();
 		g_topology_lock();
@@ -1102,6 +1106,7 @@ out:
 		free(ump, M_UFSMNT);
 		mp->mnt_data = NULL;
 	}
+	atomic_store_rel_ptr((uintptr_t *)&dev->si_mountpt, 0);
 	dev_rel(dev);
 	return (error);
 }
@@ -1287,8 +1292,7 @@ ffs_unmount(mp, mntflags)
 	g_vfs_close(ump->um_cp);
 	g_topology_unlock();
 	PICKUP_GIANT();
-	if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL)
-		ump->um_devvp->v_rdev->si_mountpt = NULL;
+	atomic_store_rel_ptr((uintptr_t *)&ump->um_dev->si_mountpt, 0);
 	vrele(ump->um_devvp);
 	dev_rel(ump->um_dev);
 	mtx_destroy(UFS_MTX(ump));
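[As a standalone illustration of why the (uintptr_t) casts in the patch
above are safe, and of the NULL-versus-0 asymmetry discussed earlier:
C guarantees (C11 7.20.1.4) that a valid object pointer converted to
uintptr_t and back compares equal to the original.  The sketch below is
plain C with illustrative types, not kernel code.]

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct mount { int dummy; };		/* illustrative stand-in */

int
main(void)
{
	struct mount m;
	struct mount *mp = &m;
	uintptr_t u;
	struct mount *back;

	u = (uintptr_t)mp;		/* pointer -> integer */
	back = (struct mount *)u;	/* integer -> pointer */
	assert(back == mp);		/* the round trip is exact */

	/*
	 * The asymmetry noted above: the literal 0 converts cleanly to a
	 * uintptr_t parameter, while NULL may expand to ((void *)0), and
	 * a pointer does not implicitly convert to an integer type, so
	 * passing NULL where uintptr_t is expected can fail to compile.
	 */
	uintptr_t zero = 0;

	printf("round trip ok; zero = %ju\n", (uintmax_t)zero);
	return (0);
}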
From owner-freebsd-fs@freebsd.org Sat May 21 01:48:23 2016
Date: Sat, 21 May 2016 11:48:20 +1000 (EST)
From: Bruce Evans
To: Konstantin Belousov
cc: Bruce Evans, fs@freebsd.org
Subject: Re: fix for per-mount i/o counting in ffs
In-Reply-To: <20160520153649.GX89104@kib.kiev.ua>
Message-ID: <20160521111424.T1652@besplex.bde.org>

On Fri, 20 May 2016, Konstantin Belousov wrote:

[This is the version with uintptr_t casts]

> On Fri, May 20, 2016 at 05:26:54PM +0300, Konstantin Belousov wrote:
>> On Fri, May 20, 2016 at 09:22:08PM +1000, Bruce Evans wrote:
>>> On Fri, 20 May 2016, Konstantin Belousov wrote:
>>>> On Fri, May 20, 2016 at 09:27:38AM +1000, Bruce Evans wrote:
>>>>>> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
>>>>>> index 712fc21..21425f5 100644
>>>>>> --- a/sys/ufs/ffs/ffs_vfsops.c
>>>>>> +++ b/sys/ufs/ffs/ffs_vfsops.c
>>>>>> @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td)
>>>>>> 	cred = td ? td->td_ucred : NOCRED;
>>>>>> 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
>>>>>>
>>>>>> +	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
>>>>>> 	dev = devvp->v_rdev;
>>>>>> 	dev_ref(dev);
>>>>>> +	if (!atomic_cmpset_ptr(&dev->si_mountpt, 0, mp)) {
>>>>>> +		dev_rel(dev);
>>>>>> +		VOP_UNLOCK(devvp, 0);
>>>>>> +		return (EBUSY);
>>>>>> +	}
>[*]
>>> The atomic cmpset now orders things too.  Is that enough?  It ensures
>>> that an old mount cannot be active.  I don't know if v_bufobj is used
>>> for non-mounts.
>> v_bufobj is logically protected against modifications by the vnode lock.

I meant "is it enough if we drop the vnode lock earlier".  Also, what
protects from later accesses and modifications?

>>>> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
>>>> index 712fc21..670bb15 100644
>>>> --- a/sys/ufs/ffs/ffs_vfsops.c
>>>> +++ b/sys/ufs/ffs/ffs_vfsops.c
>>>> @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td)
>>>> 	cred = td ? td->td_ucred : NOCRED;
>>>> 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
>>>>
>>>> +	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
>>>
>>> Hrmph.
>> I want this, it would remove a number of obvious questions.

But it gives negatively useful runtime checking...

>>>> 	dev = devvp->v_rdev;
>>>> 	dev_ref(dev);

...this gives better runtime checking.  It gives a nice restartable null
pointer trap except in the INVARIANTS case the KASSERT() gives a
non-restartable panic.

>> cmpset_*_ptr() on i386 has casts for the old and new parameters to u_int.
>> store_rel_ptr() on i386 does not cast the value to u_int.  As a result,
>> NULL is acceptable for cmpset, but not for store.  I spelled it 0 in all
>> cases.
>>
>> Hm, I also should add a uintptr_t cast for cmpset; otherwise, I suspect,
>> some arch might be broken.
> Even more casts are needed; the updated patch is below.

These are necessary, unfortunately.  Perhaps 32-bit arches need the bogus
casts more because of the difference in pointer sizes.  The caller might
have a long variable and expect this to work the same as an int variable
on 32-bit arches because the sizes are the same, but compilers should
detect this type mismatch (for pointers to these types).  On 64-bit
arches, callers must be more careful and use only long variables.

There are also signedness problems.  Plain int and plain long shouldn't
work since only unsigned variables are supported.  Compilers should
detect this too, but traditionally they were sloppier about this, and we
apparently don't usually enable the warning flag for this.

> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
> index 712fc21..0487c2f 100644
> --- a/sys/ufs/ffs/ffs_vfsops.c
> +++ b/sys/ufs/ffs/ffs_vfsops.c
> @@ -764,25 +764,31 @@ ffs_mountfs(devvp, mp, td)
> 	cred = td ? td->td_ucred : NOCRED;
> 	ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
>
> +	KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));

Hrmph.

> 	dev = devvp->v_rdev;
> -	dev_ref(dev);
> +	if (atomic_cmpset_acq_ptr((uintptr_t *)&dev->si_mountpt, 0,
> +	    (uintptr_t)mp) == 0) {
> +		VOP_UNLOCK(devvp, 0);
> +		return (EBUSY);
> +	}
> 	DROP_GIANT();
> 	g_topology_lock();
> 	error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
> 	g_topology_unlock();
> 	PICKUP_GIANT();
> +	if (error != 0) {
> +		VOP_UNLOCK(devvp, 0);
> +		atomic_store_rel_ptr((uintptr_t *)&dev->si_mountpt, 0);

The store must be before the unlock (since we don't hold a reference to
the dev yet, the dev may go away on revoke after unlock).  Otherwise OK.
> +		return (error);
> +	}
> +	dev_ref(dev);

Bruce
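[To close the loop on that last point: it is a lifetime rule as much as a
memory-ordering one.  Once the vnode lock is dropped with no device
reference held, the cdev may be revoked and freed, so the release store
must land first.  Below is a standalone sketch of the rule, modeling the
vnode lock with a pthread mutex; the names and types are illustrative,
not kernel code.]

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct dev {
	_Atomic(uintptr_t) mountpt;
};

struct vnode {
	pthread_mutex_t lock;	/* while held, *dev cannot be destroyed */
	struct dev *dev;
};

static void
mount_error_path(struct vnode *vp)
{
	pthread_mutex_lock(&vp->lock);
	/* ... the mount claimed vp->dev->mountpt, then hit an error ... */

	/* Correct order: drop the claim while the lock still pins *dev. */
	atomic_store_explicit(&vp->dev->mountpt, 0, memory_order_release);
	pthread_mutex_unlock(&vp->lock);
	/*
	 * After the unlock, vp->dev may be revoked and freed at any time;
	 * storing into it here would be the use-after-free Bruce describes.
	 */
}

int
main(void)
{
	struct vnode vp;

	vp.dev = malloc(sizeof(*vp.dev));
	if (vp.dev == NULL)
		return (1);
	pthread_mutex_init(&vp.lock, NULL);
	atomic_init(&vp.dev->mountpt, (uintptr_t)&vp);	/* pretend claimed */
	mount_error_path(&vp);
	pthread_mutex_destroy(&vp.lock);
	free(vp.dev);
	return (0);
}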