From: Karl Denninger <karl@denninger.net>
To: freebsd-fs@freebsd.org
Date: Thu, 22 May 2014 07:52:03 -0500
Subject: Re: Turn off RAID read and write caching with ZFS?
Message-ID: <537DF2F3.10604@denninger.net>
In-Reply-To: <719056985.20140522033824@supranet.net>
List-Id: Filesystems List <freebsd-fs@freebsd.org>
On 5/22/2014 5:38 AM, Jeff Chan wrote:
> As mentioned before, we have a server with the LSI 2208 RAID chip, which
> apparently doesn't have HBA firmware available. (If anyone knows of one,
> please let me know.) Therefore we are running each drive as a separate,
> individual RAID0, and we've turned off the RAID hardware read and write
> caching on the claim that this performs better with ZFS, e.g.:
>
> http://forums.freenas.org/index.php?threads/disable-cache-flush.12253/
>
> "cyberjock, Apr 7, 2013
>
> Ahh. You have a RAID controller with on-card RAM. Based on my testing
> with 3 different RAID controllers that had RAM, plus benchmark and
> real-world tests, here are my recommended settings for ZFS users:
>
> 1. Disable your on-card write cache. Believe it or not, this improves
> write performance significantly. I was very disappointed by this
> finding, but it seems to be a universal truth. I upgraded one of the
> cards to 4GB of cache a few months before going to ZFS, and I'm
> disappointed that I wasted my money. It helped a LOT on the Windows
> server, but in FreeBSD it's a performance killer. :(
>
> 2. If your RAID controller supports a read-ahead cache, you should set
> it to either "disabled", the most "conservative" (smallest read-ahead),
> or "normal" (medium-size read-ahead). I found that "conservative" was
> better for random reads from lots of users, and "normal" was better for
> workloads that constantly read a file in order (such as copying a single
> very large file). If you choose anything else for the read-ahead size,
> the latency of your zpool will go way up, because every read by the
> zpool is multiplied many times over: the RAID card is constantly reading
> a bunch of sectors before and after the one sector or area requested."
>
> Does anyone have any comments or test results about this? I have not
> attempted to test it independently. Should we run with RAID hardware
> caching on or off?

That's mostly right.

Write caching is very evil in a ZFS world, because ZFS checksums each
block. If the filesystem gets back an "OK" for a block that is not
actually on the disk, ZFS will presume the checksum is ok. If that
assumption proves false down the road, you're going to have a very bad
day.

READ caching is not so simple. The problem is that in order to obtain
the best speed from a spinning piece of rust, you must read whole
tracks. If you don't, you take a latency penalty every time you want a
sector, because you must wait for the rust to pass under the head. If
you read a single sector and then come back to read a second one,
inter-sector sync is lost and you get to wait for another rotation.

Therefore, what you WANT for spinning rust in virtually all cases is for
all reads coming off the rust to be one full **TRACK** in size. If you
wind up using only one sector of that track, you still aren't hurt
materially, because you had to pay the rotational latency anyway as soon
as you moved the head.

Unfortunately, this stopped being easy to figure out quite a long time
ago in the disk-drive world, at least with the sort of certainty you
need to best-optimize a workload. It used to be that ST506-style drives
had 17 sectors per track and RLL 2,7 ones had 26. Then areal density
became the limit and variable geometry showed up, frustrating any
operating system (or disk controller!) that tried, at the driver level,
to issue one DMA command per physical track in order to capitalize on
the fact that all but the first sector read in a given rotation were
essentially "free".
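To put rough numbers on the whole-track argument, here is a
back-of-envelope sketch. The 7200 RPM spindle speed and the
sectors-per-track count are illustrative assumptions (as noted above,
real variable-geometry drives don't expose a single fixed value):

```python
# Rough rotational-latency arithmetic for a hypothetical 7200 RPM drive.
RPM = 7200
SECTORS_PER_TRACK = 500          # assumed; real zoned drives vary this per cylinder

rotation_ms = 60_000 / RPM       # one full rotation in milliseconds
avg_wait_ms = rotation_ms / 2    # average wait for a target sector to arrive
sector_ms = rotation_ms / SECTORS_PER_TRACK  # transfer time for one sector in passing

# A one-sector read pays ~avg_wait_ms of latency for ~sector_ms of data.
# Reading the WHOLE track costs at most one rotation total, so the
# remaining sectors on the track come along nearly for free.
print(f"one rotation:          {rotation_ms:.2f} ms")
print(f"avg rotational wait:   {avg_wait_ms:.2f} ms")
print(f"one-sector transfer:   {sector_ms:.4f} ms")
print(f"full-track worst case: {rotation_ms:.2f} ms")
```

At these assumed figures, the average rotational wait dwarfs the
per-sector transfer time, which is the whole case for track-sized reads.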
Modern drives typically try to compensate for their variable geometry
through their own read-ahead cache, but the exact details of the
algorithm are typically not exposed.

What I would love to find is a "buffered" controller that recognizes all
of this and works as follows:

1. Writes, when committed, are committed: no return is made until the
storage has written the data and claims it's on the disk. If the
sector(s) written are in the buffer memory (from a previous read in 2
below), then the write physically alters both the disk AND the buffer.

2. Reads are always one full track in size and go into the buffer memory
on an LRU basis. A read for a sector already in the buffer memory
results in no physical I/O taking place. The controller does not store
sectors per se in the buffer; it stores tracks. This requires that the
adapter be able to discern the *actual* underlying geometry of the drive
so it knows where the track boundaries are. Yes, I know drive caches
themselves try to do this, but how well do they manage? The evidence
suggests not particularly well.

Without this, read caching is a crapshoot that is difficult to tune and
very workload-dependent in terms of what delivers the best performance.
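The two points above can be sketched in a few lines. This is a toy model
under a big assumption the text itself flags as unrealistic: a fixed,
known geometry (`SECTORS_PER_TRACK`). All names here (`TrackCache`,
`read_track`, `write_sector`) are hypothetical, not any real
controller's interface:

```python
from collections import OrderedDict

SECTORS_PER_TRACK = 500  # assumed fixed geometry; real drives vary this


class TrackCache:
    """Toy LRU buffer keyed by track, not sector (hypothetical sketch)."""

    def __init__(self, disk, capacity_tracks):
        self.disk = disk                # backend with read_track()/write_sector()
        self.capacity = capacity_tracks
        self.tracks = OrderedDict()     # track number -> list of sector payloads
        self.physical_reads = 0

    def read_sector(self, lba):
        track, offset = divmod(lba, SECTORS_PER_TRACK)
        if track not in self.tracks:
            # Miss: point 2 -- one physical I/O fetches the WHOLE track.
            self.tracks[track] = self.disk.read_track(track)
            self.physical_reads += 1
            if len(self.tracks) > self.capacity:
                self.tracks.popitem(last=False)   # evict least recently used
        self.tracks.move_to_end(track)            # mark most recently used
        return self.tracks[track][offset]

    def write_sector(self, lba, data):
        # Point 1: write-through -- no "OK" until the disk has the data,
        # and any cached copy is updated so later reads stay coherent.
        self.disk.write_sector(lba, data)
        track, offset = divmod(lba, SECTORS_PER_TRACK)
        if track in self.tracks:
            self.tracks[track][offset] = data
```

With this policy, a second read of any sector on an already-cached track
costs no physical I/O, and a write never returns before the disk has it,
which is exactly the combination the paragraph above asks for.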
All you can do is tune (if you're able with a given controller) and test.

-- 
Karl
karl@denninger.net