From owner-freebsd-geom@freebsd.org Mon Nov 27 13:08:45 2017 Return-Path: Delivered-To: freebsd-geom@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 24E2FE4FCD2 for ; Mon, 27 Nov 2017 13:08:45 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from constantine.ingresso.co.uk (unknown [IPv6:2a02:b90:3002:411::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DA97870619 for ; Mon, 27 Nov 2017 13:08:44 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from dilbert.ingresso.co.uk ([2a02:b90:3002:411::6]) by constantine.ingresso.co.uk with esmtpsa (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.89 (FreeBSD)) (envelope-from ) id 1eJJ9M-0006ch-Km for freebsd-geom@freebsd.org; Mon, 27 Nov 2017 13:08:40 +0000 Received: from petefrench by dilbert.ingresso.co.uk with local (Exim 4.89 (FreeBSD)) (envelope-from ) id 1eJJ9M-000FSw-GZ for freebsd-geom@freebsd.org; Mon, 27 Nov 2017 13:08:40 +0000 To: freebsd-geom@freebsd.org Subject: GELI strangeness with gstat Message-Id: From: Pete French Date: Mon, 27 Nov 2017 13:08:40 +0000 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Nov 2017 13:08:45 -0000 So, I have a set of machines running rul disc encryption with GELI. The output from gstat on an example one looks something like this: 0 27 3 100 0.4 16 100 0.1 1.1| ada0p4 0 27 3 100 0.9 16 100 0.2 1.3| ada0p4.eli I uapgraded a couple of thme to much faster CPUs - the output then started looking like this: 0 146 0 0 0.0 125 604 0.1 5.7| ada0p4 2 146 0 0 0.0 125 604 0.1 104.3| ada0p4.eli ...so the .eli device is now running at 100% despite the underlying disc only being about 6% busy. This was software encryption - my assumption was that the faster COU's were now enabling me to overload the encryption somehow, so I enabled AES-NI on the COU. Now I have hardware encryption. But the output from gstat still looks the same. Whats going on here ? Its very ouzzlking. What is even odder is that these machines are ina HAST pair, and the secondary side looks fine - i.e. only a few percent busy on the disc and the encrypted device. If I sap roles then the efect persists - the HAST primary has a massively busy ELI device. I realise the oprimary will be doing reads as well as writes, but as you can see from the snapshot above, its not singificany compares to the writes, and the effect is also ther when the load is dominated by writes. I am teoprted to think that gstat is being screwy here, but it bothers me not knowing (especially as I am trying to tarck diwn bottlenecks in the system), Anyone got any opinions on what might be showing up here ? -pete. From owner-freebsd-geom@freebsd.org Mon Nov 27 15:29:22 2017 Return-Path: Delivered-To: freebsd-geom@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5B1ABE54EA9; Mon, 27 Nov 2017 15:29:22 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2766977653; Mon, 27 Nov 2017 15:29:21 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from compute6.internal (compute6.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id 9910B209F9; Mon, 27 Nov 2017 10:29:14 -0500 (EST) Received: from frontend2 ([10.202.2.161]) by compute6.internal (MEProxy); Mon, 27 Nov 2017 10:29:14 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=samsco.org; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-sender :x-me-sender:x-sasl-enc; s=fm1; bh=MHh2Qhy/trxoyPVOFzbQh5tKlSTVu Pvo190/Bdb/M5w=; b=DpDiqHRqwco+jcWBg0YjR06Th+9defM8k+vSKzMdzLT4i MVUEsVLNvKgxeqhql3PemZvGgH5wuHpOGiU/XHp/ZecDHTTY6COOZ5i361tI3Ccg dK1XGPfx1Ijawa1oiWS034WNGrhcfZEGobc1b1qmwrVQ7DVRVjaCjgPDjWlB7vy3 uUSTIypyJ9O5OT3kg6uFwoq/PVuv1RQ7WprvajNFQs9pmbvj7Og/9bxi+K7D5jgl RaPGzIoRbmIFsHMg6flns2T8Qo/NR79XbNozX1rmkPShigFSCt7ukS1pIP8w+3Cr f8vTJ75PzX/lHi7y6tuBIVpExOCNzgHWpv7oMy3WQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; bh=MHh2Qh y/trxoyPVOFzbQh5tKlSTVuPvo190/Bdb/M5w=; b=A2uJyB7oFCSO0UNCw6TB/V I2kBiWs9brQFLhQfD6q+RLghECWtdg323jvRLzAmmlNm7XpvqWsZ467/4FgNCN7G N7cd5ci1pjLryyl8P8IiLM3aKM73VKY0Iv7OIG3/vxBPZR4Pp7ZJfvfikI6L3qj2 sppsejgEmaEpkwmeingIvxgQAj1/lhWYJrZ+vSEX9PnETEvrIBkYrK5Sr1jVvrAb fRRG724bc30i4Y0iIgRZNH+0i6/BXbWGxJ2+ZuW0HrxDUgQ/76NuagebA0WnD5b5 DqyIIjQy3P12rADec16JZsLV7Vzhta3O4oF4H/0EAD+uPn+icWd9pDI0+6IocBgA == X-ME-Sender: Received: from [192.168.0.119] (unknown [161.97.249.191]) by mail.messagingengine.com (Postfix) with ESMTPA id ED66F24009; Mon, 27 Nov 2017 10:29:13 -0500 (EST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 11.1 \(3445.4.7\)) Subject: Re: add BIO_NORETRY flag, implement support in ata_da, use in ZFS vdev_geom From: Scott Long In-Reply-To: <33101e6c-0c74-34b7-ee92-f9c4a11685d5@FreeBSD.org> Date: Mon, 27 Nov 2017 08:29:11 -0700 Cc: Warner Losh , FreeBSD FS , freebsd-geom@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <783CA790-C0D3-442D-A5A2-4CB424FCBD62@samsco.org> References: <391f2cc7-0036-06ec-b6c9-e56681114eeb@FreeBSD.org> <64f37301-a3d8-5ac4-a25f-4f6e4254ffe9@FreeBSD.org> <39E8D9C4-6BF3-4844-85AD-3568A6D16E64@samsco.org> <33101e6c-0c74-34b7-ee92-f9c4a11685d5@FreeBSD.org> To: Andriy Gapon X-Mailer: Apple Mail (2.3445.4.7) X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Nov 2017 15:29:22 -0000 > On Nov 25, 2017, at 10:36 AM, Andriy Gapon wrote: >=20 > On 25/11/2017 12:54, Scott Long wrote: >>=20 >>=20 >>> On Nov 24, 2017, at 10:17 AM, Andriy Gapon wrote: >>>=20 >>>=20 >>>>> IMO, this is not optimal. I'd rather pass BIO_NORETRY to the = first read, get >>>>> the error back sooner and try the other disk sooner. Only if I = know that there >>>>> are no other copies to try, then I would use the normal read with = all the retrying. >>>>>=20 >>>>=20 >>>> I agree with Warner that what you are proposing is not correct. It = weakens the >>>> contract between the disk layer and the upper layers, making it = less clear who is >>>> responsible for retries and less clear what =E2=80=9CEIO=E2=80=9D = means. That contract is already >>>> weak due to poor design decisions in VFS-BIO and GEOM, and Warner = and I >>>> are working on a plan to fix that. >>>=20 >>> Well... I do realize now that there is some problem in this area, = both you and >>> Warner mentioned it. But knowing that it exists is not the same as = knowing what >>> it is :-) >>> I understand that it could be rather complex and not easy to = describe in a short >>> email=E2=80=A6 >>>=20 >>=20 >> There are too many questions to ask, I will do my best to keep the = conversation >> logical. First, how do you propose to distinguish between EIO due to = a lengthy >> set of timeouts, vs EIO due to an immediate error returned by the = disk hardware? >=20 > At what layer / component? > If I am the issuer of the request then I know how I issued that = request and what > kind of request it was. If I am an intermediate layer, then what do I = care. >=20 I don=E2=80=99t understand your response, I think we are talking about 2 = different things. >> CAM has an extensive table-driven error recovery protocol who=E2=80=99s= purpose is to >> decide whether or not to do retries based on hardware state = information that is >> not made available to the upper layers. Do you have a test case that = demonstrates >> the problem that you=E2=80=99re trying to solve? Maybe the error = recovery table is wrong >> and you=E2=80=99re encountering a case that should not be retried. = If that=E2=80=99s what=E2=80=99s going on, >> we should fix CAM instead of inventing a new work-around. >=20 > Let's assume that I am talking about the case of not being able to = read an HDD > sector that is gone bad. > Here is a real world example: >=20 > Jun 16 10:40:18 trant kernel: ahcich0: NCQ error, slot =3D 20, port =3D = -1 > Jun 16 10:40:18 trant kernel: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. = ACB: 60 > 00 00 58 62 40 2c 00 00 08 00 00 > Jun 16 10:40:18 trant kernel: (ada0:ahcich0:0:0:0): CAM status: ATA = Status Error > Jun 16 10:40:18 trant kernel: (ada0:ahcich0:0:0:0): ATA status: 41 = (DRDY ERR), > error: 40 (UNC ) > Jun 16 10:40:18 trant kernel: (ada0:ahcich0:0:0:0): RES: 41 40 68 58 = 62 40 2c 00 > 00 00 00 > Jun 16 10:40:18 trant kernel: (ada0:ahcich0:0:0:0): Retrying command > Jun 16 10:40:20 trant kernel: ahcich0: NCQ error, slot =3D 22, port =3D = -1 ... > I do not see anything wrong in what CAM / ahci /ata_da did here. > They did what I would expect them to do. They tried very hard to get = data that > I told them I need. Two things I see here. The first is that the drive is trying for 2 = seconds to get good data off of the media. The second is that it=E2=80=99s failing and = reporting the error as uncorrectable. I think that retries at the OS/driver layer are therefore useless; the drive is already doing a bunch of its = own retries and is failing, and is telling you that it=E2=80=99s failing. In the past, = the conventional wisdom has been to do retries, because 30 years ago drives had minimal firmware and = didn=E2=80=99t do a good job at managing data integrity. Now they do an extensive amount = of analysis-driven error correction and retries, so I think it=E2=80=99s = time to change the=20 conventional wisdom. I=E2=80=99d propose that for direct-attach SSDs = and HDDs we treat this error as non-retriable. Normally this would be a one-line change, but I = think it needs to be nuanced to distinguish between optical drives, simple flash media = drives, and regular HDDs and SSDs. An interim solution would be to just set the kern.cam.ada.retry_count to = 0. >=20 > Timestamp of the first error is Jun 16 10:40:18. > Timestamp of the last error is Jun 16 10:40:27. > So, it took additional 9 seconds to finally produce EIO. > That disk is a part of a ZFS mirror. If the request was failed after = the first > attempt, then ZFS would be able to get the data from a good disk much = sooner. It=E2=80=99s the fault of ZFS for not reading all members in parallel. = I ask people to consider and possibly accept that ZFS is not perfect. >=20 > And don't take me wrong, I do NOT want CAM or GEOM to make that = decision by > itself. I want ZFS to be able to tell the lower layers when they = should try as > hard as they normally do and when they should report an I/O error as = soon as it > happens without any retries. >=20 Again, I=E2=80=99m not sure I understand your answer. It is the job of = the disk drivers to reliably complete storage requests, not to just try once and then assume = that an upper layer will figure things out. As I said in my previous email, the = upper layers, and in this case I mean everything above the drivers and CAM, do not = have the error information to know why something failed, and thus have only = limited knowledge of what to do to remediate a failure. The error recovery = knowledge is in CAM and the drivers. It sounds like you=E2=80=99re proposing to make these simple, dumb = layers and move the error recovery up to ZFS. I understand what you=E2=80=99re saying about = wanting this being opt-in and not affecting other filesystems, but I think that you are missing the point of what = CAM does and provides. By definition the =E2=80=98da=E2=80=99 and =E2=80=98ada=E2=80=99= drivers provide reliable I/O transactions to the drive. That means that they handle error recovery, retries, etc. = If you do not want reliable I/O transactions, then you should not use the da/ada = drivers, and instead should either write your own periph driver (let=E2=80=99s call = it =E2=80=9Czfsda=E2=80=9D), or you should have ZFS communicate with the disks via the pass driver. That is = what userland tools like cdparanoia do when they want to bypass the = traditional operational characteristics and constraints of the standard periphs. Think of it this way, would you propose a flag to a TCP socket that = tells TCP to not do retransmits? No! If you wanted that semantic then you=E2=80=99d = use UDP and invent your own retry/reliability layer. I think it=E2=80=99s a big = job to re-invent error recovery into ZFS, while it would be relatively easy to adapt the CAM retry policy to the behavior of modern disks, i.e. believe = them when they say that there=E2=80=99s an uncorrectable read error and = don=E2=80=99t blindly attempt a retry. >> Second, what about disk subsystems that do retries internally, out of = the control >> of the FreeBSD driver? This would include most hardware RAID = controllers. >> Should what you are proposing only work for a subset of the kinds of = storage >> systems that are available and in common use? >=20 > Yes. I do not worry about things that are beyond my control. > Those subsystems would behave as they do now. So, nothing would get = worse. It gets worse because it builds inconsistency into the system and makes = expectations for behavior unclear. That turns into more hacky and incorrect code = over time. >=20 >> Third, let=E2=80=99s say that you run out of alternate copies to try, = and as you stated >> originally, that will force you to retry the copies that had returned = EIO. How >> will you know when you can retry? How will you know how many times = you >> will retry? How will you know that a retry is even possible? >=20 > I am continuing to use ZFS as an example. It already has all the = logic built > in. If all vdev zio-s (requests to mirror members as an example) = fail, then > their parent zio (a logical read from the mirror) will be retried (by = ZFS) and > when ZFS retries it sets a special flag (ZIO_FLAG_IO_RETRY) on that = zio and on > its future child zio-s. >=20 > Essentially, my answer is you have to program it correctly, there is = no magic. Again, I think that you=E2=80=99re missing the point of what I=E2=80=99m = asking. All that ZFS can see is =E2=80=9CEIO=E2=80=9D. Maybe the EIO was generated because = of an uncorrectable media error, and all attempts to retry will be futile. Maybe it was = generated because of a correctable error, and a retry might be useful. There are = many, many, many error cases that are tested for an handled in CAM. You are = collapsing them down to 1 case, and I don=E2=80=99t agree with that. >=20 >> Should the retries >> be able to be canceled? >=20 > I think that this is an orthogonal question. > I do not have any answer and I am not ready to discuss this at all. >=20 Please be ready to discuss this. >> Why is overloading EIO so bad? brelse() will call bdirty() when a = BIO_WRITE >> command has failed with EIO. Calling bdirty() has the effect of = retrying the I/O. >> This disregards the fact that disk drivers only return EIO when = they=E2=80=99ve decided >> that the I/O cannot be retried. It has no termination condition for = the retries, and >> will endlessly retry I/O in vain; I=E2=80=99ve seen this quite = frequently. It also disregards >> the fact that I/O marked as B_PAGING can=E2=80=99t be retried in this = fashion, and will >> trigger a panic. Because we pretend that EIO can be retried, we are = left with >> a system that is very fragile when I/O actually does fail. Instead = of adding >> more special cases and blurred lines, I want to go back to enforcing = strict >> contracts between the layers and force the core parts of the system = to respect >> those contracts and handle errors properly, instead of just retrying = and >> hoping for the best. >=20 > So, I suggest that the buffer layer (all the b* functions) does not = use the > proposed flag. Any problems that exist in it should be resolved = first. > ZFS does not use that layer. >=20 I think it=E2=80=99s foolish to propose a change in the disk API and = then say =E2=80=9Conly ZFS should use this=E2=80=9D >>> But then, this flag is optional, it's off by default and no one is = forced to >>> used it. If it's used only by ZFS, then it would not be horrible. >>> Unless it makes things very hard for the infrastructure. >>> But I am circling back to not knowing what problem(s) you and Warner = are >>> planning to fix. >>>=20 >>=20 >> Saying that a feature is optional means nothing; while consumers of = the API >> might be able to ignore it, the producers of the API cannot ignore = it. It is >> these producers who are sick right now and should be fixed, instead = of >> creating new ways to get even more sick. >=20 > I completely agree. > But which producers of the API do you mean specifically? > So far, you mentioned only the consumer level problems with the = b-layer. >=20 > Having said all of the above, I must admit one thing. > When I proposed BIO_NORETRY I had only the simplest GEOM topology in = mind: > ZFS -> [partition] -> disk. > Now that I start to think about more complex topologies I am not = really sure how > the flag should be handled by geom-s with complex internal behavior. = If that > can be implemented reasonably and clearly, if the flag will create a = big mess. > E.g., things like gmirrors on top of gmirrors and so on. > Maybe the flag, if it ever accepted, should never be propagated = automatically. > Only geom-s that are aware of it should propagate or request it. That = should be > safer. >=20 Yes, it gets chaotic and unreliable once you have multiple layers each = thinking that they know better than the previous layer. That is the situation = that FreeBSD is currently in with CAM, GEOM, and VFS-BIO. I do not propose to make = it even more chaotic. Through just basic rules of abstraction, basic computer = science principles, it follows that the layers of software closest to the = operation (in this case, the hardware) have the least abstraction and the most amount of information on how to act. Saying that ZFS knows best how the hardware = works, when it has no access to the actual control channel information of the = hardware, is confusing and illogical to me. I think that there=E2=80=99s a legitimate problem with CAM error = recovery trying too hard to do retries when the disk is clearly telling it that it=E2=80=99s = futile. I don=E2=80=99t have a quick patch for that because I would like to think how to make it dependent on = the media type. However, it=E2=80=99s something that I=E2=80=99m willing to = discuss further and work on. Scott From owner-freebsd-geom@freebsd.org Tue Nov 28 07:08:12 2017 Return-Path: Delivered-To: freebsd-geom@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EFFD1DFBFA3 for ; Tue, 28 Nov 2017 07:08:12 +0000 (UTC) (envelope-from jmg@gold.funkthat.com) Received: from gold.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gate2.funkthat.com", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CFFE0794AE for ; Tue, 28 Nov 2017 07:08:12 +0000 (UTC) (envelope-from jmg@gold.funkthat.com) Received: from gold.funkthat.com (localhost [127.0.0.1]) by gold.funkthat.com (8.15.2/8.15.2) with ESMTPS id vAS78BHU048455 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 27 Nov 2017 23:08:11 -0800 (PST) (envelope-from jmg@gold.funkthat.com) Received: (from jmg@localhost) by gold.funkthat.com (8.15.2/8.15.2/Submit) id vAS78BRp048454; Mon, 27 Nov 2017 23:08:11 -0800 (PST) (envelope-from jmg) Date: Mon, 27 Nov 2017 23:08:11 -0800 From: John-Mark Gurney To: Pete French Cc: freebsd-geom@freebsd.org Subject: Re: GELI strangeness with gstat Message-ID: <20171128070811.GZ42467@funkthat.com> Mail-Followup-To: Pete French , freebsd-geom@freebsd.org References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Operating-System: FreeBSD 11.0-RELEASE-p7 amd64 X-PGP-Fingerprint: D87A 235F FB71 1F3F 55B7 ED9B D5FF 5A51 C0AC 3D65 X-Files: The truth is out there X-URL: https://www.funkthat.com/ X-Resume: https://www.funkthat.com/~jmg/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? User-Agent: Mutt/1.6.1 (2016-04-27) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (gold.funkthat.com [127.0.0.1]); Mon, 27 Nov 2017 23:08:11 -0800 (PST) X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Nov 2017 07:08:13 -0000 Pete French wrote this message on Mon, Nov 27, 2017 at 13:08 +0000: > So, I have a set of machines running rul disc > encryption with GELI. The output from gstat on > an example one looks something like this: > > 0 27 3 100 0.4 16 100 0.1 1.1| ada0p4 > 0 27 3 100 0.9 16 100 0.2 1.3| ada0p4.eli > > I uapgraded a couple of thme to much faster CPUs - the output then > started looking like this: > > 0 146 0 0 0.0 125 604 0.1 5.7| ada0p4 > 2 146 0 0 0.0 125 604 0.1 104.3| ada0p4.eli > > ...so the .eli device is now running at 100% despite > the underlying disc only being about 6% busy. > > This was software encryption - my assumption was that the faster COU's > were now enabling me to overload the encryption somehow, so I enabled > AES-NI on the COU. Now I have hardware encryption. But the output from gstat > still looks the same. If you just did a kldload aesni, but did not reattach the geli device, then you are still using software encryption... You should see something like this: GEOM_ELI: Device gpt/werner.eli created. GEOM_ELI: Encryption: AES-XTS 256 GEOM_ELI: Crypto: hardware if you are using AES-NI... Also, what version of FreeBSD are you using? If you're using pre-10.0-R, the performance increase from using AES-NI is only marginal... The above does make it look like you're disks are CPU bound by the encryption... > Whats going on here ? Its very ouzzlking. What is even odder is that these > machines are ina HAST pair, and the secondary side looks fine - i.e. only > a few percent busy on the disc and the encrypted device. If I sap > roles then the efect persists - the HAST primary has a massively busy > ELI device. > > I realise the oprimary will be doing reads as well as writes, but as you can > see from the snapshot above, its not singificany compares to the > writes, and the effect is also ther when the load is dominated by writes. > > I am teoprted to think that gstat is being screwy here, but it bothers me not > knowing (especially as I am trying to tarck diwn bottlenecks in the system), > > Anyone got any opinions on what might be showing up here ? To get a better idea of what is happening, you can run top -S to see how much CPU the geli threads are using. Also, how many eli volumes do you have? In my case, I have 13... In order to reduce the load on the scheduler, I have: kern.geom.eli.threads="1" in /boot/loader.conf in order to prevent creating 6 threads (I have a 6 core CPU) per ELI device, or 78 threads in total... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-geom@freebsd.org Tue Nov 28 09:37:01 2017 Return-Path: Delivered-To: freebsd-geom@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 14D19DFF4CD for ; Tue, 28 Nov 2017 09:37:01 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from constantine.ingresso.co.uk (unknown [IPv6:2a02:b90:3002:411::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D6E4D7D917 for ; Tue, 28 Nov 2017 09:37:00 +0000 (UTC) (envelope-from petefrench@ingresso.co.uk) Received: from cpc73666-dals20-2-0-cust303.20-2.cable.virginm.net ([82.47.237.48] helo=foula.drayhouse.twisted.org.uk) by constantine.ingresso.co.uk with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.89 (FreeBSD)) (envelope-from ) id 1eJcK2-000FDZ-3c for freebsd-geom@freebsd.org; Tue, 28 Nov 2017 09:36:58 +0000 Subject: Re: GELI strangeness with gstat To: freebsd-geom@freebsd.org References: <20171128070811.GZ42467@funkthat.com> From: Pete French Message-ID: <11df15ff-7a30-3698-ff3b-ffced80a78c8@ingresso.co.uk> Date: Tue, 28 Nov 2017 09:37:03 +0000 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: <20171128070811.GZ42467@funkthat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-GB Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Nov 2017 09:37:01 -0000 Tnaks for the reply.... > If you just did a kldload aesni, but did not reattach the geli device, > then you are still using software encryption... You should see something > like this: > GEOM_ELI: Device gpt/werner.eli created. > GEOM_ELI: Encryption: AES-XTS 256 > GEOM_ELI: Crypto: hardware > > if you are using AES-NI... Yes, thats exactly what I see - I rebooted the system (had to as I was enabling AES_NI in the BIOS) and am loading the crypto and aesni modules at boot time form loader.conf. I see this in dmesg: Enter passphrase for ada0p4: GEOM_ELI: Device ada0p4.eli created. GEOM_ELI: Encryption: AES-XTS 128 GEOM_ELI: Crypto: hardware GEOM_ELI: Device ada1p4.eli created. GEOM_ELI: Encryption: AES-XTS 128 GEOM_ELI: Crypto: hardware and 'geli list' says 'Crypto: hardware' in its output > Also, what version of FreeBSD are you using? If you're using pre-10.0-R, > the performance increase from using AES-NI is only marginal... Am using 11.1-STABLE from mid June - is it worth getting the last few weeks of updates ? I wouldnt have thought so as I havent see any crypto chnages go past, but I can give it a go... > The above does make it look like you're disks are CPU bound by the > encryption... Indeed. But the odd this is it only started happening *after* I upgraded the machine to much faster CPU's. I am suspecting (hoping!) its actually a statistical anomaly in gstat and not a real effect. > To get a better idea of what is happening, you can run top -S to see > how much CPU the geli threads are using. Ah, good idea. Hmmmm... that shows up 12 threads, each using about 0.2% CPU. > Also, how many eli volumes do you have? In my case, I have 13... In > order to reduce the load on the scheduler, I have: > kern.geom.eli.threads="1" This is interesting - my understanding was that with hardware encryption that setting did nothing. From the manpage: kern.geom.eli.threads: 0 Specifies how many kernel threads should be used for doing software cryptography. Its purpose is to increase performance on SMP systems. If hardware acceleration is available, only one thread will be started. If set to 0, CPU-bound thread will be started for every active CPU. So my expectation would be that I only get 1 thread - but actuyally I see 12 in 'top -S'. Again, is that a bug or out of date documentation ? I only have two volumes, and am runnign ZFS over the top with compresison enabled, which I think only uses a single write thread, so maybe reducing that would help. But theres still a very large discrepancy between what all of my other jetricsare showing me and gstat. Te underlying discs are SSD's by the way, so I am fairly sure thats not a bottleneck (and gstats dosnt show the undrelying drive busy). -pete. From owner-freebsd-geom@freebsd.org Fri Dec 1 13:04:38 2017 Return-Path: Delivered-To: freebsd-geom@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2872FDFFE58 for ; Fri, 1 Dec 2017 13:04:38 +0000 (UTC) (envelope-from reply@west-art.net) Received: from west-art.net (west-art.net [162.144.110.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F328A1175 for ; Fri, 1 Dec 2017 13:04:37 +0000 (UTC) (envelope-from reply@west-art.net) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=west-art.net; s=default; h=Content-Transfer-Encoding:Content-Type: MIME-Version:List-Unsubscribe:Message-ID:Subject:Reply-To:From:To:Date:Sender :Cc:Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References:List-Id: List-Help:List-Subscribe:List-Post:List-Owner:List-Archive; bh=lwr3y09pR5ac/Whh7eavrPy0YYM7D2OY9UYtFu89nsw=; b=P2PSZsLX5KIzjobGSBzVyooacp e1n3mOaSk4AtbHr2tlNNeUDnt+IwQzmEQJjAXCW0lNSnMvKkxWra9ZuXr9B/KeUuL1enRCZNg933X sTbsDAam9MOh+fb1nrVvhfaj2+uH17Vjo0azD7yQKHjc+j0iBbjl/PHlF8/pKgMlsAttwOQWkR+4a +7VX1o9oV3VbCnzi+YwV1+NtvjhXVzlIu8V/GwIvKZYaffeiaNXguhiNwYJ3nu5Ed5EUffqOiz051 w8WU8J0fnktzdznc29ksLz7tecrs0HhrVot84mtkCRBEYG5qpsVKIhnucrcu7bde08sGSFowqHm4/ 33LT62AA==; Received: from westartn by server.west-art.net with local (Exim 4.89) (envelope-from ) id 1eKkzb-0007Cb-04 for freebsd-geom@freebsd.org; Fri, 01 Dec 2017 15:04:35 +0200 Date: Fri, 1 Dec 2017 13:04:14 +0000 To: freebsd-geom@freebsd.org From: Benefis Company Reply-To: reply@west-art.net Subject: Best Ballet Costume for Winter Performances! Message-ID: <905d18b16b2921cbcb60c3223af0bbeb@massmail.benefistutu.com> X-Mailer: Mailster 2.2.14 (5.2.22) X-Mailster: d290a9698c8fd1014b248f3f83cfa022 X-Mailster-Campaign: 121 X-Mailster-ID: 34695d011f7794251628561f772eda6d X-Message-ID: <5a215362ef85c-d290a9698c8fd1014b248f3f83cfa022-121-34695d011f7794251628561f772eda6d@massmail.benefistutu.com> MIME-Version: 1.0 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.west-art.net X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [500 500] / [47 12] X-AntiAbuse: Sender Address Domain - west-art.net X-Get-Message-Sender-Via: server.west-art.net: authenticated_id: westartn/from_h X-Authenticated-Sender: server.west-art.net: reply@west-art.net X-Source: /usr/bin/php X-Source-Args: /usr/bin/php X-Source-Dir: west-art.net:/public_html/massmail Content-Type: text/plain; charset=us-ascii X-Content-Filtered-By: Mailman/MimeDel 2.1.25 X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Dec 2017 13:04:38 -0000 [1] Winter Special Price! [2] This gorgeous Professional Stage Costume is just what you need for any show, audition, or performance. It is a perfect choice for the most renown ballets, such as Nutcracker, Sleeping Beauty and many more. Model: F 0001 Brand: Benefis Santa Claus [facebook] [3] [instagram] [4] [pinterest] [5] [twitter] [6] [linkedin] [7] [youtube] [8] [vkontakte] [9] Links: ------ [1] http://massmail.benefistutu.com/mailster/121/d290a9698c8fd1014b248f3f83cfa022/aHR0cHM6Ly9iZW5lZmlzdHV0dS5jb20vaW5kZXgucGhwP3JvdXRlPXByb2R1Y3QvcHJvZHVjdCZhbXA7c2VhcmNoPWYlMjUyMDAwMDEmYW1wO3Byb2R1Y3RfaWQ9MTI2MA [2] http://massmail.benefistutu.com/mailster/121/d290a9698c8fd1014b248f3f83cfa022/aHR0cHM6Ly9iZW5lZmlzdHV0dS5jb20vaW5kZXgucGhwP3JvdXRlPXByb2R1Y3QvcHJvZHVjdCZhbXA7c2VhcmNoPWYlMjUyMDAwMDEmYW1wO3Byb2R1Y3RfaWQ9MTI2MA/1 [3] http://massmail.benefistutu.com/mailster/121/d290a9698c8fd1014b248f3f83cfa022/aHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL0JlbmVmaXMtQmFsbGV0LUNvc3R1bWVzLTEyMTc2ODAzNDgzNzcxMS8 [4] http://massmail.benefistutu.com/mailster/121/d290a9698c8fd1014b248f3f83cfa022/aHR0cHM6Ly93d3cuaW5zdGFncmFtLmNvbS90ZXJlbnRpZXZhOTgxNi8 [5] http://massmail.benefistutu.com/mailster/121/d290a9698c8fd1014b248f3f83cfa022/aHR0cHM6Ly93d3cucGludGVyZXN0LmNvbS9iYmFsbGV0Y29zdHVtZXMv [6] http://massmail.benefistutu.com/mailster/121/d290a9698c8fd1014b248f3f83cfa022/aHR0cHM6Ly90d2l0dGVyLmNvbS9iZW5lZmlzc2hvcA [7] http://massmail.benefistutu.com/mailster/121/d290a9698c8fd1014b248f3f83cfa022/aHR0cHM6Ly93d3cubGlua2VkaW4uY29tL2luL2JlbmVmaXNzaG9wY29tLw [8] http://massmail.benefistutu.com/mailster/121/d290a9698c8fd1014b248f3f83cfa022/aHR0cHM6Ly93d3cueW91dHViZS5jb20vdXNlci9CZW5lZmlzc2hvcA [9] http://massmail.benefistutu.com/mailster/121/d290a9698c8fd1014b248f3f83cfa022/aHR0cHM6Ly92ay5jb20vaWQxNzgxNzY2OTg