From: Rick Macklem <rmacklem@uoguelph.ca>
To: Peter Eriksson, "freebsd-fs@freebsd.org"
Subject: Re: Suggestion for hardware for ZFS fileserver
Date: Fri, 21 Dec 2018 23:49:58 +0000

Peter Eriksson wrote:
[good stuff snipped]
>This has caused some interesting problems…
>
>First thing we noticed was that booting would take forever… Mounting
>the 20-100k filesystems _and_ enabling them to be shared via NFS is not
>done efficiently at all (for each filesystem it re-reads
>/etc/zfs/exports (a couple of times) before appending one line to the
>end). Repeat 20-100,000 times… Not to mention the big kernel lock for
>NFS “hold all NFS activity while we flush and reinstall all sharing
>information per filesystem” being done by mountd…

Yes, /etc/exports and mountd were implemented in the 1980s, when a dozen
file systems would have been a large server. Scaling to 10,000 or more
file systems wasn't even conceivable back then.

>Wish list item #1: A BerkeleyDB-based ’sharetab’ that replaces the
>horribly slow /etc/zfs/exports text file.
>Wish list item #2: A reimplementation of mountd and the kernel
>interface to allow a “diff” between the contents of the DB-based
>sharetab above to be input into the kernel instead of the brute-force
>way it’s done now.

The parser in mountd for /etc/exports is already an ugly beast and I
think implementing a "diff" version will be difficult, especially
figuring out what needs to be deleted.

I do have a couple of questions related to this:
1 - Would your case work if there was an "add these lines to
    /etc/exports"?
    (Basically adding entries for file systems, but not trying to delete
    anything previously exported. I am not a ZFS guy, but I think ZFS
    just generates another exports file and then gets mountd to export
    everything again.)
2 - Are all (or maybe most) of these ZFS file systems exported with the
    same arguments?
    - Here I am thinking that a "default-for-all-ZFS-filesystems" line
      could be put in /etc/exports that would apply to all ZFS file
      systems not exported by explicit lines in the exports file(s).
    This would be fairly easy to implement and would avoid trying to
    handle 1000s of entries.

In particular, #2 above could be easily implemented on top of what is
already there, using a new type of line in /etc/exports and handling
that as a special case by the NFS server code, when no specific export
for the file system to the client is found.
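
To make #2 a bit more concrete, here is a rough sketch of the lookup
order I have in mind. It is Python purely for illustration (the real
thing would be C in mountd and the NFS server), it ignores per-client
matching, and every name in it (find_export, explicit_exports,
zfs_default, the paths and option strings) is hypothetical. It also
assumes the server can tell whether a file system is ZFS.

```python
from typing import Dict, Optional


def find_export(path: str,
                fs_type: str,
                explicit_exports: Dict[str, str],
                zfs_default: Optional[str]) -> Optional[str]:
    """Return the export options for a file system, or None if not exported."""
    # An explicit line in the exports file(s) always wins.
    if path in explicit_exports:
        return explicit_exports[path]
    # No specific export found: fall back to the single default line,
    # but only for ZFS file systems.
    if fs_type == "zfs" and zfs_default is not None:
        return zfs_default
    # Anything else is simply not exported.
    return None


# Hypothetical data: one exception plus one default for everything else on ZFS.
explicit = {"/tank/special": "-maproot=root -network 10.0.0.0/24"}
default = "-sec=sys -network 10.0.0.0/8"

print(find_export("/tank/special", "zfs", explicit, default))
print(find_export("/tank/home/u1", "zfs", explicit, default))
print(find_export("/usr/local", "ufs", explicit, default))
```

With this shape the explicit table only has to hold the exceptions, so
it would not need one entry per ZFS file system.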

>(I’ve written some code that implements item #1 above and it helps
>quite a bit. Nothing near production quality yet though. I have looked
>at item #2 a bit too but not done anything about it.)
[more good stuff snipped]

Btw, although I put the questions here, I think a separate thread
discussing how to scale to 10000+ file systems might be useful. (On
freebsd-fs@ or freebsd-current@. The latter sometimes gets the attention
of more developers.)

rick