From: Rick Macklem
To: freebsd-fs@freebsd.org
CC: James Rose
Subject: pNFS mirror file distribution (big picture question?)
Date: Fri, 25 May 2018 21:14:04 +0000

Hi,

#1 The code currently in projects/pnfs-planb-server allows creation of sets
of mirrored data servers (DSs).
For example, the "-p" nfsd option argument:
    nfsv4-data0#nfsv4-data1,nfsv4-data2#nfsv4-data3
defines two mirrored sets of data servers with two servers in each one.
("#" separates mirrors within a mirror set.)

I did this a couple of years ago, in part because I thought having a well
defined "mirror" for a DS would facilitate mirror recovery. Now that I have
completed the mirror recovery code, having a defined mirror set is not needed.

#2 An alternate mirroring approach would be what I might call the
random/distributed approach, where each file is distributed on any two (or
more) of the DSs.
For this approach, the "-p" nfsd option argument:
    nfsv4-data0,nfsv4-data1,nfsv4-data2,nfsv4-data3
defines four DSs and a separate flag would say "two way mirroring", so each
file would be placed on 2 of the 4 DSs.

The question is "should I switch the code to approach #2?".

I won't call it elegant, but #1 is neat and tidy, since the sysadmin knows
that a data file ends up on either nfsv4-data0, nfsv4-data1 or
nfsv4-data2, nfsv4-data3.
Assuming the mirrored DSs in a set have the same amount of storage, they will
have the same amount of free space.
--> This implies that they will run out of space at the same time and the
    pNFS service won't be able to write to files on the mirror set.

With #2, one of the DSs will probably run out of space first. I think this
will cause a client trying to write a file on it to report a write error to
the Metadata server (MDS), and that will cause the DS to be taken offline.
Then, the write will succeed on the other mirror and things will continue to
run.
Eventually all the DSs will fill up, but hopefully a sysadmin can step in and
fix the "out of space" problem before that point.

Another advantage I can see for #2 is that it gives the MDS more flexibility
than #1 does when it chooses which DSs to create the data files on.
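
To make the two "-p" layouts concrete, here is a minimal Python sketch. It is
not the nfsd code. The function names are hypothetical, and the "most free
space first" placement policy is only an assumption used to illustrate the
extra freedom the MDS has under #2.

```python
def parse_mirror_sets(arg):
    """Approach #1: ',' separates mirror sets, '#' separates mirrors in a set."""
    return [group.split("#") for group in arg.split(",")]


def parse_ds_list(arg):
    """Approach #2: a flat, comma-separated list of DSs."""
    return arg.split(",")


def choose_mirrors(free_space, copies=2):
    """Pick `copies` distinct DSs for a new data file (approach #2).

    free_space maps a DS name to its free space in arbitrary units.
    Assumed policy: prefer the DSs with the most free space, breaking
    ties by name. Any other policy would fit the same interface.
    """
    if copies > len(free_space):
        raise ValueError("not enough DSs for the requested mirror level")
    ranked = sorted(free_space, key=lambda ds: (-free_space[ds], ds))
    return ranked[:copies]


if __name__ == "__main__":
    print(parse_mirror_sets("nfsv4-data0#nfsv4-data1,nfsv4-data2#nfsv4-data3"))
    print(parse_ds_list("nfsv4-data0,nfsv4-data1,nfsv4-data2,nfsv4-data3"))
    print(choose_mirrors({"nfsv4-data0": 10, "nfsv4-data1": 50,
                          "nfsv4-data2": 30, "nfsv4-data3": 50}))
```

Under #1 a file's DSs are fixed once its mirror set is picked. Under #2 any
pair from the flat list is a valid choice, so a nearly full DS can simply be
skipped for new files.
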
(It will be less "neat and tidy", but the sysadmin can find out on the MDS,
on a "per file" basis, which DSs store the data for a file.)

James Rose was asking about "manual migration". It is almost the same as what
is already done for mirror recovery and is a pretty trivial addition for #2.
For #1, it can be done, but is more work. "Manual migration" refers to a
sysadmin running a command that moves a data file from one DS to another.
(Others who are more clever than I could use the "manual migration" syscall
to implement automagic versions to try and balance storage use and I/O load.)

Given easier migration and what I think is better handling of "out of space"
failures, I am leaning towards switching the code to #2.

What do others think?

rick