Date: Tue, 22 Aug 2017 19:51:11 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Ronald Klop <ronald-lists@klop.ws>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: when has a pNFS data server failed?
Message-ID: <YTXPR01MB0189D2D15AF6AA25FCF7E08FDD840@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <op.y5c7rrupkndu52@klop.ws>
References: <YTXPR01MB018952E64C3026F95165B45FDD800@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM>, <op.y5c7rrupkndu52@klop.ws>
Ronald Klop wrote:
>On Fri, 18 Aug 2017 23:52:12 +0200, Rick Macklem <rmacklem@uoguelph.ca>
>wrote:
>> This is kind of a "big picture" question that I thought I'd throw out.
>>
>> As a brief background, I now have the code for running mirrored pNFS
>> Data Servers working for normal operation. You can look at:
>> http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
>> if you are interested in details related to the pNFS server code/testing.
>>
>> So, now I am facing the interesting part:
>> 1 - The Metadata Server (MDS) needs to decide that a mirrored DS has
>>     failed at some point. Once that happens, it stops using the DS, etc.
>>     --> This brings me to the question of "when should the MDS decide
>>         that the DS has failed and should be taken offline?".
>>     - I'm not up to date w.r.t. the TCP stack, so I'm not sure how long
>>       it will take for the TCP connection to decide that a DS server is
>>       no longer working and fail the TCP connection. I think it takes a
>>       fair amount of time, so I'm not sure if TCP connection loss is a
>>       good indicator of DS server failure or not?
>>     - It seems to me that the MDS should wait a fairly long time before
>>       failing the DS, since this will have a major impact on the pNFS
>>       server, requiring repair/resilvering by a sysadmin once it happens.
>> So, any comments or thoughts on this? rick
>
>This is a quite common problem for all clustered/connected systems. I
>think there is no general answer. And there are a lot of papers written
>about it.
If you have a suggestion for one good paper, I might be willing to read it.
The short answer is that I'm retired after 30 years of working for a
university and have roughly zero interest in reading academic papers.

>For example: in NFS you have the 'soft' option. It is recommended not to
>use it, and I can imagine that being right if your home-dir or /usr is
>mounted over NFS, but at work I want my http-servers to not hang and just
>give an I/O error when the backend fileserver with the data is gone.
>Something similar happens here.
Yes. However, the analogy only goes so far, in that a failure of a "soft"
mount affects the integrity of the file if it is a write that fails.
In this case, there shouldn't be data corruption/loss, although there may
be degraded performance during the mirror failure and the subsequent
resilvering.
(A closer analogy might be a drive failure in a mirrored configuration
with another drive. These days drive hardware does try to indicate
"hardware health", which the mirrored server may not provide, at least in
the early version.)

>Doesn't the protocol definition say something about this?
Nope, except for some "on the wire" information that the pNFS client can
provide to indicate to the MDS that it is having problems with a DS.
(The RFCs deal with what goes on the wire, not with how servers get
implemented.)

>Or what do other implementations do?
I have no idea. At this point, all extant pNFS server implementations are
proprietary blobs, such as a Netapp clustered configuration. I've only
seen "high level" white papers (one notch away from marketing).

To be honest, I think the answer for version 1 will come down to:
How long should the MDS try to communicate with the DS before it gives up
and considers it failed?
It will probably be settable via a sysctl, but that does need a reasonable
default value.
(A "very large" value would indicate "leave it for the sysadmin to decide
and do manually".)
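To make that concrete, here is a rough sketch of what such a tunable
failure timer on the MDS might look like. All of the names here (the
vfs.nfsd.pnfsds_failtimeout sysctl, struct pnfsds, pnfsds_hasfailed())
are hypothetical, not code from the patch:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>
#include <sys/time.h>

SYSCTL_DECL(_vfs_nfsd);

struct pnfsds {			/* hypothetical per-DS state on the MDS */
	time_t	ds_firstfail;	/* 0, or when RPCs to this DS began failing */
};

/*
 * Seconds of failed DS RPCs before the MDS marks a mirrored DS failed.
 * A "very large" setting means "leave it for the sysadmin to do manually".
 */
static int pnfsds_failtimeout = 600;
SYSCTL_INT(_vfs_nfsd, OID_AUTO, pnfsds_failtimeout, CTLFLAG_RW,
    &pnfsds_failtimeout, 0,
    "Seconds of failed DS RPCs before a mirrored DS is marked failed");

/*
 * Called after each failed RPC attempt to a DS.  ds_firstfail is reset
 * to 0 whenever an RPC to the DS succeeds.  Returns 1 once the DS has
 * been failing for longer than the timeout and should be taken offline.
 */
static int
pnfsds_hasfailed(struct pnfsds *ds)
{

	if (ds->ds_firstfail == 0)
		ds->ds_firstfail = time_uptime;
	return (time_uptime - ds->ds_firstfail >= pnfsds_failtimeout);
}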
I also think there might be certain error returns from sosend()/soreceive()
that may want special handling. A simple example I experienced in recent
testing was...
- One system was misconfigured with the same IP# as one of the DS systems.
  After fixing the misconfiguration, the pNFS server was wedged because it
  had a bogus arp entry, so it couldn't talk to the one mirror.
  --> This was easily handled by an "arp -d" done by me on the MDS, but if
      the MDS had given up on the DS before I did that, it would have been
      a lot more work to fix. (The bogus arp entry had a very long timeout
      on it.)

Anyhow, thanks for the comments and we'll see if others have comments, rick
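ps: The sort of special casing I have in mind would look roughly like the
sketch below (again hypothetical, reusing struct pnfsds from the sketch
above). Errors that suggest a local routing/arp problem would not count
against the DS failure timer, since the DS itself may be fine:

#include <sys/errno.h>

/*
 * Classify an RPC transport error for a DS.  Returns 1 if the error
 * should count toward declaring the DS failed, 0 if not.
 */
static int
pnfsds_counterror(struct pnfsds *ds, int error)
{

	switch (error) {
	case EHOSTDOWN:
	case EHOSTUNREACH:
	case ENETUNREACH:
		/*
		 * These suggest a local routing or arp problem (like the
		 * bogus arp entry above) rather than a dead DS, so leave
		 * the failure timer alone and let the sysadmin fix the
		 * network.
		 */
		return (0);
	default:
		/* ETIMEDOUT, ECONNRESET, ...: start/continue the timer. */
		if (ds->ds_firstfail == 0)
			ds->ds_firstfail = time_uptime;
		return (1);
	}
}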