Date: Fri, 18 Aug 2017 21:52:12 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: when has a pNFS data server failed?
Message-ID: <YTXPR01MB018952E64C3026F95165B45FDD800@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM>
This is kind of a "big picture" question that I thought I'd throw out.

As a brief background, I now have the code for running mirrored pNFS Data Servers working for normal operation. You can look at:
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
if you are interested in details related to the pNFS server code/testing.

So, now I am facing the interesting part:
1 - The Metadata Server (MDS) needs to decide that a mirrored DS has failed at some point. Once that happens, it stops using the DS, etc.
--> This brings me to the question of "when should the MDS decide that the DS has failed and should be taken offline?".
- I'm not up to date w.r.t. the TCP stack, so I'm not sure how long it will take for the TCP connection to decide that a DS server is no longer working and fail the TCP connection. I think it takes a fair amount of time, so I'm not sure if TCP connection loss is a good indicator of DS server failure or not?
- It seems to me that the MDS should wait a fairly long time before failing the DS, since this will have a major impact on the pNFS server, requiring repair/resilvering by a sysadmin once it happens.

So, any comments or thoughts on this?

rick
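
One way to picture the "wait a fairly long time before failing the DS" idea is a small failure detector that only declares a DS failed after several consecutive errors AND a minimum amount of elapsed time since the first error. The sketch below is purely illustrative and is not taken from the pNFS server code: the class name, method names and both thresholds (3 failures, 300 seconds) are assumptions made up for the example, and a real MDS would need to choose its own policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DSFailureDetector:
    """Hypothetical helper: decides when a mirrored DS should be taken offline."""

    min_failures: int = 3          # assumed threshold, not from the original post
    grace_seconds: float = 300.0   # assumed "fairly long" wait, not from the post
    first_failure_at: Optional[float] = None
    consecutive_failures: int = 0

    def record_success(self) -> None:
        # Any successful RPC to the DS clears the failure streak.
        self.first_failure_at = None
        self.consecutive_failures = 0

    def record_failure(self, now: float) -> bool:
        # Remember when the current streak of failures started.
        if self.first_failure_at is None:
            self.first_failure_at = now
        self.consecutive_failures += 1
        return self.is_failed(now)

    def is_failed(self, now: float) -> bool:
        # Failed only if the streak is both long enough and old enough.
        if self.first_failure_at is None:
            return False
        waited = now - self.first_failure_at
        return (self.consecutive_failures >= self.min_failures
                and waited >= self.grace_seconds)
```

With these assumed defaults, three errors within 20 seconds would not take the DS offline, but a further error more than 300 seconds after the first one would. Requiring both conditions is just one possible design choice; it trades slower detection for fewer spurious repairs.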