Date:      Fri, 18 Aug 2017 21:52:12 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   when has a pNFS data server failed?
Message-ID:  <YTXPR01MB018952E64C3026F95165B45FDD800@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM>

This is kind of a "big picture" question that I thought I'd throw out.

As a brief background, I now have the code for running mirrored pNFS Data Servers
working for normal operation. You can look at:
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
if you are interested in details related to the pNFS server code/testing.

So, now I am facing the interesting part:
1 - The Metadata Server (MDS) needs to decide that a mirrored DS has failed at some
      point. Once that happens, it stops using the DS, etc.
--> This brings me to the question of "when should the MDS decide that the DS has
      failed and should be taken offline?".
      - I'm not up to date w.r.t. the TCP stack, so I'm not sure how long it will take
        for the TCP connection to decide that a DS server is no longer working and
        fail the TCP connection. I think it takes a fair amount of time, so I'm not
        sure if TCP connection loss is a good indicator of DS server failure or not?
      - It seems to me that the MDS should wait a fairly long time before failing the
        DS, since this will have a major impact on the pNFS server, requiring
        repair/resilvering by a sysadmin once it happens.
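
To make the "wait a fairly long time" idea concrete, here is a rough sketch of
one possible policy, not what the pNFS server code does. It assumes the MDS can
observe each RPC to a DS as a success or a failure, and it only marks the DS
failed after failures have continued for a whole grace period. The class name,
the method names and the grace_seconds tunable are all hypothetical.

```python
class DSFailureDetector:
    """Hypothetical helper: declares a DS failed only after a sustained outage."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds  # assumed tunable, not from the pNFS code
        self.first_failure = None           # start of the current outage, if any
        self.failed = False                 # sticky: only a manual repair would clear it

    def record_success(self, now):
        # A successful RPC ends the current outage, unless the DS is already failed.
        if not self.failed:
            self.first_failure = None

    def record_failure(self, now):
        # Remember when the outage began; later failures do not move that start time.
        if self.first_failure is None:
            self.first_failure = now
        # Fail the DS only once the outage has lasted the whole grace period.
        if now - self.first_failure >= self.grace_seconds:
            self.failed = True
        return self.failed
```

With something like this, a short network blip just restarts the clock on the
next successful RPC, while the "failed" state stays set once reached, matching
the point that recovery needs a sysadmin. What the grace period should be is
exactly the open question.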
So, any comments or thoughts on this? rick


