Date:      Wed, 23 Aug 2017 12:36:04 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Karli Sjöberg <karli@inparadise.se>
Cc:        Ronald Klop <ronald-lists@klop.ws>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: when has a pNFS data server failed?
Message-ID:  <YTXPR01MB0189D827E7005084B669561BDD850@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <2fbb5be6-f9c0-467a-a200-1783cf2c4a67@email.android.com>
References:  <2fbb5be6-f9c0-467a-a200-1783cf2c4a67@email.android.com>

Karli Sjöberg wrote:
[stuff snipped for brevity]
>>Rick Macklem wrote:
>>To be honest, I think the answer for version 1 will come down to...
>>
>>How long should the MDS try to communicate with the DS before it gives up and
>>considers it failed?
>>
>>It will probably be settable via a sysctl, but does need a reasonable default value.
>>(A "very large" value would indicate "leave it for the sysadmin to decide and do
>>manually".)
[more stuff snipped]
>This is what one prominent "customer" says about timeout:
>https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009465
>"These issues occur when the guest operating system timeout values are exceeded for
>attached storage disks. This may be caused by an underlying storage problem or due to
>brief transient pauses during normal operations (such as path failover). To accommodate
>transient events, the VMware Tools increases the SCSI disk timeout to 60 seconds for
>Virtual Infrastructure 3 and 180 seconds for vSphere 4 and higher."
>
>Which means that you have a minute before the "customers" start complaining:)
Thanks. I was thinking that a minute or two is about what the default might want
to be. It may need to be longer than that since, as one example, a DS needs to be
able to reboot and start servicing RPCs before this timeout expires.
(Fortunately, a DS does not need to wait for the "grace period" that an NFSv4/MDS
 server does after boot, since that time is for clients to recover locks and the locks
 are handled by the MDS and not the DSs.)
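
To make the timeout question concrete, here is a rough sketch of the kind of
check being discussed. It is illustrative only and is not the MDS code: the
class name, the 120 second default (picked from the "minute or two" above) and
the use of infinity to stand for a "very large" value are all assumptions.

```python
import time

# Assumed default, taken from "a minute or two" in the thread; not a real setting.
DEFAULT_DS_TIMEOUT = 120.0
# Assumed stand-in for a "very large" value: never fail the DS automatically,
# leaving the decision to the sysadmin.
MANUAL_ONLY = float("inf")


class DSFailureDetector:
    """Hypothetical tracker for one DS: records the last successful RPC and
    reports the DS as failed once the timeout has elapsed without one."""

    def __init__(self, timeout=DEFAULT_DS_TIMEOUT, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock            # injectable so the logic can be tested
        self.last_ok = clock()        # treat creation time as the last success

    def rpc_succeeded(self):
        # Any successful RPC (e.g. after the DS reboots) restarts the countdown.
        self.last_ok = self.clock()

    def has_failed(self):
        # With timeout == MANUAL_ONLY this comparison is never true.
        return self.clock() - self.last_ok > self.timeout
```

The point of restarting the countdown on every successful RPC is that a DS
which reboots and answers again inside the window is never marked as failed,
which is why the default has to be at least as long as a typical reboot.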

Thanks for the comment, rick


