Date: Wed, 23 Aug 2017 12:36:04 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Karli Sjöberg <karli@inparadise.se>
Cc: Ronald Klop <ronald-lists@klop.ws>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: when has a pNFS data server failed?
Message-ID: <YTXPR01MB0189D827E7005084B669561BDD850@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <2fbb5be6-f9c0-467a-a200-1783cf2c4a67@email.android.com>
References: <2fbb5be6-f9c0-467a-a200-1783cf2c4a67@email.android.com>
Karli Sjöberg wrote:
[stuff snipped for brevity]
>>Rick Macklem wrote:
>>To be honest, I think the answer for version 1 will come down to...
>>
>>How long should the MDS try to communicate with the DS before it gives up and
>>considers it failed?
>>
>>It will probably be settable via a sysctl, but does need a reasonable default value.
>>(A "very large" value would indicate "leave it for the sysadmin to decide and do
>>manually".)
[more stuff snipped]
>This is what one prominent "customer" says about timeout:
>https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009465
>"These issues occur when the guest operating system timeout values are exceeded for
>attached storage disks. This may be caused by an underlying storage problem or due to
>brief transient pauses during normal operations (such as path failover). To accommodate
>transient events, the VMware Tools increases the SCSI disk timeout to 60 seconds for
>Virtual Infrastructure 3 and 180 seconds for vSphere 4 and higher."
>
>Which means that you have a minute before the "customers" start complaining:)

Thanks. I was thinking that a minute or two is about what the default might want to be.
It may need to be longer than that, since a DS needs to be able to reboot and start
servicing RPCs before this timeout happens, as one example.
(Fortunately a DS does not need to wait for the "grace period" that an NFSv4/MDS server
does after boot, since that time is for clients to recover locks and the locks are
handled by the MDS and not the DSs.)

Thanks for the comment, rick
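
As a rough illustration of the timeout policy discussed above, here is a minimal sketch of how an MDS might decide that a DS has failed. It is not taken from the FreeBSD sources: the class DSMonitor, its methods, and the constants DEFAULT_DS_TIMEOUT and MANUAL_ONLY are all hypothetical names, and the 120 second default is only one reading of "a minute or two".

```python
import math

# Hypothetical names throughout; nothing here comes from the FreeBSD pNFS code.
DEFAULT_DS_TIMEOUT = 120.0  # seconds; one reading of "a minute or two"
MANUAL_ONLY = math.inf      # "very large" value: leave failure handling to the sysadmin


class DSMonitor:
    """Tracks when each data server (DS) last answered an RPC from the MDS."""

    def __init__(self, timeout=DEFAULT_DS_TIMEOUT):
        self.timeout = timeout
        self.last_reply = {}  # DS name -> time of the last successful reply

    def record_reply(self, ds, now):
        """Note that the DS answered an RPC at time `now` (in seconds)."""
        self.last_reply[ds] = now

    def is_failed(self, ds, now):
        """Return True once the DS has been silent for longer than the timeout."""
        last = self.last_reply.get(ds)
        if last is None:
            return False  # never heard from this DS, so nothing to time out yet
        return (now - last) > self.timeout


mon = DSMonitor()
mon.record_reply("ds0", now=0.0)
print(mon.is_failed("ds0", now=60.0))   # still inside the timeout window
print(mon.is_failed("ds0", now=121.0))  # silent for longer than the timeout
```

With the timeout set to MANUAL_ONLY the check never reports a failure, which corresponds to the "leave it for the sysadmin" setting. A DS that reboots and answers an RPC before the timeout expires simply refreshes its entry and is never marked failed.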