Date: Fri, 21 Feb 2025 10:12:54 +0200
From: Toomas Soome <tsoome@me.com>
To: FreeBSD CURRENT <freebsd-current@freebsd.org>
Cc: Steve Rikli <sr@genyosha.net>, Gleb Smirnoff <glebius@glebi.us>, Rick Macklem <rick.macklem@gmail.com>
Subject: Re: RFC: mount_nfs failure due to dns not running yet
Message-ID: <862576B0-EFBF-4CC9-B99A-723125D60983@me.com>
In-Reply-To: <CAM5tNy55atdBE2iNhhEWBPcytSe=ikXW00kN1fRqJr8HXQpuYg@mail.gmail.com>
References: <CAM5tNy5wA9DyBP%2BJdq1O6J=VVtXm6Rmm5rtXjJqyJRKvJ8WY=A@mail.gmail.com> <Z7fIlo1Dt6AfO%2BZx@dragon.home.genyosha.net> <CAM5tNy55atdBE2iNhhEWBPcytSe=ikXW00kN1fRqJr8HXQpuYg@mail.gmail.com>
[-- Attachment #1 --]
> On 21. Feb 2025, at 04:39, Rick Macklem <rick.macklem@gmail.com> wrote:
>
> On Thu, Feb 20, 2025 at 4:28 PM Steve Rikli <sr@genyosha.net> wrote:
>>
>> On Wed, Feb 19, 2025 at 02:40:15PM -0800, Rick Macklem wrote:
>>>
>>> The subject line basically describes the problem glebius@
>>> ran into. When doing an NFS mount in /etc/fstab, it failed
>>> since the DNS service was not yet working and, as such,
>>> the DNS lookup of the server fqdn failed, causing the mount
>>> to fail. Note that this behaviour has existed for decades.
>>>
>>> He feels this is a bug and that mount_nfs(8) should retry
>>> getaddrinfo(3) calls until success, instead of failing the
>>> mount when the first attempt fails.
>>> The problem with just retrying getaddrinfo(3) is that it
>>> could retry forever for simple failures like a typo in the
>>> server fqdn.
>>> I can see several ways this can be handled and would
>>> like feedback from others w.r.t. these alternatives.
>>>
>>> 1) Simply document this case and encourage use of
>>> host names in /etc/hosts for NFS servers along with
>>> specifying "files" before "dns" in nsswitch.conf.
>>> Doing this results in the mounts working whether or
>>> not DNS is working.
>>>
>>> 2) Call it a bug and patch mount_nfs(8) to retry getaddrinfo(3)
>>> until it succeeds. (I feel this would be a POLA violation,
>>> given that the current behaviour has existed for decades
>>> and for simple cases where the fqdn will never resolve
>>> the behaviour would be to hang at the mount attempt
>>> during boot unless "bg" is specified for the /etc/fstab entry.)
>>>
>>> 3) Add a new NFS mount option "retrydns=<N>", which would enable
>>> retries of getaddrinfo(3). This would avoid any POLA violation and
>>> would allow for a convenient way to document the behaviour in
>>> "man mount_nfs".
>>>
>>> 4) ???
>>>
>>> So, what do you think is the preferred change?
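For illustration only, option 3 could take roughly the following shape: a bounded loop around the lookup, so that a name which never resolves still fails after N attempts instead of hanging the boot. This is a minimal C sketch, not Rick's patch; `resolve_retry()`, its fixed delay between attempts, and the resolver callback (there so the loop can be exercised without a live DNS server) are all hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>

/* Resolver callback with the same parameters as getaddrinfo(3). */
typedef int (*resolver_fn)(const char *, const char *,
    const struct addrinfo *, struct addrinfo **);

/*
 * Hypothetical helper: call the resolver up to 'tries' times, sleeping
 * 'delay' seconds between attempts.  Returns 0 with *res set on success,
 * otherwise the last EAI_* code seen (EAI_FAIL if 'tries' < 1, since no
 * attempt is made at all).
 */
static int
resolve_retry(resolver_fn resolve, const char *host, const char *serv,
    const struct addrinfo *hints, struct addrinfo **res, int tries,
    unsigned int delay)
{
	int error = EAI_FAIL;

	for (int i = 0; i < tries; i++) {
		error = resolve(host, serv, hints, res);
		if (error == 0)
			return (0);
		/* Don't sleep after the final failed attempt. */
		if (i + 1 < tries)
			sleep(delay);
	}
	return (error);
}
```

A caller would pass getaddrinfo itself, e.g. `resolve_retry(getaddrinfo, host, "nfs", &hints, &res, n, 1)`, with n taken from the proposed retrydns=<N> value; without the option, n would simply be 1 and today's behaviour is unchanged.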
>>
>> I don't think I would change mount_nfs code behavior for this.
>>
>> That is, requiring services and daemons etc. to work around missing,
>> misconfigured, slow, or misbehaving nameservice (whether it's DNS,
>> /etc/hosts, NIS, whatever) seems like more complexity, possibly not
>> effective, and maybe not focused on the right thing.
>>
>> Now, without meaning to be presumptuous, it may be worth re-examining
>> the startup sequence, e.g. to make sure NFS mounts are tried after the
>> known dependencies can reasonably be expected to have started, including
>> the network, plus local_unbound or bind (if used), possibly others.
>>
>> After a quick look, I don't see an obvious problem with the sequence,
>> but more knowledgeable eyes than mine are welcome. I don't quite follow
>> some of the output from rcorder and service -r.
>>
>>> ps: I looked and the return value from getaddrinfo(3) does not
>>> appear to be useful to discern the case of "DNS service not
>>> running yet". (I think it replies EAI_FAIL for this case.)
>>
>> In that area, I'll note FreeBSD rc.d has a "NETWORKING" dependency for
>> PROVIDE and REQUIRE, and it's included in scripts like nfsclient,
>> mountcritremote et al. However there seems to be no similar dependency
>> for something like "NAMESERVICE" (generic, as opposed to "named"
>> specifically), and I'm not sure how that might be implemented, even
>> assuming it could be useful in a situation like this.
>>
>> I.e. there are many things to potentially check for "can the system
>> resolve hostnames yet", and not all of them involve running a local
>> instance of named, unbound, etc.
>>
>> In general, if I were running into problems with nameservice not being
>> available by the time NFS mounts happen, I think I'd start by looking
>> into possible nameservice issues, then check out some mechanisms other
>> folks have mentioned (fstab IP addresses or late option, rc.conf
>> netwait_enable, etc.) rather than coding workarounds into NFS itself.
> Well, the patch I have created (it took about 15 min) only changes behaviour
> if a new "retrydns" option is used. As such, I think it might be useful for some,
> but doesn't change things unless someone uses it.
>
> I agree with you that the rc scripts don't seem to have a way to REQUIRE
> that DNS is working. (I, personally, always put the fqdn for NFS servers in /etc/hosts
> and make sure "files" is first in nsswitch.conf, but others argue that is not
> feasible for some deployments.) Using IP numbers works for AUTH_SYS,
> but not for Kerberized mounts.
>
> Note that there is already "retrycnt", which specifies retry the mount,
> but that retry loop doesn't include getaddrinfo(3) calls.
> --> Personally, I do not like always doing retries since I often
> type mount commands manually and I'm a terrible typist, so I
> often mistype the server's name.
>
> This reply was mostly a followup on all the good comments and
> not just yours.
>
> Thanks everyone, for your comments, rick
>
My 2 cents:
There is a difference between the name service not responding and a name not resolving. In the first case, it falls under the existing bg/bgnow handling from mount_nfs(8):
     bg      If an initial attempt to contact the server fails, fork
             off a child to keep trying the mount in the background.
             Useful for fstab(5), where the file system mount is not
             critical to multiuser operation.

     bgnow   Like bg, fork off a child to keep trying the mount in the
             background, but do not attempt to mount in the foreground
             first. This eliminates a 60+ second timeout when the
             server is not responding. Useful for speeding up the
             boot process of a client when the server is likely to be
             unavailable. This is often the case for interdependent
             servers such as cross-mounted servers (each of two
             servers is an NFS client of the other) and for cluster
             nodes that must boot before the file servers.
In the second case, it's a failure you cannot recover from.
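The distinction above could be expressed as a retry policy: keep retrying only while the failure looks transient, and fail at once when the name does not exist. A minimal C sketch under stated assumptions: POSIX documents EAI_AGAIN as a temporary failure, while EAI_FAIL is documented as non-recoverable; it is only made optionally retryable here because Rick suspects it is what comes back while DNS is not running yet. The function name and its flag are hypothetical.

```c
#include <netdb.h>
#include <stdbool.h>

/*
 * Hypothetical policy: is this getaddrinfo(3) failure worth retrying?
 * 'fail_is_transient' lets a caller opt in to retrying EAI_FAIL, in
 * case that is what the resolver reports before DNS is up.
 */
static bool
lookup_error_is_transient(int error, bool fail_is_transient)
{
	switch (error) {
	case EAI_AGAIN:
		/* Temporary failure: name service not answering (yet). */
		return (true);
	case EAI_FAIL:
		/* Documented as non-recoverable; retry only if asked to. */
		return (fail_is_transient);
	default:
		/* EAI_NONAME etc.: e.g. a mistyped server name. */
		return (false);
	}
}
```

With this split, a mistyped server name (EAI_NONAME) still fails immediately, which addresses the typo concern, while a resolver that is merely not answering yet can be retried.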
rgds,
toomas
Archived at: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?862576B0-EFBF-4CC9-B99A-723125D60983>
