Date:      Fri, 18 Feb 2011 04:47:04 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Olaf Seibert <O.Seibert@cs.ru.nl>, net@freebsd.org, freebsd-stable@freebsd.org, Jeremy Chadwick <freebsd@jdc.parodius.com>, Steven Hartland <killing@multiplay.co.uk>
Subject:   Re: mountd has resolving problems
Message-ID:  <20110218043432.S3233@besplex.bde.org>
In-Reply-To: <201102171158.24636.jhb@freebsd.org>
References:  <20100909131017.GO4404@twoquid.cs.ru.nl> <20100909140529.GB76889@icarus.home.lan> <FD94648144304764A7A8E4589DC33EE6@multiplay.co.uk> <201102171158.24636.jhb@freebsd.org>

On Thu, 17 Feb 2011, John Baldwin wrote:

> On Thursday, February 17, 2011 7:18:28 am Steven Hartland wrote:
>> This has become an issue for us in 8.x as well.
>>
>> I'm pretty sure in pre 8.x these nfs mounts would simply background but
>> recently machines are now failing to boot. It seems that failure to
>> lookup nfs mount point hosts now causes this fatal error :(
>>
>> We've just tried Jeremy's netwait script and it works perfectly so either
>> this or something similar needs to get pushed into base.
>>
>> For reference the reason we need a delay here is our core Cisco router
>> takes a while to bring the port up properly on boot.
>>
>> Thanks for sharing the script Jeremy :)
>
> I use a similar hack that waits up to 30 seconds for the default gateway to be
> pingable.  I think it is at least partly related to the new ARP code that now
> drops packets in IP output if the link is down.

I use hackish "ping -t <timeout>" and traceroute calls in /etc/rc.d/netif,
with the timeout much smaller than 30 seconds since even 2 seconds is
annoying.  I don't know if it is the same problem.  Here it mainly affects
nfs and ntpdate/ntpd to local systems, even with all-static routes.
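
A minimal sketch of the "wait until the gateway answers ping" idea, in
Python rather than rc.d shell so that the loop is easy to test.  It is not
the netwait script or anyone's actual hack from this thread.  The names
wait_for and ping_once are made up for illustration, and the ping flags
assume FreeBSD's ping(8), where -t is a timeout in seconds (on Linux -t
means TTL).

```python
import subprocess
import time


def wait_for(probe, timeout=30.0, interval=1.0,
             clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or `timeout` seconds have passed.

    Returns True as soon as a probe succeeds, False on timeout.
    clock and sleep are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def ping_once(host, wait_seconds=1):
    """Send one echo request; True if ping exits with status 0.

    Assumes FreeBSD ping(8): -c is the packet count, -t the overall
    timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-t", str(wait_seconds), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Usage would be something like wait_for(lambda: ping_once("192.0.2.1"),
timeout=30), where 192.0.2.1 is a placeholder for the default gateway.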

> This can be very problematic
> during boot since some interfaces take a few seconds to negotiate link but
> the end result of the new check in IP output is that the attempt to send the
> packet fails with an error causing gethostbyname() and getaddrinfo() to fail
> completely without doing any retries.  In 7 the packet would either sit in the

This also happens after an interface down/up to change something.  If you
try to use the network before it is back, then you have to wait much longer
before it is really back.  This is a relatively minor problem since down/up
is not needed routinely.

> descriptor ring until link was up, or it would be dropped, but it would
> silently fail, so the resolver in libc would just retry in 30 seconds or so at
> which time it would work fine.
>
> Waiting for the default route to be pingable actually fixed a few other
> problems for us on 7 though as well (often ntpdate would not work on boot and
> now it works reliably, etc.) so we went with that route.
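
For the resolver side, an application-level retry can be sketched as
follows.  This only illustrates the "retry instead of failing at once"
behaviour described above.  It is not what mountd or libc does, and the
function name, attempt count and delay are arbitrary assumptions.

```python
import socket
import time


def resolve_with_retry(host, attempts=5, delay=2.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Try to resolve `host`, retrying on resolver errors.

    Returns the resolver's result on the first success and re-raises the
    last socket.gaierror if every attempt fails.  resolver and sleep are
    injectable so the retry logic can be exercised without a network.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return resolver(host, None)
        except socket.gaierror as err:
            last_error = err
            # Do not sleep after the final failed attempt.
            if attempt + 1 < attempts:
                sleep(delay)
    raise last_error
```

A caller that must not block boot for long would keep attempts * delay
small, for the same reason that a 30 second wait is annoying.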

I thought I first saw the problem a little earlier, and it affected bge more
than fxp.  Maybe the latter is correct and the problem is smaller with fxp
just because it is ready sooner.

Bruce

