Date:      Thu, 16 Nov 2006 13:06:31 +0200
From:      Ian FREISLICH <if@hetzner.co.za>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        current@freebsd.org
Subject:   Re: Panic during boot (in_arpinput). 
Message-ID:  <E1Gkf4h-00011r-PC@hetzner.co.za>
In-Reply-To: Message from Robert Watson <rwatson@FreeBSD.org> of "Wed, 15 Nov 2006 09:03:28 GMT." <20061115085427.R79655@fledge.watson.org> 

Robert Watson wrote:
> 
> On Wed, 15 Nov 2006, Ian FREISLICH wrote:
> 
> > Robert Watson wrote:
> >> On Wed, 15 Nov 2006, Ian FREISLICH wrote:
> >>
> >>> Ian FREISLICH wrote:
> >>>>
> >>>> I have 2 servers each with 255 vlan interfaces and carp interfaces in
> >>>> each vlan.  During boot, while it's configuring the interfaces, it
> >>>> reliably panics.  It boots fine if no network cables are plugged in (and
> >>>> in the test environment on a quiet lan).
> >>>>
> >>>> It's an SMP machine.  My guess (from the panic message below) is that an
> >>>> arp query arrives on an interface it's in the middle of creating or
> >>>> something like that (highly unsophisticated debugging conjecture).
> >>>>
> >>>> In the mean time I'm going to try a UP kernel and see if that masks the 
> >>>> problem.
> >>>
> >>> FWIW, a UP kernel has the same problem.
> >>
> >> What happens if you disable PREEMPTION on UP and try the same thing again?
> >
> > Same thing.
> >
> > > If I don't assign the carp interfaces a vhid and pass at boot time, it boots
> > > up OK, but I need the carp interfaces.  I can arrange serial console access.
> > > I have a similar system from ~"Tue Aug 29 09:47:50 SAST 2006" that works,
> > > but I suspect it may suffer the same problem.  I'm about to test this.
> 
> This suggests that it is not the race I was worried it was, which is really 
> good news :-).  This makes me suspect a CARP-specific bug as opposed to the 
> wider issue of under-synchronization of the address lists.

I'm not sure that ia_hash from

netinet/in_var.h: LIST_ENTRY(in_ifaddr) ia_hash;  /* entry in bucket of inet addresses */

is being locked properly.  Or somehow junk data is making its way
into the list.

I've narrowed it down to configuration errors in /etc/rc.conf:
bringing vlans or CARP interfaces up with some bogus data (due to
cut and paste errors), for example:

ifconfig_vlan2001="vlandev em0 vlan 2001"
...
ifconfig_vlan2001="vhid 1 pass 4f5116b5b66a5bcc3c096c72eeabb7bd"

or

ifconfig_carp166_name="vlan166_vrrp"
ifconfig_carp167_name="vlan167_vrrp"
ifconfig_vlan166_vrrp="vhid 161 advskew 0 pass 0f2c9a93eea6f38fabb3acb1c31488c6"ifconfig_vlan167_vrrp="vhid 161 advskew 0 pass 0f2c9a93eea6f38fabb3acb1c31488c6"

Note that the 3rd and "4th" lines (depending on your terminal width)
are a single line in the file.  It seems that when CARP interfaces are
in play, even without addresses so their parent IP interfaces aren't in
promiscuous mode, if an ifconfig at boot time produces an error:

ifconfig: SIOCGVH: Invalid argument
ifconfig: ioctl (SIOCAIFADDR): Can't assign requested address

later, while it's printing the ifconfig data for the 255 vlans on
the console, the system panics as previously described if its Ethernet
interface receives an ARP request at a critical time.
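
Both rc.conf mistakes above (the same variable assigned twice, and two
assignments run together on one line) are easy to miss by eye with 255
vlans.  A rough sketch of a checker follows.  It is not an existing
FreeBSD tool; the function name find_rc_conf_problems and the regular
expression are my own assumptions, and it only understands simple
name="value" lines, not arbitrary sh syntax.

```python
import re

# Assumption: rc.conf entries look like name="value" with double quotes.
# Anything fancier (single quotes, continuations, ${var}) is not handled.
ASSIGN = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="[^"]*"')


def find_rc_conf_problems(text):
    """Return a list of (line_number, message) for suspicious lines."""
    problems = []
    first_seen = {}  # variable name -> line number of its first assignment
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and whole-line comments
        names = ASSIGN.findall(stripped)
        if len(names) > 1:
            # e.g. a missing newline between two pasted lines
            problems.append(
                (lineno, "several assignments on one line: " + ", ".join(names))
            )
        for name in names:
            if name in first_seen:
                # e.g. vlan2001 configured twice instead of vlan + carp
                problems.append(
                    (lineno, f"{name} already set on line {first_seen[name]}")
                )
            else:
                first_seen[name] = lineno
    return problems


if __name__ == "__main__":
    sample = (
        'ifconfig_vlan2001="vlandev em0 vlan 2001"\n'
        'ifconfig_vlan2001="vhid 1 pass 4f5116b5b66a5bcc3c096c72eeabb7bd"\n'
    )
    for lineno, message in find_rc_conf_problems(sample):
        print(f"line {lineno}: {message}")
```

This only flags the configuration mistakes; it says nothing about why
the kernel panics instead of just rejecting the bad settings.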

Ian

--
Ian Freislich


