Date:      Wed, 12 Oct 2016 08:16:22 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 213410] [carp] service netif restart causes hang only when carp is enabled
Message-ID:  <bug-213410-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213410

            Bug ID: 213410
           Summary: [carp] service netif restart causes hang only when
                    carp is enabled
           Product: Base System
           Version: 11.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: dch@skunkwerks.at

Created attachment 175654
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=175654&action=edit
dmesg

# steps

FreeBSD 11.0-RELEASE-p1 amd64

- dmesg attached
- ifconfig (IPs masked)

igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 78:45:c4:fa:d2:12
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 78:45:c4:fa:d2:12
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 78:45:c4:fa:d2:12
        inet 10.0.9.83 netmask 0xfffffff0 broadcast 10.0.9.95
        inet 10.0.9.84 netmask 0xffffffff broadcast 10.0.9.84 vhid 1
        inet 10.0.9.85 netmask 0xffffffff broadcast 10.0.9.85 vhid 3
        inet6 fe80::7a45:c4ff:fefa:d212%lagg0 prefixlen 64 scopeid 0x4
        inet6 3000:3050:3000:4::83 prefixlen 64
        inet6 3000:3050:3000:4::84 prefixlen 64 vhid 2
        inet6 3000:3050:3000:4::85 prefixlen 64 vhid 4
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        carp: BACKUP vhid 1 advbase 1 advskew 100
        carp: BACKUP vhid 3 advbase 1 advskew 0
        carp: BACKUP vhid 2 advbase 1 advskew 100
        carp: BACKUP vhid 4 advbase 1 advskew 0
        groups: lagg
        laggproto lacp lagghash l2,l3,l4
        laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

issue `service netif restart`

This was initially done over a net/mosh connection with tmux running inside it,
and then repeated with direct console access (KVM remote management tool).

## actual results

the system hangs, 100% reproducible.

- no keyboard entry
- no ability to Alt-F3 to switch tabs
- no ping over network
- a hard reboot is required to regain control
- final message in log appears to be
    Oct 12 08:01:22 bridget kernel: lagg0: link state changed to DOWN


### console

Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_stop
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lo0: 48

### /var/log/messages
Oct 12 08:00:00 bridget newsyslog[1525]: logfile turned over due to size>100K
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_stop
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lo0: 48
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_gateway_enable is set to NO.
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget kernel: carp: 2@lagg0: BACKUP -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget kernel: carp: 4@lagg0: MASTER -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget last message repeated 3 times
Oct 12 08:01:21 bridget kernel: carp: 1@lagg0: BACKUP -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget last message repeated 2 times
Oct 12 08:01:21 bridget kernel: carp: 3@lagg0: MASTER -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: igb0: promiscuous mode disabled
Oct 12 08:01:21 bridget kernel: igb1: promiscuous mode disabled
Oct 12 08:01:21 bridget kernel: lagg0: promiscuous mode disabled
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: The following interfaces were not configured:
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Destroyed wlan(4)s:
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: cloned_interfaces_sticky is set to NO.
Oct 12 08:01:21 bridget kernel: lagg0: link state changed to DOWN
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Destroyed clones: lagg0
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_start
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Created wlan(4)s:
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Cloned: lagg0
Oct 12 08:01:21 bridget root: /etc/pccard_ether: DEBUG: run_rc_command: start_precmd: checkauto
Oct 12 08:01:21 bridget root: /etc/pccard_ether: DEBUG: run_rc_command: doit: pccard_ether_start
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_start lagg0
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: Created wlan(4)s:
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: Cloned:
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget kernel: lagg0: link state changed to UP
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_gateway_enable is set to NO.
Oct 12 08:01:21 bridget kernel: igb0: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: igb1: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: lagg0: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: igb0: link state changed to DOWN
Oct 12 08:01:21 bridget kernel: carp: 1@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 3@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 2@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 4@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:22 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:22 bridget kernel: igb1: link state changed to DOWN
Oct 12 08:01:22 bridget kernel: carp: 1@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 240 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 3@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 480 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 2@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 720 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 4@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 960 (interface down)
Oct 12 08:01:22 bridget kernel: lagg0: link state changed to DOWN
Oct 12 08:01:24 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: rc_startmsgs is set to YES.
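
To make the CARP sequence in the log easier to follow, here is a minimal Python sketch that replays only the kernel `carp:` lines from the excerpt above and reports the last state seen for each vhid plus the last demotion counter. The function name `summarize` and the two regular expressions are illustrative assumptions, not part of any FreeBSD tool, and the patterns only cover the message shapes that appear in this log.

```python
import re

# Kernel carp lines copied from the /var/log/messages excerpt (timestamps dropped).
LOG = """\
carp: 2@lagg0: BACKUP -> INIT (hardware interface up)
carp: 4@lagg0: MASTER -> INIT (hardware interface up)
carp: 1@lagg0: BACKUP -> INIT (hardware interface up)
carp: 3@lagg0: MASTER -> INIT (hardware interface up)
carp: 1@lagg0: INIT -> BACKUP (initialization complete)
carp: 3@lagg0: INIT -> BACKUP (initialization complete)
carp: 2@lagg0: INIT -> BACKUP (initialization complete)
carp: 4@lagg0: INIT -> BACKUP (initialization complete)
carp: 1@lagg0: BACKUP -> INIT (hardware interface down)
carp: demoted by 240 to 240 (interface down)
carp: 3@lagg0: BACKUP -> INIT (hardware interface down)
carp: demoted by 240 to 480 (interface down)
carp: 2@lagg0: BACKUP -> INIT (hardware interface down)
carp: demoted by 240 to 720 (interface down)
carp: 4@lagg0: BACKUP -> INIT (hardware interface down)
carp: demoted by 240 to 960 (interface down)
"""

# Hypothetical patterns, written only against the lines shown above.
TRANSITION = re.compile(r"carp: (\d+)@(\S+): (\w+) -> (\w+) \((.*)\)")
DEMOTION = re.compile(r"carp: demoted by (\d+) to (\d+)")


def summarize(log):
    """Return (last state per (vhid, interface), last reported demotion counter)."""
    states = {}
    demotion = 0
    for line in log.splitlines():
        match = TRANSITION.search(line)
        if match:
            vhid, ifname, _old, new, _reason = match.groups()
            states[(int(vhid), ifname)] = new
            continue
        match = DEMOTION.search(line)
        if match:
            demotion = int(match.group(2))
    return states, demotion


if __name__ == "__main__":
    final_states, final_demotion = summarize(LOG)
    for (vhid, ifname), state in sorted(final_states.items()):
        print(f"vhid {vhid} on {ifname}: {state}")
    print(f"demotion counter: {final_demotion}")
```

Run against this excerpt, all four vhids end in INIT and the demotion counter ends at 960 (four steps of 240), which is the last CARP activity logged before lagg0 goes DOWN for the second time.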

# expected results

after a short period of downtime, the network is re-established.

# notes

if the carp config is disabled and the system is rebooted, `service netif restart` functions as expected.

# config

```
# /etc/rc.conf on 1st node
hostname="one.my.domain"
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="inet 10.0.9.82 netmask 255.255.255.240 laggproto lacp laggport igb0 laggport igb1"
ifconfig_lagg0_ipv6="inet6 3000:3050:3000:4::82/64"
# ifconfig_lo1="inet 10.0.0.254 netmask 255.255.255.0"
defaultrouter="10.0.9.81"
ipv6_defaultrouter="3000:3050:3000:4::1"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"

# carp on
kld_list="carp"
ifconfig_lagg0_aliases="\
        inet  vhid 1 advskew   0 pass pwd1 10.0.9.84/32 \
        inet6 vhid 2 advskew   0 pass pwd2 3000:3050:3000:4::84/64 \
        inet  vhid 3 advskew 100 pass pwd3 10.0.9.85/32 \
        inet6 vhid 4 advskew 100 pass pwd4 3000:3050:3000:4::85/64"

# debugging rc.d scripts
rc_debug="YES"
rc_startmsgs="YES"
```

```
# /etc/rc.conf on 2nd node
hostname="two.my.domain"
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="inet 10.0.9.83 netmask 255.255.255.240 laggproto lacp laggport igb0 laggport igb1"
ifconfig_lagg0_ipv6="inet6 3000:3050:3000:4::83/64"
defaultrouter="10.0.9.81"
ipv6_defaultrouter="3000:3050:3000:4::1"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"

# carp on
kld_list="carp"
ifconfig_lagg0_aliases="\
        inet  vhid 1 advskew 100 pass pwd1 10.0.9.84/32 \
        inet6 vhid 2 advskew 100 pass pwd2 3000:3050:3000:4::84/64 \
        inet  vhid 3 advskew   0 pass pwd3 10.0.9.85/32 \
        inet6 vhid 4 advskew   0 pass pwd4 3000:3050:3000:4::85/64"

# debugging rc.d scripts
rc_debug="YES"
rc_startmsgs="YES"
```
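
The two `ifconfig_lagg0_aliases` settings mirror each other's `advskew` values. As a rough illustration of the intended split, the Python sketch below parses both alias strings and, for each vhid, picks the node with the lower `advskew`. It assumes that with equal `advbase` and `net.inet.carp.preempt=1` the node with the lower `advskew` is the one expected to become master, which is my reading of carp(4). The names `advskews` and `expected_masters` are hypothetical helpers, not part of any FreeBSD tool.

```python
import re

# ifconfig_lagg0_aliases values from the two rc.conf files (whitespace normalized).
NODE_ALIASES = {
    "one": (
        "inet vhid 1 advskew 0 pass pwd1 10.0.9.84/32 "
        "inet6 vhid 2 advskew 0 pass pwd2 3000:3050:3000:4::84/64 "
        "inet vhid 3 advskew 100 pass pwd3 10.0.9.85/32 "
        "inet6 vhid 4 advskew 100 pass pwd4 3000:3050:3000:4::85/64"
    ),
    "two": (
        "inet vhid 1 advskew 100 pass pwd1 10.0.9.84/32 "
        "inet6 vhid 2 advskew 100 pass pwd2 3000:3050:3000:4::84/64 "
        "inet vhid 3 advskew 0 pass pwd3 10.0.9.85/32 "
        "inet6 vhid 4 advskew 0 pass pwd4 3000:3050:3000:4::85/64"
    ),
}

# Matches "vhid N advskew M" pairs as they appear in the alias strings above.
VHID_SKEW = re.compile(r"vhid\s+(\d+)\s+advskew\s+(\d+)")


def advskews(aliases):
    """Map each vhid in an alias string to its advskew."""
    return {int(vhid): int(skew) for vhid, skew in VHID_SKEW.findall(aliases)}


def expected_masters(nodes):
    """For each vhid, return the node name with the lowest advskew (assumed master)."""
    skews = {name: advskews(aliases) for name, aliases in nodes.items()}
    vhids = sorted({vhid for per_node in skews.values() for vhid in per_node})
    return {
        vhid: min(
            (name for name in skews if vhid in skews[name]),
            key=lambda name: skews[name][vhid],
        )
        for vhid in vhids
    }


if __name__ == "__main__":
    for vhid, node in expected_masters(NODE_ALIASES).items():
        print(f"vhid {vhid}: expected master is node {node}")
```

Under that assumption, node one is preferred for vhids 1 and 2 and node two for vhids 3 and 4. That agrees with the `MASTER -> INIT` transitions for vhids 3 and 4 in the log from the host at 10.0.9.83.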

```
# /boot/loader.conf
# storage
# zfs won't start mounting volumes without this
zfs_load="YES"
kern.geom.label.gptid.enable="0"

# hardware
coretemp_load="YES"

# console
# ensure console in IPMI mode remains accessible instead of going all white
hw.vga.textmode=1

# bhyve and jails
vmm_load="YES"
nmdm_load="YES"
if_bridge_load="YES"
if_tap_load="YES"
kern.racct.enable=1

# debug super powers
dtraceall_load="YES"

# runtime
# maxfiles
kern.maxfiles="25000"

# network
# fibs
# https://blog.feld.me/posts/2015/06/routing-a-freebsd-jail-through-openvpn/
# https://www.freebsd.org/cgi/man.cgi?query=setfib
net.fibs=2
# from https://calomel.org/freebsd_network_tuning.html
accf_data_load="YES"
accf_dns_load="YES"
autoboot_delay="3"
ahci_load="YES"
aio_load="YES"
cc_htcp_load="YES"
net.tcp.hostcache.cachelimit="0"
```


```
# /etc/sysctl.conf
# carp tweaks
net.inet.carp.preempt=1
```
