Date: Wed, 12 Oct 2016 08:16:22 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 213410] [carp] service netif restart causes hang only when carp is enabled
Message-ID: <bug-213410-8@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213410

            Bug ID: 213410
           Summary: [carp] service netif restart causes hang only when carp is enabled
           Product: Base System
           Version: 11.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: dch@skunkwerks.at

Created attachment 175654
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=175654&action=edit
dmesg

# steps

FreeBSD 11.0Rp1 amd64

- dmesg attached
- ifconfig (IPs masked)

igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 78:45:c4:fa:d2:12
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 78:45:c4:fa:d2:12
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 78:45:c4:fa:d2:12
        inet 10.0.9.83 netmask 0xfffffff0 broadcast 10.0.9.95
        inet 10.0.9.84 netmask 0xffffffff broadcast 10.0.9.84 vhid 1
        inet 10.0.9.85 netmask 0xffffffff broadcast 10.0.9.85 vhid 3
        inet6 fe80::7a45:c4ff:fefa:d212%lagg0 prefixlen 64 scopeid 0x4
        inet6 3000:3050:3000:4::83 prefixlen 64
        inet6 3000:3050:3000:4::84 prefixlen 64 vhid 2
        inet6 3000:3050:3000:4::85 prefixlen 64 vhid 4
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        carp: BACKUP vhid 1 advbase 1 advskew 100
        carp: BACKUP vhid 3 advbase 1 advskew 0
        carp: BACKUP vhid 2 advbase 1 advskew 100
        carp: BACKUP vhid 4 advbase 1 advskew 0
        groups: lagg
        laggproto lacp lagghash l2,l3,l4
        laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

issue `service netif restart`

This was initially done via net/mosh connection and tmux inside that,
but repeated again with direct console access (KVM remote mgmt tool).

## actual results

the system hangs, 100% reproducible.

- no keyboard entry
- no ability to Alt-F3 to switch tabs
- no ping over network
- a hard reboot is required to regain control
- final message in log appears to be
  Oct 12 08:01:22 bridget kernel: lagg0: link state changed to DOWN

### console

Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_stop
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lo0: 48

### /var/log/messages

Oct 12 08:00:00 bridget newsyslog[1525]: logfile turned over due to size>100K
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_stop
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lo0: 48
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_gateway_enable is set to NO.
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget kernel: carp: 2@lagg0: BACKUP -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget kernel: carp: 4@lagg0: MASTER -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget last message repeated 3 times
Oct 12 08:01:21 bridget kernel: carp: 1@lagg0: BACKUP -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget last message repeated 2 times
Oct 12 08:01:21 bridget kernel: carp: 3@lagg0: MASTER -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: igb0: promiscuous mode disabled
Oct 12 08:01:21 bridget kernel: igb1: promiscuous mode disabled
Oct 12 08:01:21 bridget kernel: lagg0: promiscuous mode disabled
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: The following interfaces were not configured:
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Destroyed wlan(4)s:
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: cloned_interfaces_sticky is set to NO.
Oct 12 08:01:21 bridget kernel: lagg0: link state changed to DOWN
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Destroyed clones: lagg0
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_start
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Created wlan(4)s:
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Cloned: lagg0
Oct 12 08:01:21 bridget root: /etc/pccard_ether: DEBUG: run_rc_command: start_precmd: checkauto
Oct 12 08:01:21 bridget root: /etc/pccard_ether: DEBUG: run_rc_command: doit: pccard_ether_start
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_start lagg0
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: Created wlan(4)s:
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: Cloned:
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget kernel: lagg0: link state changed to UP
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_gateway_enable is set to NO.
Oct 12 08:01:21 bridget kernel: igb0: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: igb1: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: lagg0: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: igb0: link state changed to DOWN
Oct 12 08:01:21 bridget kernel: carp: 1@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 3@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 2@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 4@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:22 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:22 bridget kernel: igb1: link state changed to DOWN
Oct 12 08:01:22 bridget kernel: carp: 1@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 240 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 3@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 480 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 2@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 720 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 4@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 960 (interface down)
Oct 12 08:01:22 bridget kernel: lagg0: link state changed to DOWN
Oct 12 08:01:24 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: rc_startmsgs is set to YES.

# expected results

after a short period of downtime, the network is re-established.

# notes

if carp config is disabled, and system is rebooted, this functions as expected.
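For anyone triaging a similar log: each `carp: demoted by ...` message above adds 240 to the demotion counter, which reaches 960 across the four vhids. The following is a minimal sketch for tallying those adjustments from syslog text. The function name `carp_demotion_total` is hypothetical, and the sketch assumes the message format shown in this report rather than any documented log format.

```shell
# Sum the "demoted by N" adjustments found in syslog-style text on stdin.
# Assumption: lines look like "... carp: demoted by N to M (...)" as in this report.
carp_demotion_total() {
    awk '/carp: demoted by/ {
        # Find the "demoted by" pair and add the number that follows it.
        for (i = 1; i < NF - 1; i++) {
            if ($i == "demoted" && $(i + 1) == "by") {
                total += $(i + 2)
                break
            }
        }
    }
    END { print total + 0 }'
}
```

Usage: `carp_demotion_total < /var/log/messages`. Fed the four demotion lines above, it prints 960.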
# config

```
# /etc/rc.conf on 1st node
hostname="one.my.domain"
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="inet 10.0.9.82 netmask 255.255.255.240 laggproto lacp laggport igb0 laggport igb1"
ifconfig_lagg0_ipv6="inet6 3000:3050:3000:4::82/64"
# ifconfig_lo1="inet 10.0.0.254 netmask 255.255.255.0"
defaultrouter="10.0.9.81"
ipv6_defaultrouter="3000:3050:3000:4::1"

# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"

# carp on
kld_list="carp"
ifconfig_lagg0_aliases="\
    inet vhid 1 advskew 0 pass pwd1 10.0.9.84/32 \
    inet6 vhid 2 advskew 0 pass pwd2 3000:3050:3000:4::84/64 \
    inet vhid 3 advskew 100 pass pwd3 10.0.9.85/32 \
    inet6 vhid 4 advskew 100 pass pwd4 3000:3050:3000:4::85/64"

# debugging rc.d scripts
rc_debug="YES"
rc_startmsgs="YES"
```

```
# /etc/rc.conf on 2nd node
hostname="two.my.domain"
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="inet 10.0.9.83 netmask 255.255.255.240 laggproto lacp laggport igb0 laggport igb1"
ifconfig_lagg0_ipv6="inet6 3000:3050:3000:4::83/64"
defaultrouter="10.0.9.81"
ipv6_defaultrouter="3000:3050:3000:4::1"

# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"

# carp on
kld_list="carp"
ifconfig_lagg0_aliases="\
    inet vhid 1 advskew 100 pass pwd1 10.0.9.84/32 \
    inet6 vhid 2 advskew 100 pass pwd2 3000:3050:3000:4::84/64 \
    inet vhid 3 advskew 0 pass pwd3 10.0.9.85/32 \
    inet6 vhid 4 advskew 0 pass pwd4 3000:3050:3000:4::85/64"

# debugging rc.d scripts
rc_debug="YES"
rc_startmsgs="YES"
```

```
# /boot/loader.conf /boot/loader.conf

# storage
# zfs won't start mounting volumes without this
zfs_load="YES"
kern.geom.label.gptid.enable="0"

# hardware
coretemp_load="YES"

# console
# ensure console in IPMI mode remains accessible instead of going all white
hw.vga.textmode=1

# bhyve and jails
vmm_load="YES"
nmdm_load="YES"
if_bridge_load="YES"
if_tap_load="YES"
kern.racct.enable=1

# debug super powers
dtraceall_load="YES"

# runtime
# maxfiles
kern.maxfiles="25000"

# network
# fibs
# https://blog.feld.me/posts/2015/06/routing-a-freebsd-jail-through-openvpn/
# https://www.freebsd.org/cgi/man.cgi?query=setfib
net.fibs=2

# from https://calomel.org/freebsd_network_tuning.html
accf_data_load="YES"
accf_dns_load="YES"
autoboot_delay="3"
ahci_load="YES"
aio_load="YES"
cc_htcp_load="YES"
net.tcp.hostcache.cachelimit="0"
```

```
# /etc/sysctl.conf
# carp tweaks
net.inet.carp.preempt=1
```

-- 
You are receiving this mail because:
You are the assignee for the bug.