From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 213410] [carp] service netif restart causes hang only when carp is enabled
Date: Wed, 12 Oct 2016 08:16:22 +0000
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213410

Bug ID: 213410
Summary: [carp] service netif restart causes hang only when carp is enabled
Product: Base System
Version: 11.0-STABLE
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: bin
Assignee: freebsd-bugs@FreeBSD.org
Reporter: dch@skunkwerks.at

Created attachment 175654
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=175654&action=edit
dmesg

# steps

FreeBSD 11.0Rp1 amd64

- dmesg attached
- ifconfig (IPs masked):

igb0: flags=8943 metric 0 mtu 1500
        options=6403bb
        ether 78:45:c4:fa:d2:12
        nd6 options=29
        media: Ethernet autoselect (1000baseT)
        status: active
igb1: flags=8943 metric 0 mtu 1500
        options=6403bb
        ether 78:45:c4:fa:d2:12
        nd6 options=29
        media: Ethernet autoselect (1000baseT)
        status: active
lo0: flags=8049 metric 0 mtu 16384
        options=600003
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21
        groups: lo
lagg0: flags=8943 metric 0 mtu 1500
        options=6403bb
        ether 78:45:c4:fa:d2:12
        inet 10.0.9.83 netmask 0xfffffff0 broadcast 10.0.9.95
        inet 10.0.9.84 netmask 0xffffffff broadcast 10.0.9.84 vhid 1
        inet 10.0.9.85 netmask 0xffffffff broadcast 10.0.9.85 vhid 3
        inet6 fe80::7a45:c4ff:fefa:d212%lagg0 prefixlen 64 scopeid 0x4
        inet6 3000:3050:3000:4::83 prefixlen 64
        inet6 3000:3050:3000:4::84 prefixlen 64 vhid 2
        inet6 3000:3050:3000:4::85 prefixlen 64 vhid 4
        nd6 options=21
        media: Ethernet autoselect
        status: active
        carp: BACKUP vhid 1 advbase 1 advskew 100
        carp: BACKUP vhid 3 advbase 1 advskew 0
        carp: BACKUP vhid 2 advbase 1 advskew 100
        carp: BACKUP vhid 4 advbase 1 advskew 0
        groups: lagg
        laggproto lacp lagghash l2,l3,l4
        laggport: igb0 flags=1c
        laggport: igb1 flags=1c

Then issue `service netif restart`. This was first done over a net/mosh connection with tmux running inside it, and then repeated with direct console access (via the KVM remote management tool).

## actual results

The system hangs, 100% reproducible:

- no keyboard input
- cannot switch virtual consoles with Alt-F3
- no response to ping over the network
- a hard reboot is required to regain control
- the final message in the log appears to be:
  Oct 12 08:01:22 bridget kernel: lagg0: link state changed to DOWN

### console

Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_stop
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lo0: 48

### /var/log/messages

Oct 12 08:00:00 bridget newsyslog[1525]: logfile turned over due to size>100K
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_stop
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lo0: 48
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_gateway_enable is set to NO.
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget kernel: carp: 2@lagg0: BACKUP -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget kernel: carp: 4@lagg0: MASTER -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget last message repeated 3 times
Oct 12 08:01:21 bridget kernel: carp: 1@lagg0: BACKUP -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget last message repeated 2 times
Oct 12 08:01:21 bridget kernel: carp: 3@lagg0: MASTER -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: igb0: promiscuous mode disabled
Oct 12 08:01:21 bridget kernel: igb1: promiscuous mode disabled
Oct 12 08:01:21 bridget kernel: lagg0: promiscuous mode disabled
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: The following interfaces were not configured:
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Destroyed wlan(4)s:
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: cloned_interfaces_sticky is set to NO.
Oct 12 08:01:21 bridget kernel: lagg0: link state changed to DOWN
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Destroyed clones: lagg0
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_start
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Created wlan(4)s:
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Cloned: lagg0
Oct 12 08:01:21 bridget root: /etc/pccard_ether: DEBUG: run_rc_command: start_precmd: checkauto
Oct 12 08:01:21 bridget root: /etc/pccard_ether: DEBUG: run_rc_command: doit: pccard_ether_start
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_start lagg0
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: Created wlan(4)s:
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: Cloned:
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget kernel: lagg0: link state changed to UP
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_gateway_enable is set to NO.
Oct 12 08:01:21 bridget kernel: igb0: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: igb1: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: lagg0: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: igb0: link state changed to DOWN
Oct 12 08:01:21 bridget kernel: carp: 1@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 3@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 2@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 4@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:22 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:22 bridget kernel: igb1: link state changed to DOWN
Oct 12 08:01:22 bridget kernel: carp: 1@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 240 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 3@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 480 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 2@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 720 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 4@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 960 (interface down)
Oct 12 08:01:22 bridget kernel: lagg0: link state changed to DOWN
Oct 12 08:01:24 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: rc_startmsgs is set to YES.

# expected results

After a short period of downtime, the network is re-established.

# notes

If the carp configuration is disabled and the system is rebooted, `service netif restart` works as expected.
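In the /var/log/messages excerpt, each of the four vhids on lagg0 drops from BACKUP to INIT with reason "hardware interface down", and each drop is followed by a "demoted by 240" line (240, 480, 720, 960). When comparing runs, a small parser can pull out the state transitions and check that the demotion counter adds up. This is a hypothetical helper, not part of FreeBSD: `parse_carp_events`, `STATE_RE`, `DEMOTE_RE` and `SAMPLE_LOG` are made-up names, and the regexes assume only the message formats visible in this report.

```python
import re
from dataclasses import dataclass

# Regexes for the two carp kernel message shapes seen in the log excerpt.
# They are derived from this report only; other FreeBSD versions may differ.
STATE_RE = re.compile(
    r"carp: (?P<vhid>\d+)@(?P<iface>[^:\s]+): "
    r"(?P<old>[A-Z]+) -> (?P<new>[A-Z]+) \((?P<reason>[^)]*)\)"
)
DEMOTE_RE = re.compile(
    r"carp: demoted by (?P<delta>-?\d+) to (?P<total>-?\d+) \((?P<reason>[^)]*)\)"
)


@dataclass
class Transition:
    vhid: int
    iface: str
    old: str
    new: str
    reason: str


def parse_carp_events(lines):
    """Extract carp state transitions and demotion totals from syslog lines.

    Raises ValueError if a "demoted by N to T" line does not equal the
    previous total plus N. The first demotion line only sets the baseline,
    because an excerpt may start after earlier demotions.
    """
    transitions = []
    totals = []
    current = None  # running demotion counter, unknown until the first line
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            transitions.append(Transition(int(m["vhid"]), m["iface"],
                                          m["old"], m["new"], m["reason"]))
            continue
        m = DEMOTE_RE.search(line)
        if m:
            delta, total = int(m["delta"]), int(m["total"])
            if current is not None and current + delta != total:
                raise ValueError(f"inconsistent demotion: {current} + {delta} != {total}")
            current = total
            totals.append(total)
    return transitions, totals


# The igb1-down portion of the /var/log/messages excerpt from this report.
SAMPLE_LOG = """\
Oct 12 08:01:22 bridget kernel: igb1: link state changed to DOWN
Oct 12 08:01:22 bridget kernel: carp: 1@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 240 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 3@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 480 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 2@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 720 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 4@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 960 (interface down)
Oct 12 08:01:22 bridget kernel: lagg0: link state changed to DOWN
"""

if __name__ == "__main__":
    transitions, totals = parse_carp_events(SAMPLE_LOG.splitlines())
    for t in transitions:
        print(f"vhid {t.vhid}@{t.iface}: {t.old} -> {t.new} ({t.reason})")
    print("demotion totals:", totals)  # demotion totals: [240, 480, 720, 960]
```

Run against the excerpt, this lists vhids 1, 3, 2 and 4 going BACKUP -> INIT and confirms the counter climbs by 240 per vhid to 960, so a log from a different run can be checked the same way.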
# config

```
# /etc/rc.conf on 1st node
hostname="one.my.domain"
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="inet 10.0.9.82 netmask 255.255.255.240 laggproto lacp laggport igb0 laggport igb1"
ifconfig_lagg0_ipv6="inet6 3000:3050:3000:4::82/64"
# ifconfig_lo1="inet 10.0.0.254 netmask 255.255.255.0"
defaultrouter="10.0.9.81"
ipv6_defaultrouter="3000:3050:3000:4::1"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"
# carp on
kld_list="carp"
ifconfig_lagg0_aliases="\
  inet vhid 1 advskew 0 pass pwd1 10.0.9.84/32 \
  inet6 vhid 2 advskew 0 pass pwd2 3000:3050:3000:4::84/64 \
  inet vhid 3 advskew 100 pass pwd3 10.0.9.85/32 \
  inet6 vhid 4 advskew 100 pass pwd4 3000:3050:3000:4::85/64"
# debugging rc.d scripts
rc_debug="YES"
rc_startmsgs="YES"
```

```
# /etc/rc.conf on 2nd node
hostname="two.my.domain"
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="inet 10.0.9.83 netmask 255.255.255.240 laggproto lacp laggport igb0 laggport igb1"
ifconfig_lagg0_ipv6="inet6 3000:3050:3000:4::83/64"
defaultrouter="10.0.9.81"
ipv6_defaultrouter="3000:3050:3000:4::1"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"
# carp on
kld_list="carp"
ifconfig_lagg0_aliases="\
  inet vhid 1 advskew 100 pass pwd1 10.0.9.84/32 \
  inet6 vhid 2 advskew 100 pass pwd2 3000:3050:3000:4::84/64 \
  inet vhid 3 advskew 0 pass pwd3 10.0.9.85/32 \
  inet6 vhid 4 advskew 0 pass pwd4 3000:3050:3000:4::85/64"
# debugging rc.d scripts
rc_debug="YES"
rc_startmsgs="YES"
```

```
# /boot/loader.conf
# storage
# zfs won't start mounting volumes without this
zfs_load="YES"
kern.geom.label.gptid.enable="0"
# hardware
coretemp_load="YES"
# console
# ensure console in IPMI mode remains accessible instead of going all white
hw.vga.textmode=1
# bhyve and jails
vmm_load="YES"
nmdm_load="YES"
if_bridge_load="YES"
if_tap_load="YES"
kern.racct.enable=1
# debug super powers
dtraceall_load="YES"
# runtime
# maxfiles
kern.maxfiles="25000"
# network
# fibs
# https://blog.feld.me/posts/2015/06/routing-a-freebsd-jail-through-openvpn/
# https://www.freebsd.org/cgi/man.cgi?query=setfib
net.fibs=2
# from https://calomel.org/freebsd_network_tuning.html
accf_data_load="YES"
accf_dns_load="YES"
autoboot_delay="3"
ahci_load="YES"
aio_load="YES"
cc_htcp_load="YES"
net.tcp.hostcache.cachelimit="0"
```

```
# /etc/sysctl.conf
# carp tweaks
net.inet.carp.preempt=1
```
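The two rc.conf files give each node advskew 0 on two vhids and advskew 100 on the other two, and sysctl.conf sets `net.inet.carp.preempt=1`. ifconfig(8) describes advskew as a skew in 1/256 s added to the base advertisement interval (advbase), and CARP prefers the host that advertises most often, so the expected MASTER for each vhid can be worked out mechanically. This is a minimal sketch under stated assumptions: advbase is 1 (as in the ifconfig output in this report), ties are broken arbitrarily rather than as CARP would, and the `net.inet.carp.demotion` counter, which carp(4) adds to the advskew sent in announcements, is ignored. `parse_aliases`, `adv_interval` and `expected_masters` are hypothetical names.

```python
from fractions import Fraction

# Alias strings from the two rc.conf files in this report, with the sh
# backslash continuations joined into single strings.
NODE_ALIASES = {
    "one": ("inet vhid 1 advskew 0 pass pwd1 10.0.9.84/32 "
            "inet6 vhid 2 advskew 0 pass pwd2 3000:3050:3000:4::84/64 "
            "inet vhid 3 advskew 100 pass pwd3 10.0.9.85/32 "
            "inet6 vhid 4 advskew 100 pass pwd4 3000:3050:3000:4::85/64"),
    "two": ("inet vhid 1 advskew 100 pass pwd1 10.0.9.84/32 "
            "inet6 vhid 2 advskew 100 pass pwd2 3000:3050:3000:4::84/64 "
            "inet vhid 3 advskew 0 pass pwd3 10.0.9.85/32 "
            "inet6 vhid 4 advskew 0 pass pwd4 3000:3050:3000:4::85/64"),
}

CARP_KEYWORDS = ("vhid", "advbase", "advskew", "pass")


def parse_aliases(value, default_advbase=1):
    """Return {vhid: (advbase, advskew)} for one ifconfig_<if>_aliases value.

    Assumes every alias starts with "inet" or "inet6" and that each carp
    keyword takes exactly one argument; other tokens (addresses) are skipped.
    """
    chunks = []
    for tok in value.split():
        if tok in ("inet", "inet6"):
            chunks.append([])       # start a new alias
        else:
            chunks[-1].append(tok)  # IndexError if value does not start with inet/inet6
    result = {}
    for chunk in chunks:
        opts = {"advbase": default_advbase, "advskew": 0}
        it = iter(chunk)
        for tok in it:
            if tok in CARP_KEYWORDS:
                opts[tok] = next(it)  # consume the keyword's argument
        if "vhid" in opts:
            result[int(opts["vhid"])] = (int(opts["advbase"]), int(opts["advskew"]))
    return result


def adv_interval(advbase, advskew):
    """Advertisement interval in seconds, exactly: advbase + advskew/256."""
    return Fraction(advbase) + Fraction(advskew, 256)


def expected_masters(nodes):
    """For each vhid, pick the node with the shortest advertisement interval.

    Ties resolve to the first node listed; real CARP tie-breaking is not modelled.
    """
    parsed = {name: parse_aliases(v) for name, v in nodes.items()}
    vhids = sorted(set().union(*(p.keys() for p in parsed.values())))
    masters = {}
    for vhid in vhids:
        candidates = {name: adv_interval(*p[vhid])
                      for name, p in parsed.items() if vhid in p}
        masters[vhid] = min(candidates, key=candidates.get)
    return masters


if __name__ == "__main__":
    for vhid, node in expected_masters(NODE_ALIASES).items():
        print(f"vhid {vhid}: expected MASTER on node {node}")
```

With the configs above this predicts node one for vhids 1 and 2 and node two for vhids 3 and 4, which is consistent with vhids 3 and 4 leaving MASTER on the restarted host in the log, whose ifconfig shows advskew 0 for those vhids.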