Date: Wed, 08 Jun 2005 00:37:59 -0500 (EST) From: Terry Kennedy <terry@tmk.com> To: freebsd-stable@freebsd.org Subject: Strange TCP-related hang on startup w/ recent CVSUP Message-ID: <01LP77OMTW08000CP8@tmk.com>
next in thread | raw e-mail | index | archive | help
I have a number of boxes which have been running well w/ 5.4-stable. The last kernel and userland builds on those boxes were on May 26th, from a CVSUP done that same day. Last night I did another CVSUP and the usual buildworld/buildkernel/in- stallkernel/installworld/mergemaster, and the boxes wouldn't come back up, hanging in sbwait state at the point where they would normally do their first net access. In one box, this is a batch of NFS mounts; on the other it is the initial ntpdate query. At this point the boxes don't respond to pings and sit for- ever (at least an hour) not doing anything. Typing ^C on the console aborts the hanging process, and then the startup proceeds. With a ping going from another system, I see that the net doesn't come up on the problem boxes until several seconds into the execution of the next network-related command - it is almost as if the previous ifconfig (done from rc.conf) didn't have any effect (despite the "link up" message on the console). The other oddity is that I get a "rpc.lockd: 100024 RPC: Port mapper failure" right after lockd starts, even though other net commands have completed successfully. Booting from kernel.old (the May 26th one) boots normally, so it is a ker- nel issue, not something in userland. Hardware is a Tyan S2721-533 (dual Xeon) board w/ onboard Intel "em" Gig- abit Ethernet. To keep this message short, I'm omitting the dmesg output, but it can be found at http://www.tmk.com/raidzilla. Here's the relevant parts of the console output from the two systems: first system: /dev/da1s3h: clean, 199137891 free (819 frags, 24892134 blocks, 0.0% fragmentation) Setting hostname: rz1.tmk.com. em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500 options=b<RXCSUM,TXCSUM,VLAN_MTU> inet 204.141.35.40 netmask 0xffffff00 broadcast 204.141.35.255 inet6 fe80::2e0:81ff:fe28:94d6%em0 prefixlen 64 tentative scopeid 0x1 ether 00:e0:81:28:94:d6 media: Ethernet 1000baseTX <full-duplex> (autoselect) status: no carrier lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384 inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 add net default: gateway 204.141.35.1 Additional routing options:. Starting devd. Mounting NFS file systems:em0: Link is up 1000 Mbps Full Duplex load: 1.10 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k load: 1.10 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k load: 1.10 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k load: 1.10 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k load: 1.01 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k load: 1.01 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k load: 0.56 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k load: 0.56 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k load: 0.07 cmd: mount_nfs 395 [sbwait] 0.00u 0.00s 0% 820k ^CScript /etc/rc.d/mountcritremote interrupted Starting syslogd. Checking for core dump on /dev/da0s1b... savecore: no dumps found Setting date via ntp. Looking for host gate.tmk.com and service ntp host found : gate.tmk.com Looking for host 204.141.35.61 and service ntp host found : gate.tmk.com Looking for host 199.224.0.146 and service ntp host found : ns1.ispnetinc.net Looking for host 199.224.0.154 and service ntp host found : ns2.ispnetinc.net Looking for host 204.141.40.135 and service ntp host found : ns3.ispnetinc.net 8 Jun 00:30:34 ntpdate[416]: step time server 199.224.0.146 offset 1.526419 sec Starting rpcbind. NFS access cache time=2 ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/X11R6/lib /usr/local/lib a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout /usr/X11R6/lib/aout Starting mountd. Starting nfsd. Starting statd. Starting lockd. Jun 8 00:30:35 rz1 rpc.lockd: 100024 RPC: Port mapper failure Starting usbd. Starting local daemons:. Updating motd. Starting ntpd. Starting rwhod. Configuring syscons: blanktime screensaver. second system: /dev/da1s3h: clean, 199127439 free (1095 frags, 24890793 blocks, 0.0% fragmentation) Setting hostname: rz2.tmk.com. em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500 options=b<RXCSUM,TXCSUM,VLAN_MTU> inet 204.141.35.41 netmask 0xffffff00 broadcast 204.141.35.255 inet6 fe80::2e0:81ff:fe2e:67b0%em0 prefixlen 64 tentative scopeid 0x1 ether 00:e0:81:2e:67:b0 media: Ethernet autoselect status: no carrier lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384 inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 add net default: gateway 204.141.35.1 Additional routing options:. Starting devd. Mounting NFS file systems:. Starting syslogd. Checking for core dump on /dev/da0s1b... savecore: no dumps found Setting date via ntp. Looking for host gate.tmk.com and service ntp em0: Link is up 1000 Mbps Full Duplex load: 0.75 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k load: 0.75 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k load: 0.75 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k load: 0.75 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k load: 0.69 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k load: 0.64 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k load: 0.64 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k load: 0.02 cmd: ntpdate 420 [sbwait] 0.00u 0.00s 0% 844k ^CScript /etc/rc.d/ntpdate interrupted Starting rpcbind. NFS access cache time=2 ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/X11R6/lib /usr/local/lib a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout /usr/X11R6/lib/aout Starting mountd. Starting nfsd. Starting statd. Starting lockd. Jun 8 00:58:41 rz2 rpc.lockd: 100024 RPC: Port mapper failure Starting usbd. Starting local daemons:. Updating motd. Starting ntpd. Starting rwhod. Configuring syscons: blanktime screensaver. Here are the list of changed kernel files since the previous (working) kernel: find . -newer /tmp/date ./alpha/alpha/busdma_machdep.c ./amd64/amd64/identcpu.c ./amd64/amd64/machdep.c ./conf/NOTES ./conf/files ./conf/options ./dev/ata/ata-chipset.c ./dev/ata/ata-pci.h ./dev/bktr/bktr_reg.h ./dev/sound/pci/ich.c ./kern/uipc_socket.c ./kern/kern_event.c ./kern/sysv_shm.c ./kern/subr_unit.c ./modules/netgraph/device ./modules/netgraph/device/Makefile ./modules/netgraph/Makefile.inc ./modules/udbp/Makefile ./netgraph/bluetooth/drivers/ubt ./netgraph/bluetooth/drivers/ubt/ng_ubt.c ./netgraph/ng_eiface.c ./netgraph/netgraph.h ./netgraph/ng_base.c ./netgraph/ng_iface.c ./netgraph/ng_device.c ./netgraph/ng_ksocket.c ./netinet/tcp_subr.c ./netinet/ip_divert.c ./netinet/ip_icmp.c ./netinet/ip_icmp.h ./netinet6/ipsec.c ./sys/systm.h Any ideas? Also, if there is additional information needed, let me know what to do and I'll report back. Terry Kennedy http://www.tmk.com terry@tmk.com New York, NY USA
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?01LP77OMTW08000CP8>