Date: Thu, 6 Mar 2008 21:05:32 +0100
From: Daniel Gerzo <danger@FreeBSD.org>
To: current@FreeBSD.org
Cc: yongari@FreeBSD.org
Subject: re(4) problem
Message-ID: <20080306200532.GA84961@cvsup.sk.freebsd.org>

Hello people,

I would like to report a problem with the re(4) device. I am running the
following system:

    FreeBSD 7.0-STABLE #2: Sat Mar 1 18:55:23 CET 2008 amd64

The system is built with the patch available at:
http://people.freebsd.org/~yongari/re/re.HEAD.patch

The problem has already occurred 3 times (within about a week), always
suddenly after the machine has been running for some time. I don't know
how to reproduce it :-(

The machine in question has two NICs, one em(4) based and one re(4)
based. When the problem occurs, I am able to connect to the machine only
through em(4), which works with no problems.

The symptoms are as follows:

- The machine does not reply to ICMP echo requests sent to the re(4)
  device.

- When I try to ping some remote host over the re(4) based card I get:

    ping: sendto: No buffer space available

- When I run tcpdump -vv -i re0, I can see only arp requests (ha-web1 is
  the machine in question) and no other reasonable traffic:

    20:30:20.945662 arp who-has 85.10.197.188 tell 85.10.197.161
    20:30:20.947624 arp who-has 85.10.197.189 tell 85.10.197.161
    20:30:20.949021 arp who-has 85.10.197.190 tell 85.10.197.161
    20:30:21.136417 arp who-has ha-web1 tell 85.10.199.1
    20:30:22.153493 arp who-has 85.10.197.169 tell 85.10.197.161
    20:30:23.286400 arp who-has ha-web1 tell 85.10.199.1
    20:30:23.299547 arp who-has 85.10.199.12 tell 85.10.199.1

- The output of netstat -m:

    root@[ha-web1 /home/danger]# netstat -m
    1047/648/1695 mbufs in use (current/cache/total)
    879/335/1214/25600 mbuf clusters in use (current/cache/total/max)
    879/267 mbuf+clusters out of packet secondary zone in use (current/cache)
    16/265/281/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
    2092K/1892K/3984K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    0/0/0 sfbufs in use (current/peak/max)
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    37742 requests for I/O initiated by sendfile
    0 calls to protocol drain routines

- ifconfig re0 output:

    danger@[ha-web1 ~]> ifconfig
    re0: flags=8c43<UP,BROADCAST,RUNNING,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
            ether 00:1d:92:34:12:7a
            inet 85.10.199.6 netmask 0xffffffe0 broadcast 85.10.199.31
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active

- When I run ifconfig re0 down, the device doesn't go down unless I also
  type ifconfig re0 up. In the meantime ifconfig still says that the
  device is active, and /var/log/messages doesn't mention that it has
  gone down. When I then type ifconfig re0 up, the device goes down and
  immediately up again, but the network still doesn't work; however, I
  no longer get the ENOBUFS error when I try to ping a remote host.
  After this procedure I am unable to ssh to this box over em(4) as well
  (ping works). Now, when I run /etc/rc.d/netif restart, I can connect
  to the machine over em(4) again. When I ping a remote host over re(4),
  I get:

    ping: sendto: No route to host

  When I run /etc/rc.d/routing restart, ping doesn't report anything,
  but I can again see arp requests in tcpdump.

- No interrupt storms are being reported in /var/log/messages, and it
  doesn't include anything else strange; neither does dmesg.

I suppose it's a bug in re(4); otherwise I would expect the network not
to work over em(4) either.

If you need any information I can provide to help debug this problem,
please let me know. I will leave the machine in this state if the
customer permits me to do so.

-- 
Best Regards,
Daniel Gerzo
mailto:danger@FreeBSD.org
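
A side note on the ifconfig output above: the flags word 8c43 has the OACTIVE bit set, which, as far as I understand it, marks the driver's transmit queue as full and would be consistent with the "No buffer space available" error from ping. Below is a minimal Python sketch of that decoding. It assumes the bit values from FreeBSD's <net/if.h> and lists only the bits that appear in this report; the helper names decode_flags and tx_stalled are made up for illustration and are not part of any tool.

```python
# Interface flag bits, assumed to match FreeBSD's <net/if.h>.
# Only the bits present in the reported flags word are listed here.
IFF_FLAGS = {
    0x0001: "UP",
    0x0002: "BROADCAST",
    0x0040: "RUNNING",
    0x0400: "OACTIVE",
    0x0800: "SIMPLEX",
    0x8000: "MULTICAST",
}

IFF_OACTIVE = 0x0400


def decode_flags(value):
    """Return the names of the known flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]


def tx_stalled(value):
    """True if the OACTIVE bit is set in the flags word."""
    return bool(value & IFF_OACTIVE)


if __name__ == "__main__":
    flags = 0x8C43  # value taken from the ifconfig output in this report
    print(decode_flags(flags))
    print("OACTIVE set:", tx_stalled(flags))
```

If OACTIVE stays set while no traffic leaves the interface, that points at the transmit path rather than at mbuf exhaustion, which would match the zero "denied" counters in netstat -m.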