Date: Wed, 9 Jun 2010 11:28:52 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Anders Nordby <anders@FreeBSD.org>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Odd network issues on ZFS based NFS server
Message-ID: <Pine.GSO.4.63.1006091119410.23896@muncher.cs.uoguelph.ca>
In-Reply-To: <20100609122517.GA16231@fupp.net>
References: <20100608083649.GA77452@fupp.net>
    <Pine.GSO.4.63.1006081946040.8742@muncher.cs.uoguelph.ca>
    <20100609122517.GA16231@fupp.net>

On Wed, 9 Jun 2010, Anders Nordby wrote:

>
> Thanks. The only thing that (temporarily) solves this issue so far is
> rebooting, which helps only for a day or so. I have tried different
> NICs, replacing the physical server, replacing cables, changing and
> resetting switch ports. But it did not help, so I think this is a
> software problem. I will try zio_use_uma = 0 I think, and then try to
> limit vfs.zfs.arc_max to 100 MB or so.
>

When you tried a different NIC, was it a different type (i.e. a different
chipset that uses a different device driver)? I suggested that not because
I thought the hardware was broken, but because I thought the problem might
be related to the network interface's device driver, and switching to a
different device driver would isolate that possibility.

> On the ZFS+NFS server while having these issues:
>
> root@unixfile:~# netstat -m
> 1293/4602/5895 mbufs in use (current/cache/total)
> 1109/3619/4728/65536 mbuf clusters in use (current/cache/total/max)
> 257/1023 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 2541K/8804K/11345K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> Packet loss seen from my workstation:
>
> anders@noname:~$ ping unixfile
> PING unixfile.aftenposten.no (192.168.120.33) 56(84) bytes of data.
> 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=1 ttl=63 time=0.230 ms
> 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=3 ttl=63 time=0.262 ms
> 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=5 ttl=63 time=0.272 ms
> 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=6 ttl=63 time=0.203 ms
> 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=7 ttl=63 time=0.306 ms
> 64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=9 ttl=63 time=0.309 ms

Well, it doesn't seem to be mbuf exhaustion (I don't know what "out of
packet secondary zone" means; I'll have to look at that), and if it doesn't
handle pings it seems really hosed.

Have you done a "vmstat 5" + "ps axlH" (or similar) to try and see what
it's doing? ("top" and "netstat" might also help.)

If you can figure out where it's spinning its wheels, that might at least
give us a hint w.r.t. the problem.

Good luck with it, rick
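
As a rough way to put a number on the loss in the quoted ping output, the gaps in icmp_seq can be counted. The following is only a sketch: the function name `missing_sequences` is made up for illustration, the input is just the six reply lines quoted above, and it assumes the quoted output is the complete run up to icmp_seq=9 (anything cut off before or after would not be counted).

```python
import re

# Reply lines copied from the quoted ping output above.
PING_OUTPUT = """\
64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=1 ttl=63 time=0.230 ms
64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=3 ttl=63 time=0.262 ms
64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=5 ttl=63 time=0.272 ms
64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=6 ttl=63 time=0.203 ms
64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=7 ttl=63 time=0.306 ms
64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=9 ttl=63 time=0.309 ms
"""


def missing_sequences(output):
    """Return (seen, missing) icmp_seq numbers between the lowest and
    highest sequence number that appears in the output."""
    seen = sorted(int(n) for n in re.findall(r"icmp_seq=(\d+)", output))
    if not seen:
        return [], []
    received = set(seen)
    missing = [n for n in range(seen[0], seen[-1] + 1) if n not in received]
    return seen, missing


seen, missing = missing_sequences(PING_OUTPUT)
# Loss is computed only over the window covered by the visible replies.
loss_pct = 100.0 * len(missing) / (len(seen) + len(missing))
print("received:", seen)
print("missing: ", missing)
print("loss over visible window: %.1f%%" % loss_pct)
```

For the lines quoted here, replies 2, 4 and 8 are absent, i.e. three of nine requests in the visible window went unanswered.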