Date: Wed, 9 Jun 2010 14:25:17 +0200
From: Anders Nordby
To: Rick Macklem
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Odd network issues on ZFS based NFS server

Hi,

On Tue, Jun 08, 2010 at 07:55:32PM -0400, Rick Macklem wrote:
> Well, here's a few things you might try. (I know nothing about ZFS,
> except what I see discussed on the mailing lists.)
>
> - "netstat -m" will show you mbuf allocations. Might give you a hint
>   w.r.t. mbuf/mbuf cluster exhaustion.
> - I'd try setting zio_use_uma = 0, since there have been reports of
>   issues related to ZFS using the uma allocator and mbuf allocation
>   uses the uma allocator now, too. (I think this is fairly recent, so
>   might not be relevant to FreeBSD7.)
> - You can try the experimental NFS server to see if that affects the
>   behaviour. ("-e" option on both mountd and nfsd)
> - If you have some different network hardware, you could try a different
>   net interface. This would isolate the problem, if it happens to be
>   related to the network device driver for the hardware you have.
>
> There are lots of email messages in the archive related to tuning the
> arc for zfs. I know nothing about it, but I'd look for a message that
> describes what the current recommendations are for amd64 w.r.t. this.
>
> Hopefully others can suggest other things to check. It smells like some
> sort of resource exhaustion problem, but who knows???

Thanks. So far the only thing that solves this issue, and only
temporarily, is rebooting, which helps for a day or so. I have tried
different NICs, replacing the physical server, replacing cables, and
changing and resetting switch ports. None of that helped, so I think this
is a software problem.

I think I will try zio_use_uma = 0 first, and then try limiting
vfs.zfs.arc_max to 100 MB or so.
On the ZFS+NFS server while having these issues:

root@unixfile:~# netstat -m
1293/4602/5895 mbufs in use (current/cache/total)
1109/3619/4728/65536 mbuf clusters in use (current/cache/total/max)
257/1023 mbuf+clusters out of packet secondary zone in use (current/cache)
0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
2541K/8804K/11345K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Packet loss seen from my workstation:

anders@noname:~$ ping unixfile
PING unixfile.aftenposten.no (192.168.120.33) 56(84) bytes of data.
64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=1 ttl=63 time=0.230 ms
64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=3 ttl=63 time=0.262 ms
64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=5 ttl=63 time=0.272 ms
64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=6 ttl=63 time=0.203 ms
64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=7 ttl=63 time=0.306 ms
64 bytes from unixfile.aftenposten.no (192.168.120.33): icmp_seq=9 ttl=63 time=0.309 ms
^C
--- unixfile.aftenposten.no ping statistics ---
10 packets transmitted, 6 received, 40% packet loss, time 9017ms
rtt min/avg/max/mdev = 0.203/0.263/0.309/0.042 ms

Here is also vmstat -z from the server:

ITEM SIZE LIMIT USED FREE REQUESTS FAILURES

UMA Kegs: 208, 0, 175, 12, 175, 0
UMA Zones: 320, 0, 175, 5, 175, 0
UMA Slabs: 568, 0, 20339, 7535, 162600, 0
UMA RCntSlabs: 568, 0, 2468, 3, 2468, 0
UMA Hash: 256, 0, 5, 85, 81, 0
16 Bucket: 152, 0, 558, 292, 1115, 0
32 Bucket: 280, 0, 269, 25, 491, 0
64 Bucket: 536, 0, 254, 5, 391, 17
128 Bucket: 1048, 0, 3598, 47, 6823, 914
VM OBJECT: 216, 0, 47009, 9529, 2668554, 0
MAP: 232, 0, 7, 25, 7, 0
KMAP ENTRY: 120, 119815, 4881, 606, 376403, 0
MAP ENTRY: 120, 0, 1797, 683, 4683855, 0
DP fakepg: 120, 0, 0, 0, 0, 0
SG fakepg: 120, 0, 0, 0, 0, 0
mt_zone: 2056, 0, 196, 3, 196, 0
16: 16, 0, 14932, 7916, 3237030, 0
32: 32, 0, 2438, 1703, 2411143, 0
64: 64, 0, 32128, 18216, 93399160, 0
128: 128, 0, 28706, 55075, 12071701, 0
256: 256, 0, 3831, 7104, 58010086, 0
512: 512, 0, 1753, 578, 32140172, 0
1024: 1024, 0, 93, 123, 201330, 0
2048: 2048, 0, 529, 375, 36122797, 0
4096: 4096, 0, 253, 184, 185892, 0
Files: 80, 0, 424, 386, 1078416, 0
TURNSTILE: 136, 0, 297, 63, 297, 0
umtx pi: 96, 0, 0, 0, 0, 0
MAC labels: 40, 0, 0, 0, 0, 0
PROC: 1120, 0, 66, 114, 107003, 0
THREAD: 984, 0, 267, 29, 294, 0
SLEEPQUEUE: 80, 0, 297, 80, 297, 0
VMSPACE: 392, 0, 45, 155, 107030, 0
cpuset: 72, 0, 2, 98, 2, 0
audit_record: 952, 0, 0, 0, 0, 0
mbuf_packet: 256, 0, 259, 1021, 34278617, 0
mbuf: 256, 0, 1025, 3590, 131614064, 0
mbuf_cluster: 2048, 65536, 2278, 2450, 16615870, 0
mbuf_jumbo_page: 4096, 12800, 0, 104, 153927, 0
mbuf_jumbo_9k: 9216, 6400, 0, 0, 0, 0
mbuf_jumbo_16k: 16384, 3200, 0, 0, 0, 0
mbuf_ext_refcnt: 4, 0, 0, 0, 0, 0
g_bio: 232, 0, 0, 8544, 1690094, 0
ttyinq: 160, 0, 135, 81, 300, 0
ttyoutq: 256, 0, 72, 48, 160, 0
ata_request: 320, 0, 0, 24, 1, 0
ata_composite: 336, 0, 0, 0, 0, 0
VNODE: 472, 0, 69327, 4057, 12604560, 0
VNODEPOLL: 112, 0, 0, 0, 0, 0
S VFS Cache: 108, 0, 70366, 6821, 12297146, 0
L VFS Cache: 328, 0, 179, 25369, 544759, 0
NAMEI: 1024, 0, 0, 96, 18824297, 0
NFSMOUNT: 616, 0, 0, 0, 0, 0
NFSNODE: 656, 0, 0, 0, 0, 0
DIRHASH: 1024, 0, 1147, 37, 1147, 0
pipe: 728, 0, 19, 86, 85332, 0
ksiginfo: 112, 0, 166, 890, 4901, 0
itimer: 344, 0, 0, 22, 1, 0
KNOTE: 128, 0, 0, 145, 622, 0
socket: 680, 131076, 53, 79, 20777, 0
unpcb: 240, 131072, 10, 182, 6269, 0
ipq: 56, 2079, 0, 189, 159, 0
udp_inpcb: 336, 131076, 11, 66, 5487, 0
udpcb: 16, 131208, 11, 661, 5487, 0
tcp_inpcb: 336, 131076, 32, 111, 9019, 0
tcpcb: 880, 131072, 32, 96, 9019, 0
tcptw: 72, 26250, 0, 200, 51, 0
syncache: 144, 15366, 0, 130, 8229, 0
hostcache: 136, 15372, 8, 132, 61, 0
tcpreass: 40, 4116, 3, 501, 662733, 0
sackhole: 32, 0, 0, 202, 11, 0
ripcb: 336, 131076, 0, 22, 1, 0
rtentry: 200, 0, 4, 34, 4, 0
selfd: 56, 0, 262, 683, 704729, 0
SWAPMETA: 288, 116519, 0, 0, 0, 0
ip4flow: 56, 99351, 16, 551, 11254, 0
ip6flow: 80, 99360, 0, 0, 0, 0
Mountpoints: 752, 0, 5, 20, 5, 0
FFS inode: 168, 0, 43495, 25739, 526228, 0
FFS1 dinode: 128, 0, 0, 0, 0, 0
FFS2 dinode: 256, 0, 43495, 25610, 526228, 0
taskq_zone: 56, 0, 0, 819, 299535, 0
zio_cache: 776, 0, 0, 2830, 7902766, 0
zio_buf_512: 512, 0, 73281, 39083, 2179139, 0
zio_data_buf_512: 512, 0, 41, 260, 86233, 0
zio_buf_1024: 1024, 0, 64, 624, 31885, 0
zio_data_buf_1024: 1024, 0, 33, 815, 14631, 0
zio_buf_1536: 1536, 0, 15, 161, 6621, 0
zio_data_buf_1536: 1536, 0, 9, 179, 666, 0
zio_buf_2048: 2048, 0, 10, 352, 13371, 0
zio_data_buf_2048: 2048, 0, 4, 82, 518, 0
zio_buf_2560: 2560, 0, 6, 76, 4631, 0
zio_data_buf_2560: 2560, 0, 8, 79, 751, 0
zio_buf_3072: 3072, 0, 3, 146, 8829, 0
zio_data_buf_3072: 3072, 0, 4, 107, 1160, 0
zio_buf_3584: 3584, 0, 5, 273, 22944, 0
zio_data_buf_3584: 3584, 0, 5, 82, 418, 0
zio_buf_4096: 4096, 0, 10, 192, 21812, 0
zio_data_buf_4096: 4096, 0, 7, 141, 1628, 0
zio_buf_5120: 5120, 0, 2, 236, 49783, 0
zio_data_buf_5120: 5120, 0, 14, 366, 2686, 0
zio_buf_6144: 6144, 0, 3, 127, 26343, 0
zio_data_buf_6144: 6144, 0, 20, 629, 1944, 0
zio_buf_7168: 7168, 0, 3, 85, 7341, 0
zio_data_buf_7168: 7168, 0, 31, 690, 2953, 0
zio_buf_8192: 8192, 0, 5, 98, 6653, 0
zio_data_buf_8192: 8192, 0, 47, 712, 3562, 0
zio_buf_10240: 10240, 0, 10, 109, 5628, 0
zio_data_buf_10240: 10240, 0, 80, 846, 5494, 0
zio_buf_12288: 12288, 0, 9, 81, 2704, 0
zio_data_buf_12288: 12288, 0, 59, 972, 4714, 0
zio_buf_14336: 14336, 0, 0, 293, 79024, 0
zio_data_buf_14336: 14336, 0, 64, 770, 5474, 0
zio_buf_16384: 16384, 0, 3409, 613, 42927, 0
zio_data_buf_16384: 16384, 0, 53, 615, 36196, 0
zio_buf_20480: 20480, 0, 0, 72, 1000, 0
zio_data_buf_20480: 20480, 0, 50, 761, 5383, 0
zio_buf_24576: 24576, 0, 3, 42, 702, 0
zio_data_buf_24576: 24576, 0, 24, 312, 3207, 0
zio_buf_28672: 28672, 0, 1, 54, 784, 0
zio_data_buf_28672: 28672, 0, 10, 157, 1538, 0
zio_buf_32768: 32768, 0, 0, 61, 1079, 0
zio_data_buf_32768: 32768, 0, 8, 129, 22324, 0
zio_buf_36864: 36864, 0, 3, 71, 486, 0
zio_data_buf_36864: 36864, 0, 11, 92, 1506, 0
zio_buf_40960: 40960, 0, 1, 53, 324, 0
zio_data_buf_40960: 40960, 0, 7, 58, 728, 0
zio_buf_45056: 45056, 0, 1, 43, 319, 0
zio_data_buf_45056: 45056, 0, 3, 55, 530, 0
zio_buf_49152: 49152, 0, 0, 65, 1224, 0
zio_data_buf_49152: 49152, 0, 1, 140, 17837, 0
zio_buf_53248: 53248, 0, 0, 53, 364, 0
zio_data_buf_53248: 53248, 0, 0, 54, 349, 0
zio_buf_57344: 57344, 0, 2, 52, 381, 0
zio_data_buf_57344: 57344, 0, 6, 97, 2164, 0
zio_buf_61440: 61440, 0, 0, 44, 267, 0
zio_data_buf_61440: 61440, 0, 1, 50, 594, 0
zio_buf_65536: 65536, 0, 172, 92, 41829, 0
zio_data_buf_65536: 65536, 0, 0, 119, 14319, 0
zio_buf_69632: 69632, 0, 0, 35, 194, 0
zio_data_buf_69632: 69632, 0, 0, 38, 195, 0
zio_buf_73728: 73728, 0, 0, 44, 525, 0
zio_data_buf_73728: 73728, 0, 3, 75, 718, 0
zio_buf_77824: 77824, 0, 0, 58, 462, 0
zio_data_buf_77824: 77824, 0, 6, 74, 557, 0
zio_buf_81920: 81920, 0, 1, 53, 422, 0
zio_data_buf_81920: 81920, 0, 0, 118, 12825, 0
zio_buf_86016: 86016, 0, 1, 34, 308, 0
zio_data_buf_86016: 86016, 0, 5, 50, 957, 0
zio_buf_90112: 90112, 0, 1, 48, 481, 0
zio_data_buf_90112: 90112, 0, 1, 29, 44, 0
zio_buf_94208: 94208, 0, 0, 49, 1036, 0
zio_data_buf_94208: 94208, 0, 0, 57, 177, 0
zio_buf_98304: 98304, 0, 0, 44, 348, 0
zio_data_buf_98304: 98304, 0, 0, 112, 12362, 0
zio_buf_102400: 102400, 0, 0, 58, 388, 0
zio_data_buf_102400: 102400, 0, 0, 20, 45, 0
zio_buf_106496: 106496, 0, 1, 35, 477, 0
zio_data_buf_106496: 106496, 0, 1, 57, 482, 0
zio_buf_110592: 110592, 0, 1, 72, 884, 0
zio_data_buf_110592: 110592, 0, 0, 71, 930, 0
zio_buf_114688: 114688, 0, 0, 61, 656, 0
zio_data_buf_114688: 114688, 0, 1, 146, 10626, 0
zio_buf_118784: 118784, 0, 0, 67, 532, 0
zio_data_buf_118784: 118784, 0, 0, 10, 29, 0
zio_buf_122880: 122880, 0, 1, 86, 1444, 0
zio_data_buf_122880: 122880, 0, 0, 50, 176, 0
zio_buf_126976: 126976, 0, 1, 59, 1029, 0
zio_data_buf_126976: 126976, 0, 0, 42, 325, 0
zio_buf_131072: 131072, 0, 0, 717, 119915, 0
zio_data_buf_131072: 131072, 0, 474, 981, 214146, 0
dmu_buf_impl_t: 224, 0, 77939, 46739, 2664713, 0
dnode_t: 776, 0, 73767, 45043, 2094869, 0
arc_buf_hdr_t: 208, 0, 27195, 24519, 605620, 0
arc_buf_t: 72, 0, 4901, 14949, 677129, 0
zil_lwb_cache: 200, 0, 2, 1233, 118944, 0
zfs_znode_cache: 376, 0, 25805, 4005, 12077350, 0

Regards,

--
Anders.
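
With this many zones it is easy to miss the few rows where the FAILURES column is nonzero, so a small filter may help when comparing vmstat -z snapshots over time. The following is only a rough Python sketch, not an established tool: the function name zones_with_failures is made up for illustration, and it assumes the six-column layout shown in this post (SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES). Other FreeBSD releases may print a different set of columns, in which case the field count would need adjusting.

```python
def zones_with_failures(vmstat_z_text):
    """Return (zone, failures) pairs for zones whose FAILURES column is nonzero.

    Assumes each data row looks like
    "NAME: size, limit, used, free, requests, failures"
    as in the vmstat -z output quoted in this post.
    """
    failing = []
    for line in vmstat_z_text.splitlines():
        # Zone names may contain spaces ("64 Bucket") but no colon.
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # header or blank line
        fields = [f.strip() for f in rest.split(",")]
        if len(fields) != 6 or not all(f.isdigit() for f in fields):
            continue  # not a six-column data row
        failures = int(fields[5])
        if failures > 0:
            failing.append((name.strip(), failures))
    return failing


# A few rows copied from the output above, used as sample input.
SAMPLE = """\
ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
32 Bucket: 280, 0, 269, 25, 491, 0
64 Bucket: 536, 0, 254, 5, 391, 17
128 Bucket: 1048, 0, 3598, 47, 6823, 914
mbuf_cluster: 2048, 65536, 2278, 2450, 16615870, 0
"""

if __name__ == "__main__":
    for zone, failures in zones_with_failures(SAMPLE):
        print(f"{zone}: {failures} failed allocations")
```

On the sample rows this reports only the "64 Bucket" and "128 Bucket" zones, which are the only zones in the output above with a nonzero FAILURES count. Whether those failures are related to the packet loss is not something this check can tell.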