Date: Thu, 29 Mar 2018 14:38:27 +0000
From: Dave Baukus <daveb@spectralogic.com>
To: "cem@freebsd.org" <cem@freebsd.org>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: ZFS, Vnode cache, and poor directory listing performance via Samba
Message-ID: <9f3c98c0-e5b1-9e3e-fc1e-e2362e12c98c@spectralogic.com>
In-Reply-To: <CAG6CVpVfMAfZr%2BH1ZRjdAmHxJMnpTNi0LWuj2_o%2B0_LYD%2Brxyw@mail.gmail.com>
References: <67aadb01-70c2-0454-2e3f-74bed67fb330@spectralogic.com> <CAG6CVpVfMAfZr%2BH1ZRjdAmHxJMnpTNi0LWuj2_o%2B0_LYD%2Brxyw@mail.gmail.com>

Thank you for the explanation and suggestions, Conrad.
Unfortunately, this absurd directory is at a customer site, generated by some ill-designed application.

Dave Baukus

On 03/28/2018 08:35 PM, Conrad Meyer wrote:
> Hi Dave,
>
> Full scans are the worst case for an LRU cache. In particular, you
> are full-scanning an *extremely* large directory, which evicts your
> entire vnode cache. Then you suffer the (presumably) entirely
> serialized penalty of refetching every single inode from disk again
> after the first scan.
>
> Here are some solutions in order of preference:
> 1. Organize your files better. 1 million in a single directory is
> absurd. Can windows explorer meaningfully navigate a 1mil file
> directory? I doubt it.
> 2. Continue to bump maxvnodes to compensate for poor file organization
> + naive clients doing full scans.
> 3. Enhance samba to signal something like DONTNEED on
> "SMB2_FIND_ID_BOTH_DIRECTORY_INFO Pattern: *" requests to the OS.
> 3.a. Enhance samba to parallelize or otherwise asynchronously process
> the above requests on huge directories (improve the uncached case).
>
> I don't think this has much to do with ZFS, other than that ZFS
> performance on your hardware appears to be quite bad without the VFS
> cache sitting in front to absorb most of the requests.
>
> Best,
> Conrad
>
>
> On Wed, Mar 28, 2018 at 7:07 PM, Dave Baukus <daveb@spectralogic.com> wrote:
>> Below is narrative angst and woe for which I have the following observations/questions:
>>
>> - Increasing kern.maxvnodes from 600,000 to 2,000,000 apparently solves the "problem"
>> - This decreases the number of lookups in the scenario below from 40719 (some of which take over a second) to 4
>> - 2,000,000 may be extreme, but I was hoping for an authoritative comment on why/how this improves the scenario and
>> then perhaps I can come up with some reasonable tuning options.
>> - is this an artifact of the FreeBSD 11-ish refactoring of the ZFS/FreeBSD VNOP interface (?)
>>
>> -----------------------------------------------------
>> I have the following scenario on FreeBSD Stable 11.0:
>>
>> A ZFS with a directory containing 1,000,000 files; the root of this ZFS is
>> exported via SAMBA using NFSv4 ACL plugin and DOS attributes with the (<get|set>extattr) implementation.
>>
>> A local full listing of this directory (ls -l > /dev/null) completes in about 40 seconds.
>> A full listing from a Samba client (ls -l) completes in about 3 minutes.
>>
>> Using windows explorer from a Win2008 client is where the strangeness begins; it
>> takes between 8 to 12 minutes before control is returned to win-explorer.
>>
>> Tracing this with wireshark I noticed that "SMB2_FIND_ID_BOTH_DIRECTORY_INFO Pattern: *"
>> requests from the Win2008 client start off functioning well (client requests
>> 64k of data and samba responds with 64k of directory data). After about 150 seconds of this
>> interaction the client makes a "SMB2_FIND_ID_BOTH_DIRECTORY_INFO Pattern: *" request that is not
>> responded to for over 60 seconds. The windows client closes the connection, starts a new
>> connection, and begins directory listing from ground zero. This pattern continues for
>> 6 to 10 minutes; I never see final request/response where the server indicates that the
>> listing is complete; I believe win-explorer just gives up.
>>
>> Meanwhile, back on FreeBSD/ZFS I'm running a dtrace script that times the following
>> ZFS VNOPs for the connected Samba server instance:
>>
>> - fbt:zfs:zfs_*extattr:entry and return (get|set|delete|list)extattr
>> - fbt:zfs:zfs_freebsd_lookup:entry and return
>> - fbt:zfs:zfs_freebsd_readdir:entry and return
>> - fbt:zfs:zfs_freebsd_getattr:entry and return
>>
>> This starts off looking like:
>> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 19931
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3975
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2662
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1711
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1768
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1411
>> 12 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 44325
>> 12 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 38054
>> 12 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 36137
>> ...
>> ... line 11,800
>> 16 27763 zfs_freebsd_getacl:return zfs_freebsd_getacl :: 2709
>> 16 27763 zfs_freebsd_getacl:return zfs_freebsd_getacl :: 2046
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2238
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1452
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1570
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1608
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1571
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1431
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1431
>> 16 27763 zfs_freebsd_getacl:return zfs_freebsd_getacl :: 2856
>> 16 27763 zfs_freebsd_getacl:return zfs_freebsd_getacl :: 1907
>> 16 27809 zfs_getextattr:return zfs_getextattr :: 3537
>> 16 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 45135
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2744
>> 16 27809 zfs_getextattr:return zfs_getextattr :: 3221
>> 16 27811 zfs_listextattr:return zfs_listextattr :: 3762
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2090
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2214
>> 16 27809 zfs_getextattr:return zfs_getextattr :: 20112
>> 16 27809 zfs_getextattr:return zfs_getextattr :: 14989
>> 16 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 35946
>> 16 27811 zfs_listextattr:return zfs_listextattr :: 46900
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2115
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1439
>> 16 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 22886
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1449
>> 16 27809 zfs_getextattr:return zfs_getextattr :: 4046
>> 16 27811 zfs_listextattr:return zfs_listextattr :: 2239
>> 16 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 15128
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1640
>> ...
>> ... line 175,000
>> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 85760734
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3617
>> 12 27809 zfs_getextattr:return zfs_getextattr :: 14064
>> 12 27811 zfs_listextattr:return zfs_listextattr :: 4088
>> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 85586541
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2983
>> 12 27809 zfs_getextattr:return zfs_getextattr :: 11416
>> 12 27811 zfs_listextattr:return zfs_listextattr :: 3230
>> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 85758027
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3124
>> ...
>> ... line 176,0000
>> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1113397903
>> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3189
>> 1 27809 zfs_getextattr:return zfs_getextattr :: 6423
>> 1 27811 zfs_listextattr:return zfs_listextattr :: 3090
>> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1108181740
>> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3267
>> 1 27809 zfs_getextattr:return zfs_getextattr :: 5486
>> 1 27811 zfs_listextattr:return zfs_listextattr :: 3111
>> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1092061756
>> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3113
>> 1 27809 zfs_getextattr:return zfs_getextattr :: 5691
>> 1 27811 zfs_listextattr:return zfs_listextattr :: 3073
>> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1102236755
>> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3435
>> 1 27809 zfs_getextattr:return zfs_getextattr :: 5862
>> 1 27811 zfs_listextattr:return zfs_listextattr :: 3771
>> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1101668231
>> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3189
>> 1 27809 zfs_getextattr:return zfs_getextattr :: 6671
>> 15 27811 zfs_listextattr:return zfs_listextattr :: 12951
>> 15 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1061648117
>> 15 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 5365
>> 15 27809 zfs_getextattr:return zfs_getextattr :: 5731
>> 21 27811 zfs_listextattr:return zfs_listextattr :: 8178
>> 21 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64429430
>> 21 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2912
>> 21 27809 zfs_getextattr:return zfs_getextattr :: 5566
>> 21 27811 zfs_listextattr:return zfs_listextattr :: 2454
>> 19 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1017176234
>> 19 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2976
>> 19 27809 zfs_getextattr:return zfs_getextattr :: 6230
>> 19 27811 zfs_listextattr:return zfs_listextattr :: 2710
>> 19 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64211015
>> 19 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1876
>> 19 27809 zfs_getextattr:return zfs_getextattr :: 3690
>> 19 27811 zfs_listextattr:return zfs_listextattr :: 2292
>> 19 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 17007
>> 19 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1766
>> 19 27809 zfs_getextattr:return zfs_getextattr :: 3357
>> 19 27811 zfs_listextattr:return zfs_listextattr :: 2331
>> 19 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 63817436
>> 19 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1827
>> 19 27809 zfs_getextattr:return zfs_getextattr :: 12231
>> 12 27811 zfs_listextattr:return zfs_listextattr :: 8658
>> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64859702
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3296
>> 12 27809 zfs_getextattr:return zfs_getextattr :: 6118
>> 12 27811 zfs_listextattr:return zfs_listextattr :: 2454
>> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 17442
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1676
>> 12 27809 zfs_getextattr:return zfs_getextattr :: 3649
>> 12 27811 zfs_listextattr:return zfs_listextattr :: 2363
>> 0 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1013471141
>> 0 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 5995
>> 0 27809 zfs_getextattr:return zfs_getextattr :: 9280
>> 0 27811 zfs_listextattr:return zfs_listextattr :: 3219
>> 0 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64286196
>> 0 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 5618
>> 0 27809 zfs_getextattr:return zfs_getextattr :: 8919
>> 0 27811 zfs_listextattr:return zfs_listextattr :: 3117
>> 13 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 999431953
>> 13 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1062322808
>> 9 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1061885578
>> 9 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 11283
>>
>> At this point the client closes the connection and the connected Samba server process exits.
>>
>> After increasing the vnodes to 2M, the wire transfer of the directory listing completes
>> in about 60 seconds with the final "no more files" response status observed,
>> and win-explorer cogitates on the data for about another 2 minutes
>> before control is returned to win-explorer.
>>
>> Thanks for any feedback.
>>
>> --
>> Dave Baukus
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
> .
>
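
Conrad's point that repeated full scans are the worst case for an LRU cache can be illustrated with a toy simulation. This is only a sketch: it models a plain LRU of fixed size, which is an assumption and not how FreeBSD's vnode cache is actually implemented, and the numbers are scaled down 1000x from the thread (600 and 2,000 cache entries against 1,000 files). The function name scan_hits is hypothetical.

```python
from collections import OrderedDict


def scan_hits(cache_size, n_files, passes):
    """Count cache hits per pass for sequential full scans over n_files
    distinct keys, using a plain LRU cache holding cache_size entries."""
    cache = OrderedDict()
    hits_per_pass = []
    for _ in range(passes):
        hits = 0
        for key in range(n_files):
            if key in cache:
                cache.move_to_end(key)         # mark as most recently used
                hits += 1
            else:
                cache[key] = True              # "fetch from disk" and cache it
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
        hits_per_pass.append(hits)
    return hits_per_pass


# Scaled-down stand-ins for maxvnodes=600,000 and 2,000,000 with 1,000,000 files.
print(scan_hits(600, 1000, 3))    # [0, 0, 0]
print(scan_hits(2000, 1000, 3))   # [0, 1000, 1000]
```

With the cache smaller than the directory, each entry is evicted before the next pass reaches it again, so every pass misses on every file. Once the cache holds the whole directory, only the first pass misses.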
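
A trace of this size is easier to reason about when summarized per VNOP. The following is a minimal sketch that parses records in the shape shown in the quoted output and counts slow calls. The original dtrace script is not shown in the thread, so two things here are assumptions: that the trailing number is elapsed time in nanoseconds (which is at least consistent with "some of which take over a second"), and the 1 ms threshold. The names RECORD and summarize are hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. ">> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 19931"
# Fields: CPU, probe ID, probe function, printed name, elapsed time.
RECORD = re.compile(
    r"^\s*(?:>+\s*)*(\d+)\s+(\d+)\s+(\S+):return\s+(\S+)\s+::\s+(\d+)\s*$"
)


def summarize(lines, slow_ns=1_000_000):
    """Return {function: {"count", "slow", "max_ns"}} for trace records.
    Assumes the last column is elapsed nanoseconds (not confirmed by the thread)."""
    stats = defaultdict(lambda: {"count": 0, "slow": 0, "max_ns": 0})
    for line in lines:
        m = RECORD.match(line)
        if not m:
            continue                      # skip "...", prose, and blank lines
        func = m.group(3)
        elapsed = int(m.group(5))
        entry = stats[func]
        entry["count"] += 1
        entry["slow"] += int(elapsed >= slow_ns)
        entry["max_ns"] = max(entry["max_ns"], elapsed)
    return dict(stats)


sample = [
    ">> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 19931",
    ">> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3975",
    ">> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1113397903",
    ">> ...",
]
for func, entry in sorted(summarize(sample).items()):
    print(func, entry)
```

Run against the full trace, a summary like this would show whether the slowdown is confined to zfs_freebsd_lookup, as the excerpts suggest, while getattr and the extattr calls stay in the microsecond range.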
