Date: Wed, 28 Mar 2018 19:35:12 -0700
From: Conrad Meyer <cem@freebsd.org>
To: Dave Baukus <daveb@spectralogic.com>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: ZFS, Vnode cache, and poor directory listing performance via Samba
Message-ID: <CAG6CVpVfMAfZr%2BH1ZRjdAmHxJMnpTNi0LWuj2_o%2B0_LYD%2Brxyw@mail.gmail.com>
In-Reply-To: <67aadb01-70c2-0454-2e3f-74bed67fb330@spectralogic.com>
References: <67aadb01-70c2-0454-2e3f-74bed67fb330@spectralogic.com>

Hi Dave,

Full scans are the worst case for an LRU cache. In particular, you are
full-scanning an *extremely* large directory, which evicts your entire
vnode cache. Then you suffer the (presumably) entirely serialized
penalty of refetching every single inode from disk again after the
first scan.

Here are some solutions in order of preference:

1. Organize your files better. 1 million in a single directory is
absurd. Can Windows Explorer meaningfully navigate a 1 million file
directory? I doubt it.

2. Continue to bump maxvnodes to compensate for poor file organization
+ naive clients doing full scans.

3. Enhance Samba to signal something like DONTNEED on
"SMB2_FIND_ID_BOTH_DIRECTORY_INFO Pattern: *" requests to the OS.

3.a. Enhance Samba to parallelize or otherwise asynchronously process
the above requests on huge directories (improve the uncached case).

I don't think this has much to do with ZFS, other than that ZFS
performance on your hardware appears to be quite bad without the VFS
cache sitting in front to absorb most of the requests.

Best,
Conrad

On Wed, Mar 28, 2018 at 7:07 PM, Dave Baukus <daveb@spectralogic.com> wrote:
> Below is narrative angst and woe for which I have the following observations/questions:
>
> - Increasing kern.maxvnodes from 600,000 to 2,000,000 apparently solves the "problem"
> - This decreases the number of lookups in the scenario below from 40719 (some of which take over a second) to 4
> - 2,000,000 may be extreme, but I was hoping for an authoritative comment on why/how this improves the scenario and
> then perhaps I can come up with some reasonable tuning options.
> - is this an artifact of the FreeBSD 11-ish refactoring of the ZFS/FreeBSD VNOP interface (?)
>
> -----------------------------------------------------
> I have the following scenario on FreeBSD Stable 11.0:
>
> A ZFS with a directory containing 1,000,000 files; the root of this ZFS is
> exported via SAMBA using NFSv4 ACL plugin and DOS attributes with the (<get|set>extattr) implementation.
>
> A local full listing of this directory (ls -l > /dev/null) completes in about 40 seconds.
> A full listing from a Samba client (ls -l) completes in about 3 minutes.
>
> Using windows explorer from a Win2008 client is where the strangeness begins; it
> takes between 8 and 12 minutes before control is returned to win-explorer.
>
> Tracing this with wireshark I noticed that "SMB2_FIND_ID_BOTH_DIRECTORY_INFO Pattern: *"
> requests from the Win2008 client start off functioning well (client requests
> 64k of data and samba responds with 64k of directory data). After about 150 seconds of this
> interaction the client makes a "SMB2_FIND_ID_BOTH_DIRECTORY_INFO Pattern: *" request that is not
> responded to for over 60 seconds. The windows client closes the connection, starts a new
> connection, and begins directory listing from ground zero. This pattern continues for
> 6 to 10 minutes; I never see a final request/response where the server indicates that the
> listing is complete; I believe win-explorer just gives up.
>
> Meanwhile, back on FreeBSD/ZFS I'm running a dtrace script that times the following
> ZFS VNOPs for the connected Samba server instance:
>
> - fbt:zfs:zfs_*extattr:entry and return (get|set|delete|list)extattr
> - fbt:zfs:zfs_freebsd_lookup:entry and return
> - fbt:zfs:zfs_freebsd_readdir:entry and return
> - fbt:zfs:zfs_freebsd_getattr:entry and return
>
> This starts off looking like:
> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 19931
> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3975
> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2662
> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1711
> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1768
> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1411
> 12 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 44325
> 12 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 38054
> 12 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 36137
> ...
> ... line 11,800
> 16 27763 zfs_freebsd_getacl:return zfs_freebsd_getacl :: 2709
> 16 27763 zfs_freebsd_getacl:return zfs_freebsd_getacl :: 2046
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2238
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1452
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1570
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1608
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1571
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1431
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1431
> 16 27763 zfs_freebsd_getacl:return zfs_freebsd_getacl :: 2856
> 16 27763 zfs_freebsd_getacl:return zfs_freebsd_getacl :: 1907
> 16 27809 zfs_getextattr:return zfs_getextattr :: 3537
> 16 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 45135
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2744
> 16 27809 zfs_getextattr:return zfs_getextattr :: 3221
> 16 27811 zfs_listextattr:return zfs_listextattr :: 3762
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2090
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2214
> 16 27809 zfs_getextattr:return zfs_getextattr :: 20112
> 16 27809 zfs_getextattr:return zfs_getextattr :: 14989
> 16 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 35946
> 16 27811 zfs_listextattr:return zfs_listextattr :: 46900
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2115
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1439
> 16 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 22886
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1449
> 16 27809 zfs_getextattr:return zfs_getextattr :: 4046
> 16 27811 zfs_listextattr:return zfs_listextattr :: 2239
> 16 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 15128
> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1640
> ...
> ... line 175,000
> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 85760734
> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3617
> 12 27809 zfs_getextattr:return zfs_getextattr :: 14064
> 12 27811 zfs_listextattr:return zfs_listextattr :: 4088
> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 85586541
> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2983
> 12 27809 zfs_getextattr:return zfs_getextattr :: 11416
> 12 27811 zfs_listextattr:return zfs_listextattr :: 3230
> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 85758027
> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3124
> ...
> ... line 176,0000
> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1113397903
> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3189
> 1 27809 zfs_getextattr:return zfs_getextattr :: 6423
> 1 27811 zfs_listextattr:return zfs_listextattr :: 3090
> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1108181740
> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3267
> 1 27809 zfs_getextattr:return zfs_getextattr :: 5486
> 1 27811 zfs_listextattr:return zfs_listextattr :: 3111
> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1092061756
> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3113
> 1 27809 zfs_getextattr:return zfs_getextattr :: 5691
> 1 27811 zfs_listextattr:return zfs_listextattr :: 3073
> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1102236755
> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3435
> 1 27809 zfs_getextattr:return zfs_getextattr :: 5862
> 1 27811 zfs_listextattr:return zfs_listextattr :: 3771
> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1101668231
> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3189
> 1 27809 zfs_getextattr:return zfs_getextattr :: 6671
> 15 27811 zfs_listextattr:return zfs_listextattr :: 12951
> 15 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1061648117
> 15 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 5365
> 15 27809 zfs_getextattr:return zfs_getextattr :: 5731
> 21 27811 zfs_listextattr:return zfs_listextattr :: 8178
> 21 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64429430
> 21 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2912
> 21 27809 zfs_getextattr:return zfs_getextattr :: 5566
> 21 27811 zfs_listextattr:return zfs_listextattr :: 2454
> 19 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1017176234
> 19 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2976
> 19 27809 zfs_getextattr:return zfs_getextattr :: 6230
> 19 27811 zfs_listextattr:return zfs_listextattr :: 2710
> 19 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64211015
> 19 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1876
> 19 27809 zfs_getextattr:return zfs_getextattr :: 3690
> 19 27811 zfs_listextattr:return zfs_listextattr :: 2292
> 19 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 17007
> 19 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1766
> 19 27809 zfs_getextattr:return zfs_getextattr :: 3357
> 19 27811 zfs_listextattr:return zfs_listextattr :: 2331
> 19 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 63817436
> 19 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1827
> 19 27809 zfs_getextattr:return zfs_getextattr :: 12231
> 12 27811 zfs_listextattr:return zfs_listextattr :: 8658
> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64859702
> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3296
> 12 27809 zfs_getextattr:return zfs_getextattr :: 6118
> 12 27811 zfs_listextattr:return zfs_listextattr :: 2454
> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 17442
> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1676
> 12 27809 zfs_getextattr:return zfs_getextattr :: 3649
> 12 27811 zfs_listextattr:return zfs_listextattr :: 2363
> 0 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1013471141
> 0 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 5995
> 0 27809 zfs_getextattr:return zfs_getextattr :: 9280
> 0 27811 zfs_listextattr:return zfs_listextattr :: 3219
> 0 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64286196
> 0 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 5618
> 0 27809 zfs_getextattr:return zfs_getextattr :: 8919
> 0 27811 zfs_listextattr:return zfs_listextattr :: 3117
> 13 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 999431953
> 13 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1062322808
> 9 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1061885578
> 9 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 11283
>
> At this point the client closes the connection and the connected Samba server process exits.
>
> After increasing the vnodes to 2M, the wire transfer of the directory listing completes
> in about 60 seconds with the final "no more files" response status observed,
> and win-explorer cogitates on the data for about another 2 minutes
> before control is returned to win-explorer.
>
> Thanks for any feedback.
>
> --
> Dave Baukus
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
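
A trace of this size is easier to reason about once it is summarized per VNOP. The sketch below is one possible way to do that in Python, not the script Dave used. It assumes each data line has the shape shown above (CPU, probe ID, "function:return", function name, "::", elapsed time) and that the elapsed time is in nanoseconds, which the original message does not state. The names LINE_RE and summarize are made up for this illustration.

```python
import re
from collections import defaultdict

# Assumed line shape (not confirmed by the original message):
#   [> ]<cpu> <probe id> <function>:return <function> :: <elapsed>
LINE_RE = re.compile(
    r"^\s*(?:>\s*)?(\d+)\s+(\d+)\s+(\w+):return\s+\w+\s+::\s+(\d+)\s*$"
)


def summarize(lines, slow_ns=1_000_000_000):
    """Count calls per function and how many took at least slow_ns.

    slow_ns defaults to one second, assuming the elapsed column is in
    nanoseconds. Lines that do not match LINE_RE (such as "> ...") are skipped.
    """
    stats = defaultdict(lambda: {"count": 0, "slow": 0, "max_ns": 0})
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue
        func = match.group(3)
        elapsed = int(match.group(4))
        entry = stats[func]
        entry["count"] += 1
        entry["max_ns"] = max(entry["max_ns"], elapsed)
        if elapsed >= slow_ns:
            entry["slow"] += 1
    return dict(stats)


# A few lines copied from the trace above.
sample = [
    "> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1113397903",
    "> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3189",
    "> 1 27809 zfs_getextattr:return zfs_getextattr :: 6423",
    "> 19 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 17007",
    "> ...",
]

if __name__ == "__main__":
    for func, entry in sorted(summarize(sample).items()):
        print(func, entry)
```

Run against the full trace, a summary like this would show directly how many zfs_freebsd_lookup calls cross the one-second mark before and after raising kern.maxvnodes.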
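
Conrad's point that a full scan is the worst case for an LRU cache can be illustrated with a toy model. The sketch below assumes a strict LRU policy and scales the numbers in the thread down by 1000 (600 or 2000 cache slots, 1000 files). The real FreeBSD vnode cache is more complicated than this, and the class and function names here are invented for the example, so treat it as an illustration of the eviction pattern only.

```python
from collections import OrderedDict


class LRUCache:
    """Toy strict-LRU cache that tracks keys and counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key):
        if key in self.entries:
            # Hit: mark the entry as most recently used.
            self.entries.move_to_end(key)
            self.hits += 1
            return
        # Miss: insert the entry, then evict the least recently used one
        # if the cache has grown past its capacity.
        self.misses += 1
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def misses_on_second_scan(capacity, n_files):
    """Scan n_files in order twice and return the misses of the second scan."""
    cache = LRUCache(capacity)
    for f in range(n_files):
        cache.access(f)
    cache.hits = cache.misses = 0
    for f in range(n_files):
        cache.access(f)
    return cache.misses


if __name__ == "__main__":
    # Scaled-down stand-ins for maxvnodes=600,000 and 2,000,000
    # against a directory of 1,000,000 files.
    print("cache 600, files 1000:", misses_on_second_scan(600, 1000))
    print("cache 2000, files 1000:", misses_on_second_scan(2000, 1000))
```

In this model, a cache smaller than the directory misses on every file of the second scan, because each entry is evicted shortly before it is needed again. Once the cache is at least as large as the directory, the second scan is served entirely from cache, which is consistent with the improvement reported after raising kern.maxvnodes.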