From owner-freebsd-fs@FreeBSD.ORG Thu Jun 26 20:48:55 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D8C501FE; Thu, 26 Jun 2014 20:48:55 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 7565D2705; Thu, 26 Jun 2014 20:48:54 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AhIFABmGrFODaFve/2dsb2JhbABaFoNJWoJupzYBAQEBAQEGkiyGbVMBgSR1hAMBAQEEAQEBIAQnIAQHGw4DAwECAQICDRkCKQEJHggGCAcEARwEiCENpRydPBeBK4Q5iEQBBgEBGzQHgneBTAWXcIQvki+DXiEvBnwBCBci X-IronPort-AV: E=Sophos;i="5.01,555,1400040000"; d="scan'208";a="135343866" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 26 Jun 2014 16:48:47 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 6A40FB403F; Thu, 26 Jun 2014 16:48:47 -0400 (EDT) Date: Thu, 26 Jun 2014 16:48:47 -0400 (EDT) From: Rick Macklem To: Bryan Drewery Message-ID: <1142861946.4612955.1403815727424.JavaMail.root@uoguelph.ca> In-Reply-To: <53AC8377.4060408@FreeBSD.org> Subject: Re: getdirentries cookies usage outside of UFS [PATCH] MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.209] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jun 2014 20:48:55 -0000 Bryan Drewery wrote: > On 2014-04-14 17:27, Kirk McKusick wrote: > >> Date: Fri, 11 Apr 2014 21:03:57 -0500 > >> From: Bryan Drewery > >> To: freebsd-fs@freebsd.org > >> Subject: getdirentries cookies usage outside of UFS > >> > >> Recently I was working on a compat syscall for sys_getdirentries() > >> that > >> converts between our dirent and the FreeBSD dirent struct. We had > >> never > >> tried using this on TMPFS and when we did ran into weird issues > >> (hence > >> my recent commits to TMPFS to clarify some of the getdirentries() > >> code). > >> We were not using cookies, so I referenced the Linux compat module > >> (linux_file.c getdents_common()) > >> > >> I ran across this code: > >> > >>> /* > >>> * When using cookies, the vfs has the option of reading from > >>> * a different offset than that supplied (UFS truncates the > >>> * offset to a block boundary to make sure that it never > >>> reads > >>> * partway through a directory entry, even if the directory > >>> * has been compacted). > >>> */ > >>> while (len > 0 && ncookies > 0 && *cookiep <= off) { > >>> bdp = (struct dirent *) inp; > >>> len -= bdp->d_reclen; > >>> inp += bdp->d_reclen; > >>> cookiep++; > >>> ncookies--; > >>> }=20 > >> > >> > >> At first it looked innocuous but then it occurred to me it was the > >> root > >> of the issue I was having as it was eating my cookies based on > >> their > >> value, despite tmpfs cookies being random hash values that have no > >> sequential relation. So I looked at how NFS was handling the same > >> code > >> and found this lovely hack from r216691: > >> > >>> not_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs"); > >> =2E.. > >>> while (cpos < cend && ncookies > 0 && > >>> (dp->d_fileno == 0 || dp->d_type == DT_WHT || > >>> (not_zfs != 0 && ((u_quad_t)(*cookiep)) <= toff))) { > >>> cpos += dp->d_reclen; > >>> dp = (struct dirent *)cpos; > >>> cookiep++; > >>> ncookies--; > >>> } > >> > >> I ended up doing the opposite, only running the code if getting > >> dirents > >> from "ufs". > >> > >> So there's multiple issue here. > >> > >> 1. NFS is broken on TMPFS. I can see why it's gone so long > >> unnoticed, > >> why would you do that? Still probably worth fixing. > >> > >> 2. Linux and SVR4 getdirentries() are both broken on TMPFS/ZFS. I > >> am > >> surprised Linux+ZFS has not been noticed by now. I am aware the > >> SVR4 > >> is > >> full of other bugs too. I ran across many just reviewing the > >> getdirentries code alone. > >> > >> Do any other file systems besides UFS do this offset/cookie > >> truncation/rewind? If UFS is the only one it may be acceptable to > >> change > >> this zfs check to !ufs and add it to the other modules. If we > >> don't > >> like > >> that, or there are potentially other file systems doing this too, > >> how > >> about adding a flag to somewhere to indicate the file system has > >> monotonically increasing offsets and needs this rewind support. > >> I'm > >> not > >> sure where that is best done, struct vfsconf? > >> > >> Regards, > >> Bryan Drewery > > > > This code is specific to UFS. I concur with your fix of making > > it conditionl on UFS. I feel guilty for putting that code in > > unconditionally in the first place. In my defense it was 1982 > > and UFS was the only filesystem :-) > > > > Kirk McKusick > > Based on this discussion I have made the following patch against NFS. > If > we're comfortable with this approach I will apply the same logic to > the > Linux and SVR4 modules. > > Mirrored at > http://people.freebsd.org/~bdrewery/patches/nfs-zfs-ufs.patch > > The patch changes 'not_zfs' to 'is_ufs' in the NFS code. Some of the > code actually is ZFS-specific in regards to snapshot handling. So I > have > also added a 'is_zfs' variable to compare against for those cases. > > I've removed the comments about ZFS in the UFS-only cases as the > existing comment seems to cover it fine. > > (Unrelated) This code, from r259845, in > sys/fs/nfsserver/nfs_nfsdport.c > seems odd to me: > > /* > * Check to see if entries in this directory can be safely > acquired > * via VFS_VGET() or if a switch to VOP_LOOKUP() is > required. > * ZFS snapshot directories need VOP_LOOKUP(), so that any > * automount of the snapshot directory that is required will > * be done. > * This needs to be done here for NFSv4, since NFSv4 never > does > * a VFS_VGET() for "." or "..". > */ > - if (not_zfs == 0) { > + if (is_zfs == 1) { > r = VFS_VGET(mp, at.na_fileid, LK_SHARED, &nvp); > if (r == EOPNOTSUPP) { > usevget = 0; > cn.cn_nameiop = LOOKUP; > cn.cn_lkflags = LK_SHARED | LK_RETRY; > cn.cn_cred = nd->nd_cred; > cn.cn_thread = p; > } else if (r == 0) > vput(nvp); > } > > This fallback is also done later (from r199715), but not limited to > ZFS. Would it make sense to not limit this first check to ZFS as > well? I > see that unionfs_vget also returns EOPNOTSUPP. A nullfs mount from > ZFS > served over NFS may also return EOPNOTSUPP, as odd as that is. > Well, the answer is similar to the not_zfs one. I don't have any way to test all the different file systems to make sure a change doesn't introduce a regression for any of them. In this case, someone was able to confirm that this changed fixed a problem for ZFS, so I put the code in as ZFS specific. The change further down was done by pjd@ and not me. Since he did it for all file systems in the old server, I duplicated that. (At that point, there was no ZFS specific stuff.) Enabling it for all file system types introduces a little overhead, but I would agree with that if it was known that it did not cause a regression for any file system type, then it would be nice to avoid a file system specific chunk. Bottom line. If you test this for all file systems and/or check the code for all file system types and are convinced it is safe to do for all file system types, then I have no problem with you doing a commit to change it. rick ps: I'll look at the patch, but I imagine it's fine with me. Basically, if Kirk thinks it's UFS specific, I believe him. > > -- > Regards, > Bryan Drewery > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"