Date: Thu, 9 May 2019 03:33:01 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Conrad Meyer <cem@freebsd.org> Cc: Rick Macklem <rmacklem@uoguelph.ca>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org> Subject: Re: test hash functions for fsid Message-ID: <20190509015851.J1574@besplex.bde.org> In-Reply-To: <CAG6CVpUER0ODF5mfdGFA-soSpGbh=qkjc4rKiJEdaOUaUaUmAw@mail.gmail.com> References: <YQBPR0101MB2260D82BAE348FB82902508CDD320@YQBPR0101MB2260.CANPRD01.PROD.OUTLOOK.COM> <CAG6CVpUER0ODF5mfdGFA-soSpGbh=qkjc4rKiJEdaOUaUaUmAw@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 8 May 2019, Conrad Meyer wrote: > On Wed, May 8, 2019 at 7:39 AM Rick Macklem <rmacklem@uoguelph.ca> wrote: >> If you have a FreeBSD system with lots of local file systems (1000s), maybe you could >> run the attached program on it and email me the stats. >> I'm just trying to see how well these hash functions work on fsids. > > Below I'll get to some comments on hash functions and hash tables, but > first: have you considered a non-hash data structure, such as PCTRIE? > PCTRIE_DEFINE() creates a datastructure with similar properties to a > hash table for this use, but does not require a hash function step at > all because all fsids are already unique, fixed-size, 64-bit values. > It may be especially space-efficient for non-ZFS FSIDs, when the > 64-bit key is composed with '(fsid.val[1] << 32 | fsid.val[0])'. > (Elaborated on more below.(A)) fsids must be unique in 32 bits, especially for nfs uses, since they are usually blindly truncated to 32 bits in nfs clients that have 32-bit dev_t to form st_dev, but st_dev must be unique. The FreeBSD-5 client blindly truncates: vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0]; The FreeBSD-11 client does the same for nfsv3, but for nfsv4 it does hashing stuff to convert from a 128-bit (?) na_filesid to 32 bits. va_fsid isn't really an fsid. It is a dev_t in all versions of FreeBSD, so it is too small to hold the 64-bit fsids created by vfs_getnewfsid() in all versions of FreeBSD except recent versions where dev_t is also 64 bits -- it works accidentally for them. > (A) FSIDs themselves have poor initial distribution and *very* few > unique bits (i.e., for filesystems that use vfs_getnewfsid, the int32 > fsid val[1] will be identical across all filesystems of the same type, > such as NFS mounts). The remaining val[0] is unique due to an > incrementing global counter. val[1] is useless unless there are more than 256 fs types, since it is already in the top 8 bits of the 24-bit minor number in fsid.val[0], except on systems with 8-bit minor numbers and in buggy versions of nfs that truncate the minor number to 8 bits (very old versions of oldnfs and not so old versions of newnfs). val[1] should never be used since it gets truncated away in many cases. val[0] has 16 bits of uniqueness from the counter, 8 mostly wasted for distinguishing the fs type, and another 8 bits even more mostly wasted for distinguishing the device type (255 for all synthetic devices was to stay away from major numbers 0-254 for physical devices, but devfs made all major numbers for physical devices 0). In some versions of FreeBSD, changing makedev() broke the uniqueness of val[0] by shifting the major bits into obvlivion (so synthetic devices were not distinguishable from physical devices, and since physical devices uses a dense set of minor numbers, collisions were likely). The difference between servers and clients' use of fsid is confusing, and I probably got some of the above wrong by conflating them. Clients should generate fsids unique in 32 bits. Using vfs_getnewfsid() and ignoring val[1] almost does this. This is needed to support applications using compat syscalls with 32-bit dev_t in st_dev and other places. Other file systems should also avoid actually using 64-bit dev_t for the same reason. Using devfs almost does this. It might be possible to generated more than 2**24 ptys so that the minor numbers start using the top 32 bits, but this is foot-shooting and hopefully disk devices are already allocated with low minor numbers. Devices which can back file systems should be in a different namespace (e.g., major == 1) to avoid this problem. I think device numbers from servers are only visible to clients for special files. Uniqueness then is not so important, but I found that changing makedev() broke it by noticing that server device numbers were truncated on clients. If the server fsid_t or dev_t is larger than the client fsid_t or dev_t, then there is no way that the client can represent all server numbers. Hashing can work for sparse numbers, but is complicated. Servers should also not actually use 64-bit dev_t. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190509015851.J1574>