Date: Thu, 9 May 2019 21:43:56 +0000 From: Rick Macklem <rmacklem@uoguelph.ca> To: "cem@freebsd.org" <cem@freebsd.org> Cc: "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.org> Subject: Re: test hash functions for fsid Message-ID: <YQBPR0101MB226084001EB8D4756A3D81BEDD330@YQBPR0101MB2260.CANPRD01.PROD.OUTLOOK.COM> In-Reply-To: <CAG6CVpWpduHKVU9Y_W01sZUcL6zT4uP5iNnF-4Lr3Bv_bt=jUg@mail.gmail.com> References: <YQBPR0101MB2260D82BAE348FB82902508CDD320@YQBPR0101MB2260.CANPRD01.PROD.OUTLOOK.COM> <CAG6CVpUER0ODF5mfdGFA-soSpGbh=qkjc4rKiJEdaOUaUaUmAw@mail.gmail.com> <YQBPR0101MB226068818D8C0D9591DD3D94DD330@YQBPR0101MB2260.CANPRD01.PROD.OUTLOOK.COM>, <CAG6CVpWpduHKVU9Y_W01sZUcL6zT4uP5iNnF-4Lr3Bv_bt=jUg@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
Conrad Meyer wrote: >On Wed, May 8, 2019 at 5:41 PM Rick Macklem <rmacklem@uoguelph.ca> wrote: >[liberal snipping throughout :-)] >> >> I'll admit I had never heard of PCTRIE, but <sys/pctrie.h> seems to be >> #ifdef _KERNEL, so it can't be used in userland? > >Ah, you're completely correct. I had considered using PCTRIE for a >userspace application earlier, and had started converting it to be >buildable in userspace, but I never finished. (I determined it >wouldn't be a good fit for that application, unfortunately.) I do >think it might be a good datastructure to expose to base userspace >programs eventually, but you probably don't really want to tackle that >extra work :-). A hash table is totally fine. > >> Yes. I just chose 256 for this test program. Unfortunately, the way moun= td.c is >> currently coded, the hash table must be sized before the getmntinfo() ca= ll, >> so it must be sized before it knows how many file systems there are. >> Since this is userland and table size isn't a memory issue, I'm just tem= pted to >> make it large enough for a large server with something like 25,000 file = systems. > >No objection, so long as people can still run tiny NFS servers on >their Raspberry Pis. :-) I do now think I can dynamically size it based on how many file systems (which is how many structures) will be linked off the table, so this should= n't be a problem. >> (I'd guess somewhere in the 256->1024 range would work. I'm not sure wha= t >> you mean by load factor below 0.8?) > >Load factor is just number of valid entries divided by space in the >table. So if your table has 256 spaces, load factor 0.8 would be >having about 205 valid entries. Hmm. That would suggest a large table size of about 100,000 for Peter's 72,000 file system case. It would result in a list length averaging about 1= entry, but I think a linear search through 10-20 entries won't take long enough to= be a problem for this case. (This happens whenever mountd receives a SIGHUP an= d each time a client does a mount.) As noted in the last post, I was thinking along the lines of 10->20 as a ta= rget linked list length. (Or "table_size =3D num / L", where L is the target ave= rage list length.=20 L =3D 10->20 would be roughly a load average of 10->20.) Does that sound reasonable? (Interested in hearing from anyone on this.) >> Fortunately, neither ZFS nor UFS uses vfs_getnewfsid() unless there is a= collision >> and I don't really care about other file system types. > >Ah, sorry =97 I didn't read carefully enough when looking for fsid initial= ization. > >> After all, it just does >> a linear search down the list of N file systems right and just about an= ything >> should be an improvement.) > >Yes :-). > >> I added a simple (take the low order bits of val[0]) case to the test. I= actually >> suspect any of the hash functions will work well enough, since, as you n= ote, most of the values (val[0] and 24bits of val[1]) are from a random # g= enerator which should >> be pretty uniform in distribution for ZFS. >> UFS now uses a value from the superblock. It appears that newfs sets val= [0] to the >> creation time of the file system and val[1] to a random value. > >Hm, it looks like makefs sets val[1] (fs_id[1]) to a non-random value >generated by the predictable PRNG random(3), for reproducible build >reasons. makefs seeds srandom(3) with either the current time in >seconds (low entropy) or some known timestamp (either from a file, >also in seconds, or an integer) (no entropy). I guess random(3) may >provide better distribution for the table than a plain global counter, >though. Yes. It would be the distribution of values and not their randomness that w= ould matter for this case. Hopefully someone with a large # of UFS file systems = will run the test program and post the results. To be honest, I doubt if anyone = will create a server with enough UFS file systems for it to be important to hash= their fsid well. It works fine for the small number of UFS file systems I have.) >> If it had been the >> reverse, I would be tempted to only use val[0], but I'll see how well th= e test goes >> for Peter. > >Seems like you were right =97 any old function is good enough :-). >From Peter's test, the first three did fine and almost the same. The last t= wo weren't as good, but were still tolerable. Thanks for the comments, rick
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQBPR0101MB226084001EB8D4756A3D81BEDD330>