Date: Thu, 9 May 2019 15:01:07 -0700
From: Conrad Meyer <cem@freebsd.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: test hash functions for fsid
Message-ID: <CAG6CVpU4zNtEAka40ArmwuxJ3kW-z32HCaw8o6Ep0d=7VhsQdg@mail.gmail.com>
In-Reply-To: <YQBPR0101MB226084001EB8D4756A3D81BEDD330@YQBPR0101MB2260.CANPRD01.PROD.OUTLOOK.COM>
References: <YQBPR0101MB2260D82BAE348FB82902508CDD320@YQBPR0101MB2260.CANPRD01.PROD.OUTLOOK.COM>
	<CAG6CVpUER0ODF5mfdGFA-soSpGbh=qkjc4rKiJEdaOUaUaUmAw@mail.gmail.com>
	<YQBPR0101MB226068818D8C0D9591DD3D94DD330@YQBPR0101MB2260.CANPRD01.PROD.OUTLOOK.COM>
	<CAG6CVpWpduHKVU9Y_W01sZUcL6zT4uP5iNnF-4Lr3Bv_bt=jUg@mail.gmail.com>
	<YQBPR0101MB226084001EB8D4756A3D81BEDD330@YQBPR0101MB2260.CANPRD01.PROD.OUTLOOK.COM>
Hi Rick,

On Thu, May 9, 2019 at 2:44 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:
> I do now think I can dynamically size it based on how many file systems
> (which is how many structures) will be linked off the table, so this
> shouldn't be a problem.

Great!

> > Load factor is just the number of valid entries divided by the space
> > in the table. So if your table has 256 spaces, load factor 0.8 would
> > be having about 205 valid entries.
>
> Hmm. That would suggest a large table size of about 100,000 for Peter's
> 72,000 file system case. It would result in a list length averaging
> about 1 entry,

Yeah, that's usually how hash tables are sized -- aiming for a list
length of 1 (for chaining hash tables) or a relatively short probe
sequence (for open addressing / embedded-entry tables). That is what
provides the various "O(1)" properties hash tables ostensibly have.

> but I think a linear search through 10-20 entries won't take long
> enough to be a problem for this case. (This happens whenever mountd
> receives a SIGHUP and each time a client does a mount.)

Yes, it might not be noticeable in this application either way.

> As noted in the last post, I was thinking along the lines of 10-20 as
> a target linked list length. (Or "table_size = num / L", where L is
> the target average list length. L = 10-20 would be roughly a load
> factor of 10-20.)
>
> Does that sound reasonable? (Interested in hearing from anyone on this.)

Wikipedia discusses it a little bit:
https://en.wikipedia.org/wiki/Hash_table#Key_statistics

Facebook recently open-sourced their general-purpose C++ hash table
implementations and discusses the various tradeoffs, including load, at
length; it might be interesting to read:
https://code.fb.com/developer-tools/f14/?r=1

Anyway, usually it's a number less than 1. A load factor of 10-20 is
unconventional.

> Yes. It would be the distribution of values and not their randomness
> that would matter for this case. Hopefully someone with a large # of
> UFS file systems will run the test program and post the results. To be
> honest, I doubt anyone will create a server with enough UFS file
> systems for it to be important to hash their fsid well. It works fine
> for the small number of UFS file systems I have.

Maybe pho@ will take that as a challenge. You could create a lot of UFS
file systems with mdconfig and enough RAM.

> Thanks for the comments, rick

Cheers,
Conrad
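
The "table_size = num / L" sizing and the load-factor arithmetic above
are easy to make concrete. Here is a minimal, self-contained C sketch;
the helper name is hypothetical and nothing below is taken from mountd:

	#include <stdio.h>

	/*
	 * Hypothetical sizing helper: given the number of file systems
	 * N and a target average chain length L, the table wants
	 * roughly N / L buckets.  L = 1 is the conventional choice;
	 * the thread above considers L = 10-20 to keep the table small.
	 */
	static size_t
	fsidhash_nbuckets(size_t nfs, size_t target_chain_len)
	{
		size_t n;

		n = nfs / target_chain_len;
		return (n > 0 ? n : 1);
	}

	int
	main(void)
	{
		/* Load factor = entries / buckets: 256 buckets at 0.8. */
		printf("~%.0f entries fill 256 buckets to load 0.8\n",
		    256 * 0.8);
		/* Peter's 72,000 file system case at two targets. */
		printf("L=1:  %zu buckets\n", fsidhash_nbuckets(72000, 1));
		printf("L=16: %zu buckets\n", fsidhash_nbuckets(72000, 16));
		return (0);
	}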
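
And for the lookup itself, a rough sketch of the chaining scheme under
discussion. The struct, hash, and lookup below are illustrative
stand-ins, not actual mountd or kernel code (though FreeBSD's real
fsid_t is likewise a pair of 32-bit words); the point is the
hash-to-bucket step followed by a linear walk of a short chain:

	#include <stddef.h>
	#include <stdint.h>

	/* Stand-in for the real fsid_t: two 32-bit words. */
	typedef struct { uint32_t val[2]; } fsid_t;

	struct expfs {
		fsid_t		fsid;
		struct expfs	*next;	/* chain link */
		/* ... per-file-system export data ... */
	};

	/*
	 * Hypothetical hash: mix both words with a common
	 * multiplicative constant and reduce modulo the table size.
	 * Any mixing step works if the fsids are well distributed.
	 */
	static size_t
	fsid_hash(const fsid_t *fsid, size_t nbuckets)
	{
		uint32_t h;

		h = fsid->val[0] ^ (fsid->val[1] * 2654435761u);
		return (h % nbuckets);
	}

	/*
	 * Chaining lookup: hash to a bucket, then linear-search the
	 * list.  With an average chain of 10-20 entries this walk
	 * stays cheap.
	 */
	static struct expfs *
	fsid_lookup(struct expfs **table, size_t nbuckets,
	    const fsid_t *fsid)
	{
		struct expfs *ep;

		for (ep = table[fsid_hash(fsid, nbuckets)]; ep != NULL;
		    ep = ep->next)
			if (ep->fsid.val[0] == fsid->val[0] &&
			    ep->fsid.val[1] == fsid->val[1])
				return (ep);
		return (NULL);
	}

As Rick notes above, what matters for a table like this is that the
fsid values spread evenly across the buckets, not that the hash looks
random.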