From owner-freebsd-fs@freebsd.org Thu May 9 22:01:26 2019 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 928D1158EE01 for ; Thu, 9 May 2019 22:01:26 +0000 (UTC) (envelope-from cse.cem@gmail.com) Received: from mail-it1-f175.google.com (mail-it1-f175.google.com [209.85.166.175]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7344C83B75 for ; Thu, 9 May 2019 22:01:25 +0000 (UTC) (envelope-from cse.cem@gmail.com) Received: by mail-it1-f175.google.com with SMTP id q65so6226516itg.2 for ; Thu, 09 May 2019 15:01:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:reply-to :from:date:message-id:subject:to:cc:content-transfer-encoding; bh=eiNQTL1WysjsYZOxsoK3tknczf+0im3gE964r0uGk4w=; b=qsKrMbh+vl1We89FCdppX/3kTbvHAsa4EI4IrOQ53jeVP/ApcJe/2UHaNdOa/3Kbpf NUb+ntk7GGVGivZIbmfP/FXxqSmwDqxuTpYggL8foLpQ8MmbGjUBhrSsqlIo/OcbE8zX QTCgYACd0DN9UGoT1TbWYzIJhp8G0MZAg5Pe+/frrioB0kdDl4LofxOZwVsUDKn53uvN hZiKUooKw5z2ebioWAkkqMBsE755KTeCEV0Moar8wUo7WC52LsA3q2zHgKyvnTcEOPzj IYwfM+V5d6djlL65OBwR4mTV6NaM2ucSQK+9rJNEmrxkrZGQUSqaVCVhE3bZ/21RPcQS /xww== X-Gm-Message-State: APjAAAUKcvKIXPWh1C+s2fBDSkE0Douoo+kLxhbsRFWB/F8iA7OHzxrx tkokc1VNAoSM++PNAxSxS2zvDBSM X-Google-Smtp-Source: APXvYqyJxUo5+DD+pTzArQw9o+GW5CWIx9AhY/3O+GnAOqAzMqhjUg19UfG7WPYjeWBumWX8w7/dnA== X-Received: by 2002:a02:5143:: with SMTP id s64mr5281708jaa.54.1557439278888; Thu, 09 May 2019 15:01:18 -0700 (PDT) Received: from mail-it1-f172.google.com (mail-it1-f172.google.com. [209.85.166.172]) by smtp.gmail.com with ESMTPSA id z21sm1377916ioz.16.2019.05.09.15.01.18 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 May 2019 15:01:18 -0700 (PDT) Received: by mail-it1-f172.google.com with SMTP id q65so6226436itg.2 for ; Thu, 09 May 2019 15:01:18 -0700 (PDT) X-Received: by 2002:a02:7410:: with SMTP id o16mr5080646jac.87.1557439278202; Thu, 09 May 2019 15:01:18 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: Reply-To: cem@freebsd.org From: Conrad Meyer Date: Thu, 9 May 2019 15:01:07 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: test hash functions for fsid To: Rick Macklem Cc: "freebsd-fs@freebsd.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 7344C83B75 X-Spamd-Bar: ----- Authentication-Results: mx1.freebsd.org; spf=pass (mx1.freebsd.org: domain of csecem@gmail.com designates 209.85.166.175 as permitted sender) smtp.mailfrom=csecem@gmail.com X-Spamd-Result: default: False [-5.34 / 15.00]; TO_DN_EQ_ADDR_SOME(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; HAS_REPLYTO(0.00)[cem@freebsd.org]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:209.85.128.0/17]; REPLYTO_ADDR_EQ_FROM(0.00)[]; RCVD_COUNT_THREE(0.00)[4]; MX_GOOD(-0.01)[cached: alt3.gmail-smtp-in.l.google.com]; RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.98)[-0.982,0]; FORGED_SENDER(0.30)[cem@freebsd.org,csecem@gmail.com]; IP_SCORE(-2.35)[ip: (-5.73), ipnet: 209.85.128.0/17(-3.68), asn: 15169(-2.27), country: US(-0.06)]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[gmail.com]; ASN(0.00)[asn:15169, ipnet:209.85.128.0/17, country:US]; FROM_NEQ_ENVFROM(0.00)[cem@freebsd.org,csecem@gmail.com]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-0.999,0]; TAGGED_FROM(0.00)[]; FROM_HAS_DN(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-fs@freebsd.org]; DMARC_NA(0.00)[freebsd.org]; MIME_TRACE(0.00)[0:+]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_NONE(0.00)[175.166.85.209.list.dnswl.org : 127.0.5.0]; RCVD_TLS_LAST(0.00)[] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 May 2019 22:01:26 -0000 Hi Rick, On Thu, May 9, 2019 at 2:44 PM Rick Macklem wrote: > I do now think I can dynamically size it based on how many file systems > (which is how many structures) will be linked off the table, so this shou= ldn't be > a problem. Great! > >Load factor is just number of valid entries divided by space in the > >table. So if your table has 256 spaces, load factor 0.8 would be > >having about 205 valid entries. > > Hmm. That would suggest a large table size of about 100,000 for Peter's > 72,000 file system case. It would result in a list length averaging about= 1 entry, Yeah, that's usually how hash tables are sized =E2=80=94 aiming for a list length of 1 (for chaining hash tables) or relatively short probe sequence (for open addressing / embedded entry tables). It's what provides the various "O(1)" properties hash tables ostensibly have. > but I think a linear search through 10-20 entries won't take long enough = to be > a problem for this case. (This happens whenever mountd receives a SIGHUP = and each time a client does a mount.) Yes, it might not be noticeable in this application either way. > As noted in the last post, I was thinking along the lines of 10->20 as a = target > linked list length. (Or "table_size =3D num / L", where L is the target a= verage list length. > L =3D 10->20 would be roughly a load average of 10->20.) > > Does that sound reasonable? (Interested in hearing from anyone on this.) Wikipedia discusses it a little bit: https://en.wikipedia.org/wiki/Hash_table#Key_statistics Facebook recently open sourced their general-purpose C++ hashtable implementations and talk at length about various tradeoffs, including load: https://code.fb.com/developer-tools/f14/?r=3D1 It might be interesting to read. Anyway, usually it's a number less than 1. 10-20x is unconventional. > Yes. It would be the distribution of values and not their randomness that= would > matter for this case. Hopefully someone with a large # of UFS file system= s will > run the test program and post the results. To be honest, I doubt if anyon= e will > create a server with enough UFS file systems for it to be important to ha= sh their > fsid well. It works fine for the small number of UFS file systems I have.= ) Maybe pho@ will take that as a challenge. You could create a lot of UFS filesystems with mdconfig and enough RAM. > Thanks for the comments, rick Cheers, Conrad