From: Ivan Voras
Date: Mon, 15 Oct 2012 23:57:46 +0200
Subject: Re: NFS server bottlenecks
To: Rick Macklem
Cc: freebsd-hackers@freebsd.org
On 15 October 2012 22:58, Rick Macklem wrote:

> The problem is that UDP entries very seldom time out (unless the
> NFS server is seeing hardly any load) and are mostly trimmed
> because the size exceeds the high water mark.
>
> With your code, it will clear out all of the entries in the first
> hash buckets that aren't currently busy, until the total count
> drops below the high water mark. (If you monitor a busy server
> with "nfsstat -e -s", you'll see the cache never goes below the
> high water mark, which is 500 by default.) This would delete
> entries of fairly recent requests.

You are right about that. If testing by Nikolay goes reasonably well,
I'll work on that.

> If you are going to replace the global LRU list with ones for
> each hash bucket, then you'll have to compare the time stamps
> on the least recently used entries of all the hash buckets and
> then delete those. If you keep the timestamp of the least recent
> one for that hash bucket in the hash bucket head, you could at least
> use that to select which bucket to delete from next, but you'll still
> need to:
> - lock that hash bucket
> - delete a few entries from that bucket's lru list
> - unlock hash bucket
> - repeat for various buckets until the count is below the high
>   water mark

Ah, I think I get it: is the reliance on the high water mark as a
criterion for cache expiry the reason the list is an LRU instead of an
ordinary unordered list?

> Or something like that. I think you'll find it a lot more work than
> one LRU list and one mutex. Remember that mutex isn't held for long.

It could be, but the current state of my code is just groundwork for
the next things I have planned:

1) Move the expiry code (the trim function) into a separate thread,
   run periodically (or as a callout; I'll need to talk with someone
   about which one is cheaper)
2) Replace the mutex with a rwlock.

The only thing preventing me from doing this right away is the LRU
list, since each read access modifies it (and so requires a write
lock). This is why I was asking you if we can do away with the LRU
algorithm.

> Btw, the code looks very nice. (If I was being a style(9) zealot,
> I'd remind you that it likes "return (X);" and not "return X;".)

Thanks, I'll make it more style(9) compliant as I go along.
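
For reference, the per-bucket trimming steps quoted above (pick the
bucket with the oldest LRU entry, lock it, delete a few entries, unlock,
repeat until the count is at the high water mark) could look roughly
like the userspace sketch below. It is only an illustration under stated
assumptions: the names (struct cache, cache_insert, cache_trim,
NBUCKETS) are hypothetical and not taken from the NFS server source,
pthread mutexes stand in for kernel mutexes, busy entries are ignored,
and the global count is a plain variable, which is only safe because
the sketch is meant to be run single-threaded.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 4	/* hypothetical size, kept small for illustration */

struct entry {
	struct entry *newer;	/* next entry toward the most recent end */
	uint64_t stamp;		/* time of last use */
};

struct bucket {
	pthread_mutex_t lock;
	struct entry *oldest;	/* LRU end: first candidate for deletion */
	struct entry *newest;	/* MRU end: new entries are appended here */
};

struct cache {
	struct bucket b[NBUCKETS];
	size_t count;		/* assumption: would need to be atomic in real code */
	size_t highwater;
};

void
cache_init(struct cache *c, size_t highwater)
{
	for (size_t i = 0; i < NBUCKETS; i++) {
		pthread_mutex_init(&c->b[i].lock, NULL);
		c->b[i].oldest = c->b[i].newest = NULL;
	}
	c->count = 0;
	c->highwater = highwater;
}

/* Append an entry; callers are assumed to pass non-decreasing stamps. */
int
cache_insert(struct cache *c, unsigned hash, uint64_t stamp)
{
	struct entry *e = malloc(sizeof(*e));
	struct bucket *b = &c->b[hash % NBUCKETS];

	if (e == NULL)
		return (-1);
	e->stamp = stamp;
	e->newer = NULL;
	pthread_mutex_lock(&b->lock);
	if (b->newest != NULL)
		b->newest->newer = e;
	else
		b->oldest = e;
	b->newest = e;
	c->count++;
	pthread_mutex_unlock(&b->lock);
	return (0);
}

/* Index of the bucket whose LRU entry is oldest, or -1 if all are empty. */
static int
pick_bucket(struct cache *c)
{
	int best = -1;
	uint64_t best_stamp = 0;

	for (int i = 0; i < NBUCKETS; i++) {
		struct bucket *b = &c->b[i];

		pthread_mutex_lock(&b->lock);
		if (b->oldest != NULL &&
		    (best < 0 || b->oldest->stamp < best_stamp)) {
			best = i;
			best_stamp = b->oldest->stamp;
		}
		pthread_mutex_unlock(&b->lock);
	}
	return (best);
}

/* Delete up to 'batch' entries per bucket visit until count <= highwater. */
size_t
cache_trim(struct cache *c, size_t batch)
{
	size_t removed = 0;

	assert(batch > 0);
	while (c->count > c->highwater) {
		int i = pick_bucket(c);
		struct bucket *b;

		if (i < 0)
			break;
		b = &c->b[i];
		pthread_mutex_lock(&b->lock);
		for (size_t n = 0; n < batch && b->oldest != NULL &&
		    c->count > c->highwater; n++) {
			struct entry *e = b->oldest;

			b->oldest = e->newer;
			if (b->oldest == NULL)
				b->newest = NULL;
			free(e);
			c->count--;
			removed++;
		}
		pthread_mutex_unlock(&b->lock);
	}
	return (removed);
}
```

With a batch of 1 this removes entries in strict timestamp order across
buckets. A larger batch takes fewer lock round trips but only
approximates global LRU order, since a bucket may lose entries that are
newer than the oldest entry of another bucket.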