From owner-freebsd-arch@FreeBSD.ORG Mon Aug 7 19:29:13 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 03F2416A50C for ; Mon, 7 Aug 2006 19:29:13 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from wx-out-0506.google.com (wx-out-0506.google.com [66.249.82.233]) by mx1.FreeBSD.org (Postfix) with ESMTP id A395B43D66 for ; Mon, 7 Aug 2006 19:29:05 +0000 (GMT) (envelope-from asmrookie@gmail.com) Received: by wx-out-0506.google.com with SMTP id i27so592481wxd for ; Mon, 07 Aug 2006 12:29:04 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:sender:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition:x-google-sender-auth; b=L52lskp90/eNtJzS9dlrWnCCtFJJEuCSYSheVvFrDq2OjfWtEQfhNkIO5n0Xh3+k7YpLqSAkjshCxxgDH04UWkUGfyyqtpMNdPi/NZ7wtm46JRAOmcp/D9YmyJGAqNqixbKve3GbuDW9r/oXx+GGyyJe6s60kHZkw2uB+zusaUY= Received: by 10.70.38.19 with SMTP id l19mr7338328wxl; Mon, 07 Aug 2006 12:27:20 -0700 (PDT) Received: by 10.70.11.18 with HTTP; Mon, 7 Aug 2006 12:27:19 -0700 (PDT) Message-ID: <3bbf2fe10608071227j17c4cfa6qd84e1d8e53668fda@mail.gmail.com> Date: Mon, 7 Aug 2006 21:27:19 +0200 From: "Attilio Rao" Sender: asmrookie@gmail.com To: freebsd-arch@freebsd.org, freebsd-current@freebsd.org, "John Baldwin" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Google-Sender-Auth: 4138493682e4faea Cc: Subject: [PATCH] Adding Solaris-style "owner of records" to rwlocks X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Aug 2006 19:29:13 -0000

This is a first implementation of the "owner of records" concept in rwlocks.
It makes it possible to avoid the priority inversion problem (for readers)
in the current rwlocks implementation.

The main idea (which John and I discussed) is to use the first rlock'er of
each "class contention" as the owner of records.
The implementation adds two flags (RW_LOCK_OWNED and RW_LOCK_EXEMPTED),
which are used so that the easy case is not penalized, and synchronizes
acquiring and dropping the owner of records with the turnstile spin lock.
The main scheme might work this way:

thread1::rlock() -> sets the owner of records
thread2::rlock() -> checks for the RW_LOCK_OWNED bit and, if it is set,
takes the easy case
thread3::rlock() -> checks for RW_LOCK_OWNED...
thread4::wlock() -> blocks and lends its priority to thread1
thread1::runlock() -> disables the owner of records (disowning the
associated turnstile) and sets the RW_LOCK_EXEMPTED flag, so that other
threads are treated as the easy case
...

What I actually need is a test suite for heavy-load contention, since I
would like to detect any races I missed, etc.
If somebody has a ready-made test suite, please let me know.

The patch against HEAD is here:
http://users.gufi.org/~rookie/works/patches/rwlocks.diff

Please note that this is not intended to be a final implementation, since I
think it can be improved; it is just a starting point for ongoing work and
improvements.
Let me know if something is not clear.
Feedback, comments, and ideas are welcome.

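For readers who want to follow the scheme step by step, here is a minimal
single-threaded model of it in C. It is only a sketch, not the code from the
patch: the flag values, the struct layouts and every rw_model_* name are
hypothetical, no real locking or turnstile is involved, and it assumes that a
"class contention" ends when the last reader drops the lock. Priorities follow
the FreeBSD convention that a lower number means a higher priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names come from the mail; the values are made up for this model. */
#define RW_LOCK_OWNED    0x1u
#define RW_LOCK_EXEMPTED 0x2u

/* Hypothetical stand-in for a thread: only its priority matters here. */
struct thread_model {
	int prio;			/* lower value = higher priority */
};

/* Hypothetical stand-in for the rwlock state discussed in the mail. */
struct rwlock_model {
	unsigned flags;
	int readers;
	struct thread_model *owner;	/* owner of records, or NULL */
};

void
rw_model_init(struct rwlock_model *rw)
{
	rw->flags = 0;
	rw->readers = 0;
	rw->owner = NULL;
}

/* The first reader of a contention class becomes the owner of records. */
void
rw_model_rlock(struct rwlock_model *rw, struct thread_model *td)
{
	if ((rw->flags & (RW_LOCK_OWNED | RW_LOCK_EXEMPTED)) == 0) {
		/* In a real kernel this step would need the turnstile spin lock. */
		rw->owner = td;
		rw->flags |= RW_LOCK_OWNED;
	}
	/* Otherwise: easy case, no owner bookkeeping at all. */
	rw->readers++;
}

/*
 * A writer that must block lends its priority to the owner of records.
 * Returns false when there is no owner left to lend to, which is the
 * window in which priority inversion is still possible.
 */
bool
rw_model_wlock_block(struct rwlock_model *rw, const struct thread_model *writer)
{
	if (rw->owner == NULL)
		return (false);
	if (writer->prio < rw->owner->prio)
		rw->owner->prio = writer->prio;
	return (true);
}

/* Dropping the owner exempts the remaining readers of this class. */
void
rw_model_runlock(struct rwlock_model *rw, struct thread_model *td)
{
	assert(rw->readers > 0);
	rw->readers--;
	if (rw->owner == td) {
		rw->owner = NULL;
		rw->flags &= ~RW_LOCK_OWNED;
		rw->flags |= RW_LOCK_EXEMPTED;
	}
	/* Assumption: the class ends with the last reader, so start afresh. */
	if (rw->readers == 0)
		rw->flags = 0;
}
```

Calling rw_model_rlock() for thread1 and thread2, then
rw_model_wlock_block() for a higher-priority thread4, boosts thread1 only;
after thread1 calls rw_model_runlock(), the same writer finds no owner. The
model does not restore the lent priority on unlock.
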
Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein From owner-freebsd-arch@FreeBSD.ORG Mon Aug 7 22:00:34 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 748EF16A4F2; Mon, 7 Aug 2006 22:00:34 +0000 (UTC) (envelope-from ssouhlal@FreeBSD.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id 12DC443D8F; Mon, 7 Aug 2006 22:00:30 +0000 (GMT) (envelope-from ssouhlal@FreeBSD.org) Received: from [192.168.250.2] (80-219-8-155.dclient.hispeed.ch [80.219.8.155]) by elvis.mu.org (Postfix) with ESMTP id E163A1A3C27; Mon, 7 Aug 2006 15:00:29 -0700 (PDT) Message-ID: <44D7B7ED.5070302@FreeBSD.org> Date: Tue, 08 Aug 2006 00:00:13 +0200 From: Suleiman Souhlal User-Agent: Mozilla Thunderbird 1.0.7 (X11/20051204) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Attilio Rao References: <3bbf2fe10608071227j17c4cfa6qd84e1d8e53668fda@mail.gmail.com> In-Reply-To: <3bbf2fe10608071227j17c4cfa6qd84e1d8e53668fda@mail.gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-current@freebsd.org, freebsd-arch@freebsd.org Subject: Re: [PATCH] Adding Solaris-style "owner of records" to rwlocks X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Aug 2006 22:00:34 -0000 Attilio Rao wrote: > This is a first implementation of the owner of records concept in rwlocks. > It allows to avoid the priority inversion problem in the current > rwlocks implementation (for readers). > > The main idea (that John and I discussed) is to have as owner of > records the first rlock'er for a "class contention". 
> The implementation consists in adding two flags (RW_LOCK_OWNED and > RW_LOCK_EXEMPTED) which are used in order to not penalyze the easy > case, and syncronizing the operation of acquiring and dropping the > owner of records with the turnstile spin-lock. > The main scheme might work in this way: > > thread1::rlock() -> sets the owner of records > thread2::rlock() -> checks for RW_LOCK_OWNED bit and, if it is set, go > in the easy case > thread3::rlock() -> checks for RW_LOCK_OWNED... > thread4::wlock() -> blocks and land its priority to thread1 > thread1::runlock() -> disable the owner of records (disowning the > associated turnstile) and sets the RW_LOCK_EXEMPTED flag. In this way > other threads will treact as an easy case. > ... Aren't you missing the hard part: transferring ownership from one reader to another? If you don't, you'll still have priority inversions as soon as the initial reader unlocks.. -- Suleiman From owner-freebsd-arch@FreeBSD.ORG Tue Aug 8 16:09:02 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5019B16A4E1 for ; Tue, 8 Aug 2006 16:09:02 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from wx-out-0506.google.com (wx-out-0506.google.com [66.249.82.234]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6193043D55 for ; Tue, 8 Aug 2006 16:09:00 +0000 (GMT) (envelope-from asmrookie@gmail.com) Received: by wx-out-0506.google.com with SMTP id i27so791015wxd for ; Tue, 08 Aug 2006 09:08:59 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; 
b=iUZZ8fulYqBGpXLvUhI0hOuHL0HWvz7y1zMXEOFwwrWZvOMEraJNXPrVgc0lIY3moc4YNpr22aoo+NJRnMJPn2BeCHD6zz4Us18YQ44V1EbsflG04P8xtCRHGNJovVtQvdslpdtgqoswahhqYVCeyVEtJuh4WHbsLKYAzg26TKI= Received: by 10.70.74.6 with SMTP id w6mr921776wxa; Tue, 08 Aug 2006 09:08:59 -0700 (PDT) Received: by 10.70.11.18 with HTTP; Tue, 8 Aug 2006 09:08:59 -0700 (PDT) Message-ID: <3bbf2fe10608080908l3c8e7c3aq1e65a610d76d189b@mail.gmail.com> Date: Tue, 8 Aug 2006 18:08:59 +0200 From: "Attilio Rao" Sender: asmrookie@gmail.com To: "Suleiman Souhlal" In-Reply-To: <44D7B7ED.5070302@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <3bbf2fe10608071227j17c4cfa6qd84e1d8e53668fda@mail.gmail.com> <44D7B7ED.5070302@FreeBSD.org> X-Google-Sender-Auth: 2680590499697724 Cc: freebsd-current@freebsd.org, freebsd-arch@freebsd.org Subject: Re: [PATCH] Adding Solaris-style "owner of records" to rwlocks X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Aug 2006 16:09:02 -0000 2006/8/8, Suleiman Souhlal : > Attilio Rao wrote: > > This is a first implementation of the owner of records concept in rwlocks. > > It allows to avoid the priority inversion problem in the current > > rwlocks implementation (for readers). > > > > The main idea (that John and I discussed) is to have as owner of > > records the first rlock'er for a "class contention". > > The implementation consists in adding two flags (RW_LOCK_OWNED and > > RW_LOCK_EXEMPTED) which are used in order to not penalyze the easy > > case, and syncronizing the operation of acquiring and dropping the > > owner of records with the turnstile spin-lock. 
> > The main scheme might work in this way:
> >
> > thread1::rlock() -> sets the owner of records
> > thread2::rlock() -> checks for RW_LOCK_OWNED bit and, if it is set, go
> > in the easy case
> > thread3::rlock() -> checks for RW_LOCK_OWNED...
> > thread4::wlock() -> blocks and land its priority to thread1
> > thread1::runlock() -> disable the owner of records (disowning the
> > associated turnstile) and sets the RW_LOCK_EXEMPTED flag. In this way
> > other threads will treact as an easy case.
> > ...
>
> Aren't you missing the hard part: transferring ownership from one reader
> to another? If you don't, you'll still have priority inversions as soon
> as the initial reader unlocks..

Exactly, but complete owner switching would:
1) be too expensive in terms of resources consumed
2) introduce too many races and make the function too complex

With this implementation, only the first rlock (for every class
contention) is penalized, while the others are treated as the
easy/hard case.
It doesn't completely solve the priority inversion problem, but it's
the best compromise between performance and correctness.

Attilio

--
Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-arch@FreeBSD.ORG Tue Aug 8 16:24:25 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1D63016A4E0 for ; Tue, 8 Aug 2006 16:24:25 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from wx-out-0506.google.com (wx-out-0506.google.com [66.249.82.235]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6DF2343D55 for ; Tue, 8 Aug 2006 16:24:24 +0000 (GMT) (envelope-from asmrookie@gmail.com) Received: by wx-out-0506.google.com with SMTP id i27so795125wxd for ; Tue, 08 Aug 2006 09:24:23 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=D2n6K2z1fISkOmi+BMdekFeC4mxrtkebEbitsI7xZrRqPcUu0/SP2O3KZpsz0kL3ebg4bVqrfrrReWQSil2wfF+fQlPIBVd8nRanGajUO9PdF1f4CJ2eNeJ9dcopQwpLRIQQ/IYM7DVYvoJSRS8sGQRan1VbIXGjeYmn7irBRjA= Received: by 10.70.8.8 with SMTP id 8mr1036522wxh; Tue, 08 Aug 2006 09:24:23 -0700 (PDT) Received: by 10.70.11.18 with HTTP; Tue, 8 Aug 2006 09:24:23 -0700 (PDT) Message-ID: <3bbf2fe10608080924p1536b4e5s6d3c79be3546aefe@mail.gmail.com> Date: Tue, 8 Aug 2006 18:24:23 +0200 From: "Attilio Rao" Sender: asmrookie@gmail.com To: "Suleiman Souhlal" In-Reply-To: <3bbf2fe10608080908l3c8e7c3aq1e65a610d76d189b@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <3bbf2fe10608071227j17c4cfa6qd84e1d8e53668fda@mail.gmail.com> <44D7B7ED.5070302@FreeBSD.org> <3bbf2fe10608080908l3c8e7c3aq1e65a610d76d189b@mail.gmail.com> X-Google-Sender-Auth: a7b831ac583f3eb9 Cc: freebsd-current@freebsd.org, freebsd-arch@freebsd.org Subject: Re: [PATCH] Adding Solaris-style "owner of records" to rwlocks X-BeenThere: 
freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Aug 2006 16:24:25 -0000

2006/8/8, Attilio Rao :
> >
> > Aren't you missing the hard part: transferring ownership from one reader
> > to another? If you don't, you'll still have priority inversions as soon
> > as the initial reader unlocks..
>
> Exactly, but having a complete owner switching would be:
> 1) too hard to achieve in terms of resource taken
> 2) will imply too many races and we might get a too hard function
>
> With this implementation, only the first rlock (for every class
> contention) will be penalyzed while the other are treacted as the
> easy/hard case.
> It doesn't completely solve the priority inversion problem, but it's
> the better compromise between performances/correctnes.

In addition, it would be interesting to investigate other solutions
(e.g. partial or full reader tracking) and benchmark which one works
best, but the benchmarks would take most of the time.
If somebody is interested, they can drop a mail to me (or to John, if he
has time).

Attilio

--
Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-arch@FreeBSD.ORG Tue Aug 8 18:02:06 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D198516A536; Tue, 8 Aug 2006 18:02:06 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (66-23-211-162.clients.speedfactory.net [66.23.211.162]) by mx1.FreeBSD.org (Postfix) with ESMTP id 24FB743D4C; Tue, 8 Aug 2006 18:02:05 +0000 (GMT) (envelope-from jhb@freebsd.org) Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1]) (authenticated bits=0) by server.baldwin.cx (8.13.6/8.13.6) with ESMTP id k78I1vkt051917; Tue, 8 Aug 2006 14:02:04 -0400 (EDT) (envelope-from jhb@freebsd.org) From: John Baldwin To: Suleiman Souhlal Date: Tue, 8 Aug 2006 13:42:15 -0400 User-Agent: KMail/1.9.1 References: <3bbf2fe10608071227j17c4cfa6qd84e1d8e53668fda@mail.gmail.com> <44D7B7ED.5070302@FreeBSD.org> In-Reply-To: <44D7B7ED.5070302@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200608081342.15839.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]); Tue, 08 Aug 2006 14:02:05 -0400 (EDT) X-Virus-Scanned: ClamAV 0.87.1/1640/Mon Aug 7 21:11:04 2006 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.0 X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on server.baldwin.cx Cc: Attilio Rao , freebsd-current@freebsd.org, freebsd-arch@freebsd.org Subject: Re: [PATCH] Adding Solaris-style "owner of records" to rwlocks X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Tue, 08 Aug 2006 18:02:06 -0000 On Monday 07 August 2006 18:00, Suleiman Souhlal wrote: > Attilio Rao wrote: > > This is a first implementation of the owner of records concept in rwlocks. > > It allows to avoid the priority inversion problem in the current > > rwlocks implementation (for readers). > > > > The main idea (that John and I discussed) is to have as owner of > > records the first rlock'er for a "class contention". > > The implementation consists in adding two flags (RW_LOCK_OWNED and > > RW_LOCK_EXEMPTED) which are used in order to not penalyze the easy > > case, and syncronizing the operation of acquiring and dropping the > > owner of records with the turnstile spin-lock. > > The main scheme might work in this way: > > > > thread1::rlock() -> sets the owner of records > > thread2::rlock() -> checks for RW_LOCK_OWNED bit and, if it is set, go > > in the easy case > > thread3::rlock() -> checks for RW_LOCK_OWNED... > > thread4::wlock() -> blocks and land its priority to thread1 > > thread1::runlock() -> disable the owner of records (disowning the > > associated turnstile) and sets the RW_LOCK_EXEMPTED flag. In this way > > other threads will treact as an easy case. > > ... > > Aren't you missing the hard part: transferring ownership from one reader > to another? If you don't, you'll still have priority inversions as soon > as the initial reader unlocks.. Even Solaris doesn't do this as the overhead to do this would seem to outweigh the advantages of having a perfect implementation. I think Attilio is actually going to try it several different ways and then run benchmarks to see if that assertion is true. 
--
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG Fri Aug 11 02:49:22 2006 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B44B316A4DF; Fri, 11 Aug 2006 02:49:22 +0000 (UTC) (envelope-from jd@ugcs.caltech.edu) Received: from mark.ugcs.caltech.edu (mark.ugcs.caltech.edu [131.215.176.117]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5B0D443D45; Fri, 11 Aug 2006 02:49:22 +0000 (GMT) (envelope-from jd@ugcs.caltech.edu) Received: by mark.ugcs.caltech.edu (Postfix, from userid 3640) id DAA7F3F050; Thu, 10 Aug 2006 19:49:21 -0700 (PDT) Date: Thu, 10 Aug 2006 19:49:21 -0700 From: Paul Allen To: Pawel Jakub Dawidek Message-ID: <20060811024921.GF308@mark.ugcs.caltech.edu> References: <20060808195202.GA1564@garage.freebsd.pl> <20060810184702.GA8567@nowhere> <20060810192841.GA1345@garage.freebsd.pl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20060810192841.GA1345@garage.freebsd.pl> Sender: jd@ugcs.caltech.edu Cc: freebsd-fs@freebsd.org, Craig Boston , freebsd-geom@freebsd.org, freebsd-arch@freebsd.org Subject: Re: GJournal (hopefully) final patches. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Aug 2006 02:49:22 -0000

It's a bit disturbing that a geom class quite far away from the storage
drivers presumes that the proper action here is a cache flush.
The underlying hardware may support tagged command queuing (i.e., SCSI's
ability not only to receive transaction completion notifications but also
to let partial orderings be dictated to the controller) or native command
queuing (command completion).

It's true that this functionality may not always work as advertised, but
that's a problem to be solved with dev. sysctls, not by taking a
lowest-common-denominator approach in a high-level geom class.

This really needs broader architectural consideration, not just whatever
it takes to make it work.

Paul

>From Pawel Jakub Dawidek , Thu, Aug 10, 2006 at 09:28:41PM +0200:
> On Thu, Aug 10, 2006 at 01:47:23PM -0500, Craig Boston wrote:
> > Hi,
> >
> > It's great to see this project so close to completion! I'm trying it
> > out on a couple machines to see how it goes.
> >
> > A few comments and questions:
> >
> > * It took me a little by surprise that it carves 1G out of the device
> >   for the journal. Depending on the size of the device that can be a
> >   pretty hefty price to pay (and I didn't see any mention of it in the
> >   setup notes). For a couple of my smaller filesystems I reduced it to
> >   512MB. Perhaps some algorithm for auto-sizing the journal based on
> >   the size / expected workload of the device would be in order?
>
> It will be pointed out in documentation when I finally prepare it.
> I don't have plans about autosizing currently.
>
> > * Attached is a quick patch for geom_eli to allow it to pass BIO_FLUSH
> >   down to its backing device. It seems like the right thing to do and
> >   fixes the "BIO_FLUSH not supported" warning on my laptop that uses a
> >   geli encrypted disk.
>
> I've this already in my perforce tree. I also implemented BIO_FLUSH
> passing in gmirror and graid3.
>
> I also added a flag for gmirror and graid3 which says "don't
> resynchronize components after a power failure - trust they are
> consistent". And they are always consistent when placed below gjournal.
>
> > * On a different system, however, it complains about it even on a raw
> >   ATA slice:
> >
> > atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0
> > ata0: on atapci1
> > ad0: 114473MB at ata0-master UDMA100
> > GEOM_JOURNAL: BIO_FLUSH not supported by ad0s1e.
> >
> > It seems like a reasonably modern controller and disk, at least it
> > should be capable of issuing a cache flush command. Not sure why it
> > doesn't like it :/
>
> We would need to add some printfs to diagnoze this probably - you can
> try adding some lines to ad_init() to get this:
>
>     if (atadev->param.support.command1 & ATA_SUPPORT_WRITECACHE) {
>         if (ata_wc)
>             ata_controlcmd(dev, ATA_SETFEATURES, ATA_SF_ENAB_WCACHE, 0, 0);
>         else
>             ata_controlcmd(dev, ATA_SETFEATURES, ATA_SF_DIS_WCACHE, 0, 0);
>     } else {
>         printf("ad_init: WRITE CACHE not supported by ad%d.\n",
>             device_get_unit(dev));
>     }
>
> > * How "close" does the filesystem need to be to the gjournal device in
> >   order for the UFS hooks to work? Directly on it?
> >
> > The geom stack on my laptop currently looks something like this:
> >
> > [geom_disk] ad0 <- [geom_eli] ad0.eli <- [geom_gpt] ad0.elip6 <-
> > [geom_label] gjtest <- [geom_journal] gjtest.journal <- UFS
> >
> > I was wondering if an arrangement like this would work:
> >
> > [geom_journal] ad0p6.journal <- [geom_eli] ad0p6.journaleli <- UFS
> >
> > and if it would be any more efficient (journal the encrypted data
> > rather than encrypt the journal). Or even gjournal the whole disk at
> > once?
>
> When you mount file system it sends BIO_GETATTR "GJOURNAL::provider"
> requests. So as long as classes between the file system and gjournal
> provider pass BIO_GETATTR down, it will work.
>
> On my home machine I've the following configuration:
>
> raid3/DATA1.elid.journal
>
> So it's UFS over gjournal over bsdlabel over geli over raid3 over ata.
>
> I prefer to put gjournal on the top, because it gives consistency to
> layers below it. For example I can use geli with bigger sector size
> (sector size greater than disk sector size in encryption-only-mode can
> be unreliable on power failures, which is not the case when gjournal is
> above geli), I can turn off synchronization of gmirror/graid3 after a
> power failure, etc.
>
> On the other hand configuring geli on top of gjournal can be more
> effective for large files - geli will not encrypt the data twice.
>
> Fortunatelly with GEOM you can freely mix your puzzles.
>
> > Haven't been brave enough to try gjournal on root yet, but my /usr and
> > /compile (src, obj, ports) partitions are already on it so I'm sure I'll
> > try it soon ;)
>
> Markus Trippelsdorf reported that it doesn't work out of the box, but he
> manage to make it to work with some small changes to fsck_ffs(8).
>
> --
> Pawel Jakub Dawidek http://www.wheel.pl
> pjd@FreeBSD.org http://www.FreeBSD.org
> FreeBSD committer Am I Evil? Yes, I Am!