From owner-freebsd-arch@FreeBSD.ORG Sun Nov 25 01:56:28 2012
From: Warner Losh
Date: Sat, 24 Nov 2012 18:56:21 -0700
Subject: Re: [RFC] sema_wait_sig
To: Alfred Perlstein
Cc: attilio@FreeBSD.org, Mark Linimon, Oleksandr Tymoshenko, arch@freebsd.org
Message-Id: <46D582BB-1EB9-4080-9733-7558D6D87FA8@bsdimp.com>
In-Reply-To: <50B178A3.4070305@mu.org>
References: <20121124193010.GB1627@lonesome.com> <50B12520.7040508@mu.org> <50B145C5.8070503@mu.org> <50B16E7A.60900@mu.org> <50B178A3.4070305@mu.org>
List-Id: Discussion related to FreeBSD architecture

On Nov 24, 2012, at 6:47 PM, Alfred Perlstein wrote:

> On 11/24/12 5:16 PM, Attilio Rao wrote:
>> On Sun, Nov 25, 2012 at 1:03 AM, Alfred Perlstein wrote:
>>> On 11/24/12 4:38 PM, Attilio Rao wrote:
>>>> On Sat, Nov 24, 2012 at 10:10 PM, Alfred Perlstein wrote:
>>>>> I don't understand why you are the one who is so upset. Your first
>>>>> email to me implied that I had 0 smp experience.
>>>>>
>>>>> Let me explain why this rototilling is unneeded.
>>>>>
>>>>> Go download a copy of linux and observe the following:
>>>>> spin_lock(&mb_cache_spinlock);
>>>>> spin_unlock(&mb_cache_spinlock);
>>>>> spin_lock_irqsave, spin_unlock_irqrestore
>>>>> up()
>>>>> down(dqio_mutex)
>>>>>
>>>>> Those apis have been available for a decade at least.
>>>>>
>>>>> I'll cut to the point on this.
>>>>>
>>>>> If you want to change HOW the underlying freebsd SMP api works to
>>>>> improve performance, then please do!
>>>>>
>>>>> But if you want to change the actual KPI, then please realize that
>>>>> Linux SMP does darn well with a KPI for SMP that's pretty much
>>>>> unchanged for nearly 10 years.
>>>>>
>>>>> I would venture to say in this respect we've become what we used to
>>>>> mock Linux for, an OS that gratuitously changes interfaces for the
>>>>> sake of what is cool, versus what our vendors need.
>>>> Keeping old mechanisms/duplicate/etc. around just because they existed
>>>> 10 years ago is not a good reason once their KPI is not only redundant
>>>> but also dangerous. And this seems to be your only "technical" reason
>>>> opposed to my proposals.
>>> Whoa, wait a second.
>>>
>>> A user just proposed using the infrastructure to port linux drivers.
>>>
>>> Additionally the following subsystems make use of sema(9):
>>> infiniband stack (linux compat shim).
>>> sysv ipc.
>>> ata.
>>> opensolaris compat shim.
>>> xfs.
>>>
>>> What would be the point of removing this KPI?
>> Did you also see how they are used?
>> In some places they have a counter of 1, which means they can be
>> effectively replaced by an sx lock.
>> In all the other places, they are used with a counter of 0, which
>> means they can be effectively replaced by mtx and sleep.
>>
>> Can you give me a reason for really keeping them?
>>
>> Also, if you think they would help a Linux compat shim layer, keep in
>> mind the following:
>> - a plan for something like that has been discussed for years and by
>> several people and nothing concrete happened, with a lot of
>> disagreement (both technical and philosophical)
>> - there is no plan for doing so in the foreseeable future, nor is there
>> agreement it is really a good idea.
>> So you prefer to have
>> completely redundant (and unused in the end) code just because it may
>> or may not happen to help a compat layer that doesn't exist and maybe
>> will never exist? Please answer openly.
>
> 1) compat layer
> /usr/src.local/sys/ofed/drivers/infiniband/core #
> cddl/contrib/opensolaris
>
> 2)
> if a user expects semaphores and we tell them to "rethink" things, then
> we're not providing the same facilities as every other non-BSD OS.
>
> I guess that makes us "cool", but really it just seems out of touch.
>
> The implementation is 176 lines of code + some headers.
>
> The sad part to me is that the original user asked "hey I need
> sema+signal" but we don't know the facility they really need, count of
> 1? count of 10? instead of just giving them a textbook CS semaphore we
> tell them to "build your own using our primitives".

You don't need stdio, you can build it from the syscall primitives...

Warner

> At some point an OS has to grow up and realize that by doing everything
> its own way it's not making itself special, so much as limiting its
> acceptance.
>
>>> Those consumers would then just have to roll their own.
>>>
>>> Wouldn't that lead to duplicate code?
>>>
>>> 176 sys/kern/kern_sema.c
>>>
>>> It's not really a lot of code.
>>>
>>>
>>>> Using disown for lockmgr is something very dangerous which should not
>>>> be used outside of its specific case for the buffer cache. I really
>>>> don't want to encourage its use outside of that and I'm sure people
>>>> can build very dangerous policies using it (this is just an example,
>>>> but it explains my point, I think).
>>>> Maybe my proposed changes of mtx against rwlock are a bit too extreme,
>>>> I could understand that and I'm very open to changing my mind on it,
>>>> but I don't understand how it would be useful to keep lockmgr() and
>>>> sema() around honestly.
>>>>
>>>> It is just a burden of code duplication (in some places) and dangerous
>>>> KPI (in others).
>>> I agree that lockmgr is a very dangerous beast. Whatever can be done
>>> to get rid of the complexity would be good.
>>>
>>> If we could hide some of the lockmgr "features" behind a "I know what
>>> I'm doing" fence or maybe a "only to be used with filesystem code"
>>> fence that might be good.
>> I don't agree, I would just like to have a clean KPI and force people
>> to do right things. That clean KPI already exists, we just need to
>> convert current consumers to do their dirtiness in a "controlled
>> environment".
>
> Well I was just trying to agree with you, to be honest I have no idea
> what your plans are.
>
> I did want to explain that merging sx+lockmgr was tried before, and it
> failed. You may have more skill with it and succeed, but you should
> check source history and mailing lists for the edge cases that made
> replacing it entirely fail.
>
>
> -Alfred
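
For readers following the thread, the disagreement about whether a "textbook CS semaphore" is worth keeping when it can be built from a mutex plus sleep/wakeup can be made concrete with a small sketch. The code below is a userland analog using POSIX threads, not the code in sys/kern/kern_sema.c and not the sema(9) KPI. The toy_sema names are hypothetical, and a pthread mutex and condition variable stand in, by assumption, for the kernel's "mtx and sleep" mentioned in the quoted text.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical counting semaphore built from a mutex and a condition
 * variable. Illustrative only; names are not part of any FreeBSD KPI.
 */
struct toy_sema {
    pthread_mutex_t lock;   /* protects value */
    pthread_cond_t  cv;     /* waiters sleep here while value == 0 */
    int             value;  /* number of available "units" */
};

void toy_sema_init(struct toy_sema *s, int initial)
{
    assert(initial >= 0);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cv, NULL);
    s->value = initial;
}

/* Release one unit and wake one sleeper, if any. */
void toy_sema_post(struct toy_sema *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->cv);
    pthread_mutex_unlock(&s->lock);
}

/* Take one unit, sleeping until one is available. */
void toy_sema_wait(struct toy_sema *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&s->cv, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* Take one unit without sleeping; returns 1 on success, 0 otherwise. */
int toy_sema_trywait(struct toy_sema *s)
{
    int ok;

    pthread_mutex_lock(&s->lock);
    ok = (s->value > 0);
    if (ok)
        s->value--;
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

With an initial count of 0 the structure acts as a pure wait/notify channel, and with an initial count of 1 it behaves like a sleepable exclusive lock without an owner. These are the two usage patterns the quoted text says existing consumers fall into. The sketch covers neither signal-interruptible waits (the sema_wait_sig in the subject line) nor timeouts.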