From owner-freebsd-arch@FreeBSD.ORG Sat Jun 2 17:16:37 2012
Date: Sat, 2 Jun 2012 20:16:32 +0300
From: Konstantin Belousov
To: Attilio Rao
Cc: Alexander Kabaev, Alan Cox, Konstantin Belousov, Gianni, freebsd-arch@freebsd.org
Message-ID: <20120602171632.GC2358@deviant.kiev.zoral.com.ua>
References: <20120601193522.GA2358@deviant.kiev.zoral.com.ua> <20120602164847.GB2358@deviant.kiev.zoral.com.ua>
Subject: 
 Re: Fwd: [RFC] Kernel shared variables
List-Id: Discussion related to FreeBSD architecture

On Sat, Jun 02, 2012 at 06:00:06PM +0100, Attilio Rao wrote:
> Sorry, resending with all the recipients in.
>
> Attilio
>
>
> ---------- Forwarded message ----------
> From: Attilio Rao
> Date: 2012/6/2
> Subject: Re: [RFC] Kernel shared variables
> To: Konstantin Belousov
>
>
> 2012/6/2 Konstantin Belousov:
> > On Sat, Jun 02, 2012 at 02:01:35PM +0100, Attilio Rao wrote:

[Tried to trim the text]

> >> I think, he just wants to map in userland processes some pages from
> >> the static image of the kernel (packed together in a specific
> >> dataset). This poses some non-trivial problems. The first thing is
> >> that the static image is not thought to have physical pages tied to
> >> it. The second is that he needs to make a clean design in order to let
> >> consumers of this mechanism correctly locate the information they want
> >> within the shared page(s) and in the end read the correct values.
> > Right, exactly, and this is why I object to the "offsets" approach.
> > It basically moves us to the old times of the "jump tables" shared
> > libraries, which fortunately was never the case for FreeBSD even when
> > a.out was used.
>
> I'm objecting to this either.

My English is not good enough to understand this. Do you agree or disagree
with my statement that 'indexes' make it very hard to maintain ABI?

>
> >>
> >> I have some reservations on both the implementation and the approach
> >> for retrieving data from the page.
> >> In particular, I don't like that a new vm_object is allocated for this
> >> page.
> >> What I really would like would be:
> >> 1) very minimal implementation -- you just use
> >> pmap_enter()/pmap_remove() specifically when needed, separately, in
> >> fork(), execve(), etc. cases
> > Oh, this simply cannot work.
>
> And why? Assuming you provide a vm_page_t from an UMA zone just like
> fakepage does. Of course you cannot recycle for this purpose any page
> coming from vm_page_alloc().

Due to pv_collect/pmap_pv_reclaim, the pte might be destroyed at any time.
Using hacks like mapping the page wired and then needing to hack any
VM space manipulation (fork/rfork/exec/exit/swapout/I possibly missed several
cases) just does not pay off.

>
> >> 2) more complete approach -- you make a very quick layer which lets you
> >> map pages from the static image of the kernel and the shared page
> >> becomes just a specific consumer of this. This way the object makes much
> >> more sense because it becomes an object associated with the whole static
> >> image of the kernel
> > So you want to circumvent the vm layer.
>
> Not sure I agree with your opinion on this.
>
> >>
> >> About the layering, I don't like that you require both a kernel and
> >> userland header to locate the objects within the page. This is very
> >> likely ABI breakage prone. A mechanism is needed for retrieving at
> >> run time what Giovanni calls "indexes", or making it indexes-agnostic.
> >
> > And this is what VDSO is for. VDSO with the standard ELF symbol
> > interposition rules allows having a libc that is completely unaware of the
> > shared page and 'indexes', i.e. which works both for older kernels that
> > do not export the required index, and for new kernels that export the same
> > information in some more advanced format. By having a VDSO that exports
> > e.g. gettimeofday() we would get an override for libc gettimeofday, while
> > having a fully functional libc for other, future and past, kernels, even
> > if the format of the data exported for super-fast gettimeofday changes.
> >
> > The tight coupling between VDSO and kernel is not a problem, since VDSO is
> > part of the kernel from the deployment POV. Moreover, either the existing
> > ELF linker in the kernel, or some trivial modifications to it, would allow
> > us to not use 'indexes' on the kernel side too.
>
> I admit I don't have a better plan on how to retrieve objects from the
> shared page at the moment, I didn't give much thought to it.
>
> > We already have a shared page between the kernel and the whole set of
> > same-ABI processes. Currently it is used for signal trampolines only.
> > The hard part of the task is to provide the VDSO build glue. Also IMO the
> > hard task is to define a sensible gettimeofday() implementation, probably
> > using rdtsc in usermode. The shared page is easy, or at least it is already
> > there without ugly and non-working vm hacks.
> >
> > As an additional note, already put by Bruce, the implementation of
> > usermode gettimeofday is exactly the opposite of any reasonable
> > implementation. It loses precision down to the frequency of the event
> > timer. The obvious approach is to not have any periodically updating data
> > for gettimeofday purposes, and to use some formula with rdtsc and
> > kernel-provided coefficients on the machines where rdtsc is usable.
>
> The gettimeofday() implementation is a different story than what is asked
> here.

But the goal is to have fast clocks, right? What else is planned?
In fact, I think that if the whole goal is only fast clocks, then we do not
need any additional system mechanisms, since we can easily export the
coefficients for the rdtsc formula already. E.g. we can put them into the
ELF auxv, which is ugly but bearable.

>
> > An interesting question is how much shared the shared page needs to be.
> > The obvious needs are shared between all same-ABI processes, but I can also
> > easily see a need for per-process private information to be present in
> > the 'private-shared' page.
> > For a silly but typical example, useful for
> > moronix-style benchmarks, see getpid().
>
> Really the performance benefit of having a fast getpid() is marginal
> compared to heavily used things like gettimeofday(). I cannot think
> of a per-process page implementing a fast syscall that can bring many
> performance advantages.

This is completely true, but there may be other process-private data
that could benefit from the low access cost. I just do not know right now.
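
For illustration only, the "formula with rdtsc and kernel-provided
coefficients" discussed above could look roughly like the sketch below. The
struct layout, the field names and tsc_to_ns() are all hypothetical and are
not taken from FreeBSD or from the RFC patch; the sketch only assumes that
the kernel exports a base sample plus a fixed-point scale factor. The TSC
reading is passed in as an argument so that the code stays portable.

```c
#include <stdint.h>

/*
 * Hypothetical coefficients the kernel could export, e.g. through the
 * shared page or the ELF auxv.  Every name here is an assumption made
 * for this sketch.
 */
struct tsc_coeffs {
	uint64_t base_tsc;	/* TSC value sampled by the kernel */
	uint64_t base_ns;	/* time in nanoseconds at base_tsc */
	uint32_t mult;		/* fixed-point nanoseconds per TSC tick */
	uint32_t shift;		/* number of fractional bits in mult */
};

/*
 * Convert a raw TSC reading to nanoseconds:
 *	ns = base_ns + (((tsc - base_tsc) * mult) >> shift)
 * On x86 the tsc argument would come from the rdtsc instruction.
 * The sketch assumes the product fits in 64 bits; a real implementation
 * would need a wider multiply to handle large deltas.
 */
uint64_t
tsc_to_ns(const struct tsc_coeffs *c, uint64_t tsc)
{
	uint64_t delta = tsc - c->base_tsc;

	return (c->base_ns + ((delta * c->mult) >> c->shift));
}
```

With mult = 2^31 and shift = 32 (0.5 ns per tick, i.e. a 2 GHz TSC), a
reading 2000 ticks past base_tsc maps to base_ns + 1000.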