Message-ID: <507AE61D.7030709@roshianokatachi.com>
Date: Sun, 14 Oct 2012 20:19:41 +0400
From: Daniil Cherednik
To: freebsd-hackers@freebsd.org
Subject: Re: Fast syscalls via sysenter
References: <201206182256.30535.dcherednik@roshianokatachi.com> <201206210811.20427.jhb@freebsd.org> <4FE55F91.5070303@gmail.com> <20120623165823.GX2337@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120623165823.GX2337@deviant.kiev.zoral.com.ua>
Cc: Konstantin Belousov, davidxu@freebsd.org

On 06/23/2012 08:58 PM, Konstantin Belousov wrote:
> On Sat, Jun 23, 2012 at 02:17:53PM +0800, David Xu wrote:
>> On 2012/06/21 20:11, John Baldwin wrote:
>>> On Monday, June 18, 2012 2:56:30 pm Daniil Cherednik wrote:
>>>> Hi!
>>>>
>>>> I am trying to continue the work started by David Xu on implementing
>>>> fast syscalls via sysenter/sysexit:
>>>> http://people.freebsd.org/~davidxu/sysenter/kernel/
>>>> I have ported it to FreeBSD 9, and it appears to work. Unfortunately
>>>> I am a beginner in kernel development, so I have some questions:
>>>>
>>>> 1. See http://people.freebsd.org/~davidxu/sysenter/kernel/kernel.patch
>>>>
>>>>     /*
>>>>      * If %edx was changed, we can not use sysexit, because it
>>>>      * needs %edx to restore userland %eip.
>>>>      */
>>>>     if (orig_edx != frame.tf_edx)
>>>>             td->td_pcb->pcb_flags |= PCB_FULLCTX;
>>>>
>>>> Why do we need this additional check?
>>>> In
>>>> http://people.freebsd.org/~davidxu/sysenter/kernel/sysenter.s
>>>> we store %edx on the stack with
>>>>
>>>>     pushl %edx    /* ring 3 next %eip */
>>>>
>>>> and we restore the register with
>>>>
>>>>     popl %edx     /* ring 3 %eip */
>>> Some system calls return two return values (pipe(2)) or return a 64-bit
>>> off_t (lseek(2)). Those system calls change %edx's value and need that
>>> changed value to make it out to userland.
>>>
>>>> 2. See http://people.freebsd.org/~davidxu/sysenter/kernel/sysenter.s
>>>>
>>>>     movl PCPU(CURPCB),%esi
>>>>     call syscall
>>>>
>>>> Why do we movl PCPU(CURPCB),%esi before calling syscall? syscall is
>>>> just a C function.
>>> No clue on this one, looks like it is not needed.
>>>
>> [kib@ is cc'ed]
>> I implemented the sysenter syscall a long time ago, and it can indeed
>> reduce system call overhead on i386. I think it might be time to
>> implement a Linux-like VDSO syscall now, based on the work kib@ has
>> recently done, though I don't know how to hook it into kib's code.
>> I quickly googled it and found that they put some data into the aux
>> vector:
>> http://www.trilithium.com/johan/2005/08/linux-gate/
>> http://www.takatan.net/lxr/source/arch/um/os-Linux/elf_aux.c?a=x86_64#L40
> Yes, the intent is to eventually switch to a VDSO from the current
> situation, where libc is aware of the shared page content. This was
> extensively discussed in the flame that resulted in me writing the
> current gettimeofday(2) patch. It was on arch@ several weeks ago, AFAIR.
>
> The committed gettimeofday() code structure allows for VDSO interposing
> without breaking normal symbol visibility rules.
>
> I do not see the sense in implementing syscall or sysenter support for
> the i386 kernel. On the other hand, using syscall for 32-bit binaries on
> amd64 looks reasonable.

Sorry, I have not been able to write for some time.

So, what about implementing a VDSO now?
I know there was already a patch and feature request for a VDSO:
http://lists.freebsd.org/pipermail/freebsd-bugs/2010-April/039597.html

About sysenter: I have ported the sysenter patch to 9.0-RELEASE-p4, and
it looks fine. I made some fixes in SYS.h; the reason (if I understand it
right) is that ld-elf.so must be built without DT_TEXTREL (text
relocations). You can find the patch here:
https://redmine.sportcomitet.org/projects/dev-freebsd/repository/revisions/master/raw/sysenter.patch
https://redmine.sportcomitet.org/projects/dev-freebsd/repository/revisions/master/raw/sys/i386/i386/sysenter.s

However, this patch currently breaks compatibility with the i386 Xen PV
kernel. I wanted to fix that, but without a VDSO it would be a limited
solution. That is one of the reasons I am interested in the VDSO status.

As for using syscall for 32-bit binaries on amd64: it is reasonable. But
if we do that, I think we also have to implement VDSO support in the i386
kernel for compatibility, and it would be better to implement sysenter
there too.