Date:      Mon, 17 Jan 2022 23:14:55 +0000
From:      Damian's Proton Mail <damian@dmcyk.xyz>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: amd64 syscall ABI (vs. Darwin)
Message-ID:  <4979A00A-9678-4BAC-881D-71F7533D93F9@dmcyk.xyz>
In-Reply-To: <YeXzCA4Yc6ya1hdA@kib.kiev.ua>
References:  <Gp_BfNXrv9qjA5V5DpeI-lfdH6EmwKDuqkMLI7DHkses-P6-bT7Ga9p_nURlQC2D4fYuWyf6pFC7s8FPUjWV5Ut7j7uL8iiqx9hv8oePlHs=@dmcyk.xyz> <YeVxXdPlmYdwV5PI@kib.kiev.ua> <94B30813-0034-4F90-9AAC-113402A1A3E8@dmcyk.xyz> <YeXzCA4Yc6ya1hdA@kib.kiev.ua>

> On 17 Jan 2022, at 23:51, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Mon, Jan 17, 2022 at 10:31:09PM +0000, Damian's Proton Mail wrote:
>
>>> On 17 Jan 2022, at 14:38, Konstantin Belousov <kostikbel@gmail.com> wrote:
>>
>>> Look at the sys/amd64/amd64/exceptions.S. The fast_syscall entry point
>>> is where we receive control after the syscall instruction.
>>
>> A lot of new things in there for me, but the flow is clear. I was able to find corresponding logic in XNU’s sources too. Earlier I said:
>>
>>> At a first glance Darwin approach seems more optimal
>>
>> But it is actually the opposite, or at best no different: Darwin explicitly restores/sets all registers, including the callee-saved r12-r15.
>>
>> Explicitly preserving registers would prevent kernel data leakage too. Doing so in FreeBSD would also be an ABI-compatible change, I think, since users shouldn’t rely on the values in those registers.
>> I’m curious if you see any obvious pros/cons with either approach, or is it just a more arbitrary implementation choice?
>
> We preserve everything on syscall entry; it is the behavior of the SYSCALL
> instruction that makes it look somewhat convoluted. I suggest you read
> the SDM description of the SYSCALL instruction to understand the register
> manipulations on entry.
>
> On the other hand, on the fast syscall return, we indeed do not restore
> everything. If you want to restore the full frame, use the PCB_FULL_IRET pcb
> flag to request the iretq return path.
>
>> Not that I’d propose changing the ABI though, I also want my toy project to work as a plug-in kernel module.
>> I guess the only other option to emulate Darwin's behaviour would be to intercept syscalls in userspace somehow first and manually preserve the register values?
>
> To emulate Darwin, you would need a specific ABI personality (sysent) in the
> kernel, which would also provide an sv_syscall_ret method. The method can
> do whatever is needed to the return frame, and set PCB_FULL_IRET to indicate
> that the kernel should load it into the CPU GPR file as is.
>
> BTW, does Darwin use SYSCALL instruction for syscall entry on amd64?

Yes, it also uses SYSCALL, with rax/rdx for return values and the carry flag to indicate errors.
Even the syscall numbers are similar. Darwin uses different masks to distinguish BSD and Mach syscalls, but the effective BSD syscall numbers seem to be the same so far.
I already had sysent hooks in place, and PCB_FULL_IRET does indeed work, thanks!

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4979A00A-9678-4BAC-881D-71F7533D93F9>