Date: Mon, 17 Jan 2022 23:14:55 +0000
From: Damian's Proton Mail <damian@dmcyk.xyz>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: amd64 syscall ABI (vs. Darwin)
Message-ID: <4979A00A-9678-4BAC-881D-71F7533D93F9@dmcyk.xyz>
In-Reply-To: <YeXzCA4Yc6ya1hdA@kib.kiev.ua>
References: <Gp_BfNXrv9qjA5V5DpeI-lfdH6EmwKDuqkMLI7DHkses-P6-bT7Ga9p_nURlQC2D4fYuWyf6pFC7s8FPUjWV5Ut7j7uL8iiqx9hv8oePlHs=@dmcyk.xyz> <YeVxXdPlmYdwV5PI@kib.kiev.ua> <94B30813-0034-4F90-9AAC-113402A1A3E8@dmcyk.xyz> <YeXzCA4Yc6ya1hdA@kib.kiev.ua>
[-- Attachment #1 --]
> On 17 Jan 2022, at 23:51, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Mon, Jan 17, 2022 at 10:31:09PM +0000, Damian's Proton Mail wrote:
>
>>> On 17 Jan 2022, at 14:38, Konstantin Belousov <kostikbel@gmail.com> wrote:
>>
>>> Look at the sys/amd64/amd64/exceptions.S. The fast_syscall entry point
>>> is where we receive control after the syscall instruction.
>>
>> A lot of new things in there for me, but the flow is clear. I was able to find corresponding logic in XNU’s sources too. Earlier I said:
>>
>>> At a first glance Darwin approach seems more optimal
>>
>> But it’s instead the opposite/no difference at all, as in Darwin, they explicitly restore/set all registers, including callee saved r12-r15.
>>
>> Explicitly preserving registers would prevent kernel data leakage too. Doing so in FreeBSD would also be an ABI compatible change I think, since users shouldn’t rely on values in those registers.
>> I’m curious if you see any obvious pros/cons with either approach, or is it just a more arbitrary implementation choice?
>
> We preserve everything on syscall entry, it is the SYSCALL instruction
> behavior that makes it look somewhat convoluted. I suggest you to read
> the SDM description of the SYSCALL instruction to understand the registers
> manipulations on entry.
>
> On the other hand, on the fast syscall return, we indeed not restore
> everything. If you want to restore full frame, use PCB_FULL_IRET pcb
> flag to request iretq return path.
>
>> Not that I’d propose changing the ABI though, I also want my toy project to work as a plug-in kernel module.
>> I guess the only other option to emulate Darwin's behaviour would be to intercept syscalls in userspace somehow first and manually preserve the register values?
>
> To emulate Darwin, you would need specific ABI personality (sysent) in the
> kernel, which would also provide sv_syscall_ret method.
> The method can
> do whatever is needed to the return frame, and set PCB_FULL_IRET to indicate
> that kernel should load it into CPU GPR file as is.
>
> BTW, does Darwin use SYSCALL instruction for syscall entry on amd64?

Yes, it also uses SYSCALL. Also rax/rdx for return values and the carry bit to indicate errors.
Even the syscall numbers are similar. They use different masks to distinguish BSD/Mach syscalls, but the effective BSD syscall numbers seem to be the same so far.
So I already had sysent hooks, and PCB_FULL_IRET works indeed, thanks!
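For readers following the thread, here is a minimal sketch of the masking mentioned above, i.e. how a Darwin amd64 syscall number in rax could be split into a class and an effective number. The constant names and values mirror what I believe XNU defines in osfmk/mach/i386/syscall_sw.h (class in the top byte, Mach = 1, BSD/Unix = 2); treat them as assumptions and check them against the XNU tree you are targeting. The helper names `darwin_syscall_class` and `darwin_syscall_number` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match XNU's osfmk/mach/i386/syscall_sw.h; verify before relying on them. */
#define SYSCALL_CLASS_SHIFT 24
#define SYSCALL_CLASS_MASK  (0xFFu << SYSCALL_CLASS_SHIFT)
#define SYSCALL_NUMBER_MASK (~SYSCALL_CLASS_MASK)

#define SYSCALL_CLASS_MACH  1u  /* Mach traps */
#define SYSCALL_CLASS_UNIX  2u  /* BSD syscalls */

/* Compile-time sanity check: the number occupies the low 24 bits. */
static_assert(SYSCALL_NUMBER_MASK == 0x00FFFFFFu, "number mask is the low 24 bits");

/* Hypothetical helper: extract the syscall class from the value passed in rax. */
static inline uint32_t
darwin_syscall_class(uint32_t rax)
{
	return ((rax & SYSCALL_CLASS_MASK) >> SYSCALL_CLASS_SHIFT);
}

/* Hypothetical helper: extract the effective syscall number within its class. */
static inline uint32_t
darwin_syscall_number(uint32_t rax)
{
	return (rax & SYSCALL_NUMBER_MASK);
}
```

Under these assumptions, a Darwin binary calling write(2) would load 0x2000004 into rax: class 2 (BSD) and number 4, which is also the FreeBSD number for write. A sysent hook could use the class to reject or reroute Mach traps and pass the masked number on to the BSD syscall table.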
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4979A00A-9678-4BAC-881D-71F7533D93F9>
