Date: Sat, 2 Jun 2012 14:01:35 +0100
From: Attilio Rao <attilio@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: alc@freebsd.org, Alexander Kabaev <kan@freebsd.org>, Giovanni Trematerra <giovanni.trematerra@gmail.com>, freebsd-arch@freebsd.org
Subject: Re: [RFC] Kernel shared variables
Message-ID: <CAJ-FndC71=3Jo%2BBxQi==gCoLipBxj8X8XMBydjvrcKeGw%2BWOnA@mail.gmail.com>
In-Reply-To: <20120601193522.GA2358@deviant.kiev.zoral.com.ua>
References: <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2%2BoYo%2BwwT4ipA@mail.gmail.com> <20120601193522.GA2358@deviant.kiev.zoral.com.ua>
2012/6/1 Konstantin Belousov <kostikbel@gmail.com>:
> On Fri, Jun 01, 2012 at 07:53:15PM +0200, Giovanni Trematerra wrote:
>> Hello,
>> I'd like to discuss a way to provide a mechanism to share some read-only
>> data between kernel and user space programs avoiding syscall overhead,
>> implementing some them, such as gettimeofday(3) and time(3) as ordinary
>> user space routine.
>>
>> The patch at
>> http://www.trematerra.net/patches/ksvar_experimental.patch
>>
>> is in a very experimental stage. It's just a proof-of-concept.
>> Only works for an AMD64 kernel and only for 64-bit applications.
>> The idea is to have all the variables that we want to share between kernel
>> and user space into one or more consecutive pages of memory that will be
>> mapped read-only into every running process. At the start of the first
>> shared page
>> there'll be a table with as many entries as the number of the shared variables.
>> Each entry is a 32-bit value that is the offset between the start of the shared
>> page and the start of the variable in the page. The user space processes need
>> to find out the map address of shared page and use the table to access to the
>> shared variables.
>> Kernel will export a variable to user space as an index, so user space code
>> must refer to a specific index to access a kernel shared variable.
>> Let's take a quick look to the KPI/API for exporting/importing kernel
>> shared variables.
>> Say we want implement a routine to export an int from the kernel.
>> To define the variable to be exported inside the kernel you would use
>>
>> KSVAR_DEFINE(0, int, test_value);
>>
>> You have just defined an int variable named "test_value" at index 0.
>> Inside the kernel you can write/read as usual using the symbol test_value;
>> Now you likely want add to libc a function callable from user processes
>> that return the test_value variable. So first of all you need the import the
>> variable.
>>
>> KSVAR_IMPORT(0, int, test_value);
>>
>> and to obtain a pointer to read the value you would use
>>
>> KSVAR(test_value);
>>
>> so your function would look like something like this
>>
>> int get_test_value()
>> {
>>
>> 	return (*KSVAR(test_value));
>> }
>>
>> Then inside your process just call get_test_value() function as you usually
>> do and you'll get a kernel written value without switching in kernel mode.
>>
>> Let's see now in more detail how that could be accomplished.
>> The shared variables will be accessed as normal variables and are read/write
>> inside the kernel. The variables need to be inside the same page(s) and nothing
>> but the shared variables (and the table) must be into the page(s). To
>> obtain that
>> I changed the linker script in this way
>>
>> --- a/sys/conf/ldscript.amd64
>> +++ b/sys/conf/ldscript.amd64
>> @@ -177,6 +177,15 @@ SECTIONS
>>      *(.ldata .ldata.* .gnu.linkonce.l.*)
>>      . = ALIGN(. != 0 ? 64 / 8 : 1);
>>    }
>> +  .ksvar ALIGN(CONSTANT (COMMONPAGESIZE)) :
>> +  {
>> +    __ksvar_set_start = .;
>> +    *(.ksvar_table)
>> +    *(.ksvar)
>> +
>> +    . = ALIGN(CONSTANT (COMMONPAGESIZE));
>> +    __ksvar_set_stop = .;
>> +  }
>>    . = ALIGN(64 / 8);
>>    _end = .; PROVIDE (end = .);
>>    . = DATA_SEGMENT_END (.);
>>
>> When we want to define a variable in the kernel to share with user space
>> we have to use KSVAR_DEFINE macro in sys/sys/ksvar.h
>>
>> +struct ksvar_set {
>> +	uint32_t idx;
>> +	char *pksvar;
>> +};
>> +
>> +/*
>> + * Declare a variable into kernel shared linker_set.
>> + */
>> +#define KSVAR_DEFINE(index, type, name) \
>> +	static type name __section(".ksvar"); \
>> +	static struct ksvar_set name ## _ksvar_set = { \
>> +		.idx = index, \
>> +		.pksvar = (char *) &name \
>> +	}; \
>> +	DATA_SET(ksvar_set, name ## _ksvar_set)
>>
>> Every variable must have a unique index. The indexes must
>> start from zero and be consecutive.
>> When you add an index
>> you must bump the size of the table (KSVAR_TABLE_SIZE)
>> (see sys/sys/ksvar.h)
>>
>> The variables are inside the kernel static image that isn't managed
>> by the VM and so we need to allocate pages to map the physical addresses.
>> A new SYSINIT (ksvarinit) will allocate a set of vm_page_t through
>> the vm_phys_fictitious_reg_range interface and fill the table using
>> the information
>> of the ksvar_set linker set, then will create a vm_object_t (vm_object_ksvar),
>> mark the fake pages as valid and put them into it.
>> When a new process is created by exec(3) the vm_object_ksvar will be
>> mapped read-only into the process address space by vm_map_fixed routine
>> just before mapping the user stack. The address of mapping will be recorded
>> inside the new p_ksvar field of the struct proc.
>> This field will be exported through a sysctl to the user space processes.
>> In order to implement syscalls as user space routines, we have to find out the
>> mapped address of the kernel shared variables when the libc is mapped into
>> the process. So I added a function marked with the attribute constructor.
>> It will called before any code into user process and before any code inside
>> the libc.
>>
>> +__attribute((constructor)) void init_kernel_shared()
>> +{
>> +	int mib[2];
>> +	size_t len;
>> +	vm_offset_t ksvar_address;
>> +
>> +	mib[0] = CTL_KERN;
>> +	mib[1] = KERN_KSVAR;
>> +	len = sizeof(vm_offset_t);
>> +	if (__sysctl(mib, 2, (void *) &ksvar_address, &len, NULL, 0) != -1)
>> +		ksvar_table = (uint32_t *) ksvar_address;
>> +}
>>
>> Once the libc knows the address of the table it can access to the shared
>> variables.
>>
>> Just as proof of concept I re-implemented gettimeofday(3) in user space.
>> First of all I didn't remove the entry into the syscall.master, just renamed the
>> sys_gettimeofday. I need it for the fallback path.
>> In the kernel I introduced a struct wall_clock.
>>
>> +struct wall_clock
>> +{
>> +	struct timeval tv;
>> +	struct timezone tz;
>> +};
>>
>> The struct is exported through sys/sys/time.h header.
>> I defined a new kernel shared variable. To do so I added an index in
>> sys/sys/ksvar.h
>> WALL_CLOCK_INDEX and bumped KSVAR_TABLE_SIZE to 1.
>> In the sys/kern/kern_clocksource.c
>>
>> +/* kernel shared variable for implmenting gettimeofday. */
>> +KSVAR_DEFINE(WALL_CLOCK_INDEX, struct wall_clock, wall_clock);
>>
>> Now we defined a shared variable at index WALL_CLOCK_INDEX of type
>> struct wall_clock and named wall_clock.
>> Inside handleevents I update the info exported by wall_clock.
>>
>> +	struct timeval tv;
>> +
>> +	/* update time for userspace gettimeofday */
>> +	microtime(&tv);
>> +	wall_clock.tv = tv;
>> +	wall_clock.tz.tz_minuteswest = tz_minuteswest;
>> +	wall_clock.tz.tz_dsttime = tz_dsttime;
>>
>> Now, in libc we import the shared variable
>>
>> +KSVAR_IMPORT(WALL_CLOCK_INDEX, struct wall_clock, wall_clock);
>>
>> note that WALL_CLOCK_INDEX must be the same of the one defined
>> inside the kernel, and define a new function gettimeofday
>>
>> +int
>> +gettimeofday(struct timeval *tp, struct timezone *tzp)
>> +{
>> +
>> +	/* fallback to syscall if kernel doesn't export ksvar */
>> +	if (!KSVAR_IS_ACTIVE())
>> +		return (sys_gettimeofday(tp, tzp));
>> +
>> +	if (tp != NULL)
>> +		*tp = KSVAR(wall_clock)->tv;
>> +	if (tzp != NULL)
>> +		*tzp = KSVAR(wall_clock)->tz;
>> +	return (0);
>> +}
>>
>> Now when a process will call getimeofday, will call that function actually.
>> If the process makes a lot of call to gettimeofday, we will see a
>> performance boost.
>> Note that if ksvar are not exported from the kernel (KSVAR_IS_ACTIVE),
>> the function
>> fallback to call the actual syscall (sys_gettimeofday).
>>
>> Open tasks
>> - implement support for 32-bit emulated processes running in a 64-bit
>> environment.
>> - extend support to others arch
>> - implement more syscalls
>> - benchmarks
>> - Test, test, test.
>>
>> I'm looking forward to hear about your comments and suggestions.
>
> I very much dislike what you described, it makes ABI maintanence
> a nightmare.
> Below is some mail I wrote around Spring 2009, making some notes about
> desired proposal. This is what called vdso in Linux land.

Did you bother to read at least Giovanni's description?
Because this has nothing to do with VDSO in Linux.

I think he just wants to map into userland processes some pages from
the static image of the kernel (packed together in a specific dataset).
This poses some non-trivial problems. The first is that the static
image is not designed to have physical pages tied to it. The second is
that he needs a clean design that lets consumers of this mechanism
correctly locate the information they want within the shared page(s)
and, in the end, read the correct values.

I have some reservations about both the implementation and the
approach for retrieving data from the page.
In particular, I don't like that a new vm_object is allocated for this
page. What I would really like is one of:
1) a very minimal implementation -- you just use
pmap_enter()/pmap_remove() specifically when needed, separately, in
the fork(), execve(), etc. cases
2) a more complete approach -- you make a very quick layer which lets
you map pages from the static image of the kernel, and the shared page
becomes just a specific consumer of this. This way the object makes
much more sense, because it becomes an object associated with the whole
static image of the kernel.

About the layering, I don't like that you require both a kernel and a
userland header to locate the objects within the page. This is very
likely to be prone to ABI breakage. What is needed is a mechanism for
retrieving at run time what Giovanni calls "indexes", or a way of
making it index-agnostic.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
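
To illustrate the run-time lookup mentioned above, here is a rough
sketch of what an index-agnostic shared page might look like from the
userland side. Everything in it is an assumption made for illustration
and none of it comes from the patch: the header layout, the name-based
directory, KSVAR_MAGIC, KSVAR_NAMELEN and ksvar_lookup() are all
hypothetical. The idea is only that the page describes itself, so libc
resolves a variable by name and size instead of a compile-time index
shared through a header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, not taken from the patch. */
#define KSVAR_MAGIC	0x4b535652u	/* arbitrary tag chosen for this sketch */
#define KSVAR_NAMELEN	16

struct ksvar_dirent {
	char		name[KSVAR_NAMELEN];	/* NUL-terminated variable name */
	uint32_t	offset;			/* byte offset from start of page */
	uint32_t	size;			/* size of the variable in bytes */
};

struct ksvar_header {
	uint32_t		magic;		/* identifies a valid shared page */
	uint32_t		nentries;	/* number of directory entries */
	struct ksvar_dirent	dir[];		/* directory follows the header */
};

/*
 * Resolve a shared variable by name at run time.  Returns a pointer into
 * the shared page, or NULL if the page is not valid, the name is unknown,
 * or the exported size differs from what the caller was compiled against
 * (in which case the caller should fall back to the real syscall).
 */
static const void *
ksvar_lookup(const void *page, size_t pagesz, const char *name, size_t size)
{
	const struct ksvar_header *h = page;
	uint32_t i;

	assert(name != NULL);
	if (page == NULL || pagesz < sizeof(*h) || h->magic != KSVAR_MAGIC)
		return (NULL);
	/* Reject a directory that would not fit inside the page. */
	if (h->nentries > (pagesz - sizeof(*h)) / sizeof(h->dir[0]))
		return (NULL);
	for (i = 0; i < h->nentries; i++) {
		const struct ksvar_dirent *d = &h->dir[i];

		if (strncmp(d->name, name, KSVAR_NAMELEN) != 0)
			continue;
		/* Size mismatch or out-of-page entry: treat as not found. */
		if (d->size != size || d->offset > pagesz ||
		    size > pagesz - d->offset)
			return (NULL);
		return ((const char *)page + d->offset);
	}
	return (NULL);
}
```

With something along these lines, the libc constructor could call
ksvar_lookup(page, pagesz, "wall_clock", sizeof(struct wall_clock))
once, cache the result, and use the syscall path whenever it gets NULL.
Adding, removing or reordering variables in the kernel would then not
change anything userland was compiled against.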
