Date: Wed, 25 Apr 2001 10:59:03 -0700
From: Julian Elischer <julian@elischer.org>
To: Arch@freebsd.org, alfred@freebsd.org, Robert Watson <rwatson@FreeBSD.ORG>, Daniel Eischen <eischen@vigrid.com>
Subject: KSE threading support (first parts)
Message-ID: <3AE71067.FF4BD029@elischer.org>
This is a multi-part message in MIME format.
--------------56834A9EA7789B526697FC9C
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit

After discussing this with Jason Evans before he took his new position,
and having looked at his patches from December and my similar patches
from January, here is a 'merged' patch.

It breaks the proc structure into 4 parts.

proc... owns all 'total process' resources
        (e.g. address space, limits, files).
kseg... KSE 'group'. Anything to do with working out the quanta to be
        given to the threads (KSEs). A scheduling abstraction.
kse.... Actual schedulable entity for a processor
        (if the KSEG has a quantum for it).
ksec... Where a thread stores its context when it is blocked, so that
        the kse can return to either the user or another unblocked kse
        to continue using its quanta.

This compiles cleanly and SHOULD run (it did run in an earlier
incarnation). It is by no means final, but rather designed to give us
a starting point in discussions.

In this view, KSEGs are on the run queue, and when they get some quanta
the KSEs hanging off them are run. If 2 KSEs are running, the KSEG's
quanta are exhausted at twice the rate. Each KSE has a very strong
affinity for one processor, and KSECs have a weak affinity for a KSE.
If a KSE runs out of work but has time, it will 'poach' a KSEC from
another KSE in the same KSEG list.

In this patch the linkages are not set up at all. All that is done is
that the structures are defined and used instead of a monolithic 'proc'
struct. The new structures are 'included' in the proc structure to
maintain compatibility and to allow code to be changed slowly.

What really needs to be done is for everyone who is interested to go
over the rather arbitrary allocation of fields to structures that I did
and suggest changes.

Also, I've punted on most things to do with signals, as we haven't
really discussed how we want signals to be handled in a KSE world.
(Can each KSEG or KSE get individual signals?
Do we need to define a special 'signal' KSE? If so, is that all it
does?)

What happens to the 'u-area'?
How do we define a "cur-kse" similar to curproc? (Do we need one?)

Presently the processor state is stored all over the place when a
process is suspended. This needs to be brought together so it can be
put into the KSEC. Who understands that stuff?

Some of the next steps would be:
1/ Figure out what we want for signals etc.
2/ Get the contexts actually stored in the KSEC structure when a
   process is suspended (instead of some strange pcb in funny memory
   near the u-area).
3/ Set up the linkages between these structures.
4/ Start using 'kse' instead of 'proc' in a bunch of places, using the
   linkages to find the appropriate other structures when needed.
5/ Add code to make new KSEs so that the 1:1 mapping is no longer true.
6/ Add syscalls to start making KSEs other than the one that is built
   into the process.
7/ Start making upcalls.

--
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000-2001
---> X_.---._/
            v

--------------56834A9EA7789B526697FC9C
Content-Type: text/plain; charset=iso-8859-2; name="proc.4-26.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="proc.4-26.diff"

Index: kern/kern_fork.c
===================================================================
RCS file: /unused/cvs/freebsd/src/sys/kern/kern_fork.c,v
retrieving revision 1.110
diff -u -r1.110 kern_fork.c
--- kern/kern_fork.c	2001/03/28 11:52:53	1.110
+++ kern/kern_fork.c	2001/04/25 17:11:22
@@ -390,6 +390,24 @@
 	    (unsigned) ((caddr_t)&p2->p_endcopy - (caddr_t)&p2->p_startcopy));
 	PROC_UNLOCK(p1);
+	bzero(&p2->p_kse.ke_startzero,
+	    (unsigned) ((caddr_t)&p2->p_kse.ke_endzero
+	    - (caddr_t)&p2->p_kse.ke_startzero));
+	bcopy(&p1->p_kse.ke_startcopy, &p2->p_kse.ke_startcopy,
+	    (unsigned) ((caddr_t)&p2->p_kse.ke_endcopy
+	    - (caddr_t)&p2->p_kse.ke_startcopy));
+
+	bzero(&p2->p_ksec.kc_startzero,
+	    (unsigned) ((caddr_t)&p2->p_ksec.kc_endzero
+	    - (caddr_t)&p2->p_ksec.kc_startzero));
+	bcopy(&p1->p_ksec.kc_startcopy, &p2->p_ksec.kc_startcopy,
+	    (unsigned) ((caddr_t)&p2->p_ksec.kc_endcopy
+	    - (caddr_t)&p2->p_ksec.kc_startcopy));
+
+	bcopy(&p1->p_kseg.kg_startcopy, &p2->p_kseg.kg_startcopy,
+	    (unsigned) ((caddr_t)&p2->p_kseg.kg_endcopy
+	    - (caddr_t)&p2->p_kseg.kg_startcopy));
+
 	mtx_init(&p2->p_mtx, "process lock", MTX_DEF);
 	PROC_LOCK(p2);
Index: sys/proc.h
===================================================================
RCS file: /unused/cvs/freebsd/src/sys/sys/proc.h,v
retrieving revision 1.160
diff -u -r1.160 proc.h
--- sys/proc.h	2001/04/20 22:34:48	1.160
+++ sys/proc.h	2001/04/25 17:20:51
@@ -147,17 +147,200 @@
  * either lock is sufficient for read access, but both locks must be held
  * for write access.
  */
+
 struct ithd;
 struct nlminfo;
+/*
+ * Here we define the four structures used for process information.
+ * The first is the ksec. It stands for "Kernel Schedulable Entity Context".
+ * This structure contains all the information as to where a thread of
+ * execution was when it was suspended, why it was suspended, and anything else
+ * that will be needed to restart it when it is rescheduled. Always
+ * associated with a KSE, but can be reassigned to an equivalent KSE for
+ * load balancing.
+ */
+struct ksec;
+
+/*
+ * The second structure is the Kernel Schedulable Entity. (KSE)
+ * As long as this is scheduled, it will continue to run any KSECs that
+ * are assigned to it until either it runs out of KSECs or CPU.
+ * It runs on one CPU and is assigned a quantum of time. When a KSEC is
+ * blocked, the KSE continues to run and will search for another KSEC
+ * in a runnable state amongst those it has. It may decide to return to user
+ * mode with a new 'empty' KSEC if there are no runnable KSECs.
+ * KSECs are associated with a KSE for cache reasons, but a scheduled KSE with
+ * no runnable KSECs will try to take a KSEC from a sibling KSE before
+ * surrendering its quantum.
+ */
+struct kse;
+
+/*
+ * The KSEG is allocated resources across a number of CPUs.
+ * (Including a number of CPUxQUANTA.) It parcels these QUANTA up among
+ * its KSEs, each of which should be running in a different CPU.
+ * Priority and total available scheduled quanta are properties of a KSEG.
+ * Multiple KSEGs in a single process compete against each other
+ * for total quanta in the same way that a forked child competes against
+ * its parent process.
+ */
+struct kseg;
+
+/*
+ * A process is the owner of all system resources allocated to a task.
+ * All KSEGs under one process see, and have the same access to, these
+ * resources (e.g. files, memory, sockets, permissions). A process may
+ * compete for CPU cycles on the same basis as a forked process cluster
+ * by spawning several KSEGs.
+ */
+struct proc;
+
+/***************
+ * In pictures:
+ With a single run queue used by all processors:
+
+ RUNQ: --->KSEG---KSEG--...           SLEEPQ:[]---KSEC---KSEC---KSEC
+            |                                []---KSEC
+           KSE---KSEC--KSEC--KSEC            []
+            |                                []---KSEC---KSEC
+           KSE--KSEC--KSEC
+
+  (processors run KSEs from the head KSEG until they are exhausted or
+  the KSEG exhausts its quantum)
+
+With PER-CPU run queues:
+it may be easier to put the KSEs on the run queues directly
+They would be given priorities calculated from the KSEG.
+
+ *
+ *****************/
+
+/*
+ * Kernel runnable context. This is what is put to sleep and reactivated.
+ * (Kernel Schedulable Entity Context)
+ * The first KSE available in the correct group will run this context.
+ * If several are available, use the one on the same CPU as last time.
+ */
+struct ksec {
+	/*** New fields for KSE linkage ***/
+	/* While it is possible to find the proc via the kse->kseg->proc
+	 * it is directly held here for efficiency (etc.)
+	 */
+	struct proc *kc_proc;	/* Associated process. */
+	struct kseg *kc_kseg;	/* Associated KSEG. */
+	struct kse *kc_kse;	/* Associated KSE. */
+
+	TAILQ_ENTRY(ksec) kc_ksegq;	/* All ksecs in this kseg */
+	TAILQ_ENTRY(ksec) kc_slpqk;	/* (j) Sleep/run queue. */
+
+	/* the fields below will mutate into those above */
+	TAILQ_ENTRY(proc) kc_procq;	/* (j) Run/mutex queue. */
+	TAILQ_ENTRY(proc) kc_slpq;	/* (j) Sleep queue. */
+	/* The following fields are all zeroed upon creation in fork. */
+#define kc_startzero kc_dupfd
+	int kc_flag;	/* (c) P_* flags. */
+	int kc_sflag;	/* (j) PS_* flags. */
+	int kc_stat;	/* (j) S* process status. */
+	int kc_dupfd;	/* (c) ret value from fdopen. XXX */
+	void *kc_wchan;	/* (j) Sleep address. */
+	const char *kc_wmesg;	/* (j) Reason for sleep. */
+	u_char kc_lastcpu;	/* (j) Last cpu we were on. */
+	short kc_locks;	/* (*) DEBUG: lockmgr count of locks */
+	u_int kc_stops;	/* (c) Procfs event bitmask. */
+	u_int kc_stype;	/* (c) Procfs stop event type. */
+	char kc_step;	/* (c) Procfs stop *once* flag. */
+	u_char kc_pfsflags;	/* (c) Procfs flags. */
+	struct klist kc_klist;	/* (c) Knotes attached to this proc. */
+	struct mtx *kc_blocked;	/* (j) Mutex process is blocked on. */
+	const char *kc_mtxname;	/* (j) Name of mutex blocked on. */
+	LIST_HEAD(, mtx) kc_contested;	/* (j) Contested locks. */
+	/* End area that is zeroed on creation. */
+	/* The following fields are all copied upon creation in fork. */
+	struct lock_list_entry *kc_sleeplocks;	/* (k) Held sleep locks. */
+	register_t kc_retval[2];	/* (k) Syscall aux returns. */
+#define kc_endzero kc_slpcallout
+#define kc_startcopy kc_endzero
+	struct callout kc_slpcallout;/* (h) Callout for sleep. */
+	struct mdproc kc_md;	/* (k) Any machine-dependent fields. */
+	/* eventually struct mdksec.... */
+	/* End area that is copied on creation. */
+#define kc_endcopy kc_addr
+	struct user *kc_addr;	/* (k) Kernel virtual addr of u-area (CPU). */
+	struct pasleep kc_asleep;	/* (k) Used by asleep()/await(). */
+};
+
+/*
+ * The schedulable entity that can be given a context to run.
+ * A process may have several of these. Probably one per processor
+ * but possibly a few more. In this universe they are grouped
+ * with a KSEG that contains the priority and niceness
+ * for the group.
+ */
+struct kse {
+	struct proc *ke_proc;	/* Associated process. */
+	struct kseg *ke_kseg;	/* Associated KSEG. */
+	TAILQ_ENTRY(kse) ke_kseq;	/* Queue of KSEs in ke_kseg. */
+	struct ksec *ke_ksec;	/* Associated KSEC, if running. */
+	TAILQ_HEAD(ke_ksec_hd, ksec);	/* Runnable KSECs waiting on this KSE */
+	struct pstats *ke_stats;	/* (bk) Accounting/statistics (CPU). */
+/* The following fields are all zeroed upon creation in fork. */
+#define ke_startzero ke_estcpu
+	int ke_flag;	/* (c) P_* flags. */
+	int ke_sflag;	/* (j) PS_* flags. */
+	int ke_stat;	/* (j) S* process status. */
+	u_int ke_estcpu;	/* (j) Time averaged value of ke_cpticks. */
+	int ke_cpticks;	/* (j) Ticks of cpu time. */
+	fixpt_t ke_pctcpu;	/* (j) %cpu during p_swtime. */
+	u_int64_t ke_uu;	/* (j) Previous user time in microsec. */
+	u_int64_t ke_su;	/* (j) Previous system time in microsec. */
+	u_int64_t ke_iu;	/* (j) Previous interrupt time in microsec. */
+	u_int64_t ke_uticks;	/* (j) Statclock hits in user mode. */
+	u_int64_t ke_sticks;	/* (j) Statclock hits in system mode. */
+	u_int64_t ke_iticks;	/* (j) Statclock hits processing intr. */
+	u_int ke_slptime;	/* (j) Time since last blocked. */
+	u_char ke_oncpu;	/* (j) Which cpu we are on. */
+	char ke_rqindex;	/* (j) Run queue index. */
+	int ke_intr_nesting_level;	/* (n) Interrupt recursion. */
+/* End area that is zeroed on creation. */
+/* The following fields are all copied upon creation in fork. */
+#define ke_endzero ke_priority
+#define ke_startcopy ke_endzero
+	u_char ke_priority;	/* (j) Process priority. */
+	u_char ke_usrpri;	/* (j) User priority based on p_cpu and p_nice. */
+/* End area that is copied on creation. */
+#define ke_endcopy ke_ithd
+	struct ithd *ke_ithd;	/* (b) For interrupt threads only. */
+};
+/*
+ * Kernel-scheduled entity group (KSEG). The scheduler considers each KSEG to
+ * be an indivisible unit from a time-sharing perspective, though each KSEG may
+ * contain multiple KSEs.
+ */
+struct kseg {
+	struct proc *kg_proc;	/* Process that contains this KSEG. */
+	TAILQ_ENTRY(kseg) kg_ksegq;	/* Queue of KSEGs in kg_proc. */
+	TAILQ_HEAD(kg_kse_hd, kse);	/* Queue of KSEs in this KSEG. */
+	TAILQ_HEAD(kg_ksec_hd, ksec);	/* Queue of KSECs in this KSEG. */
+/* The following fields are all copied upon creation in fork. */
+#define kg_startcopy kg_itcallout
+	struct callout kg_itcallout;	/* (h) Interval timer callout. */
+	struct priority kg_pri;	/* (j) Process priority. */
+	char kg_nice;	/* (j?/k?) Process "nice" value. */
+	struct rtprio kg_rtprio;	/* (j) Realtime priority. */
+/* End area that is copied on creation. */
+#define kg_endcopy kg_dummy
+	int kg_dummy;
+};
+
 struct proc {
-	TAILQ_ENTRY(proc) p_procq;	/* (j) Run/mutex queue. */
-	TAILQ_ENTRY(proc) p_slpq;	/* (j) Sleep queue. */
 	LIST_ENTRY(proc) p_list;	/* (d) List of all processes. */
 	/* substructures: */
+	TAILQ_HEAD(p_ksegq, kseg);	/* Queue of KSEGs. */
 	struct pcred *p_cred;	/* (c + k) Process owner's identity. */
 	struct filedesc *p_fd;	/* (b) Ptr to open files structure. */
+	/* accumulated stats for all owned KSEs? */
 	struct pstats *p_stats;	/* (b) Accounting/statistics (CPU). */
 	struct plimit *p_limit;	/* (m) Process limits. */
 	struct vm_object *p_upages_obj;/* (a) Upages object. */
@@ -168,7 +351,61 @@
 #define p_ucred p_cred->pc_ucred
 #define p_rlimit p_limit->pl_rlimit
-
+/*
+ * Compatibility defines for while we are using a
+ * single one in the proc struct during development.
+ */
+	struct kseg p_kseg;
+#define p_itcallout p_kseg.kg_itcallout
+#define p_pri p_kseg.kg_pri
+#define p_nice p_kseg.kg_nice
+#define p_rtprio p_kseg.kg_rtprio
+
+	struct kse p_kse;
+#define p_stats p_kse.ke_stats
+#define p_estcpu p_kse.ke_estcpu
+#define p_cpticks p_kse.ke_cpticks
+#define p_pctcpu p_kse.ke_pctcpu
+#define p_uu p_kse.ke_uu
+#define p_su p_kse.ke_su
+#define p_iu p_kse.ke_iu
+#define p_uticks p_kse.ke_uticks
+#define p_sticks p_kse.ke_sticks
+#define p_iticks p_kse.ke_iticks
+#define p_slptime p_kse.ke_slptime
+#define p_oncpu p_kse.ke_oncpu
+#define p_rqindex p_kse.ke_rqindex
+#define p_usrpri p_kse.ke_usrpri
+#define p_ithd p_kse.ke_ithd
+#define p_intr_nesting_level p_kse.ke_intr_nesting_level
+
+	struct ksec p_ksec;
+#define p_procq p_ksec.kc_procq
+#define p_slpq p_ksec.kc_slpq
+#define p_dupfd p_ksec.kc_dupfd
+#define p_wchan p_ksec.kc_wchan
+#define p_wmesg p_ksec.kc_wmesg
+#define p_lastcpu p_ksec.kc_lastcpu
+#define p_locks p_ksec.kc_locks
+#define p_stops p_ksec.kc_stops
+#define p_stype p_ksec.kc_stype
+#define p_retval p_ksec.kc_retval
+#define p_step p_ksec.kc_step
+#define p_pfsflags p_ksec.kc_pfsflags
+#define p_klist p_ksec.kc_klist
+#define p_blocked p_ksec.kc_blocked
+#define p_mtxname p_ksec.kc_mtxname
+#define p_contested p_ksec.kc_contested
+#define p_sleeplocks p_ksec.kc_sleeplocks
+#define p_slpcallout p_ksec.kc_slpcallout
+#define p_md p_ksec.kc_md
+#define p_asleep p_ksec.kc_asleep
+
+
+	/*
+	 * The following don't make too much sense..
+	 * See the kc_ or ke_ versions of the same flags
+	 */
 	int p_flag;	/* (c) P_* flags. */
 	int p_sflag;	/* (j) PS_* flags. */
 	int p_stat;	/* (j) S* process status. */
@@ -183,80 +420,47 @@
 	/* The following fields are all zeroed upon creation in fork. */
 #define p_startzero p_oppid
-	pid_t p_oppid;	/* (c + e) Save parent pid during ptrace. XXX */
-	int p_dupfd;	/* (c) Sideways ret value from fdopen. XXX */
+	pid_t p_oppid;	/* (c + e) Save ppid in ptrace. XXX */
 	struct vmspace *p_vmspace;	/* (b) Address space. */
 	/* scheduling */
-	u_int p_estcpu;	/* (j) Time averaged value of p_cpticks. */
-	int p_cpticks;	/* (j) Ticks of cpu time. */
-	fixpt_t p_pctcpu;	/* (j) %cpu during p_swtime. */
-	struct callout p_slpcallout;	/* (h) Callout for sleep. */
-	void *p_wchan;	/* (j) Sleep address. */
-	const char *p_wmesg;	/* (j) Reason for sleep. */
-	u_int p_swtime;	/* (j) Time swapped in or out. */
-	u_int p_slptime;	/* (j) Time since last blocked. */
+	u_int p_swtime;	/* (j) Time swapped in or out. */
-	struct callout p_itcallout;	/* (h) Interval timer callout. */
 	struct itimerval p_realtimer;	/* (h?/k?) Alarm timer. */
-	u_int64_t p_runtime;	/* (j) Real time in microsec. */
-	u_int64_t p_uu;	/* (j) Previous user time in microsec. */
-	u_int64_t p_su;	/* (j) Previous system time in microsec. */
-	u_int64_t p_iu;	/* (j) Previous interrupt time in microsec. */
-	u_int64_t p_uticks;	/* (j) Statclock hits in user mode. */
-	u_int64_t p_sticks;	/* (j) Statclock hits in system mode. */
-	u_int64_t p_iticks;	/* (j) Statclock hits processing intr. */
+	u_int64_t p_runtime;	/* (j) Real time in microsec. */
 	int p_traceflag;	/* (j?) Kernel trace points. */
 	struct vnode *p_tracep;	/* (j?) Trace to vnode. */
-	sigset_t p_siglist;	/* (c) Signals arrived but not delivered. */
+	sigset_t p_siglist;	/* (c) Sigs arrived, not delivered. */
 	struct vnode *p_textvp;	/* (b) Vnode of executable. */
 	struct mtx p_mtx;	/* (k) Lock for this struct. */
 	u_int p_spinlocks;	/* (k) Count of held spin locks. */
-	char p_lock;	/* (c) Process lock (prevent swap) count. */
-	u_char p_oncpu;	/* (j) Which cpu we are on. */
-	u_char p_lastcpu;	/* (j) Last cpu we were on. */
-	char p_rqindex;	/* (j) Run queue index. */
-
-	short p_locks;	/* (*) DEBUG: lockmgr count of held locks */
-	u_int p_stops;	/* (c) Procfs event bitmask. */
-	u_int p_stype;	/* (c) Procfs stop event type. */
-	char p_step;	/* (c) Procfs stop *once* flag. */
-	u_char p_pfsflags;	/* (c) Procfs flags. */
-	char p_pad3[2];	/* Alignment. */
-	register_t p_retval[2];	/* (k) Syscall aux returns. */
+	char p_lock;	/* (c) Process (prevent swap) count. */
+	char p_pad3[3];	/* Alignment. */
 	struct sigiolst p_sigiolst;	/* (c) List of sigio sources. */
 	int p_sigparent;	/* (c) Signal to parent on exit. */
-	sigset_t p_oldsigmask;	/* (c) Saved mask from before sigpause. */
+	sigset_t p_oldsigmask;	/* (c) Saved mask from pre sigpause. */
 	int p_sig;	/* (n) For core dump/debugger XXX. */
 	u_long p_code;	/* (n) For core dump/debugger XXX. */
-	struct klist p_klist;	/* (c) Knotes attached to this process. */
-	struct lock_list_entry *p_sleeplocks;	/* (k) Held sleep locks. */
-	struct mtx *p_blocked;	/* (j) Mutex process is blocked on. */
-	const char *p_mtxname;	/* (j) Name of mutex blocked on. */
-	LIST_HEAD(, mtx) p_contested;	/* (j) Contested locks. */
 	struct nlminfo *p_nlminfo;	/* (?) only used by/for lockd */
 	void *p_aioinfo;	/* (c) ASYNC I/O info. */
-	struct ithd *p_ithd;	/* (b) For interrupt threads only. */
-	int p_intr_nesting_level;	/* (k) Interrupt recursion. */
 	/* End area that is zeroed on creation. */
-#define p_endzero p_startcopy
-
 	/* The following fields are all copied upon creation in fork. */
 #define p_startcopy p_sigmask
+#define p_endzero p_startcopy
+	/* We haven't defined how KSEs do signals yet */
 	sigset_t p_sigmask;	/* (c) Current signal mask. */
 	stack_t p_sigstk;	/* (c) Stack pointer and on-stack flag. */
 	int p_magic;	/* (b) Magic number. */
-	struct priority p_pri;	/* (j) Process priority. */
-	char p_nice;	/* (j?/k?) Process "nice" value. */
 	char p_comm[MAXCOMLEN + 1];	/* (b) Process name. */
+	int p_kse_enabled;	/* (b) 0, unless using KSEs this proc. */
 	struct pgrp *p_pgrp;	/* (e?/c?) Pointer to process group. */
 	struct sysentvec *p_sysent;	/* (b) System call dispatch information. */
@@ -266,7 +470,6 @@
 #define p_endcopy p_addr
 	struct user *p_addr;	/* (k) Kernel virtual addr of u-area (CPU). */
-	struct mdproc p_md;	/* (k) Any machine-dependent fields. */
 	u_short p_xstat;	/* (c) Exit status for wait; also stop sig. */
 	u_short p_acflag;	/* (c) Accounting flags. */
@@ -274,7 +477,6 @@
 	struct proc *p_peers;	/* (c) */
 	struct proc *p_leader;	/* (c) */
-	struct pasleep p_asleep;	/* (k) Used by asleep()/await(). */
 	void *p_emuldata;	/* (c) Emulator state data. */
 };
@@ -293,9 +495,10 @@
 #define SMTX 7	/* Blocked on a mutex. */
 /* These flags are kept in p_flag. */
+/* In a KSE world some go to a KSEC or a KSE (*)*/
 #define P_ADVLOCK 0x00001	/* Process may hold a POSIX advisory lock. */
 #define P_CONTROLT 0x00002	/* Has a controlling terminal. */
-#define P_KTHREAD 0x00004	/* Kernel thread. */
+#define P_KTHREAD 0x00004	/* Kernel thread. (*)*/
 #define P_NOLOAD 0x00008	/* Ignore during load avg calculations. */
 #define P_PPWAIT 0x00010	/* Parent is waiting for child to exec/exit. */
 #define P_SELECT 0x00040	/* Selecting; wakeup/waiting danger. */
@@ -305,6 +508,7 @@
 #define P_WAITED 0x01000	/* Debugging process has waited for child. */
 #define P_WEXIT 0x02000	/* Working on exiting. */
 #define P_EXEC 0x04000	/* Process called exec. */
+#define P_KSES 0x08000	/* Process is using KSEs. */
 /* Should be moved to machine-dependent areas. */

--------------56834A9EA7789B526697FC9C--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message
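
Editorial sketch (not part of the original message): the kern_fork.c hunk relies on marker macros such as ke_startzero/ke_endzero and ke_startcopy/ke_endcopy that alias real field names, so fork can zero one contiguous run of fields and copy another. The user-space miniature below illustrates that idiom only. The struct mini_kse, its mk_* fields, RANGE_LEN and mini_kse_fork() are hypothetical names invented for this example, and it uses memset/memcpy with offsetof rather than the kernel's bzero/bcopy with caddr_t arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a per-KSE structure; not the patch's layout. */
struct mini_kse {
	int		mk_id;		/* neither zeroed nor copied */
#define	mk_startzero	mk_estcpu	/* first field of the zeroed range */
	unsigned	mk_estcpu;	/* zeroed in the child */
	int		mk_cpticks;	/* zeroed in the child */
#define	mk_endzero	mk_priority	/* first field past the zeroed range */
#define	mk_startcopy	mk_endzero	/* copied range starts where zeroing ends */
	unsigned char	mk_priority;	/* copied from the parent */
	unsigned char	mk_usrpri;	/* copied from the parent */
#define	mk_endcopy	mk_private	/* first field past the copied range */
	void		*mk_private;	/* neither zeroed nor copied */
};

/* Byte length of the half-open field range [first, last). */
#define	RANGE_LEN(type, first, last) \
	(offsetof(type, last) - offsetof(type, first))

/* Initialise a child from its parent the way the fork hunk does per structure. */
static void
mini_kse_fork(const struct mini_kse *parent, struct mini_kse *child)
{
	assert(parent != NULL && child != NULL);

	/* Zero everything from mk_startzero up to (not including) mk_endzero. */
	memset((char *)child + offsetof(struct mini_kse, mk_startzero), 0,
	    RANGE_LEN(struct mini_kse, mk_startzero, mk_endzero));

	/* Copy everything from mk_startcopy up to (not including) mk_endcopy. */
	memcpy((char *)child + offsetof(struct mini_kse, mk_startcopy),
	    (const char *)parent + offsetof(struct mini_kse, mk_startcopy),
	    RANGE_LEN(struct mini_kse, mk_startcopy, mk_endcopy));
}
```

One consequence of the idiom is that a field's position relative to the markers, not the comment above it, decides whether it is zeroed, copied, or left alone, so field order matters when moving members between structures.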
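
Editorial sketch (not part of the original message): the message says a KSE that runs out of work will 'poach' a KSEC from another KSE in the same KSEG, and that a KSEG's quanta drain twice as fast with 2 KSEs running. The patch itself sets up no linkages, so the code below is only one possible reading, under stated assumptions: the list fields kc_runq, ke_runnable, kg_kses and kg_quanta, and the functions kse_next_ksec() and kseg_charge_tick(), are hypothetical and do not appear in the patch. It uses the TAILQ macros from <sys/queue.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, simplified stand-ins for the patch's structures. */
struct ksec {
	int kc_id;			/* illustrative identifier */
	TAILQ_ENTRY(ksec) kc_runq;	/* link on a KSE's runnable list */
};
TAILQ_HEAD(ksec_list, ksec);

struct kse {
	struct ksec_list ke_runnable;	/* runnable KSECs waiting on this KSE */
	TAILQ_ENTRY(kse) ke_kseq;	/* link on the owning KSEG's KSE list */
};
TAILQ_HEAD(kse_list, kse);

struct kseg {
	struct kse_list kg_kses;	/* KSEs in this group */
	int kg_quanta;			/* quanta left for the whole group */
};

/*
 * Pick the next KSEC for 'ke': prefer its own list (weak affinity),
 * otherwise poach the first runnable KSEC from a sibling KSE.
 * Returns NULL when the whole group has nothing runnable.
 */
static struct ksec *
kse_next_ksec(struct kseg *kg, struct kse *ke)
{
	struct kse *sib;
	struct ksec *kc;

	assert(kg != NULL && ke != NULL);
	kc = TAILQ_FIRST(&ke->ke_runnable);
	if (kc != NULL) {
		TAILQ_REMOVE(&ke->ke_runnable, kc, kc_runq);
		return (kc);
	}
	TAILQ_FOREACH(sib, &kg->kg_kses, ke_kseq) {
		if (sib == ke)
			continue;
		kc = TAILQ_FIRST(&sib->ke_runnable);
		if (kc != NULL) {
			TAILQ_REMOVE(&sib->ke_runnable, kc, kc_runq);
			return (kc);
		}
	}
	return (NULL);
}

/*
 * Charge one tick to the group: every running KSE consumes one quantum,
 * so two running KSEs drain the group twice as fast as one.
 * Returns the quanta remaining (never negative).
 */
static int
kseg_charge_tick(struct kseg *kg, int running_kses)
{
	assert(kg != NULL && running_kses >= 0);
	kg->kg_quanta -= running_kses;
	if (kg->kg_quanta < 0)
		kg->kg_quanta = 0;
	return (kg->kg_quanta);
}
```

A real implementation would also need locking and a policy for which sibling to poach from; both are left out here because the message does not specify them.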