Date:      Sun, 26 Nov 2000 17:38:59 -0500
From:      "Brian F. Feldman" <green@FreeBSD.org>
To:        Julian Elischer <julian@elischer.org>
Cc:        arch@FreeBSD.org, jasone@FreeBSD.org
Subject:   Re: Threads .. chopping up 'struct proc' 
Message-ID:  <200011262239.eAQMd0576413@green.dyndns.org>
In-Reply-To: Message from Julian Elischer <julian@elischer.org>  of "Sun, 26 Nov 2000 12:18:06 PST." <3A216FFE.BE0F780F@elischer.org> 

Julian Elischer <julian@elischer.org> wrote:
> I've been looking at the proc structure.
> 
> The aim is to eventually move some of the fields into a
> struct KSE (struct schedbox?)
> struct KSEC (struct threadcontext?)
> struct KSEG (struct schedgroup?)

Sounds about right, as far as I've been following the discussion (I read all 
of -arch, but don't follow -smp at all since I just don't have SMP ;)

My question thus far is: given that a proc has one of each, will a set of 
threads, in any form, ALWAYS have a proc backing it up?  That would make 
sense, and in that case I'd think you would avoid a lot of the complexity 
in the switchover.

> Initially we would simply include one of each of these in the struct proc,
> but link them together as if they were correctly connected up.
> We would use macros such as:
> #define p_estcpu p_kse.kse_estcpu
> to keep existing code working.
> Eventually, functions that get changed to receive a KSE directly
> would just use kse->kse_estcpu, and if they need the proc they
> can use kse->kse_proc. But until then, we'd start by simply 
> separating the fields and using macros. Then we can convert 
> calls at our leisure.
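A minimal userland sketch of the embed-and-alias scheme described above, assuming a struct kse that carries only the two fields the mail names (kse_estcpu, kse_proc). The p_pid field, proc_init(), old_style_charge() and new_style_charge() are hypothetical illustrations, not FreeBSD code.

```c
/*
 * Sketch only: one KSE embedded in struct proc but linked as though it
 * were a separate object. Everything except p_estcpu, kse_estcpu and
 * kse_proc is hypothetical.
 */
struct proc;

struct kse {
	unsigned int	 kse_estcpu;	/* time-averaged CPU usage */
	struct proc	*kse_proc;	/* back pointer to the owning proc */
};

struct proc {
	int		 p_pid;
	struct kse	 p_kse;		/* exactly one embedded KSE for now */
};

/* Compatibility alias: unconverted code that says p->p_estcpu still compiles. */
#define	p_estcpu	p_kse.kse_estcpu

/* Wire the embedded KSE up as though it had been allocated separately. */
static void
proc_init(struct proc *p, int pid)
{
	p->p_pid = pid;
	p->p_kse.kse_estcpu = 0;
	p->p_kse.kse_proc = p;
}

/* Unconverted caller: still written in terms of struct proc. */
static void
old_style_charge(struct proc *p)
{
	p->p_estcpu++;		/* expands to p->p_kse.kse_estcpu++ */
}

/* Converted caller: receives the KSE and reaches the proc only if needed. */
static int
new_style_charge(struct kse *kse)
{
	kse->kse_estcpu++;
	return (kse->kse_proc->p_pid);
}
```

Because both callers touch the same storage, functions can be converted one at a time; once nothing uses p_estcpu any more, the #define can simply be deleted.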

What would be the difference between doing it "right" for struct proc in the 
first place and dummying them up?  I wouldn't want an artificial 
discrepancy here, if possible.  Perhaps you could explain a bit more of the 
vision you have here?  I haven't been able to pick that bit up from your 
posts as of yet.  A KSE with just one thread would seem to logically be 
handled exactly the same as a process.

> However when going through the fields in struct proc,
> some difficulties become obvious. Here's my initial 
> division of the fields. I've added a comment at the 
> beginning of each line that indicates where I think 
> it should go, however I'm not convinced about some of them:
> 
> P = stays in struct proc
> E = goes to 'KSE' struct (schedulable entity)
> G = goes to 'group' struct
> C = goes to 'sleepable Context' struct.

Does each KSE get a sleepable context?  I'm not sure I really see where it 
fits in; it sounds like it would have a 1:1 mapping with KSEs.

> I've marked with [XXX] the things I am not sure about, or do not really
> understand. These are usually new fields to do with things like events, or
> fields where the semantics of the feature have not been decided for a 
> threaded environment.  E.g. WHO GETS A SIGNAL?
> 
> struct  proc {
> /*E*/   TAILQ_ENTRY(proc) p_procq;      /* run/mutex queue. */
> [this may need to be split into two entries, one in a KSE
>  and one in a KSEG, depending on how we do things]
> 
> /*C*/   TAILQ_ENTRY(proc) p_slpq;       /* sleep queue. */
> /*P*/   LIST_ENTRY(proc) p_list;        /* List of all processes. */
> 
>         /* substructures: */
> /*P*/   struct  pcred *p_cred;          /* Process owner's identity. */
> /*P*/   struct  filedesc *p_fd;         /* Ptr to open files structure. */
> /*P*/   struct  pstats *p_stats;        /* Accounting/statistics (PROC ONLY). */
> [some of these may need to be duplicated in the KSE and KSEG.. 
> maybe even Context]

It sounds particularly evil to keep a set of statistics both in the process 
and in the KSEs.  How about keeping them only in the KSEs?  In the 
"traditional" case, the process usage info, for example, would then be the 
sum of that of all its KSEs.
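A minimal sketch of the "statistics only in the KSEs" idea, assuming each KSE carries its own tick counters (named after the p_uticks/p_sticks/p_iticks fields quoted further down) and a process simply chains its KSEs. struct kse_stats, kse_next, p_kses and proc_sum_stats() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-KSE counters; nothing is stored per process. */
struct kse_stats {
	uint64_t	 ks_uticks;	/* statclock hits in user mode */
	uint64_t	 ks_sticks;	/* statclock hits in system mode */
	uint64_t	 ks_iticks;	/* statclock hits processing interrupts */
};

struct kse {
	struct kse_stats kse_stats;
	struct kse	*kse_next;	/* next KSE of the same process */
};

struct proc {
	struct kse	*p_kses;	/* singly linked list of KSEs */
};

/* Process-wide usage is derived on demand by summing over its KSEs. */
static struct kse_stats
proc_sum_stats(const struct proc *p)
{
	struct kse_stats total = { 0, 0, 0 };
	const struct kse *kse;

	for (kse = p->p_kses; kse != NULL; kse = kse->kse_next) {
		total.ks_uticks += kse->kse_stats.ks_uticks;
		total.ks_sticks += kse->kse_stats.ks_sticks;
		total.ks_iticks += kse->kse_stats.ks_iticks;
	}
	return (total);
}
```

For a traditional single-KSE process the sum is just that KSE's counters, so getrusage()-style reporting would not need a special case.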

> /*P*/   struct  plimit *p_limit;        /* Process limits. */
> /*P*/   struct  vm_object *p_upages_obj;/* Upages object */

This maps to a KSE, really...  The struct user holds the signal handlers 
(should be per-KSE, I think...), the stats, and the pcb.  The pcb absolutely 
has to be one per CPU context, so proc won't work :)

> /*P*/   struct  procsig *p_procsig;
> [Well, actually, who gets signals?  Maybe this is per KSE? Per KSEG?
> Maybe even per Context, as each context has a different user stack and
> signals are delivered on the user stack (unless set otherwise).]

I would think that a KSE should own its own signal state, and that it 
should be configurable whether the signal info is used per-KSE or per-proc.
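One way the configurable choice could look, as a minimal sketch: each KSE has a private handler table plus a flag that selects it over the process-wide table. The tables are simplified to arrays of handler pointers rather than the real struct sigacts, and SK_NSIG, sk_handler_t, KSEF_PRIVSIG and kse_sigacts() are all hypothetical.

```c
/*
 * Hypothetical sketch: a per-KSE flag picks either the KSE's private
 * handler table or the one shared by the whole process.
 */
#define	SK_NSIG		32		/* placeholder, not the real NSIG */

typedef void (*sk_handler_t)(int);

struct sk_sigacts {
	sk_handler_t	 sa_handler[SK_NSIG];	/* one handler per signal */
};

struct proc {
	struct sk_sigacts p_sigacts;		/* shared by the whole process */
};

struct kse {
	struct proc	*kse_proc;
	struct sk_sigacts kse_sigacts;		/* private to this KSE */
	int		 kse_flags;
};

#define	KSEF_PRIVSIG	0x0001		/* use kse_sigacts, not p_sigacts */

/* Return the handler table that applies to this KSE right now. */
static struct sk_sigacts *
kse_sigacts(struct kse *kse)
{
	if (kse->kse_flags & KSEF_PRIVSIG)
		return (&kse->kse_sigacts);
	return (&kse->kse_proc->p_sigacts);
}
```

With the flag clear every KSE sees the process table, which is the traditional behaviour; setting it is what a "set the signal scope" system call would toggle.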

> #define p_sigacts       p_procsig->ps_sigacts
> #define p_sigignore     p_procsig->ps_sigignore
> #define p_sigcatch      p_procsig->ps_sigcatch
>  
> #define p_ucred         p_cred->pc_ucred
> #define p_rlimit        p_limit->pl_rlimit
>  
> /*C*/   int     p_flag;                 /* P_* flags. */
> [these flags will probably need to be shared out amongst the structures]
> /*C*/   char    p_stat;                 /* S* process status. */
> [as will these]
>         char    p_pad1[3];
>  
> /*P*/   pid_t   p_pid;                  /* Process identifier. */

If signals are per-KSE, would it then follow that we give a KSEG a process 
ID, and each KSE another ID (from the same namespace as PIDs) that could be 
used to signal it and whatnot?
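A minimal sketch of a single ID namespace shared by processes and KSEs, so that a kill()-style lookup can resolve an ID to either kind of object. The tiny fixed-size table and every name in it (sk_idtab, sk_id_alloc(), sk_id_lookup()) are hypothetical; a real allocator would hash and recycle IDs the way the pid allocator does.

```c
#include <stddef.h>

/* Hypothetical: one ID space for both processes and KSEs. */
#define	SK_MAXID	64		/* tiny table for illustration */

enum sk_idkind { SK_ID_FREE = 0, SK_ID_PROC, SK_ID_KSE };

struct sk_idtab {
	int		 next;			/* where the next search starts */
	enum sk_idkind	 kind[SK_MAXID];
	void		*obj[SK_MAXID];
};

/* Allocate the next free ID (0 is reserved); returns -1 if full. */
static int
sk_id_alloc(struct sk_idtab *t, enum sk_idkind kind, void *obj)
{
	int i, id;

	for (i = 0; i < SK_MAXID; i++) {
		id = (t->next + i) % SK_MAXID;
		if (id == 0 || t->kind[id] != SK_ID_FREE)
			continue;
		t->kind[id] = kind;
		t->obj[id] = obj;
		t->next = id + 1;
		return (id);
	}
	return (-1);
}

/* Resolve an ID; *objp is set only when the ID is in use. */
static enum sk_idkind
sk_id_lookup(const struct sk_idtab *t, int id, void **objp)
{
	if (id <= 0 || id >= SK_MAXID || t->kind[id] == SK_ID_FREE)
		return (SK_ID_FREE);
	*objp = t->obj[id];
	return (t->kind[id]);
}
```

Since the IDs never collide, a signal aimed at a KSE id and one aimed at a PID can go through the same entry point and be routed by the kind that comes back.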

> /*P*/   LIST_ENTRY(proc) p_hash;        /* Hash chain. */
> /*P*/   LIST_ENTRY(proc) p_pglist;      /* List of processes in pgrp. */
> /*P*/   struct  proc *p_pptr;           /* Pointer to parent process. */
> /*P*/   LIST_ENTRY(proc) p_sibling;     /* List of sibling processes. */ 
> /*P*/   LIST_HEAD(, proc) p_children;   /* Pointer to list of children. */

Would processes fork()ed without RFMEM be the only ones here, with RFMEM 
ones automatically becoming KSEs of the proc?

> /*P*/   struct callout_handle p_ithandle; /*
>                                               * Callout handle for scheduling
>                                               * p_realtimer.
>                                               */
> [So who gets the resulting signal? Can different KSEGs have
> different timers running? What about KSEs? (I vote for KSEGs)]

KSEGs would be simplest.  BTW, I don't recall there really being a 
difference between a KSEG and a process containing KSEs.  Is there one?

> /* The following fields are all zeroed upon creation in fork. */
> #define p_startzero     p_oppid
>   
> /*P*/   pid_t   p_oppid;         /* Save parent pid during ptrace. XXX */ 
> /*C*/   int     p_dupfd;         /* Sideways return value from fdopen. XXX */
> [whatever THIS means... it's a hack, so C is the safest place for it]

Per-KSE?  Ideally, it would be nice to squash these kinds of hacks.

> /*P*/   struct  vmspace *p_vmspace;     /* Address space. */
>  
>         /* scheduling */
> [I've shown the following as being in the KSE structure. They would be 
> collected there, but the priority is worked out for the entire KSEG,
> so it probably collects the data from all of the KSEs. UNLESS we decide that
> all KSEs can have independent priorities, in which case how do you
> control how their priorities relate?]
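The first option in the note above (one priority for the whole KSEG, fed by the CPU usage of all its KSEs) can be sketched as follows. The weighting follows the classic 4.4BSD shape PUSER + estcpu/4 + 2*nice, but SK_PUSER, SK_MAXPRI, struct ksegrp and ksegrp_resetpriority() are hypothetical placeholders, not FreeBSD's values or functions.

```c
#include <stddef.h>

/* Placeholder constants, not FreeBSD's. */
#define	SK_PUSER	50		/* base user priority */
#define	SK_MAXPRI	127		/* numerically worst priority */

struct kse {
	unsigned int	 kse_estcpu;	/* per-KSE time-averaged CPU usage */
	struct kse	*kse_next;
};

struct ksegrp {
	struct kse	*kg_kses;	/* KSEs in this group */
	int		 kg_nice;	/* shared "nice" value */
	unsigned char	 kg_usrpri;	/* one priority for the whole group */
};

/* Recompute the group priority from the summed usage of its KSEs. */
static void
ksegrp_resetpriority(struct ksegrp *kg)
{
	const struct kse *kse;
	unsigned int estcpu = 0;
	int pri;

	for (kse = kg->kg_kses; kse != NULL; kse = kse->kse_next)
		estcpu += kse->kse_estcpu;
	pri = SK_PUSER + (int)(estcpu / 4) + 2 * kg->kg_nice;
	if (pri < 0)
		pri = 0;
	if (pri > SK_MAXPRI)
		pri = SK_MAXPRI;
	kg->kg_usrpri = (unsigned char)pri;
}
```

The alternative of independent per-KSE priorities would move kg_usrpri into struct kse and drop the summing loop, which is exactly where the question of how sibling priorities relate comes back.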
> 
> /*E*/   u_int   p_estcpu;        /* Time averaged value of p_cpticks. */
> /*E*/   int     p_cpticks;       /* Ticks of cpu time. */
> /*E*/   fixpt_t p_pctcpu;        /* %cpu for this process during p_swtime */
>         void    *p_wchan;        /* Sleep address. */
>         const char *p_wmesg;     /* Reason for sleep. */
> /*P*/   u_int   p_swtime;        /* Time swapped in or out. */
> /*E?*/  u_int   p_slptime;       /* Time since last blocked. */
> [what does this mean?]

The scheduler tracks how long the process has been asleep in tsleep() 
(msleep()?).  It should then be per-KSE, along with the process states and 
whatnot.
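A minimal sketch of per-KSE sleep-time accounting, loosely modeled on how the BSD scheduler treats p_slptime: it is reset when the entity blocks and bumped once per second (compare schedcpu()) while it stays sleeping or stopped. The KS_* states, kse_slptime and the three helpers are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-KSE run states. */
enum kse_stat { KS_RUN, KS_SLEEP, KS_STOP };

struct kse {
	enum kse_stat	 kse_stat;
	unsigned int	 kse_slptime;	/* seconds since it last blocked */
	struct kse	*kse_next;
};

/* Blocking resets the counter. */
static void
kse_sleep(struct kse *kse)
{
	kse->kse_stat = KS_SLEEP;
	kse->kse_slptime = 0;
}

/* Called once per second by the scheduler for every KSE. */
static void
kse_tick_second(struct kse *list)
{
	struct kse *kse;

	for (kse = list; kse != NULL; kse = kse->kse_next)
		if (kse->kse_stat == KS_SLEEP || kse->kse_stat == KS_STOP)
			kse->kse_slptime++;
}

/* Waking up reports how long it slept, so its CPU estimate can be decayed. */
static unsigned int
kse_wakeup(struct kse *kse)
{
	kse->kse_stat = KS_RUN;
	return (kse->kse_slptime);
}
```

Keeping the counter in the KSE means two KSEs of one process can be credited for sleeping independently, which is the point of making it per-KSE.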

> /*?*/   struct  itimerval p_realtimer;  /* Alarm timer. */
> [who gets these? who can set them? what is their scope?]

Same as signals, no?

> /*P*/   u_int64_t p_runtime;            /* Real time in microsec. */
> 
> [If we treat separate KSEGs as separate processes, do we keep the
> fields below per KSEG?]
> /*G?*/  u_int64_t p_uu;                 /* Previous user time in microsec. */
> /*G?*/  u_int64_t p_su;                 /* Previous system time in microsec. */
> /*G?*/  u_int64_t p_iu;                 /* Previous interrupt time in usec. */
> [How about these? Do we aggregate, or collect per KSE? Is there a separate
> statclock per CPU?]
> /*P?*/  u_int64_t p_uticks;             /* Statclock hits in user mode. */
> /*P?*/  u_int64_t p_sticks;             /* Statclock hits in system mode. */
> /*P?*/  u_int64_t p_iticks;             /* Statclock hits processing intr. */
> 
> /*P*/   int     p_traceflag;            /* Kernel trace points. */
> /*P*/   struct  vnode *p_tracep;        /* Trace to vnode. */
> [Do we trace all KSEs at once? How do we trace individual threads?]

I'd think we'd want to enable tracing an individual KSE; this could be done 
by making the trace vnode per-KSE, but I think it would be advantageous just 
to change the ktrace info to include both the PID and the KSEid.
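A minimal sketch of the second approach: leave the trace vnode per-process and just tag every record with a KSE id, so a kdump-like tool can filter by PID alone or by PID plus KSE. This is not FreeBSD's struct ktr_header; sk_ktr_header, ktr_kseid, KSEID_ANY, sk_ktr_fill() and sk_ktr_matches() are assumptions for illustration.

```c
#include <string.h>
#include <sys/types.h>

#define	SK_MAXCOMLEN	16
#define	KSEID_ANY	(-1)	/* match every KSE of the process */

/* Hypothetical record header carrying a KSE id next to the PID. */
struct sk_ktr_header {
	short	ktr_type;			/* record type */
	pid_t	ktr_pid;			/* process that generated it */
	int	ktr_kseid;			/* KSE within that process */
	char	ktr_comm[SK_MAXCOMLEN + 1];	/* command name */
};

static void
sk_ktr_fill(struct sk_ktr_header *h, short type, pid_t pid, int kseid,
    const char *comm)
{
	memset(h, 0, sizeof(*h));
	h->ktr_type = type;
	h->ktr_pid = pid;
	h->ktr_kseid = kseid;
	strncpy(h->ktr_comm, comm, SK_MAXCOMLEN);	/* last byte stays NUL */
}

/* Filter: the whole process, or one KSE of it. */
static int
sk_ktr_matches(const struct sk_ktr_header *h, pid_t pid, int kseid)
{
	if (h->ktr_pid != pid)
		return (0);
	return (kseid == KSEID_ANY || h->ktr_kseid == kseid);
}
```

With this layout, tracing "all KSEs at once" and "one thread" become two filters over the same stream rather than two tracing mechanisms.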

> /*P*/   sigset_t p_siglist;             /* Signals arrived but not delivered. */
> [Who gets signals? Does each KSEG (KSE?) have its own handler?]

Hm.  Do you think there's a good use for separate signal-spaces, actually?  
How would thread migration (across KSEs) be handled for signals, then?  Not 
at all?

> /*P*/   struct  vnode *p_textvp;        /* Vnode of executable. */
> 
> /*P*/   char    p_lock;                 /* Process lock (prevent swap) count. */
> /*E*/   u_char  p_oncpu;                /* Which cpu we are on */
> /*E?*/  u_char  p_lastcpu;              /* Last cpu we were on */
> [Each context or each KSE? KSEs can't migrate (under discussion).]

If I may, I believe KSEs should be able to migrate.  It doesn't make much 
sense to leave a CPU idle by saying "KSE x runs on CPU 0, y on 1, and z on 
0": if y is blocked while x and z are both runnable, they must compete for 
CPU 0 instead of being split across both CPUs.
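A toy model of the example above, counting how many CPUs get work in one scheduling tick when KSEs are pinned versus free to migrate. It is a simplification written for this point, not a scheduler; SK_NCPU, the pin[] convention (-1 means "any CPU") and sk_busy_cpus() are hypothetical.

```c
/* Toy model: pin[i] is the CPU KSE i is bound to, or -1 for "any". */
#define	SK_NCPU	2

/* Return how many CPUs end up running something this tick. */
static int
sk_busy_cpus(const int *runnable, const int *pin, int nkse)
{
	int busy[SK_NCPU] = { 0 };
	int c, i, n = 0;

	/* Bound KSEs can only use their own CPU; extras on it must wait. */
	for (i = 0; i < nkse; i++)
		if (runnable[i] && pin[i] >= 0)
			busy[pin[i]] = 1;

	/* Unbound KSEs take any CPU that is still idle. */
	for (i = 0; i < nkse; i++) {
		if (!runnable[i] || pin[i] >= 0)
			continue;
		for (c = 0; c < SK_NCPU; c++) {
			if (!busy[c]) {
				busy[c] = 1;
				break;
			}
		}
	}

	for (c = 0; c < SK_NCPU; c++)
		n += busy[c];
	return (n);
}
```

With x and z pinned to CPU 0 and y (pinned to CPU 1) blocked, only one CPU is busy; if all three may migrate, x and z spread over both CPUs.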

> /*EG?*/ char    p_rqindex;              /* Run queue index */
> Who is on the run queue? KSE or KSEG?
>    
> /*C*/   short   p_locks;                /* DEBUG: lockmgr count of held locks */
> /*C*/   short   p_simple_locks;         /* DEBUG: count of held simple locks */
> [If you cannot sleep or be interrupted while holding these, they could be in the KSE]

You can hold a lockmgr() lock while msleep()ing...

> /*P?*/  unsigned int    p_stops;        /* procfs event bitmask */
> /*P?*/  unsigned int    p_stype;        /* procfs stop event type */
> /*P?*/  char    p_step;                 /* procfs stop *once* flag */
> /*P?*/  unsigned char   p_pfsflags;     /* procfs flags */
> [The procfs stuff is problematic... depending on what it does 
> and what it is used for, the semantics might vary]

Procfs would need modifications if we want to make KSEs visible in it, and 
this could be trouble...

>         char    p_pad3[2];              /* padding for alignment */
> /*C*/   register_t p_retval[2];         /* syscall aux returns */

E?

> /*P*/   struct  sigiolst p_sigiolst;    /* list of sigio sources */
> [who gets signals?]
> 
> /*P*/   int     p_sigparent;            /* signal to parent on exit */
> /*P*/   sigset_t p_oldsigmask;          /* saved mask from before sigpause */
> [One per signal scope... what IS the scope of a signal?]
> /*P*/   int     p_sig;                  /* for core dump/debugger XXX */
> /*P*/   u_long  p_code;                 /* for core dump/debugger XXX */
> /*P?*/  struct  klist p_klist;          /* knotes attached to this process */

That seems right.

> /*C?*/  LIST_HEAD(, mtx) p_heldmtx;     /* for debugging code */
> /*CE?*/ struct mtx *p_blocked;          /* Mutex process is blocked on */
> [depending on what this means ]

E.

> /*C*/   LIST_HEAD(, mtx) p_contested;   /* contested locks */

Why not E?

> /* End area that is zeroed on creation. */
> #define p_endzero       p_startcopy
>   
> /* The following fields are all copied upon creation in fork. */
> #define p_startcopy     p_sigmask
>         
> /*P?*/  sigset_t p_sigmask;     /* Current signal mask. */
> /*C?*/  stack_t p_sigstk;       /* sp & on stack state variable */
> [what is the scope of a signal?]
> 
> /*??*/  int     p_magic;        /* Magic number. */
> 
> [The fields below would be in the KSEG if the priority of all KSEs in a KSEG
> were to be calculated at one time.]
> 
> /*G*/   u_char  p_priority;     /* Process priority. */
> /*G*/   u_char  p_usrpri;       /* User-priority based on p_cpu and p_nice. */
> /*G*/   u_char  p_nativepri;    /* Priority before propagation. */
> /*G*/   char    p_nice;         /* Process "nice" value. */
> /*P*/   char    p_comm[MAXCOMLEN+1];
>   
> /*P*/   struct  pgrp *p_pgrp;   /* Pointer to process group. */
>  
> /*P*/   struct  sysentvec *p_sysent; /* System call dispatch information. */
> 
> /*G*/   struct  rtprio p_rtprio;        /* Realtime priority. */
> [priorities are per KSEG]
> 
> /*P*/   struct  prison *p_prison;
> /*P*/   struct  pargs *p_args;
> [Either the whole Process is in gaol or it isn't]
> 
> /* End area that is copied on creation. */
> #define p_endcopy       p_addr
> /*P?*/  struct  user *p_addr;   /* Kernel virtual addr of u-area (PROC ONLY). */
> [XXX    Are there 'per KSE' fields there? (Actually, yes there are... the pcb
> is there.)]

The contents should be reevaluated.

> /*C?*/  struct  mdproc p_md;    /* Any machine-dependent fields. */
> [There is a trapframe there. Not sure what it's used for.]

Trapframe?  E.

> /*P*/   u_short p_xstat;        /* Exit status for wait; also stop signal. */
> /*P*/   u_short p_acflag;       /* Accounting flags. */
> [these may be collected per KSE and harvested when needed]
> /*P*/   struct  rusage *p_ru;   /* Exit information. XXX */
>         
> /*P*/   int     p_nthreads;     /* number of threads (only in leader) */
> [not sure how this is used... may become redundant]
> 
> /*G?*/  void    *p_aioinfo;     /* ASYNC I/O info */
> [Will aio be per KSE, per KSEG, or per PROC?]

Probably the same as signals, but I'd be inclined to say per proc, keeping 
in mind that the aio is a separate thread.

> /*C*/   int     p_wakeup;       /* thread id */
> [will surely change]
> /*P*/   struct proc *p_peers;
> /*P*/   struct proc *p_leader;
> /*C*/   struct  pasleep p_asleep;       /* Used by asleep()/await(). */
> /*P*/   void    *p_emuldata;    /* process-specific emulator state data */

Should probably have another KSE-specific one, if needed.  That is, planning 
ahead :)

> /*C*/   struct ithd *p_ithd;    /* for interrupt threads only */
> };
>  
> 
> 
> 
> Obviously, before we can really finish this we need to decide
> what the scope of signals is. Who gets externally generated signals?
> Who gets signals that are the result of an action (e.g. SIGIO, SIGPIPE)?
> Which signals are diverted when you allocate a signal stack?
> In the same vein, what is the scope of aio?
> Where are the results delivered? Who is responsible for the 
> kernel threads that do the work? Do we allocate a KSE to run them? etc. etc.
> What is the scope of the timers and such?

You can always be flexible enough to have a system call to set the behavior.

> All this makes a difference in where the fields live....
> 
> Does anyone have comments?
> (Everyone has been VERY quiet so far!!!)

I'll be less quiet now, at least!

> julian
> 
> 
> -- 
>       __--_|\  Julian Elischer
>      /       \ julian@elischer.org
>     (   OZ    ) World tour 2000
> ---> X_.---._/  presently in:  Budapest
>             v

--
 Brian Fundakowski Feldman           \  FreeBSD: The Power to Serve!  /
 green@FreeBSD.org                    `------------------------------'







