Date: Wed, 10 Jun 2015 23:52:39 -0700
From: "K. Macy" <kmacy@freebsd.org>
To: Stefan Andritoiu <stefan.andritoiu@gmail.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Gang scheduling implementation in the ULE scheduler
Message-ID: <CAHM0Q_PUwAFFpeH2_EbO7mh-9ZYibDUGnLH4=ypL1cEXui--Rw@mail.gmail.com>
In-Reply-To: <CAO3d8=aoPypn-57-EJKk0MUXtiLwM_Md6z41ONruxArkuOcHaw@mail.gmail.com>
References: <CAO3d8=aoPypn-57-EJKk0MUXtiLwM_Md6z41ONruxArkuOcHaw@mail.gmail.com>
I just have a more general comment. There are a lot of different
conflicting demands to adapt the scheduler for different workloads.
Please try to encapsulate your changes such that common structures are
changed as little as possible.

Thanks.
-K

On Jun 10, 2015 3:50 PM, "Stefan Andritoiu" <stefan.andritoiu@gmail.com> wrote:
> Hello,
>
> I am currently working on a gang scheduling implementation for the
> bhyve VCPU threads on FreeBSD 10.1.
> I have added a new field "int gang" to the thread structure to specify
> the gang a thread is part of (0 for no gang), and have modified the
> bhyve code to initialize this field when a VCPU is created. I will post
> these modifications in another message.
>
> When I start a virtual machine, IPIs are sent and received correctly
> between CPUs during the guest's boot, but after a few seconds I get:
>
>     spin lock 0xffffffff8164c290 (smp rendezvous) held by
>     0xfffff8000296c000 (tid 100009) too long
>     panic: spin lock held too long
>
> If I limit the number of IPIs that are sent, I do not have this
> problem. This leads me to believe that, because of the constant
> context switching while the guest boots, the high number of IPIs sent
> starves the system.
>
> Does anyone know what is happening? And perhaps a possible solution?
>
> Thank you,
> Stefan
>
>
> ======================================================================
> I have added here the modifications to the sched_ule.c file and a
> brief explanation of them:
>
> In struct tdq, I have added two new fields:
>
> - int scheduled_gang;
>   /* Set to a non-zero value if this CPU is required to schedule a
>      thread belonging to a gang, the value of scheduled_gang being the
>      ID of the gang we want scheduled. For now I have considered only
>      one running guest, so the value is 0 or 1. */
>
> - int gang_leader;
>   /* Set if this CPU is the one that initiated gang scheduling, zero
>      otherwise. Not relevant to the final code and will be removed;
>      it exists only for debugging purposes. */
>
> I created a new function "static void schedule_gang(void *arg)" that
> is called by each processor when it receives an IPI from the gang
> leader. It:
> - sets scheduled_gang = 1;
> - informs the system that it needs to reschedule (not yet implemented).
>
> In function "struct thread *tdq_choose(struct tdq *tdq)":
>   if (tdq->scheduled_gang) - checks whether a thread belonging to a
>   gang must be scheduled. If so, it calls functions that search the
>   run queues and return a gang thread. I have yet to implement these
>   functions.
>
> In function "sched_choose()":
>   if (td->gang) - checks whether the chosen thread is part of a gang.
>   If so, it signals all other CPUs to run "schedule_gang(void *gang)".
>   if (tdq->scheduled_gang) - if scheduled_gang is set, the scheduler is
>   being called after the code in schedule_gang() has run, so sending
>   IPIs to the other CPUs is bypassed. Without this check, a CPU would
>   receive an IPI, set scheduled_gang = 1, the scheduler would be called
>   and would choose a thread to run, that thread would be part of a
>   gang, and an IPI would again be sent to all other CPUs, creating a
>   constant back-and-forth of IPIs between the CPUs.
>
> The CPU that initiates gang scheduling does not receive an IPI and does
> not call schedule_gang() at all. It simply continues to schedule the
> gang thread it selected, the one that started the gang-scheduling
> process.
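
The "int gang" field itself does not appear in the sched_ule.c diff
below. Going only by the description above (the field name, its type,
and the 0-means-no-gang convention; its exact placement inside struct
thread in sys/sys/proc.h is a guess), the thread-structure change
presumably amounts to roughly this:

struct thread {
	/* ... existing fields ... */
	int	gang;	/* ID of the gang this thread belongs to, 0 for no gang */
	/* ... existing fields ... */
};

The bhyve side would then set this field on each VCPU thread at VCPU
creation time, as described above.
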
>
>
> ===================================================================
> --- sched_ule.c	(revision 24)
> +++ sched_ule.c	(revision 26)
> @@ -247,6 +247,9 @@
>  	struct runq	tdq_timeshare;		/* timeshare run queue. */
>  	struct runq	tdq_idle;		/* Queue of IDLE threads. */
>  	char		tdq_name[TDQ_NAME_LEN];
> +
> +	int		gang_leader;
> +	int		scheduled_gang;
>  #ifdef KTR
>  	char		tdq_loadname[TDQ_LOADNAME_LEN];
>  #endif
> @@ -1308,6 +1311,20 @@
>  	struct thread *td;
>
>  	TDQ_LOCK_ASSERT(tdq, MA_OWNED);
> +
> +	/* Pick a gang thread to run. */
> +	if (tdq->scheduled_gang) {
> +	/* Basically the normal choosing of threads, restricted to the gang in scheduled_gang; these helpers are not implemented yet, so the block stays commented out:
> +		td = runq_choose_gang(&tdq->tdq_realtime);
> +		if (td != NULL)
> +			return (td);
> +
> +		td = runq_choose_from_gang(&tdq->tdq_timeshare, tdq->tdq_ridx);
> +		if (td != NULL)
> +			return (td);
> +	*/
> +	}
> +
>  	td = runq_choose(&tdq->tdq_realtime);
>  	if (td != NULL)
>  		return (td);
> @@ -2295,6 +2312,22 @@
>  	return (load);
>  }
>
> +static void
> +schedule_gang(void *arg)
> +{
> +	struct tdq *tdq;
> +	struct tdq *from_tdq = arg;
> +
> +	tdq = TDQ_SELF();
> +	if (tdq == from_tdq) {
> +		/* Just for testing the IPI. This code is never reached, and should never be. */
> +		tdq->scheduled_gang = 1;
> +//		printf("[schedule_gang] received IPI from itself\n");
> +	} else {
> +		tdq->scheduled_gang = 1;
> +//		printf("[schedule_gang] received on cpu: %s\n", tdq->tdq_name);
> +	}
> +}
>  /*
>   * Choose the highest priority thread to run. The thread is removed from
>   * the run-queue while running however the load remains. For SMP we set
> @@ -2305,11 +2338,26 @@
>  {
>  	struct thread *td;
>  	struct tdq *tdq;
> +	cpuset_t map;
>
>  	tdq = TDQ_SELF();
>  	TDQ_LOCK_ASSERT(tdq, MA_OWNED);
>  	td = tdq_choose(tdq);
>  	if (td) {
> +		if (tdq->scheduled_gang) {
> +			/* The scheduler was called after the IPI;
> +			   skip the rendezvous. */
> +			tdq->scheduled_gang = 0;
> +		} else {
> +			if (td->gang) {
> +				map = all_cpus;
> +				CPU_CLR(curcpu, &map);
> +
> +				smp_rendezvous_cpus(map, NULL, schedule_gang,
> +				    NULL, tdq);
> +			}
> +		}
> +
>  		tdq_runq_rem(tdq, td);
>  		tdq->tdq_lowpri = td->td_priority;
>  		return (td);
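
The runq_choose_gang()/runq_choose_from_gang() helpers referenced in
the commented-out block of the tdq_choose() hunk are not implemented
yet. A minimal sketch of the first one, modelled on runq_choose() in
kern_switch.c and assuming the usual sched_ule.c includes, could look
like this. The extra "gang" parameter (the ID held in the tdq's
scheduled_gang field) is an assumption, since the commented-out call
passes only the run queue, and no use is made of the rq_status bit
array, so this is a correctness sketch rather than a tuned
implementation:

/*
 * Return the first runnable thread on this run queue that belongs to
 * the given gang, or NULL if there is none.  Walks every queue
 * linearly instead of consulting the status bits.
 */
static struct thread *
runq_choose_gang(struct runq *rq, int gang)
{
	struct rqhead *rqh;
	struct thread *td;
	int i;

	for (i = 0; i < RQ_NQS; i++) {
		rqh = &rq->rq_queues[i];
		TAILQ_FOREACH(td, rqh, td_runq) {
			if (td->gang == gang)
				return (td);
		}
	}
	return (NULL);
}

tdq_choose() would then call it as
runq_choose_gang(&tdq->tdq_realtime, tdq->scheduled_gang), and a
runq_choose_from_gang() for the timeshare queue would additionally
have to honor tdq_ridx the way runq_choose_from() does.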