From: "K. Macy"
To: Stefan Andritoiu
Cc: freebsd-hackers@freebsd.org
Date: Wed, 10 Jun 2015 23:52:39 -0700
Subject: Re: Gang scheduling implementation in the ULE scheduler

I just have a more general comment. There are a lot of different
conflicting demands to adapt the scheduler for different workloads.
Please try to encapsulate your changes such that common structures are
changed as little as possible.

Thanks.

-K

On Jun 10, 2015 3:50 PM, "Stefan Andritoiu" wrote:

> Hello,
>
> I am currently working on a gang scheduling implementation for the
> bhyve VCPU-threads on FreeBSD 10.1.
> I have added a new field "int gang" to the thread structure to specify
> the gang it is part of (0 for no gang), and have modified the bhyve
> code to initialize this field when a VCPU is created. I will post
> these modifications in another message.
>
> When I start a Virtual Machine, during the guest's boot, IPIs are sent
> and received correctly between CPUs, but after a few seconds I get:
> spin lock 0xffffffff8164c290 (smp rendezvous) held by
> 0xfffff8000296c000 (tid 100009) too long
> panic: spin lock held too long
>
> If I limit the number of IPIs that are sent, I do not have this
> problem. Which leads me to believe that (because of the constant
> context-switch when the guest boots), the high number of IPIs sent
> starve the system.
>
> Does anyone know what is happening? And maybe know of a possible solution?
>
> Thank you,
> Stefan
>
> ======================================================================================
> I have added here the modifications to the sched_ule.c file and a
> brief explanation of them:
>
> In struct tdq, I have added two new fields:
> - int scheduled_gang;
>   /* Set to a non-zero value if the respective CPU is required to
>   schedule a thread belonging to a gang. The value of scheduled_gang
>   is also the ID of the gang that we want scheduled. For now I have
>   considered only one running guest, so the value is 0 or 1 */
> - int gang_leader;
>   /* Set if the respective CPU is the one who has initialized gang
>   scheduling. Zero otherwise. Not relevant to the final code and will be
>   removed. Just for debugging purposes. */
>
> Created a new function "static void schedule_gang(void * arg)" that
> will be called by each processor when it receives an IPI from the gang
> leader:
> - sets scheduled_gang = 1
> - informs the system that it needs to reschedule. Not yet implemented
>
> In function "struct thread* tdq_choose (struct tdq * tdq)":
>     if (tdq->scheduled_gang) - checks to see if a thread belonging to
> a gang must be scheduled. If so, calls functions that check the runqs
> and return a gang thread. I have yet to implement these functions.
>
> In function "sched_choose()":
>     if (td->gang) - checks if the chosen thread is part of a gang. If
> so it signals all other CPUs to run function "schedule_gang(void *
> gang)".
>     if (tdq->scheduled_gang) - if scheduled_gang is set it means that
> the scheduler is called after the code in schedule_gang() has run,
> and bypasses sending IPIs to the other CPUs. If not for this check,
> a CPU would receive an IPI; set scheduled_gang=1; the scheduler would
> be called and would choose a thread to run; that thread would be part
> of a gang; an IPI would be sent to all other CPUs. A constant
> back-and-forth of IPIs between the CPUs would be created.
>
> The CPU that initializes gang scheduling, does not receive an IPI, and
> does not even call the "schedule_gang(void * gang)" function. It
> continues in scheduling the gang-thread it selected, the one that
> started the gang scheduling process.
>
>
> ===================================================================
> --- sched_ule.c (revision 24)
> +++ sched_ule.c (revision 26)
> @@ -247,6 +247,9 @@
>      struct runq tdq_timeshare; /* timeshare run queue. */
>      struct runq tdq_idle; /* Queue of IDLE threads. */
>      char tdq_name[TDQ_NAME_LEN];
> +
> +    int gang_leader;
> +    int scheduled_gang;
>  #ifdef KTR
>      char tdq_loadname[TDQ_LOADNAME_LEN];
>  #endif
> @@ -1308,6 +1311,20 @@
>      struct thread *td;
>
>      TDQ_LOCK_ASSERT(tdq, MA_OWNED);
> +
> +    /* Pick gang thread to run */
> +    if (tdq->scheduled_gang){
> +        /* basically the normal choosing of threads but with regards to scheduled_gang
> +        tdq = runq_choose_gang(&tdq->realtime);
> +        if (td != NULL)
> +            return (td);
> +
> +        td = runq_choose_from_gang(&tdq->tdq_timeshare, tdq->tdq_ridx);
> +        if (td != NULL)
> +            return (td);
> +        */
> +    }
> +
>      td = runq_choose(&tdq->tdq_realtime);
>      if (td != NULL)
>          return (td);
> @@ -2295,6 +2312,22 @@
>      return (load);
>  }
>
> +static void
> +schedule_gang(void * arg){
> +    struct tdq *tdq;
> +    struct tdq *from_tdq = arg;
> +    tdq = TDQ_SELF();
> +
> +    if(tdq == from_tdq){
> +        /* Just for testing IPI. Code is never reached, and should never be*/
> +        tdq->scheduled_gang = 1;
> +//      printf("[schedule_gang] received IPI from himself\n");
> +    }
> +    else{
> +        tdq->scheduled_gang = 1;
> +//      printf("[schedule_gang] received on cpu: %s \n", tdq->tdq_name);
> +    }
> +}
>  /*
>   * Choose the highest priority thread to run. The thread is removed from
>   * the run-queue while running however the load remains. For SMP we set
> @@ -2305,11 +2338,26 @@
>  {
>      struct thread *td;
>      struct tdq *tdq;
> +    cpuset_t map;
>
>      tdq = TDQ_SELF();
>      TDQ_LOCK_ASSERT(tdq, MA_OWNED);
>      td = tdq_choose(tdq);
>      if (td) {
> +        if(tdq->scheduled_gang){
> +            /* Scheduler called after IPI
> +               jump over rendezvous*/
> +            tdq->scheduled_gang = 0;
> +        }
> +        else{
> +            if(td->gang){
> +                map = all_cpus;
> +                CPU_CLR(curcpu, &map);
> +
> +                smp_rendezvous_cpus(map, NULL, schedule_gang, NULL, tdq);
> +            }
> +        }
> +
>          tdq_runq_rem(tdq, td);
>          tdq->tdq_lowpri = td->td_priority;
>          return (td);
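
The flag handshake described in the quoted message can be modelled in
plain userland C. What follows is only a sketch under stated assumptions,
not kernel code: NCPU, gang_cpu_state, gang_state, model_schedule_gang,
model_sched_choose and model_gang_round are made-up names, IPIs are
replaced by direct function calls, and the per-CPU gang state is kept in
a side table instead of struct tdq, which is one possible way to leave
the common structures untouched. It does not model locking, interrupts
or the reported panic.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 4			/* hypothetical CPU count for this model */

/* Hypothetical side table: gang state kept outside struct tdq. */
struct gang_cpu_state {
	int scheduled_gang;	/* nonzero: CPU was asked to run this gang */
	int ipis_received;	/* bookkeeping for the model only */
};

static struct gang_cpu_state gang_state[NCPU];
static int ipis_sent;

/* Stand-in for the IPI handler schedule_gang(): mark the target CPU. */
static void
model_schedule_gang(int cpu, int gang_id)
{
	assert(cpu >= 0 && cpu < NCPU);
	gang_state[cpu].scheduled_gang = gang_id;
	gang_state[cpu].ipis_received++;
}

/*
 * Stand-in for the gang branch of sched_choose() on one CPU, after a
 * thread belonging to gang_id (0 means no gang) has been chosen.
 * Returns true only if this CPU acted as leader and notified the others.
 */
static bool
model_sched_choose(int cpu, int gang_id)
{
	assert(cpu >= 0 && cpu < NCPU);
	if (gang_state[cpu].scheduled_gang != 0) {
		/* Called after an IPI: consume the flag, do not re-broadcast. */
		gang_state[cpu].scheduled_gang = 0;
		return (false);
	}
	if (gang_id == 0)
		return (false);
	/* Leader path: notify every CPU except ourselves. */
	for (int other = 0; other < NCPU; other++) {
		if (other == cpu)
			continue;
		model_schedule_gang(other, gang_id);
		ipis_sent++;
	}
	return (true);
}

/* One scheduling round started by leader; returns the IPIs sent. */
static int
model_gang_round(int leader, int gang_id)
{
	ipis_sent = 0;
	for (int cpu = 0; cpu < NCPU; cpu++) {
		gang_state[cpu].scheduled_gang = 0;
		gang_state[cpu].ipis_received = 0;
	}
	(void)model_sched_choose(leader, gang_id);
	/* Each notified CPU now reschedules and picks a gang thread. */
	for (int cpu = 0; cpu < NCPU; cpu++) {
		if (cpu != leader)
			(void)model_sched_choose(cpu, gang_id);
	}
	return (ipis_sent);
}
```

In this model, model_gang_round(0, 1) makes CPU 0 the leader and returns
NCPU - 1: every other CPU is notified exactly once, consumes its flag on
the next scheduler pass, and sends nothing back. Removing the first
branch of model_sched_choose would let each notified CPU broadcast
again, which is the back-and-forth the quoted message describes.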