From owner-freebsd-hackers Wed Apr 26 22:47:46 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from lor.watermarkgroup.com (lor.watermarkgroup.com [207.202.73.33])
    by hub.freebsd.org (Postfix) with ESMTP
    id 3C44C37B933; Wed, 26 Apr 2000 22:47:42 -0700 (PDT)
    (envelope-from luoqi@watermarkgroup.com)
Received: (from luoqi@localhost)
    by lor.watermarkgroup.com (8.10.1/8.10.1) id e3R5lfi24004;
    Thu, 27 Apr 2000 01:47:41 -0400 (EDT)
Date: Thu, 27 Apr 2000 01:47:41 -0400 (EDT)
From: Luoqi Chen
Message-Id: <200004270547.e3R5lfi24004@lor.watermarkgroup.com>
To: green@FreeBSD.ORG, hackers@FreeBSD.ORG
Subject: Re: lock-ups due to the scheduler
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

This is quite interesting. I'm no scheduler expert, but my understanding
is priority < PUSER won't degrade and is only set in kernel mode after
waking up from a sleep. In user mode, processes should always have priority
p_usrpri >= PUSER, but this is obviously not true for a negative nice value:
>         newpriority = PUSER + p->p_estcpu / INVERSE_ESTCPU_WEIGHT +
>             NICE_WEIGHT * p->p_nice;
We should probably offset p->p_nice by PRIO_MIN,
>         newpriority = PUSER + p->p_estcpu / INVERSE_ESTCPU_WEIGHT +
>             NICE_WEIGHT * (p->p_nice - PRIO_MIN);

To fully utilize the 20 out of 32 run queues for user priorities, we might
want to change NICE_WEIGHT from 2 to 1, and the upper limit of p_estcpu to
#define ESTCPULIM(e) \
    min((e), INVERSE_ESTCPU_WEIGHT * (NICE_WEIGHT * (PRIO_MAX - PRIO_MIN) - PPQ) + \
        INVERSE_ESTCPU_WEIGHT - 1)
so that a cpu hog at nice 0 would have about the same priority as a low cpu
usage nice +20 process.

-lq

> I dropped hints that there may be issues about 3 weeks ago, as my machine
> had locked up for apparently no reason, and I had no idea why until
> recently.  It seems that it has everything to do with running things that
> use lots of CPU at a very high priority (I use -20).
>
> I've been struggling for a few days with this... and what happens is that
> the kernel never executes any user processes (or does so so very rarely
> that I really can't detect it at all, in any application, the execution).
>
> I'm not really sure what's happening, but it definitely is with the
> scheduler: if I cap the scheduler's priority computation on the lower
> end to keep user processes from executing with a p_priority < PUSER,
> the system can get slightly unresponsive, but it does not lock up.
> The modifications I made to allow prevention of this follow my signature.
>
> This is a deadlock-type situation, and I can reproduce it at will, so I'll
> try to explain the steps I can reproduce it with.
>
> 1. start XMMS at -20 priority, and play something
> 2. XMMS is decoding audio and other random things, nothing huge, but at
>    about 15-20% CPU.  XMMS sends decoded mp3 (archives of CDs I own) to
>    the EsounD daemon (esd), which takes 2-3% cpu or so, and XFree86
>    itself takes a good 5-10% cpu.
> 3. Start a "Visualization" plugin, which basically takes XMMS to full
>    CPU usage (as much as it can get), and things lock up.  XMMS is the
>    curproc for every single time I've polled it (using DDB, for example),
>    and I stop hearing audio.  XFree86 would be doing the X11 servering,
>    and esd would mostly be writing to the audio device or reading from its
>    socket, so usually in PRIBIO or PSOCK.
>
> At this point, the system is really locked up, and there's nothing I can
> do.  I can, however, get a coredump and have the entire system state at
> this point.  I'm certain that other people here will be able to try
> the same tests, of course on 5.0-CURRENT, and possibly reproduce them
> exactly the same as it happens for me.  I can grovel in a coredump to
> get information about the system as it was running at the time, so if
> anyone can provide hints as to where to check for what happened that
> makes things lock up nowadays, I'll be grateful, and I'll be able to
> try almost anything to get this fixed.
>
> If you're familiar with the scheduler area of the system, please help.
> I have no one's arm to twist or anything of the sort, so I'm really
> going out on a limb hoping someone will be able to try to help me fix
> this.  Note that I've taken my HZ=1000 line out of my kernel config,
> so I'm running at a standard hz = 100 and a kern.quantum of 20000.
>
> --
>  Brian Fundakowski Feldman           \  FreeBSD: The Power to Serve!  /
>  green@FreeBSD.org                    `------------------------------'
>
> Index: kern_synch.c
> ===================================================================
> RCS file: /usr2/ncvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.89
> diff -u -u -r1.89 kern_synch.c
> --- kern_synch.c    2000/03/28 18:06:42    1.89
> +++ kern_synch.c    2000/04/27 00:55:21
> @@ -903,6 +903,10 @@
>      maybe_resched(p);
>  }
>
> +static int priority_lower_cap = 0;
> +SYSCTL_INT(_debug, OID_AUTO, enable_priority_lower_cap, CTLFLAG_RW,
> +    &priority_lower_cap, 0, "");
> +
>  /*
>   * Compute the priority of a process when running in user mode.
>   * Arrange to reschedule if the resulting priority is better
> @@ -917,6 +921,12 @@
>      if (p->p_rtprio.type == RTP_PRIO_NORMAL) {
>          newpriority = PUSER + p->p_estcpu / INVERSE_ESTCPU_WEIGHT +
>              NICE_WEIGHT * p->p_nice;
> +        if (priority_lower_cap && newpriority < PUSER) {
> +            if (p == curproc)
> +                uprintf("kernel: tried to use priority %d\n",
> +                    newpriority);
> +            newpriority = PUSER;
> +        }
>          newpriority = min(newpriority, MAXPRI);
>          p->p_usrpri = newpriority;
>      }
>

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
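
To make the arithmetic in this thread concrete, here is a minimal C sketch
that compares the user-priority formula quoted from kern_synch.c with the
PRIO_MIN-offset variant suggested in the reply. The numeric constants are
assumptions (values believed to match the kernel headers of that era, not
taken from the messages), and the function names usrpri_current and
usrpri_offset are hypothetical helpers, not kernel symbols.

```c
#include <assert.h>

/* Assumed constants; check them against your own source tree. */
#define PUSER                  50    /* base user-mode priority */
#define MAXPRI                 127   /* numerically largest (worst) priority */
#define PRIO_MIN               (-20) /* most favourable nice value */
#define PRIO_MAX               20    /* least favourable nice value */
#define NICE_WEIGHT            2     /* priority steps per nice level */
#define INVERSE_ESTCPU_WEIGHT  8     /* estcpu units per priority step */

/* Mirrors the min(newpriority, MAXPRI) clamp in the quoted code. */
static int
clamp_max(int pri)
{
	return (pri < MAXPRI ? pri : MAXPRI);
}

/* Formula as quoted from kern_synch.c: nice is used unshifted. */
static int
usrpri_current(int estcpu, int nice)
{
	assert(estcpu >= 0 && nice >= PRIO_MIN && nice <= PRIO_MAX);
	return (clamp_max(PUSER + estcpu / INVERSE_ESTCPU_WEIGHT +
	    NICE_WEIGHT * nice));
}

/* Suggested variant: shift nice by PRIO_MIN so its term is never negative. */
static int
usrpri_offset(int estcpu, int nice)
{
	assert(estcpu >= 0 && nice >= PRIO_MIN && nice <= PRIO_MAX);
	return (clamp_max(PUSER + estcpu / INVERSE_ESTCPU_WEIGHT +
	    NICE_WEIGHT * (nice - PRIO_MIN)));
}
```

Under these assumed constants, a process with p_estcpu 0 at nice -20 gets
priority 10 from the quoted formula, well below PUSER, and exactly PUSER (50)
with the offset. At nice +20 the offset form computes 130 and is clamped to
MAXPRI, which may be part of the motivation for also lowering NICE_WEIGHT
to 1.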