From: Bruce Evans
Date: Thu, 27 May 2010 21:39:18 +1000 (EST)
To: Neel Natu
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, Alexander Motin, src-committers@freebsd.org, Neel Natu
Subject: Re: svn commit: r208585 - head/sys/mips/mips
Message-ID: <20100527200033.K1376@besplex.bde.org>
References: <201005270127.o4R1RPaT016558@svn.freebsd.org> <4BFDE4E3.4060300@FreeBSD.org>

On Wed, 26 May 2010, Neel Natu wrote:

> On Wed, May 26, 2010 at 8:20 PM, Alexander Motin wrote:
>> Neel Natu wrote:
>> Also, as soon as you run timer1 on frequency higher then hz - it is
>> strange to see
>>         stathz = hz;
>>         profhz = hz;
>> there. It is just useless. Better would be to do same as for x86:
>>         profhz = timer1hz;
>>         if (timer1hz < 128)
>>                 stathz = timer1hz;
>>         else
>>                 stathz = timer1hz / (timer1hz / 128);
>>

This is almost unreadable due to \xa0.

 	stathz = timer1hz / (timer1hz / 128);

only works right if timer1hz is a multiple of 128, or at least a
multiple of the final stathz.  Otherwise, there may be significant
rounding error in the calculation, and if the final stathz is not an
exact divisor of timer1hz it is impossible to generate stathz from
timer1hz by dividing it.  (This has always been broken for the lapic
timer on amd64 and i386.  stathz = 133 is only nearly a divisor of
1000 or 2000, and 128 is even further from being a divisor of any timer
frequency that can generate hz 1000.  The effects of this can be seen
in systat(1) -v 1 output -- the reported lapic timer interrupt
frequencies jump every ~(lapic_timer_hz / stathz) seconds when the
divider compensates for the multiple not being exact.  Another bug
visible in systat -v and vmstat -i output on ref9-amd64 right now is
that the lapic timer interrupt frequencies are all reported as 960.
hz is reported to be 1000, but it is impossible to generate 1000 from
960.  Another bug in the lapic timer code on amd64 and i386 is that it
doesn't change the lapic timer frequency to generate a high enough
profhz.
profhz = 8192, which is generated by the RTC on amd64 and i386, was
adequate in 1990, and it needs to be 100-1000 times larger now, but the
lapic doesn't even generate that; it claims to generate 1024, and this
is even more impossible to divide down from 960 than is 1000.)

> I see your point with the profiling timer. I'll fix that to be like x86.
>
> However it is not immediately obvious why we prefer to run the
> statistics timer at (or very close to) 128Hz. Any pointers?

At least SCHED_4BSD requires stathz to be almost 128.  More precisely,
it requires a clock of frequency about 16 Hz and divides stathz
internally by INVERSE_ESTCPU_WEIGHT = (8 * smp_cpus) to get this.  It
gets some extra resolution by accumulating ticks at stathz but has to
divide the result by 8 before feeding it to the priority adjustment,
else the adjustment would be too sensitive to recent activity, and/or
would overflow (overflow is avoided by clamping to the limit, but this
is bad too).  Dividing by smp_cpus is a hack to avoid the overflow at
a cost of reducing sensitivity.

The requirement for stathz to be almost 128 is pushed to the clock
generator(s) to avoid having dividers (other than the simple/historical
division by 8) in both the clock generator(s) and the scheduler(s).

When using lapic timers, I normally use lapic_timer_hz = hz = stathz =
profhz = 100, and don't worry about the completely broken profhz or the
scheduling problems from having stathz = hz.  The scheduling problems
are mostly caused by the hardware clocks behind stathz and hz being
identical.  When they are identical, having stathz != hz doesn't help
much, at least without the changes that I suggested a few months ago
(statclock() and hardclock() should never be called from the same
hardware interrupt).

There are 2 types of scheduling/statistics problems:
- malicious applications may hide from scheduling/statistics interrupts
  by arranging that they don't run across the interrupts.
  This is easy to do while running for most of the time if hz is much
  larger than stathz (now the default :-().
- even non-malicious applications may hide from scheduling/statistics
  interrupts if the statclock and hardclock interrupts are too
  synchronous.  This is a problem with the lapic timer interrupts in
  practice.  I think it takes almost perfect synchronization for there
  to be a problem in practice, and I can't see how the synchronization
  was perfect enough.  For hz = 1000, lapic_timer_hz was 2000 and
  hardclock was called every second interrupt, while statclock was
  called every 2000/(stathz=133) = 15th or 16th interrupt.  Since 15 is
  not a multiple of 2, statclock was normally called for the same lapic
  timer interrupt only every second interrupt.  This should be
  asynchronous enough.  I don't know the details of the current or
  previous implementation (where lapic_timer_hz is not 2000) but IIRC
  the dividers don't know anything about the synchronicity problem so
  they could easily make it worse.

Bruce
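
The divider arithmetic discussed in this message can be checked with a
minimal C sketch.  It is an illustration under stated assumptions, not
code from the FreeBSD tree: the names derive_stathz(),
stathz_remainder() and STATHZ_TARGET are hypothetical, and only the
formula stathz = timer1hz / (timer1hz / 128) is taken from the quoted
x86 snippet.

```c
#include <assert.h>

/* Hypothetical name: the stathz value that SCHED_4BSD wants to be near. */
#define STATHZ_TARGET 128

/*
 * Derive stathz from a timer frequency using the formula quoted in the
 * message.  The inner integer division truncates, so the result is
 * exactly 128 only when timer_hz is a multiple of 128.
 */
int
derive_stathz(int timer_hz)
{
	assert(timer_hz > 0);
	if (timer_hz < STATHZ_TARGET)
		return (timer_hz);
	return (timer_hz / (timer_hz / STATHZ_TARGET));
}

/*
 * Remainder of timer_hz divided by the derived stathz.  A nonzero
 * value means the derived stathz is not an exact divisor of timer_hz,
 * so a fixed integer divider cannot produce it.
 */
int
stathz_remainder(int timer_hz)
{
	return (timer_hz % derive_stathz(timer_hz));
}
```

For timer_hz = 2000 this gives stathz = 133 with remainder 5, which is
the "15th or 16th interrupt" case above.  For timer_hz = 1024 the
result is exactly 128 with remainder 0.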