Date: Tue, 7 Feb 2006 21:27:55 +0300
From: Yar Tikhiy
To: Cy Schubert
Cc: freebsd-current@freebsd.org, netchild@freebsd.org
Subject: Re: 7.0-CURRENT Hang
Message-ID: <20060207182755.GB32998@comp.chem.msu.su>
In-Reply-To: <200602071806.k17I6WQc007602@cwsys.cwsent.com>
References: <20060207173154.GE19674@comp.chem.msu.su> <200602071806.k17I6WQc007602@cwsys.cwsent.com>
List-Id: Discussions about the use of FreeBSD-current

On Tue, Feb 07, 2006 at 10:06:32AM -0800, Cy Schubert wrote:
> In message <20060207173154.GE19674@comp.chem.msu.su>, Yar Tikhiy writes:
> > On Mon, Feb 06, 2006 at 08:29:35PM -0800, Cy Schubert wrote:
> > >
> > > On the Pentium P54C model (that's an old 120 MHz Pentium I use as a 4.x,
> > > 5.x, and 7.x ports build testbed) the CPUID instruction when
> > > called with AL = 0x02 returns EAX = EBX = ECX = EDX = 0. The code fragment in
> > > identcpu.c below results in "rounds" becoming 0xffffffff.
> > >
> > >     do_cpuid(0x2, regs);
> > >     rounds = (regs[0] & 0xff) - 1;
> > >
> > > The subsequent loop will run virtually forever: it takes
> > > forever for this machine to count down from 0xffffffff, performing a very
> > > great many calls to get_INTEL_TLB and virtually hanging the
> > > machine in the process.
> > >
> > >     while (rounds > 0) {
> > >         [... code ...]
> > >         rounds--;
> > >     }
> >
> > FWIW, my presumably P54C machine (Family 5 Model 2 Stepping 6)
> > doesn't indicate it has the CPUID 0x02 function. That is, CPUID
> > 0x00 returns EAX = 0x01, which is the highest function supported.
> > Could you try to run the misc/cpuid port on your Pentium and show
> > its output? It might appear that the code around CPUID 0x02 shouldn't
> > be reached at all in your case. Zero values from CPUID 0x02 are
> > pretty indicative of that.
>
> Mine is Family 5 Model 2 Stepping 12. All of my doc is for Pentium-Pro and
> newer so you are probably correct.

Do you know what CPUID function 0x00 returns in EAX for your CPU?
Hint: just run misc/cpuid once and show its output here. I've just
fixed the port so that it has no bogus dependencies and is very
light-weight.

> > Dealing with "rounds" equal to -1 can be a good idea anyway to catch
> > brain dead CPUs instead of hanging the system on them.
>
> Well, with rounds = -1 [actually (unsigned int)0xffffffff], the CPU will
> "appear" to hang as it "rounds" or loops virtually forever -- counting back
> from 0xffffffff on a 120 MHz machine and performing get TLB info a number
> of times each iteration takes hours to do just a few iterations.
> I've seen
> mine go through "rounds", decrementing rounds each time, for hours at a
> time. Initially, before digging into it using DDB, it did appear that
> the CPU was hung, but it was just starting to loop 4,294,967,295 times. On
> older and slower machines, if it took hours to get through a few
> iterations, my guess is that it would take days to loop through this code.
> My patch allows it to take the defaults and finally boot. If the CPU
> doesn't support AL = 0x02, what's the point of looping? It appears to run
> nicely with the patch.

I do see that rounds = -1 is causing trouble. I just meant that
we should not call do_cpuid(0x02) at all if (cpu_high < 2) because
it can result in undefined behavior. Your patch still makes sense
because it deals with possible brain-dead CPUs. I'd implement it
in a slightly different way though -- stay tuned! :-)

--
Yar
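
For illustration only, here is a minimal user-space sketch of the two
checks discussed above: skip CPUID 0x02 entirely when cpu_high < 2, and
treat a zero count as "no rounds" rather than letting the unsigned
subtraction wrap. This is not the patch from this thread nor the actual
identcpu.c code. The names cpuid_fn, naive_rounds, guarded_rounds and the
fake_cpuid_* stubs are hypothetical; the stubs stand in for do_cpuid() so
the logic can be exercised on any machine.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's do_cpuid(): fills regs[0..3]
 * with EAX, EBX, ECX, EDX for the requested function. */
typedef void (*cpuid_fn)(uint32_t func, uint32_t regs[4]);

/* Stub modelling the CPU described in the thread: CPUID 0x02 returns
 * EAX = EBX = ECX = EDX = 0. */
void
fake_cpuid_zeros(uint32_t func, uint32_t regs[4])
{
	(void)func;
	regs[0] = regs[1] = regs[2] = regs[3] = 0;
}

/* Stub modelling a CPU whose CPUID 0x02 reports a count of 3 in AL
 * (an assumed value, chosen only for the example). */
void
fake_cpuid_three(uint32_t func, uint32_t regs[4])
{
	(void)func;
	regs[0] = 0x03;
	regs[1] = regs[2] = regs[3] = 0;
}

/* The computation quoted in the thread: with AL == 0 the unsigned
 * subtraction wraps around to 0xffffffff. */
uint32_t
naive_rounds(cpuid_fn cpuid)
{
	uint32_t regs[4];

	cpuid(0x2, regs);
	return (regs[0] & 0xff) - 1;
}

/* Guarded variant: do not issue CPUID 0x02 unless the CPU advertises
 * it, and map a zero count to zero extra rounds. */
uint32_t
guarded_rounds(uint32_t cpu_high, cpuid_fn cpuid)
{
	uint32_t regs[4];
	uint32_t count;

	if (cpu_high < 2)
		return 0;	/* function 0x02 not supported */
	cpuid(0x2, regs);
	count = regs[0] & 0xff;
	if (count == 0)
		return 0;	/* bogus answer; avoid the wraparound */
	return count - 1;
}
```

With the all-zero stub, naive_rounds() yields 0xffffffff while
guarded_rounds() yields 0 for cpu_high of either 1 or 2, so a
"while (rounds > 0)" loop driven by the guarded value would not spin.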