From: Joe Holden
Date: Mon, 22 Apr 2013 23:26:04 +0100
To: Joe Holden
Cc: 'freebsd-mips@FreeBSD.org'
Subject: Re: kern/177876: [mips] kernel stack overflow panic on mips64, EdgeRouter Lite

Joe Holden wrote:
> On Apr 22, 2013, at 11:59 AM, Juli Mallett wrote:
>
>> On Mon, Apr 22, 2013 at 10:35 AM, Adrian Chadd wrote:
>>> Do an svn log in sys/mips/ or sys/vm/ and look at the changes.
>>>
>>> I don't know how far you can go back before you don't have the
>>> edgerouter lite support, but maybe you can try going back to when
>>> Juli initially committed it, and then just work your way forward.
>>>
>>> I think Juli did the initial work, so she knows when it came in.
>>>
>>> juli - I don't suppose you could spin up FreeBSD-HEAD on the
>>> edgerouter lite and take a look? It's highly likely someone messed up
>>> since you did your port. :(
>> I can't quite imagine why EdgeRouter Lite (or Octeon more generally)
>> could be a special case here; I'd be more inclined to think it was
>> generally 64-bit MIPS that would be broken. (A too-conservative
>> definition or something.) Except I was pretty sure I'd run -CURRENT
>> more recently than those changes.
>>
>> The only change that is suspect in mips/ since I made my changes is
>> Warner's change to include/regnum.h, which looks like there's the slim
>> possibility that it could screw up register saving in N64 builds.
>> That would mean that it wasn't tested with a 64-bit build, though,
>> which I'm sure Warner wouldn't be so sloppy as to do.
>>
>> Joe, can you try reverting 249523 and seeing if that fixes things for
>> you? It seems like this breaks the order of registers saved to the
>> PCB, which would break syscalls with more than 4 arguments, like mmap.
>> Even just looking at how the macros expand in the N64 case makes it
>> pretty clear that this change was made clumsily, e.g. from
>> exception.S:
>>
>>     SAVE_REG($12, 8, $29)
>>     SAVE_REG($13, 9, $29)
>>     SAVE_REG($14, 10, $29)
>>     SAVE_REG($15, 11, $29)
>>     SAVE_REG($8, 12, $29)
>>     SAVE_REG($9, 13, $29)
>>     SAVE_REG($10, 14, $29)
>>     SAVE_REG($11, 15, $29)
>>
>> For this to not break syscalls, struct trapframe would need to be
>> updated,
>
> Looking at the trapframe, you are right. I did test boot a kernel
> with the change, but after-the-fact software forensics suggest I built the
> new kernel and tested the old one.
> I found the new one installed as
> kenrel.oct rather than kernel.oct which I test booted...
>
>> or the syscall handling code. Joe, can you confirm that backing out
>> 249523 fixes things for you? If it does, Adrian, would you be willing
>> to handle a backout? I can't imagine finding the time for a couple of
>> days, and if this is really so badly, unnecessarily broken, that
>> should be fixed immediately. I hope I'm wrong. Nobody should be
>> making incomplete changes on the basis of a half-baked reading of
>> purportedly-conflicting documentation, and without testing.
>> Yikes!
>
> I am just building a pre-commit kernel, but if you guys know what it is I'll
> wait for a fix :)
>
> Will this also fix the trapframe issue when the box is under heavy cpu load
> or is that a different issue?
>
OK, so that is confirmed: with regnum.h reverted, it boots fine.

J