From: Toomas Soome
To: Stefan Esser
Cc: "M. Warner Losh", Kyle Evans, FreeBSD Current
Subject: Re: Boot failure: panic: No heap setup
Date: Fri, 30 Mar 2018 21:10:31 +0300
In-reply-to: <838e40f6-2f05-9251-e5a9-13d52ba510b7@freebsd.org>
References: <79d2bd72-f8b2-6476-9589-ebad9716698f@freebsd.org> <838e40f6-2f05-9251-e5a9-13d52ba510b7@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

> On 30 Mar 2018, at 18:03, Stefan Esser wrote:
>
> On 29.03.18 at 07:15, Toomas Soome wrote:
>>
>>> On 29 Mar 2018, at 01:06, Stefan Esser wrote:
>>>
>>> On 28.03.18 at 22:28, Warner Losh wrote:
>>>>> Hmmm, the code references point into the boot loader code - I had
>>>>> expected that there is a problem in the kernel, not the boot loader.
>>>>>
>>>>>> [1]
>>>>>> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
>>>>>
>>>>> Seems that setheap has either not been called or has been called with
>>>>> base=0.
>>>>
>>>> Right, which is odd...
>>>>
>>>>>> [2]
>>>>>> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
>>>>>
>>>>> I had thought that the zfs boot code has been initialized before the
>>>>> menu is displayed?
>>>>
>>>> Right, all of this should be done looooong before we get to the
>>>> interpreter.
>>>> Can you break into the loader prompt and try the `heap`
>>>> command, see what that outputs? CC'ing imp@ because he actually knows
>>>> things.
>>>>
>>>> Totally weird. I'd add a printf to the setheap() function to display its args
>>>> and see if you get this panic before/after that printf...
>>>
>>> I'm currently using a Forth-enabled boot loader again, since this is a
>>> "production" machine (my home server, which also receives and keeps all
>>> my work email, for example).
>>>
>>> I'll build a clean world with the LUA loader and test it in the next
>>> few days. Tests will include the "heap" loader command and I'll add the
>>> printf (though, if sbrk() has really not been called, I guess that will
>>> not go too well ...).
>>>
>>> Is it possible that the setheap function is called a second time, just
>>> before jumping into the kernel? (In that case adding the printf might
>>> crash the loader in the first setheap call ...)
>>>
>>> Since the loader menu (and escaping from the menu) works, there must be
>>> a valid heap at that time.
>>>
>>
>> Indeed. And assuming the message really is from the loader, it means there
>> must be memory corruption - if so, you can check which variables are located
>> close to the heap-related ones… Also, since you have a working menu, it has
>> to be related to the actual loading. Since the loading itself has been
>> working so far, it should be related to the lua-specific bits that prepare
>> the calls to the load functions.
>
> Ok, some more data points:
>
> 1) A printf in setheap reported plausible values during start-up of zfsboot.
>    The menu appeared and wiped away the values so fast that I could not take
>    a photo or write them down.
>

If you got the menu and stuff, it means that at that point the heap was all OK.
Just after setheap(), bcache_init() is called, and that too will allocate memory.
What you can do is esc out from the menu to the OK prompt and check the
output of the heap and biosmem commands…

> 2) I have rebuilt world and kernel based on r331763. Booting resulted in the
>    same panic as reported before. There was no debug output from the patched
>    setheap call before the panic (which indicates that it was not called a
>    second time).
>
> 3) In order to get my system to boot, I interrupted loading of zfsloader and
>    forced loading of the previous version (from a world build with Forth in
>    the loader). Booting succeeded with the latest kernel ...
>
> It looks as if sbrk() was called in zfsloader before setheap() has been used
> to initialize the heap parameters, if lua is enabled instead of Forth. See
> stand/i386/loader/main.c:124 for the location of the setheap call in the
> loader.

This can only happen when something is called before main…

>
> This is obviously hard to debug, though, since printf cannot be called at that
> point. A pure write(2) should be possible without heap, but since the console
> has not been initialized at the point of the setheap invocation, there is no
> working output device, AFAIK.
>
> I do not see how any sbrk() call could occur before setheap is called. And
> there does not appear to be any other setheap function (or macro) in the
> tree that could overload the one defined in stand/libsa/sbrk.c ...
>
> I have no idea how to proceed from here ...
>
> But now I'm sure it is a problem in zfsloader (or loader in general?).
>
> Hmmm: How is the panic message printed by sbrk() without an initialized heap?
> The definition of panic in stand/libsa/panic.c relies on a working printf!
>
> I should be able to use printf in the same way as panic does, but I did
> not succeed when I tried to use it early in zfsloader ...
>
> Regards, Stefan

rgds,
toomas
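
For reference, a minimal user-space sketch of the ordering problem discussed in this thread: an sbrk-style allocator that refuses to hand out memory until a setheap-style call has recorded a non-zero base. This is not the code from stand/libsa/sbrk.c. The demo_* names, the alignment mask and the return-value convention are assumptions made for illustration, and the sketch returns NULL at the point where the loader would panic with "No heap setup".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names: the demo_* symbols are NOT the libsa ones. */
#define DEMO_ALIGN 15u	/* assumed alignment mask, for illustration only */

static char	*demo_heapbase;	/* stays NULL until demo_setheap() runs */
static size_t	 demo_maxheap;	/* usable bytes in the heap window */
static size_t	 demo_heapsize;	/* bytes handed out so far */

/* Record the heap window [base, top), rounding base up to the alignment. */
void
demo_setheap(void *base, void *top)
{
	uintptr_t aligned = ((uintptr_t)base + DEMO_ALIGN) & ~(uintptr_t)DEMO_ALIGN;

	assert(aligned <= (uintptr_t)top);
	demo_heapbase = (char *)aligned;
	demo_maxheap = (size_t)((uintptr_t)top - aligned);
	demo_heapsize = 0;
}

/*
 * Hand out the next incr bytes of the window, zero-filled.
 * Returns NULL if no heap has been set up (the case where the loader
 * would panic) and (char *)-1 if the window is exhausted.
 */
char *
demo_sbrk(size_t incr)
{
	char *ret;

	if (demo_heapbase == NULL)
		return (NULL);
	if (incr > demo_maxheap - demo_heapsize)
		return ((char *)-1);
	ret = demo_heapbase + demo_heapsize;
	memset(ret, 0, incr);
	demo_heapsize += incr;
	return (ret);
}
```

In this sketch both symptoms mentioned above look the same to the caller: calling demo_sbrk() before demo_setheap(), or calling demo_setheap() with a base of 0, leaves demo_heapbase at NULL. Memory corruption that zeroes the base variable after a successful set-up would trip the same check, which fits a menu that works followed by a later panic.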