Subject: Re: Boot failure: panic: No heap setup
From: Stefan Esser
To: tsoome@me.com
Cc: "M. Warner Losh", Kyle Evans, FreeBSD Current
Date: Fri, 30 Mar 2018 17:03:19 +0200
Message-ID: <838e40f6-2f05-9251-e5a9-13d52ba510b7@freebsd.org>
References: <79d2bd72-f8b2-6476-9589-ebad9716698f@freebsd.org>

On 29.03.18 at 07:15, Toomas Soome wrote:
>
>
>> On 29 Mar 2018, at 01:06, Stefan Esser wrote:
>>
>> On 28.03.18 at 22:28, Warner Losh wrote:
>>>> Hmmm, the code references point into the boot loader code - I had
>>>> expected that there is a problem in the kernel, not the boot loader.
>>>>
>>>>> [1]
>>>>> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
>>>
>>>>
>>>>
>>>> Seems that setbase has either not been called or has been called with
>>>> base=0.
>>>
>>> Right, which is odd...
>>>
>>>>> [2]
>>>>> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
>>>
>>>>
>>>>
>>>> I had thought, that the zfs boot code has been initialized before the
>>>> menu is displayed?
>>>
>>> Right, all of this should be done looooong before we get to the
>>> interpreter. Can you break into the loader prompt and try the `heap`
>>> command, see what that outputs? CC'ing imp@ because he actually knows
>>> things.
>>>
>>> Totally weird. I'd add a printf to the sethead() function to display its args
>>> and see if you get this panic before/after that printf...
>>
>> I'm currently using a Forth-enabled boot loader again, since this is a
>> "production" machine (my home server, which also receives and keeps all
>> my work email, for example).
>>
>> I'll build a clean world with the LUA loader and test it on one of the
>> next days. Tests will include the "heap" loader command and I'll add the
>> printf (though, if sbrk() has really not been called, I guess that will
>> not go too well ...).
>>
>> Is it possible, that the setheap function is called a second time, just
>> before jumping into the kernel? (In that case adding the printf might
>> crash the loader in the first setheap call ...)
>>
>> Since the loader menu (and escaping from the menu) works, there must be
>> a valid heap, at that time.
>>
>
> indeed. and assuming the message really is from loader, it means, there must
> be memory corruption - if so, you can check which variables are located
> close to heap related ones… Also, since you have the working menu, it has to
> be related to actual loading. Since the loading itself has been working so
> far, it should be related to lua specific bits which are preparing towards
> to call load functions.

Ok, some more data points:

1) A printf in setheap reported plausible values during start-up of zfsboot.
   The menu appeared and wiped away the values so fast that I could not take
   a photo or write them down.

2) I have rebuilt world and kernel based on r331763. Booting resulted in the
   same panic as reported before. There was no debug output from the patched
   setheap call before the panic (which indicates that it was not called a
   second time).

3) In order to get my system to boot, I interrupted loading of zfsloader and
   forced loading of the previous version (from a world built with Forth in
   the loader). Booting succeeded with the latest kernel ...

It looks as if sbrk() was called in zfsloader before setheap() had been used
to initialize the heap parameters, when Lua is enabled instead of Forth.
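To make the suspected failure mode concrete, here is a minimal userland
sketch, assuming sbrk() simply refuses to hand out memory while the heap base
is still unset. The names heap_model, model_setheap and model_sbrk are made up
for illustration; this is not the code in stand/libsa/sbrk.c, and the real
sbrk() panics instead of returning a status.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a bump allocator; NOT the libsa implementation. */
struct heap_model {
    char   *base;   /* NULL until model_setheap() has been called */
    size_t  max;    /* bytes available between base and top */
    size_t  used;   /* bytes handed out so far */
};

enum sbrk_status { SBRK_OK, SBRK_NO_HEAP, SBRK_NO_MEM };

/* Record the heap bounds; this must happen before any allocation. */
void
model_setheap(struct heap_model *h, void *base, void *top)
{
    h->base = (char *)base;
    h->max = (size_t)((char *)top - (char *)base);
    h->used = 0;
}

/*
 * Hand out incr zeroed bytes from the heap.
 * SBRK_NO_HEAP stands in for the "No heap setup" panic: it is returned
 * whenever the base pointer is NULL, whether it was never set or was
 * overwritten later.
 */
enum sbrk_status
model_sbrk(struct heap_model *h, size_t incr, void **out)
{
    *out = NULL;
    if (h->base == NULL)
        return (SBRK_NO_HEAP);
    if (incr > h->max - h->used)
        return (SBRK_NO_MEM);
    *out = h->base + h->used;
    memset(*out, 0, incr);
    h->used += incr;
    return (SBRK_OK);
}
```

In this model, SBRK_NO_HEAP comes back both when model_sbrk() runs before
model_setheap() and when base is zeroed after a successful setup, so the
check alone cannot distinguish a missing setheap call from later corruption
of the heap variables.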
See stand/i386/loader/main.c:124 for the location of the setheap call in the
loader. This is obviously hard to debug, though, since printf cannot be called
at that point. A pure write(2) should be possible without heap, but since the
console has not been initialized at the point of the setheap invocation, there
is no working output device, AFAIK.

I do not see how any sbrk() call could occur before setheap is called. And
there does not appear to be any other setheap function (or macro) in the tree
that could override the one defined in stand/libsa/sbrk.c ...

I have no idea how to proceed from here ... But now I'm sure it is a problem
in zfsloader (or loader in general?).

Hmmm: How is the panic message printed by sbrk() without an initialized heap?
The definition of panic in stand/libsa/panic.c relies on a working printf!

I should be able to use printf in the same way as panic does, but I did not
succeed when I tried to use it early in zfsloader ...

Regards, STefan