From: Toomas Soome
Date: Wed, 5 Dec 2018 01:48:00 +0200
Subject: Re: Boot loader stuck after first stage upgrading 11.2 to 12.0-RC2
To: Mark Martinec
Cc: freebsd-current, freebsd-stable@freebsd.org, Ian Lepore

Yes, that must be true but it does not hurt to get checked. And of
course, lsdev -v from 11.x loader would be good too.

Anyhow, I am afraid we have reached the point where more specific debug
info is needed (printed out); given the lack of output about disks at
all, it must be related to floppy device checks.

Rgds,
Toomas

> On 4 Dec 2018, at 22:19, Ian Lepore wrote:
>
> On Tue, 2018-12-04 at 21:51 +0200, Toomas Soome via freebsd-stable
> wrote:
>>> On 4 Dec 2018, at 19:59, Mark Martinec wrote:
>>>
>>>>> 2018-11-29 18:43, Toomas Soome wrote:
>>>>>> I just did push biosdisk updates to stable/12, I wonder if
>>>>>> you could test those bits…
>>> Myself wrote:
>>>>> Thank you!
>>>>> I haven't tried it yet, but I wonder whether this fix was
>>>>> already incorporated into 12.0-RC3, which would make my rescue
>>>>> easier.
>>>>> Otherwise I can build a stable/12 on another host and transplant
>>>>> the problematic file(s) to the affected host - if I knew which
>>>>> files to copy.
>>> 2018-12-02 18:59, Toomas wrote:
>>>> The files are /boot/loader* binaries - to be exact, check which
>>>> one is linked to /boot/loader. I can provide binaries if needed.
>>>> [...]
>>>> rgds,
>>>> toomas
>>> I got a maintenance window today so I tried with the new loader,
>>> and it did not help.
>>>
>>> More specifically:
>>>
>>> As it comes with 12-RC2, the /boot/loader was hard linked with
>>> loader_lua.
>>> Its size is 421888 bytes. So I concentrated on this loader.
>>>
>>> I built a fresh stable/12 on another host, and copied the newly
>>> built loader_lua (425984 bytes) to the /boot directory of the
>>> affected host, deleted the file 'loader', and hard-linked
>>> loader_lua to loader.
>>>
>>> The situation has not changed: the BTX loader lists all BIOS drives
>>> C..J (disk0..disk7), then a spinner starts and gets stuck forever.
>>> It never reaches the 'BIOS 635kB/3537856kB available memory' line.
>>>
>>> While trying to restore the old /boot from 11.2, I tried booting
>>> a live image from a 12.0-RC3 memory stick - and the loader got
>>> stuck again, same as when booting from a disk.
>>>
>>> So I had to boot from an 11.2 memstick to be able to regain
>>> control.
>>>
>>> Mark
>>>
>> ok, if you could perform 2 tests:
>>
>> 1. from loader prompt enter 0x413 0xa000 - @w . cr
>>
>> 2. on first spinner, press space and type on boot: prompt:
>> /boot/loader_4th and see if that will do better
>> thanks,
>> toomas
>>
>
> I don't think that will be an option.
> If it hasn't gotten to the point
> of saying how much BIOS available memory there is, it's only halfway
> through loader main() and has hung before getting to interact().
>
> In fact, if that line hasn't printed, but some disk drives have been
> listed, it pretty much has to be hung in the "March through the device
> switch probing for things" loop. If all the disks are listed, then it
> got through that entry in the devsw, and is likely hanging in the
> dv_init calls for either the pxedisk or zfsdev devices.
>
> -- Ian
>
>>>>>>> On 29 Nov 2018, at 17:01, Mark Martinec wrote:
>>>>>>> After successfully upgrading three hosts from 11.2-p4 to
>>>>>>> 12.0-RC2 (amd64, zfs, bios), I tried my luck with one of our
>>>>>>> production hosts, and ended up with a stuck loader after
>>>>>>> rebooting with a new kernel (after the first stage of upgrade).
>>>>>>> These were the steps, and all went smoothly and normally
>>>>>>> until a reboot:
>>>>>>> freebsd-update upgrade -r 12.0-RC2
>>>>>>> freebsd-update install
>>>>>>> shutdown -r now
>>>>>>> While booting, the 'BTX loader' comes up, lists the BIOS
>>>>>>> drives, then the spinner below the list comes up and begins
>>>>>>> turning, stuttering, and after a couple of seconds it grinds
>>>>>>> to a standstill and nothing happens afterwards.
>>>>>>> At this point the ZFS and the bootstrap loader are supposed
>>>>>>> to come up, but they don't.
>>>>>>> This host has two zfs pools, the system pool consists of
>>>>>>> two SSDs in a zfs mirror (also holding a freebsd-boot
>>>>>>> partition each), the other pool is a raidz2 with six JBOD
>>>>>>> disks on an LSI controller.
>>>>>>> The gptzfsboot in both freebsd-boot partitions is fresh
>>>>>>> from 11.2, both zpool versions are up-to-date with 11.2.
>>>>>>> The 'zpool status -v' is happy with both pools.
>>>>>>> After rebooting from a USB drive and reverting the /boot
>>>>>>> directory to a previous version, the machine comes up
>>>>>>> normally again with the 11.2-RELEASE-p4.
>>>>>>> I found a file init.core in the / directory, slightly
>>>>>>> predating the last reboot with a salvaged system - although
>>>>>>> it was probably not a cause of the problem, but a
>>>>>>> consequence of the rescue operation.
>>>>>>> It is unfortunate that this is a production host, so I
>>>>>>> can't play much with it. One or two more quick experiments
>>>>>>> I can probably afford, but not much more. Should I just
>>>>>>> first wait for the official 12.0 release? Should I try
>>>>>>> booting with a 12.0 on USB and try to import pools?
>>>>>>> Suggestions welcome.
>>>>>>> Now that the /boot has been manually restored to the 11.2
>>>>>>> state, A SECOND QUESTION is about freebsd-update, which
>>>>>>> still thinks we are in the middle of an upgrade procedure.
>>>>>>> Trying now to just update the 11.2-RELEASE-p4 to
>>>>>>> 11.2-RELEASE-p5, the fetch complains:
>>>>>>> # uname -a
>>>>>>> FreeBSD xxx 11.2-RELEASE-p4 FreeBSD 11.2-RELEASE-p4
>>>>>>> #
>>>>>>> # freebsd-version
>>>>>>> 11.2-RELEASE-p4
>>>>>>> #
>>>>>>> # freebsd-update fetch
>>>>>>> src component not installed, skipped
>>>>>>> You have a partially completed upgrade pending
>>>>>>> Run '/usr/sbin/freebsd-update install' first.
>>>>>>> Run '/usr/sbin/freebsd-update fetch -F' to proceed anyway.
>>>>>>> So what is the right way to get rid of all traces of the
>>>>>>> unsuccessful upgrade, and let freebsd-update believe we are
>>>>>>> cleanly at 11.2-p4 ? Removing /var/db/freebsd-update did
>>>>>>> not help.
>>>>>>> Mark
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
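
A note on the advice quoted above to check which binary /boot/loader is
linked to. The following is only a minimal sketch, not something that
was posted in the thread. It assumes the loader variants sit in one
directory and are named loader_<variant> (loader_lua and loader_4th are
the names mentioned above); the function name linked_loader is made up
for illustration. It relies on the -ef operator of test(1), which is
true when two paths refer to the same file.

```sh
# Print the loader_* file in directory $1 (default /boot) that is a
# hard link to $1/loader. Hypothetical helper, for illustration only.
linked_loader() {
    dir=${1:-/boot}
    for f in "$dir"/loader_*; do
        # -ef: both paths resolve to the same device and inode
        if [ "$f" -ef "$dir/loader" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    # No variant shares an inode with loader (e.g. it was copied, not linked)
    return 1
}
```

Calling "linked_loader /boot" on a host set up as Mark describes would
be expected to print /boot/loader_lua. If nothing is printed and the
exit status is non-zero, loader is a separate copy rather than a hard
link to one of the variants.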