From owner-freebsd-stable@freebsd.org Tue Dec 10 15:36:36 2019
From: Marc Branchaud
Date: Tue, 10 Dec 2019 10:35:26 -0500
To: Mark Martinec, freebsd-stable@freebsd.org
Subject: Re: Boot loader stuck after first stage upgrading 11.2 to 12.0-RC2
In-Reply-To: <4c4019102b63054f8de93324dba0e776@ijs.si>
References: <22f5b92a09ea4d62ac3feb74457067f7@ijs.si>
 <5EEBAFC0-4FA3-4219-A918-7376F4223656@me.com>
 <0F5FCC70-EADB-4F9E-A391-F1A73BE5608F@me.com>
 <1543954753.1860.243.camel@freebsd.org>
 <53ceda24-fa1b-8546-3511-bd500b440dfe@digiware.nl>
 <4c4019102b63054f8de93324dba0e776@ijs.si>
List-Id: Production branch of FreeBSD source code

On 2019-12-10 9:18 a.m., Mark Martinec wrote:
> Commenting on a thread from 2018-12 and from 2019-09-20, with my solution
> to the boot problem at the end, in case anyone is still interested.

Thank you very much for this.  A couple of questions:

(1) Why do you say "raw devices for historical reasons"?  Glancing
through the zpool man page and the Handbook, I see nothing recommending
or requiring GPT partitions.

(2) Just to be 100% clear, my 11.3 non-root zpool looks like this:

	NAME        STATE     READ WRITE CKSUM
	storage     ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    ada2    ONLINE       0     0     0
	    ada3    ONLINE       0     0     0
	    ada4    ONLINE       0     0     0
	    ada5    ONLINE       0     0     0
	    ada6    ONLINE       0     0     0
	    ada7    ONLINE       0     0     0

So this is using raw devices.  Are you saying that if I upgrade this
machine to 12 that it won't be able to boot?

Thanks again!

		M.

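For anyone who wants to run the same check on a box with many pools, here
is a minimal sketch that reads `zpool status` text and lists the members
that look like whole disks rather than partitions.  The function name and
the naming pattern (letters followed by a unit number, such as ada2) are
my own assumptions, not anything from zpool itself, and a match only says
how the device is named; whether such a pool actually trips the 12.x
loader is still the open question above.

```python
import re

# Assumed pattern for a whole-disk name: letters then a unit number
# (ada2, da0, nvd1).  Partitions (ada2p1) and labels (gpt/foo) do not match.
WHOLE_DISK = re.compile(r"^[a-z]+\d+$")


def raw_disk_members(status_text):
    """Return vdev members in `zpool status` output that look like whole disks."""
    raw = []
    in_config = False   # True while inside a NAME/STATE/... table
    pool_seen = False   # the first row after the header is the pool itself
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME":
            in_config, pool_seen = True, False
            continue
        if fields[0] == "errors:":
            in_config = False
            continue
        if not in_config:
            continue
        if not pool_seen:
            pool_seen = True        # skip the pool name row
            continue
        if WHOLE_DISK.match(fields[0]):
            raw.append(fields[0])
    return raw


if __name__ == "__main__":
    example = """
        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
    """
    print(raw_disk_members(example))
```

In practice the text would come from running `zpool status` and passing
its output to the function; grouping rows such as raidz2-0 or mirror-0 are
skipped because they do not fit the whole-disk pattern.
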
> =======
>
> On 2018-11-29 myself wrote:
> (after upgrading from 11.2 to 12.0):
>> While booting, the 'BTX loader' comes up, lists the BIOS drives,
>> then the spinner below the list comes up and begins turning,
>> stuttering, and after a couple of seconds it grinds to a standstill
>> and nothing happens afterwards.
>> At this point the ZFS and the bootstrap loader is supposed to
>> come up, but it doesn't.
> [...] (on 2018-12-04):
>> The situation has not changed: the BTX loader lists all BIOS drives
>> C..J (disk0..disk7), then a spinner starts and gets stuck forever.
>> It never reaches the 'BIOS 635kB/3537856kB available memory' line.
>>
>> While trying to restore the old /boot from 11.2, I tried booting
>> a live image from a 12.0-RC3 memory stick - and the loader got
>> stuck again, same as when booting from a disk.
>> So I had to boot from an 11.2 memstick to be able to regain control.
>
> =======
>
> 2018-12-04, Ian Lepore writes:
>>   Toomas Soome wrote:
>> |    ok, if you could perform 2 tests:
>> |    1. from loader prompt enter 0x413 0xa000 - @w . cr
>> |    2. on first spinner, press space and type on boot: prompt:
>> |    /boot/loader_4th and see if that will do better
>> |    thanks, toomas
>> I don't think that will be an option.  If it hasn't gotten to the point
>> of saying how much BIOS available memory there is, it's only halfway
>> through loader main() and has hung before getting to interact().
>>
>> In fact, if that line hasn't printed, but some disk drives have been
>> listed, it pretty much has to be hung in the "March through the device
>> switch probing for things" loop. If all the disks are listed, then it
>> got through that entry in the devsw, and is likely hanging in the
>> dv_init calls for either the pxedisk or zfsdev devices.
>
> =======
>
> 2018-12-07 19:08, Willem Jan Withagen wrote:
>> Ended up more or less in the same situation this afternoon with
>> freebsd-update to [12.0]-RC3
>> Boot stops after listing all DOS disks, in a spinner.
>> So that is no fix.
>>
>> I booted from USB 11.2 and replaced the /boot/zfs{boot,loader} by the
>> 11.2 ones.
>> That makes my server again happy.
>
> =======
>
> 2019-09-19 16:02, Kurt Jaeger wrote:
> Subject: Re: Lockdown adaX numbers to allow booting ?
>> |  Kurt Jaeger writes:
>> |    The problem is that if all 10 disks are connected, the system
>> |    loses track from where it should boot and fails to boot
>> |    (serial boot log):
>> |
>> |    Consoles: internal video/keyboard  serial port
>> |    BTX loader 1.00  BTX version is 1.02
>> |    Consoles: internal video/keyboard  serial port
>> |    BIOS drive C: is disk0
>> |    BIOS drive D: is disk1
>> |    BIOS drive E: is disk2
>> |    BIOS drive F: is disk3
>> |    BIOS drive G: is disk4
>> |    BIOS drive H: is disk5
>> |    BIOS drive I: is disk6
>> |    BIOS drive J: is disk7
>> |    BIOS drive K: is disk8
>> |    BIOS drive L: is disk9
>> |    //
>> |    [...]
>> |    The solution right now is to unplug all disks of the 'bck' pool,
>> |    reboot, and re-insert the data disks after the boot is finished.
>> |    [...]
>> |    No gpart on the bck pool, raw drives.
>
> 2019-09-20 17:27, Mark Martinec wrote:
> Subject: Re: Lockdown adaX numbers to allow booting ?
>>
>> This sounds very much like my experience:
>>
>>   2018-11-29, Boot loader stuck after first stage upgrading 11.2 to 12.0-RC2
>> https://lists.freebsd.org/pipermail/freebsd-stable/2018-November/090129.html
>> https://lists.freebsd.org/pipermail/freebsd-stable/2018-December/090159.html
>>
>> I now have three SuperMicro machines which are unable to boot after
>> upgrading 11.2 to 12.0. After unsuccessfully fiddling with boot loaders,
>> I have reverted two back to 11.2 (which boots and works fine again),
>> and the third one is now at 12.0 but needs the boot hack as described
>> by Kurt, i.e. pull out half the disks (of the 'data' pool), boot the
>> system, plug the disks back in and zfs mount the remaining pool.
>>
>> Considering that the 11.2 boots and works fine on these machines,
>> I consider it a btx loader failure and not a BIOS issue.
>>
>> What is common with these three machines is that they have one pool
>> on raw devices for historical reasons (not on gpt partitions).
>> My guess is that the new loader gets confused by these raw disks.
>
> =======
>
> Ok, now to my current situation and solution/workaround.
>
> What was common with these hosts (and similar) is that a machine
> has more than a couple of disks, with a zfs pool (non-root) on
> raw devices (for historical reasons), not on gpt partitions.
>
> Three workarounds seem possible:
>
> - replace a boot loader with the one from 11.2, or
>
> - using a default loader from 12, disconnect a sufficient number
>   of data disks, boot, then reconnect disks and zpool import the pool,
>
> - or my current solution: zpool offline one disk at a time from
>   a data pool, wipe it, set up a gpt partition on it and
>   put it back to the pool by 'zpool replace', letting it resilver.
>   It was a painful and slightly risky procedure (9 hours of
>   resilvering each of the seven disks), but this procedure
>   has now salvaged our remaining hosts which could not be
>   upgraded from 11.2 to 12.
>
> Mark
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
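
To make sure I understand the third workaround, here is how I would
sketch one round of it as a dry run.  Mark's message does not give the
exact commands, so the mapping below (zpool offline, zpool labelclear as
the "wipe" step, gpart create/add, zpool replace) is my assumption, as are
the function name, the 1m alignment and the label scheme.  The script only
prints commands and runs nothing.  One thing I am unsure about: the new
partition is slightly smaller than the raw disk, so zpool replace has to
accept that size; Mark's report suggests it did on his hardware.

```python
import shlex


def gpt_conversion_commands(pool, disk, label):
    """Build (not run) the commands to move one raw-disk member onto a GPT partition."""
    return [
        ["zpool", "offline", pool, disk],            # take the member out of service
        ["zpool", "labelclear", "-f", disk],         # assumed "wipe": drop old ZFS labels
        ["gpart", "create", "-s", "gpt", disk],      # new GPT partition table
        ["gpart", "add", "-t", "freebsd-zfs", "-a", "1m", "-l", label, disk],
        ["zpool", "replace", pool, disk, "gpt/" + label],  # starts the resilver
        ["zpool", "status", pool],                   # check resilver before the next disk
    ]


if __name__ == "__main__":
    # Hypothetical pool and disk names, taken from my own pool layout above.
    for disk in ["ada2", "ada3"]:
        for cmd in gpt_conversion_commands("storage", disk, "storage-" + disk):
            print(shlex.join(cmd))
        print("# wait for the resilver to finish before touching the next disk")
```

Doing one disk at a time matters because the pool runs with reduced
redundancy while each member resilvers, which is presumably the "slightly
risky" part Mark mentions.
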