Subject: Re: Getting ZFS pools back.
From: Willem Jan Withagen
To: Warner Losh, Jan Knepper
Cc: FreeBSD Filesystems, FreeBSD Hackers, Richard Yao, Alan Somers
Date: Mon, 30 Apr 2018 12:37:45 +0200

On 29-4-2018 23:20, Willem Jan Withagen wrote:
> On 29/04/2018 20:21, Warner Losh wrote:
>>
>> On Sun, Apr 29, 2018 at 11:57 AM, Jan Knepper wrote:
>>
>>     On 04/29/2018 13:27, Willem Jan Withagen wrote:
>>
>>         Trouble started when I installed (freebsd-update) 11.1 over a
>>         running 10.4. Which is sort of scary?
>>
>>     This does sound 'scary' as I am planning to do this in the (near)
>>     future...
>>
>>     Has anyone else experienced issues like this?
>>
>>     Generally I do build the new system software on a running system,
>>     but then go to single user mode to perform the actual install.
>>
>>     I have done many upgrades like that over 18 or so years and never
>>     seen or heard of an issue like this.
>>
>> 11.x binaries aren't guaranteed to work with a 10.x kernel. So that's
>> a bit of a problem. freebsd-update shouldn't have let you do that either.
>>
>> However, most 11.x binaries work well enough to at least bootstrap /
>> fix problems if booted on a 10.x kernel due to targeted forward
>> compatibility. You shouldn't count on it for long, but it generally
>> won't totally brick your box. In the past, and I believe this is still
>> true, they work well enough to compile and install a new kernel after
>> pulling sources. The 10.x -> 11.x syscall changes are such that you
>> should be fine. At least if you are on UFS.
>
> I have been doing this kind of thing for years and years. Even upgrading
> over NFS and stuff. Sometimes it is a bit too close to the sun and
> things burn. But never a crash this bad.
>
>> However, the ZFS ioctls and such are in the bag of 'don't specifically
>> guarantee and also they change a lot' so that may be why you can't
>> mount ZFS by UUID. I've not checked to see if there's specifically an
>> issue here or not. The ZFS ABI is somewhat more fragile than other
>> parts of the system, so you may have issues here.
>>
>> If all else fails, you may be able to PXE boot an 11 kernel, or boot
>> off a USB memstick image to install a kernel.
>
> Tried replacing just about everything in both the boot partition (first
> growing it to take the > 64K gptzfsboot) and in /boot from the memstick.
> But the error never went away.
>
> Never had ZFS die on me this bad, that I could not get it back.
>
>> Generally, while we don't guarantee forward compatibility (running
>> newer binaries on older kernels), we've generally built enough forward
>> compat so that things work well enough to complete the upgrade. That's
>> why you haven't hit an issue in 18 years of upgrading. However, the
>> velocity of syscall additions has increased, and we've gone from
>> fairly stable (stale?) ABIs for UFS to a more dynamic one for ZFS
>> where backwards compat is a bit of a crap shoot and forward compat
>> isn't really there at all. That's likely why you've hit a speed bump
>> here.
>
> Come to think of it, I did not do this step with freebsd-update, since I
> was not at an official release yet. I was going to 11.1-RELEASE, to be
> able to start using freebsd-update.
>
> So I don't think I did just do that.... But I tried so much yesterday.
> Normally I would installkernel, reboot, installworld, mergemaster,
> reboot for systems that are not up for freebsd-update.

Right, the story gets even sadder...

Took the "spare" disk home, and just connected it to an older SuperMicro
server I had lying about for Ceph tests. And lo and behold, it just boots.

So that system got upgraded from:
    10.2 -> 10.4 -> 11.1
No complaints about anything.

So now I'm inclined to point at older hardware with an old BIOS, which
confused ZFS, or probably more precisely gptzfsboot.

From dmidecode:
System Information
    Manufacturer: Supermicro
    Product Name: H8SGL
    Version: 1234567890
BIOS Information
    Vendor: American Megatrends Inc.
    Version: 3.5
    Release Date: 11/25/2013
    Address: 0xF0000

We only have 1 of those, so further investigation and/or tinkering in
combination with that hardware will be impossible.

--WjW