From: Toomas Soome <tsoome@me.com>
Date: Sun, 07 May 2017 13:56:46 +0300
Subject: Re: bootcode capable of booting both UFS and ZFS? (Amazon/ec2)
To: Julian Elischer
Cc: Warner Losh, freebsd-current, Toomas Soome, Andriy Gapon,
 Colin Percival

> On 7. mai 2017, at 13:18, Julian Elischer wrote:
>
> On 7/5/17 1:45 pm, Warner Losh wrote:
>> On Sat, May 6, 2017 at 10:03 PM, Julian Elischer wrote:
>>> On 6/5/17 4:01 am, Toomas Soome wrote:
>>>>
>>>>> On 5. mai 2017, at 22:07, Julian Elischer wrote:
>>>>>
>>>>> Subject says it all really, is this an option at this time?
>>>>>
>>>>> we'd like to try booting the main ZFS root partition and then
>>>>> fall back to a small UFS-based recovery partition. is that
>>>>> possible?
>>>>>
>>>>> I know we could use grub, but I'd prefer to keep it in the
>>>>> family.
>>>>
>>>> it is, sure. but there is a compromise to be made for it.
>>>>
>>>> Let's start with what I have done in the illumos port, as the
>>>> idea there is exactly about having binaries that are as
>>>> “universal” as possible (just the binaries are listed below to
>>>> show the sizes):
>>>>
>>>> -r-xr-xr-x 1 root sys 171008 apr 30 19:55 bootia32.efi
>>>> -r-xr-xr-x 1 root sys 148992 apr 30 19:55 bootx64.efi
>>>> -r--r--r-- 1 root sys   1255 okt 25  2015 cdboot
>>>> -r--r--r-- 1 root sys 154112 apr 30 19:55 gptzfsboot
>>>> -r-xr-xr-x 1 root sys 482293 mai  2 21:10 loader32.efi
>>>> -r-xr-xr-x 1 root sys 499218 mai  2 21:10 loader64.efi
>>>> -r--r--r-- 1 root sys    512 okt 15  2015 pmbr
>>>> -r--r--r-- 1 root sys 377344 mai  2 21:10 pxeboot
>>>> -r--r--r-- 1 root sys 376832 mai  2 21:10 zfsloader
>>>>
>>>> The loader (BIOS/EFI) is built with the full complement - zfs,
>>>> ufs, dosfs, cd9660, nfs, tftp + gzipfs. The cdboot starts
>>>> zfsloader (that's a trivial string change).
>>>>
>>>> The gptzfsboot in the illumos case is built with only zfs, dosfs
>>>> and ufs, as it only has to support disk-based media to read out
>>>> the loader. Also, I am building gptzfsboot with libstand and
>>>> libi386 to get as much shared code as possible - which has both
>>>> good and bad sides, as usual ;)
>>>>
>>>> The gptzfsboot size means that with UFS a dedicated boot
>>>> partition (freebsd-boot) is needed; with ZFS the illumos port
>>>> always uses the 3.5MB boot area after the first two labels (as
>>>> there is no geli, illumos does not need a dedicated boot
>>>> partition with ZFS).
>>>>
>>>> As freebsd-boot is currently created at 512k, the size is not an
>>>> issue. Using common code also allows the generic partition code
>>>> to be used, so GPT/MBR/BSD (VTOC in the illumos case) labels are
>>>> not a problem.
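>>>>
>>>> (for reference, a minimal sketch of creating and populating such
>>>> a freebsd-boot partition on the FreeBSD side with gpart(8); ada0
>>>> and the partition index are just example values:
>>>>
>>>>   # dedicated 512k boot partition on a GPT disk
>>>>   gpart add -t freebsd-boot -s 512k -i 1 ada0
>>>>   # install the protective MBR and the ZFS-capable bootcode
>>>>   gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
>>>> )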
>>>>
>>>> So, even just with CD boot (ISO), starting zfsloader (which in
>>>> fbsd has ufs, zfs etc built in), you can already get rescue
>>>> capability.
>>>>
>>>> Now, even with just adding a ufs reader to gptzfsboot, we can use
>>>> GPT + freebsd-boot and a UFS root while loading zfsloader on a
>>>> USB image, so it can be used for both live/install and rescue,
>>>> because zfsloader itself has support for all file systems +
>>>> partition types.
>>>>
>>>> I have kept myself a bit away from the freebsd gptzfsboot for a
>>>> simple reason - the older setups have a smaller freebsd-boot
>>>> partition, and not everyone is necessarily happy about size
>>>> changes :D Also, in the freebsd case there is another factor
>>>> called geli - it most certainly does contribute some bits, but it
>>>> also needs to be properly addressed in the IO call stack (as we
>>>> have seen with the zfsbootcfg bits). But then again, here too the
>>>> shared code can help to reduce the complexity.
>>>>
>>>> Yea, the zfsloader/loader*.efi in that listing above is actually
>>>> built with framebuffer code and a compiled-in 8x16 default font
>>>> (lz4-compressed ascii+boxdrawing, basically - because zfs has
>>>> lz4, the decompressor is always there), and ficl 4.1, so that's a
>>>> bit of a difference from the fbsd loader.
>>>>
>>>> Also note that we can still build the smaller dedicated blocks
>>>> like boot2, just that we cannot use those blocks for the more
>>>> universal cases, and eventually those special cases will
>>>> diminish.
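>>>>
>>>> (a hypothetical sketch of such a USB image built with mkimg(1),
>>>> assuming a gptzfsboot with the ufs reader added as described;
>>>> root.img stands for a pre-built UFS filesystem image that carries
>>>> /boot/zfsloader:
>>>>
>>>>   # pack pmbr + gptzfsboot + a UFS root into one bootable GPT image
>>>>   mkimg -s gpt -b /boot/pmbr \
>>>>       -p freebsd-boot:=/boot/gptzfsboot \
>>>>       -p freebsd-ufs:=root.img \
>>>>       -o usb.img
>>>> )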
>>>
>>> thanks for that..
>>>
>>> so, here's the exact problem I need to solve:
>>> FreeBSD 10 (or newer) on Amazon EC2.
>>> We need to have a plan for recovering from the scenario where
>>> something goes wrong (e.g. during an upgrade) and we are left with
>>> a system where the default zpool rootfs points to a dataset that
>>> doesn't boot. It is possible that maybe the entire pool is
>>> unbootable into multi-user.. Maybe somehow it filled up? Who
>>> knows. It's hard to predict future problems.
>>> There is no console access at all, so there is no possibility of
>>> human intervention. So all recovery paths that start with "enter
>>> single user mode and...." are unusable.
>>>
>>> The customers who own the Amazon account are not crazy about
>>> giving us the keys to the kingdom as far as all their EC2
>>> instances go, so taking the root drive off a 'sick' VM and
>>> grafting it onto a FreeBSD instance to 'repair' it becomes a task
>>> we don't really want to have to ask them to do. They may not have
>>> the in-house expertise to do it confidently.
>>>
>>> This leaves us with automatic recovery, or at least automatic
>>> methods of getting access to that drive over the network.
>>> Since the regular root is zfs, my gut feeling is that, to reduce
>>> the chances of confusion during recovery, I'd like the (recovery)
>>> system itself to be running off a UFS partition, and potentially
>>> with a memory root filesystem. As long as it can be reached over
>>> the network, we can then take over.
>>>
>>> we'd also like to have boot environment support in the bootcode.
>>> so, what would be the minimum set we'd need?
>>>
>>> UFS support, zfs support, BE support, and support for selecting a
>>> completely different boot procedure after some number of boot
>>> attempts without getting all the way to multi-user.
>>>
>>> How does that come out size-wise? And what do I need to configure
>>> to get that?
>>>
>>> The current EC2 instances have a 64kB boot partition, but I have a
>>> window to convince management to expand that if I have a good
>>> enough argument (since we are doing a repartition on the next
>>> upgrade, which is "special" - it's our upgrade to 10.3 from 8.0).
>>> Being able to self-heal, or at least to 'get at' a sick instance,
>>> might be a good enough argument, and it would make the EC2
>>> instances the same as all the other versions of the product..
>> You should convince them to move to 512k post-haste. I doubt 64k
>> will suffice, and 512k is enough to get all the features you
>> desire.
>
> yeah, I know, but sometimes convincing management of things is like
> banging one's head against a wall.
> Don't think I haven't tried, and won't keep trying.
>

To support recovery there can be two scenarios:

1. something has gone bad and you boot from alternate media
(iso/usb/net), log in, and fix the setup.

2. if alternate media is not available, there has to be a recovery
“image”, preferably isolated from the rest of the system, such as a
recovery partition.

The second option needs a mechanism to get activated; something like
“try the normal boot X times, then use recovery”. The zfsbootcfg
Andriy did currently provides the reverse option - try this config,
and if it fails, fall back to normal. But that work can nevertheless
be used as a base - to provide not a one-time [next] boot config, but
a fallback.
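To make that concrete, here is roughly how the current one-shot
mechanism is used (the pool and dataset names are just examples):

  # arrange for the *next* boot attempt only to try an alternate boot
  # environment; the bootcode consumes this entry before acting on it,
  # so a failed attempt falls back to the normal config on the boot
  # after that
  zfsbootcfg "zfs:zroot/ROOT/recovery:"

A fallback variant would work the other way around: boot normally by
default, and only switch to the stored alternative after repeated
failed attempts.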
Of course, something like a “recovery partition” would need to be
architected to be as foolproof as possible, but it definitely is
possible.

BTW: this is a bit specific to illumos and zfs, but some of the
concerns and ideas in the comments are still worth noting:
https://www.illumos.org/rb/r/249/ - especially that the pad area
should hold not a simple string but some structure, to allow
different semantics (next boot, or fallback boot, or maybe something
else).

rgds,
toomas