Date:      Thu, 9 Mar 2017 19:28:58 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Julian Elischer <julian@freebsd.org>
Cc:        Toomas Soome <tsoome@me.com>, Lawrence Stewart <lstewart@freebsd.org>,  freebsd-fs@freebsd.org, Toomas Soome <tsoome@freebsd.org>,  Andriy Gapon <avg@freebsd.org>, allanjude@freebsd.org
Subject:   Re: svn commit: r308089 - in head
Message-ID:  <20170309175948.R1327@besplex.bde.org>
In-Reply-To: <4a498b08-7417-e7b1-2e5d-b0dbe5f3c49a@freebsd.org>
References:  <201610291409.u9TE9WXJ020650@repo.freebsd.org> <c4cc03d0-d26e-f7c0-8399-d65f2aa0c5ef@freebsd.org> <CCB18F77-A9C3-4D22-82A3-9DD84DF783F9@me.com> <9f0b2f93-04b8-b90b-3cb5-13b8539b9171@freebsd.org> <814E1C65-23E3-42A1-8093-8008DF188506@me.com> <4a498b08-7417-e7b1-2e5d-b0dbe5f3c49a@freebsd.org>

On Wed, 8 Mar 2017, Julian Elischer wrote:

> On 7/3/17 4:48 pm, Toomas Soome wrote:
>>> ...
>> The problem is deeper, the idea behind the nextboot is that it is
>> attempting to provide recovery from failed boot, so if you set nextboot
>> dataset, attempt to boot from it, you need to do 2 things: 1. detect the
>> nextboot config, so you would actually be able to use it, and 2, you want
>> to reset it as early as possible, because later you may not have a chance.
>>
>> So it means the gptzfsboot has to read out the config to know where from it
>> has to load the zfsloader, and gptzfsboot has to reset the config, so that
>> if anything will go wrong, on next boot the fallback or “normal” boot
>> will be done. Which means that either gptzfsboot has to know how to deal
>> with geli in context of handling nextboot, or with geli, you just can not
>> use nextboot config.
>>
>> The similar issue is with using boot block area in zfs pool label - to be
>> able to store and use gptzfsboot in pool label boot area, the boot1 either
>> has to know how to read the geli, or geli must be able not to encrypt the
>> bootblock area, or we just can not use that area [with geli]. All in all,
>> it is another example of the chicken and the egg issue:)
>
> this is why the ORIGINAL nextboot in freebsd 3 (+ or -) wrote the data into
> block 1 of the drive and read it from boot0, and rewrote block 1 after
> zeroing out the entry.
> All using bios calls.
> 1/ read and remove ASAP,
> 2/ don't depend on the filesystem.. it may be dead, and that is why we are
> redirecting somewhere else.

I didn't like this method.  Anything that writes to the disk increases
fragility.  I still use my version of biosboot (updated for elf and EDD)
and try not to see the nextboot code in it (I can't delete this code
since it would then show up even more in diffs).

My method (used mostly only interactively) is to depend on a filesystem
for boot2 and things loaded by boot2.  It is easy to maintain such a
file system somewhere (perhaps on removable write-only media), but can
be hard to find it or ensure that it is the one booted (this often bites
me when bringing up a new system from a USB drive; first it can be hard
to boot from USB, and then I forget how the standard boot0 misbehaves
and hit F5 which tends to switch to a Windows drive with an unusable MBR
on it).

Once a known-good partition (with a good file system including boot2 or
even loader on it) is found, it is easy to control things by booting
to the kernel on that.  My method for cycling through kernels to run
benchmarks on them is to copy the next kernel to a standard place
and boot to there.  The kernel is selected by an index in a text file.
Cycling is done by incrementing the index.  I only do this for rebooting
within a single partition, but could immediately use a variation that
switches between i386 and amd64 partitions.
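
The index cycling can be sketched roughly as follows.  This is only an
illustration, not my actual scripts: the helper name next_kernel_index()
and the file format (a single decimal integer) are assumptions.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a decimal index from a text file, advance it
 * modulo nkernels, write it back, and return the index that was read.
 * Returns -1 on I/O or parse errors, or if nkernels is not positive.
 */
int
next_kernel_index(const char *path, int nkernels)
{
	FILE *f;
	int idx;

	if (nkernels <= 0)
		return (-1);
	if ((f = fopen(path, "r")) == NULL)
		return (-1);
	if (fscanf(f, "%d", &idx) != 1) {
		fclose(f);
		return (-1);
	}
	fclose(f);
	if (idx < 0 || idx >= nkernels)
		idx = 0;	/* out-of-range index: restart the cycle */

	/* Store the index for the following boot. */
	if ((f = fopen(path, "w")) == NULL)
		return (-1);
	fprintf(f, "%d\n", (idx + 1) % nkernels);
	if (fclose(f) != 0)
		return (-1);
	return (idx);
}
```

The caller would copy the kernel numbered by the returned index to the
standard place before rebooting.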

To use this method for nextboot, the main cycle would have to be of
length 2, to switch between a known-good partition and the next try on
a not-known-good partition (after booting to the known-good partition,
it is easy to clean up the other partition and copy a hopefully-better
kernel or even a whole filesystem to it).

The index (of 0 or 1) for the main cycle can't be stored in the known-good
filesystem since after a crash booting a bad partition there is no safe
way to update the index there.  On x86, there should be space reserved
for OS use in CMOS, but unfortunately nothing is properly reserved there.
I have thought of using the alarm register[s] for this.  The alarm
register[s] are not normally used, and for cycles of nextbooting the OS
could simply turn off alarms and use them.  Add ECC to this and it becomes
very safe to use them.  At worst, another OS might boot in the middle of
the cycle and change the alarm setting.  Just boot to the known-good
partition when ECC detects an error, and trust ECC to detect the unlikely
interruption and change.
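
A minimal sketch of storing the cycle index redundantly, assuming three
spare bytes of storage (for instance the three alarm registers) that can
hold arbitrary values.  A simple repetition code with majority voting
stands in for real ECC here, and the codeword values and function names
are arbitrary choices for illustration.

```c
#include <stdint.h>

#define NB_KNOWN_GOOD	0x35	/* codeword for cycle index 0 (arbitrary) */
#define NB_TRY_NEXT	0xCA	/* codeword for cycle index 1 (complement) */

/* Store the cycle index (0 or 1) as three identical codeword bytes. */
void
nb_encode(uint8_t regs[3], int index)
{
	uint8_t cw = index ? NB_TRY_NEXT : NB_KNOWN_GOOD;

	regs[0] = regs[1] = regs[2] = cw;
}

/*
 * Recover the cycle index.  A bitwise majority vote corrects any damage
 * confined to one of the three bytes.  Anything that does not decode to
 * the "try next" codeword selects the known-good partition (index 0).
 */
int
nb_decode(const uint8_t regs[3])
{
	uint8_t v = (regs[0] & regs[1]) | (regs[0] & regs[2]) |
	    (regs[1] & regs[2]);

	return (v == NB_TRY_NEXT ? 1 : 0);
}
```

The asymmetry is deliberate: every unrecognized state, including one left
behind by another OS setting an alarm, falls back to the known-good
partition.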

Recently, I noticed that the entire msgbuf survives rebooting on amd64
systems with 16GB memory.  The msgbuf is in high memory for amd64 and
this seems to survive warm boots and is not clobbered by the BIOS.
But the BIOS on the same system scribbles over almost all memory below
4GB.  It leaves large portions of the msgbuf intact, but the msgbuf
is protected by a stupid 32-bit checksum and always detects the clobbering.
Change this to ECC and apply it to individual messages and we can recover
most of the message buffer on i386, without expanding it much.  Add
redundancy/ECC to recover more of it.  It also has atomic update problems
that are best fixed by redundancy and localized checksums which could be
ECC except that is a bit over-engineered.
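
A rough sketch of per-message checksums, to show why localized checks
allow partial recovery.  This is not the existing msgbuf layout: the
record framing (magic byte, length, checksum), the 8-bit additive
checksum standing in for ECC, and all names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_MAGIC	0xA5
#define REC_HDR		3	/* magic, length, checksum */

/* Simple 8-bit additive checksum over the payload (a stand-in for ECC). */
static uint8_t
rec_cksum(const uint8_t *p, size_t len)
{
	uint8_t sum = 0;

	while (len-- > 0)
		sum += *p++;
	return (sum);
}

/* Append one framed message; returns the new write offset, or 0 on error. */
size_t
rec_append(uint8_t *buf, size_t size, size_t off, const char *msg)
{
	size_t len = strlen(msg);

	if (len == 0 || len > 255 || off > size || size - off < REC_HDR + len)
		return (0);
	buf[off] = REC_MAGIC;
	buf[off + 1] = (uint8_t)len;
	buf[off + 2] = rec_cksum((const uint8_t *)msg, len);
	memcpy(buf + off + REC_HDR, msg, len);
	return (off + REC_HDR + len);
}

/*
 * Scan the whole buffer and count messages whose framing and checksum
 * are intact.  Damaged bytes are skipped one at a time, so clobbering
 * one message does not discard its neighbours.
 */
int
rec_count_valid(const uint8_t *buf, size_t size)
{
	size_t off = 0;
	int n = 0;

	while (off + REC_HDR <= size) {
		size_t len = buf[off + 1];

		if (buf[off] == REC_MAGIC && len > 0 &&
		    off + REC_HDR + len <= size &&
		    rec_cksum(buf + off + REC_HDR, len) == buf[off + 2]) {
			n++;
			off += REC_HDR + len;
		} else
			off++;
	}
	return (n);
}
```

With one checksum over the whole buffer, a single clobbered byte
invalidates everything.  With this framing it costs only the message it
lands in, at an overhead of 3 bytes per message.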

If we can robustly preserve entire msgbufs across reboot, then it is
trivial to preserve a single status bit for nextboot.  Just not so easy
to do ECC for this bit in 512 bytes in boot0.  The memory bit could be
for extra checking of the RTC registers, with simple checksums instead
of ECC on both.

> the current nextboot is not nearly as useful and needs to be replaced as soon
> as possible as a failed experiment.
> things we could do to improve nextboot functionality:
> 1/ declare a partition type freebsd-bootinfo that is just raw boot info.

Ugh, this is as bad as Windows using multiple precious (non-GPT) partitions
for itself.  My method needs a known-good partition, but this can be a
FreeBSD partition.  My method would actually have to put the decisions into
boot2, since there is not enough space in boot0, and would avoid using loader:
- boot0: boot to known-good slice, say ad4s3
- boot2: this actually lives at the start of the slice, not in a partition,
   so is easy to find.  One reason I don't like loader is that it lives on
   a file system, so deferring decisions to it is not so robust.
- boot2 has to find the right drive and slice.  This is not so easy.  The
   default of the first FreeBSD slice is wrong in some of my configurations.
   This can be made more robust by hard-coding values in the boot2 binary.
- boot2 then has to find the right partition.  It almost always defaults
   to the 'a' partition, and loads /boot.config from there to possibly
   override the default.  It is safest to always make the known-good
   partition 'a'.
- before reading boot.config, boot2 runs ECC checks on various places to
   find overrides.  With nextboot inactive or the cycle-control value 0,
   it reads boot.config from the known-good partition.  Otherwise, it
   reads boot.config from somewhere else, say the 'b' partition
   or just an alternative boot.config on the known-good partition.  It
   can contain anything, so the next try is not limited to 1 alternative.
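
The last step of the list above might look like the sketch below.  It is
not boot2 code: struct nb_state, nb_pick_config() and the path
"/boot.config.alt" are hypothetical, and clearing the value before the
alternative is tried is an assumption taken from the "reset it as early
as possible" requirement quoted earlier.

```c
#include <stdbool.h>

/* Hypothetical view of the cycle-control value after integrity checks. */
struct nb_state {
	bool	valid;	/* value passed its checksum/ECC check */
	int	cycle;	/* 0 = known-good, 1 = next try */
};

/*
 * Choose which boot.config to read.  An invalid or zero value selects the
 * known-good file.  A valid nonzero value is cleared first, so a failed
 * attempt falls back to the known-good configuration on the boot after it.
 */
const char *
nb_pick_config(struct nb_state *st)
{
	if (!st->valid || st->cycle == 0)
		return ("/boot.config");
	st->cycle = 0;
	return ("/boot.config.alt");
}
```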

> 2/ store the info in a known place in the freebsd-zfs partition (what andriy
> is doing I believe)
> 3/ store it at the end of the freebsd-boot partition.
> It should be read by gptzfsboot and set into the environment (what comes
> earlier in a gpt system?)  originally I read it using bios calls from boot0.
> that was of course a UFS system on a dedicated drive.

It is fundamentally insecure to allow booting from all over the place,
especially when selected by simple binary values written to raw drives.
My method actually works well to fix this.  Better write the alternative
boot.config to the same known-good partition as the non-alternative (this
is only impossible if the known-good partition is read-only).  Then the
simple binary value is not very robust, but all it does is select between
the known-good boot.config and the alternative one.  Security is attained
by never pointing the alternative one to an insecure partition.

Bruce

Date:      Thu, 9 Mar 2017 15:31:56 +0100
From:      Gergely Czuczy <gergely.czuczy@harmless.hu>
To:        freebsd-current@freebsd.org, freebsd-fs@freebsd.org
Subject:   Re: process killed: text file modification
Message-ID:  <45436522-77df-f894-0569-737a6a74958f@harmless.hu>
In-Reply-To: <646c1395-9482-b214-118c-01573243ae5a@harmless.hu>
References:  <d4d04499-17f8-e3d7-181f-c8ee8285e32b@harmless.hu> <646c1395-9482-b214-118c-01573243ae5a@harmless.hu>

[+freebsd-fs]


On 2017. 03. 09. 14:20, Gergely Czuczy wrote:
> On 2017. 03. 09. 11:27, Gergely Czuczy wrote:
>> Hello,
>>
>> I'm trying to build a few things from ports on an rpi3, the ports 
>> collection is mounted over NFS from another machine. When it's trying 
>> to build pkg I'm getting this error message in syslog:
>>
>> rpi3 kernel: pid 4451 (sh), uid 0, was killed: text file modification
>>
>> The report to pkg@:
>> https://lists.freebsd.org/pipermail/freebsd-pkg/2017-March/002048.html
>>
>> In ports-mgmt/pkg's config.log it fails at the following entry:
>> configure:3726: checking whether we are cross compiling
>> configure:3734: cc -o conftest -O2 -pipe  -Wno-error -fno-strict-aliasing   conftest.c  >&5
>> configure:3738: $? = 0
>> configure:3745: ./conftest
>> configure:3749: $? = 137
>> configure:3756: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.0':
>> configure:3760: error: cannot run C compiled programs.
>> If you meant to cross compile, use `--host'.
>> See `config.log' for more details
>>
>> # uname -a
>> FreeBSD rpi3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r314949: Thu Mar 9 08:58:46 CET 2017 aegir@marvin.harmless.hu:/tank/rpi3/crochet/work/obj/arm64.aarch64/tank/rpi3/src/sys/AEGIR arm64
> So far, a few additions:
> Time is synced between the NFS server and the client.
> The kill happens during an open() call, and the file it complains about 
> is not the one being opened, but the binary of the process executing it.
> Here's a simple program that reproduces it:
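> #include <stdio.h>
>
> int main(void) {
>   /* The process gets killed during this open-for-write. */
>   FILE *f = fopen("/bar", "w");
>
>   if (f == NULL) {
>     perror("fopen");
>     return 1;
>   }
>   fclose(f);
>   return 0;
> }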
>
> Conditions to reproduce it:
>  - The resulting binary must be executed from the nfs mount
>  - The binary must be built after mounting the NFS share.
>
> I haven't tried building it on a different host, I don't have access 
> to multiple RPis. Also, if I build the binary, umount/remount the NFS 
> mount point, which has the binary, execute it, then it works.
>
> I've also tried this with the raspbsd.org's image, I could reproduce 
> it as well.
>
> Another interesting thing is, when I first booted the RPi up, the NFS 
> server was a 10.2-STABLE, and later got updated to 11-STABLE. While it 
> was 10.2 I've tried to build some port, and I don't remember having 
> this issue.
>
> So, could someone please help me figure this out and fix it? This 
> stuff should work pretty much.
>
So, this error message comes from here:
https://svnweb.freebsd.org/base/head/sys/fs/nfsclient/nfs_clbio.c?revision=314436&view=markup#l1674

It's the NFS_TIMESPEC_COMPARE(&np->n_mtime, &np->n_vattr.na_mtime) 
comparison that fails. np should be the NFS node structure, from the 
vnode's v_data, and n_vattr is the attribute cache. As far as I've seen, 
these two are updated together, so I don't really see from the code why 
they might differ. Could someone with more experience in the NFS code 
please take a look at it? -czg
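
For anyone not familiar with this kind of check, here is a standalone 
sketch of what a timespec mismatch test looks like. It is not copied from 
the kernel source, the function name is made up, and it assumes the 
comparison is a plain field-by-field inequality:

```c
#include <stdbool.h>
#include <time.h>

/* Returns true when the two timestamps differ in either field. */
bool
timespec_differs(const struct timespec *a, const struct timespec *b)
{
	return (a->tv_sec != b->tv_sec || a->tv_nsec != b->tv_nsec);
}
```

Under that assumption even a difference of a single nanosecond between 
the two mtimes counts as a modification.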
