Date: Thu, 9 Mar 2017 19:28:58 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Julian Elischer <julian@freebsd.org>
Cc: Toomas Soome <tsoome@me.com>, Lawrence Stewart <lstewart@freebsd.org>, freebsd-fs@freebsd.org, Toomas Soome <tsoome@freebsd.org>, Andriy Gapon <avg@freebsd.org>, allanjude@freebsd.org
Subject: Re: svn commit: r308089 - in head
Message-ID: <20170309175948.R1327@besplex.bde.org>
In-Reply-To: <4a498b08-7417-e7b1-2e5d-b0dbe5f3c49a@freebsd.org>
References: <201610291409.u9TE9WXJ020650@repo.freebsd.org> <c4cc03d0-d26e-f7c0-8399-d65f2aa0c5ef@freebsd.org> <CCB18F77-A9C3-4D22-82A3-9DD84DF783F9@me.com> <9f0b2f93-04b8-b90b-3cb5-13b8539b9171@freebsd.org> <814E1C65-23E3-42A1-8093-8008DF188506@me.com> <4a498b08-7417-e7b1-2e5d-b0dbe5f3c49a@freebsd.org>

On Wed, 8 Mar 2017, Julian Elischer wrote:

> On 7/3/17 4:48 pm, Toomas Soome wrote:
>>> ...
>> The problem is deeper, the idea behind the nextboot is that it is
>> attempting to provide recovery from failed boot, so if you set nextboot
>> dataset, attempt to boot from it, you need to do 2 things: 1. detect the
>> nextboot config, so you would actually be able to use it, and 2, you want
>> to reset it as early as possible, because later you may not have a chance.
>>
>> So it means the gptzfsboot has to read out the config to know where from it
>> has to load the zfsloader, and gptzfsboot has to reset the config, so that
>> if anything will go wrong, on next boot the fallback or “normal” boot
>> will be done. Which means that either gptzfsboot has to know how to deal
>> with geli in context of handling nextboot, or with geli, you just can not
>> use nextboot config.
>>
>> The similar issue is with using boot block area in zfs pool label - to be
>> able to store and use gptzfsboot in pool label boot area, the boot1 either
>> has to know how to read the geli, or geli must be able not to encrypt the
>> bootblock area, or we just can not use that area [with geli]. All in all,
>> it is another example of the chicken and the egg issue:)
>
> this is why the ORIGINAL nextboot in freebsd 3 (+ or -) wrote the data into
> block 1 of the drive and read it from boot0, and rewrote block 1 after
> zeroing out the entry.
> All using bios calls.
> 1/ read and remove ASAP,
> 2/ don't depend on the filesystem.. it may be dead, and that is why we are
> redirecting somewhere else.

I didn't like this method. Anything that writes to the disk increases
fragility. I still use my version of biosboot (updated for ELF and EDD)
and try not to see the nextboot code in it (I can't delete this code since
it would then show up even more in diffs).

My method (used mostly interactively) is to depend on a filesystem for
boot2 and things loaded by boot2. It is easy to maintain such a file
system somewhere (perhaps on removable write-only media), but it can be
hard to find it or ensure that it is the one booted (this often bites me
when bringing up a new system from a USB drive; first it can be hard to
boot from USB, and then I forget how the standard boot0 misbehaves and hit
F5, which tends to switch to a Windows drive with an unusable MBR on it).
Once a known-good partition (with a good file system including boot2 or
even loader on it) is found, it is easy to control things by booting to
the kernel on that.

My method for cycling through kernels to run benchmarks on them is to
copy the next kernel to a standard place and boot to there. The kernel is
selected by an index in a text file. Cycling is done by incrementing the
index. I only do this for rebooting within a single partition, but could
immediately use a variation that switches between i386 and amd64
partitions.

To use this method for nextboot, the main cycle would have to be of length
2, to switch between a known-good partition and the next try on a
not-known-good partition (after booting to the known-good partition, it is
easy to clean up the other partition and copy a hopefully-better kernel or
even a whole filesystem to it). The index (of 0 or 1) for the main cycle
can't be stored in the known-good filesystem, since after a crash booting
a bad partition there is no safe way to update the index there.

On x86, there should be space reserved for OS use in CMOS, but
unfortunately nothing is properly reserved there. I have thought of using
the alarm register[s] for this. The alarm register[s] are not normally
used, and for cycles of nextbooting the OS could simply turn off alarms
and use them. Add ECC to this and it becomes very safe to use them. At
worst, another OS might boot in the middle of the cycle and change the
alarm setting.
Just boot to the known-good partition when ECC detects an error, and trust
ECC to detect the unlikely interruption and change.

Recently, I noticed that the entire msgbuf survives rebooting on amd64
systems with 16GB memory. The msgbuf is in high memory for amd64 and this
seems to survive warm boots and is not clobbered by the BIOS. But the
BIOS on the same system scribbles over almost all memory below 4GB. It
leaves large portions of the msgbuf intact, but the msgbuf is protected by
a stupid 32-bit checksum, which always detects the clobbering. Change this
to ECC and apply it to individual messages and we can recover most of the
message buffer on i386, without expanding it much. Add redundancy/ECC to
recover more of it. It also has atomic update problems that are best
fixed by redundancy and localized checksums, which could be ECC except
that is a bit over-engineered.

If we can robustly preserve entire msgbufs across reboot, then it is
trivial to preserve a single status bit for nextboot. It is just not so
easy to do ECC for this bit in 512 bytes in boot0. The memory bit could be
for extra checking of the RTC registers, with simple checksums instead of
ECC on both.

> the current nextboot is not nearly as useful and needs to be replaced as soon
> as possible as a failed experiment.
> things we could do to improve nextboot functionality:
> 1/ declare a partition type freebsd-bootinfo that is just raw boot info.

Ugh, this is as bad as Windows using multiple precious (non-GPT)
partitions for itself. My method needs a known-good partition, but this
can be a FreeBSD partition. My method would actually have to put the
decisions into boot2, since there is not enough space in boot0, and avoid
using loader:
- boot0: boot to known-good slice, say ad4s3
- boot2: this actually lives at the start of the slice, not in a
  partition, so is easy to find. One reason I don't like loader is that
  it lives on a file system, so deferring decisions to it is not so robust.
- boot2 has to find the right drive and slice. This is not so easy. The
  default of the first FreeBSD slice is wrong in some of my configurations.
  This can be made more robust by hard-coding values in the boot2 binary.
- boot2 then has to find the right partition. It almost always defaults
  to the 'a' partition, and loads /boot.config from there to possibly
  override the default. It is safest to always make the known-good
  partition 'a'.
- before reading boot.config, boot2 does ECC on various places to find
  overrides. If nextboot is inactive or the cycle-control value is 0, it
  reads boot.config from the known-good partition. Otherwise, it reads
  boot.config from somewhere else, say the 'b' partition or just an
  alternative boot.config on the known-good partition. It can contain
  anything, so the next try is not limited to 1 alternative.

> 2/ store the info in a known place in the freebsd-zfs partition (what andriy
> is doing I believe)
> 3/ store it at the end of the freebsd-boot partition.
> It should be read by gptzfsboot and set into the environment (what comes
> earlier in a gpt system?) originally I read it using bios calls from boot0.
> that was of course a UFS system on a dedicated drive.

It is fundamentally insecure to allow booting from all over the place,
especially when selected by simple binary values written to raw drives.
My method actually works well to fix this. Better write the alternative
boot.config to the same known-good partition as the non-alternative (this
is only impossible if the known-good partition is read-only). Then the
simple binary value is not very robust, but all it does is select between
the known-good boot.config and the alternative one. Security is attained
by never pointing the alternative one to an insecure partition.
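
The selection step described above can be sketched in C. This is only an
illustration under stated assumptions, not boot2 code: it swaps real ECC
for a simple complement check (which detects corruption but cannot correct
it), and every name in it, including the nextboot_cell layout and the
/boot.config.alt path, is hypothetical. Any check failure falls back to the
known-good boot.config.

```c
#include <stdint.h>

/* Hypothetical targets: index 0 is the known-good side of the 2-cycle. */
enum boot_target { BOOT_KNOWN_GOOD = 0, BOOT_ALTERNATIVE = 1 };

/*
 * Hypothetical storage cell (e.g. two spare RTC/CMOS bytes): the cycle
 * index plus its bitwise complement as a cheap integrity check.
 */
struct nextboot_cell {
	uint8_t value;	/* cycle index, 0 or 1 */
	uint8_t check;	/* must equal (uint8_t)~value */
};

/* Build a cell for the given cycle index (only the low bit is used). */
static struct nextboot_cell
nextboot_encode(unsigned idx)
{
	struct nextboot_cell c;

	c.value = (uint8_t)(idx & 1u);
	c.check = (uint8_t)~c.value;
	return (c);
}

/*
 * Decode a cell.  A failed check or an out-of-range value is treated as
 * "nextboot inactive", so the known-good target is chosen.
 */
static enum boot_target
nextboot_decode(struct nextboot_cell c)
{
	if ((uint8_t)~c.value != c.check)
		return (BOOT_KNOWN_GOOD);
	if (c.value > 1)
		return (BOOT_KNOWN_GOOD);
	return (c.value == 1 ? BOOT_ALTERNATIVE : BOOT_KNOWN_GOOD);
}

/* Both files live on the same known-good partition (hypothetical names). */
static const char *
boot_config_path(enum boot_target t)
{
	return (t == BOOT_ALTERNATIVE ? "/boot.config.alt" : "/boot.config");
}
```

A caller would read the two bytes from wherever they are kept, pass them to
nextboot_decode(), and load the file named by boot_config_path(). Resetting
the cell to nextboot_encode(0) before loading the alternative keeps a failed
try from repeating on the following boot.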

Bruce

From: Gergely Czuczy <gergely.czuczy@harmless.hu>
To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org
Subject: Re: process killed: text file modification
Date: Thu, 9 Mar 2017 15:31:56 +0100
Message-ID: <45436522-77df-f894-0569-737a6a74958f@harmless.hu>
In-Reply-To: <646c1395-9482-b214-118c-01573243ae5a@harmless.hu>
References: <d4d04499-17f8-e3d7-181f-c8ee8285e32b@harmless.hu> <646c1395-9482-b214-118c-01573243ae5a@harmless.hu>

[+freebsd-fs]

On 2017. 03. 09. 14:20, Gergely Czuczy wrote:
> On 2017. 03. 09. 11:27, Gergely Czuczy wrote:
>> Hello,
>>
>> I'm trying to build a few things from ports on an rpi3, the ports
>> collection is mounted over NFS from another machine. When it's trying
>> to build pkg i'm getting the error message in syslog:
>>
>> rpi3 kernel: pid 4451 (sh), uid 0, was killed: text file modification
>>
>> The report to pkg@:
>> https://lists.freebsd.org/pipermail/freebsd-pkg/2017-March/002048.html
>>
>> In ports-mgmt/pkg's config.log It fails at the following entry:
>> configure:3726: checking whether we are cross compiling
>> configure:3734: cc -o conftest -O2 -pipe -Wno-error
>> -fno-strict-aliasing conftest.c >&5
>> configure:3738: $? = 0
>> configure:3745: ./conftest
>> configure:3749: $? = 137
>> configure:3756: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.0':
>> configure:3760: error: cannot run C compiled programs.
>> If you meant to cross compile, use `--host'.
>> See `config.log' for more details
>>
>> # uname -a
>> FreeBSD rpi3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r314949: Thu Mar 9
>> 08:58:46 CET 2017
>> aegir@marvin.harmless.hu:/tank/rpi3/crochet/work/obj/arm64.aarch64/tank/rpi3/src/sys/AEGIR
>> arm64
> So far, a few additions:
> Time is synced between the NFS server and the client.
> it's an open() call which is getting the kill, and it's not the file
> what's being opened, but the process executing it.
> Here's a simple code that reproduces it:
> #include <stdio.h>
>
> int main() {
>
>     FILE *f = fopen ("/bar", "w");
>
>     fclose(f);
>     return 0;
> }
>
> Conditions to reproduce it:
> - The resulting binary must be executed from the nfs mount
> - The binary must be built after mounting the NFS share.
>
> I haven't tried building it on a different host, I don't have access
> to multiple RPis. Also, if I build the binary, umount/remount the NFS
> mount point, which has the binary, execute it, then it works.
>
> I've also tried this with the raspbsd.org's image, I could reproduce
> it as well.
>
> Another interesting thing is, when I first booted the RPi up, the NFS
> server was a 10.2-STABLE, and later got updated to 11-STABLE. While it
> was 10.2 I've tried to build some port, and I don't remember having
> this issue.
>
> So, could someone please help me figure this out and fix it? This
> stuff should work pretty much.
>

So, this error message comes from here:
https://svnweb.freebsd.org/base/head/sys/fs/nfsclient/nfs_clbio.c?revision=314436&view=markup#l1674

It's the NFS_TIMESPEC_COMPARE(&np->n_mtime, &np->n_vattr.na_mtime)
comparison that fails; np should be the NFS node structure, from the
vnode's v_data, and n_vattr is the attribute cache. As far as I've seen,
these two are updated together, so I don't see from the code why they
might differ.

Could someone with more experience in the NFS code please take a look at
it?

-czg
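
For readers unfamiliar with this kind of check, here is a userland sketch
of what comparing two modification times amounts to. It is not the kernel
code: mtime_differs() is a hypothetical stand-in, and it assumes the
comparison is a plain field-by-field inequality on struct timespec. Under
that assumption, a difference in the nanosecond field alone is enough to
make the two timestamps count as different.

```c
#include <time.h>

/*
 * Hypothetical stand-in for the comparison discussed above: returns
 * nonzero when the two timestamps differ in either the seconds or the
 * nanoseconds field, and 0 when they are identical.
 */
static int
mtime_differs(const struct timespec *cached, const struct timespec *fresh)
{
	return (cached->tv_sec != fresh->tv_sec ||
	    cached->tv_nsec != fresh->tv_nsec);
}
```

With such a test, two timestamps that agree to the second but not to the
nanosecond are reported as different, so sub-second precision is one place
to look when the two values are expected to match.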