Date: Thu, 25 Apr 2013 13:49:24 -0500 From: Guy Helmer <guy.helmer@gmail.com> To: Jeremy Chadwick <jdc@koitsu.org> Cc: FreeBSD Stable <freebsd-stable@freebsd.org> Subject: Re: FreeBSD 9: fdisk -It crashes kernel Message-ID: <064339D8-0896-4C4E-89FA-A01E31B15E0C@gmail.com> In-Reply-To: <20130425175729.GA10142@icarus.home.lan> References: <80F41679-9C3A-4E61-8AAD-403410344C32@gmail.com> <20130425155818.GA8454@icarus.home.lan> <257D766B-A296-43CD-A2B9-5F70A95A07A2@gmail.com> <20130425175729.GA10142@icarus.home.lan>
next in thread | previous in thread | raw e-mail | index | archive | help
On Apr 25, 2013, at 12:57 PM, Jeremy Chadwick <jdc@koitsu.org> wrote: > On Thu, Apr 25, 2013 at 11:58:42AM -0500, Guy Helmer wrote: >> On Apr 25, 2013, at 10:58 AM, Jeremy Chadwick <jdc@koitsu.org> wrote: >>=20 >>> On Thu, Apr 25, 2013 at 09:06:49AM -0500, Guy Helmer wrote: >>>> Encountered a surprise when my disk resizing rc.d script caused = FreeBSD 9.1-STABLE to crash. I used "fdisk -It ada0" to determine what = the available size of the disk (which happened to be the root disk), and = on FreeBSD 9.1 the kernel comes crashing down: >>>>=20 >>>> + fdisk -It ada0 >>>> + /rescue/sed -En 's,.*start ([0-9]+).*size ([0-9]+).*,\1 + \2,p' >>>> vnode_pager_getpages: I/O read error >>>> vm_fault: pager read error, pid 65 (fdisk) >>>> pid 65 (fdisk), uid 0: exited on signal 11 >>>> eval: arithmetic expression: expecting primary: "" >>>> Entropy harvesting: point_to_pointeval: date: Device not configured >>>> eval: df: Device not configured >>>> eval: dmesg: Device not configured >>>> cat: /bin/ls: Device not configured >>>> kickstart. >>>> eval: cannot open /etc/fstab: Device not configured >>>> eval: cannot open /etc/fstab: Device not configured >>>> eval: swapon: Device not configured >>>> Warning! No /etc/fstab: skipping disk checks >>>> fstab: /etc/fstab:0: Device not configured >>>>=20 >>>> Fatal trap 12: page fault while in kernel mode >>>> cpuid =3D 1; apic id =3D 01 >>>> fault virtual address =3D 0x0 >>>> fault code =3D supervisor read, page not = present >>>> instruction pointer =3D 0x20:0xc0825fc4 >>>> stack pointer =3D 0x28:0xc5a088c8 >>>> frame pointer =3D 0x28:0xc5a08914 >>>> code segment =3D base 0x0, limit 0xfffff, type 0x1b >>>> =3D DLP 0, pres 1, def32 1, gran = 1 >>>> processor eflags =3D interrupt enabled, resume, IOPL =3D 0 >>>> current process =3D 91 (mount) >>>> [ thread pid 91 tid 100056 ] >>>> Stopped at g_access+0x24: mlvl 0(%ebx),%eax >>>> db> where >>>> Tracing pid 91 tid 100056 td 0xc84c42f0 >>>> g_access(c8481d34,0,1,1,0,=85) at g_access+0x24/frame 0xc5a08914 >>>> ffs_mount(c8481d34,c0d78380,2,c5a08c00,c829ae6c,=85) af = ffs_mount+0xf74/frame 0xc5a08a34 >>>> vfs_donmount(c84c42f0,10000,0,c84cf200,c84cf200,=85) at = vfs_donmount+0x1423/frame 0xc5a08c24 >>>> sys_nmount(c84c42f0,c5a08ccc,c5a08cc4,1010006,c5a08d08,=85) at = sys_nmount+0x7f/frame 0xc5a08c48 >>>> syscall(c5a08d08) at syscall+0x443/frame 0xc508cfc >>>> Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xc5a08cfc >>>> --- syscall (378, FreeBSD ELF32, sys_nmount), eip =3D 0x480d5feb, = esp =3D 0xbfbfce1c, ebp =3D 0xbfbfd378 --- >>>>=20 >>>> I'll fix my script to not do this, but it seems odd that fdisk -It = can make the disk "go away". >>>=20 >>> Please provide a full, unmodified copy of your script. >>>=20 >>> What's confusing to me is that after your sed call (which I don't = even >>> understand, because it doesn't appear to be operating on anything = except >>> stdin/stdout, and we don't know what that is -- again, show the = script), >>> the kernel starts outputting indications that the root = disk/filesystem >>> or its related metadata disappeared: >>>=20 >>>> vnode_pager_getpages: I/O read error >>>> vm_fault: pager read error, pid 65 (fdisk) >>>> pid 65 (fdisk), uid 0: exited on signal 11 >>>=20 >>> Except the kernel stack trace indicates something called = sys_nmount(), >>> which called vfs_donmount(), which called ffs_mount(), which calls >>> g_access(). All of those scream to me "someone tried to mount >>> something". fdisk does not do mounting. >>=20 >> Right, which is why I copied the entire screen output -- it appears = to me that the rc scripts had stumbled on until the kernel panicked. >>=20 >>>=20 >>> fdisk also shouldn't be writing to LBA 0 (the MBR) if you used -I = -t. >>> I've been staring at fdisk.c for about 20 minutes now and I can't = work >>> out a situation where -I -t would cause the MBR to be rewritten >>> actively. >>>=20 >>> The only GEOM calls I see in fdisk.c that would get called are >>> g_device_path(), g_open(), and g_close(). Actual device I/O uses = read() >>> and write() (only in write_s0() which shouldn't be called). >>>=20 >>> Furthermore, GEOM has foot-shooting-prevention mechanisms in place = (I'm >>> talking about kern.geom.debugflags) to keep LBA 0 from being = modified. >>> Is your script setting that sysctl to 16/0x10 blindly? Ahem. >>=20 >> No. The script is intended only to work for drives other than the one = containing the boot partition. >>=20 >>>=20 >>> It would also help if you could state exactly what 9.1-STABLE source >>> you're using; if using svn provide revision (rXXXXXX), else provide >>> uname -a output. >>=20 >> rev 249788 >>=20 >>>=20 >>> Finally: I would suggest using gpart(8) instead going forward. This = is >>> a separate recommendation though; if somehow I'm overlooking = something >>> in fdisk.c where writes to LBA 0 really do happen, then that needs = to >>> get fixed. But gpart(8) is what you should use in general these = days >>> anyway. >>>=20 >>=20 >> Seems like gpart was giving me some frustration with earlier versions = of FreeBSD (7, I think) so I went with fdisk instead. Might work OK = now... >>=20 >> I have included the full script below. >>=20 >> { snipping for brevity; for reference, see this url: } >> { = http://lists.freebsd.org/pipermail/freebsd-stable/2013-April/073234.html = } >=20 > Thanks for this. >=20 > I could practically write a book on what's going on here. Rather than > me spend hours of time reverse-engineering this, you're going to need = to > step up to the plate and see if you can figure out what exactly = triggers > the issue. >=20 > I will give you this analysis about fdisk -I -t: >=20 > When -I is specified, I_flag=3D1. >=20 > When -t is specified, v_flag=3D1, and also v_flag=3D1. >=20 > Function open_disk(), when fdisk is used with the -I option, will call > g_open() with the read-write flag set to 1. Whether or not this > succeeds I don't know (and if it fails, but only with EPERM, then it > retries in read-only mode silently). The -I flag correlates with the > I_flag variable (do not confuse this with i_flag): >=20 > 726 static int > 727 open_disk(int flag) > 728 { > 729 int rwmode; > 730 > 731 /* Write mode if one of these flags are set. */ > 732 rwmode =3D (a_flag || I_flag || B_flag || flag); > 733 fd =3D g_open(disk, rwmode); > 734 /* If the mode fails, try read-only if we didn't. */ > 735 if (fd =3D=3D -1 && errno =3D=3D EPERM && rwmode) > 736 fd =3D g_open(disk, 0); > 737 if (fd =3D=3D -1 && errno =3D=3D ENXIO) > 738 return -2; > 739 if (fd =3D=3D -1) { > 740 warnx("can't open device %s", disk); > 741 return -1; > 742 } > 743 if (get_params() =3D=3D -1) { > 744 warnx("can't get disk parameters on %s", disk); > 745 return -1; > 746 } > 747 return fd; > 748 } >=20 > Variable fd is global. >=20 > After this call to open_disk(), read_disk() is used, but that's only > doing read operations on fd. >=20 > After this, the if (I_flag) code gets run. This calls read_s0(), > reset_boot() (sounds ominous but isn't), and dos(). >=20 > read_s0() does not issue any write I/O to fd, or call any functions = that > issue write I/O. >=20 > reset_boot() just resets the in-memory-copy of the partition table. > It does not modify anything on disk. >=20 > dos() does not do any I/O at all. >=20 > At this point, if v_flag is set (which it is), print_s0() gets run. >=20 > print_s0() calls print_params(), which simply prints out the > in-memory-copy of C/H/S from the disk label and so on. No file I/O is > done. Once that's done, it calls print_part() on each partition, > which just outputs all the details -- again, no file I/O is done. >=20 > Finally, at this stage, if t_flag ISN'T set, then write_s0() gets run. > In this case write_s0() does not get called because t_flag=3D1. FYI, > write_s0() is what does the actual write I/O to LBA 0/MBR. After = that, > exit(0) is called. >=20 > So even though -I -t calls g_open() with the read-write flag set, I > don't see anything that indicates writing to LBA 0/MBR happens. >=20 > So I do not see how fdisk -I -t could cause this situation. >=20 > fdisk -v, maybe, but again, you'll need to do the testing. Thanks for the analysis.=20 > Now I have a question for you: how did you manage to get this output? >=20 >>>> + fdisk -It ada0 >>>> + /rescue/sed -En 's,.*start ([0-9]+).*size ([0-9]+).*,\1 + \2,p' >=20 > Because this looks like /bin/sh -x output, but I need to know if = that's > the case or not. Yes, I inserted "set -x" at the top of the function to get this output. > /bin/sh -x claims to echo commands to stderr ***before*** they're > executed. >=20 > So I'm then left wondering why we don't see output that equates to the > equivalent of this line: >=20 > eval $(fdisk -v $DISK | $SED -En 's,.*start ([0-9]+).*size = ([0-9]+).*,curroff=3D\1 currsize=3D\2,p') >=20 > Instead, we start seeing this: >=20 >>>> vnode_pager_getpages: I/O read error >>>> vm_fault: pager read error, pid 65 (fdisk) >>>> pid 65 (fdisk), uid 0: exited on signal 11 >>>> eval: arithmetic expression: expecting primary: "" >>> Entropy harvesting: point_to_pointeval: date: Device not configured >>> eval: df: Device not configured >>> eval: dmesg: Device not configured >>> cat: /bin/ls: Device not configured >=20 > Your script has only 1 eval statement (and eval is very very = dangerous. > I cannot stress this enough. If you ever think you need eval in shell > scripts, you probably don't.) >=20 > Your script does not call df, dmesg, date, or /bin/ls. So why are = these > mentioned? And "Entropy harvesting" comes from dmesg/the kernel = message > buffer too, how is that ending up there? Hmm, maybe the quoting inside the $() is not working the way I expected. = Perhaps the * symbols are being expanded as wildcards by the shell as a = result of the eval. > Possibly the eval: error line only gets output by sh ***after*** all = the > preceding [broken] stuff gets run. >=20 > But I'm also confused, because there isn't anything = arithmetic-oriented > in your eval line, so why is it talking about arithmetic expressions? > You don't use expr either, so the only math operation comes BEFORE all > of that, specifically here: >=20 > physsize=3D$(($(fdisk -It $DISK | $SED -En 's,.*start = ([0-9]+).*size ([0-9]+).*,\1 + \2,p'))) >=20 > My gut feeling here is that something "unexpected" happened and your > script went totally haywire as a result (probably some unexpected = output > that got turned into something you didn't expect). My favourite is > seeing asterisk/wildcards expanded to pull in all the filenames in = $cwd. >=20 > I'm sorry to tell you, but there is a point when writing shell scripts > becomes unreliable/unmanageable/results in too much risk, and is time = to > consider writing such things in an actual programming language > (preferably one without reliance on CLI tools, but real APIs). I know > you don't need to hear that right now, but it's true. There are few tools available at the time this code needs to run -- note = the use of sed from /rescue as an example. The only alternative would = probably be C code. Still seems odd that userland code could crash the = kernel this way. > See if you can work out exactly what line begins causing problems for > you. My guess is that it's the result of fdisk segfaulting, but I'm > honestly not sure because the above output doesn't make entire sense. >=20 > Let us know what you determine/find out. Thanks for your time, Jeremy. Guy
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?064339D8-0896-4C4E-89FA-A01E31B15E0C>