Date: Thu, 25 Apr 2013 10:57:29 -0700
From: Jeremy Chadwick
To: Guy Helmer
Cc: FreeBSD Stable
Subject: Re: FreeBSD 9: fdisk -It crashes kernel
List-Id: Production branch of FreeBSD source code

On Thu, Apr 25, 2013 at 11:58:42AM -0500, Guy Helmer wrote:
> On Apr 25, 2013, at 10:58 AM, Jeremy Chadwick wrote:
>
> > On Thu, Apr 25, 2013 at 09:06:49AM -0500, Guy Helmer wrote:
> >> Encountered a surprise when my disk resizing rc.d script caused FreeBSD 9.1-STABLE to crash. I used "fdisk -It ada0" to determine the available size of the disk (which happened to be the root disk), and on FreeBSD 9.1 the kernel comes crashing down:
> >>
> >> + fdisk -It ada0
> >> + /rescue/sed -En 's,.*start ([0-9]+).*size ([0-9]+).*,\1 + \2,p'
> >> vnode_pager_getpages: I/O read error
> >> vm_fault: pager read error, pid 65 (fdisk)
> >> pid 65 (fdisk), uid 0: exited on signal 11
> >> eval: arithmetic expression: expecting primary: ""
> >> Entropy harvesting: point_to_pointeval: date: Device not configured
> >> eval: df: Device not configured
> >> eval: dmesg: Device not configured
> >> cat: /bin/ls: Device not configured
> >> kickstart.
> >> eval: cannot open /etc/fstab: Device not configured
> >> eval: cannot open /etc/fstab: Device not configured
> >> eval: swapon: Device not configured
> >> Warning! No /etc/fstab: skipping disk checks
> >> fstab: /etc/fstab:0: Device not configured
> >>
> >> Fatal trap 12: page fault while in kernel mode
> >> cpuid = 1; apic id = 01
> >> fault virtual address   = 0x0
> >> fault code              = supervisor read, page not present
> >> instruction pointer     = 0x20:0xc0825fc4
> >> stack pointer           = 0x28:0xc5a088c8
> >> frame pointer           = 0x28:0xc5a08914
> >> code segment            = base 0x0, limit 0xfffff, type 0x1b
> >>                         = DPL 0, pres 1, def32 1, gran 1
> >> processor eflags        = interrupt enabled, resume, IOPL = 0
> >> current process         = 91 (mount)
> >> [ thread pid 91 tid 100056 ]
> >> Stopped at g_access+0x24: movl 0(%ebx),%eax
> >> db> where
> >> Tracing pid 91 tid 100056 td 0xc84c42f0
> >> g_access(c8481d34,0,1,1,0,…) at g_access+0x24/frame 0xc5a08914
> >> ffs_mount(c8481d34,c0d78380,2,c5a08c00,c829ae6c,…) at ffs_mount+0xf74/frame 0xc5a08a34
> >> vfs_donmount(c84c42f0,10000,0,c84cf200,c84cf200,…) at vfs_donmount+0x1423/frame 0xc5a08c24
> >> sys_nmount(c84c42f0,c5a08ccc,c5a08cc4,1010006,c5a08d08,…) at sys_nmount+0x7f/frame 0xc5a08c48
> >> syscall(c5a08d08) at syscall+0x443/frame 0xc508cfc
> >> Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xc5a08cfc
> >> --- syscall (378, FreeBSD ELF32, sys_nmount), eip = 0x480d5feb, esp = 0xbfbfce1c, ebp = 0xbfbfd378 ---
> >>
> >> I'll fix my script to not do this, but it seems odd that fdisk -It can make the disk "go away".
> >
> > Please provide a full, unmodified copy of your script.
> >
> > What's confusing to me is that after your sed call (which I don't even
> > understand, because it doesn't appear to be operating on anything except
> > stdin/stdout, and we don't know what that is -- again, show the script),
> > the kernel starts outputting indications that the root disk/filesystem
> > or its related metadata disappeared:
> >
> >> vnode_pager_getpages: I/O read error
> >> vm_fault: pager read error, pid 65 (fdisk)
> >> pid 65 (fdisk), uid 0: exited on signal 11
> >
> > Except the kernel stack trace indicates something called sys_nmount(),
> > which called vfs_donmount(), which called ffs_mount(), which calls
> > g_access().  All of those scream to me "someone tried to mount
> > something".  fdisk does not do mounting.
>
> Right, which is why I copied the entire screen output -- it appears to me that the rc scripts had stumbled on until the kernel panicked.
>
> > fdisk also shouldn't be writing to LBA 0 (the MBR) if you used -I -t.
> > I've been staring at fdisk.c for about 20 minutes now and I can't work
> > out a situation where -I -t would cause the MBR to be rewritten
> > actively.
> >
> > The only GEOM calls I see in fdisk.c that would get called are
> > g_device_path(), g_open(), and g_close().  Actual device I/O uses read()
> > and write() (only in write_s0() which shouldn't be called).
> >
> > Furthermore, GEOM has foot-shooting-prevention mechanisms in place (I'm
> > talking about kern.geom.debugflags) to keep LBA 0 from being modified.
> > Is your script setting that sysctl to 16/0x10 blindly?  Ahem.
>
> No. The script is intended only to work for drives other than the one containing the boot partition.
>
> > It would also help if you could state exactly what 9.1-STABLE source
> > you're using; if using svn provide revision (rXXXXXX), else provide
> > uname -a output.
>
> rev 249788
>
> > Finally: I would suggest using gpart(8) instead going forward.
> > This is a separate recommendation though; if somehow I'm overlooking
> > something in fdisk.c where writes to LBA 0 really do happen, then that
> > needs to get fixed.  But gpart(8) is what you should use in general
> > these days anyway.
>
> Seems like gpart was giving me some frustration with earlier versions of FreeBSD (7, I think) so I went with fdisk instead. Might work OK now...
>
> I have included the full script below.
>
> { snipping for brevity; for reference, see this url: }
> { http://lists.freebsd.org/pipermail/freebsd-stable/2013-April/073234.html }

Thanks for this.  I could practically write a book on what's going on
here.  Rather than me spending hours reverse-engineering this, you're
going to need to step up to the plate and see if you can figure out what
exactly triggers the issue.

I will give you this analysis about fdisk -I -t:

When -I is specified, I_flag=1.

When -t is specified, t_flag=1, and also v_flag=1.

Function open_disk(), when fdisk is used with the -I option, will call
g_open() with the read-write flag set to 1.  Whether or not this
succeeds I don't know (and if it fails, but only with EPERM, then it
retries in read-only mode silently).  The -I flag correlates with the
I_flag variable (do not confuse this with i_flag):

    static int
    open_disk(int flag)
    {
        int rwmode;

        /* Write mode if one of these flags are set. */
        rwmode = (a_flag || I_flag || B_flag || flag);
        fd = g_open(disk, rwmode);
        /* If the mode fails, try read-only if we didn't. */
        if (fd == -1 && errno == EPERM && rwmode)
            fd = g_open(disk, 0);
        if (fd == -1 && errno == ENXIO)
            return -2;
        if (fd == -1) {
            warnx("can't open device %s", disk);
            return -1;
        }
        if (get_params() == -1) {
            warnx("can't get disk parameters on %s", disk);
            return -1;
        }
        return fd;
    }

Variable fd is global.

After this call to open_disk(), read_disk() is used, but that's only
doing read operations on fd.
After this, the if (I_flag) code gets run.  This calls read_s0(),
reset_boot() (sounds ominous but isn't), and dos().

read_s0() does not issue any write I/O to fd, or call any functions that
issue write I/O.

reset_boot() just resets the in-memory-copy of the partition table.  It
does not modify anything on disk.

dos() does not do any I/O at all.

At this point, if v_flag is set (which it is), print_s0() gets run.
print_s0() calls print_params(), which simply prints out the
in-memory-copy of C/H/S from the disk label and so on.  No file I/O is
done.  Once that's done, it calls print_part() on each partition, which
just outputs all the details -- again, no file I/O is done.

Finally, at this stage, if t_flag ISN'T set, then write_s0() gets run.
In this case write_s0() does not get called because t_flag=1.  FYI,
write_s0() is what does the actual write I/O to LBA 0/MBR.

After that, exit(0) is called.

So even though -I -t calls g_open() with the read-write flag set, I
don't see anything that indicates writing to LBA 0/MBR happens.

So I do not see how fdisk -I -t could cause this situation.  fdisk -v,
maybe, but again, you'll need to do the testing.

Now I have a question for you: how did you manage to get this output?

> >> + fdisk -It ada0
> >> + /rescue/sed -En 's,.*start ([0-9]+).*size ([0-9]+).*,\1 + \2,p'

Because this looks like /bin/sh -x output, but I need to know if that's
the case or not.

/bin/sh -x claims to echo commands to stderr ***before*** they're
executed.
So I'm then left wondering why we don't see output equivalent to this
line:

    eval $(fdisk -v $DISK | $SED -En 's,.*start ([0-9]+).*size ([0-9]+).*,curroff=\1 currsize=\2,p')

Instead, we start seeing this:

> >> vnode_pager_getpages: I/O read error
> >> vm_fault: pager read error, pid 65 (fdisk)
> >> pid 65 (fdisk), uid 0: exited on signal 11
> >> eval: arithmetic expression: expecting primary: ""
> >> Entropy harvesting: point_to_pointeval: date: Device not configured
> >> eval: df: Device not configured
> >> eval: dmesg: Device not configured
> >> cat: /bin/ls: Device not configured

Your script has only 1 eval statement (and eval is very very dangerous.
I cannot stress this enough.  If you ever think you need eval in shell
scripts, you probably don't.)

Your script does not call df, dmesg, date, or /bin/ls.  So why are these
mentioned?  And "Entropy harvesting" comes from dmesg/the kernel message
buffer too; how is that ending up there?

Possibly the eval: error line only gets output by sh ***after*** all the
preceding [broken] stuff gets run.  But I'm also confused, because there
isn't anything arithmetic-oriented in your eval line, so why is it
talking about arithmetic expressions?  You don't use expr either, so the
only math operation comes BEFORE all of that, specifically here:

    physsize=$(($(fdisk -It $DISK | $SED -En 's,.*start ([0-9]+).*size ([0-9]+).*,\1 + \2,p')))

My gut feeling here is that something "unexpected" happened and your
script went totally haywire as a result (probably some unexpected output
that got turned into something you didn't expect).  My favourite is
seeing asterisk/wildcards expanded to pull in all the filenames in $cwd.

I'm sorry to tell you, but there is a point when writing shell scripts
becomes unreliable/unmanageable/results in too much risk, and it is time
to consider writing such things in an actual programming language
(preferably one without reliance on CLI tools, but real APIs).
I know you don't need to hear that right now, but it's true.

See if you can work out exactly what line begins causing problems for
you.  My guess is that it's the result of fdisk segfaulting, but I'm
honestly not sure because the above output doesn't entirely make sense.

Let us know what you determine/find out.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
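
P.S.  A minimal sketch of the "no eval, and check before you do math"
point made above.  It is not taken from the original script.  The
function name sum_start_size and the sample input line are hypothetical.
It assumes the fdisk output contains a line of the form
"start N, size M", which is what the sed expression in the thread
matches.  If the pipeline yields nothing (for example because fdisk
died), the function returns non-zero instead of handing an empty string
to $(( )).

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): read fdisk-style text
# on stdin, take the first "start N ... size M" pair, and print N + M.
sum_start_size() {
    # Same match as the sed expression in the thread, but the two numbers
    # are emitted separated by a space; head keeps only the first match.
    pair=$(sed -En 's,.*start ([0-9]+).*size ([0-9]+).*,\1 \2,p' | head -n 1)

    # Split into positional parameters. The regex only lets digits through,
    # so $pair is either empty or exactly two numeric words.
    set -- $pair
    [ $# -eq 2 ] || return 1

    echo $(($1 + $2))
}

# Illustrative input only: one line in the shape the sed expression expects.
sample='    start 63, size 8385867 (4094 Meg), flag 80 (active)'

if physsize=$(printf '%s\n' "$sample" | sum_start_size); then
    echo "physsize=$physsize"
else
    echo "could not parse fdisk output" >&2
fi
```

In a real script the printf would be replaced by the fdisk call, and the
else branch is where the script should stop rather than carry on with an
empty value.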