Date: Wed, 1 Jun 2011 02:56:10 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Holger Kipp <Holger.Kipp@alogis.com>
Cc: "stable@freebsd.org" <stable@freebsd.org>, "mav@freebsd.org" <mav@freebsd.org>
Subject: Re: 8-STABLE won't boot with ZFSv28
Message-ID: <20110601095610.GA20255@icarus.home.lan>
In-Reply-To: <814C9E9472FDCC40AAC3FC95A2D67E3B0BD88DC0@msx3.exchange.alogis.com>
References: <814C9E9472FDCC40AAC3FC95A2D67E3B0BD88C69@msx3.exchange.alogis.com> <20110601085454.GA19434@icarus.home.lan> <814C9E9472FDCC40AAC3FC95A2D67E3B0BD88DC0@msx3.exchange.alogis.com>

On Wed, Jun 01, 2011 at 09:26:05AM +0000, Holger Kipp wrote:
> Jeremy Chadwick [freebsd@jdc.parodius.com] wrote on 01 June 2011 10:54
>
> >On Wed, Jun 01, 2011 at 08:23:19AM +0000, Holger Kipp wrote:
> >> I have a very irritating problem with 8-STABLE and ZFSv28
> >>
> >> I upgraded to 8-STABLE as of yesterday (31.05.2011),
> >> downloaded stable-8-zfsv28-20110521.patch.xz
> >> and applied the patch using
> >>
> >> cd /usr/src
> >> patch -E -p0 < /path/to/patchfile
> >> make buildworld
> >> make buildkernel KERNCONF=foo
> >> make installkernel KERNCONF=foo
> >> make installworld
> >> mergemaster
> >>
> >> which all went smoothly.
> >>
> >> After reboot, I only got
> >> unknown: WARNING - ATAPI_IDENTIFY requeued due to channel reset LBA=0
> >> all the time, and then after an hour or so (wasn't on site),
> >> system gave
> >> Fatal trap 12: page fault while in kernel mode
> >> cpuid = 0; apic id = 00
> >> fault virtual address = 0x8
> >> fault code = supervisor read data, page not present
> >> instruction pointer = 0x20:0xffffffff80252301
> >> stack pointer = 0x28:0xffffff80000a7ac0
> >> frame pointer = 0x28:0xffffff80000a7b00
> >> code segment = base 0x0, limit 0xfffff, type 0x1b
> >> = DPL 0, pres 1, long 1, def32 0, gran 1
> >> processor eflags = interrupt enabled, resume, IOPL = 0
> >> current process = 0 (thread taskq)
> >> trap number = 12
> >> panic: page fault
> >> cpuid = 0
> >> Uptime: 1h0m13s
> >> Cannot dump. Device not defined or unavailable.
> >> Automatic reboot in 15 seconds - press a key on the console to abort
> >>
> >> Needless to say the system did not reboot. Had to powercycle.
> >>
> >> Then always got the
> >> unknown: WARNING - ATAPI_IDENTIFY requeued due to channel reset LBA=0
> >> error about once per second.
> >>
> >> Have now used a fixit-disk to change back to the old kernel:
> >> FreeBSD 8.2-STABLE #12: Mon Apr 18 12:48:56 CEST 2011
> >> and rebooted.
> >> Now zfs claims to be v28, current storage pool is at 15. I'd love to
> >> try ZFSv28, but with the old kernel I don't think
> >> this is a good idea - but with the new kernel it seems I can't
> >> even boot properly.
> >> Any suggestions as to how to proceed?
>
> > I think this is much more likely related to an ATA/ATAPI-related change
> > that was committed on April 17th recently and is not related to ZFSv28.
> > Please see this thread:
> >
> > * 2011/05/29 -- ICH9 panic/instability on recent kernel
> > http://lists.freebsd.org/pipermail/freebsd-stable/2011-May/thread.html#62804
> >
> > Holger, can you please provide the following two things?
> >
> > 1) Output from "pciconf -lvcb".
>
> That's an easy one:
>
> hostb0@pci0:0:0:0: class=0x060000 card=0xd28015d9 chip=0x29f08086 rev=0x01 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '3200 Chipset (Bearlake) Processor to I/O Controller'
>     class = bridge
>     subclass = HOST-PCI
>     cap 09[e0] = vendor (length 12) Intel cap 9 version 1
> pcib1@pci0:0:1:0: class=0x060400 card=0xd28015d9 chip=0x29f18086 rev=0x01 hdr=0x01
>     vendor = 'Intel Corporation'
>     device = '3200 Chipset (Bearlake) PCIe Root Port 1'
>     class = bridge
>     subclass = PCI-PCI
>     cap 0d[88] = PCI Bridge card=0xd28015d9
>     cap 01[80] = powerspec 3 supports D0 D3 current D0
>     cap 05[90] = MSI supports 1 message
>     cap 10[a0] = PCI-Express 2 root port max data 128(128) link x8(x16)
>     ecap 0002[100] = VC 1 max VC0
>     ecap 0005[140] = unknown 1
> uhci0@pci0:0:26:0: class=0x0c0300 card=0xd28015d9 chip=0x29378086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) USB Universal Host Controller'
>     class = serial bus
>     subclass = USB
>     bar [20] = type I/O Port, range 32, base 0x1820, size 32, enabled
>     cap 13[50] = PCI Advanced Features: FLR TP
> uhci1@pci0:0:26:1: class=0x0c0300 card=0xd28015d9 chip=0x29388086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) USB Universal Host Controller'
>     class = serial bus
>     subclass = USB
>     bar [20] = type I/O Port, range 32, base 0x1840, size 32, enabled
>     cap 13[50] = PCI Advanced Features: FLR TP
> uhci2@pci0:0:26:2: class=0x0c0300 card=0xd28015d9 chip=0x29398086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) USB Universal Host Controller'
>     class = serial bus
>     subclass = USB
>     bar [20] = type I/O Port, range 32, base 0x1860, size 32, enabled
>     cap 13[50] = PCI Advanced Features: FLR TP
> ehci0@pci0:0:26:7: class=0x0c0320 card=0xd28015d9 chip=0x293c8086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) USB2 Enhanced Host Controller'
>     class = serial bus
>     subclass = USB
>     bar [10] = type Memory, range 32, base 0xd9001000, size 1024, enabled
>     cap 01[50] = powerspec 2 supports D0 D3 current D0
>     cap 0a[58] = EHCI Debug Port at offset 0xa0 in map 0x14
>     cap 13[98] = PCI Advanced Features: FLR TP
> pcib4@pci0:0:28:0: class=0x060400 card=0xd28015d9 chip=0x29408086 rev=0x02 hdr=0x01
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) PCIe Root Port 1'
>     class = bridge
>     subclass = PCI-PCI
>     cap 10[40] = PCI-Express 1 root port max data 128(128) link x4(x4)
>     cap 05[80] = MSI supports 1 message
>     cap 0d[90] = PCI Bridge card=0xd28015d9
>     cap 01[a0] = powerspec 2 supports D0 D3 current D0
>     ecap 0002[100] = VC 1 max VC0
>     ecap 0005[180] = unknown 1
> pcib5@pci0:0:28:4: class=0x060400 card=0xd28015d9 chip=0x29488086 rev=0x02 hdr=0x01
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) PCIe Root Port 5'
>     class = bridge
>     subclass = PCI-PCI
>     cap 10[40] = PCI-Express 1 root port max data 128(128) link x1(x1)
>     cap 05[80] = MSI supports 1 message
>     cap 0d[90] = PCI Bridge card=0xd28015d9
>     cap 01[a0] = powerspec 2 supports D0 D3 current D0
>     ecap 0002[100] = VC 1 max VC0
>     ecap 0005[180] = unknown 1
> pcib6@pci0:0:28:5: class=0x060400 card=0xd28015d9 chip=0x294a8086 rev=0x02 hdr=0x01
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) PCIe Root Port 6'
>     class = bridge
>     subclass = PCI-PCI
>     cap 10[40] = PCI-Express 1 root port max data 128(128) link x1(x1)
>     cap 05[80] = MSI supports 1 message
>     cap 0d[90] = PCI Bridge card=0xd28015d9
>     cap 01[a0] = powerspec 2 supports D0 D3 current D0
>     ecap 0002[100] = VC 1 max VC0
>     ecap 0005[180] = unknown 1
> uhci3@pci0:0:29:0: class=0x0c0300 card=0xd28015d9 chip=0x29348086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) USB Universal Host Controller'
>     class = serial bus
>     subclass = USB
>     bar [20] = type I/O Port, range 32, base 0x1880, size 32, enabled
>     cap 13[50] = PCI Advanced Features: FLR TP
> uhci4@pci0:0:29:1: class=0x0c0300 card=0xd28015d9 chip=0x29358086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) USB Universal Host Controller'
>     class = serial bus
>     subclass = USB
>     bar [20] = type I/O Port, range 32, base 0x18a0, size 32, enabled
>     cap 13[50] = PCI Advanced Features: FLR TP
> uhci5@pci0:0:29:2: class=0x0c0300 card=0xd28015d9 chip=0x29368086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) USB Universal Host Controller'
>     class = serial bus
>     subclass = USB
>     bar [20] = type I/O Port, range 32, base 0x18c0, size 32, enabled
>     cap 13[50] = PCI Advanced Features: FLR TP
> ehci1@pci0:0:29:7: class=0x0c0320 card=0xd28015d9 chip=0x293a8086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) USB2 Enhanced Host Controller'
>     class = serial bus
>     subclass = USB
>     bar [10] = type Memory, range 32, base 0xd9001400, size 1024, enabled
>     cap 01[50] = powerspec 2 supports D0 D3 current D0
>     cap 0a[58] = EHCI Debug Port at offset 0xa0 in map 0x14
>     cap 13[98] = PCI Advanced Features: FLR TP
> pcib7@pci0:0:30:0: class=0x060401 card=0xd28015d9 chip=0x244e8086 rev=0x92 hdr=0x01
>     vendor = 'Intel Corporation'
>     device = '82801 Family (ICH2/3/4/5/6/7/8/9,63xxESB) Hub Interface to PCI Bridge'
>     class = bridge
>     subclass = PCI-PCI
>     cap 0d[50] = PCI Bridge card=0xd28015d9
> isab0@pci0:0:31:0: class=0x060100 card=0xd28015d9 chip=0x29168086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82801IR (ICH9R) LPC Interface Controller'
>     class = bridge
>     subclass = PCI-ISA
>     cap 09[e0] = vendor (length 12) Intel cap 1 version 0
>     features: SATA RAID-5, 4 PCI-e x1 slots
> atapci0@pci0:0:31:2: class=0x01018a card=0xd28015d9 chip=0x29208086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) 4 port Serial ATA Storage Controller 1'
>     class = mass storage
>     subclass = ATA
>     bar [10] = type I/O Port, range 32, base 0x1f0, size 8, enabled
>     bar [14] = type I/O Port, range 32, base 0x3f4, size 1, enabled
>     bar [18] = type I/O Port, range 32, base 0x170, size 8, enabled
>     bar [1c] = type I/O Port, range 32, base 0x374, size 1, enabled
>     bar [20] = type I/O Port, range 32, base 0x1c10, size 16, enabled
>     bar [24] = type I/O Port, range 32, base 0x1c00, size 16, enabled
>     cap 01[70] = powerspec 3 supports D0 D3 current D0
>     cap 13[b0] = PCI Advanced Features: FLR TP
> none0@pci0:0:31:3: class=0x0c0500 card=0xd28015d9 chip=0x29308086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = 'Intel(R) ICH9 Family SMBus Controller working fine with http://download.cnet.com/Chipset-Driver-Inte (8086)'
>     class = serial bus
>     subclass = SMBus
>     bar [10] = type Memory, range 64, base 0xd9001800, size 256, enabled
>     bar [20] = type I/O Port, range 32, base 0x1100, size 32, enabled
> atapci1@pci0:0:31:5: class=0x010185 card=0xd28015d9 chip=0x29268086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) 2 port Serial ATA Storage Controller 2'
>     class = mass storage
>     subclass = ATA
>     bar [10] = type I/O Port, range 32, base 0x1c68, size 8, enabled
>     bar [14] = type I/O Port, range 32, base 0x1c5c, size 4, enabled
>     bar [18] = type I/O Port, range 32, base 0x1c60, size 8, enabled
>     bar [1c] = type I/O Port, range 32, base 0x1c58, size 4, enabled
>     bar [20] = type I/O Port, range 32, base 0x1c30, size 16, enabled
>     bar [24] = type I/O Port, range 32, base 0x1c20, size 16, enabled
>     cap 01[70] = powerspec 3 supports D0 D3 current D0
>     cap 13[b0] = PCI Advanced Features: FLR TP
> none1@pci0:0:31:6: class=0x118000 card=0x000015d9 chip=0x29328086 rev=0x02 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82801IB/IR/IH (ICH9 Family) Thermal Subsystem'
>     class = dasp
>     bar [10] = type Memory, range 64, base 0xd9000000, size 4096, enabled
>     cap 01[50] = powerspec 3 supports D0 D3 current D0
> pcib2@pci0:1:0:0: class=0x060400 card=0x00000000 chip=0x03298086 rev=0x09 hdr=0x01
>     vendor = 'Intel Corporation'
>     device = 'PCI Express-to-PCI Express Bridge A (6700PXH)'
>     class = bridge
>     subclass = PCI-PCI
>     cap 10[44] = PCI-Express 1 PCI bridge max data 128(256) link x8(x8)
>     cap 05[5c] = MSI supports 1 message, 64 bit
>     cap 01[6c] = powerspec 2 supports D0 D3 current D0
>     cap 07[d8] = PCI-X bridge
>     ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
>     ecap 0004[300] = unknown 1
> ioapic0@pci0:1:0:1: class=0x080020 card=0xd28015d9 chip=0x03268086 rev=0x09 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '6700/6702PXH I/OxAPIC Interrupt Controller A'
>     class = base peripheral
>     subclass = interrupt controller
>     bar [10] = type Memory, range 32, base 0xd8900000, size 4096, enabled
>     cap 10[44] = PCI-Express 1 endpoint max data 256(256) link x8(x8)
>     cap 01[6c] = powerspec 2 supports D0 D3 current D0
> pcib3@pci0:1:0:2: class=0x060400 card=0x00000000 chip=0x032a8086 rev=0x09 hdr=0x01
>     vendor = 'Intel Corporation'
>     device = 'PCI Express-to-PCI Express Bridge B (6700PXH)'
>     class = bridge
>     subclass = PCI-PCI
>     cap 10[44] = PCI-Express 1 PCI bridge max data 128(256) link x8(x8)
>     cap 05[5c] = MSI supports 1 message, 64 bit
>     cap 01[6c] = powerspec 2 supports D0 D3 current D0
>     cap 07[d8] = PCI-X bridge
>     ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
>     ecap 0004[300] = unknown 1
> ioapic1@pci0:1:0:3: class=0x080020 card=0xd28015d9 chip=0x03278086 rev=0x09 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = 'I/OxAPIC Interrupt Controller B (6700PXH)'
>     class = base peripheral
>     subclass = interrupt controller
>     bar [10] = type Memory, range 32, base 0xd8901000, size 4096, enabled
>     cap 10[44] = PCI-Express 1 endpoint max data 256(256) link x8(x8)
>     cap 01[6c] = powerspec 2 supports D0 D3 current D0
> twe0@pci0:2:1:0: class=0x010400 card=0x100113c1 chip=0x100113c1 rev=0x01 hdr=0x00
>     vendor = '3ware Inc'
>     device = 'ATA-133 Storage Controller (7000/8000 series)'
>     class = mass storage
>     subclass = RAID
>     bar [10] = type I/O Port, range 32, base 0x2000, size 16, enabled
>     bar [14] = type Memory, range 32, base 0xd8800000, size 16, enabled
>     bar [18] = type Memory, range 32, base 0xd8000000, size 8388608, enabled
>     cap 01[40] = powerspec 1 supports D0 D1 D3 current D0
> isp0@pci0:5:0:0: class=0x0c0400 card=0x01371077 chip=0x24321077 rev=0x03 hdr=0x00
>     vendor = 'QLogic Corporation'
>     device = 'Dual Channel 4G PCIe Fibre Channel Adapter (ISP2432)'
>     class = serial bus
>     subclass = Fibre Channel
>     bar [10] = type I/O Port, range 32, base 0x3000, size 256, enabled
>     bar [14] = type Memory, range 64, base 0xd8b00000, size 16384, enabled
>     cap 01[44] = powerspec 2 supports D0 D3 current D0
>     cap 10[4c] = PCI-Express 1 endpoint max data 128(1024) link x4(x4)
>     cap 05[64] = MSI supports 16 messages, 64 bit enabled with 1 message
>     cap 03[74] = VPD
>     cap 11[7c] = MSI-X supports 16 messages in map 0x14
>     ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
>     ecap 0004[138] = unknown 1
> em0@pci0:13:0:0: class=0x020000 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = 'Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (82573E)'
>     class = network
>     subclass = ethernet
>     bar [10] = type Memory, range 32, base 0xd8a00000, size 131072, enabled
>     bar [18] = type I/O Port, range 32, base 0x4000, size 32, enabled
>     cap 01[c8] = powerspec 2 supports D0 D3 current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>     ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
>     ecap 0003[140] = Serial 1 003048ffffd21aba
> em1@pci0:15:0:0: class=0x020000 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = 'Intel PRO/1000 PL Network Adaptor (82573L)'
>     class = network
>     subclass = ethernet
>     bar [10] = type Memory, range 32, base 0xd8c00000, size 131072, enabled
>     bar [18] = type I/O Port, range 32, base 0x5000, size 32, enabled
>     cap 01[c8] = powerspec 2 supports D0 D3 current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>     ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
>     ecap 0003[140] = Serial 1 003048ffffd21abb
> vgapci0@pci0:17:4:0: class=0x030000 card=0xd28015d9 chip=0x515e1002 rev=0x02 hdr=0x00
>     vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
>     device = 'Radeon ES1000 (Radeon ES1000)'
>     class = display
>     subclass = VGA
>     bar [10] = type Prefetchable Memory, range 32, base 0xd0000000, size 134217728, enabled
>     bar [14] = type I/O Port, range 32, base 0x6000, size 256, enabled
>     bar [18] = type Memory, range 32, base 0xd8d00000, size 65536, enabled
>     cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
>
> > 2) Full output from a verbose boot (option "5" at the loader prompt).
>
> That's a bit more difficult... I'll check, but don't think anything is set up
> there to get a meaningful dump or verbose boot information in electronic form...
>
> > I imagine #2 isn't going to work for most users because there's no way
> > to get pages and pages and pages of data from a panic'd machine without
> > either serial console (which will require a 2nd machine and possibly a
> > null-modem cable) or properly setting up a dedicated swap partition and
> > large-enough /var filesystem, plus their kernel would need DDB support
> > added to it (so they could properly do "call doadump" then "reboot").

By the way, regarding this specific explanation -- you'll need
dumpdev="auto" in rc.conf for this to work, plus after the panic you'll
need to ensure that the very next kernel which boots can work reliably.
I'll explain how I'd go about this -- there may be an easier way, but
it's what I'd do.

- Adjust your kernel config to include these options:

  options KDB                  # Enable kernel debugger support
  options KDB_TRACE            # Print stack trace automatically on panic
  options DDB                  # Support DDB
  options GDB                  # Support remote GDB
  options MSGBUF_SIZE=262144   # Increase kern.msgbufsize from 64K to 256K

- Make sure you have a dedicated swap partition, and /var big enough to
  hold a kernel crash dump (the dump will probably be ~1-2GB in size).
  Or make a new filesystem for /var/crash (if so, be sure to copy over
  /var/crash/minfree!)

- Add dumpdev="auto" to /etc/rc.conf.

- Add ddb_enable="yes" to /etc/rc.conf.

- Revert /usr/src to RELENG_8 dated April 16th or earlier. You can
  accomplish this using "date=YYYY.MM.DD.hh.mm.ss" in your csup file.
  See csup(1) man page for details. I imagine svn has a better way to
  do this, but I don't use it.

- Rebuild world/kernel per instructions in /usr/src/Makefile. This will
  get you a kernel and system which won't panic.

- Update RELENG_8 to latest source code (e.g. remove "date=" line you
  just added) using csup (or svn method).

- Rebuild world/kernel per instructions in /usr/src/Makefile. This will
  get you a kernel and system which WILL panic, with the old (usable)
  kernel in /boot/kernel.old.

- Boot into "bad" kernel, but at the loader menu, pick option 5 for a
  verbose boot.

- Panic will happen and drop you to a db> prompt. There, you should do
  "call doadump" and then "reboot". The first will cause the kernel to
  dump memory to your swap partition, the 2nd will reboot.

- Boot into "good" kernel by pressing option 6 at the loader menu, then
  at "ok" prompt enter "boot kernel.old". If that doesn't work, you
  might need to do something like this:

  unload
  load /boot/kernel.old/kernel
  load -t /boot/zfs/zpool.cache /boot/zfs/zpool.cache
  boot

- During the startup phase, savecore(8) will detect the crash from the
  previous kernel and populate /var/crash with details. One of these
  files will be /var/crash/core.txt.0, which should contain the contents
  of your verbose dmesg (or most of it anyway; this is why I adjusted
  MSGBUF_SIZE).

I sure hope this works. :-)

Serial console would, as I said, be a better choice. You wouldn't need
to worry about all the panic/crash/swap stuff -- a verbose boot would be
all you'd need, and you'd have the output on another machine in its
scrollback buffer from whatever terminal or utility you were using that
connected to the serial port.

> > A workaround which one user has confirmed is to enable AHCI for your
> > SATA controller in your system BIOS (if such is available). ataahci.ko
> > will be used (which is AHCI via ATA) and your device names probably
> > won't change. Alternatively you could enable AHCI and use ahci.ko
> > (ahci_load="yes" in /boot/loader.conf) to get AHCI via CAM, which
> > provides NCQ and other features, but your device names will change.
> > My familiarity with ATAPI is limited however.
>
> Will try both if available and let you know the results.

Understood. The former method worked for Michael, and I imagine it
would work for you too if AHCI is available in your BIOS. I tend to
advocate people use ahci.ko if they're using AHCI at all though, given
the excellent CAM subsystem.
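
Before booting the "bad" kernel it may be worth double-checking that the
two rc.conf settings from the procedure above are really in place, since
a missing dumpdev means "call doadump" has nowhere to write. The
following is only a rough sketch of such a check: the function names
rc_has and check_dump_setup are hypothetical helpers (not FreeBSD
tools), and the default path /etc/rc.conf is an assumption; pass another
file as the first argument if your settings live elsewhere.

```shell
#!/bin/sh
# Sketch: confirm the crash-dump related rc.conf settings are present.
# rc_has and check_dump_setup are hypothetical names used for illustration.

# rc_has FILE VAR VALUE
# Succeeds if FILE has an uncommented line VAR="VALUE" (quotes optional,
# trailing comment allowed). The match is case-insensitive, so "YES"
# and "yes" are both accepted.
rc_has() {
    grep -Eiq "^[[:space:]]*$2=\"?$3\"?[[:space:]]*(#.*)?\$" "$1" 2>/dev/null
}

# check_dump_setup [FILE]
# Reports on dumpdev="auto" and ddb_enable="yes"; returns non-zero if
# either one is missing. FILE defaults to /etc/rc.conf (assumption).
check_dump_setup() {
    rc_conf=${1:-/etc/rc.conf}
    status=0
    for setting in dumpdev:auto ddb_enable:yes; do
        var=${setting%%:*}
        val=${setting#*:}
        if rc_has "$rc_conf" "$var" "$val"; then
            echo "ok: $var=\"$val\""
        else
            echo "missing: $var=\"$val\" in $rc_conf"
            status=1
        fi
    done
    return $status
}

# Example use:
#   check_dump_setup || echo "fix rc.conf before testing the new kernel"
```

This only looks at rc.conf. Whether the swap partition and /var/crash
are large enough still has to be checked by hand, for example with
swapinfo(8) and df(1).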

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |