From owner-freebsd-stable@FreeBSD.ORG Fri Oct 17 21:31:57 2008
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 22E251065688;
    Fri, 17 Oct 2008 21:31:57 +0000 (UTC)
    (envelope-from 000.fbsd@quip.cz)
Received: from elsa.codelab.cz (elsa.codelab.cz [91.103.162.4])
    by mx1.freebsd.org (Postfix) with ESMTP id 8EF1A8FC20;
    Fri, 17 Oct 2008 21:31:56 +0000 (UTC)
    (envelope-from 000.fbsd@quip.cz)
Received: from localhost (localhost.codelab.cz [127.0.0.1])
    by elsa.codelab.cz (Postfix) with ESMTP id F008419E02A;
    Fri, 17 Oct 2008 23:31:54 +0200 (CEST)
Received: from [192.168.1.2] (r5bb235.net.upc.cz [86.49.61.235])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by elsa.codelab.cz (Postfix) with ESMTPSA id 4790D19E023;
    Fri, 17 Oct 2008 23:31:52 +0200 (CEST)
Message-ID: <48F90469.7020503@quip.cz>
Date: Fri, 17 Oct 2008 23:32:25 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915
X-Accept-Language: cz, cs, en, en-us
MIME-Version: 1.0
To: Jeremy Chadwick
References: <20080927202250.GA60980@icarus.home.lan> <48E0DB7E.20804@quip.cz>
    <1222699642.24339.12.camel@buffy.york.ac.uk> <48E0F36C.1080400@quip.cz>
    <20080929153220.GA11459@icarus.home.lan> <48F7964C.4060309@quip.cz>
    <20081016202322.GA2429@icarus.home.lan> <48F87C0E.8060404@quip.cz>
    <20081017120858.GA20746@icarus.home.lan> <48F89C8D.5020301@quip.cz>
    <20081017150616.GA24321@icarus.home.lan>
In-Reply-To: <20081017150616.GA24321@icarus.home.lan>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org
Subject: Re: Recommendations for servers running SATA drives [hot-swap]
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Fri, 17 Oct 2008 21:31:57 -0000

Jeremy Chadwick wrote:
> On Fri, Oct 17, 2008 at 04:09:17PM +0200, Miroslav Lachman wrote:
>
>>Jeremy Chadwick wrote:
>>
>>>On Fri, Oct 17, 2008 at 01:50:38PM +0200, Miroslav Lachman wrote:
>>>
>>>>Jeremy Chadwick wrote:
>>>>
>>>>>On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote:
>>>>>
>>>>>
>>>>>
>>>>>>Today I was replacing disk in one Sun Fire X2100 M2 so I tried
>>>>>>hot-swapping. It was as you said: atacontrol detach ata3, replace
>>>>>>the HDD, atacontrol attach ata3 and new disk is in the system. I
>>>>>>tried it 3 times to be sure that it was not coincidence - no
>>>>>>panic was produced ;o)
>>>>>>So in this case, hot-swapping on Sun Fire X2100 M2 with FreeBSD
>>>>>>7.0 i386 works.
>>>>>
>>>>>
>>>>>That's excellent news.  So it seems possibly the problem I was seeing
>>>>>was with "reinit" causing some sort of chaos.  I'll have to check things
>>>>>on my testbox here at home to see how I caused the panic last time.
>>>>>
>>>>>Thanks for providing feedback, as usual!  :-)
>>>>
>>>>Unfortunately there is one problem - I see a lot of interrupts after
>>>>disk swapping (about 193k of atapci1)
>>>>
>>>>Interrupts
>>>>197k total
>>>>     ohci0 21
>>>>     ehci0 22
>>>>193k atapci1 23
>>>>2001 cpu0: time
>>>>   1 bge1 273
>>>>2001 cpu1: time
>>>
>>>
>>>Okay, so it looks like the interrupt rate on atapci1 after swapping is
>>>going crazy.  What you're showing there looks like heavily modified
>>>vmstat -i output.
>>
>>The shown is manually cropped from systat -vm, I'll try vmstat -i next
>>time. ;)
>>
>>
>>>>Full output of systat -vm 2 is attached.
>>>>
>>>>It is shown in top as 50% interrupt (CPU state) and load 1 until I
>>>>rebooted the machine (I can provide MRTG graphs). The system was not
>>>>in production load, but almost idle. (I will put it in production
>>>>tomorrow).
>>>>After reboot, everything is OK.
>>>
>>>
>>>And this box is running the ATA patch Andrey provided, yes?
>>
>>It is clean install of FreeBSD 7.0-RELEASE-p5 amd64 without patches.
>>
>>
>>>>Can somebody test hot-swapping with SATA drives and confirm this
>>>>behavior? (I can't test it now, because machine is in datacenter)
>>>
>>>
>>>I can test it on my P4SCE box.
>>>
>>>I'll check the interrupt rates after each step of the hot-swap to see
>>>if/when the problem starts.
>>
>>I'll check the interrupts next time too and will post results to this
>>thread.
>
>
> As promised, here are notes from my testing:
>
>
> First thing to note is that the BIOS on my P4SCE had the ICH5 SATA mode
> set to "Auto", which was causing PATA emulation to happen on the SATA
> controller, e.g. disk #0 == ata0-master, disk #1 == ata0-slave.
>
> I changed the BIOS option from Auto to "SATA Enhanced", and now the
> disks show up on their own channels, e.g. disk #0 == ata2-master, disk
> #1 == ata3-master.
>
> Here's the applicable data.  Note that this kernel ***DOES*** include
> Andrey's ATA patch:
>
> FreeBSD testbox.home.lan 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Thu Oct 16 10:56:42 PDT 2008 root@testbox.home.lan:/usr/obj/usr/src/sys/TESTBOX i386
>
> atapci1: port 0xc000-0xc007,0xc400-0xc403,0xc800-0xc807,0xcc00-0xcc03,0xd000-0xd00f irq 18 at device 31.2 on pci0
> atapci1: [ITHREAD]
> ata2: on atapci1
> ata2: [ITHREAD]
> ata3: on atapci1
> ata3: [ITHREAD]
>
> SATA controller is on IRQ 18.
>
> ad4: 114473MB at ata2-master SATA150
> ad6: 238474MB at ata3-master SATA150
>
> ATA channel 2:
>     Master:  ad4 Serial ATA v1.0
>     Slave:       no device present
> ATA channel 3:
>     Master:  ad6 Serial ATA II
>     Slave:       no device present
>
> testbox# df -k
> Filesystem  1024-blocks    Used     Avail Capacity  Mounted on
> /dev/ad4s1a      507630  230182    236838      49%  /
> devfs                 1       1         0     100%  /dev
> /dev/ad4s1e      507630      12    467008       0%  /tmp
> /dev/ad4s1f   108498334 2944826  96873642       3%  /usr
> /dev/ad4s1d     2008622   32360   1815574       2%  /var
> /dev/ad6s1d   236511738       4 217590796       0%  /hotswap
>
> testbox# vmstat -i
> interrupt                total    rate
> irq4: sio0                1398      34
> irq6: fdc0                  10       0
> irq15: ata1                 58       1
> irq18: atapci1             945      23
> irq23: em1                   8       0
> cpu0: timer              80033    1952
> cpu1: timer              79808    1946
> Total                   162260    3957
>
> testbox# umount /hotswap
> testbox# atacontrol detach ata3
> subdisk6: detached
> ad6: detached
> testbox# vmstat -i | grep atapci1
> irq18: atapci1            2671      11
>
> At this point I wanted to see what happened if I just reattached without
> any physical changes to the SATA bus.
>
> testbox# atacontrol attach ata3
> ata3: [ITHREAD]
> ad6: 238474MB at ata3-master SATA150
> Master:  ad6 Serial ATA II
> Slave:       no device present
>
> testbox# vmstat -i | grep atapci1
> irq18: atapci1            2764       9
> testbox# mount /dev/ad6s1d /hotswap
> testbox# vmstat -i | grep atapci1
> irq18: atapci1            2779       8
>
> Now we're going to try detaching *without* umounting the filesystem,
> then reattaching to see what happens.  Based on what I've seen and
> others have reported in the past, this should panic the kernel.
> Supposedly this problem is fixed on CURRENT.
>
> testbox# atacontrol detach ata3
> subdisk6: detached
> ad6: detached
>
> testbox# atacontrol attach ata3
> ata3: [ITHREAD]
> ad6: 238474MB at ata3-master SATA150
> Master:  ad6 Serial ATA II
> Slave:       no device present
>
> testbox# df -k
> Filesystem  1024-blocks    Used     Avail Capacity  Mounted on
> /dev/ad4s1a      507630  230182    236838      49%  /
> devfs                 1       1         0     100%  /dev
> /dev/ad4s1e      507630      12    467008       0%  /tmp
> /dev/ad4s1f   108498334 2944826  96873642       3%  /usr
> /dev/ad4s1d     2008622   32360   1815574       2%  /var
> /dev/ad6s1d   236511738       4 217590796       0%  /hotswap
>
> testbox# ls -l /hotswap
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0xc0
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0xc0503ca7
> stack pointer           = 0x28:0xe6310a5c
> frame pointer           = 0x28:0xe6310a5c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 795 (ls)
> [thread pid 795 tid 100043 ]
> Stopped at      dev2udev+0x11:  movl    0xc0(%eax),%eax
>
> db> bt
> Tracing pid 795 tid 100043 td 0xc3dcc460
> dev2udev(3287166208,3861973668,3228755039,3861973872,3286025312,...) at dev2udev+17
> ufs_getattr(3861973664,3861973800,3227504003,3229700640,3861973664,...) at ufs_getattr+222
> VOP_GETATTR_APV(3229700640,3861973664,3229768320,3288955040,3861973684,...) at VOP_GETATTR_APV+68
> vn_stat(3288955040,3861973908,3286230784,0,3286025312,...) at vn_stat+73
> kern_lstat(3286025312,135344488,0,3861974040,3861974064,...) at kern_lstat+147
> lstat(3286025312,3861974268,8,3861974328,3861974316,...) at lstat+43
> syscall(3861974328) at syscall+814
> Xint0x80_syscall() at Xint0x80_syscall+32
> --- syscall (190, FreeBSD ELF32, lstat), eip = 1746463051, esp = 3217024524, ebp = 3217024664 ---
>
> Yup, there's the panic.  :-)
>
> I rebooted the box from db, brought the system up in single-user, fsck'd
> all the disks/filesystems (no anomalies were found), and rebooted the
> box once more.
>
> Now we're going to do everything properly: unmount /hotswap, detach,
> yank the disk and insert a new Maxtor hard disk, attach, and see what
> happens.
>
> testbox# umount /hotswap
> testbox# atacontrol detach ata3
> subdisk6: detached
> ad6: detached
>
> testbox# vmstat -i | grep atapci1
> irq18: atapci1            1174       6
>
> I've now removed the disk physically from the machine.  Let's check
> interrupts again.
>
> testbox# vmstat -i | grep atapci1
> irq18: atapci1            1185       4
>
> Now the new Maxtor disk has been inserted.  LEDs for the SATA hot-swap
> backplane lit up for about 5-6 seconds, then went off.  Let's check
> interrupts at this point:
>
> testbox# vmstat -i | grep atapci1
> irq18: atapci1            1193       3
>
> Now let's attach.  Note that there is no filesystem on this disk (it's
> completely blank), so there's nothing to mount.
>
> testbox# atacontrol attach ata3
> ata3: [ITHREAD]
> ad6: 286188MB at ata3-master SATA150
> Master:  ad6 Serial ATA v1.0
> Slave:       no device present
>
> And now we check interrupts:
>
> testbox# vmstat -i | grep atapci1
> irq18: atapci1            1258       2
>
> Looks fine to me.

Thank you for your time, testing and reporting detailed results!
I will investigate my case sometime in the future (if time permits).

Miroslav Lachman
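
A note for anyone repeating the test: the "rate" column of vmstat -i is an
average since boot, which is why the atapci1 rate in the quoted output keeps
falling (23, 11, 9, 8) even though the total keeps growing. To see whether a
hot-swap step triggers an interrupt storm, it is probably more telling to
compare two "total" samples taken a known number of seconds apart. Below is a
minimal sketch of that idea. The function names irq_total and irq_rate are
made up for illustration and are not part of FreeBSD, and the parsing assumes
the vmstat -i line layout shown in this thread.

```sh
#!/bin/sh
# Hypothetical helpers (illustrative names, not FreeBSD tools).

# irq_total DEVICE
# Reads `vmstat -i` output on stdin and prints the "total" column for DEVICE.
# Assumed line layout: "irq18: atapci1   945   23" (source, device, total, rate).
irq_total() {
    awk -v dev="$1" '$2 == dev { print $3 }'
}

# irq_rate BEFORE AFTER SECONDS
# Prints interrupts per second between two samples (integer division).
irq_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Example use around an attach (commented out; needs root and real hardware):
#   before=$(vmstat -i | irq_total atapci1)
#   atacontrol attach ata3
#   sleep 10
#   after=$(vmstat -i | irq_total atapci1)
#   irq_rate "$before" "$after" 10
```

On a healthy channel the result should stay small; a figure in the range of
the 193k seen in the systat output earlier in the thread would point at the
storm.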