From owner-freebsd-stable@FreeBSD.ORG Sun Oct 26 12:41:27 2008
Message-ID: <49046596.3080605@quip.cz>
Date: Sun, 26 Oct 2008 13:41:58 +0100
From: Miroslav Lachman <000.fbsd@quip.cz>
To: Jeremy Chadwick
Cc: Gavin Atkinson, freebsd-stable@FreeBSD.org
Subject: Re: Recommendations for servers running SATA drives [hot-swap]
In-Reply-To: <20081017150616.GA24321@icarus.home.lan>
List-Id: Production branch of FreeBSD source code

Jeremy Chadwick wrote:
> On Fri, Oct 17, 2008 at 04:09:17PM +0200, Miroslav Lachman wrote:
>
>>Jeremy Chadwick wrote:
>>
>>>On Fri, Oct 17, 2008 at 01:50:38PM +0200, Miroslav Lachman wrote:
>>>
>>>>Jeremy Chadwick wrote:
>>>>
>>>>>On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote:
>>>>>
>>>>>>Today I was replacing disk in one Sun Fire X2100 M2 so I tried
>>>>>>hot-swapping. It was as you said: atacontrol detach ata3, replace
>>>>>>the HDD, atacontrol attach ata3 and new disk is in the system. I
>>>>>>tried it 3 times to be sure that it was not coincidence - no
>>>>>>panic was produced ;o)
>>>>>>So in this case, hot-swapping on Sun Fire X2100 M2 with FreeBSD
>>>>>>7.0 i386 works.
>>>>>
>>>>>
>>>>>That's excellent news. So it seems possibly the problem I was seeing
>>>>>was with "reinit" causing some sort of chaos. I'll have to check things
>>>>>on my testbox here at home to see how I caused the panic last time.
>>>>>
>>>>>Thanks for providing feedback, as usual! :-)
>>>>
>>>>Unfortunately there is one problem - I see a lot of interrupts after
>>>>disk swapping (about 193k of atapci1)
>>>>
>>>>Interrupts
>>>>197k total
>>>>     ohci0 21
>>>>     ehci0 22
>>>>193k atapci1 23
>>>>2001 cpu0: time
>>>>   1 bge1 273
>>>>2001 cpu1: time
>>>
>>>
>>>Okay, so it looks like the interrupt rate on atapci1 after swapping is
>>>going crazy. What you're showing there looks like heavily modified
>>>vmstat -i output.
>>
>>The shown is manually cropped from systat -vm, I'll try vmstat -i next
>>time. ;)
>>
>>
>>>>Full output of systat -vm 2 is attached.
>>>>
>>>>It is shown in top as 50% interrupt (CPU state) and load 1 until I
>>>>rebooted the machine (I can provide MRTG graphs). The system was not
>>>>in production load, but almost idle. (I will put it in production
>>>>tomorrow).
>>>>After reboot, everything is OK.
>>>
>>>
>>>And this box is running the ATA patch Andrey provided, yes?
>>
>>It is clean install of FreeBSD 7.0-RELEASE-p5 amd64 without patches.
>>
>>
>>>>Can somebody test hot-swapping with SATA drives and confirm this
>>>>behavior? (I can't test it now, because machine is in datacenter)
>>>
>>>
>>>I can test it on my P4SCE box.
>>>
>>>I'll check the interrupt rates after each step of the hot-swap to see
>>>if/when the problem starts.
>>
>>I'll check the interrupts next time too and will post results to this
>>thread.
>
>
> As promised, here are notes from my testing:
>
>
> First thing to note is that the BIOS on my P4SCE had the ICH5 SATA mode
> set to "Auto", which was causing PATA emulation to happen on the SATA
> controller, e.g. disk #0 == ata0-master, disk #1 == ata0-slave.
>
> I changed the BIOS option from Auto to "SATA Enhanced", and now the
> disks show up on their own channels, e.g. disk #0 == ata2-master, disk
> #1 == ata3-master.
>
> Here's the applicable data. Note that this kernel ***DOES*** include
> Andrey's ATA patch:
>
> FreeBSD testbox.home.lan 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Thu Oct 16 10:56:42 PDT 2008 root@testbox.home.lan:/usr/obj/usr/src/sys/TESTBOX i386
>
> atapci1: port 0xc000-0xc007,0xc400-0xc403,0xc800-0xc807,0xcc00-0xcc03,0xd000-0xd00f irq 18 at device 31.2 on pci0
> atapci1: [ITHREAD]
> ata2: on atapci1
> ata2: [ITHREAD]
> ata3: on atapci1
> ata3: [ITHREAD]
>
> SATA controller is on IRQ 18.
>
> ad4: 114473MB at ata2-master SATA150
> ad6: 238474MB at ata3-master SATA150
>
> ATA channel 2:
>     Master:  ad4 Serial ATA v1.0
>     Slave:       no device present
> ATA channel 3:
>     Master:  ad6 Serial ATA II
>     Slave:       no device present
>
> testbox# df -k
> Filesystem  1024-blocks    Used     Avail Capacity  Mounted on
> /dev/ad4s1a      507630  230182    236838      49%  /
> devfs                 1       1         0     100%  /dev
> /dev/ad4s1e      507630      12    467008       0%  /tmp
> /dev/ad4s1f   108498334 2944826  96873642       3%  /usr
> /dev/ad4s1d     2008622   32360   1815574       2%  /var
> /dev/ad6s1d   236511738       4 217590796       0%  /hotswap
>
> testbox# vmstat -i
> interrupt                total       rate
> irq4: sio0                1398         34
> irq6: fdc0                  10          0
> irq15: ata1                 58          1
> irq18: atapci1             945         23
> irq23: em1                   8          0
> cpu0: timer              80033       1952
> cpu1: timer              79808       1946
> Total                   162260       3957
>
> testbox# umount /hotswap
> testbox# atacontrol detach ata3
> subdisk6: detached
> ad6: detached
> testbox# vmstat -i | grep atapci1
> irq18: atapci1 2671 11
>
> At this point I wanted to see what happened if I just reattached without
> any physical changes to the SATA bus.
>
> testbox# atacontrol attach ata3
> ata3: [ITHREAD]
> ad6: 238474MB at ata3-master SATA150
> Master: ad6 Serial ATA II
> Slave: no device present
>
> testbox# vmstat -i | grep atapci1
> irq18: atapci1 2764 9
> testbox# mount /dev/ad6s1d /hotswap
> testbox# vmstat -i | grep atapci1
> irq18: atapci1 2779 8
>
> Now we're going to try detaching *without* umounting the filesystem,
> then reattaching to see what happens. Based on what I've seen and
> others have reported in the past, this should panic the kernel.
> Supposedly this problem is fixed on CURRENT.
>
> testbox# atacontrol detach ata3
> subdisk6: detached
> ad6: detached
>
> testbox# atacontrol attach ata3
> ata3: [ITHREAD]
> ad6: 238474MB at ata3-master SATA150
> Master: ad6 Serial ATA II
> Slave: no device present
>
> testbox# df -k
> Filesystem  1024-blocks    Used     Avail Capacity  Mounted on
> /dev/ad4s1a      507630  230182    236838      49%  /
> devfs                 1       1         0     100%  /dev
> /dev/ad4s1e      507630      12    467008       0%  /tmp
> /dev/ad4s1f   108498334 2944826  96873642       3%  /usr
> /dev/ad4s1d     2008622   32360   1815574       2%  /var
> /dev/ad6s1d   236511738       4 217590796       0%  /hotswap
>
> testbox# ls -l /hotswap
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0xc0
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0xc0503ca7
> stack pointer           = 0x28:0xe6310a5c
> frame pointer           = 0x28:0xe6310a5c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 795 (ls)
> [thread pid 795 tid 100043 ]
> Stopped at      dev2udev+0x11:  movl    0xc0(%eax),%eax
>
> db> bt
> Tracing pid 795 tid 100043 td 0xc3dcc460
> dev2udev(3287166208,3861973668,3228755039,3861973872,3286025312,...) at dev2udev+17
> ufs_getattr(3861973664,3861973800,3227504003,3229700640,3861973664,...) at ufs_getattr+222
> VOP_GETATTR_APV(3229700640,3861973664,3229768320,3288955040,3861973684,...) at VOP_GETATTR_APV+68
> vn_stat(3288955040,3861973908,3286230784,0,3286025312,...) at vn_stat+73
> kern_lstat(3286025312,135344488,0,3861974040,3861974064,...) at kern_lstat+147
> lstat(3286025312,3861974268,8,3861974328,3861974316,...) at lstat+43
> syscall(3861974328) at syscall+814
> Xint0x80_syscall() at Xint0x80_syscall+32
> --- syscall (190, FreeBSD ELF32, lstat), eip = 1746463051, esp = 3217024524, ebp = 3217024664 ---
>
> Yup, there's the panic. :-)
>
> I rebooted the box from db, brought the system up in single-user, fsck'd
> all the disks/filesystems (no anomalies were found), and rebooted the
> box once more.
>
> Now we're going to do everything properly: unmount /hotswap, detach,
> yank the disk and insert a new Maxtor hard disk, attach, and see what
> happens.
>
> testbox# umount /hotswap
> testbox# atacontrol detach ata3
> subdisk6: detached
> ad6: detached
>
> testbox# vmstat -i | grep atapci1
> irq18: atapci1 1174 6
>
> I've now removed the disk physically from the machine. Let's check
> interrupts again.
>
> testbox# vmstat -i | grep atapci1
> irq18: atapci1 1185 4
>
> Now the new Maxtor disk has been inserted. LEDs for the SATA hot-swap
> backplane lit up for about 5-6 seconds, then went off. Let's check
> interrupts at this point:
>
> testbox# vmstat -i | grep atapci1
> irq18: atapci1 1193 3
>
> Now let's attach. Note that there is no filesystem on this disk (it's
> completely blank), so there's nothing to mount.
>
> testbox# atacontrol attach ata3
> ata3: [ITHREAD]
> ad6: 286188MB at ata3-master SATA150
> Master: ad6 Serial ATA v1.0
> Slave: no device present
>
> And now we check interrupts:
>
> testbox# vmstat -i | grep atapci1
> irq18: atapci1 1258 2

I played with hot-swapping disks again in the Sun Fire X2100 M2, on
FreeBSD 7.0-RELEASE-p5 i386 without ATA patches. Both disks (ad4 + ad6)
are in a gmirror. The interrupt load was high again! I tracked it down
to the moment the disk is pulled out: the interrupt rate was OK after
'atacontrol detach', but rose once the disk was physically removed. When
the disk is inserted back (same disk), the interrupt rate returns to
normal without a reboot. I tried it three times and the behavior was
always the same. It may be related to the use of gmirror.
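
For pinning down the exact step at which the rate jumps, note that the
"rate" column of vmstat -i is an average since boot (in the quoted
output it drifts from 23 down to 2 while the total keeps growing), so a
short burst is easy to miss there. A minimal sketch of sampling the
"total" column twice instead, assuming the four-column vmstat -i line
layout shown in the quoted output; the helper names irq_total and
irq_rate_now are made up for illustration and are not existing tools:

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of FreeBSD.

# Read `vmstat -i` style output on stdin and print the cumulative
# interrupt count ("total" column) of the first line whose device
# name (second field) equals $1. Prints nothing if there is no match.
irq_total() {
    awk -v dev="$1" '$2 == dev { print $3; exit }'
}

# Print the average interrupts per second for device $1 over an
# interval of $2 seconds (default 5), from two vmstat -i samples.
irq_rate_now() {
    dev=$1
    secs=${2:-5}
    before=$(vmstat -i | irq_total "$dev")
    sleep "$secs"
    after=$(vmstat -i | irq_total "$dev")
    # Bail out if the device did not show up in either sample.
    [ -n "$before" ] && [ -n "$after" ] || return 1
    echo $(( (after - before) / secs ))
}
```

Running something like "irq_rate_now atapci1 5" after each step (detach,
pull the disk, insert, attach) should show where the rate changes.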

Side note: if the disk was detached with 'atacontrol detach ata2'
without first removing it from the gmirror (no 'gmirror remove' or
'gmirror deactivate') and then pulled out and inserted back, it was
automagically attached without needing 'atacontrol attach ata2', and
gmirror synchronization started automatically.

As I am planning my vacation, I will not have time to test newer
versions of FreeBSD (or patches); I will test them later in December.

Miroslav Lachman