Date: Sun, 26 Oct 2008 05:53:35 -0700 From: Jeremy Chadwick <koitsu@FreeBSD.org> To: Miroslav Lachman <000.fbsd@quip.cz> Cc: Gavin Atkinson <gavin@FreeBSD.org>, freebsd-stable@FreeBSD.org Subject: Re: Recommendations for servers running SATA drives [hot-swap] Message-ID: <20081026125335.GA89389@icarus.home.lan> In-Reply-To: <49046596.3080605@quip.cz> References: <1222699642.24339.12.camel@buffy.york.ac.uk> <48E0F36C.1080400@quip.cz> <20080929153220.GA11459@icarus.home.lan> <48F7964C.4060309@quip.cz> <20081016202322.GA2429@icarus.home.lan> <48F87C0E.8060404@quip.cz> <20081017120858.GA20746@icarus.home.lan> <48F89C8D.5020301@quip.cz> <20081017150616.GA24321@icarus.home.lan> <49046596.3080605@quip.cz>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, Oct 26, 2008 at 01:41:58PM +0100, Miroslav Lachman wrote: > Jeremy Chadwick wrote: >> On Fri, Oct 17, 2008 at 04:09:17PM +0200, Miroslav Lachman wrote: >> >>> Jeremy Chadwick wrote: >>> >>>> On Fri, Oct 17, 2008 at 01:50:38PM +0200, Miroslav Lachman wrote: >>>> >>>>> Jeremy Chadwick wrote: >>>>> >>>>>> On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote: >>>>>> >>>>>>> Today I was replacing disk in one Sun Fire X2100 M2 so I >>>>>>> tried hot-swapping. It was as you said: atacontrol detach >>>>>>> ata3, replace the HDD, atacontrol attach ata3 and new disk is >>>>>>> in the system. I tried it 3 times to be sure that it was not >>>>>>> coincidence - no panic was produced ;o) >>>>>>> So in this case, hot-swapping on Sun Fire X2100 M2 with >>>>>>> FreeBSD 7.0 i386 works. >>>>>> >>>>>> >>>>>> That's excellent news. So it seems possibly the problem I was seeing >>>>>> was with "reinit" causing some sort of chaos. I'll have to check things >>>>>> on my testbox here at home to see how I caused the panic last time. >>>>>> >>>>>> Thanks for providing feedback, as usual! :-) >>>>> >>>>> Unfortunately there is one problem - I see a lot of interrupts >>>>> after disk swapping (about 193k of atapci1) >>>>> >>>>> Interrupts >>>>> 197k total >>>>> ohci0 21 >>>>> ehci0 22 >>>>> 193k atapci1 23 >>>>> 2001 cpu0: time >>>>> 1 bge1 273 >>>>> 2001 cpu1: time >>>> >>>> >>>> Okay, so it looks like the interrupt rate on atapci1 after swapping is >>>> going crazy. What you're showing there looks like heavily modified >>>> vmstat -i output. >>> >>> The shown is manually cropped from systat -vm, I'll try vmstat -i >>> next time. ;) >>> >>> >>>>> Full output of systat -vm 2 is attached. >>>>> >>>>> It is shown in top as 50% interrupt (CPU state) and load 1 until >>>>> I rebooted the machine (I can provide MRTG graphs). The system >>>>> was not in production load, but almost idle. (I will put it in >>>>> production tomorrow). >>>>> After reboot, everything is OK. >>>> >>>> >>>> And this box is running the ATA patch Andrey provided, yes? >>> >>> It is clean install of FreeBSD 7.0-RELEASE-p5 amd64 without patches. >>> >>> >>>>> Can somebody test hot-swapping with SATA drives and confirm this >>>>> behavior? (I can't test it now, because machine is in >>>>> datacenter) >>>> >>>> >>>> I can test it on my P4SCE box. >>>> >>>> I'll check the interrupt rates after each step of the hot-swap to see >>>> if/when the problem starts. >>> >>> I'll check the interrupts next time too and will post results to this >>> thread. >> >> >> As promised, here are notes from my testing: >> >> >> First thing to note is that the BIOS on my P4SCE had the ICH5 SATA mode >> set to "Auto", which was causing PATA emulation to happen on the SATA >> controller, e.g. disk #0 == ata0-master, disk #1 == ata0-slave. >> >> I changed the BIOS option from Auto to "SATA Enhanced", and now the >> disks show up on their own channels, e.g. disk #0 == ata2-master, disk >> #1 == ata3-master. >> >> Here's the applicable data. Note that this kernel ***DOES*** include >> Andrey's ATA patch: >> >> FreeBSD testbox.home.lan 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Thu Oct 16 10:56:42 PDT 2008 root@testbox.home.lan:/usr/obj/usr/src/sys/TESTBOX i386 >> >> atapci1: <Intel ICH5 SATA150 controller> port 0xc000-0xc007,0xc400-0xc403,0xc800-0xc807,0xcc00-0xcc03,0xd000-0xd00f irq 18 at device 31.2 on pci0 >> atapci1: [ITHREAD] >> ata2: <ATA channel 0> on atapci1 >> ata2: [ITHREAD] >> ata3: <ATA channel 1> on atapci1 >> ata3: [ITHREAD] >> >> SATA controller is on IRQ 18. >> >> ad4: 114473MB <Seagate ST3120026AS 3.05> at ata2-master SATA150 >> ad6: 238474MB <WDC WD2500KS-00MJB0 02.01C03> at ata3-master SATA150 >> >> ATA channel 2: >> Master: ad4 <ST3120026AS/3.05> Serial ATA v1.0 >> Slave: no device present >> ATA channel 3: >> Master: ad6 <WDC WD2500KS-00MJB0/02.01C03> Serial ATA II >> Slave: no device present >> >> testbox# df -k >> Filesystem 1024-blocks Used Avail Capacity Mounted on >> /dev/ad4s1a 507630 230182 236838 49% / >> devfs 1 1 0 100% /dev >> /dev/ad4s1e 507630 12 467008 0% /tmp >> /dev/ad4s1f 108498334 2944826 96873642 3% /usr >> /dev/ad4s1d 2008622 32360 1815574 2% /var >> /dev/ad6s1d 236511738 4 217590796 0% /hotswap >> >> testbox# vmstat -i >> interrupt total rate >> irq4: sio0 1398 34 >> irq6: fdc0 10 0 >> irq15: ata1 58 1 >> irq18: atapci1 945 23 >> irq23: em1 8 0 >> cpu0: timer 80033 1952 >> cpu1: timer 79808 1946 >> Total 162260 3957 >> >> testbox# umount /hotswap >> testbox# atacontrol detach ata3 >> subdisk6: detached >> ad6: detached >> testbox# vmstat -i | grep atapci1 >> irq18: atapci1 2671 11 >> >> At this point I wanted to see what happened if I just reattached without >> any physical changes to the SATA bus. >> >> testbox# atacontrol attach ata3 >> ata3: [ITHREAD] >> ad6: 238474MB <WDC WD2500KS-00MJB0 02.01C03> at ata3-master SATA150 >> Master: ad6 <WDC WD2500KS-00MJB0/02.01C03> Serial ATA II >> Slave: no device present >> >> testbox# vmstat -i | grep atapci1 >> irq18: atapci1 2764 9 >> testbox# mount /dev/ad6s1d /hotswap >> testbox# vmstat -i | grep atapci1 >> irq18: atapci1 2779 8 >> >> Now we're going to try detaching *without* umounting the filesystem, >> then reattaching to see what happens. Based on what I've seen and >> others have reported in the past, this should panic the kernel. >> Supposedly this problem is fixed on CURRENT. >> >> testbox# atacontrol detach ata3 >> subdisk6: detached >> ad6: detached >> >> testbox# atacontrol attach ata3 >> ata3: [ITHREAD] >> ad6: 238474MB <WDC WD2500KS-00MJB0 02.01C03> at ata3-master SATA150 >> Master: ad6 <WDC WD2500KS-00MJB0/02.01C03> Serial ATA II >> Slave: no device present >> >> testbox# df -k >> Filesystem 1024-blocks Used Avail Capacity Mounted on >> /dev/ad4s1a 507630 230182 236838 49% / >> devfs 1 1 0 100% /dev >> /dev/ad4s1e 507630 12 467008 0% /tmp >> /dev/ad4s1f 108498334 2944826 96873642 3% /usr >> /dev/ad4s1d 2008622 32360 1815574 2% /var >> /dev/ad6s1d 236511738 4 217590796 0% /hotswap >> >> testbox# ls -l /hotswap >> >> >> Fatal trap 12: page fault while in kernel mode >> cpuid = 0; apic id = 00 >> fault virtual address = 0xc0 >> fault code = supervisor read, page not present >> instruction pointer = 0x20:0xc0503ca7 >> stack pointer = 0x28:0xe6310a5c >> frame pointer = 0x28:0xe6310a5c >> code segment = base 0x0, limit 0xfffff, type 0x1b >> = DPL 0, pres 1, def32 1, gran 1 >> processor eflags = interrupt enabled, resume, IOPL = 0 >> current process = 795 (ls) >> [thread pid 795 tid 100043 ] >> Stopped at dev2udev+0x11: movl 0xc0(%eax),%eax >> >> db> bt >> Tracing pid 795 tid 100043 td 0xc3dcc460 >> dev2udev(3287166208,3861973668,3228755039,3861973872,3286025312,...) at dev2udev+17 >> ufs_getattr(3861973664,3861973800,3227504003,3229700640,3861973664,...) at ufs_getattr+222 >> VOP_GETATTR_APV(3229700640,3861973664,3229768320,3288955040,3861973684,...) at VOP_GETATTR_APV+68 >> vn_stat(3288955040,3861973908,3286230784,0,3286025312,...) at vn_stat+73 >> kern_lstat(3286025312,135344488,0,3861974040,3861974064,...) at kern_lstat+147 >> lstat(3286025312,3861974268,8,3861974328,3861974316,...) at lstat+43 >> syscall(3861974328) at syscall+814 >> Xint0x80_syscall() at Xint0x80_syscall+32 >> --- syscall (190, FreeBSD ELF32, lstat), eip = 1746463051, esp = 3217024524, ebp = 3217024664 --- >> >> Yup, there's the panic. :-) >> >> I rebooted the box from db, brought the system up in single-user, fsck'd >> all the disks/filesystems (no anomalies were found), and rebooted the >> box once more. >> >> Now we're going to do everything properly: unmount /hotswap, detach, >> yank the disk and insert a new Maxtor hard disk, attach, and see what >> happens. >> >> testbox# umount /hotswap >> testbox# atacontrol detach ata3 >> subdisk6: detached >> ad6: detached >> >> testbox# vmstat -i | grep atapci1 >> irq18: atapci1 1174 6 >> >> I've now removed the disk physically from the machine. Let's check >> interrupts again. >> >> testbox# vmstat -i | grep atapci1 >> irq18: atapci1 1185 4 >> >> Now the new Maxtor disk has been inserted. LEDs for the SATA hot-swap >> backplane lit up for about 5-6 seconds, then went off. Let's check >> interrupts at this point: >> >> testbox# vmstat -i | grep atapci1 >> irq18: atapci1 1193 3 >> >> Now let's attach. Note that there is no filesystem on this disk (it's >> completely blank), so there's nothing to mount. >> >> testbox# atacontrol attach ata3 >> ata3: [ITHREAD] >> ad6: 286188MB <Maxtor 6L300S0 BANC1G20> at ata3-master SATA150 >> Master: ad6 <Maxtor 6L300S0/BANC1G20> Serial ATA v1.0 >> Slave: no device present >> >> And now we check interrupts: >> >> testbox# vmstat -i | grep atapci1 >> irq18: atapci1 1258 2 > > > I played again with hot-swapping disks in Sun Fire X2100 M2 on FreeBSD > 7.0-RELEASE-p5 i386 without ATA patches. > Both disks (ad4 + ad6) are in gmirror. There were high interrupts load > again! I tracked it to the point of pulling out the disk. > Interrupt was OK after 'atacontrol detach', but rise up after disk was > removed. When the disk is inserted back (same disk), interrupts are > going to normal rate without need to reboot. > I tried it three times and behavior was always the same. > It can be related to the use of gmirror. Interesting -- yeah, that's the one difference (besides hardware) between your tests and mine: you're using gmirror, I'm not. I'll take some time this weekend (or the upcoming weekend) to set up gmirror on the above test box, and see if I can reproduce what you're seeing. This will be my first experience with gmirror. > Side note: If the disk was detached by 'atacontrol detach ata2' without > removing from gmirror (without gmirror remove or gmirror deactivate) and > then pulled out + inserted back, it was automagically attached without > need of 'atacontrol attach ata2' and gmirror synchronization was > autostarted. This could mean that gmirror is constantly "polling" the underlying device layer for certain things, which might explain the high interrupt rate you're seeing. We should probably involve pjd@ in this discussion. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB |
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20081026125335.GA89389>