Date: Sun, 29 May 2005 13:45:40 -0700
From: Tim Spencer <tspencer@hungry.com>
To: freebsd-scsi@freebsd.org
Subject: isp driver + clustered NetApp failover = strangeness
Message-ID: <047FCAD3-439C-47EC-A4E4-2253A25CCB39@hungry.com>
Hey there!

I've got a pair of NetApp 940c heads that are exporting LUNs out to a bunch of FreeBSD hosts with qla2312 cards in them over a Brocade 2850 FC switch. Everything works great until I test out standby cluster failover on the NetApps. To quote NetApp's manual:

"Port A on each target HBA operates as the active port, and Port B operates as a standby port. When the cluster is in normal operation, Port A provides access to local LUNs, and Port B is not available to the initiator. When one filer fails, Port B on the partner filer becomes active and provides access to the LUNs on the failed filer. The Port B assumes the WWPN of the Port A on the failed partner."

So, to me, it sounds like this _should_ work for our FreeBSD hosts, which don't support multipathing, and thus must use this sort of failover. When the failover happens, the WWPN moves over to port B on the other head, perhaps a link reset happens or something, and everything keeps going.

Well, it turns out that this is only partly true. If there is no I/O happening during the swap, then everything does seem to work out fine. But if there is I/O going on, then things quickly go downhill. I see this:

May 28 19:35:56 toc2-db1 /kernel: (da0:isp0:0:1:0): Invalidating pack
May 28 19:35:58 toc2-db1 /kernel: (da0:isp0:0:1:0): Invalidating pack
May 28 19:36:50 toc2-db1 /kernel: (da0:isp0:0:1:0): isp0: watchdog timeout for handle 0x1f3

After this, sometimes the system locks up completely, and sometimes the system is operational, but anything that has to do with the filesystem in question hangs, etc.

So here's my question: Is this something that we can make work? I really don't know all that much about the lower levels of how Fibre-Channel and the isp driver work, but it sounds like this ought to work. Is there anybody out there who knows more about the driver who might be willing to work on this?
I can't guarantee anything, but our company does support FreeBSD development, and we might be able to swing some cash towards somebody who would be able to make this work.

Is there anything else that I can include to help figure out what is going wrong? Below, I include dmesg from one of the hosts so you can see what sort of system is running this, but if you've got more things that I can do to diagnose this, let me know.

Thanks, and have fun!
-tspencer

: toc2-db2 []$; dmesg
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.11-STABLE #0: Wed May 25 05:39:38 GMT 2005
    root@:/usr/src/sys/compile/BSD4.11.GODSPEED-SMP
Timecounter "i8254" frequency 1193182 Hz
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2786.13-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 3221094400 (3145600K bytes)
avail memory = 3134447616 (3060984K bytes)
Changing APIC ID for IO APIC #0 from 0 to 8 on chip
Changing APIC ID for IO APIC #1 from 0 to 9 on chip
Changing APIC ID for IO APIC #2 from 0 to 10 on chip
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
Programming 16 pins in IOAPIC #2
FreeBSD/SMP: Multiprocessor motherboard: 4 CPUs
 cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000
 cpu1 (AP): apic id: 1, version: 0x00050014, at 0xfee00000
 cpu2 (AP): apic id: 6, version: 0x00050014, at 0xfee00000
 cpu3 (AP): apic id: 7, version: 0x00050014, at 0xfee00000
 io0 (APIC): apic id: 8, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id: 9, version: 0x000f0011, at 0xfec01000
 io2 (APIC): apic id: 10, version: 0x000f0011, at 0xfec02000
Preloaded elf kernel "kernel" at 0x9f3d2000.
Warning: Pentium 4 CPU: PSE disabled
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 9 entries at 0x9f0fc410
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
IOAPIC #1 intpin 3 -> irq 2
IOAPIC #1 intpin 7 -> irq 7
IOAPIC #1 intpin 11 -> irq 10
pci0: <PCI bus> on pcib0
pci0: <unknown card> (vendor=0x1028, dev=0x000c) at 4.0 irq 2
pci0: <unknown card> (vendor=0x1028, dev=0x0008) at 4.1 irq 7
pci0: <unknown card> (vendor=0x1028, dev=0x000d) at 4.2 irq 10
pci0: <ATI Mach64-GR graphics accelerator> at 14.0
atapci0: <ServerWorks CSB5 ATA100 controller> port 0x8b0-0x8bf,0x8d8-0x8db,0x8d0-0x8d7,0x8c8-0x8cb,0x8c0-0x8c7 at device 15.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: <OHCI USB controller> at 15.2 irq 5
isab0: <PCI to ISA bridge (vendor=1166 device=0225)> at device 15.3 on pci0
isa0: <ISA bus> on isab0
pcib1: <Host to PCI bridge> on motherboard
IOAPIC #1 intpin 4 -> irq 11
pci1: <PCI bus> on pcib1
fxp0: <Intel 82550 Pro/100 Ethernet> port 0xdcc0-0xdcff mem 0xfcf00000-0xfcf1ffff,0xfcf20000-0xfcf20fff irq 11 at device 8.0 on pci1
fxp0: Ethernet address 00:0e:0c:62:9e:17
inphy0: <i82555 10/100 media interface> on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib2: <Host to PCI bridge> on motherboard
IOAPIC #1 intpin 8 -> irq 13
pci2: <PCI bus> on pcib2
isp0: <Qlogic ISP 2312 PCI FC-AL Adapter> port 0xcc00-0xccff mem 0xfcd00000-0xfcd00fff irq 13 at device 6.0 on pci2
isp0: bad execution throttle of 0- using 16
pcib3: <Host to PCI bridge> on motherboard
IOAPIC #1 intpin 12 -> irq 16
IOAPIC #1 intpin 13 -> irq 17
pci3: <PCI bus> on pcib3
bge0: <Broadcom BCM5703 Gigabit Ethernet, ASIC rev. 0x1002> mem 0xfcb10000-0xfcb1ffff irq 16 at device 6.0 on pci3
bge0: Ethernet address: 00:11:43:34:7b:3f
miibus1: <MII bus> on bge0
brgphy0: <BCM5703 10/100/1000baseTX PHY> on miibus1
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: <Broadcom BCM5703 Gigabit Ethernet, ASIC rev. 0x1002> mem 0xfcb00000-0xfcb0ffff irq 17 at device 8.0 on pci3
bge1: Ethernet address: 00:11:43:34:7b:40
miibus2: <MII bus> on bge1
brgphy1: <BCM5703 10/100/1000baseTX PHY> on miibus2
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
pcib4: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
IOAPIC #1 intpin 14 -> irq 18
pci4: <PCI bus> on pcib4
pcib8: <PCI to PCI bridge (vendor=8086 device=0309)> at device 8.0 on pci4
pci5: <PCI bus> on pcib8
aac0: <Dell PERC 3/Di> mem 0xf0000000-0xf7ffffff irq 18 at device 8.1 on pci4
aac0: i960RX 100MHz, 118MB cache memory, optional battery present
aac0: Kernel 2.8-0, Build 6089, S/N 74a1d3
aac0: Supported Options=275c<WCACHE,DATA64,HOSTTIME,WINDOW4GB,SOFTERR,NORECOND,SGMAP64>
pcib5: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
pci6: <PCI bus> on pcib5
pcib6: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
pci7: <PCI bus> on pcib6
pcib7: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
pci8: <PCI bus> on pcib7
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff,0xc9800-0xcd7ff,0xcd800-0xcefff,0xec000-0xeffff on isa0
pmtimer0 on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
IP packet filtering initialized, divert disabled, rule-based forwarding enabled, default to accept, logging limited to 100 packets/entry by default
ata0-slave: ATAPI identify retries exceeded
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
acd0: CDROM <TEAC CD-ROM CD-224E> at ata0-master PIO4
aacd0: <RAID 0/1> on aac0
aacd0: 139997MB (286714368 sectors)
Mounting root from ufs:/dev/aacd0s1a
da0 at isp0 bus 0 target 0 lun 0
da0: <NETAPP LUN 0.2> Fixed Direct Access SCSI-4 device
da0: 200.000MB/s transfers, Tagged Queueing Enabled
da0: 817152MB (1673527296 512 byte sectors: 255H 63S/T 38636C)
WARNING: / was not properly dismounted
bge0: gigabit link up
ohci0: <OHCI (generic) USB controller> mem 0xfe100000-0xfe100fff irq 5 at device 15.2 on pci0
usb0: OHCI version 1.0, legacy support
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: 4 ports with 4 removable, self powered
: toc2-db2 []$;