Date: Fri, 7 Apr 2006 14:14:52 +0200
From: Albert Shih <shih@math.jussieu.fr>
To: freebsd-stable@FreeBSD.ORG
Subject: Disappointed-new
Message-ID: <20060407121452.GO1784@math.jussieu.fr>

Hi all,

Two days ago I sent a message with the subject "Disappointed". Many of you answered that I had not described the bug. First, my English is very bad, and second, I am not blaming anyone, least of all the developers. Personally, I am very impressed by the work you are doing.

Here is a detailed description of my problem.

Hardware:

  HP ProLiant ML 350 G4
  2 x Xeon 3.2 GHz (HT disabled)
  1 GB RAM
  1 bge network interface on the motherboard
  1 dual-port 1000 Mbit/s card (em chipset)
  1 dual-port 100 Mbit/s card (fxp chipset)
  1 internal Smart Array 641 with 2 hot-plug disks in RAID 1
  1 MSA1000 with 14 disks, attached over Fibre Channel

Situation:

  Each network card is on a different IP subnet.
  Each network card has its own IP address.
  All interfaces are connected to a Foundry 1000 Mbit/s L2 switch.

Purpose:

  This is a central NFS server (a NetApp for a "small" budget...). It also runs a dhcpd server, and that is all.
  There are no user accounts on this server.
  The server is not routed to the Internet.
  NFS is bound to 3 of the 5 IP addresses (the other 2 only run ssh, for scp).
  There are 13 NFS clients running Linux (various kernel versions).
  There are also 4 NFS clients running FreeBSD (various versions, all > 5.2 and < 5.5).

Timeline:

  6-STABLE was installed on the server at the beginning of February 2006.

Problems:

First time: Kernel: SMP + ipfw

  At first the "main" nfsd was bound to the bge0 interface (main = 90% of the NFS traffic). After 10-15 days of perfect operation, bge0 stopped working, while the other interfaces kept working perfectly. The server stayed up and the other NFS clients could access the NFS partition without problems. I could log in on the console, and I tried many commands:

    ifconfig bge0 down
    ifconfig bge0 up
    ifconfig bge0 delete
    etc...

  Nothing worked. The console repeatedly showed messages like

    bge0 watchdog timeout problems
    bge0 watchdog timeout problems

  As far as I can tell, only a reboot gets the system working again. After the reboot everything worked fine, but after some days the problem came back. In this second case none of the interfaces worked, although I could still log in on the console. The reboot was not clean, however (I had to run a long fsck).

Second time: Kernel: NO_SMP + ipfw

  Following some advice on this mailing list, I switched to a uniprocessor kernel. This time, after some days of working fine, bge0 stopped working again (same conditions as the first time).

Third time: Kernel: NO_SMP + ipfw

  I moved the main NFS address (= 90% of the traffic) to em0 and put an IP address that does not serve NFS (only scp) on bge0. Again, after some days, the ... em0 interface stopped working. This time the console message was

    em0 watchdog timeout problems

  Sometimes I get fxpX watchdog timeout problems too.

Fourth time: Kernel: NO_SMP + polling + ipfw

  Now I am running all interfaces in polling mode. And... I hope it works (it has been running for 2 days).

Information:

  I cannot tell whether it happens under heavy NFS load, but I really don't think so. There was one crash on a Saturday (and we don't have many users on that day).

  I cannot reproduce this bug. I tried generating heavy NFS access: on 4 Linux clients at the same time I ran something like

    find . -type f -exec md5sum {} \;

  but the server would not crash. That partition holds 30 GB.

  I forgot to mention that I ran a very similar configuration (an old ML350 G3 with the same MSA1000, under the same conditions) on 4.x for 4 years without any crash (with the same clients, etc.).

Attached is the dmesg from just after the server booted.

Next Monday I will switch to a DB kernel, but for now all I can do is reboot the server (600 users).

I hope this helps you make FreeBSD better than the best OS ;-) .

Many thanks.

--
Albert SHIH
Universite de Paris 7 (Denis DIDEROT)
U.F.R. de Mathematiques.
Local time: Fri Apr 7 13:38:55 CEST 2006

Attachment: dmesg-20060405-174250

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 6.1-PRERELEASE #1: Wed Apr 5 17:27:03 CEST 2006
    root@nfs3.math.jussieu.fr:/usr/obj/usr/src/sys/NFS3-mono
ACPI APIC Table: <HP 00000083>
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3200.13-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf41 Stepping = 1
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,CX16,<b14>>
  AMD Features=0x20000000<LM>
  Hyperthreading: 2 logical CPUs
real memory = 1073688576 (1023 MB)
avail memory = 1041752064 (993 MB)
ioapic1: Changing APIC ID to 9
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
ioapic2 <Version 2.0> irqs 48-71 on motherboard
ioapic3 <Version 2.0> irqs 72-95 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <HP D17> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x908-0x90b on acpi0
cpu0: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci5: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci5
pci6: <ACPI PCI bus> on pcib2
isp0: <Qlogic ISP 2312 PCI FC-AL Adapter> port 0x6000-0x60ff mem 0xfdef0000-0xfdef0fff irq 48 at device 1.0 on pci6
isp0: [GIANT-LOCKED]
pcib3: <ACPI PCI-PCI bridge> at device 0.2 on pci5
pci9: <ACPI PCI bus> on pcib3
em0: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x7000-0x703f mem 0xfdfe0000-0xfdffffff,0xfdf80000-0xfdfbffff irq 76 at device 1.0 on pci9
em0: Ethernet address: 00:11:0a:56:57:9e
em1: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x7040-0x707f mem 0xfdf60000-0xfdf7ffff irq 77 at device 1.1 on pci9
em1: Ethernet address: 00:11:0a:56:57:9f
ciss0: <HP Smart Array 641> port 0x7400-0x74ff mem 0xfdf50000-0xfdf51fff,0xfdf00000-0xfdf3ffff irq 72 at device 2.0 on pci9
ciss0: [GIANT-LOCKED]
pcib4: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci13: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci16: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 28.0 on pci0
pci2: <ACPI PCI bus> on pcib6
pcib7: <PCI-PCI bridge> at device 2.0 on pci2
pci3: <PCI bus> on pcib7
fxp0: <Intel 82559 Pro/100 Ethernet> port 0x5000-0x503f mem 0xfddf0000-0xfddf0fff,0xfdc00000-0xfdcfffff irq 26 at device 4.0 on pci3
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:08:02:cd:d5:be
fxp1: <Intel 82559 Pro/100 Ethernet> port 0x5040-0x507f mem 0xfdbf0000-0xfdbf0fff,0xfda00000-0xfdafffff irq 26 at device 5.0 on pci3
miibus1: <MII bus> on fxp1
inphy1: <i82555 10/100 media interface> on miibus1
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:08:02:cd:d5:bf
mpt0: <LSILogic 1030 Ultra4 Adapter> port 0x4000-0x40ff mem 0xfd9e0000-0xfd9fffff,0xfd9c0000-0xfd9dffff irq 24 at device 3.0 on pci2
mpt0: [GIANT-LOCKED]
mpt0: MPI Version=1.2.14.0
mpt0: Unhandled Event Notify Frame. Event 0xa.
mpt1: <LSILogic 1030 Ultra4 Adapter> port 0x4400-0x44ff mem 0xfd9a0000-0xfd9bffff,0xfd980000-0xfd99ffff irq 25 at device 3.1 on pci2
mpt1: [GIANT-LOCKED]
mpt1: MPI Version=1.2.14.0
mpt1: Unhandled Event Notify Frame. Event 0xa.
uhci0: <UHCI (generic) USB controller> port 0x2000-0x201f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0x2020-0x203f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
pci0: <base peripheral> at device 29.4 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 29.5 (no driver attached)
ehci0: <Intel 6300ESB USB 2.0 controller> mem 0xfbee0000-0xfbee03ff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb2: EHCI version 1.0
usb2: companion controllers, 2 ports each: usb0 usb1
usb2: <Intel 6300ESB USB 2.0 controller> on ehci0
usb2: USB revision 2.0
uhub2: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 4 ports with 4 removable, self powered
pcib8: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci1: <ACPI PCI bus> on pcib8
bge0: <Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003> mem 0xfd8f0000-0xfd8fffff irq 17 at device 2.0 on pci1
miibus2: <MII bus> on bge0
brgphy0: <BCM5705 10/100/1000baseTX PHY> on miibus2
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:15:60:0b:09:b4
pci1: <display, VGA> at device 3.0 (no driver attached)
pci1: <base peripheral> at device 4.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 6300ESB UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x500-0x50f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
acpi_tz0: <Thermal Zone> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
ppc0: <Standard parallel printer port> port 0x378-0x37f,0x778-0x77d irq 7 drq 0 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
sio0: <Standard PC COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc9fff,0xca000-0xcdfff,0xee000-0xeffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 3200131784 Hz quality 800
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging unlimited
acd0: CDROM <HL-DT-ST CD-ROM GCR-8482B/2.09> at ata1-master UDMA33
Waiting 5 seconds for SCSI devices to settle
sa0 at mpt0 bus 0 target 0 lun 0
sa0: <QUANTUM SDLT600 1E1E> Removable Sequential Access SCSI-4 device
sa0: 160.000MB/s transfers (80.000MHz, offset 126, 16bit)
pass0 at isp0 bus 0 target 125 lun 0
pass0: <COMPAQ MSA1000 2.38> Fixed Storage Array SCSI-4 device
pass0: 200.000MB/s transfers, Tagged Queueing Enabled
da2 at ciss0 bus 0 target 0 lun 0
da2: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device
da2: 135.168MB/s transfers
da2: 69459MB (142253280 512 byte sectors: 255H 32S/T 17433C)
da0 at isp0 bus 0 target 125 lun 1
da0: <COMPAQ MSA1000 VOLUME 2.38> Fixed Direct Access SCSI-4 device
da0: 200.000MB/s transfers, Tagged Queueing Enabled
da0: 555714MB (1138103296 512 byte sectors: 255H 63S/T 70843C)
da1 at isp0 bus 0 target 125 lun 2
da1: <COMPAQ MSA1000 VOLUME 2.38> Fixed Direct Access SCSI-4 device
da1: 200.000MB/s transfers, Tagged Queueing Enabled
da1: 277850MB (569038365 512 byte sectors: 255H 63S/T 35421C)
Trying to mount root from ufs:/dev/da2s1a
em0: link state changed to UP
em1: link state changed to UP
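
Editorial note: the report mentions watchdog timeout messages on bge0, em0 and fxpX at different times. One way to see which interfaces are affected, and how often, might be to tally those messages from a saved console or system log. The sketch below is only an illustration, not part of the original message. The function name `count_watchdog` is hypothetical, and it assumes that each log line carries the interface name (with or without a trailing colon) immediately before the words "watchdog timeout", as in the messages quoted in the report.

```sh
#!/bin/sh
# Tally "watchdog timeout" messages per network interface.
# Reads log text on standard input and prints "<count> <interface>" lines,
# sorted by interface name.
# Assumption: the interface name is the word directly before "watchdog".
count_watchdog() {
    awk '
        /watchdog timeout/ {
            for (i = 1; i < NF; i++) {
                if ($(i + 1) == "watchdog") {
                    name = $i
                    sub(/:$/, "", name)   # drop a trailing colon, if any
                    count[name]++
                    break
                }
            }
        }
        END {
            for (name in count) print count[name], name
        }
    ' | sort -k2
}
```

It could be used as `count_watchdog < /var/log/messages`, assuming kernel messages are logged to that file on the server. Comparing the totals across the four kernel configurations described in the report might show whether the timeouts follow the NFS traffic from bge0 to em0.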