Date: Tue, 30 Jul 2013 19:19:38 +0200
From: Ewald Jenisch <a@jenisch.at>
To: <freebsd-questions@freebsd.org>
Subject: System hangs for several minutes (disk IO related)
Message-ID: <20130730171938.GA3602@aurora.oekb.co.at>
Hi,

I'm seeing rather strange behavior on an HP DL585 G5 with respect to disk I/O: whenever there is any disk I/O, the machine freezes completely, i.e. no console input possible, no screen output - a complete hang. After some minutes the box comes back to normal, but sure enough it freezes again with the next disk I/O.

To give you a typical example: while a "portsnap fetch extract" was running I did a "sync". Normally this should complete in a matter of milliseconds, or seconds in the worst case - but dig this:

# date;time sync;date
Tue Jul 30 09:57:38 CEST 2013
0.000u 0.311s 9:54.69 0.0% 4+161k 0+1287io 0pf+0w
Tue Jul 30 10:07:38 CEST 2013
#

No, this is not a typo - it really took nearly ten minutes (!) for the sync to complete. In the meantime every window and all activity (console, screen output etc.) is completely blocked. ('portsnap fetch extract' is only given as an example here - the lockup occurs whenever there is disk I/O, for example from tar.)

We're speaking about a machine with decent hardware here. Here's an excerpt from "dmesg":

------------------------------ < Cut here > ------------------------------
FreeBSD 9.2-BETA2 #0 r253750: Mon Jul 29 11:07:04 CEST 2013
    root@sniff-rz2:/usr/obj/usr/src/sys/GENERIC amd64
gcc version 4.2.1 20070831 patched [FreeBSD]
CPU: Quad-Core AMD Opteron(tm) Processor 8358 SE (2411.16-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f23  Family = 0x10  Model = 0x2  Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee400800<SYSCALL,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS>
  TSC: P-state invariant
real memory  = 137438953472 (131072 MB)
avail memory = 132973432832 (126813 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <HP ProLiant>
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
...
ciss0: <HP Smart Array P400> port 0x3000-0x30ff mem 0xd9e00000-0xd9efffff,0xd9df0000-0xd9df0fff irq 16 at device 0.0 on pci8
ciss0: PERFORMANT Transport
...
da0 at ciss0 bus 0 scbus2 target 0 lun 0
da0: <COMPAQ RAID 1(1+0) OK> Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: Command Queueing enabled
da0: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da0: quirks=0x1<NO_SYNC_CACHE>
------------------------------ < Cut here > ------------------------------

Kernel: latest kernel as of yesterday (9.2Beta)

BIOS: at the latest level (Support Pack as of Spring 2013 installed, which updated BIOS, iLO etc.). Aside from that I reset the BIOS to default values just to be sure.

SmartArray P400: firmware 7.24 (latest)

Harddisks: two 146GB HDs running in RAID1 mode. I already tried hot-swapping the disks - it didn't change anything.

Needless to say, there are no error messages in either dmesg or /var/log/messages :-(

To me this looks like some sort of timing problem - but where should I start looking?

Thanks much in advance for any help,
-ewald
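
For reference, the "date;time sync;date" measurement above could be made repeatable with a small script. This is only a rough sketch, assuming a POSIX sh; the function names elapsed_to_seconds and time_sync are hypothetical helpers made up for illustration and are not part of any FreeBSD tool. The first converts the elapsed field printed by csh's time (such as 9:54.69) into seconds, the second times a single sync(8) call with whole-second resolution.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; names are not from any real tool.

# Convert a csh-style elapsed time ("m:ss.ss" or "h:mm:ss") to seconds.
elapsed_to_seconds() {
    echo "$1" | awk -F: '{
        s = 0
        for (i = 1; i <= NF; i++) s = s * 60 + $i   # fold h:m:s fields left to right
        printf "%.2f\n", s
    }'
}

# Time one sync(8) call and print how many whole seconds it took.
time_sync() {
    start=$(date +%s)
    sync
    end=$(date +%s)
    echo $((end - start))
}
```

With these defined, "elapsed_to_seconds 9:54.69" prints 594.69, which matches the "nearly ten minutes" seen above, and calling time_sync at intervals while tar or portsnap is running would show whether every sync stalls or only some of them.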