Date:      Mon, 17 Dec 2001 15:51:44 -0800 (PST)
From:      Dennis Dai <ddai55@yahoo.com>
To:        questions@freebsd.org
Subject:   rsync causes solid deadlock
Message-ID:  <20011217235144.76997.qmail@web14703.mail.yahoo.com>

Hi all,

I'm having problems with rsync hanging my server solid.

Some background first: Several months ago I built a server to be a hot
backup for another server, which has about 242GB of disk space holding
mainly images (it belongs to an image processing company). The config
of the backup server:

- ASUS A7V-133E 1GHz
- 256MB SDRAM
- 30GB WD hard drive holding the OS
- 512MB swap
- 3Com 3C905C originally, changed to an Intel card (i82555 based) today
- 4 Promise Ultra100 TX2 cards
- 8 80GB WD hard drives attached to the 4 Promise cards
- RAID5 vinum volume on those 8 HDs
- all partitions mounted with soft updates
- cron job using rsync to back up from the main server

At first, it ran fine. But recently, as the main server has filled up
with files (about 190GB right now), the backup server has started
crashing while running rsync. When it crashes, it hangs solid and
doesn't produce a crash dump after reboot, although I configured it to
do so. There are also no traces in the log. OK, there might be a
problem with rsync. So I tried to copy over the whole thing using tar
over ssh (ssh root@main tar cpf - /var/images | (cd /data ; tar xpf -)),
but that also had problems. This time I could still access the console,
and the console was full of error messages like "watchdog timed out".

So I did a bit of searching on Google and found that the problem could
be the 3Com card, so I decided to change to an Intel card. After
changing the card, I ran rsync again and it locked up after a while.
The following is what I got from the log:

===================
Dec 17 10:43:20 opa /kernel: ad4: WRITE command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:07 opa /kernel: ata2: resetting devices .. done
Dec 17 10:44:07 opa /kernel: ad6: WRITE command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:07 opa /kernel: ata3: resetting devices .. done
Dec 17 10:44:07 opa /kernel: ad9: READ command timeout tag=0 serv=0 - 
resetting
Dec 17 10:44:07 opa /kernel: ata4: resetting devices .. done
Dec 17 10:44:07 opa /kernel: ad10: READ command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:07 opa /kernel: ata5: resetting devices .. done
Dec 17 10:44:07 opa /kernel: ad0: WRITE command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:07 opa /kernel: ata0: resetting devices .. done
Dec 17 10:44:07 opa /kernel: fxp0: SCB timeout: 0x70 0x0 0x90 0x400
Dec 17 10:44:07 opa /kernel: fxp0: SCB timeout: 0xf0 0x0 0x90 0x400
Dec 17 10:44:07 opa /kernel: ad4: WRITE command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:07 opa /kernel: ata2: resetting devices .. done
Dec 17 10:44:07 opa /kernel: ad6: WRITE command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:07 opa /kernel: ata3: resetting devices .. done
Dec 17 10:44:07 opa /kernel: fxp0: device timeout
Dec 17 10:44:07 opa /kernel: fxp0: SCB timeout: 0xe0 0x0 0x90 0x400
Dec 17 10:44:07 opa /kernel: fxp0: SCB timeout: 0x86 0x0 0x90 0x400
Dec 17 10:44:07 opa /kernel: fxp0: SCB timeout: 0xc0 0x0 0x90 0x400
Dec 17 10:44:07 opa /kernel: fxp0: DMA timeout
Dec 17 10:44:07 opa /kernel: fxp0: SCB timeout: 0x90 0x0 0x90 0x400
Dec 17 10:44:07 opa /kernel: fxp0: DMA timeout
Dec 17 10:44:07 opa /kernel: fxp0: SCB timeout: 0x90 0x0 0x90 0x400
Dec 17 10:44:07 opa /kernel: fxp0: SCB timeout: 0x90 0x0 0x90 0x400
Dec 17 10:44:07 opa /kernel: ad9: READ command timeout tag=0 serv=0 - 
resetting
Dec 17 10:44:07 opa /kernel: ata4: resetting devices .. done
Dec 17 10:44:07 opa /kernel: ad10: READ command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:08 opa /kernel: ata5: resetting devices .. done
Dec 17 10:44:08 opa /kernel: ad0: WRITE command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:08 opa /kernel: ata0: resetting devices .. done
Dec 17 10:44:08 opa /kernel: ad4: WRITE command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:08 opa /kernel: ata2: resetting devices .. done
Dec 17 10:44:08 opa /kernel: ad6: WRITE command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:08 opa /kernel: ata3: resetting devices .. done
Dec 17 10:44:08 opa /kernel: ad9: READ command timeout tag=0 serv=0 - 
resetting
Dec 17 10:44:08 opa /kernel: ata4: resetting devices .. done
Dec 17 10:44:08 opa /kernel: ad10: READ command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:08 opa /kernel: ata5: resetting devices .. done
Dec 17 10:44:08 opa /kernel: ad0: WRITE command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:08 opa /kernel: ata0: resetting devices .. done
Dec 17 10:44:08 opa /kernel: fxp0: command queue timeout
Dec 17 10:44:08 opa /kernel: ad4: WRITE command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:08 opa /kernel: ata2-master: timeout waiting for 
command=ef s=d0 e=00
Dec 17 10:44:08 opa /kernel: ad4: trying fallback to PIO mode
Dec 17 10:44:08 opa /kernel: ata2: resetting devices .. done
Dec 17 10:44:08 opa /kernel: ad6: WRITE command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:08 opa /kernel: ata3-master: timeout waiting for 
command=ef s=d0 e=00
Dec 17 10:44:08 opa /kernel: ad6: trying fallback to PIO mode
Dec 17 10:44:08 opa /kernel: ata3: resetting devices .. done
Dec 17 10:44:08 opa /kernel: ad9: READ command timeout tag=0 serv=0 - 
resetting
Dec 17 10:44:08 opa /kernel: ad9: trying fallback to PIO mode
Dec 17 10:44:08 opa /kernel: ata4: resetting devices .. done
Dec 17 10:44:08 opa /kernel: ad10: READ command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:08 opa /kernel: ad10: trying fallback to PIO mode
Dec 17 10:44:08 opa /kernel: ata5: resetting devices .. done
Dec 17 10:44:08 opa /kernel: ad0: WRITE command timeout tag=0 serv=0 
- resetting
Dec 17 10:44:08 opa /kernel: ata0-master: timeout waiting for 
command=ef s=d0 e=00
Dec 17 10:44:08 opa /kernel: ad0: trying fallback to PIO mode
Dec 17 10:44:08 opa /kernel: ata0: resetting devices .. done
Dec 17 10:44:08 opa /kernel: fxp0: SCB timeout: 0x81 0x0 0x90 0x400
Dec 17 10:44:10 opa last message repeated 14 times
Dec 17 10:44:11 opa /kernel: ad5: READ command timeout tag=0 serv=0 - 
resetting
Dec 17 10:44:11 opa /kernel: ata2: resetting devices .. done
======================
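
To make the excerpt above easier to read, the timeout lines can be tallied per device. The following is only a rough sketch: the function name `count_timeouts`, the regular expression, and the shortened `SAMPLE` text (a few lines copied from the log, with the mail line wraps rejoined) are illustrative assumptions, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# A few lines copied from the log excerpt above, with line wraps rejoined.
SAMPLE = """\
Dec 17 10:43:20 opa /kernel: ad4: WRITE command timeout tag=0 serv=0 - resetting
Dec 17 10:44:07 opa /kernel: ata2: resetting devices .. done
Dec 17 10:44:07 opa /kernel: ad6: WRITE command timeout tag=0 serv=0 - resetting
Dec 17 10:44:07 opa /kernel: ad9: READ command timeout tag=0 serv=0 - resetting
Dec 17 10:44:07 opa /kernel: fxp0: SCB timeout: 0x70 0x0 0x90 0x400
Dec 17 10:44:07 opa /kernel: fxp0: device timeout
Dec 17 10:44:07 opa /kernel: ad4: WRITE command timeout tag=0 serv=0 - resetting
"""

# Hypothetical pattern: the device name right after "/kernel:", on any
# line whose message contains the word "timeout".
TIMEOUT_RE = re.compile(r"/kernel: ([\w-]+): .*timeout")


def count_timeouts(log_text):
    """Return a Counter mapping device name to its number of timeout lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for device, n in count_timeouts(SAMPLE).most_common():
        print(device, n)
```

Run against the full log (after rejoining the wrapped lines), this would show at a glance that both the disks and fxp0 report timeouts in the same few seconds.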

OK, next I tried the tar over ssh method again. It locked up in less
than 10 minutes. And this time it didn't even leave any trace in the
log (actually, the above is the only thing I got from the various
crashes so far); it just locked up the server, and there was no crash
dump after reboot.

So I'm kind of stuck as to what is causing the deadlock. Does anyone
have any ideas?

TIA,

Dennis

PS. dmesg below:
=============================
Dec 17 12:07:23 opa /kernel: Copyright (c) 1992-2001 The FreeBSD 
Project.
Dec 17 12:07:23 opa /kernel: Copyright (c) 1979, 1980, 1983, 1986, 
1988, 1989, 1991, 1992, 1993, 1994
Dec 17 12:07:23 opa /kernel: The Regents of the University of 
California. All rights reserved.
Dec 17 12:07:23 opa /kernel: FreeBSD 4.4-RELEASE-p1 #3: Wed Dec 12 
09:03:56 PST 2001
Dec 17 12:07:23 opa /kernel: 
root@opa:/usr/obj/usr/src/sys/OPA
Dec 17 12:07:23 opa /kernel: Timecounter "i8254"  frequency 1193182 Hz
Dec 17 12:07:23 opa /kernel: Timecounter "TSC"  frequency 1008990763 
Hz
Dec 17 12:07:23 opa /kernel: CPU: AMD Athlon(tm) Processor 
(1008.99-MHz 686-class CPU)
Dec 17 12:07:23 opa /kernel: Origin = "AuthenticAMD"  Id = 0x642  
Stepping = 2
Dec 17 12:07:23 opa /kernel: 
Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
Dec 17 12:07:23 opa /kernel: AMD 
Features=0xc0440000<<b18>,AMIE,DSP,3DNow!>
Dec 17 12:07:23 opa /kernel: real memory  = 268353536 (262064K bytes)
Dec 17 12:07:23 opa /kernel: avail memory = 258154496 (252104K bytes)
Dec 17 12:07:23 opa /kernel: Preloaded elf kernel "kernel" at 
0xc02f2000.
Dec 17 12:07:23 opa /kernel: Pentium Pro MTRR support enabled
Dec 17 12:07:23 opa /kernel: Using $PIR table, 9 entries at 0xc00f1690
Dec 17 12:07:23 opa /kernel: npx0: <math processor> on motherboard
Dec 17 12:07:23 opa /kernel: npx0: INT 16 interface
Dec 17 12:07:23 opa /kernel: pcib0: <Host to PCI bridge> on 
motherboard
Dec 17 12:07:23 opa /kernel: pci0: <PCI bus> on pcib0
Dec 17 12:07:23 opa /kernel: pcib2: <PCI to PCI bridge (vendor=1106 
device=b115)> at device 1.0 on pci0
Dec 17 12:07:23 opa /kernel: pci1: <PCI bus> on pcib2
Dec 17 12:07:23 opa /kernel: pci1: <NVidia Riva Ultra Vanta TNT2 
graphics accelerator> at 0.0 irq 11
Dec 17 12:07:23 opa /kernel: isab0: <VIA 82C686 PCI-ISA bridge> at 
device 7.0 on pci0
Dec 17 12:07:23 opa /kernel: isa0: <ISA bus> on isab0
Dec 17 12:07:23 opa /kernel: atapci0: <VIA 82C686 ATA100 controller> 
port 0xd800-0xd80f at device 7.1 on pci0
Dec 17 12:07:23 opa /kernel: ata0: at 0x1f0 irq 14 on atapci0
Dec 17 12:07:23 opa /kernel: ata1: at 0x170 irq 15 on atapci0
Dec 17 12:07:23 opa /kernel: atapci1: <Promise TX2 ATA100 controller> 
port 
0x9000-0x900f,0x9400-0x9403,0x9800-0x9807,0xa000-0xa003,0xa400-0xa407 
mem 0xf7800000-0xf7803fff irq 12 at device 10.0 on pci0
Dec 17 12:07:23 opa /kernel: ata2: at 0xa400 on atapci1
Dec 17 12:07:23 opa /kernel: ata3: at 0x9800 on atapci1
Dec 17 12:07:23 opa /kernel: atapci2: <Promise TX2 ATA100 controller> 
port 
0x7400-0x740f,0x7800-0x7803,0x8000-0x8007,0x8400-0x8403,0x8800-0x8807 
mem 0xf7000000-0xf7003fff irq 12 at device 11.0 on pci0
Dec 17 12:07:23 opa /kernel: ata4: at 0x8800 on atapci2
Dec 17 12:07:23 opa /kernel: ata5: at 0x8000 on atapci2
Dec 17 12:07:23 opa /kernel: fxp0: <Intel Pro 10/100B/100+ Ethernet> 
port 0x7000-0x703f mem 0xf6000000-0xf601ffff,0xf6800000-0xf6800fff 
irq 10 at device 14.0 on pci0
Dec 17 12:07:23 opa /kernel: fxp0: Ethernet address 00:02:b3:4b:a4:56
Dec 17 12:07:23 opa /kernel: inphy0: <i82555 10/100 media interface> 
on miibus0
Dec 17 12:07:23 opa /kernel: inphy0:  10baseT, 10baseT-FDX, 
100baseTX, 100baseTX-FDX, auto
Dec 17 12:07:23 opa /kernel: pcib1: <Host to PCI bridge> on 
motherboard
Dec 17 12:07:23 opa /kernel: pci2: <PCI bus> on pcib1
Dec 17 12:07:23 opa /kernel: orm0: <Option ROMs> at iomem 
0xc0000-0xcffff,0xd0000-0xd3fff,0xd4000-0xd5fff,0xd8000-0xdbfff,0xdc000-0xdd7ff 
on isa0
Dec 17 12:07:23 opa /kernel: fdc0: direction bit not set
Dec 17 12:07:23 opa /kernel: fdc0: cmd 3 failed at out byte 1 of 3
Dec 17 12:07:23 opa /kernel: atkbdc0: <Keyboard controller (i8042)> 
at port 0x60,0x64 on isa0
Dec 17 12:07:23 opa /kernel: atkbd0: <AT Keyboard> flags 0x1 irq 1 on 
atkbdc0
Dec 17 12:07:23 opa /kernel: kbd0 at atkbd0
Dec 17 12:07:23 opa /kernel: vga0: <Generic ISA VGA> at port 
0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Dec 17 12:07:23 opa /kernel: sc0: <System console> at flags 0x100 on 
isa0
Dec 17 12:07:23 opa /kernel: sc0: VGA <16 virtual consoles, 
flags=0x300>
Dec 17 12:07:23 opa /kernel: sio0 at port 0x3f8-0x3ff irq 4 flags 
0x10 on isa0
Dec 17 12:07:23 opa /kernel: sio0: type 16550A
Dec 17 12:07:23 opa /kernel: sio1 at port 0x2f8-0x2ff irq 3 on isa0
Dec 17 12:07:23 opa /kernel: sio1: type 16550A
Dec 17 12:07:23 opa /kernel: ppc0: <Parallel port> at port 
0x378-0x37f irq 7 on isa0
Dec 17 12:07:23 opa /kernel: ppc0: SMC-like chipset 
(ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
Dec 17 12:07:23 opa /kernel: ppc0: FIFO with 16/16/8 bytes threshold
Dec 17 12:07:23 opa /kernel: lpt0: <Printer> on ppbus0
Dec 17 12:07:23 opa /kernel: lpt0: Interrupt-driven port
Dec 17 12:07:23 opa /kernel: ad0: 28629MB <WDC WD300AB-22BVA0> 
[58168/16/63] at ata0-master UDMA100
Dec 17 12:07:23 opa /kernel: ad4: 76319MB <WDC WD800BB-00BSA0> 
[155061/16/63] at ata2-master UDMA100
Dec 17 12:07:23 opa /kernel: ad5: 76319MB <WDC WD800BB-00BSA0> 
[155061/16/63] at ata2-slave UDMA100
Dec 17 12:07:23 opa /kernel: ad6: 76319MB <WDC WD800BB-00CCB0> 
[155061/16/63] at ata3-master UDMA100
Dec 17 12:07:23 opa /kernel: ad7: 76319MB <WDC WD800BB-00BSA0> 
[155061/16/63] at ata3-slave UDMA100
Dec 17 12:07:23 opa /kernel: ad8: 76319MB <WDC WD800BB-00BSA0> 
[155061/16/63] at ata4-master UDMA100
Dec 17 12:07:23 opa /kernel: ad9: 76319MB <WDC WD800BB-00CCB0> 
[155061/16/63] at ata4-slave UDMA100
Dec 17 12:07:23 opa /kernel: ad10: 76319MB <WDC WD800BB-00BSA0> 
[155061/16/63] at ata5-master UDMA100
Dec 17 12:07:23 opa /kernel: ad11: 76319MB <WDC WD800BB-00CCB0> 
[155061/16/63] at ata5-slave UDMA100
Dec 17 12:07:23 opa /kernel: Mounting root from ufs:/dev/ad0s1a
Dec 17 12:07:23 opa /kernel: WARNING: / was not properly dismounted
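
For reference, the disks named in the timeout messages can be grouped by the controller they hang off, using the probe lines in the dmesg above. This is a minimal sketch under stated assumptions: `disks_by_controller`, both regular expressions, and the trimmed `DMESG` text (syslog prefix dropped, line wraps rejoined, only the disks that appear in the timeout log) are made up for illustration.

```python
import re

# Probe lines copied from the dmesg above, syslog prefix dropped and line
# wraps rejoined. Only the disks that show up in the timeout log are listed.
DMESG = """\
ata0: at 0x1f0 irq 14 on atapci0
ata2: at 0xa400 on atapci1
ata3: at 0x9800 on atapci1
ata4: at 0x8800 on atapci2
ata5: at 0x8000 on atapci2
ad0: 28629MB <WDC WD300AB-22BVA0> [58168/16/63] at ata0-master UDMA100
ad4: 76319MB <WDC WD800BB-00BSA0> [155061/16/63] at ata2-master UDMA100
ad5: 76319MB <WDC WD800BB-00BSA0> [155061/16/63] at ata2-slave UDMA100
ad6: 76319MB <WDC WD800BB-00CCB0> [155061/16/63] at ata3-master UDMA100
ad9: 76319MB <WDC WD800BB-00CCB0> [155061/16/63] at ata4-slave UDMA100
ad10: 76319MB <WDC WD800BB-00BSA0> [155061/16/63] at ata5-master UDMA100
"""

# "ataN: at <port> [irq N] on atapciM" gives channel -> controller.
CHANNEL_RE = re.compile(r"^(ata\d+): at \S+(?: irq \d+)? on (atapci\d+)$")
# "adN: ... at ataN-master|slave ..." gives disk -> channel.
DISK_RE = re.compile(r"^(ad\d+): .* at (ata\d+)-(?:master|slave) ")


def disks_by_controller(dmesg_text):
    """Return {controller: [disks]} built from the ata/ad probe lines."""
    channel_to_controller = {}
    disk_to_channel = []
    for line in dmesg_text.splitlines():
        channel = CHANNEL_RE.match(line)
        if channel:
            channel_to_controller[channel.group(1)] = channel.group(2)
            continue
        disk = DISK_RE.match(line)
        if disk:
            disk_to_channel.append((disk.group(1), disk.group(2)))
    grouped = {}
    for disk, channel in disk_to_channel:
        controller = channel_to_controller.get(channel, "unknown")
        grouped.setdefault(controller, []).append(disk)
    return grouped


if __name__ == "__main__":
    for controller, disks in disks_by_controller(DMESG).items():
        print(controller, ", ".join(disks))
```

With the lines above, the timed-out disks land on all three controllers: atapci0 (the onboard VIA 82C686) and both Promise TX2 cards (atapci1 and atapci2).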

