Date:      Fri, 10 Feb 2006 23:37:47 +0100
From:      Paolo Maero <paolo@euresis.it>
To:        Michael Reifenberger <mike@Reifenberger.com>, FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Re: IDE DMA Timeouts
Message-ID:  <20B290A8-6AED-4767-9DE8-082CB9D35353@euresis.it>
In-Reply-To: <20060210132529.I6359@fw.reifenberger.com>
References:  <20060210111959.Y5942@fw.reifenberger.com> <20060210121952.GB4925@bsd.trippelsdorf.de> <20060210132529.I6359@fw.reifenberger.com>

On Feb 10, 2006, at 1:28 PM, Michael Reifenberger wrote:

> On Fri, 10 Feb 2006, Markus Trippelsdorf wrote:
> ...
>>> ...
>>> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=58914495
>>> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=123039679
>>> ad1: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=54591167
>>
>> It looks like bad cabling to me. Try new cables and also run
>> smartctl -a /dev/ad0 (and ad1) to check if the hardware is OK.
>>
> smartctl doesn't report any errors, and accessing only one disk at a time
> doesn't give errors either. So probably cabling isn't the issue here.
> More likely a timing/locking interaction between gmirror/ata...
>
> Bye/2
> ---
> Michael Reifenberger, Business Development Manager SAP-Basis, Plaut Consulting
> Comp: Michael.Reifenberger@plaut.de | Priv: Michael@Reifenberger.com
>       http://www.plaut.de           |       http://www.Reifenberger.com


I have the same problem with 6.0 and gmirror. It is not a cabling or
hardware problem, because the same machine works fine with FreeBSD 5.4
and earlier, OpenBSD 3.8 and Linux 2.6.x, with and without mirroring.

I have a dual PIII 700 MHz SMP system with a Maxtor PCI SATA controller
and two 250 GB Maxtor disks, plus SCSI disks for the OS.

atapci0: <Promise PDC20375 SATA150 controller> port 0x3080-0x30bf,0x30c0-0x30cf,0x3000-0x307f mem 0xf4220000-0xf4220fff,0xf4200000-0xf421ffff irq 20 at device 4.0 on pci2
ad4: 239372MB <Maxtor 7L250S0 BANC1E00> at ata2-master SATA150
ad6: 239372MB <Maxtor 7L250S0 BANC1E00> at ata2-master SATA150

The disks are gmirror'ed, but the mirror never completes a
synchronization. When the gmirror resync reaches around 40%-45%, the
system crashes. Nothing is written to the logs and the screen shows some
garbage. The system also crashes occasionally under heavy load, after
logging some TIMEOUT - READ_DMA errors.
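
To compare machines or kernels, it can help to tally these messages per
device and command. The following is only a minimal sketch: the pattern
is modeled on the TIMEOUT lines quoted earlier in this thread, the
function name is hypothetical, and other message variants (such as the
ICRC warning) are deliberately ignored.

```python
import re
from collections import Counter

# Pattern modeled on messages such as:
#   ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=58914495
# Assumption: READ_DMA timeouts use the same layout as the quoted
# WRITE_DMA lines.
TIMEOUT_RE = re.compile(
    r"(?P<dev>ad\d+): TIMEOUT - (?P<cmd>\w+) retrying "
    r"\((?P<left>\d+) retr(?:y|ies) left\) LBA=(?P<lba>\d+)"
)


def summarize_timeouts(lines):
    """Count timeouts per (device, command) and collect the LBAs seen."""
    counts = Counter()
    lbas = {}
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # not a TIMEOUT message, skip it
        key = (match.group("dev"), match.group("cmd"))
        counts[key] += 1
        lbas.setdefault(key, []).append(int(match.group("lba")))
    return counts, lbas


if __name__ == "__main__":
    import sys

    # Usage: python timeouts.py < /var/log/messages
    counts, lbas = summarize_timeouts(sys.stdin)
    for (dev, cmd), n in sorted(counts.items()):
        seen = lbas[(dev, cmd)]
        print(f"{dev} {cmd}: {n} timeouts, LBA range {min(seen)}-{max(seen)}")
```

If the LBAs cluster in one region of one disk, that points more toward
the media; if they are spread over both disks, it points more toward the
driver or controller.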

After that the system is unable to boot again. It crashes during the
boot sequence, when the mirror is re-established and the resync starts
again. It also crashes if the resync is prevented (through the
NOAUTOSYNC flag). I need to boot from the installation CD and clear the
gmirror metadata on one disk to be able to boot.

I run 6.0-RELEASE-p4 (but it happens on any 6.0), and this is my kernel
configuration:

# include the standard distribution's SMP kernel build file, which in turn
# includes the generic kernel build file (named GENERIC)
include         SMP

# set custom kernel ident name
ident           ZOE_020

# additional/overridden settings start here
nooptions       PREEMPTION              # Disable kernel thread preemption

# standard system settings
maxusers        64                      # 64 users is a lot, but we should have plenty of memory!
options         INCLUDE_CONFIG_FILE     # Include this file in kernel for reference

# memory settings - 2 GBytes for data or stack maximum size, 1 GByte as
# default initial size
options         MAXDSIZ=(2048UL*1024*1024)
options         MAXSSIZ=(2048UL*1024*1024)
options         DFLDSIZ=(1024UL*1024*1024)

# SYSV options (shared memory, semaphores, message queues)
options         SEMMAP=63               # Maximum number of entries in a semaphore map.
options         SEMMNI=512              # Maximum number of System V semaphores that can be used on the system at one time.
options         SEMMNS=512              # Total number of semaphores system wide
options         SEMMNU=512              # Total number of undo structures in system
options         SEMMSL=64               # Maximum number of System V semaphores that can be used by a single process at one time.
options         SEMOPM=128              # Maximum number of operations that can be outstanding on a single System V semaphore at one time.
options         SEMUME=48               # Maximum number of undo operations that can be outstanding on a single System V semaphore at one time.
options         SHMALL=262144           # Maximum number of shared memory pages system wide.
options         SHMMAX=(SHMMAXPGS*PAGE_SIZE+1)  # Maximum size, in bytes, of a single System V shared memory region.
options         SHMMAXPGS=262144        # Maximum size, in pages, of a single System V shared memory region.
options         SHMMIN=2                # Minimum size, in bytes, of a single System V shared memory region.
options         SHMMNI=128              # Maximum number of shared memory regions that can be used on the system at one time.
options         SHMSEG=32               # Maximum number of System V shared memory regions that can be attached to a single process at one time.
options         MSGMNB=2049             # Max number of chars in queue
options         MSGMNI=41               # Max number of message queue identifiers
options         MSGSEG=2049             # Max number of message segments
options         MSGSSZ=16               # Size of a message segment (must be a power of 2 between 8 and 1024)
options         MSGTQL=41               # Max number of messages in system
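
For reference, the size expressions in this configuration can be worked
out as below. This is just a sketch of the arithmetic, assuming the
usual i386 page size of 4096 bytes; the variable names are illustrative
and the script draws no conclusion about whether these values are
related to the crashes.

```python
# Assumption: PAGE_SIZE is 4096 bytes, as on i386.
PAGE_SIZE = 4096
MIB = 1024 * 1024
GIB = 1024 * MIB

SHMMAXPGS = 262144
SHMALL = 262144

sizes = {
    # options MAXDSIZ=(2048UL*1024*1024): 2**31 does not fit in a signed
    # 32-bit int, hence the UL suffix in the kernel config.
    "MAXDSIZ": 2048 * MIB,
    "MAXSSIZ": 2048 * MIB,
    "DFLDSIZ": 1024 * MIB,
    # options SHMMAX=(SHMMAXPGS*PAGE_SIZE+1)
    "SHMMAX": SHMMAXPGS * PAGE_SIZE + 1,
    # SHMALL is given in pages; convert to bytes for comparison.
    "SHMALL (bytes)": SHMALL * PAGE_SIZE,
}

if __name__ == "__main__":
    for name, value in sizes.items():
        print(f"{name:15} = {value:>11} bytes ({value / GIB:.2f} GiB)")
```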

I need to go to production soon and I want FreeBSD 6.0, because Linux
and the other BSDs don't fit my requirements (e.g. jail, GEOM...), so I
replaced the controller with a Promise TX2300. It still has problems
with 6.0: again, at 40%-45% of the mirror rebuild, I get this error:

ad4: req=0xc1c43d48 SETFEATURES SET TRANSFER MODE semaphore timeout !! DANGER Will Robinson !!
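
Since the failure shows up at roughly the same percentage with both
controllers, it may be useful to translate that percentage into an
approximate disk position. The sketch below assumes 512-byte sectors,
that the "239372MB" in the probe line counts units of 2^20 bytes, and
that the rebuild proceeds sequentially from the start of the disk; the
function name is hypothetical and the result is only an estimate.

```python
SECTOR_BYTES = 512   # assumption: 512-byte sectors
MIB = 1024 * 1024
DISK_MB = 239372     # from "ad4: 239372MB <Maxtor 7L250S0 ...>"


def percent_to_lba(percent, disk_mb=DISK_MB):
    """Approximate LBA reached after `percent` of a sequential rebuild."""
    total_sectors = disk_mb * MIB // SECTOR_BYTES
    return total_sectors * percent // 100


if __name__ == "__main__":
    for pct in (40, 45):
        lba = percent_to_lba(pct)
        gib = lba * SECTOR_BYTES / 1024 ** 3
        print(f"{pct}% of rebuild ~ LBA {lba} (~{gib:.1f} GiB into the disk)")
```

With such an estimate, the region can be read on each disk separately,
without gmirror in the path, to see whether the problem follows the disk
position or only appears while mirroring.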

... I am quite worried, as I need to be able to trust my storage! Any
idea what the problem is?

If needed, I can provide more data or run tests. I have a photo of the
screen showing the panic during the boot sequence (1 MB).

Thanks, and excuse my bad English!

Paolo



