From: Paolo Maero
Date: Fri, 10 Feb 2006 23:37:47 +0100
To: Michael Reifenberger, FreeBSD Stable
Subject: Re: IDE DMA Timeouts

On Feb 10, 2006, at 1:28 PM, Michael Reifenberger wrote:

> On Fri, 10 Feb 2006, Markus Trippelsdorf wrote:
> ...
>>> ...
>>> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=58914495
>>> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=123039679
>>> ad1: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=54591167
>>
>> It looks like bad cabling to me. Try new cables and also run
>> smartctl -a /dev/ad0 (and ad1) to check if the hardware is OK.
>>
> smartctl doesn't report any errors, and accessing only one disk at a time
> doesn't give errors either. So probably cabling isn't the issue here.
> More likely a timing/locking interaction between gmirror/ata...
>
> Bye/2
> ---
> Michael Reifenberger, Business Development Manager SAP-Basis, Plaut Consulting
> Comp: Michael.Reifenberger@plaut.de | Priv: Michael@Reifenberger.com
> http://www.plaut.de | http://www.Reifenberger.com

I have the same problem with 6.0 and gmirror. It's not a cabling or
hardware problem, as the same setup works fine with FreeBSD 5.4 and
earlier, OpenBSD 3.8 and Linux 2.6.x, with and without mirroring.

I have a dual PIII 700 MHz SMP system with a Maxtor PCI SATA controller
and two 250 GB Maxtor disks, plus SCSI disks for the OS:

atapci0: port 0x3080-0x30bf,0x30c0-0x30cf,0x3000-0x307f mem 0xf4220000-0xf4220fff,0xf4200000-0xf421ffff irq 20 at device 4.0 on pci2
ad4: 239372MB at ata2-master SATA150
ad6: 239372MB at ata2-master SATA150

The disks are mirrored with gmirror and never complete a
synchronization: when the gmirror resync reaches around 40%-45%, the
system crashes. Nothing is logged and the screen shows some garbage.

The system also crashes occasionally under heavy load, after logging
some TIMEOUT - READ_DMA errors. After that the system is unable to boot
again: it crashes during the boot sequence, when the mirror is
re-established and the resync starts again.
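Since the resync seems to die at roughly the same point every time, one way to narrow this down might be to translate that percentage into a sector range on the 239372MB disks. The following is a minimal sketch of that arithmetic, assuming that gmirror copies the disk linearly from the start, that the "239372MB" in the probe message means 239372 x 2^20 bytes with 512-byte sectors, and that the crash point is really 40%-45%. The function name lba_at_percent is purely illustrative.

```python
# Assumption: "ad4: 239372MB" is total_sectors // 2048 (MB = 2**20 bytes,
# 512-byte sectors), so the real sector count may be up to 2047 larger.
SECTOR_SIZE = 512
REPORTED_MB = 239372


def lba_at_percent(total_sectors: int, percent: int) -> int:
    """Hypothetical helper: sector offset reached after `percent`% of a
    front-to-back (linear) resync, rounded down."""
    return total_sectors * percent // 100


total_sectors = REPORTED_MB * (2**20 // SECTOR_SIZE)
start = lba_at_percent(total_sectors, 40)
end = lba_at_percent(total_sectors, 45)

print(f"approx. total sectors: {total_sectors}")
print(f"40%-45% of resync: LBA {start} - {end}")
```

Under these assumptions the crash point corresponds roughly to LBA 196093542 to 220605235, so any LBA= values logged just before a crash could be compared against that range.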
It crashes even if the resync is prevented (through NOAUTOSYNC). To be
able to boot again, I have to boot from the installation CD and clear
the gmirror metadata on one disk.

I have 6.0-RELEASE-p4 (but it happens on any 6.0) and this is my kernel
configuration:

# Include the standard distribution's SMP kernel config file, which in
# turn includes the generic kernel config file (GENERIC)
include SMP

# Custom kernel ident name
ident ZOE_020

# Additional/overridden settings start here
nooptions PREEMPTION            # Disable kernel thread preemption

# Standard system settings
maxusers 64                     # 64 users is a lot, but we should have plenty of memory!
options INCLUDE_CONFIG_FILE     # Include this file in the kernel for reference

# Memory settings: 2 GBytes maximum data and stack size,
# 1 GByte default (initial) data size limit
options MAXDSIZ=(2048UL*1024*1024)
options MAXSSIZ=(2048UL*1024*1024)
options DFLDSIZ=(1024UL*1024*1024)

# SYSV options (shared memory, semaphores, message queues)
options SEMMAP=63               # Number of entries in the semaphore map
options SEMMNI=512              # Number of semaphore identifiers (sets)
options SEMMNS=512              # Total number of semaphores system-wide
options SEMMNU=512              # Total number of undo structures in the system
options SEMMSL=64               # Maximum number of semaphores per identifier (set)
options SEMOPM=128              # Maximum number of operations per semop() call
options SEMUME=48               # Maximum number of undo entries per process
options SHMALL=262144           # Maximum number of shared memory pages system-wide
options SHMMAX=(SHMMAXPGS*PAGE_SIZE+1)  # Maximum size, in bytes, of a single shared memory segment
options SHMMAXPGS=262144        # Maximum size, in pages, of a single shared memory segment
options SHMMIN=2                # Minimum size, in bytes, of a single shared memory segment
options SHMMNI=128              # Maximum number of shared memory segments system-wide
options SHMSEG=32               # Maximum number of shared memory segments attached to a single process
options MSGMNB=2049             # Max number of chars in a queue
options MSGMNI=41               # Max number of message queue identifiers
options MSGSEG=2049             # Max number of message segments
options MSGSSZ=16               # Size of a message segment (must be a power of 2 between 8 and 1024)
options MSGTQL=41               # Max number of messages in the system

I need to go into production soon and I want FreeBSD 6.0, since Linux
and the other BSDs don't fit my requirements (e.g. jail, GEOM...), so I
switched to a Promise TX2300 controller. It still has problems with
6.0: again, at 40%-45% of the mirror rebuild I get this error:

ad4: req=0xc1c43d48 SETFEATURES SET TRANSFER MODE semaphore timeout !! DANGER Will Robinson !!
...

I am quite worried, as I need to be able to trust my storage!

Any idea what the problem is? If needed I can provide more data or run
tests. I also have a photo (1 MB) of the screen showing the panic during
the boot sequence.

Thanks, and excuse my bad English!

Paolo