From owner-freebsd-stable@FreeBSD.ORG Thu Oct 7 17:39:02 2010 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 50D8F106566C for ; Thu, 7 Oct 2010 17:39:02 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta12.westchester.pa.mail.comcast.net (qmta12.westchester.pa.mail.comcast.net [76.96.59.227]) by mx1.freebsd.org (Postfix) with ESMTP id D65658FC08 for ; Thu, 7 Oct 2010 17:39:01 +0000 (UTC) Received: from omta04.westchester.pa.mail.comcast.net ([76.96.62.35]) by qmta12.westchester.pa.mail.comcast.net with comcast id FnAf1f0010ldTLk5Ctf1XJ; Thu, 07 Oct 2010 17:39:01 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta04.westchester.pa.mail.comcast.net with comcast id Ftf01f0013LrwQ23Qtf0Fd; Thu, 07 Oct 2010 17:39:01 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id B004D9B427; Thu, 7 Oct 2010 10:38:58 -0700 (PDT) Date: Thu, 7 Oct 2010 10:38:58 -0700 From: Jeremy Chadwick To: John Hay Message-ID: <20101007173858.GA85862@icarus.home.lan> References: <20101007121558.GA70199@zibbi.meraka.csir.co.za> <20101007155042.GA88362@zibbi.meraka.csir.co.za> <20101007173102.GA95405@zibbi.meraka.csir.co.za> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20101007173102.GA95405@zibbi.meraka.csir.co.za> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org, Andriy Gapon Subject: Re: zfs hang in zio->io_cv) with dd read X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Oct 2010 17:39:02 -0000 On Thu, Oct 07, 2010 at 07:31:02PM +0200, John Hay wrote: > On Thu, Oct 07, 2010 at 06:19:48PM +0200, Goran Lowkrantz wrote: > > --On October 7, 2010 17:50:42 +0200 John Hay wrote: > > > > >On Thu, Oct 07, 2010 at 02:35:31PM +0200, Ivan Voras wrote: > > >>On 10/07/10 14:15, John Hay wrote: > > >>> Hi, > > >>> > > >>> I got hold of a SunFire X4500 with 48 X 500G disks and thought to try > > >>> FreeBSD 8-stable with zfs on it. > > >>> > > >>> I have setup the two boot disks in a zfs mirror and then the rest in > > >>> a pool of 6 X raidz2 of 7 disks each. > > >>> > > >>> I have created a 10G file with dd in the second pool, but if I try to > > >>> read it with dd, dd will hang in "zio->io_cv)" according to ^T. This > > >>> happens everytime. The first time I saw messages about an interrupt > > >>> storm, so I have put "hw.intr_storm_threshold=10000" in > > >>> /etc/sysctl.conf. According to "systat -vm 1" there is atapci for 2-3 > > >>> seconds and then it is quiet. > > >> > > >>There are two things you could try: 1) use the AHCI driver > > >>(ahci_load="YES" in /boot/loader.conf) and 2) disable superpages, they > > >>don't get along on a few models of Opterons (vm.pmap.pg_ps_enabled=0 in > > >>/boot/loader.conf). > > > > > >ahci does not grab them. According to the ahci man page, it can handle > > >Marvell 88SX61xx, while these are MV88SX6081 according to pciconf -lcv: > > > > > >atapci0@pci0:1:1:0: class=0x010000 card=0x11ab11ab chip=0x608111ab > > >rev=0x09 hdr=0x00 vendor = 'Marvell Semiconductor (Was: Galileo > > >Technology Ltd)' device = 'MV88SX6081 8-port SATA II PCI-X > > >Controller' > > > class = mass storage > > > subclass = SCSI > > > cap 01[40] = powerspec 2 supports D0 D3 current D0 > > > cap 05[50] = MSI supports 1 message, 64 bit > > > cap 07[60] = PCI-X 64-bit supports 133MHz, 512 burst read, 4 split > > >transactions > > > > Then try mvs_load="YES" > > > > mvs0@pci0:6:2:0: class=0x010000 card=0x11ab11ab chip=0x608111ab > > rev=0x09 hdr=0x00 > > vendor = 'Marvell Semiconductor (Was: Galileo Technology Ltd)' > > device = 'MV88SX6081 8-port SATA II PCI-X Controller' > > class = mass storage > > subclass = SCSI > > mvs1@pci0:5:1:0: class=0x010000 card=0x11ab11ab chip=0x608111ab > > rev=0x09 hdr=0x00 > > vendor = 'Marvell Semiconductor (Was: Galileo Technology Ltd)' > > device = 'MV88SX6081 8-port SATA II PCI-X Controller' > > class = mass storage > > subclass = SCSI > > That helped, thanks. Now the disks are detected as adaXX devices. > > The problem still happens though. I think it takes a little longer after > I have started dd before it hangs, but it still hangs. > > One thing seems a little different though. Occasionaly a short burst of > interrupts on the mvsX devices do come through. It also seems that a few > seconds after I press ^T in the dd window, I see a burst of mvsX > interrupts happen and dd will report in/out records and bytes. This did > not happen with the ata driver. It is still hanging in "zio->io_cv)" > though. > > I also see these messages in /var/log/messages > > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 16 (->0) 0 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 17 (->0) 1 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 18 (->0) 2 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 20 (->0) 0 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 21 (->0) 1 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 22 (->0) 2 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 23 (->0) 0 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 24 (->0) 0 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 25 (->0) 1 4000 > Oct 7 17:08:04 thumper1 kernel: > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 26 (->0) 0 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 27 (->0) 0 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 28 (->0) 1 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 29 (->0) 2 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 30 (->0) 3 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 31 (->0) 0 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 2 (->18) 1 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 5 (->18) 0 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 6 (->18) 0 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 7 (->18) 0 4000 > Oct 7 17:08:04 thumper1 kernel: mvsch31: EMPTY CRPB 8 (->18) 1 4000 > Oct 7 17:08:05 thumper1 kernel: > Oct 7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 9 (->18) 2 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 10 (->18) 0 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 11 (->18) 1 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 12 (->18) 0 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 13 (->18) 0 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 14 (->18) 1 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 15 (->18) 2 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 16 (->18) 3 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 17 (->18) 0 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 18 (->22) 1 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 19 (->22) 2 4000 > Oct 7 17:08:05 thumper1 kernel: > Oct 7 17:08:05 thumper1 kernel: mvsch31: EMPTY CRPB 20 (->22) 3 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 30 (->31) 0 0000 > Oct 7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 7 (->0) 0 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 8 (->0) 1 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 10 (->0) 3 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 11 (->0) 0 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 12 (->0) 1 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 13 (->0) 2 4000 > Oct 7 17:08:05 thumper1 kernel: mvsch20: EMPTY CRPB 14 (->0) 3 4000 > ... > Oct 7 17:08:09 thumper1 kernel: mvsch19: EMPTY CRPB 9 (->10) 0 0000 > Oct 7 17:08:39 thumper1 kernel: mvsch19: Timeout on slot 1 > Oct 7 17:08:39 thumper1 kernel: mvsch19: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001024 dma_c 00000000 dma_s 00000000 rs 00000002 status 50 > Oct 7 17:08:39 thumper1 kernel: mvsch22: EMPTY CRPB 19 (->0) 1 4000 > Oct 7 17:08:39 thumper1 kernel: mvsch22: EMPTY CRPB 20 (->0) 3 4000 > Oct 7 17:08:39 thumper1 kernel: mvsch22: EMPTY CRPB 21 (->0) 2 4000 > ... > Oct 7 17:08:43 thumper1 kernel: mvsch18: EMPTY CRPB 14 (->15) 0 4000 > Oct 7 17:08:43 thumper1 kernel: mvsch29: EMPTY CRPB 29 (->0) 0 4000 > Oct 7 17:09:13 thumper1 kernel: mvsch29: Timeout on slot 1 > Oct 7 17:09:13 thumper1 kernel: mvsch29: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001025 dma_c 00000000 dma_s 00000000 rs 00000002 status 50 > Oct 7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 27 (->0) 1 4000 > Oct 7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 29 (->0) 3 4000 > Oct 7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 30 (->0) 0 4000 > ... > Oct 7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 26 (->29) 1 4000 > Oct 7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 27 (->29) 2 4000 > Oct 7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 28 (->29) 0 4000 > Oct 7 17:09:14 thumper1 kernel: mvsch20: EMPTY CRPB 8 (->9) 0 0000 > Oct 7 17:09:44 thumper1 kernel: mvsch20: Timeout on slot 1 > Oct 7 17:09:44 thumper1 kernel: mvsch20: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001123 dma_c 00000000 dma_s 00000000 rs 00000002 status 50 > Oct 7 17:09:45 thumper1 kernel: mvsch12: EMPTY CRPB 26 (->27) 0 002a > Oct 7 17:10:15 thumper1 kernel: mvsch12: Timeout on slot 1 > Oct 7 17:10:15 thumper1 kernel: mvsch12: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001022 dma_c 00000000 dma_s 00000000 rs 00000002 status 50 > Oct 7 17:10:15 thumper1 kernel: mvsch4: EMPTY CRPB 14 (->15) 0 0000 > Oct 7 17:10:45 thumper1 kernel: mvsch4: Timeout on slot 1 > Oct 7 17:10:45 thumper1 kernel: mvsch4: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001020 dma_c 00000000 dma_s 00000000 rs 00000002 status 50 > Oct 7 17:10:45 thumper1 kernel: mvsch22: EMPTY CRPB 7 (->0) 1 4000 > Oct 7 17:10:45 thumper1 kernel: mvsch22: EMPTY CRPB 8 (->0) 0 4000 > Oct 7 17:10:45 thumper1 kernel: mvsch22: EMPTY CRPB 9 (->0) 1 4000 > ... > Oct 7 17:10:48 thumper1 kernel: mvsch19: EMPTY CRPB 21 (->26) 3 4000 > Oct 7 17:10:48 thumper1 kernel: mvsch21: EMPTY CRPB 19 (->20) 0 0000 > Oct 7 17:10:48 thumper1 kernel: mvsch21: EMPTY CRPB 28 (->29) 0 0000 > Oct 7 17:11:18 thumper1 kernel: mvsch21: Timeout on slot 2 > Oct 7 17:11:18 thumper1 kernel: mvsch21: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001026 dma_c 00000000 dma_s 00000000 rs 00000014 status 50 > Oct 7 17:11:18 thumper1 kernel: mvsch21: ... waiting for slots 00000010 > Oct 7 17:11:18 thumper1 kernel: mvsch21: Timeout on slot 4 > Oct 7 17:11:18 thumper1 kernel: mvsch21: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001026 dma_c 00000000 dma_s 00000000 rs 00000014 status 50 > Oct 7 17:11:19 thumper1 kernel: mvsch27: EMPTY CRPB 22 (->23) 0 0000 > Oct 7 17:11:19 thumper1 kernel: mvsch21: EMPTY CRPB 12 (->13) 0 4000 > Oct 7 17:11:19 thumper1 kernel: mvsch29: EMPTY CRPB 6 (->7) 0 0000 > ... > Oct 7 17:11:19 thumper1 kernel: mvsch1: EMPTY CRPB 13 (->16) 0 4000 > Oct 7 17:11:19 thumper1 kernel: mvsch1: EMPTY CRPB 14 (->16) 1 4000 > Oct 7 17:11:19 thumper1 kernel: mvsch1: EMPTY CRPB 15 (->16) 0 4000 > Oct 7 17:11:49 thumper1 kernel: mvsch27: Timeout on slot 1 > Oct 7 17:11:49 thumper1 kernel: mvsch27: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001023 dma_c 00000000 dma_s 00000000 rs 00000002 status 50 > Oct 7 17:11:49 thumper1 kernel: mvsch21: Timeout on slot 1 > Oct 7 17:11:49 thumper1 kernel: mvsch21: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001024 dma_c 00000000 dma_s 00000000 rs 00000002 status 50 > Oct 7 17:11:49 thumper1 kernel: mvsch29: Timeout on slot 1 > Oct 7 17:11:49 thumper1 kernel: mvsch29: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001024 dma_c 00000000 dma_s 00000000 rs 0000000a status 50 > Oct 7 17:11:49 thumper1 kernel: mvsch29: ... waiting for slots 00000008 > Oct 7 17:11:49 thumper1 kernel: mvsch3: Timeout on slot 2 > Oct 7 17:11:49 thumper1 kernel: mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001023 dma_c 00000000 dma_s 00000000 rs 00000004 status 50 > Oct 7 17:11:49 thumper1 kernel: mvsch1: EMPTY CRPB 18 (->20) 1 4000 > Oct 7 17:11:49 thumper1 kernel: mvsch29: Timeout on slot 3 > Oct 7 17:11:49 thumper1 kernel: mvsch29: iec 02000000 sstat 00000123 serr 00000000 edma_s 00001024 dma_c 00000000 dma_s 00000000 rs 0000000a status 50 > Oct 7 17:11:49 thumper1 kernel: mvsch2: EMPTY CRPB 14 (->15) 0 0000 > Oct 7 17:11:49 thumper1 kernel: mvsch30: EMPTY CRPB 21 (->22) 0 0000 > Oct 7 17:11:49 thumper1 kernel: mvsch12: EMPTY CRPB 22 (->23) 0 0000 > Oct 7 17:11:49 thumper1 kernel: mvsch23: EMPTY CRPB 30 (->0) 0 4000 > ... > > Ahh and dd did finish after a long time: > > > dd if=tst.dd of=/dev/null bs=64k > load: 0.87 cmd: dd 1399 [zio->io_cv)] 35.50r 0.01u 10.57s 0% 868k > 107740+0 records in > 107740+0 records out > 7060848640 bytes transferred in 48.836175 secs (144582344 bytes/sec) > > load: 0.44 cmd: dd 1399 [zio->io_cv)] 76.87r 0.01u 11.99s 0% 880k > 114476+0 records in > 114476+0 records out > 7502299136 bytes transferred in 83.218699 secs (90151603 bytes/sec) > load: 0.79 cmd: dd 1399 [zio->io_cv)] 104.27r 0.01u 12.37s 0% 880k > 118032+0 records in > 118032+0 records out > 7735345152 bytes transferred in 114.212610 secs (67727593 bytes/sec) > load: 0.29 cmd: dd 1399 [zio->io_cv)] 165.37r 0.01u 12.77s 0% 880k > 121816+0 records in > 121816+0 records out > 7983333376 bytes transferred in 174.702303 secs (45696784 bytes/sec) > load: 0.01 cmd: dd 1399 [zio->io_cv)] 454.84r 0.01u 15.57s 0% 880k > 134566+0 records in > 134566+0 records out > 8818917376 bytes transferred in 455.140146 secs (19376268 bytes/sec) > load: 0.00 cmd: dd 1399 [zio->io_cv)] 671.34r 0.02u 16.71s 0% 880k > 143446+0 records in > 143446+0 records out > 9400877056 bytes transferred in 699.953712 secs (13430712 bytes/sec) > 160000+0 records in > 160000+0 records out > 10485760000 bytes transferred in 1077.266770 secs (9733671 bytes/sec) Isn't slow I/O one of the results of ARC starvation? Please also provide vmstat -i output. Also adding Andriy Gapon to the CC list. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB |