Date: Tue, 10 Jul 2012 11:08:15 -0700 (PDT)
From: Dennis Glatting <dg@pki2.com>
To: George Kontostanos <gkontos.mail@gmail.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS hanging
Message-ID: <alpine.BSF.2.00.1207101100270.69533@btw.pki2.com>
In-Reply-To: <CA+dUSypXZ5qq9vj+kaVicfkUHEQEeaCt46F+g+7wDyATbw9UbA@mail.gmail.com>
References: <1341864787.32803.43.camel@btw.pki2.com>
            <CA+dUSypXZ5qq9vj+kaVicfkUHEQEeaCt46F+g+7wDyATbw9UbA@mail.gmail.com>
On Tue, 10 Jul 2012, George Kontostanos wrote:

> On Mon, Jul 9, 2012 at 11:13 PM, Dennis Glatting <freebsd@pki2.com> wrote:
>> I have a ZFS array of disks where the system simply stops, as if forever
>> blocked by some IO mutex. This happens often, and the following is the
>> output of top:
>>
>> last pid:  6075;  load averages:  0.00,  0.00,  0.00   up 0+16:54:41  13:04:10
>> 135 processes: 1 running, 134 sleeping
>> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> Mem: 47M Active, 24M Inact, 18G Wired, 120M Buf, 44G Free
>> Swap: 32G Total, 32G Free
>>
>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>>  2410 root        1  33    0 11992K  2820K zio->i  7 331:25  0.00% bzip2
>>  2621 root        1  52    4 28640K  5544K tx->tx 24 245:33  0.00% john
>>  2624 root        1  48    4 28640K  5544K tx->tx  4 239:08  0.00% john
>>  2623 root        1  49    4 28640K  5544K tx->tx  7 238:44  0.00% john
>>  2640 root        1  42    4 28640K  5420K tx->tx 23 206:51  0.00% john
>>  2638 root        1  42    4 28640K  5420K tx->tx 28 206:34  0.00% john
>>  2639 root        1  42    4 28640K  5420K tx->tx  9 206:30  0.00% john
>>  2637 root        1  42    4 28640K  5420K tx->tx 18 206:24  0.00% john
>>
>> This system is presently resilvering a disk, but these stops have
>> happened before.
>>
>> iirc# zpool status disk-1
>>   pool: disk-1
>>  state: DEGRADED
>> status: One or more devices is currently being resilvered.  The pool will
>>         continue to function, possibly in a degraded state.
>> action: Wait for the resilver to complete.
>>   scan: resilver in progress since Sun Jul  8 13:07:46 2012
>>         104G scanned out of 12.4T at 1.73M/s, (scan is slow, no estimated time)
>>         10.3G resilvered, 0.82% done
>> config:
>>
>>         NAME                        STATE     READ WRITE CKSUM
>>         disk-1                      DEGRADED     0     0     0
>>           raidz2-0                  DEGRADED     0     0     0
>>             da1                     ONLINE       0     0     0
>>             da2                     ONLINE       0     0     0
>>             da10                    ONLINE       0     0     0
>>             da9                     ONLINE       0     0     0
>>             da5                     ONLINE       0     0     0
>>             da6                     ONLINE       0     0     0
>>             da7                     ONLINE       0     0     0
>>             replacing-7             DEGRADED     0     0     0
>>               17938531774236227186  UNAVAIL      0     0     0  was /dev/da8
>>               da3                   ONLINE       0     0     0  (resilvering)
>>             da8                     ONLINE       0     0     0
>>             da4                     ONLINE       0     0     0
>>         logs
>>           ada2p1                    ONLINE       0     0     0
>>         cache
>>           ada1                      ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> This system has dissimilar disks, which I understand should not be a
>> problem, but the stopping also happened before I started the slow disk
>> upgrade process.
>>
>> The disks are served by:
>>
>>  * an LSI 9211 flashed to IT mode, and
>>  * an LSI 2008 controller on the motherboard, also flashed to IT mode.
>>
>> The 2008 BIOS and firmware are the most recent from LSI. The motherboard
>> is a Supermicro H8DG6-F.
>>
>> My question is: what should I be looking at, and how should I look at it?
>> There is nothing in the logs or on the console; rather, the system is
>> forever paused, and entering commands gets no response (it's as if
>> everything is deadlocked).
>
> Can you post your 'dmesg | grep mps' and the FreeBSD version you run?
> Also, is there any chance that those disks are 4K?

I sent that in another post but have included it below. Yes, the disks are
a mix. I'm presently migrating 2TB crappy disks, and some 2TB not-so-crappy
disks, to 3TB crappy-unknown disks. However:

1) Why would a mix of 512/4k disks in a ZFS volume lock out a hardware
   RAID1 volume on another controller?

2) Is there a known problem, other than performance, with mixing 512/4k?

3) Related: how does an SSD array of block size foo impact an array of
   sector size bar?

Thanks.
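As for whether the members really are 4k and what the pool was built with,
both are visible from userland with stock diskinfo(8) and zdb(8). A minimal
sketch; the device names and the pool name 'disk-1' are taken from the
zpool status above:

# Logical sector size and stripe size for each pool member; 4k-native
# drives typically report stripesize 4096 even when sectorsize is 512.
for d in da1 da2 da3 da4 da5 da6 da7 da8 da9 da10; do
        echo "== $d"
        diskinfo -v /dev/$d | grep -E 'sectorsize|stripesize'
done

# The allocation shift each vdev was created with: ashift=9 means
# 512-byte allocation units, ashift=12 means 4k. A 4k drive inside an
# ashift=9 vdev turns sub-4k writes into read-modify-write cycles in
# the drive, which hurts performance but should not deadlock anything.
zdb -C disk-1 | grep ashift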
iirc> dmesg | grep mps
mps0: <LSI SAS2008> port 0xd000-0xd0ff mem 0xdfe3c000-0xdfe3ffff,0xdfe40000-0xdfe7ffff irq 19 at device 0.0 on pci4
mps0: Firmware: 13.00.57.00, Driver: 14.00.00.01-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps0: attempting to allocate 1 MSI-X vectors (15 supported)
mps0: using IRQ 256 for MSI-X
mps1: <LSI SAS2008> port 0xc000-0xc0ff mem 0xdfd3c000-0xdfd3ffff,0xdfd40000-0xdfd7ffff irq 16 at device 0.0 on pci3
mps1: Firmware: 13.00.57.00, Driver: 14.00.00.01-fbsd
mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps1: attempting to allocate 1 MSI-X vectors (15 supported)
mps1: using IRQ 257 for MSI-X
da1 at mps0 bus 0 scbus1 target 0 lun 0
da5 at mps1 bus 0 scbus2 target 1 lun 0
da4 at mps0 bus 0 scbus1 target 6 lun 0
da2 at mps0 bus 0 scbus1 target 1 lun 0
da6 at mps1 bus 0 scbus2 target 2 lun 0
da8 at mps1 bus 0 scbus2 target 5 lun 0
da7 at mps1 bus 0 scbus2 target 3 lun 0
da10 at mps1 bus 0 scbus2 target 8 lun 0
pass2 at mps0 bus 0 scbus1 target 0 lun 0
pass3 at mps0 bus 0 scbus1 target 1 lun 0
pass4 at mps0 bus 0 scbus1 target 5 lun 0
pass5 at mps0 bus 0 scbus1 target 6 lun 0
pass6 at mps1 bus 0 scbus2 target 1 lun 0
pass7 at mps1 bus 0 scbus2 target 2 lun 0
pass8 at mps1 bus 0 scbus2 target 3 lun 0
pass9 at mps1 bus 0 scbus2 target 5 lun 0
pass10 at mps1 bus 0 scbus2 target 7 lun 0
pass11 at mps1 bus 0 scbus2 target 8 lun 0
da3 at mps0 bus 0 scbus1 target 5 lun 0
da9 at mps1 bus 0 scbus2 target 7 lun 0

iirc> uname -a
FreeBSD iirc 9.0-STABLE FreeBSD 9.0-STABLE #14: Sun Jul  8 16:54:00 PDT 2012  root@iirc:/sys/amd64/compile/SMUNI  amd64

> --
> George Kontostanos
> Aicom telecoms ltd
> http://www.aisecure.net
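On the original question of how to look at a wedge like this: while the
processes are stuck in zio->i / tx->tx but a shell still responds, kernel
stack traces usually name the lock or I/O each thread is sleeping on. A
sketch with stock tools; the PIDs are the ones from the top output above:

# Kernel stack traces for the blocked processes (procstat -kk also
# prints function offsets); the top frames show what each thread is
# waiting on.
procstat -kk 2410 2621 2623 2624 2637 2638 2639 2640

# If nothing responds at all, the same information is only reachable
# from the kernel debugger. With KDB/DDB compiled into the kernel, this
# drops the console to the ddb> prompt, where ps and show lockedvnods
# are the usual starting points:
sysctl debug.kdb.enter=1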