Date:      Tue, 10 Jul 2012 11:08:15 -0700 (PDT)
From:      Dennis Glatting <dg@pki2.com>
To:        George Kontostanos <gkontos.mail@gmail.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS hanging
Message-ID:  <alpine.BSF.2.00.1207101100270.69533@btw.pki2.com>
In-Reply-To: <CA%2BdUSypXZ5qq9vj%2BkaVicfkUHEQEeaCt46F%2Bg%2B7wDyATbw9UbA@mail.gmail.com>
References:  <1341864787.32803.43.camel@btw.pki2.com> <CA%2BdUSypXZ5qq9vj%2BkaVicfkUHEQEeaCt46F%2Bg%2B7wDyATbw9UbA@mail.gmail.com>


On Tue, 10 Jul 2012, George Kontostanos wrote:

> On Mon, Jul 9, 2012 at 11:13 PM, Dennis Glatting <freebsd@pki2.com> wrote:
>> I have a ZFS array of disks where the system simply stops as if forever
>> blocked by some IO mutex. This happens often and the following is the
>> output of top:
>>
>> last pid:  6075;  load averages:  0.00,  0.00,  0.00    up 0+16:54:41  13:04:10
>> 135 processes: 1 running, 134 sleeping
>> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> Mem: 47M Active, 24M Inact, 18G Wired, 120M Buf, 44G Free
>> Swap: 32G Total, 32G Free
>>
>>   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>>  2410 root          1  33    0 11992K  2820K zio->i  7 331:25  0.00% bzip2
>>  2621 root          1  52    4 28640K  5544K tx->tx 24 245:33  0.00% john
>>  2624 root          1  48    4 28640K  5544K tx->tx  4 239:08  0.00% john
>>  2623 root          1  49    4 28640K  5544K tx->tx  7 238:44  0.00% john
>>  2640 root          1  42    4 28640K  5420K tx->tx 23 206:51  0.00% john
>>  2638 root          1  42    4 28640K  5420K tx->tx 28 206:34  0.00% john
>>  2639 root          1  42    4 28640K  5420K tx->tx  9 206:30  0.00% john
>>  2637 root          1  42    4 28640K  5420K tx->tx 18 206:24  0.00% john
>>
>>
>> This system is presently resilvering a disk but these stops have
>> happened before.
>>
>>
>> iirc#  zpool status disk-1
>>   pool: disk-1
>>  state: DEGRADED
>> status: One or more devices is currently being resilvered.  The pool will
>>         continue to function, possibly in a degraded state.
>> action: Wait for the resilver to complete.
>>   scan: resilver in progress since Sun Jul  8 13:07:46 2012
>>         104G scanned out of 12.4T at 1.73M/s, (scan is slow, no estimated time)
>>         10.3G resilvered, 0.82% done
>> config:
>>
>>         NAME                        STATE     READ WRITE CKSUM
>>         disk-1                      DEGRADED     0     0     0
>>           raidz2-0                  DEGRADED     0     0     0
>>             da1                     ONLINE       0     0     0
>>             da2                     ONLINE       0     0     0
>>             da10                    ONLINE       0     0     0
>>             da9                     ONLINE       0     0     0
>>             da5                     ONLINE       0     0     0
>>             da6                     ONLINE       0     0     0
>>             da7                     ONLINE       0     0     0
>>             replacing-7             DEGRADED     0     0     0
>>               17938531774236227186  UNAVAIL      0     0     0  was /dev/da8
>>               da3                   ONLINE       0     0     0  (resilvering)
>>             da8                     ONLINE       0     0     0
>>             da4                     ONLINE       0     0     0
>>         logs
>>           ada2p1                    ONLINE       0     0     0
>>         cache
>>           ada1                      ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>>
>> This system has dissimilar disks, which I understand should not be a
>> problem but the stopping also happened before I started the slow disk
>> upgrade process.
>>
>> The disks are served by:
>>
>> * A LSI 9211 flashed to IT, and
>> * A LSI 2008 controller on the motherboard also flashed to IT.
>>
>> The 2008 BIOS and firmware is the most recent from LSI. The motherboard
>> is a Supermicro H8DG6-F.
>>
>>
>> My question is what should I be looking at and how should I look at it?
>> There is nothing in the logs or the console, rather the system is
>> forever paused and entering commands results in no response (it's as if
>> everything is deadlocked).
>>
>
> Can you post your 'dmesg | grep mps', the FreeBSD version you run?
> Also, is there any chance that those disks are 4K?
>

I sent that in another post, but I have included it below as well.

Yes, the disks are a mix. I'm presently migrating 2TB crappy disks, and 
some 2TB not-so-crappy disks, to 3TB crappy-unknown disks. However:

1) Why would a mix of 512/4k disks in a ZFS volume lock out a hardware 
RAID1 volume on another controller?

2) Is there a known problem, other than performance, with mixing 512/4k disks?

3) Related: How does an SSD array of block size foo impact an array of 
sector size bar?

Thanks.


iirc> dmesg | grep mps
mps0: <LSI SAS2008> port 0xd000-0xd0ff mem 0xdfe3c000-0xdfe3ffff,0xdfe40000-0xdfe7ffff irq 19 at device 0.0 on pci4
mps0: Firmware: 13.00.57.00, Driver: 14.00.00.01-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps0: attempting to allocate 1 MSI-X vectors (15 supported)
mps0: using IRQ 256 for MSI-X
mps1: <LSI SAS2008> port 0xc000-0xc0ff mem 0xdfd3c000-0xdfd3ffff,0xdfd40000-0xdfd7ffff irq 16 at device 0.0 on pci3
mps1: Firmware: 13.00.57.00, Driver: 14.00.00.01-fbsd
mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps1: attempting to allocate 1 MSI-X vectors (15 supported)
mps1: using IRQ 257 for MSI-X
da1 at mps0 bus 0 scbus1 target 0 lun 0
da5 at mps1 bus 0 scbus2 target 1 lun 0
da4 at mps0 bus 0 scbus1 target 6 lun 0
da2 at mps0 bus 0 scbus1 target 1 lun 0
da6 at mps1 bus 0 scbus2 target 2 lun 0
da8 at mps1 bus 0 scbus2 target 5 lun 0
da7 at mps1 bus 0 scbus2 target 3 lun 0
da10 at mps1 bus 0 scbus2 target 8 lun 0
pass2 at mps0 bus 0 scbus1 target 0 lun 0
pass3 at mps0 bus 0 scbus1 target 1 lun 0
pass4 at mps0 bus 0 scbus1 target 5 lun 0
pass5 at mps0 bus 0 scbus1 target 6 lun 0
pass6 at mps1 bus 0 scbus2 target 1 lun 0
pass7 at mps1 bus 0 scbus2 target 2 lun 0
pass8 at mps1 bus 0 scbus2 target 3 lun 0
pass9 at mps1 bus 0 scbus2 target 5 lun 0
pass10 at mps1 bus 0 scbus2 target 7 lun 0
pass11 at mps1 bus 0 scbus2 target 8 lun 0
da3 at mps0 bus 0 scbus1 target 5 lun 0
da9 at mps1 bus 0 scbus2 target 7 lun 0


iirc> uname -a
FreeBSD iirc 9.0-STABLE FreeBSD 9.0-STABLE #14: Sun Jul  8 16:54:00 PDT 2012     root@iirc:/sys/amd64/compile/SMUNI  amd64





> -- 
> George Kontostanos
> Aicom telecoms ltd
> http://www.aisecure.net
>


