Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 19 Oct 2012 22:38:29 -0700
From:      Dennis Glatting <freebsd@pki2.com>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS hang status update
Message-ID:  <1350711509.86715.59.camel@btw.pki2.com>
In-Reply-To: <1350698905.86715.33.camel@btw.pki2.com>
References:  <1350698905.86715.33.camel@btw.pki2.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 2012-10-19 at 19:08 -0700, Dennis Glatting wrote:
> I applied your debugging patch and that system has been running under
> load for 43 hours. I have no idea why.
> 
> That said, some of my prior batch jobs have run for over a month. There
> was a time when ZFS was fairly stable but took a dive some months ago.
> 

Boom. Roughly 49 hours, adding a SFTP transfer (60GB off the pool
disk-1) and a ls (a directory in the disk-1 pool) in a while loop.


My pools:

mc# zpool status
  pool: disk-1
 state: ONLINE
  scan: scrub repaired 0 in 0h38m with 0 errors on Tue Oct 16 16:47:51
2012
config:

	NAME        STATE     READ WRITE CKSUM
	disk-1      ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    da5     ONLINE       0     0     0
	    da6     ONLINE       0     0     0
	    da7     ONLINE       0     0     0
	    da2     ONLINE       0     0     0
	    da3     ONLINE       0     0     0
	    da4     ONLINE       0     0     0
	cache
	  da0       ONLINE       0     0     0

errors: No known data errors

  pool: disk-2
 state: ONLINE
  scan: scrub repaired 0 in 0h6m with 0 errors on Tue Oct 16 17:05:43
2012
config:

	NAME        STATE     READ WRITE CKSUM
	disk-2      ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da9     ONLINE       0     0     0
	    da10    ONLINE       0     0     0

errors: No known data errors


camcontrol output (static linked and stored in a md):

mc# /mnt/camcontrol tags da0  -v
(no output. session hung.)

mc# /mnt/camcontrol tags da1  -v (** swap disk **)
(pass1:mps0:0:5:0): dev_openings  255
(pass1:mps0:0:5:0): dev_active    0
(pass1:mps0:0:5:0): devq_openings 255
(pass1:mps0:0:5:0): devq_queued   0
(pass1:mps0:0:5:0): held          0
(pass1:mps0:0:5:0): mintags       2
(pass1:mps0:0:5:0): maxtags       255
mc# /mnt/camcontrol tags da2 -v
(pass2:mps0:0:6:0): dev_openings  245
(pass2:mps0:0:6:0): dev_active    10
(pass2:mps0:0:6:0): devq_openings 245
(pass2:mps0:0:6:0): devq_queued   0
(pass2:mps0:0:6:0): held          0
(pass2:mps0:0:6:0): mintags       2
(pass2:mps0:0:6:0): maxtags       255
mc# /mnt/camcontrol tags da3 -v
(pass3:mps0:0:7:0): dev_openings  245
(pass3:mps0:0:7:0): dev_active    10
(pass3:mps0:0:7:0): devq_openings 245
(pass3:mps0:0:7:0): devq_queued   0
(pass3:mps0:0:7:0): held          0
(pass3:mps0:0:7:0): mintags       2
(pass3:mps0:0:7:0): maxtags       255
mc# /mnt/camcontrol tags da4 -v
(pass4:mps0:0:8:0): dev_openings  245
(pass4:mps0:0:8:0): dev_active    10
(pass4:mps0:0:8:0): devq_openings 245
(pass4:mps0:0:8:0): devq_queued   0
(pass4:mps0:0:8:0): held          0
(pass4:mps0:0:8:0): mintags       2
(pass4:mps0:0:8:0): maxtags       255
mc# /mnt/camcontrol tags da5 -v
(pass5:mps0:0:9:0): dev_openings  245
(pass5:mps0:0:9:0): dev_active    10
(pass5:mps0:0:9:0): devq_openings 245
(pass5:mps0:0:9:0): devq_queued   0
(pass5:mps0:0:9:0): held          0
(pass5:mps0:0:9:0): mintags       2
(pass5:mps0:0:9:0): maxtags       255
mc# /mnt/camcontrol tags da6 -v
(pass6:mps0:0:10:0): dev_openings  245
(pass6:mps0:0:10:0): dev_active    10
(pass6:mps0:0:10:0): devq_openings 245
(pass6:mps0:0:10:0): devq_queued   0
(pass6:mps0:0:10:0): held          0
(pass6:mps0:0:10:0): mintags       2
(pass6:mps0:0:10:0): maxtags       255
mc# /mnt/camcontrol tags da7 -v
(pass7:mps0:0:11:0): dev_openings  245
(pass7:mps0:0:11:0): dev_active    10
(pass7:mps0:0:11:0): devq_openings 245
(pass7:mps0:0:11:0): devq_queued   0
(pass7:mps0:0:11:0): held          0
(pass7:mps0:0:11:0): mintags       2
(pass7:mps0:0:11:0): maxtags       255
mc# /mnt/camcontrol tags da8 -v  (** OS hdw RAID1 **)
(pass8:mps1:0:0:0): dev_openings  245
(pass8:mps1:0:0:0): dev_active    10
(pass8:mps1:0:0:0): devq_openings 245
(pass8:mps1:0:0:0): devq_queued   0
(pass8:mps1:0:0:0): held          0
(pass8:mps1:0:0:0): mintags       2
(pass8:mps1:0:0:0): maxtags       255
mc# /mnt/camcontrol tags da9 -v
(pass9:mps1:0:9:0): dev_openings  251
(pass9:mps1:0:9:0): dev_active    4
(pass9:mps1:0:9:0): devq_openings 251
(pass9:mps1:0:9:0): devq_queued   0
(pass9:mps1:0:9:0): held          0
(pass9:mps1:0:9:0): mintags       2
(pass9:mps1:0:9:0): maxtags       255
mc# /mnt/camcontrol tags da10 -v
(pass10:mps1:0:11:0): dev_openings  251
(pass10:mps1:0:11:0): dev_active    4
(pass10:mps1:0:11:0): devq_openings 251
(pass10:mps1:0:11:0): devq_queued   0
(pass10:mps1:0:11:0): held          0
(pass10:mps1:0:11:0): mintags       2
(pass10:mps1:0:11:0): maxtags       255


I did not run procstat before reboot. I wasn't sure if that was
redundant information from my prior email.


This is da0 (the cache --SSD) on which camcontrol hanged. It is on the
same controller.

da0 at mps0 bus 0 scbus0 target 3 lun 0
da0: <ATA M4-CT256M4SSD2 000F> Fixed Direct Access SCSI-6 device 
da0: 600.000MB/s transfers
da0: Command Queueing enabled
da0: 244198MB (500118192 512 byte sectors: 255H 63S/T 31130C)






Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1350711509.86715.59.camel>