Date:      Wed, 02 Dec 2015 13:34:28 +0200
From:      "Zeus Panchenko" <zeus@ibs.dn.ua>
To:        "FreeBSD Filesystems" <freebsd-fs@freebsd.org>
Subject:   advice needed: zpool of 10 x (raidz2 on (4+2) x 2T HDD)
Message-ID:  <20151202133428.35820@smtp.new-ukraine.org>


greetings,

we deployed this storage and, as it has been filling up, I see I need
advice regarding its configuration and possible optimizations ...

the main reason I decided to ask for advice is this:

once a month (or even more frequently; it depends on the load, I
suspect) the host hangs and only a power reset helps; nothing helpful
in the log files though ... just the fact of the restart and the usual
ctld activity

after a reboot, `zpool import' takes 40 minutes or more, and during
this time no resource of the host is used much ... neither CPU nor
memory ... top and systat show no load (I have to export the pool
first since I need to attach the geli providers first; if I attach
geli with the zpool still imported, I end up with a lot of
"absent/damaged" disks in the pool, which disappear after an
export/import)
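
(for reference, the sequence I run after a reboot is roughly the
following; the key file paths and the loop are illustrative, not my
exact script)

  # attach all 60 geli providers first; importing before they are all
  # attached is what produces the "absent/damaged" disks
  for n in $(jot 60 0); do
    geli attach -k /root/keys/da$n.key /dev/da$n
  done
  # only then import -- this is the step that takes 40+ minutes
  zpool import storage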


so, I'm wondering: what can I do to trace the cause of the hangs? what
should I monitor to understand what to expect and how to prevent them ...
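
would periodically logging something like the following be enough to
catch the state just before a hang? (the interval and log path here
are just an example)

  # snapshot pool and disk I/O state every 10s so the last records
  # before a hang survive on disk
  while :; do
    date
    zpool iostat -v storage
    gstat -b -I 1s
    sleep 10
  done >> /var/log/storage-probe.log 2>&1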


so, please, advise



----------------------------------------------------------------------------------
below are the details:
----------------------------------------------------------------------------------

the box is Supermicro X9DRD-7LN4F with:

  CPU: Intel(R) Xeon(R) CPU E5-2630L (2 package(s) x 6 core(s) x 2 SMT threads)
  RAM: 128Gb
 STOR: 3 x LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (jbod)
       60 x HDD 2T (ATA WDC WD20EFRX-68A 0A80, Fixed Direct Access SCSI-6 device 600.000MB/s)

OS: FreeBSD 10.1-RELEASE #0 r274401 amd64

to avoid an OS memory shortage, the sysctl vfs.zfs.arc_max is set to 120275861504
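
(it is a boot-time tunable, so it lives in /boot/loader.conf; shown
here just for completeness)

  # /boot/loader.conf -- cap the ARC at ~112G of the 128G RAM
  vfs.zfs.arc_max="120275861504"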

to clients, storage is provided via iSCSI by ctld (each target is file backed)
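
(each target is declared roughly like this in /etc/ctl.conf; the
target name, path and size below are made up for illustration)

  target iqn.2013-10.bar.foo.storage:tgt0 {
    portal-group pg0
    lun 0 {
      # a file on the pool backs the LUN
      path /storage/targets/tgt0.img
      size 1T
    }
  }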

the zpool is built of 10 x raidz2, each raidz2 consisting of 6 geli
devices, and now looks like this (yes, deduplication is on):

> zpool list storage
NAME            SIZE  ALLOC   FREE   FRAG  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
storage         109T  33.5T  75.2T      -         -    30%  1.57x  ONLINE  -


> zpool history storage
2013-10-21.01:31:14 zpool create storage
  raidz2 gpt/c0s00 gpt/c0s01 gpt/c1s00 gpt/c1s01 gpt/c2s00 gpt/c2s01
  raidz2 gpt/c0s02 gpt/c0s03 gpt/c1s02 gpt/c1s03 gpt/c2s02 gpt/c2s03
  ...
  raidz2 gpt/c0s18 gpt/c0s19 gpt/c1s18 gpt/c1s19 gpt/c2s18 gpt/c2s19
 log mirror gpt/log0 gpt/log1
 cache gpt/cache0 gpt/cache1


> zdb storage
Cached configuration:
        version: 5000
        name: 'storage'
        state: 0
        txg: 13340514
        pool_guid: 11994995707440773547
        hostid: 1519855013
        hostname: 'storage.foo.bar'
        vdev_children: 11
        vdev_tree:
            type: 'root'
            id: 0
            guid: 11994995707440773547
            children[0]:
                type: 'raidz'
                id: 0
                guid: 12290021428260525074
                nparity: 2
                metaslab_array: 46
                metaslab_shift: 36
                ashift: 12
                asize: 12002364751872
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 3897093815971447961
                    path: '/dev/gpt/c0s00'
                    phys_path: '/dev/gpt/c0s00'
                    whole_disk: 1
                    DTL: 9133
                    create_txg: 4
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 1036685341766239763
                    path: '/dev/gpt/c0s01'
                    phys_path: '/dev/gpt/c0s01'
                    whole_disk: 1
                    DTL: 9132
                    create_txg: 4
		    ...


each geli provider is created on a single HDD:
> geli list da50.eli
Geom name: da50.eli
State: ACTIVE
EncryptionAlgorithm: AES-XTS
KeyLength: 256
Crypto: hardware
Version: 6
UsedKey: 0
Flags: (null)
KeysAllocated: 466
KeysTotal: 466
Providers:
1. Name: da50.eli
   Mediasize: 2000398929920 (1.8T)
   Sectorsize: 4096
   Mode: r1w1e3
Consumers:
1. Name: da50
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
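
(each provider was initialized along these lines; the flags match the
parameters `geli list' reports above, the key file path is illustrative)

  # AES-XTS, 256-bit key, 4k sectors
  geli init -e AES-XTS -l 256 -s 4096 -K /root/keys/da50.key /dev/da50
  geli attach -k /root/keys/da50.key /dev/da50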



each raidz2 disk is configured as:
> gpart show da50.eli
=>        6  488378634  da50.eli  GPT  (1.8T)
          6  488378634         1  freebsd-zfs  (1.8T)
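
(the partitioning was done along these lines; the label name here is
illustrative -- the real labels are the gpt/cXsYY names the pool was
built from)

  gpart create -s gpt da50.eli
  gpart add -t freebsd-zfs -l c2s10 da50.eli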


> zfs-stats -a
---------------------------------------------------------------------------
ZFS Subsystem Report				Wed Dec  2 09:59:27 2015
---------------------------------------------------------------------------
System Information:

	Kernel Version:				1001000 (osreldate)
	Hardware Platform:			amd64
	Processor Architecture:			amd64

FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 21:02:49 UTC 2014     root
 9:59AM  up 1 day, 46 mins, 10 users, load averages: 1.03, 0.46, 0.75
---------------------------------------------------------------------------
System Memory Statistics:
	Physical Memory:			131012.88M
	Kernel Memory:				1915.37M
	DATA:				98.62%	1888.90M
	TEXT:				1.38%	26.47M
---------------------------------------------------------------------------
ZFS pool information:
	Storage pool Version (spa):		5000
	Filesystem Version (zpl):		5
---------------------------------------------------------------------------
ARC Misc:
	Deleted:				1961248
	Recycle Misses:				127014
	Mutex Misses:				5973
	Evict Skips:				5973

ARC Size:
	Current Size (arcsize):		100.00%	114703.88M
	Target Size (Adaptive, c):	100.00%	114704.00M
	Min Size (Hard Limit, c_min):	12.50%	14338.00M
	Max Size (High Water, c_max):	~8:1	114704.00M

ARC Size Breakdown:
	Recently Used Cache Size (p):	93.75%	107535.69M
	Freq. Used Cache Size (c-p):	6.25%	7168.31M

ARC Hash Breakdown:
	Elements Max:				6746532
	Elements Current:		100.00%	6746313
	Collisions:				9651654
	Chain Max:				0
	Chains:					1050203

ARC Eviction Statistics:
	Evicts Total:				194298918912
	Evicts Eligible for L2:		81.00%	157373345280
	Evicts Ineligible for L2:	19.00%	36925573632
	Evicts Cached to L2:			97939090944

ARC Efficiency
	Cache Access Total:			109810376
	Cache Hit Ratio:		91.57%	100555148
	Cache Miss Ratio:		8.43%	9255228
	Actual Hit Ratio:		90.54%	99423922

	Data Demand Efficiency:		76.64%
	Data Prefetch Efficiency:	48.46%

	CACHE HITS BY CACHE LIST:
	  Anonymously Used:		0.88%	881966
	  Most Recently Used (mru):	23.11%	23236902
	  Most Frequently Used (mfu):	75.77%	76187020
	  MRU Ghost (mru_ghost):	0.03%	26449
	  MFU Ghost (mfu_ghost):	0.22%	222811

	CACHE HITS BY DATA TYPE:
	  Demand Data:			10.17%	10227867
	  Prefetch Data:		0.45%	455126
	  Demand Metadata:		88.69%	89184329
	  Prefetch Metadata:		0.68%	687826

	CACHE MISSES BY DATA TYPE:
	  Demand Data:			33.69%	3117808
	  Prefetch Data:		5.23%	484140
	  Demand Metadata:		56.55%	5233984
	  Prefetch Metadata:		4.53%	419296
---------------------------------------------------------------------------
L2 ARC Summary:
	Low Memory Aborts:			77
	R/W Clashes:				13
	Free on Write:				523

L2 ARC Size:
	Current Size: (Adaptive)		91988.13M
	Header Size:			0.13%	120.08M

L2 ARC Read/Write Activity:
	Bytes Written:				97783.99M
	Bytes Read:				2464.81M

L2 ARC Breakdown:
	Access Total:				8110124
	Hit Ratio:			2.89%	234616
	Miss Ratio:			97.11%	7875508
	Feeds:					85129

	WRITES:
	  Sent Total:			100.00%	18448
---------------------------------------------------------------------------
VDEV Cache Summary:
	Access Total:				0
	Hits Ratio:			0.00%	0
	Miss Ratio:			0.00%	0
	Delegations:				0
---------------------------------------------------------------------------
File-Level Prefetch Stats (DMU):

DMU Efficiency:
	Access Total:				162279162
	Hit Ratio:			91.69%	148788486
	Miss Ratio:			8.31%	13490676

	Colinear Access Total:			13490676
	Colinear Hit Ratio:		0.06%	8166
	Colinear Miss Ratio:		99.94%	13482510

	Stride Access Total:			146863482
	Stride Hit Ratio:		99.31%	145846806
	Stride Miss Ratio:		0.69%	1016676

DMU misc:
	Reclaim successes:			124372
	Reclaim failures:			13358138
	Stream resets:				618
	Stream noresets:			2938602
	Bogus streams:				0
---------------------------------------------------------------------------
ZFS Tunable (sysctl):
	kern.maxusers=8524
	vfs.zfs.arc_max=120275861504
	vfs.zfs.arc_min=15034482688
	vfs.zfs.arc_average_blocksize=8192
	vfs.zfs.arc_meta_used=24838283936
	vfs.zfs.arc_meta_limit=30068965376
	vfs.zfs.l2arc_write_max=8388608
	vfs.zfs.l2arc_write_boost=8388608
	vfs.zfs.l2arc_headroom=2
	vfs.zfs.l2arc_feed_secs=1
	vfs.zfs.l2arc_feed_min_ms=200
	vfs.zfs.l2arc_noprefetch=1
	vfs.zfs.l2arc_feed_again=1
	vfs.zfs.l2arc_norw=1
	vfs.zfs.anon_size=27974656
	vfs.zfs.anon_metadata_lsize=0
	vfs.zfs.anon_data_lsize=0
	vfs.zfs.mru_size=112732930560
	vfs.zfs.mru_metadata_lsize=18147921408
	vfs.zfs.mru_data_lsize=92690379776
	vfs.zfs.mru_ghost_size=7542758400
	vfs.zfs.mru_ghost_metadata_lsize=1262705664
	vfs.zfs.mru_ghost_data_lsize=6280052736
	vfs.zfs.mfu_size=3748620800
	vfs.zfs.mfu_metadata_lsize=1014886912
	vfs.zfs.mfu_data_lsize=2723481600
	vfs.zfs.mfu_ghost_size=24582345728
	vfs.zfs.mfu_ghost_metadata_lsize=682512384
	vfs.zfs.mfu_ghost_data_lsize=23899833344
	vfs.zfs.l2c_only_size=66548531200
	vfs.zfs.dedup.prefetch=1
	vfs.zfs.nopwrite_enabled=1
	vfs.zfs.mdcomp_disable=0
	vfs.zfs.dirty_data_max=4294967296
	vfs.zfs.dirty_data_max_max=4294967296
	vfs.zfs.dirty_data_max_percent=10
	vfs.zfs.dirty_data_sync=67108864
	vfs.zfs.delay_min_dirty_percent=60
	vfs.zfs.delay_scale=500000
	vfs.zfs.prefetch_disable=0
	vfs.zfs.zfetch.max_streams=8
	vfs.zfs.zfetch.min_sec_reap=2
	vfs.zfs.zfetch.block_cap=256
	vfs.zfs.zfetch.array_rd_sz=1048576
	vfs.zfs.top_maxinflight=32
	vfs.zfs.resilver_delay=2
	vfs.zfs.scrub_delay=4
	vfs.zfs.scan_idle=50
	vfs.zfs.scan_min_time_ms=1000
	vfs.zfs.free_min_time_ms=1000
	vfs.zfs.resilver_min_time_ms=3000
	vfs.zfs.no_scrub_io=0
	vfs.zfs.no_scrub_prefetch=0
	vfs.zfs.metaslab.gang_bang=131073
	vfs.zfs.metaslab.fragmentation_threshold=70
	vfs.zfs.metaslab.debug_load=0
	vfs.zfs.metaslab.debug_unload=0
	vfs.zfs.metaslab.df_alloc_threshold=131072
	vfs.zfs.metaslab.df_free_pct=4
	vfs.zfs.metaslab.min_alloc_size=10485760
	vfs.zfs.metaslab.load_pct=50
	vfs.zfs.metaslab.unload_delay=8
	vfs.zfs.metaslab.preload_limit=3
	vfs.zfs.metaslab.preload_enabled=1
	vfs.zfs.metaslab.fragmentation_factor_enabled=1
	vfs.zfs.metaslab.lba_weighting_enabled=1
	vfs.zfs.metaslab.bias_enabled=1
	vfs.zfs.condense_pct=200
	vfs.zfs.mg_noalloc_threshold=0
	vfs.zfs.mg_fragmentation_threshold=85
	vfs.zfs.check_hostid=1
	vfs.zfs.spa_load_verify_maxinflight=10000
	vfs.zfs.spa_load_verify_metadata=1
	vfs.zfs.spa_load_verify_data=1
	vfs.zfs.recover=0
	vfs.zfs.deadman_synctime_ms=1000000
	vfs.zfs.deadman_checktime_ms=5000
	vfs.zfs.deadman_enabled=1
	vfs.zfs.spa_asize_inflation=24
	vfs.zfs.txg.timeout=5
	vfs.zfs.vdev.cache.max=16384
	vfs.zfs.vdev.cache.size=0
	vfs.zfs.vdev.cache.bshift=16
	vfs.zfs.vdev.trim_on_init=1
	vfs.zfs.vdev.mirror.rotating_inc=0
	vfs.zfs.vdev.mirror.rotating_seek_inc=5
	vfs.zfs.vdev.mirror.rotating_seek_offset=1048576
	vfs.zfs.vdev.mirror.non_rotating_inc=0
	vfs.zfs.vdev.mirror.non_rotating_seek_inc=1
	vfs.zfs.vdev.max_active=1000
	vfs.zfs.vdev.sync_read_min_active=10
	vfs.zfs.vdev.sync_read_max_active=10
	vfs.zfs.vdev.sync_write_min_active=10
	vfs.zfs.vdev.sync_write_max_active=10
	vfs.zfs.vdev.async_read_min_active=1
	vfs.zfs.vdev.async_read_max_active=3
	vfs.zfs.vdev.async_write_min_active=1
	vfs.zfs.vdev.async_write_max_active=10
	vfs.zfs.vdev.scrub_min_active=1
	vfs.zfs.vdev.scrub_max_active=2
	vfs.zfs.vdev.trim_min_active=1
	vfs.zfs.vdev.trim_max_active=64
	vfs.zfs.vdev.aggregation_limit=131072
	vfs.zfs.vdev.read_gap_limit=32768
	vfs.zfs.vdev.write_gap_limit=4096
	vfs.zfs.vdev.bio_flush_disable=0
	vfs.zfs.vdev.bio_delete_disable=0
	vfs.zfs.vdev.trim_max_bytes=2147483648
	vfs.zfs.vdev.trim_max_pending=64
	vfs.zfs.max_auto_ashift=13
	vfs.zfs.min_auto_ashift=9
	vfs.zfs.zil_replay_disable=0
	vfs.zfs.cache_flush_disable=0
	vfs.zfs.zio.use_uma=1
	vfs.zfs.zio.exclude_metadata=0
	vfs.zfs.sync_pass_deferred_free=2
	vfs.zfs.sync_pass_dont_compress=5
	vfs.zfs.sync_pass_rewrite=2
	vfs.zfs.snapshot_list_prefetch=0
	vfs.zfs.super_owner=0
	vfs.zfs.debug=0
	vfs.zfs.version.ioctl=4
	vfs.zfs.version.acl=1
	vfs.zfs.version.spa=5000
	vfs.zfs.version.zpl=5
	vfs.zfs.vol.mode=1
	vfs.zfs.trim.enabled=1
	vfs.zfs.trim.txg_delay=32
	vfs.zfs.trim.timeout=30
	vfs.zfs.trim.max_interval=1
	vm.kmem_size=133823901696
	vm.kmem_size_scale=1
	vm.kmem_size_min=0
	vm.kmem_size_max=1319413950874

-- 
Zeus V. Panchenko				jid:zeus@im.ibs.dn.ua
IT Dpt., I.B.S. LLC					  GMT+2 (EET)


