Date: Fri, 18 Nov 2016 11:48:58 +0100
From: Jan Bramkamp <crest@rlwinm.de>
To: Miroslav Lachman <000.fbsd@quip.cz>, freebsd-emulation@freebsd.org
Subject: Re: bhyve: zvols for guest disk - yes or no?
Message-ID: <735660c2-6cd7-f499-07dc-30c171c1fc26@rlwinm.de>
In-Reply-To: <582EC02D.4010602@quip.cz>
References: <D991D88D-1327-4580-B6E5-2D59338147C0@punkt.de> <b775f684-98a2-b929-2b13-9753c95fd4f2@rlwinm.de> <D5A6875B-A2AE-4DD9-B941-71146AEF2578@punkt.de> <5be68f57-c9c5-7c20-f590-1beed55fd6bb@rlwinm.de> <582EC02D.4010602@quip.cz>
On 18/11/2016 09:47, Miroslav Lachman wrote:
> Jan Bramkamp wrote on 2016/11/17 11:16:
>> On 16/11/2016 19:10, Patrick M. Hausen wrote:
>>>> Without ZFS you would require a reliable hardware RAID controller (if
>>>> such a magical creature exists) instead (or build a software RAID1+0
>>>> from gmirror and gstripe). IMO money is better invested into more RAM
>>>> keeping ZFS and the admin happy.
>>>
>>> And we always use geom_mirror with UFS ...
>>
>> That would work, but I don't recommend it for new setups. ZFS offers you
>> a lot of operational flexibility which in my opinion is alone worth the
>> overhead. Without ZFS you would have to use either large raw image files
>> on UFS or fight with an old-fashioned volume manager.
>
> One thing to note - ZFS isn't a holy grail and has its own problems too.

Of course ZFS isn't perfect. Nothing as complex as ZFS could be.

> For example there is no fsck_zfs and there are some cases where you can
> end up with a broken pool, and because of its complexity the only thing
> you can do is to restore from backup.

That is because ZFS takes a different approach to data and metadata
integrity. By design ZFS should be able to recover automatically, without
data loss, from all the cases a fsck_zfs could handle without user
interaction. This is possible because ZFS is a Merkle-DAG (edges are stored
inside nodes and contain the checksums of the referenced nodes) and it
stores multiple copies of important metadata (in addition to mirroring and
RAID-Z).

Fsck on UFS includes a good amount of guesswork, which usually works
because the UFS on-disk data structures are a lot simpler. That way you end
up with some state the kernel can mount without panic()ing, but that
doesn't imply it's always exactly the state the users and applications
expected the system to be in.

> This can occur on ZFS with higher probability than on simple UFS2.

Only if you pick your metrics with a strong bias in favor of UFS. The ZFS
data structures are more complicated and you can't repair a ZFS pool with a
hex editor and a pocket calculator. At the same time ZFS protects its data
(including metadata) a lot better from corruption:

* ZFS uses a copy-on-write tree with path copying instead of modifying live
  data in place.
* Because the ZFS graph is directed and cycle-free, its edges can (and do)
  contain the checksum of the node they point to.
* By default ZFS stores three copies of vital pool-level metadata and two
  copies of dataset-level metadata, in addition to VDEV-level redundancy.
* Both the UFS and ZFS code have been battle-tested in production for
  enough years.

UFS is not suitable for today's large file systems. It trusts its backing
storage too much. UFS can't protect your data from undetected read errors
because it doesn't store any checksums along with the data. It can't help
you detect phantom writes because there are no checksums in the edges: you
could swap two blocks of file content with each other and UFS wouldn't
notice. The ratio of disk capacity to throughput has reached a point where
it is no longer acceptable to run fsck at boot. UFS2 on FreeBSD offers
soft updates and snapshots, which allow fsck to run in the background, but
this requires a lot of RAM and steals a lot of IOPS from the other
applications running on the system. Running with journaled soft updates
instead requires even more trust in notoriously lying disks, disk
controllers and their caches. Additionally, UFS snapshots and journaled
soft updates are incompatible, and without snapshots you can't create
consistent backups of your file systems.
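To make the "checksums in the edges" point concrete, here is a minimal toy
sketch in Python (not ZFS code, and nothing like the real on-disk format):
a block pointer that carries the checksum of the block it references can
detect a swapped or stale block, while a plain block address of the kind
UFS uses cannot.

    # Toy sketch: a pointer that stores the checksum of the block it
    # references catches blocks that were swapped or never written.
    import hashlib

    def checksum(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    # "Disk": block number -> block contents.
    disk = {0: b"root metadata", 1: b"file block A", 2: b"file block B"}

    # UFS-style pointer: just a block number, nothing to verify against.
    ufs_ptr = 1

    # ZFS-style pointer: block number plus the checksum of the referenced
    # block, recorded in the parent when the child block was written.
    zfs_ptr = (1, checksum(disk[1]))

    # Simulate a swap / phantom write: block 1 now holds block 2's data.
    disk[1], disk[2] = disk[2], disk[1]

    # UFS happily returns whatever sits at the address.
    print(disk[ufs_ptr])          # b"file block B" -- silently wrong data

    # The ZFS-style pointer notices the data no longer matches the edge.
    blkno, expected = zfs_ptr
    if checksum(disk[blkno]) != expected:
        print("checksum mismatch: read another copy or return EIO")

In real ZFS a mismatch like this triggers a read from another copy (mirror,
RAID-Z parity or ditto block) and self-healing; the sketch only shows the
detection step that UFS has no equivalent for.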
UFS is a great file system for the hardware it was designed for, but
hardware has evolved and now we have to deal with orders of magnitude more
storage on disks which haven't gotten a lot more reliable. There are still
use cases for UFS and it is a good fit for small systems, even if most of
these systems could use a NAND-flash-optimized file system as well.

-- 
Jan Bramkamp