Date: Thu, 03 Dec 2009 10:00:25 +0100
From: Ivan Voras <ivoras@freebsd.org>
To: freebsd-current@freebsd.org
Subject: Re: NCQ vs UFS/ZFS benchmark [Was: Re: FreeBSD 8.0 Performance (at Phoronix)]
Message-ID: <hf7un0$9pp$1@ger.gmane.org>
In-Reply-To: <4B170FCB.3030102@FreeBSD.org>
References: <1259583785.00188655.1259572802@10.7.7.3> <1259659388.00189017.1259647802@10.7.7.3> <1259691809.00189274.1259681402@10.7.7.3> <1259695381.00189283.1259682004@10.7.7.3> <4B170FCB.3030102@FreeBSD.org>
Alexander Motin wrote:
> Ivan Voras wrote:
>> If you have a drive to play with, could you also check UFS vs ZFS on
>> both ATA & AHCI, to try and see whether the IO scheduling of ZFS plays
>> nicely?
>>
>> For benchmarks I suggest blogbench and bonnie++ (in ports) and, if you
>> want to bother, randomio, http://arctic.org/~dean/randomio .
> gstat showed that most of the time only one request at a time was
> running on the disk. It looks like read or read-modify-write operations
> (due to the many short writes in the test pattern) are heavily
> serialized in UFS, even when several processes work with the same file.
> This almost eliminated the effect of NCQ in this test.
>
> Test 2: Same as before, but without the O_DIRECT flag:
> ata(4), 1 process, first tps: 78
> ata(4), 1 process, second tps: 469
> ata(4), 32 processes, first tps: 83
> ata(4), 32 processes, second tps: 475
> ahci(4), 1 process, first tps: 79
> ahci(4), 1 process, second tps: 476
> ahci(4), 32 processes, first tps: 93
> ahci(4), 32 processes, second tps: 488

OK, so this is UFS with normal caching. (A sketch of the kind of
multi-process random I/O pattern I have in mind is appended at the end
of this message.)

> Data doesn't fit into the cache. Multiple parallel requests give some
> effect even with the legacy driver, but with NCQ enabled they give much
> more, almost doubling performance!

You've seen queueing in gstat for ZFS+NCQ?

> Test 4: Same as 3, but with kmem_size=1900M and arc_max=1700M.
> ata(4), 1 process, first tps: 90
> ata(4), 1 process, second tps: ~160-300
> ata(4), 32 processes, first tps: 112
> ata(4), 32 processes, second tps: ~190-322
> ahci(4), 1 process, first tps: 90
> ahci(4), 1 process, second tps: ~140-300
> ahci(4), 32 processes, first tps: 180
> ahci(4), 32 processes, second tps: ~280-550

And this is ZFS with some tuning (a sketch of the corresponding
loader.conf settings is also appended below). I've also seen high
variation in ZFS performance, so that looks normal.

> In conclusion:
> - In this particular test ZFS scaled well with parallel requests,
>   effectively using multiple disks. NCQ showed great benefits, but i386
>   constraints significantly limited ZFS's caching abilities.
> - UFS behaves very poorly in this test. Even with a parallel workload
>   it often serializes device accesses. Maybe the results would be
>   different if

Judging from your results, I wouldn't say UFS behaves poorly. It looks
like only the multi-process case is bad on UFS. For single-process
access the difference in favour of ZFS is ~10 TPS in the first run, and
UFS is apparently much better in all cases but the last on the second
run. That could be explained by a large variation between runs.

Also, did you use the whole drive for the file system? In cases like
this it would be interesting to create a special partition (in all
cases, on all drives) covering only a small segment of the disk
(thinking of the drive as rotational media, made of cylinders) -- for
example, a 30 GB partition covering only the outer tracks.

> there were a separate file for each process, or with some other
> options, but I think the pattern I used is also possible in some
> applications. The only benefit UFS showed here is more effective memory
> management on i386, leading to higher cache effectiveness.
>
> It would be nice if somebody explained that UFS behavior.

Possibly read-only access to memory cache structures is protected by
read-only locks, which are efficient, and the ARC is more complicated
than it's worth? But others should have better guesses :)
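Appendix: a minimal sketch of how I picture the access pattern under
discussion -- several processes issuing small random reads, plus the
occasional short write, against one shared file, with O_DIRECT
optionally enabled. To be clear, this is only my reconstruction for
illustration, not the benchmark actually used; the block size, write
mix, request count and command-line arguments are all made up.

/*
 * randpattern.c -- hypothetical multi-process random I/O sketch.
 * All parameters (4k reads, 1-in-4 short writes, 10000 requests per
 * process) are invented; adjust to taste.
 *
 * cc -o randpattern randpattern.c
 * ./randpattern <file> <nprocs> <use_O_DIRECT: 0|1>
 */
#include <sys/wait.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BLKSZ	4096	/* read request size */
#define WRSZ	512	/* short (sub-block) write size */
#define NREQS	10000	/* requests issued by each process */

static void
worker(const char *path, int use_odirect)
{
	int flags = O_RDWR | (use_odirect ? O_DIRECT : 0);
	int fd = open(path, flags);
	if (fd < 0)
		err(1, "open");

	off_t nblocks = lseek(fd, 0, SEEK_END) / BLKSZ;
	if (nblocks <= 0)
		errx(1, "file too small");

	/* O_DIRECT prefers a suitably aligned buffer. */
	void *buf;
	if (posix_memalign(&buf, BLKSZ, BLKSZ) != 0)
		errx(1, "posix_memalign");

	srandom(getpid() ^ time(NULL));
	for (int i = 0; i < NREQS; i++) {
		off_t off = (random() % nblocks) * BLKSZ;
		if (random() % 4 == 0) {
			/* Occasional short write at a random position:
			 * this is what can trigger read-modify-write. */
			if (pwrite(fd, buf, WRSZ, off) != WRSZ)
				err(1, "pwrite");
		} else {
			if (pread(fd, buf, BLKSZ, off) != BLKSZ)
				err(1, "pread");
		}
	}
	free(buf);
	close(fd);
	_exit(0);
}

int
main(int argc, char **argv)
{
	if (argc != 4)
		errx(1, "usage: %s file nprocs use_odirect", argv[0]);

	int nprocs = atoi(argv[2]);
	int use_odirect = atoi(argv[3]);

	/* All children hammer the *same* file, as in the test discussed. */
	for (int i = 0; i < nprocs; i++)
		if (fork() == 0)
			worker(argv[1], use_odirect);

	while (wait(NULL) > 0)	/* wait for all children */
		;
	return (0);
}

Run it against a file considerably larger than RAM so the first pass is
not served from the cache, e.g. "./randpattern /mnt/test/bigfile 32 0"
for 32 processes without O_DIRECT. Timing is left out; the request rate
and queue depth can be watched externally with gstat(8) or iostat(8).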
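And my reading of the kmem_size / arc_max tuning from Test 4 is that
these are the usual loader tunables; a sketch of the corresponding
/boot/loader.conf lines (the values are just the ones quoted above, and
vm.kmem_size_max is my addition -- whether it is needed depends on the
release and the i386 KVA configuration):

# /boot/loader.conf -- sketch only; values copied from Test 4 above
vm.kmem_size="1900M"
vm.kmem_size_max="1900M"    # assumption: often set alongside kmem_size on i386
vfs.zfs.arc_max="1700M"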