Date: Tue, 28 Apr 2009 02:20:39 -0700
From: Maxim Sobolev <sobomax@FreeBSD.org>
To: "hackers@freebsd.org" <current@FreeBSD.org>
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, Pawel Jakub Dawidek <pjd@FreeBSD.org>, "current@freebsd.org" <current@FreeBSD.org>
Subject: Improving geom_mirror(4)'s read balancing
Message-ID: <49F6CA67.6030302@FreeBSD.org>
Hi,

We have a few production systems using geom_mirror. The functionality is rock solid; however, I have noticed that the read performance of the array, especially for sequential reads, is often worse than the performance of a single member of the array under the same conditions, which made me curious as to what is going on there.

After a bit of research and experimenting with different settings, I came to the conclusion that this happens because all of the read balancing algorithms implemented in geom_mirror ignore an important property of modern hard drives: even when asked to read a single sector, the drive usually reads the whole track and stores it in its internal buffer. Therefore, sending requests for sectors N and N+1 to different drives (round-robin), or splitting one big request and sending the pieces to two separate disks in parallel (split), *degrades* the combined performance compared to reading from a single drive instead of improving it.

The observed decline apparently has two causes. First, the disks need different amounts of time to position themselves over the track in question, which increases the average combined latency. Second, the sustained linear transfer speed is limited by the platter-to-buffer speed, not the buffer-to-interface speed, so combining two or more streams gains nothing. Moreover, such "balancing" causes both disks to seek, potentially distracting one of them from serving other requests in the meantime and reducing the RAID's potential for handling concurrent requests.

As a result I have produced a small patch, which caches the offset of the last served request in the disk parameters and sends subsequent requests that fall within a certain area around that offset to the same disk. In addition, it implements another small optimization: it looks at the number of outstanding requests and uses only the least busy disks for round-robin. This should smooth out any unevenness in load distribution caused by the proximity algorithm and also help in cases where the disks need different amounts of time to complete their read or write requests. Most of the improvement comes from the first part of the patch, though. I have tested a few values of HDD_CACHE_SIZE from 1MB to 8MB and did not find much difference in performance, which probably suggests that most of the gain comes from clustering very close reads. A simplified sketch of the selection logic is included below.

To measure the effect I have run a few benchmarks:

- file copy over gigabit LAN (SMB)
- local bonnie++
- local raidtest
- Intel NASPT over gigabit LAN (SMB)

Perhaps the most obvious improvement I have seen is in a single-threaded copy to a Vista SMB client: the speed increased from about 55MB/sec to 86MB/sec. Due to its fully random nature there has been no improvement in the raidtest results (no degradation either). All other benchmarks have shown improvement in all I/O-bound read tests, ranging from 20% to 400%. The latter was observed in bonnie++, with the random create speed increasing from 5,000/sec to 20,000/sec. No test registered any measurable speed degradation.
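To give an idea of how it works, the read dispatching logic is roughly the following (a simplified sketch in plain C, not the actual diff; the structure and function names here are made up for illustration, while the real patch operates on geom_mirror's in-kernel structures):

#include <sys/types.h>

#define HDD_CACHE_SIZE	(2 * 1024 * 1024)	/* assumed "proximity" window, in bytes */

struct mirror_disk {
	off_t	last_offset;	/* offset right after the last request served */
	int	inflight;	/* number of outstanding requests */
};

static struct mirror_disk *
pick_read_disk(struct mirror_disk *disks, int ndisks, off_t offset)
{
	static int rr_next;	/* round-robin cursor */
	int i, idx, min_inflight;

	/*
	 * 1. Proximity: reuse the disk whose last request ended near 'offset',
	 *    on the assumption that the data is already in its track cache.
	 */
	for (i = 0; i < ndisks; i++) {
		off_t dist = offset - disks[i].last_offset;

		if (dist < 0)
			dist = -dist;
		if (dist <= HDD_CACHE_SIZE)
			return (&disks[i]);
	}

	/* 2. Otherwise round-robin, but only among the least busy disks. */
	min_inflight = disks[0].inflight;
	for (i = 1; i < ndisks; i++)
		if (disks[i].inflight < min_inflight)
			min_inflight = disks[i].inflight;

	for (i = 0; i < ndisks; i++) {
		idx = (rr_next + i) % ndisks;
		if (disks[idx].inflight == min_inflight) {
			rr_next = (idx + 1) % ndisks;
			return (&disks[idx]);
		}
	}
	/* Not reached: at least one disk always matches min_inflight. */
	return (&disks[0]);
}

The dispatcher would update last_offset (to the end of each request) and inflight as requests are issued and completed; HDD_CACHE_SIZE corresponds to the tunable mentioned above, and the 2MB value is just an example.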
For example, below are typical results with NASPT (numbers are in MB/sec):

Test                      New code   Old code
HDVideo_1Play               38.540     26.037
HDVideo_2Play               29.655     28.666
HDVideo_4Play               32.885     31.623
HDVideo_1Record             33.925     29.714
HDVideo_1Play_1Record       23.967     16.857
ContentCreation             14.012     11.934
OfficeProductivity          20.053     18.524
FileCopyToNAS               24.906     25.329
FileCopyFromNAS             46.035     26.182
DirectoryCopyToNAS          11.367     10.139
DirectoryCopyFromNAS        17.806     13.306
PhotoAlbum                  19.161     20.783

The patch is available here:

http://sobomax.sippysoft.com/~sobomax/geom_mirror.diff

I would like to get input on the functionality/code itself, as well as on the best way to add this functionality. Right now it is part of the round-robin balancing code. Technically it could be added as a separate new balancing method, but for the reasons outlined above I really doubt that "pure" round-robin has any practical value now. The only case where the previous behavior might be beneficial is with solid-state/RAM disks, where there is virtually no seek time, so reading close sectors from two separate disks could actually yield better speed. At the very least, the new method should become the default, with the "old round-robin" kept as another option with clearly documented shortcomings. I would really like to hear what people think about that.

-Maxim