Date: Wed, 6 Mar 2013 02:01:08 -0800 (PST)
From: Don Lewis <truckman@FreeBSD.org>
To: lev@FreeBSD.org
Cc: freebsd-fs@FreeBSD.org, ivoras@FreeBSD.org, freebsd-geom@FreeBSD.org
Subject: Re: Unexpected SU+J inconsistency AGAIN -- please, don't shift topic to ZFS!
Message-ID: <201303061001.r26A18n3015414@gw.catspoiler.org>
In-Reply-To: <1198028260.20130306124139@serebryakov.spb.ru>
On 6 Mar, Lev Serebryakov wrote:
> DL> With NCQ or TCQ, the drive can have a sizeable number of writes
> DL> internally queued and it is free to reorder them as it pleases even with
> DL> write caching disabled, but if write caching is disabled it has to delay
> DL> the notification of their completion until the data is on the platters
> DL> so that UFS+SU can enforce the proper dependency ordering.
> But, again, performance would be terrible :( I've checked it. On
> very sparse multi-threaded patterns (multiple torrents download on
> fast channel in my simple home case, and, I think, things could be
> worse in case of big file server in organization) and "simple" SATA
> drives it significant worse in my experience :(

I'm surprised that a typical drive would have enough onboard cache for
write caching to help significantly in that situation.  Is the torrent
software doing a lot of fsync() calls?  Those would essentially turn
into NOPs if write caching is enabled, but would stall the thread until
the data hits the platter if write caching is disabled.  (There is a
small sketch of what I mean at the end of this message.)

One limitation of NCQ is that it only supports 32 simultaneous
commands.  With write caching enabled, you might be able to stuff more
writes into the drive's onboard memory so that it can do a better job
of optimizing the ordering and increase its number of I/Os per second,
though I wouldn't expect miracles.  A SAS drive and controller with TCQ
would support more simultaneous commands and might also perform better.

Creating a file by writing it in random order is fairly expensive.
Each time a new block is written by the application, UFS+SU has to
first find a free block by searching the block bitmaps, mark that block
as allocated, wait for that write of the bitmap block to complete,
write the data to that block, wait for that to complete, and then write
the block pointer to the inode or an indirect block.  Because of the
random write ordering, there is probably not enough locality to
coalesce multiple updates to the bitmap and indirect blocks into one
write before the syncer interval expires.  These operations all happen
in the background after the write() call, but once you hit the
I/Os-per-second limit of the drive, eventually enough backlog builds up
to stall the application.  Also, if another update needs to be done to
a block that the syncer has queued for writing, that may also cause a
stall until the write completes.

If you hack the torrent software to create and pre-zero each file
before it starts downloading it, then each bitmap and indirect block
will probably only get written once during that operation and won't get
written again during the actual download, and zeroing the data blocks
will be sequential and fast.  During the download, the only writes will
be to the data blocks, so you might see something like a 3x performance
improvement.
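
Something along these lines is what I mean by pre-zeroing.  This is
only a rough sketch -- the function name, buffer size, and file mode
are made up here, not taken from any particular torrent client:

#include <sys/types.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Zero-fill a new file sequentially out to its final size, so that the
 * bitmap and indirect block updates are all done up front, in order,
 * instead of being scattered across the random-order piece writes
 * later on.
 */
static int
prezero_file(const char *path, off_t size)
{
	char buf[64 * 1024];
	off_t left;
	ssize_t n;
	int fd;

	memset(buf, 0, sizeof(buf));
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (-1);
	for (left = size; left > 0; left -= n) {
		n = write(fd, buf, left > (off_t)sizeof(buf) ?
		    sizeof(buf) : (size_t)left);
		if (n <= 0) {
			close(fd);
			return (-1);
		}
	}
	/* Push the allocations out before the random writes start. */
	if (fsync(fd) == -1) {
		close(fd);
		return (-1);
	}
	return (close(fd));
}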
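
Going back to the fsync() question above, this is roughly the
per-piece write path I am speculating about.  store_piece() and its
arguments are hypothetical, just to show where the calling thread
would stall if write caching is disabled:

#include <sys/types.h>
#include <unistd.h>

/*
 * Write one downloaded piece at its offset and force it to stable
 * storage.  With drive write caching enabled the fsync() returns about
 * as soon as the data reaches the drive's RAM; with caching disabled
 * the thread sleeps here until the data is actually on the platters.
 */
static int
store_piece(int fd, const void *buf, size_t len, off_t piece_offset)
{
	if (pwrite(fd, buf, len, piece_offset) != (ssize_t)len)
		return (-1);
	return (fsync(fd));
}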