Date: Tue, 11 Feb 1997 13:38:03 -0800 (PST)
From: Simon Shapiro <Shimon@i-Connect.Net>
To: freebsd-hackers@freebsd.org
Subject: Raw I/O Question
Message-ID: <XFMail.970211141038.Shimon@i-Connect.Net>
Can someone take a moment and briefly describe the execution path of an lseek/read/write system call to a raw (character) SCSI partition? We are very interested in the shortest path to I/O on a large number of disks.

We performed some measurements and see some results we would like to understand. For example, we did READ and WRITE to random records in a block device. The test was run several times, each using a different block size (starting at 512 bytes and ending with 128KB). All our measurements are in I/O transfers/sec.

We see a depression in READ and WRITE performance until block size reaches 2KB. At this point performance picks up and levels off until block size reaches 8KB. At this point it starts a gradual, linear decline.

What we see is a flat WRITE response until 2KB. Then it starts a linear decline until it reaches 8KB block size. At this point it converges with READ performance. The initial WRITE performance for small blocks is quite poor compared to READ. We attribute it to the need to do read-modify-write when blocks are smaller than a certain ``natural'' block size (page?). Another cause of performance loss, we think, is the lack of an O_SYNC option to the write(2) system call. This forces the application to do an fsync after EVERY WRITE. We have to do that for many good reasons.

The READ performance is even more peculiar. It starts higher than WRITE and declines rapidly until block size reaches 2KB. It peaks at 4KB blocks and starts a linear decline from that point on (as block size increases).

We intend to use the RAW (character) device with the mpool buffering system and would like to understand its behavior without reading the WHOLE kernel source :-) We are very interested in the flow of control and flow of data. How do synchronous WRITE operations pass through? We need this to guarantee transaction completion (commits).

There are several problems here we want to understand:

How does the system call logic transfer control to the SCSI layer? All we see is the construction of a struct buf and a call to scsi_scsi_cmd.

How is the SCSI FLUSH CACHE passed down? We may need to trap it in the HBA driver, so the HBA can flush its buffers too.

What block size I/O do we need so that we do not ever do read-modify-write?

These sorts of questions... Easy stuff...

I hope this community (which has welcomed me very warmly and has been so helpful) will find these questions useful. Maybe when one of us is older and has more time on their hands, {s}he will write a ``FreeBSD Internals'' book and all will be well in Zion...

Simon
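
For readers who want to reproduce the kind of measurement described in this message, the following is a minimal sketch in C, not the code actually used for the numbers above. It assumes plain POSIX calls only (lseek, read, write, fsync), and the names `random_io_rate` and `now_sec` are hypothetical. Each WRITE is followed by fsync, as the message describes. On a raw (character) device the transfer size generally has to be a multiple of the sector size, which is an assumption the caller must satisfy.

```c
#define _XOPEN_SOURCE 700   /* expose random(), fsync(), gettimeofday() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wall-clock time in seconds. */
static double now_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Hypothetical benchmark routine (not from the original message).
 * Performs `count` transfers of `bsize` bytes at random, block-aligned
 * offsets within the first `span` bytes of `path`.  If do_write is
 * nonzero, each transfer is a write(2) followed by fsync(2); otherwise
 * it is a read(2).  Returns transfers per second, or -1.0 on any error.
 */
double random_io_rate(const char *path, size_t bsize, off_t span,
                      int count, int do_write)
{
    assert(bsize > 0 && span >= (off_t)bsize && count > 0);

    int fd = open(path, do_write ? O_RDWR : O_RDONLY);
    if (fd < 0)
        return -1.0;

    char *buf = malloc(bsize);
    if (buf == NULL) {
        close(fd);
        return -1.0;
    }
    memset(buf, 0xA5, bsize);           /* arbitrary fill pattern */

    long nblocks = (long)(span / (off_t)bsize);
    double start = now_sec();
    int done;
    for (done = 0; done < count; done++) {
        /* Pick a random record and seek to its aligned offset. */
        off_t off = (off_t)(random() % nblocks) * (off_t)bsize;
        if (lseek(fd, off, SEEK_SET) == (off_t)-1)
            break;
        ssize_t n = do_write ? write(fd, buf, bsize)
                             : read(fd, buf, bsize);
        if (n != (ssize_t)bsize)
            break;
        /* Force each write to stable storage before the next one. */
        if (do_write && fsync(fd) != 0)
            break;
    }
    double elapsed = now_sec() - start;

    free(buf);
    close(fd);
    if (done != count)
        return -1.0;
    if (elapsed <= 0.0)
        elapsed = 1e-6;                 /* clamp to timer resolution */
    return (double)count / elapsed;
}
```

To mirror the test described above, one would call `random_io_rate` once per block size, doubling `bsize` from 512 up to 131072 (128KB), once with `do_write` set to 0 and once set to 1, and compare the resulting transfers/sec. The device path and the `span` to cover are left to the caller, since the message does not name them.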