Date: Fri, 12 Mar 2004 00:04:47 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Colin Percival <colin.percival@wadham.ox.ac.uk>
Cc: cvs-all@freebsd.org
Subject: Re: cvs commit: src/sys/sys mdioctl.h src/sys/dev/md md.c src/sbin/mdconfig mdconfig.8 mdconfig.c
Message-ID: <20040311230444.G6384@gamplex.bde.org>
In-Reply-To: <6.0.1.1.1.20040311063721.03e220b8@imap.sfu.ca>
References: <Your message of "Thu, 11 Mar 2004 06:30:28 GMT." <48348.1078986950@critter.freebsd.dk> <6.0.1.1.1.20040311063721.03e220b8@imap.sfu.ca>

On Thu, 11 Mar 2004, Colin Percival wrote:

> At 06:35 11/03/2004, Poul-Henning Kamp wrote:
> >In message <6.0.1.1.1.20040311062306.03f9ade0@imap.sfu.ca>, Colin Percival writes:
> > ><kernelnewbie>
> > >   Is it really necessary for vnode-backed memory disks to be
> > >accessed through the filesystem?  Why can't md(4) hijack the
> > >disk blocks which constitute the file (telling the filesystem
> > >not to touch them, of course) and translate I/O operations
> > >directly into I/O on the underlying device?
> > ></kernelnewbie>

Script started on Thu Mar 11 23:13:06 2004
ttyp0:root@besplex:/c/tmp> dd if=/dev/zero of=zz bs=1 oseek=32767g count=1
1+0 records in
1+0 records out
1 bytes transferred in 0.000070 secs (14266 bytes/sec)
ttyp0:root@besplex:/c/tmp> du zz
448	zz
ttyp0:root@besplex:/c/tmp> exit

Script done on Thu Mar 11 23:13:47 2004

This creates a file of size 32TB-epsilon with 1 minimal block in it (a
2K frag for ffs).  md could map this block but would have difficulty
using the other 32TB-2*epsilon bytes in the file.  It would have to
duplicate the file system's block allocator to allocate new blocks.
The block allocator is the most interesting part of a file system.

The script doesn't show mdconfig'ing this file since md has overflow
bugs at 4G sectors and can't actually handle files of this size.

Apart from this, direct access would sort of work.  Very old versions
did this.  See rev.1.1 of sys/dev/vn/vn.c.  It only uses VOP_BMAP() to
map the blocks and VOP_STRATEGY() to do i/o.  It apparently doesn't
work for writing to holes in the file.

> >That would be a really complex solution to a problem which should not
> >exist in the first place :-)

The version in rev.1.1 of vn.c is about twice as large and more than
twice as complex as the current code.  It probably needs to be more
complex to actually work (apart from not supporting sparse files).
There were several intermediate versions that worked better but still
had deadlock problems (IIRC, it got rewritten 3 or 4 times mainly to
reduce deadlock problems).  The version in RELENG_3 still uses
VOP_BMAP/VOP_STRATEGY.  The version in RELENG_4 still claims to use
VOP_BMAP/VOP_STRATEGY in a comment, but actually uses VOP_READ/VOP_WRITE
for vnodes (as explained in another comment).  The version in md.c in
-current is similar to vn.c in RELENG_4.

>    Well... yes, but it *would* make sure that data didn't get passed
> back up to the filesystem layer.  And it would probably be faster,
> which is why I thought of it.

It might also have fewer deadlock possibilities.  VOP_READ/VOP_WRITE are
inherently more blocking than VOP_STRATEGY.

Bruce
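
The sparse-file experiment in the script session above can be reproduced
from userland without dd.  What follows is only a rough sketch, not
anything taken from md(4) or vn.c: the helper names make_sparse_file and
sizes are hypothetical, the 1 GiB offset is an arbitrary stand-in for the
32767g used above (some file systems reject offsets that large), and it
assumes a POSIX system where st_blocks is counted in 512-byte units.  It
shows the gap between a file's logical size and the space actually
allocated to it, which is the gap md would have to fill by allocating
blocks itself.

```python
import os
import tempfile


def make_sparse_file(path, offset):
    """Write a single zero byte at `offset`, leaving a hole before it."""
    with open(path, "wb") as f:
        f.seek(offset)      # seeking past EOF allocates nothing by itself
        f.write(b"\0")      # only this byte needs backing storage


def sizes(path):
    """Return (logical size, allocated bytes) for `path`.

    Assumes POSIX semantics: st_blocks counts 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "zz")
        make_sparse_file(path, 1 << 30)   # arbitrary 1 GiB offset
        logical, allocated = sizes(path)
        print("logical size:", logical, "bytes")
        print("allocated:   ", allocated, "bytes")
```

On a file system that supports holes, the allocated figure should come
out far smaller than the logical size; on one that does not, the two
will be about the same.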