Date:      Fri, 17 Nov 1995 10:35:39 +0500 (GMT+0500)
From:      "Serge A. Babkin" <babkin@hq.icb.chel.su>
To:        terry@lambert.org (Terry Lambert)
Cc:        hackers@freebsd.org
Subject:   Re: ISP state their FreeBSD concerns
Message-ID:  <199511170535.KAA29102@hq.icb.chel.su>
In-Reply-To: <199511161850.LAA03418@phaeton.artisoft.com> from "Terry Lambert" at Nov 16, 95 11:50:47 am

> > We can divide all clients simultaneously accessing some file into two
> > groups:
> > 
> > 1. Clients who synchronise their access using locks
> > 2. Clients who are ignoring locks
> > 
> > In my opinion we don't have to worry about the second group of clients
> > because their interaction is very random anyway (there is absolutely
> > no difference between a client generating a write request N seconds
> > later and a write cache delaying this write request for N seconds).
> > Consider the first group. When a process locks some part of a file it
> > assumes that it gets the ability to do anything with this part. So, it
> > need not worry about cache coherence until it does an unlock call.
> 
> If you don't cache the second case, I agree.
> 
> Otherwise, you potentially screw applications that lock the index for
> a record in order to lock the record itself (the index and data files
> being separate, so the lock doesn't apply over the cacheable data).

It really looks like a problem. For a DOS client the solution may be to flush
the whole cache before issuing an unlock request. But I think any
network-aware DOS program must use fsync() anyway.
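
Roughly like this, as a sketch only (flush_file_cache() and
send_unlock_request() are made-up names standing in for whatever the
redirector really provides):

/*
 * Sketch of the flush-before-unlock idea for a DOS redirector.
 * Both called functions are hypothetical placeholders.
 */
extern int flush_file_cache(int handle);
extern int send_unlock_request(int handle, long offset, long length);

int unlock_range(int handle, long offset, long length)
{
    /* Push every dirty cached block of this file to the server
     * first, so the next client to take the lock sees our data. */
    if (flush_file_cache(handle) != 0)
        return -1;      /* delayed write errors surface here */

    return send_unlock_request(handle, offset, length);
}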

> > > The principle that a server is like a disk: when the write returns,
> > > you are guaranteed that a read several days later will return the
> > > same data by virtue of the semantics.
> > 
> > Agreed, but why can not we have a write cache before writing to server like
> > the write cache before writing to disk ?
> 
> Write cache going to disk relies on the fact that some writes are async,
> and may be cached, and some writes are sync, and must complete before
> they return, leaving the data on the disk.
> 
> DOS doesn't have the distinction between async and sync writes, so you
> can't just say "I'll just cache async writes" -- because there aren't
> any.

Programs using NetWare must know about this difference. Most programs
that really do record locking work over NetWare, not on a local machine
(why else would you need record locking on a single-tasking OS?), and
they distinguish async from sync writes.

> So in the client redirector, whose job is to convert DOS calls into
> NFS requests, there is some leeway for interpretation.  But not very
> much.  DOS calls expect to have status returned immediately.  You
> can't succeed on caching a write, telling the application that the
> write succeeded, and then when it fails (because the server is currently
> down or because some other client has changed file protection), turn
> around and fail the write.

I think in many cases failing some later write to this file will be
enough. The return code of the close call can be used for this purpose too;
the only problem is that most applications ignore this code.
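
Something like this on the application side (just a sketch; the point is
only that the close() result must be checked):

#include <unistd.h>

/* To see delayed write errors the application must check the result
 * of close() (and ideally fsync()), not only of write(). */
int save_record(int fd, const char *buf, int len)
{
    if (write(fd, buf, len) != len)
        return -1;      /* immediate failure */
    if (close(fd) != 0)
        return -1;      /* a cached write failed later */
    return 0;
}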

> 
> Since the client won't make one INT 21 call and then make another while
> that one is pending (DOS is a non-reentrant real mode interrupt handler),
> you can't cache one and not return.  You *must* return.

How about using a similar algorithm for a Unix client?

> An index record.  The relationship is abstract, and not related to the
> committed write order.  Typically, UNIX applications which do this (like
> the NWU NetWare Directory Services Database) will use sync I/O, either
> by opening the index file O_SYNC, or explicitly calling fsync() to commit
> the transaction record.
> 
> DOS client applications assume O_SYNC on all opens.

Pure DOS programs (without any network) assume exclusive access to
everything and do not use any locks at all.

> You'd do better to define your own API and make it conform to POSIX
> semantics (basically, a libc replacement) and write the DOS client
> programs that way.

The problem is that we need to run off-the-shelf applications, not only
self-written ones. BTW, I think Novell did exactly this with its NetWare,
and now all network-aware clients use such an API implemented on top of
DOS calls.

> > Another issue is that when the write requests are overlapped the later one
> > must overrule the earlier one (or wait until the earlier one is completed).
> 
> Probably it will have to wait, considering DOS is non-reentrant.  The most
> common case will be a rewrite of another record sharing the same cache
> block.  You could overwrite it only if it were exactly the same record.

Agreed.

> The problem here is that cache blocks must be committed in byte ranges
> instead of pages.  The efficiency will go way, way down.

We need a per-file cache, not a common one. We can limit the number of
outstanding write requests to a number just large enough to act as the
window for the delay of our network. Then we need not compare the latest
block against every block in the cache, only against the small number of
blocks of the same file.
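
As a sketch of the data structure I mean (wait_for_completion() is
hypothetical, standing in for draining one outstanding request):

#define MAX_PENDING 8   /* small window covering the network delay */

struct pending_write {
    long  offset;               /* cached by byte range, not pages */
    long  length;
    char *data;
};

struct file_cache {             /* one of these per open file */
    int  handle;
    int  npending;
    struct pending_write pw[MAX_PENDING];
};

extern void wait_for_completion(struct file_cache *fc);  /* hypothetical */

/* Queue a write; a pending request is superseded only when the new
 * one covers exactly the same byte range. */
int cache_write(struct file_cache *fc, long off, long len, char *data)
{
    int i;

    for (i = 0; i < fc->npending; i++) {
        if (fc->pw[i].offset == off && fc->pw[i].length == len) {
            fc->pw[i].data = data;      /* same record: replace it */
            return 0;
        }
    }
    while (fc->npending == MAX_PENDING)
        wait_for_completion(fc);        /* free a slot */

    fc->pw[fc->npending].offset = off;
    fc->pw[fc->npending].length = len;
    fc->pw[fc->npending].data   = data;
    fc->npending++;
    return 0;
}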

> > Yes, this algorithm implies a client-side "write only" cache and no
> > server-side write cache, the server remains as it is now. In your
> > analogy server is here a "disk".
> 
> What about Solaris, SGI, and SVR4 servers, which do server caching?

You said yourself that this solution is unreliable, and it will not become
any more unreliable due to the client-side write caching.

> 
> > IMHO a client must not make this assumption until it issues an explicit
> > fsync() request. How else can it be sure that, when its file is on a
> > common disk, all its requests are not cached by the OS?
> 
> DOS libc fsync() does nothing.  Or applications don't call fsync().
> Very different from assembly programs calling the INT 21 op.

How do they work on a NetWare network? If they want to work on it they
need to use fsync() too, and I think most dBASE-like databases do make
these calls. If they did not, they would not work on NetWare either.

> > > The problem comes when some other client updates the same block
> > > before you do.
> > 
> > When that client did not use locking we get the general problem of
> > write interference; when it used locking, this algorithm guarantees
> > that there will be no interference, because the other client issues an
> > unlock call only after its cache is flushed, and our client gets its
> > lock only after the other client issues its unlock call.
> 
> What about an implied data file lock because of a lock on the record.

It needs to use fsync(). IMHO most Unix programs that use this technique
use flock(), or close the file they changed before running something like
a shell script after locking. All DOS programs that work on NetWare must
do the same, or they will cause data corruption.
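
On the Unix side the discipline I mean looks roughly like this (a sketch
using fcntl() locks; the essential point is the fsync() before the
unlock):

#include <fcntl.h>
#include <unistd.h>

int update_record(int fd, long off, const char *buf, int len)
{
    struct flock fl;
    int rc = -1;

    fl.l_type   = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start  = off;
    fl.l_len    = len;
    if (fcntl(fd, F_SETLKW, &fl) == -1)     /* take the record lock */
        return -1;

    if (lseek(fd, off, SEEK_SET) == off &&
        write(fd, buf, len) == len &&
        fsync(fd) == 0)                     /* commit before unlock */
        rc = 0;

    fl.l_type = F_UNLCK;                    /* only now release it */
    fcntl(fd, F_SETLK, &fl);
    return rc;
}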

[...]

> > After the latest experiments I have found (with Bruce Evans) the problem
> > due to which DOS clients give terribly bad performance with FreeBSD. The
> > problem is that the typical DOS transfer is one sector, 512 bytes, while
> > the FreeBSD FFS block size is 8K, so when a file is written sequentially
> > FreeBSD needs to rewrite each block 16 times. The results with SCO or
> > Linux are better because they use 1K blocks. HP-UX results are good too:
> > although it has UFS, it divides each block into fragments of 1K and can
> > work with them separately.
> 
> This is interesting, but hard to believe in the vmio case.  There is a
> bitmap in the buffer cache for page sized buffers which should result in
> devbsize-based I/O for partial pages.  Basically, if I have a dirty write
> to a small record, (512 bytes in this case), the maximum span should be
> two devbsize blocks of the page.

If that is so then obviously my conclusion was wrong.

> This may in fact be a bug in the VM system that you have found instead of
> a bug in the way I/O is intended to be done.  The worst you should see is
> ~350k/s, not 200k/s, except that the write commits must return before the
> next write can be done on the client.  And much of that is due to using
> sync I/O to guarantee ordering.

Maybe. Actually I experimented with a SCO client mounting FreeBSD disks
with different [rw]size values and executing dd with different block sizes.
The numbers I got are:

[rw]size=8192 dd bs=100k: 150K/s write 705K/s read
[rw]size=8192 dd bs=1k:   122K/s write 705K/s read
[rw]size=8192 dd bs=512:   28K/s write 688K/s read

[rw]size=2048 dd bs=100k:  53K/s write 316K/s read
[rw]size=2048 dd bs=1536:  40K/s write
[rw]size=2048 dd bs=1234:  33K/s write
[rw]size=2048 dd bs=1025:  28K/s write
[rw]size=2048 dd bs=1k:    52K/s write 307K/s read
[rw]size=2048 dd bs=512:   28K/s write

[rw]size=1024 dd bs=100k:  27K/s write 691K/s read
[rw]size=1024 dd bs=1k:    27K/s write 690K/s read
[rw]size=1024 dd bs=512:   28K/s write 651K/s read

With Tsoft's DOS client I got 12K/s write and 200K/s read independently of
[rw]size when testing with sysinfo (I'm not sure whether it was Norton's
or PC Tools').

The network was not idle, so I carried out every test several times and
took the best result, but fluctuations are still possible.

The FreeBSD 2.0.5 NFS server was a 486DX2/66 with 20M of memory and an IDE
drive with a raw transfer speed of about 4M/s.

The SCO client was a Pentium/90 with SCO OpenServer 5.

The DOS client was a 486DX2/66.

The network was Ethernet; all network cards were 3c509B except the one in
the SCO machine, which was a 3c579. The DOS client was connected through a
3Com TP hub; FreeBSD and SCO were on thin Ethernet.

The results look like there is some problem with 512-byte write requests.
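
The numbers look as if each 512-byte write costs a read-modify-write of a
whole block, roughly like this sketch (read_block() and write_block() are
hypothetical stand-ins for the server's I/O path, which may not really
work this way in the vmio case):

#include <string.h>

#define BLKSIZE 8192

extern void read_block(long blkno, char *buf);          /* hypothetical */
extern void write_block(long blkno, const char *buf);   /* hypothetical */

/* A sequential writer issuing 512-byte requests touches each 8K
 * block 8192/512 = 16 times if every partial write goes this way. */
void write_partial(long blkno, long off, const char *buf, long len)
{
    char block[BLKSIZE];

    read_block(blkno, block);           /* fetch the whole 8K block */
    memcpy(block + off, buf, len);      /* patch 512 bytes of it */
    write_block(blkno, block);          /* rewrite the full block */
}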

		Serge Babkin

! (babkin@hq.icb.chel.su)
! Headquarter of Joint Stock Commercial Bank "Chelindbank"
! Chelyabinsk, Russia


