Date:      Thu, 29 Aug 2002 20:25:01 -0700 (PDT)
From:      Matthew Jacob <mjacob@feral.com>
To:        Cejka Rudolf <cejkar@fit.vutbr.cz>
Cc:        stable@FreeBSD.ORG
Subject:   Re: EOT tape handling changed?
Message-ID:  <Pine.BSF.4.21.0208292002510.60710-100000@beppo>
In-Reply-To: <20020829100033.GA2174@fit.vutbr.cz>


I'm not sure what you're talking about here with this test program
(deleted). Unless you've dorked with the MAXPHYS defines, the maximum
size you can write any tape record at is 64K.

> 
> > I'm not sure that this is right. The NetBSD driver will do the same
> > behaviour at this point.
> 
> Maybe it has been changed too? Or maybe afbackup on NetBSD has problems
> (however there is a trivial and universal fix for both cases, no problem)
> as in FreeBSD now, even if sources of afbackup take NetBSD into account.
> Can anybody try EOT handling in NetBSD in the reality? I'm sorry, I'm
> afraid that I could not do this.

I might. 

> 
> > The problem is that you cannot return a residual if you return an error.
> > For tape drives that then write *partial* final records (e.g., if you
> > have EEW off), you cannot know exactly what you wrote. Therefore, when
> 
> Please, what is "EEW off"?

Enable Early Warning. This allows you to get EOT notification prior to
hard end of tape. Otherwise, you get a VOLUME OVERFLOW and (usually)
lose data. See below.

> > you read things back, you end up with duplicated data when you do tape
> > spanning.
> 
> I'm not sure, if I see this problem too. If I understand correctly,
> previous behaviour was that when write() was successful just partially,
> it returned -1/ENOSPC, so some data could be duplicated on the next
> tape. Now write() should return partial count, then zero and
> then -1/ENOSPC...

But the problem here is that you need a signifier. It's been a while
since I worked on this stuff, so I had to go refresh my memory. Sorry if
my story keeps changing.


The model I'm trying to converge to has the following (if writing):

If you got a VOLUME OVERFLOW, then you're at hard eot, so you latch up a
residual, which is pointless because you're going to set ENOSPC.

If you get EOM notification (Early Warning), then you mark EOM pending.
Now, it turns out that every tape drive I tested that showed a non-zero
residual after EOM notification was, actually, wrong: the drive did in
fact finish writing the data out. So, in any case, this is where
SA_FLAG_EOM_PENDING is set, deferring action until the *next* I/O.

For 5.0-Current, the choice is then made to make the signifier on the
next I/O a residual equal to the byte count, indicating that zero
bytes had been written. To me these are the correct semantics. However,
setting ENOSPC here is probably okay for -stable. The key point though
is that SA_FLAG_EOM_PENDING is then cleared if there is no further I/O
queued up. Additional I/O will get the same Early Warning error, but the
I/O *will* complete (unless hard EOT is hit).
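
A minimal sketch of the write-side model described above, as a
user-space simulation in C. It is not the sa(4) driver code. The struct,
its field names, and tape_write() are hypothetical, and the sketch
assumes the pending flag is simply cleared when the zero-count signifier
is delivered, and that a write which would cross hard EOT fails whole
with ENOSPC.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical tape model for illustration; not real driver state. */
struct tape_model {
	size_t pos;         /* bytes written so far */
	size_t early_warn;  /* position where the drive reports Early Warning */
	size_t hard_eot;    /* physical end of tape */
	int    eom_pending; /* stands in for SA_FLAG_EOM_PENDING */
};

/*
 * Returns the byte count written, 0 as the Early Warning signifier,
 * or -1 with *err set to ENOSPC when hard EOT is hit.
 */
long
tape_write(struct tape_model *t, size_t len, int *err)
{
	assert(len > 0);
	*err = 0;

	if (t->eom_pending) {
		/* Deferred signifier: residual == byte count, nothing moved. */
		t->eom_pending = 0;
		return 0;
	}
	if (t->pos + len > t->hard_eot) {
		/* VOLUME OVERFLOW: reported to the caller right away. */
		*err = ENOSPC;
		return -1;
	}
	t->pos += len;
	if (t->pos >= t->early_warn) {
		/* Early Warning: the data went out; defer action to next I/O. */
		t->eom_pending = 1;
	}
	return (long)len;
}
```

With early_warn at 100 and hard_eot at 160, six 50-byte writes return
50, 50, 0, 50, 0 and then -1 with ENOSPC: each write past Early Warning
completes and is followed by another zero-count signifier until hard
EOT is reached.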

So- to re-summarize:

	If you hit hard EOT, this reflects right away back to the
	application, which gets ENOSPC and stops writing.

	If you hit Early Warning, you get a signifier on the next
	write, but you're allowed to continue to write since the
	one *after* the signifier goes through.

Another issue then arises- should you allow I/O past Early Warning? That's
the whole point of this funky dance. My take is that you *should* allow
I/O (in order to write trailer records, should the application want to).
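
From the application side, the outcomes above could be told apart
roughly as follows. This is only a sketch: the enum and
classify_tape_write() are hypothetical names, and it assumes the
5.0-Current style zero-count signifier rather than ENOSPC at Early
Warning.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical classification of one write(2) result on a tape device. */
enum tape_action {
	TAPE_CONTINUE,      /* full record written, keep going */
	TAPE_EARLY_WARNING, /* zero-count signifier: write trailers, change tape */
	TAPE_HARD_EOT,      /* ENOSPC: nothing more fits on this tape */
	TAPE_SHORT,         /* partial record: caller decides how to recover */
	TAPE_FAILED         /* any other error */
};

/*
 * n is the return value of write(2), err the errno value saved right
 * after the call, and want the record size that was requested.
 */
enum tape_action
classify_tape_write(ssize_t n, int err, size_t want)
{
	assert(want > 0);	/* a zero-length write would also return 0 */

	if (n < 0)
		return (err == ENOSPC) ? TAPE_HARD_EOT : TAPE_FAILED;
	if (n == 0)
		return TAPE_EARLY_WARNING;
	if ((size_t)n < want)
		return TAPE_SHORT;
	return TAPE_CONTINUE;
}
```

A caller would store the result of write() in a variable first and then
pass it along with errno, since the order in which function arguments
are evaluated is unspecified in C. On TAPE_EARLY_WARNING the
application can still write a small trailer record before asking for
the next tape.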

A very similar mechanism was put in place on Solaris. From st(7):
----------------
  EOT Handling
     The Emulex drives have only a physical end of  tape  (PEOT);
     thus  it is not possible to write past EOT. All other drives
     have a logical end of tape (LEOT) before PEOT  to  guarantee
     flushing  the  data  onto  the  tape.  The amount of storage
     between  LEOT and  PEOT varies from less  than  1  Mbyte  to
     about 20 Mbyte, depending on the tape drive.

     If EOT is encountered while writing an Emulex, no  error  is
     reported  but  the  number  of bytes transferred is 0 and no
     further writing is allowed. On all other drives,  the  first
     write that encounters EOT will return a short count or 0. If
     a short count is returned, then the next write  will  return
     0.  After a zero count is returned, the next write returns a
     full count or short  count.  A  following  write  returns  0
     again.  It  is important that the number and size of trailer
     records be kept as small as possible to prevent  data  loss.
     Therefore, writing after EOT is not recommended.
----------------


It seems to me that in the process of doing this for FreeBSD, we run afoul
of some applications that seem to expect perfect I/O up until hard EOT.

This is similar in NetBSD where 'early warning' is disabled by default.

So- sorry for the verbiage. We're left with a "what to do" type of
issue now. 

Let me do a little testing with -stable modified to force ENOSPC
instead of a zero i/o move count signifier- I'll let you know.

> 
> PS: I have/had another problem with timeouts - what do you think about
> increasing the standard value for SA_IO_TIMEOUT? In case of M2 from
> Exabyte, there is SmartClean feature, that when there are too many
> write errors during write operation, it rewinds tape at the beginning
> (up to 3 minutes), where there is some cleaning tape and tries to clean heads
> (I hope no more than 2 minutes), returns to the previous position (again
> up to 3 minutes) and finishes write - so I'm trying to increase
> SA_IO_TIMEOUT from 4 to at least 8 minutes (but I'm rather trying 15 now,
> because I realized this possible source of timeouts very recently, so
> I'm just experimenting... ;-).

This is good- send me a separate note about this to remind me, would you?
We *really* need to use hints for this, though.

-matt

