From owner-freebsd-scsi@FreeBSD.ORG Fri Jun 6 07:38:41 2003
From: Kern Sibbald
To: mjacob@feral.com
cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI tape data loss
Date: 06 Jun 2003 16:30:51 +0200
Message-Id: <1054909851.13630.967.camel@rufus>
In-Reply-To: <20030604074943.E98367@wonky.in0.lcl>

Hello,

I have now completed a fairly extensive series of tests on my Linux
machine with a DDS-4 drive and on Dan's FreeBSD machine with a DDS-1
drive.

Bottom line: There is a significant data loss (500KB to 2MB) at the EOM
on Dan's drive. There is no data loss on my drive.
The variation in the data loss seems to depend on how compressible the
data is (i.e., the more the data can be compressed to fit in a fixed
size driver buffer, the more user data is lost).

I ran three different kinds of tests and several variations of some of
those tests:

Tests:
1. Bacula saving a 1GB file containing random data.
2. Simulation of Bacula writing easily compressible, non-random data.
3. Raw write() of random data (same data each write except for the
   first 32 bits).

Variations:
1. Bacula stops writing before EOM is reached.
2. Test 2 above without drive hardware compression.
3. Test 3 above without writing EOF but simply rewinding.
4. Tests with and without using ioctl(MTIOCLRERROR).
5. Various tests with block size at 64,512 bytes, others with block
   size at 61,440 bytes.

Results:
1. All tests on my machine succeeded.
2. All tests (Test 1 Variation 1) not writing to EOM succeeded on both
   machines. (Previously we indicated that there was a loss when not
   writing to the EOM. I could not reproduce this and believe we had a
   misunderstanding somewhere.)
3. All tests of all variations writing to EOM failed on Dan's machine.
4. The number of buffers lost was quite consistent (1-2 buffer
   difference) for any given variation.
5. There was not much difference in the number of buffers lost
   with/without hardware compression when the data was random.
6. The number of buffers lost was 4 times greater with non-random data
   and drive compression enabled than with random data or with no
   drive compression.

Conclusions:
1. On Dan's machine, data is always lost at EOM.
2. The amount of data lost appears to be closely related to what is in
   the drive buffer (more buffers are lost if the data is easily
   compressed).

Possible causes:
1. The hardware does not have an LEOM.
2. The driver is not signaling to the program when an LEOM occurs,
   thus the buffered data is lost at the PEOM. The ONLY write() status
   I got in all the tests was -1 with errno=ENOSPC (a return of zero
   bytes written never occurred).
3. Some miscommunication between the hardware and the driver.

What next:
- Time for the SCSI guys to look at this. The problem is easily
  repeatable on Dan's machine: just do a whole bunch of write()s,
  nothing else, and it is guaranteed to happen.

Perhaps all the above is not clear enough, in which case, please ask,
but if I write it out with all the reasoning, it will be a monster
essay, so I've tried to give the important test results so that you can
draw your own conclusions and then compare them to mine.

Best regards,

Kern
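
For reference, the raw write() test (Test 3) could be sketched roughly
as below. This is a minimal illustration, not the program used for the
tests above: the function name write_until_eom, its parameters, and the
injectable write callable are all assumptions made for the example. It
writes fixed-size blocks whose first 32 bits carry a sequence number,
and reports how the first unsuccessful write() showed up (ENOSPC, a
zero-byte return, or a short write).

```python
import errno
import os

BLOCK_SIZE = 64512  # one of the block sizes mentioned in the tests


def write_until_eom(fd, block_size=BLOCK_SIZE, max_blocks=None, write=os.write):
    """Write blocks to fd until the device stops accepting them.

    Hypothetical helper: returns (blocks_written, reason), where reason
    says how the end condition was reported to the caller.
    """
    # Same random payload in every block; only the first 32 bits differ.
    payload = os.urandom(block_size - 4)
    written = 0
    while max_blocks is None or written < max_blocks:
        header = (written & 0xFFFFFFFF).to_bytes(4, "big")
        block = header + payload
        try:
            n = write(fd, block)
        except OSError as e:
            if e.errno == errno.ENOSPC:
                # What the tests above observed: -1 with errno=ENOSPC.
                return written, "ENOSPC"
            raise
        if n == 0:
            # A zero-byte return would be an early end-of-medium signal.
            return written, "zero-length write"
        if n != len(block):
            return written, "short write"
        written += 1
    return written, "limit reached"
```

To use it against a real drive, one would open the tape device with
os.open(path, os.O_WRONLY) and pass the descriptor in; the device path
is site-specific and not given here. Comparing the returned block count
with the number of blocks that can be read back after a rewind gives
the number of buffers lost.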