Date: Sat, 14 Aug 1999 11:20:11 +0200 (CEST)
From: Wilko Bulte <wilko@yedi.iaf.nl>
To: karl@Denninger.Net (Karl Denninger)
Cc: randy@psg.com, freebsd-scsi@FreeBSD.ORG
Subject: Re: dump to dlt gets write error
Message-ID: <199908140920.LAA53941@yedi.iaf.nl>
In-Reply-To: <19990813191646.A57450@Denninger.Net> from Karl Denninger at "Aug 13, 1999 7:16:46 pm"

As Karl Denninger wrote ...
> I've seen this kind of stupidity before and you're not going to like the
> problem or solution.
>
> Put the DLT on a different SCSI bus (different host adapter) from the disks.
>
> Specifically, separate the fast/wide and narrow SCSI devices.
>
> I've seen both DLTs and other "non-wide" devices have kittens with disks
> running fast/wide on the same SCSI bus. It usually manifests itself as
> an I/O error on the narrow device - which is exactly what you're getting.

I've been doing this for years and it works just fine:

FreeBSD 3.2-STABLE #5: Sun Aug 8 17:13:28 CEST 1999
    root@yedi.iaf.nl:/usr/freebsd-stable-src/src/sys/compile/YEDI
[....]
da0: <DEC RZ1EF-CB (C) DEC 0371> Fixed Direct Access SCSI-2 device
da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da0: 17365MB (35565080 512 byte sectors: 255H 63S/T 2213C)
evice to da0s2a
cd0 at ahc0 bus 0 target 4 lun 0
cd0: <TOSHIBA CD-ROM XM-5701TA 0557> Removable CD-ROM SCSI-2 device
cd0: 10.000MB/s transfers (10.000MHz, offset 8)
cd0: Attempt to query device size failed: NOT READY, Medium not present
da1 at ahc0 bus 0 target 1 lun 0
da1: <IBM DDRS-39130D DC1B> Fixed Direct Access SCSI-2 device
da1: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da1: 8715MB (17850000 512 byte sectors: 255H 63S/T 1111C)
cd1: <PHILIPS CDD3600 CD-R/RW 2.00> Removable CD-ROM SCSI-2 device
cd1: 10.000MB/s transfers (10.000MHz, offset 15)
cd1: Attempt to query device size failed: NOT READY, Medium not present
sa3 at ahc0 bus 0 target 2 lun 0
sa3: <ARCHIVE 4326XX 27871-XXX 0322> Removable Sequential Access SCSI-2 device
sa3: 5.000MB/s transfers (5.000MHz, offset 15)
sa2 at ahc0 bus 0 target 5 lun 0
sa2: <DEC TZ88 (C) DEC D473> Removable Sequential Access SCSI-2 device
sa2: 10.000MB/s transfers (10.000MHz, offset 15)
sa0 at ahc0 bus 0 target 6 lun 0
sa0: <TANDBERG TDC 4200 00A1> Removable Sequential Access SCSI-2 device
sa0: 3.300MB/s transfers

The TZ88 is a
DLT4000 btw, I also used a TZ87 which is a DLT2000. All my tapes are in a
Storageworks shelf.

> My guess is that the hardware on the narrow (and not-so-fast) device gets
> mightily confused by the shorter signal times (even though they're not
> aimed at that target) and randomly "freaks out" enough to botch an
> operation.

A DLT4000 is also a fast SCSI device. My DLT2000, which is 5 MB/sec, also
worked just fine. If I had to guess, this is a bad interconnect of some kind,
or lousy termination.

> Do you get any kind of DMESG log when the write *fails* (check it) or a
> console log of the actual error?

You can also pull the error logs from within the DLT drive itself. Try the
script below:

#!/bin/sh
# dltinfo: get more information out of your DLT tape drive.
#
# (C) 1996, Wilko Bulte, wilko@freebsd.org
#
# Warning: This script has only been tested on a DEC TZ87 & TZ88 DLT
#
#          You need the DLT drive's OEM manual (or similar) to make
#          sense out of some of the data reported.
# Please send any constructive comments by email to wilko@freebsd.org

Unit=2

## camcontrol(8) setup
Verbose="-v"
Timeout="-t 3"

get_write_error_log()
{
RetVal=`camcontrol cmd -n sa -u $Unit \
    $Verbose \
    $Timeout \
    -c "4d 0 42 0 0 0 0 0 3f 0" \
    -i 63 \
    "{skip} *i4 \
    {skip} *i4 \
    {Corrected errors without substantial delay} i4 \
    {skip} *i4 \
    {Corrected errors with possible delay } i4 \
    {skip} *i4 \
    {Total errors } i4 \
    {skip} *i4 \
    {Total errors corrected } i4 \
    {skip} *i4 \
    {Total times correction algorithm processed} i4 \
    {skip} *i4 \
    {Total bytes processed } i8 \
    {skip} *i4 \
    {Total uncorrected errors } i4" `

set $RetVal

echo "--- write errors ---"
printf "Corrected errors without substantial delay = %d\n" $1
printf "Corrected errors with possible delay = %d\n" $2
printf "Total errors = %d\n" $3
printf "Total errors corrected = %d\n" $4
printf "Total times correction algorithm processed = %d\n" $5
printf "Total bytes processed = %d\n" $6
printf "Total uncorrected errors = %d\n" $7
}

get_read_error_log()
{
RetVal=`camcontrol cmd -n sa -u $Unit \
    $Verbose \
    $Timeout \
    -c "4d 0 43 0 0 0 0 0 3f 0" \
    -i 63 \
    "{skip} *i4 \
    {skip} *i4 \
    {Corrected errors without substantial delay} i4 \
    {skip} *i4 \
    {Corrected errors with possible delay } i4 \
    {skip} *i4 \
    {Total errors } i4 \
    {skip} *i4 \
    {Total errors corrected } i4 \
    {skip} *i4 \
    {Total times correction algorithm processed} i4 \
    {skip} *i4 \
    {Total bytes processed } i8 \
    {skip} *i4 \
    {Total uncorrected errors } i4" `

set $RetVal

echo "--- read errors ---"
printf "Corrected errors without substantial delay = %d\n" $1
printf "Corrected errors with possible delay = %d\n" $2
printf "Total errors = %d\n" $3
printf "Total errors corrected = %d\n" $4
printf "Total times correction algorithm processed = %d\n" $5
printf "Total bytes processed = %d\n" $6
printf "Total uncorrected errors = %d\n" $7
}

get_compression_log()
{
# Assumption: from the results observed in testing it looks
#             like the residual counts are in kBytes (and not
#             in Mbytes as the TZ87 manual tells us).
RetVal=`camcontrol cmd -n sa -u $Unit \
    $Verbose \
    $Timeout \
    -c "4d 0 72 0 0 0 0 0 4c 0" \
    -i 76 \
    "{skip} *i4 \
    {skip } *i4 \
    {Read compression ratio (* 100 %) } i2 \
    {skip } *i4 \
    {Write compression ratio (* 100 %) } i2 \
    {skip } *i4 \
    {Total host Mbytes reads } i4 \
    {skip } *i4 \
    {Total host kbytes read residual } i4 \
    {skip } *i4 \
    {On tape Mbytes read } i4 \
    {skip} *i4 \
    {On tape kbytes read residual } i4 \
    {skip} *i4 \
    {Host requested Mbytes written } i4 \
    {skip} *i4 \
    {Host requested kbytes written residual } i4 \
    {skip} *i4 \
    {On tape Mbytes written } i4 \
    {skip} *i4 \
    {On tape kbytes written residual } i4 " `

set $RetVal

echo "--- compression statistics ---"
printf "Read compression ratio = %d %%\n" $1
printf "Write compression ratio = %d %%\n" $2
printf "Total host Mbytes read = %d\n" $3
printf "Total host kbytes read residual = %d\n" $4
printf "On tape Mbytes read = %d\n" $5
printf "On tape kbytes read residual = %d\n" $6
printf "Host requested Mbytes written = %d\n" $7
printf "Host requested kbytes written residual = %d\n" $8
printf "On tape Mbytes written = %d\n" $9
# The tenth positional parameter needs braces; $10 would expand as ${1}0.
printf "On tape kbytes written residual = %d\n" ${10}
}

get_read_error_log
echo
get_write_error_log
echo
get_compression_log
echo

It is a quick hack but it works for me. Like in:

Mon Aug 9 00:32:07 CEST 1999
  DUMP: Date of this level 0 dump: Mon Aug 9 00:32:07 1999
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/rda1c (/local2) to /dev/nrsa2
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 2685410 tape blocks.
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: 12.18% done, finished in 0:36
  DUMP: 27.84% done, finished in 0:25
  DUMP: 43.49% done, finished in 0:19
  DUMP: 59.02% done, finished in 0:13
  DUMP: 73.04% done, finished in 0:09
  DUMP: 86.80% done, finished in 0:04
  DUMP: 99.85% done, finished in 0:00
  DUMP: DUMP: 2686684 tape blocks on 1 volumes(s)
  DUMP: finished in 2104 seconds, throughput 1276 KBytes/sec
  DUMP: level 0 dump on Mon Aug 9 00:32:07 1999
  DUMP: Closing /dev/nrsa2
  DUMP: DUMP IS DONE
Mon Aug 9 01:07:41 CEST 1999
--- read errors ---
Corrected errors without substantial delay = 0
Corrected errors with possible delay = 0
Total errors = 0
Total errors corrected = 0
Total times correction algorithm processed = 0
Total bytes processed = 0
Total uncorrected errors = 0

--- write errors ---
Corrected errors without substantial delay = 0
Corrected errors with possible delay = 0
Total errors = 105
Total errors corrected = 105
Total times correction algorithm processed = 0
Total bytes processed = 0
Total uncorrected errors = 0

--- compression statistics ---
Read compression ratio = 0 %
Write compression ratio = 100 %
Total host Mbytes read = 0
Total host kbytes read residual = 0
On tape Mbytes read = 0
On tape kbytes read residual = 0
Host requested Mbytes written = 9196
Host requested kbytes written residual = 196608
On tape Mbytes written = 9196
On tape kbytes written residual = 0

The most interesting part is the "Total errors" counter. I've seen that
skyrocket with bad media or DLT drives with a bad head.

IMHO this kind of error logging would be cool to have in some standard form.
Like VMS does, or in a quite different form, DEC Unix, eh Tru64. Really
useful in case you have hardware problems.

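If you want to watch that "Total errors" counter from cron rather than by eye, something along these lines might do. It is only a sketch: check_dlt_errors is a made-up helper and not part of dltinfo, the default limit of 1000 is an arbitrary placeholder rather than a figure from any DLT manual, and treating any uncorrected error as a failure is my assumption. It reads the dltinfo output on stdin and prints one verdict per error section.

```shell
#!/bin/sh
# check_dlt_errors: hypothetical helper, not part of dltinfo.
# Reads dltinfo output on stdin, prints one verdict line per error section.
# $1 is the "Total errors" limit; the default of 1000 is an arbitrary
# placeholder, not a vendor-specified value.

check_dlt_errors()
{
    awk -v limit="${1:-1000}" '
    # Section headers look like: --- write errors ---
    /^--- .* ---$/ { section = $2 " " $3 }

    # Matches "Total errors = N" but not "Total errors corrected = N".
    /^Total errors +=/ { total[section] = $NF + 0 }

    # Last counter of each error section; remember the section order here.
    /^Total uncorrected errors +=/ {
        uncorrected[section] = $NF + 0
        order[++n] = section
    }

    END {
        for (i = 1; i <= n; i++) {
            s = order[i]
            if (uncorrected[s] > 0)            # assumption: always a failure
                verdict = "FAIL"
            else if (total[s] > limit + 0)     # assumption: limit is site-specific
                verdict = "WARN"
            else
                verdict = "ok"
            printf "%s: total=%d uncorrected=%d %s\n", \
                s, total[s], uncorrected[s], verdict
        }
    }'
}
```

Usage would be something like `sh dltinfo | check_dlt_errors 500`. The compression section is ignored because it carries no error counters. Pick the limit from what your own known-good drive and media report after a full dump.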
> Karl Denninger (karl@denninger.net) Web: childrens-justice.org
>
>
> On Fri, Aug 13, 1999 at 05:05:17PM -0700, Randy Bush wrote:
> > asus p2b-ds 2x350mhz, 128mb
> > two barracudas
> > quantum dlt2000
> > 4.0-currnt of 99.04.03
> >
> > rip.psg.com:/# /do-dump
> > ...
> >   DUMP: Date of this level 0 dump: Fri Aug 13 16:07:11 1999
> >   DUMP: Date of last level 0 dump: the epoch
> >   DUMP: Dumping /dev/rccd5c (/usr) to /dev/nrsa0
> >   DUMP: mapping (Pass I) [regular files]
> >   DUMP: mapping (Pass II) [directories]
> >   DUMP: estimated 2951716 tape blocks.
> >   DUMP: dumping (Pass III) [directories]
> >   DUMP: dumping (Pass IV) [regular files]
> >   DUMP: 11.37% done, finished in 0:38
> >   DUMP: 25.73% done, finished in 0:28
> >   DUMP: 39.41% done, finished in 0:23
> >   DUMP: 51.77% done, finished in 0:18
> >   DUMP: 64.66% done, finished in 0:13
> >   DUMP: 76.19% done, finished in 0:09
> >   DUMP: 88.92% done, finished in 0:04
> >   DUMP: write error 2700020 blocks into volume 1
> >   DUMP: Do you want to restart?: ("yes" or "no")
> >
> > usually a LOT more fits on a tape, like four machines more.
> >
> > i ran the cleaning tape. i tried different tapes from different batches,

Don't ever run cleaning tapes on a DLT drive unless the 'Use cleaning tape'
LED comes on. Cleaning tapes are really bad news for the DLT heads if they
are run on a regular basis. They consist of a more or less normal data tape
that did not get its final polishing steps in manufacturing. They are quite
abrasive and do bad things to non-dirty heads.

> > including one that worked in the past. it breaks at different places, but
> > always much of the way through that partition.
> >
> > clues solicited.
> >
> > randy
> >

--
|   / o / /  _   Arnhem, The Netherlands - Powered by FreeBSD -
|/|/ / / /( (_) Bulte  WWW : http://www.tcja.nl  http://www.freebsd.org

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message