Date: Mon, 17 Jan 2005 15:28:08 -0000
From: "Herminio" <herminio.gonzalez@ctsu.ox.ac.uk>
To: <aic7xxx@freebsd.org>
Subject: RE: Error attempting to read or write to /dev/st0: sense key MediumError
Message-ID: <004801c4fca9$261d3f20$bc06010a@ctsu.ox.ac.uk>
In-Reply-To: <004101c4fc8a$4d301660$bc06010a@ctsu.ox.ac.uk>

> > > I am a newbie to linux servers and tape backups, and I have a
> > > problem performing a simple 'tar -cf' backup.
> > >
> > > The system in question:
> > > Redhat 9.0 on a Dell PowerEdge 600SC with
> > > Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
> > >
> > > The tape drive is a DELL PowerVault 100T DDS4. It appears as
> > > 'Python 06408-XXX Rev: 9100' in /proc/scsi/scsi. It is the only
> > > SCSI device on a Adaptec 3960D Ultra160 SCSI adapter.
> > >
> > > Using a brand new DDS-4 tape, I try the following:
> > >
> > > [root@nccserver root]# tar cvf /dev/st0 data/
> > > data/
> > > ...
> > > tar: /dev/st0: Wrote only 0 of 10240 bytes
> > > tar: Error is not recoverable: exiting now
> >
> > Does
> >
> > mt -f /dev/nst0 setblk 0
> >
> > help any? I've needed this in the past on exabyte 8mm tapes
> > with some drives.
> >
>
> Thanks for the tip James, but unfortunately that has does not seem to
> make any difference on this system. the tar command returns the same
> error messages.
>

I have found out something that I think reveals the cause of the
problem. First, though, I should explain more about the situation.

We have two almost identically configured servers: a production server
and a test server. Up to now in this thread I have only been talking
about the production server. The only hardware differences I know of
between the two are the RAM (the test server has 256MB, the production
server has 512MB) and the DVD combo drive (one is LG, the other is
Samsung).

Unfortunately the production server is in an office in Beijing, while
the test server is here with me in the UK. I cannot log into the
production server. Up to now I have been pretending that I have access
to it, for the sake of simplicity. The Beijing office does not have
much IT expertise, so I cannot easily ask them to open up the server to
check cabling, terminators, etc. But I can ask them to log in and
execute commands as root.

Both were built and delivered by DELL, although possibly from different
assembly plants. Still, according to the specs on the invoice, they are
pretty much identical. Moreover, the production server had been writing
to tape successfully (using tar) for over a year until just recently,
when it developed the fault described in this thread. The test server
is fully functional.

I have asked the IT rep in the Beijing office to send me the system
logs in /var/log/messages* . After comparing them with the system logs
from the test server, I have spotted error messages (apart from the
errors while using tar) that appear on the production server but not on
the test server. These messages get logged during startup, just before
the point where the AIC7XXX driver logs its own information. The error
messages are the 4 lines that start with ahc_pci:

...
Jan 11 08:48:41 nccserver kernel: Freeing initrd memory: 308k freed
Jan 11 08:48:41 nccserver kernel: VFS: Mounted root (ext2 filesystem).
Jan 11 08:48:41 nccserver kernel: SCSI subsystem driver Revision: 1.00
Jan 11 08:48:41 nccserver kernel: ahc_pci:0:6:0: PCI error Interrupt at seqaddr = 0x47
Jan 11 08:48:41 nccserver kernel: ahc_pci:0:6:0: Data Parity Error Detected during address or write data phase
Jan 11 08:48:41 nccserver kernel: ahc_pci:0:6:1: PCI error Interrupt at seqaddr = 0x46
Jan 11 08:48:41 nccserver kernel: ahc_pci:0:6:1: Data Parity Error Detected during address or write data phase
Jan 11 08:48:41 nccserver kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
Jan 11 08:48:42 nccserver kernel: <Adaptec 3960D Ultra160 SCSI adapter>
Jan 11 08:48:42 nccserver kernel: aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
Jan 11 08:48:42 nccserver kernel:
Jan 11 08:48:42 nccserver kernel: scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
Jan 11 08:48:42 nccserver kernel: <Adaptec 3960D Ultra160 SCSI adapter>
Jan 11 08:48:42 nccserver kernel: aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
Jan 11 08:48:42 nccserver kernel:
Jan 11 08:48:42 nccserver kernel: blk: queue c256da14, I/O limit 4095Mb (mask 0xffffffff)
Jan 11 08:48:42 nccserver kernel: Vendor: ARCHIVE Model: Python 06408-XXX Rev: 9100
Jan 11 08:48:42 nccserver kernel: Type: Sequential-Access ANSI SCSI revision: 03
Jan 11 08:48:42 nccserver kernel: blk: queue c256dc14, I/O limit 4095Mb (mask 0xffffffff)
Jan 11 08:48:42 nccserver kernel: megaraid: v1.18h (Release Date: Thu Feb 6 17:25:43 EST 2003)
Jan 11 08:48:42 nccserver kernel: megaraid: found 0x1000:0x1960:idx 0:bus 0:slot 7:func 0
Jan 11 08:48:42 nccserver kernel: scsi2 : Found a MegaRAID controller at 0xe085f000, IRQ: 5
Jan 11 08:48:42 nccserver kernel: scsi2 : Enabling 64 bit support
Jan 11 08:48:42 nccserver kernel: megaraid: [3.28:1.05] detected 1 logical drives
Jan 11 08:48:42 nccserver kernel: megaraid: supports extended CDBs.
Jan 11 08:48:42 nccserver kernel: megaraid: channel[1] is raid.
Jan 11 08:48:42 nccserver kernel: scsi2 : LSI Logic MegaRAID 3.28 254 commands 15 targs 4 chans 7 luns
Jan 11 08:48:42 nccserver kernel: scsi2: scanning virtual channel 0 for logical drives.
Jan 11 08:48:42 nccserver kernel: Vendor: MegaRAID Model: LD0 RAID1 34678R Rev: 3.28
Jan 11 08:48:42 nccserver kernel: Type: Direct-Access ANSI SCSI revision: 02
Jan 11 08:48:42 nccserver kernel: blk: queue c256de14, I/O limit 4095Mb (mask 0xffffffff)
Jan 11 08:48:42 nccserver kernel: scsi2: scanning virtual channel 1 for logical drives.
Jan 11 08:48:42 nccserver kernel: scsi2: scanning virtual channel 2 for logical drives.
Jan 11 08:48:42 nccserver kernel: scsi2: scanning physical channel 0 for devices.
Jan 11 08:48:42 nccserver kernel: Attached scsi disk sda at scsi2, channel 0, id 0, lun 0
Jan 11 08:48:42 nccserver kernel: SCSI device sda: 71020544 512-byte hdwr sectors (36363 MB)
Jan 11 08:48:42 nccserver kernel: Partition check:
Jan 11 08:48:42 nccserver kernel: sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 >
Jan 11 08:48:42 nccserver kernel: LVM version 1.0.5+(22/07/2002) module loaded
Jan 11 08:48:42 nccserver kernel: Journalled Block Device driver loaded
Jan 11 08:48:42 nccserver kernel: kjournald starting. Commit interval 5 seconds
Jan 11 08:48:42 nccserver kernel: EXT3-fs: mounted filesystem with ordered data mode.
...

ahc_pci sounds as though it is related to the aic7xxx driver. I will
start investigating on the web what this error message means, but I
thought somebody on this mailing list might be able to understand what
is happening just from seeing the error messages.

Regards,
Herminio
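
For anyone who wants to repeat the log comparison described above, here
is a minimal Python sketch. It is only an illustration, not the exact
procedure used for this message. It assumes the classic syslog line
format seen in the excerpt ("Mon DD HH:MM:SS host kernel: message"),
and the function names (kernel_messages, normalise, only_in_first) are
made up for this example. Numbers and hex values are masked before
comparing, so that differing addresses or sizes alone do not count as a
difference.

```python
import re

# Assumed prefix format, e.g. "Jan 11 08:48:41 nccserver kernel: "
SYSLOG_PREFIX = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \S+ kernel: ?"
)
# Hex values and standalone decimal numbers, masked during comparison
HEX_OR_NUM = re.compile(r"0x[0-9a-fA-F]+|\b\d+\b")


def kernel_messages(lines):
    """Return kernel messages with the date/host prefix stripped."""
    messages = []
    for line in lines:
        match = SYSLOG_PREFIX.match(line)
        if match:
            text = line[match.end():].strip()
            if text:  # skip empty "kernel:" lines
                messages.append(text)
    return messages


def normalise(message):
    """Mask numbers so only the wording of a message is compared."""
    return HEX_OR_NUM.sub("N", message)


def only_in_first(first_lines, second_lines):
    """Messages whose wording appears in the first log but not the second."""
    seen = {normalise(m) for m in kernel_messages(second_lines)}
    unique = []
    for message in kernel_messages(first_lines):
        if normalise(message) not in seen and message not in unique:
            unique.append(message)
    return unique
```

To use it, read the two log files into lists of lines (for example with
open(path).read().splitlines()) and call only_in_first(production_lines,
test_lines). The masking is deliberately crude: it will also hide real
differences that consist only of a number, such as a driver revision.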