Date: Mon, 19 Apr 1999 15:08:24 -0700 (PDT)
From: milt <milt@moth.vicor-nb.com>
To: FreeBSD-gnats-submit@freebsd.org
Cc: daver@moth.vicor-nb.com, jpl@moth.vicor-nb.com
Subject: kern/11226: Invalid files on disk after fsync
Message-ID: <199904192208.PAA05235@moth.vicor-nb.com>

>Number:         11226
>Category:       kern
>Synopsis:       Invalid files on disk after fsync
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Apr 19 15:10:00 PDT 1999
>Closed-Date:
>Last-Modified:
>Originator:     milt@vicor-nb.com
>Release:        FreeBSD 2.2.6-STABLE
>Organization:
vicor
>Environment:

We are running a very busy server which includes scsi drives and
lots of nfs mounts. df and dmesg are:

52. ? df
Filesystem              1K-blocks     Used    Avail Capacity  Mounted on
/dev/sd0s1a                 59471    21463    33251      39%  /
/dev/sd0s1f               8202004  2540061  5005783      34%  /usr
/dev/sd0s1e                 98479     4681    85920       5%  /var
/dev/sd1c                 8621381        1  7931670       0%  /disk2
procfs                          4        4        0     100%  /proc
opshome:/usr/opshome      8241012  4715979  2865753      62%  /usr/opshome
oos0a:/raid1             60821938 10291873 45664310      18%  /raid1
oos0a:/raid2             60821938 11946475 44009708      21%  /raid2
oos0a:/raid3             60821938    73528 55882655       0%  /raid3
oos0a:/usr/env1/data/lb   1843066   847170   848451      50%  /usr/env1/data/lb
oos0a:/usr/env5/data/lb   1843066   847170   848451      50%  /usr/env5/data/lb
oos0a:/usr/env3/data/lb   1843066   847170   848451      50%  /usr/env3/data/lb
oos0a:/usr/env4/data/lb   1843066   847170   848451      50%  /usr/env4/data/lb

Copyright (c) 1992-1998 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California. All rights reserved.
FreeBSD 2.2.6-STABLE #0: Tue Apr 28 15:08:30 PDT 1998
    root@ipm0.lbxrich.vicor-nb.com:/disk2/src.patched/sys/compile/WSS
CPU: Pentium (199.43-MHz 586-class CPU)
  Origin = "GenuineIntel" Id = 0x544 Stepping=4
  Features=0x8001bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,MMX>
real memory = 134217728 (131072K bytes)
avail memory = 129515520 (126480K bytes)
Probing for devices on PCI bus 0:
chip0 <Intel 82439TX PCI cache memory controller> rev 1 on pci0:0:0
chip1 <Intel 82371AB PCI-ISA bridge> rev 1 on pci0:7:0
chip2 <Intel 82371AB IDE interface> rev 1 on pci0:7:1
chip3 <Intel 82371AB USB interface> rev 1 int d irq 11 on pci0:7:2
chip4 <Intel 82371AB Power management controller> rev 1 on pci0:7:3
ahc0 <Adaptec 2940 SCSI host adapter> rev 0 int a irq 10 on pci0:9:0
ahc0: aic7870 Single Channel, SCSI Id=7, 16 SCBs
ahc0 waiting for scsi devices to settle
ahc0: target 0 Tagged Queuing Device
(ahc0:0:0): "SEAGATE ST19171N 0023" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 8683MB (17783112 512 byte sectors)
ahc0: target 1 Tagged Queuing Device
(ahc0:1:0): "SEAGATE ST19171N 0023" type 0 fixed SCSI 2
sd1(ahc0:1:0): Direct-Access 8683MB (17783112 512 byte sectors)
(ahc0:3:0): "HP C1533A A708" type 1 removable SCSI 2
st0(ahc0:3:0): Sequential-Access density code 0x24, variable blocks, write-enabled
fxp0 <Intel EtherExpress Pro 10/100B Ethernet> rev 1 int a irq 9 on pci0:10:0
fxp0: Ethernet address 00:a0:c9:27:93:fd
vga0 <VGA-compatible display device> rev 0 int a irq 12 on pci0:11:0
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
psm0 not found at 0x60
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
npx0 on motherboard
npx0: INT 16 interface
Intel Pentium F00F detected, installing workaround
>Description:

What we notice is invalid file contents. The sequence is:

1.
Our Qft utility writes a file by using an
open, write/write/..., fsync, close sequence:

    int outFd = open (inPath, O_WRONLY | O_CREAT | O_TRUNC, 0664);
    if (outFd <= 0)
        ReportError ("Unable to open " << inPath << " for writing")

    while (...
        ...
        receive 8192 bytes from tcp

        if (write (fd, fileBuffer, sizeof (fileBuffer)) != sizeof (fileBuffer))
            ReportError ("Error Writing File " << inPath);
        ...
        the last write may be less than 8192 bytes.

    if (fsync (outFd))
        ReportError ("fsync failure on file " << inPath)

    close (outFd);

2.  Some program (maybe a backup utility, maybe Qft) reads the file
    and receives valid contents.

3.  Some program reads the file and receives invalid contents.

*.  At this point, the data on disk is permanently invalid.

*.  The problem consists of 8192 wrong bytes which start on a page
    boundary in the middle of the file. The first time this came up,
    the bad data consisted of two short files which had existed some
    weeks prior to the problem. The second time this came up, the bad
    data looked like part of an executable and could have been old or
    new.

*.  The file date/time modified (from ls -l) remains the time of the
    initial file write from step 1.

THAT IS: we write a file, we read it and find it good, we read it
and find it invalid even though the date hasn't changed.

Probably you will think I am making this up, confused, or
incompetent. Sounds pretty phenomenal. Tell me how one could
accidentally change the file contents without changing the access
date?

Also, the valid read is happening within 5 to 20 minutes of the
initial file write and may well be seeing data which is in cache
but never makes it to disk. (The invalid read is happening hours
or days later.)

>How-To-Repeat:

I wish I could. I spent all weekend trying. The cause is going to
have to include some unusual interaction or other or I would have
duplicated it by now. We have seen two incidents one month apart
on two different hosts at the same site (our busiest site).

>Fix:

I wish I knew.
Fortunately it doesn't come up often.
>Release-Note:
>Audit-Trail:
>Unformatted:
Reply-To: milt@vicor-nb.com
X-send-pr-version: 3.2


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
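
For anyone trying to reproduce the report, the write sequence from step 1
and a block-by-block check of the result can be sketched in C roughly as
follows. This is only a sketch under stated assumptions, not the Qft
source: the names write_file_synced, first_bad_block and BLOCK_SIZE are
hypothetical, the data comes from a memory buffer rather than a TCP
socket, and the code neither explains nor fixes the corruption. It
differs from the excerpt in step 1 in two ways: it treats only a negative
return from open() as failure (descriptor 0 is valid), and it retries
after short writes instead of reporting them as errors.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192 /* hypothetical name; 8192 is the chunk size in the report */

/* Write len bytes to path using open, write..., fsync, close.
 * Returns 0 on success, -1 on failure with errno set. */
int write_file_synced(const char *path, const unsigned char *buf, size_t len)
{
    assert(path != NULL && (buf != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0664);
    if (fd < 0)                      /* open() returns -1 on failure */
        return -1;

    size_t done = 0;
    while (done < len) {
        size_t want = len - done;
        if (want > BLOCK_SIZE)
            want = BLOCK_SIZE;       /* the last write may be shorter */
        ssize_t n = write(fd, buf + done, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted before any data was written */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        done += (size_t)n;           /* a short write just advances the offset */
    }

    if (fsync(fd) != 0) {            /* ask the kernel to push the data to disk */
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}

/* Compare the file at path with the expected bytes, one block at a time.
 * Returns -1 if the contents match, -2 on an I/O error or a length
 * mismatch, otherwise the byte offset of the first differing block. */
long first_bad_block(const char *path, const unsigned char *expect, size_t len)
{
    unsigned char block[BLOCK_SIZE];
    long result = -1;
    size_t off = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -2;

    while (off < len && result == -1) {
        size_t want = len - off;
        if (want > BLOCK_SIZE)
            want = BLOCK_SIZE;

        size_t got = 0;
        while (got < want) {         /* read() may also return short counts */
            ssize_t n = read(fd, block + got, want - got);
            if (n < 0 && errno == EINTR)
                continue;
            if (n <= 0)
                break;               /* error or unexpected end of file */
            got += (size_t)n;
        }

        if (got != want)
            result = -2;
        else if (memcmp(block, expect + off, want) != 0)
            result = (long)off;
        else
            off += want;
    }

    /* The file must end exactly where the expected data ends. */
    if (result == -1 && read(fd, block, 1) != 0)
        result = -2;

    close(fd);
    return result;
}
```

A reproduction attempt along these lines would write a file with
write_file_synced, keep the original bytes (or a checksum of them)
somewhere safe, and call first_bad_block periodically over the following
hours or days. Note that a check run shortly after the write may be
served from the buffer cache, which is the situation the report suspects
for the "valid" read in step 2.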