Date:      Mon, 19 Apr 1999 15:08:24 -0700 (PDT)
From:      milt <milt@moth.vicor-nb.com>
To:        FreeBSD-gnats-submit@freebsd.org
Cc:        daver@moth.vicor-nb.com, jpl@moth.vicor-nb.com
Subject:   kern/11226: Invalid files on disk after fsync
Message-ID:  <199904192208.PAA05235@moth.vicor-nb.com>

>Number:         11226
>Category:       kern
>Synopsis:       Invalid files on disk after fsync
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Apr 19 15:10:00 PDT 1999
>Closed-Date:
>Last-Modified:
>Originator:     milt@vicor-nb.com
>Release:        FreeBSD 2.2.6-STABLE
>Organization:
vicor
>Environment:

   We are running a very busy server with SCSI drives and many NFS mounts.
   The df and dmesg output follows:

52. ? df
Filesystem              1K-blocks     Used    Avail Capacity  Mounted on
/dev/sd0s1a                 59471    21463    33251    39%    /
/dev/sd0s1f               8202004  2540061  5005783    34%    /usr
/dev/sd0s1e                 98479     4681    85920     5%    /var
/dev/sd1c                 8621381        1  7931670     0%    /disk2
procfs                          4        4        0   100%    /proc
opshome:/usr/opshome      8241012  4715979  2865753    62%    /usr/opshome
oos0a:/raid1             60821938 10291873 45664310    18%    /raid1
oos0a:/raid2             60821938 11946475 44009708    21%    /raid2
oos0a:/raid3             60821938    73528 55882655     0%    /raid3
oos0a:/usr/env1/data/lb   1843066   847170   848451    50%    /usr/env1/data/lb
oos0a:/usr/env5/data/lb   1843066   847170   848451    50%    /usr/env5/data/lb
oos0a:/usr/env3/data/lb   1843066   847170   848451    50%    /usr/env3/data/lb
oos0a:/usr/env4/data/lb   1843066   847170   848451    50%    /usr/env4/data/lb


Copyright (c) 1992-1998 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.

FreeBSD 2.2.6-STABLE #0: Tue Apr 28 15:08:30 PDT 1998
    root@ipm0.lbxrich.vicor-nb.com:/disk2/src.patched/sys/compile/WSS
CPU: Pentium (199.43-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x544  Stepping=4
  Features=0x8001bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,MMX>
real memory  = 134217728 (131072K bytes)
avail memory = 129515520 (126480K bytes)
Probing for devices on PCI bus 0:
chip0 <Intel 82439TX PCI cache memory controller> rev 1 on pci0:0:0
chip1 <Intel 82371AB PCI-ISA bridge> rev 1 on pci0:7:0
chip2 <Intel 82371AB IDE interface> rev 1 on pci0:7:1
chip3 <Intel 82371AB USB interface> rev 1 int d irq 11 on pci0:7:2
chip4 <Intel 82371AB Power management controller> rev 1 on pci0:7:3
ahc0 <Adaptec 2940 SCSI host adapter> rev 0 int a irq 10 on pci0:9:0
ahc0: aic7870 Single Channel, SCSI Id=7, 16 SCBs
ahc0 waiting for scsi devices to settle
ahc0: target 0 Tagged Queuing Device
(ahc0:0:0): "SEAGATE ST19171N 0023" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 8683MB (17783112 512 byte sectors)
ahc0: target 1 Tagged Queuing Device
(ahc0:1:0): "SEAGATE ST19171N 0023" type 0 fixed SCSI 2
sd1(ahc0:1:0): Direct-Access 8683MB (17783112 512 byte sectors)
(ahc0:3:0): "HP C1533A A708" type 1 removable SCSI 2
st0(ahc0:3:0): Sequential-Access density code 0x24, variable blocks, write-enabled
fxp0 <Intel EtherExpress Pro 10/100B Ethernet> rev 1 int a irq 9 on pci0:10:0
fxp0: Ethernet address 00:a0:c9:27:93:fd
vga0 <VGA-compatible display device> rev 0 int a irq 12 on pci0:11:0
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
psm0 not found at 0x60
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
npx0 on motherboard
npx0: INT 16 interface
Intel Pentium F00F detected, installing workaround


>Description:

What we notice is invalid file contents.  The sequence is:

1. Our Qft utility writes a file using an open, write/write/..., fsync, close
   sequence:
      int outFd = open (inPath, O_WRONLY | O_CREAT | O_TRUNC, 0664);
      if (outFd < 0)   /* open returns -1 on failure; 0 is a valid descriptor */
         ReportError ("Unable to open " << inPath << " for writing");

      while (...
      ... receive 8192 bytes from TCP
      if (write (outFd, fileBuffer, sizeof (fileBuffer)) != sizeof (fileBuffer))
         ReportError ("Error Writing File " << inPath);

      ... the last write may be less than 8192 bytes.

      if (fsync (outFd)) ReportError ("fsync failure on file " << inPath);
      close (outFd);

2. Some program (maybe a backup utility, maybe Qft) reads the file and receives
   valid contents.

3. Some program reads the file and receives invalid contents.

   *. At this point, the data on disk is permanently invalid.

   *. The problem consists of 8192 wrong bytes which start on a page boundary in
      the middle of the file.  The first time this came up, the bad data
      consisted of two short files which had existed some weeks prior to the
      problem.  The second time this came up, the bad data looked like part of
      an executable and could have been old or new.

   *. The file date/time modified (from ls -l) remains the time of the initial
      file write from step 1.

THAT IS: we write a file, we read it and find it good, we read it and find it
invalid even though the date hasn't changed.

You will probably think I am making this up, or am confused or incompetent; it
does sound pretty phenomenal.  But tell me how one could accidentally change
the file contents without changing the modification date?

Also, the valid read is happening within 5 to 20 minutes of the initial file
write and may well be seeing data which is in cache but never makes it to
disk.  (The invalid read is happening hours or days later.)
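
One way to pin down which 8192-byte block went bad is to compare the suspect
file against a known-good copy, block by block.  The sketch below assumes such
a copy exists (for example, one taken by the early valid read); the function
name first_bad_block and its return codes are hypothetical, not part of Qft.

```c
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 8192

/* Compare two files in 8192-byte blocks.
 * Returns the zero-based index of the first differing block,
 * -1 if the files are identical, or -2 if either cannot be opened. */
long first_bad_block(const char *good_path, const char *suspect_path)
{
    unsigned char good[BLOCK_SIZE], suspect[BLOCK_SIZE];
    FILE *a = fopen(good_path, "rb");
    FILE *b = fopen(suspect_path, "rb");
    long block = 0, result = -1;

    if (a == NULL || b == NULL) {
        if (a != NULL) fclose(a);
        if (b != NULL) fclose(b);
        return -2;
    }
    for (;;) {
        size_t na = fread(good, 1, BLOCK_SIZE, a);
        size_t nb = fread(suspect, 1, BLOCK_SIZE, b);
        if (na != nb || memcmp(good, suspect, na) != 0) {
            result = block;        /* length or contents differ here */
            break;
        }
        if (na < BLOCK_SIZE)       /* both files ended at the same point */
            break;
        block++;
    }
    fclose(a);
    fclose(b);
    return result;
}
```

Multiplying the returned index by 8192 gives the byte offset of the first bad
block, which can then be checked against the page boundary described above.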


>How-To-Repeat:

   I wish I could.  I spent all weekend trying.  The cause must involve some
   unusual interaction, or I would have duplicated it by now.

   We have seen two incidents one month apart on two different hosts at the
   same site (our busiest site).

>Fix:
    
   I wish I knew.  Fortunately it doesn't come up often.

>Release-Note:
>Audit-Trail:
>Unformatted:
 Reply-To: milt@vicor-nb.com
 X-send-pr-version: 3.2
 





