Date:      Mon, 7 Jul 2008 21:59:13 GMT
From:      Andrew Hammond <andrew.george.hammond@gmail.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/125382: ENOSPC may be misleading, consider EIO
Message-ID:  <200807072159.m67LxDnd002481@www.freebsd.org>
Resent-Message-ID: <200807072200.m67M01Rt026647@freefall.freebsd.org>



>Number:         125382
>Category:       kern
>Synopsis:       ENOSPC may be misleading, consider EIO
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jul 07 22:00:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Andrew Hammond
>Release:        6.2 amd64
>Organization:
AdECN, a Microsoft Company
>Environment:
FreeBSD db1.sjc.adecn.com 6.2-RELEASE-p6 FreeBSD 6.2-RELEASE-p6 #1: Thu Jul 19 09:21:10 PDT 2007     root@qaipc1.qa1.adecn.com:/usr/obj/usr/src/sys/ADECNDB  amd64
>Description:
Found the following error message in PostgreSQL logs:

vacuumdb: vacuuming of database "adecndb" failed: ERROR:  could not
write block 209610 of relation 1663/16386/236356665: No space left on
device

This made no sense, since the device is only at 18% usage. I raised it on the pgsql-hackers mailing list (subject "the un-vacuumable table"; the thread starts at http://archives.postgresql.org/pgsql-hackers/2008-06/msg00922.php).

> Have you looked into the machine's kernel log to see if there is any
> evidence of low-level distress (hardware or filesystem level)?  I'm
> wondering if ENOSPC is being reported because it is the closest
> available errno code, but the real problem is something different than
> the error message text suggests.  Other than the errno the symptoms
> all look quite a bit like a bad-sector problem ...

Uhm, just for the record FileWrite returns error messages which get printed
this way for two reasons other than write(2) returning ENOSPC:

1) if FileAccess has to reopen the file then open(2) could return an error. I
don't see how open returns ENOSPC without O_CREAT (and that's cleared for
reopening)

2) If write(2) returns < 0 but doesn't set errno. That also seems like a
strange case that shouldn't happen, but perhaps there's some reason it can.
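
For illustration, a minimal sketch of why the second case matters. It is Python rather than the C of PostgreSQL's FileWrite, and it is not PostgreSQL's code; `checked_write` and its `write` parameter are hypothetical names. It keeps three outcomes apart: a full write, a failure where the kernel supplied an errno, and a short write where no errno was set, so any errno-based message would be a guess.

```python
import errno
import os


def checked_write(fd, data, write=os.write):
    """Write data to fd and classify the outcome without guessing an errno.

    Hypothetical helper for illustration. `write` is injectable so the
    short-write path can be exercised without a failing disk.
    """
    try:
        written = write(fd, data)
    except OSError as exc:
        # The kernel reported a specific errno; pass its name through unchanged.
        return ("error", errno.errorcode.get(exc.errno, "unknown"))
    if written != len(data):
        # Fewer bytes than requested and no errno was set: the cause is
        # unknown, so do not label it ENOSPC (or anything else).
        return ("short", written)
    return ("ok", written)
```

A caller that logs the "short" case as its own message, instead of printing whatever errno text is at hand, avoids reporting "No space left on device" for a problem that may lie elsewhere.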



On Thu, Jul 3, 2008 at 10:57 PM, Andrew Hammond
<andrew.george.hammond@gmail.com> wrote:
> On Thu, Jul 3, 2008 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>How-To-Repeat:

>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:
 >> Have you looked into the machine's kernel log to see if there is any
 >> evidence of low-level distress (hardware or filesystem level)?  I'm
 >> wondering if ENOSPC is being reported because it is the closest
 >> available errno code, but the real problem is something different than
 >> the error message text suggests.  Other than the errno the symptoms
 >> all look quite a bit like a bad-sector problem ...
 
 da1 is the storage device where the PGDATA lives.
 
 Jun 19 03:06:14 db1 kernel: mpt1: request 0xffffffff929ba560:6810 timed out for ccb 0xffffff0000e20000 (req->ccb 0xffffff0000e20000)
 Jun 19 03:06:14 db1 kernel: mpt1: request 0xffffffff929b90c0:6811 timed out for ccb 0xffffff0001081000 (req->ccb 0xffffff0001081000)
 Jun 19 03:06:14 db1 kernel: mpt1: request 0xffffffff929b9f88:6812 timed out for ccb 0xffffff0000d93800 (req->ccb 0xffffff0000d93800)
 Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req 0xffffffff929ba560:6810 function 0
 Jun 19 03:06:14 db1 kernel: mpt1: request 0xffffffff929bcc90:6813 timed out for ccb 0xffffff03e132dc00 (req->ccb 0xffffff03e132dc00)
 Jun 19 03:06:14 db1 kernel: mpt1: completing timedout/aborted req 0xffffffff929ba560:6810
 Jun 19 03:06:14 db1 kernel: mpt1: abort of req 0xffffffff929ba560:0 completed
 Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req 0xffffffff929b90c0:6811 function 0
 Jun 19 03:06:14 db1 kernel: mpt1: completing timedout/aborted req 0xffffffff929b90c0:6811
 Jun 19 03:06:14 db1 kernel: mpt1: abort of req 0xffffffff929b90c0:0 completed
 Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req 0xffffffff929b9f88:6812 function 0
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): WRITE(16). CDB: 8a 0 0 0 0 1 6c 99 9 c0 0 0 0 20 0 0
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): CAM Status: SCSI Status Error
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): SCSI Status: Check Condition
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): UNIT ATTENTION asc:29,0
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Power on, reset, or bus device reset occurred
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Retrying Command (per Sense Data)
 Jun 19 03:06:14 db1 kernel: mpt1: completing timedout/aborted req 0xffffffff929b9f88:6812
 Jun 19 03:06:14 db1 kernel: mpt1: abort of req 0xffffffff929b9f88:0 completed
 Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req 0xffffffff929bcc90:6813 function 0
 Jun 19 03:06:14 db1 kernel: mpt1: completing timedout/aborted req 0xffffffff929bcc90:6813
 Jun 19 03:06:14 db1 kernel: mpt1: abort of req 0xffffffff929bcc90:0 completed
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): WRITE(16). CDB: 8a 0 0 0 0 1 65 1b 71 a0 0 0 0 20 0 0
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): CAM Status: SCSI Status Error
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): SCSI Status: Check Condition
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): UNIT ATTENTION asc:29,0
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Power on, reset, or bus device reset occurred
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Retrying Command (per Sense Data)
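
 The WRITE(16) lines in the log above can be decoded to see which blocks the retried writes targeted. A minimal sketch, assuming the standard SCSI WRITE(16) CDB layout (opcode 0x8a in byte 0, a big-endian 64-bit LBA in bytes 2-9, a big-endian 32-bit transfer length in bytes 10-13). `decode_write16` is a hypothetical helper name, and the log does not state the device's block size, so the results are in blocks, not bytes.

```python
def decode_write16(cdb_text):
    """Decode a WRITE(16) CDB printed as space-separated hex bytes.

    Returns (lba, blocks). Hypothetical helper for illustration only.
    """
    cdb = [int(tok, 16) for tok in cdb_text.split()]
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    # Bytes 2-9: logical block address, most significant byte first.
    lba = int.from_bytes(bytes(cdb[2:10]), "big")
    # Bytes 10-13: number of blocks to transfer, most significant byte first.
    blocks = int.from_bytes(bytes(cdb[10:14]), "big")
    return lba, blocks


for text in ("8a 0 0 0 0 1 6c 99 9 c0 0 0 0 20 0 0",
             "8a 0 0 0 0 1 65 1b 71 a0 0 0 0 20 0 0"):
    lba, blocks = decode_write16(text)
    print(f"LBA {lba:#x}, {blocks} blocks")
```

 For the two CDBs logged above this gives LBAs 0x16c9909c0 and 0x1651b71a0, each with a transfer length of 0x20 (32) blocks.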
 
 Tom Lane writes:
 
 Also, I suggest filing a bug with your kernel distributor --- ENOSPC was
 a totally misleading error code here.  Seems like EIO would be more
 appropriate.  They'll probably want to see the kernel log.
 
                        regards, tom lane
 


