From owner-freebsd-bugs@FreeBSD.ORG Mon Jul 7 22:00:02 2008
Return-Path:
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2B39E1065679
	for ; Mon, 7 Jul 2008 22:00:02 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id F363D8FC18
	for ; Mon, 7 Jul 2008 22:00:01 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m67M01qO026648
	for ; Mon, 7 Jul 2008 22:00:01 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m67M01Rt026647;
	Mon, 7 Jul 2008 22:00:01 GMT
	(envelope-from gnats)
Resent-Date: Mon, 7 Jul 2008 22:00:01 GMT
Resent-Message-Id: <200807072200.m67M01Rt026647@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Andrew Hammond
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 58AD0106566C
	for ; Mon, 7 Jul 2008 21:59:14 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 475A78FC19
	for ; Mon, 7 Jul 2008 21:59:14 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.2/8.14.2) with ESMTP id m67LxEXO002482
	for ; Mon, 7 Jul 2008 21:59:14 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.2/8.14.1/Submit) id m67LxDnd002481;
	Mon, 7 Jul 2008 21:59:13 GMT
	(envelope-from nobody)
Message-Id: <200807072159.m67LxDnd002481@www.freebsd.org>
Date: Mon, 7 Jul 2008 21:59:13 GMT
From: Andrew Hammond
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: kern/125382: ENOSPC may be misleading, consider EIO
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 07 Jul 2008 22:00:02 -0000

>Number:         125382
>Category:       kern
>Synopsis:       ENOSPC may be misleading, consider EIO
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jul 07 22:00:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Andrew Hammond
>Release:        6.2 amd64
>Organization:
AdECN, a Microsoft Company
>Environment:
FreeBSD db1.sjc.adecn.com 6.2-RELEASE-p6 FreeBSD 6.2-RELEASE-p6 #1: Thu Jul 19 09:21:10 PDT 2007 root@qaipc1.qa1.adecn.com:/usr/obj/usr/src/sys/ADECNDB amd64
>Description:
Found the following error message in PostgreSQL logs:

vacuumdb: vacuuming of database "adecndb" failed: ERROR: could not write block 209610 of relation 1663/16386/236356665: No space left on device

Didn't make sense since device is only at 18% usage. Got on pgsql-hackers mailing list (subject "the un-vacuumable table", thread starts at http://archives.postgresql.org/pgsql-hackers/2008-06/msg00922.php).

> Have you looked into the machine's kernel log to see if there is any
> evidence of low-level distress (hardware or filesystem level)? I'm
> wondering if ENOSPC is being reported because it is the closest
> available errno code, but the real problem is something different than
> the error message text suggests. Other than the errno the symptoms
> all look quite a bit like a bad-sector problem ...

Uhm, just for the record FileWrite returns error messages which get printed this way for two reasons other than write(2) returning ENOSPC:

1) if FileAccess has to reopen the file then open(2) could return an error. I don't see how open returns ENOSPC without O_CREAT (and that's cleared for reopening)

2) If write(2) returns < 0 but doesn't set errno. That also seems like a strange case that shouldn't happen, but perhaps there's some reason it can.

On Thu, Jul 3, 2008 at 10:57 PM, Andrew Hammond wrote:
> On Thu, Jul 3, 2008 at 3:47 PM, Tom Lane wrote:
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
>> Have you looked into the machine's kernel log to see if there is any
>> evidence of low-level distress (hardware or filesystem level)? I'm
>> wondering if ENOSPC is being reported because it is the closest
>> available errno code, but the real problem is something different than
>> the error message text suggests. Other than the errno the symptoms
>> all look quite a bit like a bad-sector problem ...

da1 is the storage device where the PGDATA lives.

Jun 19 03:06:14 db1 kernel: mpt1: request 0xffffffff929ba560:6810 timed out for ccb 0xffffff0000e20000 (req->ccb 0xffffff0000e20000)
Jun 19 03:06:14 db1 kernel: mpt1: request 0xffffffff929b90c0:6811 timed out for ccb 0xffffff0001081000 (req->ccb 0xffffff0001081000)
Jun 19 03:06:14 db1 kernel: mpt1: request 0xffffffff929b9f88:6812 timed out for ccb 0xffffff0000d93800 (req->ccb 0xffffff0000d93800)
Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req 0xffffffff929ba560:6810 function 0
Jun 19 03:06:14 db1 kernel: mpt1: request 0xffffffff929bcc90:6813 timed out for ccb 0xffffff03e132dc00 (req->ccb 0xffffff03e132dc00)
Jun 19 03:06:14 db1 kernel: mpt1: completing timedout/aborted req 0xffffffff929ba560:6810
Jun 19 03:06:14 db1 kernel: mpt1: abort of req 0xffffffff929ba560:0 completed
Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req 0xffffffff929b90c0:6811 function 0
Jun 19 03:06:14 db1 kernel: mpt1: completing timedout/aborted req 0xffffffff929b90c0:6811
Jun 19 03:06:14 db1 kernel: mpt1: abort of req 0xffffffff929b90c0:0 completed
Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req 0xffffffff929b9f88:6812 function 0
Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): WRITE(16). CDB: 8a 0 0 0 0 1 6c 99 9 c0 0 0 0 20 0 0
Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): CAM Status: SCSI Status Error
Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): SCSI Status: Check Condition
Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): UNIT ATTENTION asc:29,0
Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Power on, reset, or bus device reset occurred
Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Retrying Command (per Sense Data)
Jun 19 03:06:14 db1 kernel: mpt1: completing timedout/aborted req 0xffffffff929b9f88:6812
Jun 19 03:06:14 db1 kernel: mpt1: abort of req 0xffffffff929b9f88:0 completed
Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req 0xffffffff929bcc90:6813 function 0
Jun 19 03:06:14 db1 kernel: mpt1: completing timedout/aborted req 0xffffffff929bcc90:6813
Jun 19 03:06:14 db1 kernel: mpt1: abort of req 0xffffffff929bcc90:0 completed
Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): WRITE(16). CDB: 8a 0 0 0 0 1 65 1b 71 a0 0 0 0 20 0 0
Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): CAM Status: SCSI Status Error
Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): SCSI Status: Check Condition
Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): UNIT ATTENTION asc:29,0
Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Power on, reset, or bus device reset occurred
Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Retrying Command (per Sense Data)

Tom Lane writes:

Also, I suggest filing a bug with your kernel distributor --- ENOSPC was a totally misleading error code here. Seems like EIO would be more appropriate. They'll probably want to see the kernel log.

regards, tom lane
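
For anyone wanting to reproduce the sanity check described in the report (ENOSPC on a device that is only 18% full), here is a minimal Python sketch. It is not PostgreSQL or FreeBSD code; the function names and the `needed` parameter are hypothetical and exist only for illustration. It cross-checks an ENOSPC error against the free space the filesystem reports and, if space is available, points at the kernel log instead.

```python
import errno
import os


def free_bytes(path):
    """Bytes available to unprivileged writers on the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def enospc_is_plausible(available, needed):
    """True if a write of `needed` bytes could really have run out of space."""
    return available < needed


def describe_write_error(exc, path, needed):
    """Annotate an OSError from a failed write with a free-space cross-check.

    `needed` is the size of the write that failed (hypothetical parameter).
    """
    if exc.errno != errno.ENOSPC:
        # Any other errno is passed through with its standard message.
        return os.strerror(exc.errno)
    if enospc_is_plausible(free_bytes(path), needed):
        return "ENOSPC: filesystem is short on space"
    return ("ENOSPC reported, but space is available; "
            "check the kernel log for I/O errors")
```

Called as `describe_write_error(exc, "/path/to/pgdata", 8192)` (8192 bytes being PostgreSQL's default block size; the path is a placeholder), this would have flagged the error above as suspicious. The check is only a heuristic: it looks at free blocks alone, so it would not catch other legitimate causes of ENOSPC such as inode exhaustion.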