From owner-freebsd-bugs@FreeBSD.ORG Tue Oct 30 20:30:01 2007
Resent-Date: Tue, 30 Oct 2007 20:30:01 GMT
Resent-Message-Id: <200710302030.l9UKU1V4002041@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Matt Lehner
Message-Id: <200710302025.l9UKPbgB030431@www.freebsd.org>
Date: Tue, 30 Oct 2007 20:25:37 GMT
From: Matt Lehner
To: freebsd-gnats-submit@FreeBSD.org
Subject: misc/117688: mpt disk timeout and hang
X-Send-Pr-Version: www-3.1
List-Id: Bug reports

>Number:         117688
>Category:       misc
>Synopsis:       mpt disk timeout and hang
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Oct 30 20:30:00 UTC 2007
>Closed-Date:
>Last-Modified:
>Originator:     Matt Lehner
>Release:        7.0-BETA1
>Organization:
>Environment:
FreeBSD vault.buffalo.rr.com 7.0-BETA1 FreeBSD 7.0-BETA1 #0: Mon Oct 22 07:41:02 UTC 2007 root@vault.buffalo.rr.com:/usr/obj/usr/src/sys/VAULT amd64
>Description:
I installed FreeBSD 7 to take advantage of its ZFS support. While testing ZFS, I came across an issue with the mpt(4) driver. After an extended period of moderate to heavy load on the disks, I would get the following errors in dmesg. Moderate to heavy load here means roughly 50-70 MB/s with bursts to 86 MB/s and 600 ops/s per disk, according to gstat.

Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6f150:13878 timed out for ccb 0xffffff0001a15000 (req->ccb 0xffffff0001a15000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e75450:13879 timed out for ccb 0xffffff0001a10000 (req->ccb 0xffffff0001a10000)
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 function 0
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6ea00:13880 timed out for ccb 0xffffff0001998400 (req->ccb 0xffffff0001998400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e70740:13881 timed out for ccb 0xffffff0001395400 (req->ccb 0xffffff0001395400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e69ab0:13886 timed out for ccb 0xffffff000157dc00 (req->ccb 0xffffff000157dc00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e762f0:13887 timed out for ccb 0xffffff0001982400 (req->ccb 0xffffff0001982400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6b520:13888 timed out for ccb 0xffffff000198ec00 (req->ccb 0xffffff000198ec00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e7a820:13889 timed out for ccb 0xffffff00019bf000 (req->ccb 0xffffff00019bf000)
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 function 0
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6dda0:13890 timed out for ccb 0xffffff0001983400 (req->ccb 0xffffff0001983400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6df50:13891 timed out for ccb 0xffffff00019be000 (req->ccb 0xffffff00019be000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6b9a0:13892 timed out for ccb 0xffffff00018c4400 (req->ccb 0xffffff00018c4400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e72a20:13893 timed out for ccb 0xffffff0001a10800 (req->ccb 0xffffff0001a10800)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e696c0:13894 timed out for ccb 0xffffff000197ec00 (req->ccb 0xffffff000197ec00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e74d90:13895 timed out for ccb 0xffffff00018c4000 (req->ccb 0xffffff00018c4000)
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 function 0
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e78e40:13904 timed out for ccb 0xffffff0001a0f000 (req->ccb 0xffffff0001a0f000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6f8a0:13905 timed out for ccb 0xffffff0001a0ac00 (req->ccb 0xffffff0001a0ac00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e75f00:13906 timed out for ccb 0xffffff000194e000 (req->ccb 0xffffff000194e000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e772b0:13907 timed out for ccb 0xffffff0001984000 (req->ccb 0xffffff0001984000)
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 function 0
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 function 0
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 function 0
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed

The last two lines would continue to repeat (indefinitely, I would assume) until I had to power-cycle the machine. When the server came back online, ZFS functioned fine and reported no checksum errors or anything else. I ran a scrub and again found no problems. But if I put enough load on the disks for an extended period of time, it would crash again with the same errors.

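A long dmesg dump like the one above is easier to reason about when it is summarized. In the excerpt, 18 distinct requests time out, yet all six abort attempts (and all six completions) refer to the same request, 13878. The following is a minimal Python sketch for producing that kind of summary. The regular expressions are derived only from the three message shapes quoted above, and the function name `summarize` and the `SAMPLE` list are hypothetical, introduced here for illustration; this is not part of the mpt(4) driver or of any FreeBSD tool.

```python
import re
from collections import Counter

# Patterns for the three kinds of mpt(4) messages quoted in this report.
# They are inferred from the log excerpt only (an assumption, not a spec).
TIMEOUT_RE = re.compile(r"mpt\d+: request (0x[0-9a-f]+):(\d+) timed out for ccb")
ABORT_TRY_RE = re.compile(r"mpt\d+: attempting to abort req (0x[0-9a-f]+):(\d+)")
ABORT_DONE_RE = re.compile(r"mpt\d+: abort of req (0x[0-9a-f]+):(\d+) completed")


def summarize(lines):
    """Count timed-out requests and abort activity, keyed by request serial."""
    timed_out = set()
    attempts = Counter()
    completions = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timed_out.add(int(m.group(2)))
            continue
        m = ABORT_TRY_RE.search(line)
        if m:
            attempts[int(m.group(2))] += 1
            continue
        m = ABORT_DONE_RE.search(line)
        if m:
            completions[int(m.group(2))] += 1
    return {
        "timed_out": len(timed_out),
        "abort_attempts": dict(attempts),
        "abort_completions": dict(completions),
        # Requests that timed out but were never the target of an abort.
        "never_aborted": sorted(timed_out - set(attempts)),
    }


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6f150:13878 timed out for ccb 0xffffff0001a15000 (req->ccb 0xffffff0001a15000)",
    "Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e75450:13879 timed out for ccb 0xffffff0001a10000 (req->ccb 0xffffff0001a10000)",
    "Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 function 0",
    "Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed",
]

print(summarize(SAMPLE))
```

Feeding the full log through `summarize` (for example, the lines of a saved dmesg file) lists every timed-out request that never received an abort in `never_aborted`, which may help when comparing runs.
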
There doesn't appear to be a particular length of time or exact combination of factors that triggers the errors; sometimes they occur much sooner than at other times. While the errors were scrolling up the screen, one disk or the other, or both, would have its activity light on steady.

Currently the machine boots over the network (using pxeboot) from another machine. The disks in the ZFS array are the only physical disks it has, so while this is happening the system itself does not lock up.

Motherboard: Tyan Tiger i7501 S2723
CPU: Dual Opteron 244
Controller: LSI SAS3041X-R
Hard drives: 2x 1TB Hitachi Deskstars

vault# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
tank           824G  89.6G    18K  /tank
tank/storage   824G  89.6G   824G  /storage
vault#
vault# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0

errors: No known data errors
vault#

mpt0: port 0x8800-0x88ff mem 0xfc2fc000-0xfc2fffff,0xfc2e0000-0xfc2effff irq 28 at device 3.0 on pci1
mpt0: [ITHREAD]
mpt0: MPI Version=1.5.10.0
>How-To-Repeat:
Do a lot of I/O over an mpt(4) device for an extended period.
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
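The How-To-Repeat step above does not say how the load was generated. As one possible way to produce sustained I/O, here is a minimal Python sketch that repeatedly writes, syncs, and reads back a scratch file. The function name `sustained_io`, the scratch file name, and all default sizes are assumptions made for illustration, not the workload used in this report. Reads may also be served from the ZFS cache rather than the disks, so the write side is the more reliable source of disk load.

```python
import os
import time


def sustained_io(directory, file_size=64 * 1024 * 1024, duration=3600.0,
                 chunk_size=1024 * 1024):
    """Write, fsync, and read back a scratch file until `duration` seconds pass.

    At least one full pass is always made. Returns the total number of bytes
    written plus bytes read. If file_size is not a multiple of chunk_size,
    the last write rounds the file up to the next whole chunk.
    """
    path = os.path.join(directory, "mpt_load.tmp")  # hypothetical file name
    chunk = os.urandom(chunk_size)
    total = 0
    deadline = time.monotonic() + duration
    try:
        while True:
            # Write phase: push the data to stable storage with fsync.
            with open(path, "wb") as f:
                written = 0
                while written < file_size:
                    f.write(chunk)
                    written += len(chunk)
                f.flush()
                os.fsync(f.fileno())
            total += written
            # Read phase: stream the file back in chunks.
            with open(path, "rb") as f:
                while True:
                    data = f.read(chunk_size)
                    if not data:
                        break
                    total += len(data)
            if time.monotonic() >= deadline:
                break
    finally:
        # Always clean up the scratch file, even on error or interrupt.
        if os.path.exists(path):
            os.remove(path)
    return total
```

On the machine described here, a call such as `sustained_io("/storage", duration=6 * 3600)` would target the ZFS mountpoint shown in the `zfs list` output; watching gstat alongside it would show whether the load approaches the rates mentioned in the description.
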