Date:      Tue, 30 Oct 2007 20:25:37 GMT
From:      Matt Lehner <matt@aim2game.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   misc/117688: mpt disk timeout and hang
Message-ID:  <200710302025.l9UKPbgB030431@www.freebsd.org>
Resent-Message-ID: <200710302030.l9UKU1V4002041@freefall.freebsd.org>

>Number:         117688
>Category:       misc
>Synopsis:       mpt disk timeout and hang
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Oct 30 20:30:00 UTC 2007
>Closed-Date:
>Last-Modified:
>Originator:     Matt Lehner
>Release:        7.0-BETA1
>Organization:
>Environment:
FreeBSD vault.buffalo.rr.com 7.0-BETA1 FreeBSD 7.0-BETA1 #0: Mon Oct 22 07:41:02 UTC 2007     root@vault.buffalo.rr.com:/usr/obj/usr/src/sys/VAULT  amd64
>Description:
I installed FreeBSD 7 so I could take advantage of the ZFS support. While testing ZFS, I came across an issue with the mpt(4) driver. After an extended period of moderate to heavy load on the disks, I would get the following errors in dmesg. Here, moderate to heavy disk load means ~50-70MB/s with bursts to 86MB/s and 600 ops/s per disk, according to gstat.

Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6f150:13878 timed out for ccb 0xffffff0001a15000 (req->ccb 0xffffff0001a15000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e75450:13879 timed out for ccb 0xffffff0001a10000 (req->ccb 0xffffff0001a10000)
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 function 0
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6ea00:13880 timed out for ccb 0xffffff0001998400 (req->ccb 0xffffff0001998400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e70740:13881 timed out for ccb 0xffffff0001395400 (req->ccb 0xffffff0001395400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e69ab0:13886 timed out for ccb 0xffffff000157dc00 (req->ccb 0xffffff000157dc00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e762f0:13887 timed out for ccb 0xffffff0001982400 (req->ccb 0xffffff0001982400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6b520:13888 timed out for ccb 0xffffff000198ec00 (req->ccb 0xffffff000198ec00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e7a820:13889 timed out for ccb 0xffffff00019bf000 (req->ccb 0xffffff00019bf000)
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 function 0
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6dda0:13890 timed out for ccb 0xffffff0001983400 (req->ccb 0xffffff0001983400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6df50:13891 timed out for ccb 0xffffff00019be000 (req->ccb 0xffffff00019be000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6b9a0:13892 timed out for ccb 0xffffff00018c4400 (req->ccb 0xffffff00018c4400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e72a20:13893 timed out for ccb 0xffffff0001a10800 (req->ccb 0xffffff0001a10800)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e696c0:13894 timed out for ccb 0xffffff000197ec00 (req->ccb 0xffffff000197ec00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e74d90:13895 timed out for ccb 0xffffff00018c4000 (req->ccb 0xffffff00018c4000)
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 function 0
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e78e40:13904 timed out for ccb 0xffffff0001a0f000 (req->ccb 0xffffff0001a0f000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6f8a0:13905 timed out for ccb 0xffffff0001a0ac00 (req->ccb 0xffffff0001a0ac00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e75f00:13906 timed out for ccb 0xffffff000194e000 (req->ccb 0xffffff000194e000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e772b0:13907 timed out for ccb 0xffffff0001984000 (req->ccb 0xffffff0001984000)
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 function 0
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 function 0
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 function 0
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed

The last two lines would continue to repeat (indefinitely, I assume) until I power-cycled the machine. When the server came back online, ZFS functioned fine and reported no checksum or other errors. I ran a scrub and again found no problems. But if I put enough load on the disks for an extended period, it would crash again with the same errors. There doesn't appear to be a particular length of time or exact combination of factors that triggers the errors; sometimes they occur much more quickly than at other times. While the errors were scrolling up the screen, one disk or the other, or both, would have its activity light on steady.
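
As a rough aid for reading a log like the one above, the following Python sketch tallies which request serials timed out and how often an abort was attempted and completed for each. The regular expressions are written against the message formats quoted in this report only; the function name summarize and the returned dictionary keys are illustrative choices, not part of mpt(4) or any FreeBSD tool.

```python
import re
from collections import Counter

# Patterns follow the three mpt(4) message shapes quoted in this report.
TIMEOUT_RE = re.compile(
    r"mpt\d+: request (0x[0-9a-f]+):(\d+) timed out for ccb (0x[0-9a-f]+)")
ABORT_TRY_RE = re.compile(
    r"mpt\d+: attempting to abort req (0x[0-9a-f]+):(\d+)")
ABORT_DONE_RE = re.compile(
    r"mpt\d+: abort of req (0x[0-9a-f]+):(\d+) completed")


def summarize(lines):
    """Summarize mpt timeout/abort messages by request serial number."""
    timed_out = set()          # serials that reported a timeout
    attempts = Counter()       # serial -> number of abort attempts
    completions = Counter()    # serial -> number of completed aborts
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timed_out.add(int(m.group(2)))
            continue
        m = ABORT_TRY_RE.search(line)
        if m:
            attempts[int(m.group(2))] += 1
            continue
        m = ABORT_DONE_RE.search(line)
        if m:
            completions[int(m.group(2))] += 1
    return {
        "timed_out": sorted(timed_out),
        "abort_attempts": dict(attempts),
        "abort_completions": dict(completions),
        # Serials that timed out but never had an abort attempted.
        "never_aborted": sorted(timed_out - set(attempts)),
    }


if __name__ == "__main__":
    import sys
    # Usage: python this_script.py < /var/log/messages
    print(summarize(sys.stdin))
```

Applied to the excerpt above, a summary of this kind makes one pattern easy to see: every abort attempt and completion refers to request 13878, while the other timed-out requests never appear in an abort message.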

Currently the machine boots over the network (using pxeboot) from another machine. The disks in the ZFS pool are the only physical disks it has, so the system itself does not lock up while this is happening.

Motherboard: Tyan Tiger i7501 S2723
CPU: Dual Opteron 244
Controller: LSI SAS3041X-R
Hard drives: 2x 1TB Hitachi Deskstar

vault# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
tank           824G  89.6G    18K  /tank
tank/storage   824G  89.6G   824G  /storage
vault#

vault# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0

errors: No known data errors
vault#

mpt0: <LSILogic SAS/SATA Adapter> port 0x8800-0x88ff mem 0xfc2fc000-0xfc2fffff,0xfc2e0000-0xfc2effff irq 28 at device 3.0 on pci1
mpt0: [ITHREAD]
mpt0: MPI Version=1.5.10.0
>How-To-Repeat:
Run sustained, heavy I/O over an mpt(4) device for an extended period.
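
One way to generate that kind of sustained load is sketched below in Python. This is an assumption about what "a lot of IO" could look like, not the workload actually used in this report, and it is not known to trigger the hang. The function name io_load, its parameters, and the file name in the usage note are hypothetical.

```python
import os


def io_load(path, total_bytes, block_size=1 << 20, passes=1):
    """Repeatedly write a file, sync it to disk, and read it back.

    Returns (bytes_written, bytes_read) summed over all passes.
    """
    block = os.urandom(block_size)  # incompressible data, reused per write
    written_total = 0
    read_total = 0
    for _ in range(passes):
        # Write phase: fill the file, then force it out to the device.
        with open(path, "wb") as f:
            remaining = total_bytes
            while remaining > 0:
                chunk = block[:min(block_size, remaining)]
                f.write(chunk)
                remaining -= len(chunk)
                written_total += len(chunk)
            f.flush()
            os.fsync(f.fileno())
        # Read phase: stream the file back in block-sized pieces.
        with open(path, "rb") as f:
            while True:
                data = f.read(block_size)
                if not data:
                    break
                read_total += len(data)
    return written_total, read_total
```

For example, io_load("/storage/loadtest.bin", 8 * 2**30, passes=100) would rewrite and reread an 8 GiB file 100 times on the pool's mountpoint. Reads may be served from cache rather than from the disks, so the file should be larger than available memory if read load on the devices matters; gstat can confirm the load the disks actually see.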
>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:


