From owner-freebsd-scsi@freebsd.org Tue Jul 7 15:37:25 2015
From: Stephen Mcconnell
References: <20150707132416.71b44c90f7f4cd6014a304b2@yamagi.org>
In-Reply-To: <20150707132416.71b44c90f7f4cd6014a304b2@yamagi.org>
Date: Tue, 7 Jul 2015 09:37:22 -0600
Message-ID: <9426ced85d7def424e106fdefd7448ae@mail.gmail.com>
Subject: RE: Device timeouts(?) with LSI SAS3008 on mpr(4)
To: Yamagi Burmeister, freebsd-scsi@freebsd.org
List-Id: SCSI subsystem

Hi Yamagi,

I see two drives that are having problems. Are there others? Can you try
removing those drives and let me know what happens? To me, it actually looks
like those drives could be faulty.

Steve

> -----Original Message-----
> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of Yamagi Burmeister
> Sent: Tuesday, July 07, 2015 5:24 AM
> To: freebsd-scsi@freebsd.org
> Subject: Device timeouts(?) with LSI SAS3008 on mpr(4)
>
> Hello,
> I've got 3 new Supermicro servers based upon the X10DRi-LN4+ platform.
> Each server is equipped with 2 LSI SAS9300-8i-SQL SAS adapters. Each adapter
> serves 8 Intel DC S3700 SSDs. Operating system is 10.1-STABLE as of r283938 on
> 2 servers and r285196 on the last one.
>
> The controllers identify themselves as:
>
> ----
>
> mpr0: port 0x6000-0x60ff mem 0xc7240000-0xc724ffff,0xc7200000-0xc723ffff irq 32 at device 0.0 on pci2
> mpr0: IOCFacts :
>     MsgVersion: 0x205
>     HeaderVersion: 0x2300
>     IOCNumber: 0
>     IOCExceptions: 0x0
>     MaxChainDepth: 128
>     NumberOfPorts: 1
>     RequestCredit: 10240
>     ProductID: 0x2221
>     IOCRequestFrameSize: 32
>     MaxInitiators: 32
>     MaxTargets: 1024
>     MaxSasExpanders: 42
>     MaxEnclosures: 43
>     HighPriorityCredit: 128
>     MaxReplyDescriptorPostQueueDepth: 65504
>     ReplyFrameSize: 32
>     MaxVolumes: 0
>     MaxDevHandle: 1106
>     MaxPersistentEntries: 128
> mpr0: Firmware: 08.00.00.00, Driver: 09.255.01.00-fbsd
> mpr0: IOCCapabilities: 7a85c ,HostDisc>
>
> ----
>
> 08.00.00.00 is the latest available firmware.
>
>
> Since day one 'dmesg' is cluttered with CAM errors:
>
> ----
>
> mpr1: Sending reset from mprsas_send_abort for target ID 5
> (da11:mpr1:0:5:0): WRITE(10). CDB: 2a 00 4c 15 1f 88 00 00 08 00 length 4096 SMID 554 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mpr1:0:5:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 506 ter(da11:mpr1:0:5:0): READ(10). CDB: 28 00 4c 2b 95 c0 00 00 10 00 minated ioc 804b scsi 0 state c xfer 0
> (da11:mpr1:0:5:0): CAM status: Command timeout
> mpr1: (da11:Unfreezing devq for target ID 5 mpr1:0:5:0): Retrying command
> (da11:mpr1:0:5:0): READ(10). CDB: 28 00 4c 2b 95 c0 00 00 10 00
> (da11:mpr1:0:5:0): CAM status: SCSI Status Error
> (da11:mpr1:0:5:0): SCSI status: Check Condition
> (da11:mpr1:0:5:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da11:mpr1:0:5:0): Retrying command (per sense data)
> (da11:mpr1:0:5:0): READ(10). CDB: 28 00 4c 22 b5 b8 00 00 18 00
> (da11:mpr1:0:5:0): CAM status: SCSI Status Error
> (da11:mpr1:0:5:0): SCSI status: Check Condition
> (da11:mpr1:0:5:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da11:mpr1:0:5:0): Retrying command (per sense data)
> (noperiph:mpr1:0:4294967295:0): SMID 2 Aborting command 0xfffffe0001601a30
>
> mpr1: Sending reset from mprsas_send_abort for target ID 2
> (da8:mpr1:0:2:0): WRITE(10). CDB: 2a 00 59 81 ae 18 00 00 30 00 length 24576 SMID 898 terminated ioc 804b scsi 0 state c xfer 0
> (da8:mpr1:0:2:0): READ(10). CDB: 28 00 59 77 cc e0 00 00 18 00 length 12288 SMID 604 terminated ioc 804b scsi 0 state c xfer 0
> mpr1: Unfreezing devq for target ID 2
> (da8:mpr1:0:2:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
> (da8:mpr1:0:2:0): CAM status: Command timeout
> (da8:mpr1:0:2:0): Retrying command
> (da8:mpr1:0:2:0): WRITE(10). CDB: 2a 00 59 81 ae 18 00 00 30 00
> (da8:mpr1:0:2:0): CAM status: SCSI Status Error
> (da8:mpr1:0:2:0): SCSI status: Check Condition
> (da8:mpr1:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da8:mpr1:0:2:0): Retrying command (per sense data)
> (da8:mpr1:0:2:0): READ(10). CDB: 28 00 59 41 3d 08 00 00 10 00
> (da8:mpr1:0:2:0): CAM status: SCSI Status Error
> (da8:mpr1:0:2:0): SCSI status: Check Condition
> (da8:mpr1:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da8:mpr1:0:2:0): Retrying command (per sense data)
> (noperiph:mpr1:0:4294967295:0): SMID 3 Aborting command 0xfffffe000160b660
>
> ----
>
> ZFS doesn't like this and sees read errors or even write errors. In extreme
> cases the device is marked as FAULTED:
>
> ----
>
>   pool: examplepool
>  state: DEGRADED
> status: One or more devices are faulted in response to persistent errors.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Replace the faulted device, or use 'zpool clear' to mark the device
>         repaired.
>   scan: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         examplepool   DEGRADED     0     0     0
>           raidz1-0    ONLINE       0     0     0
>             da3p1     ONLINE       0     0     0
>             da4p1     ONLINE       0     0     0
>             da5p1     ONLINE       0     0     0
>         logs
>           da1p1       FAULTED      3     0     0  too many errors
>         cache
>           da1p2       FAULTED      3     0     0  too many errors
>         spares
>           da2p1       AVAIL
>
> errors: No known data errors
>
> ----
>
> The problems arise on all 3 machines and all SSDs nearly daily, so I strongly
> suspect a software issue. Does anyone have an idea what's going on and what I
> can do to solve these problems? More information can be provided if necessary.
>
> Regards,
> Yamagi
>
> --
> Homepage: www.yamagi.org
> XMPP: yamagi@yamagi.org
> GnuPG/GPG: 0xEFBCCBCB
> _______________________________________________
> freebsd-scsi@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
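
To answer the "are there others?" question across a longer dmesg, it may help to tally which devices actually hit command timeouts and controller resets. The following is only a rough sketch in Python and is not part of the original thread. It assumes the message formats seen in the excerpt above; `SAMPLE`, `summarize` and `decode_rw10` are hypothetical names made up for illustration. The CDB decoder assumes the standard 10-byte READ/WRITE layout (LBA in bytes 2-5, block count in bytes 7-8, big-endian).

```python
import re
from collections import Counter

# A few lines copied from the quoted dmesg excerpt (hypothetical test input).
SAMPLE = """\
mpr1: Sending reset from mprsas_send_abort for target ID 5
(da11:mpr1:0:5:0): WRITE(10). CDB: 2a 00 4c 15 1f 88 00 00 08 00 length 4096 SMID 554 terminated ioc 804b scsi 0 state c xfer 0
(da11:mpr1:0:5:0): CAM status: Command timeout
mpr1: Sending reset from mprsas_send_abort for target ID 2
(da8:mpr1:0:2:0): READ(10). CDB: 28 00 59 77 cc e0 00 00 18 00 length 12288 SMID 604 terminated ioc 804b scsi 0 state c xfer 0
(da8:mpr1:0:2:0): CAM status: Command timeout
"""

# Patterns are assumptions based on the message formats shown in the excerpt.
TIMEOUT_RE = re.compile(r"\((da\d+):mpr\d+:\d+:\d+:\d+\): CAM status: Command timeout")
RESET_RE = re.compile(r"(mpr\d+): Sending reset from mprsas_send_abort for target ID (\d+)")


def summarize(log):
    """Count command timeouts per da device and resets per (controller, target)."""
    timeouts = Counter(m.group(1) for m in TIMEOUT_RE.finditer(log))
    resets = Counter((m.group(1), int(m.group(2))) for m in RESET_RE.finditer(log))
    return timeouts, resets


def decode_rw10(cdb_hex):
    """Return (lba, block_count) from a 10-byte READ(10)/WRITE(10) CDB hex string."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # transfer length in blocks
    return lba, blocks


if __name__ == "__main__":
    timeouts, resets = summarize(SAMPLE)
    for dev, count in sorted(timeouts.items()):
        print(dev, "timeouts:", count)
    for (ctrl, target), count in sorted(resets.items()):
        print(ctrl, "target", target, "resets:", count)
```

Feeding the full dmesg text to `summarize` instead of `SAMPLE` would show whether the timeouts stay confined to da8 and da11 or spread over all targets. Because the kernel console interleaves messages, the counts are best treated as approximate.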