From owner-freebsd-scsi@freebsd.org Wed Jul 4 10:28:33 2018
Subject: Re: problems with SAS JBODs 2
To: "Kenneth D. Merry"
Cc: freebsd-scsi@freebsd.org, slm@freebsd.org
References: <20180703142629.GF26046@mithlond.kdm.org>
From: Oliver Sech
Date: Wed, 4 Jul 2018 12:28:29 +0200
In-Reply-To: <20180703142629.GF26046@mithlond.kdm.org>
List-Id: SCSI subsystem

> The most likely issue is that the mapping table stored on the card is messed
> up. Can you send dmesg output with the following loader tunable set:
>
> hw.mpr.debug_level=0x203
>
> That will turn on debugging for the mapping code and may show the problem.
>
> If you see messages like this:
>
> mpr0: Attempting to reuse target id 63 handle 0x000b
> mpr0: Attempting to reuse target id 64 handle 0x000c
> mpr0: Attempting to reuse target id 65 handle 0x000d
> mpr0: Attempting to reuse target id 66 handle 0x000e
> mpr0: Attempting to reuse target id 67 handle 0x000f
> mpr0: Attempting to reuse target id 68 handle 0x0010
> mpr0: Attempting to reuse target id 69 handle 0x0011
> mpr0: Attempting to reuse target id 70 handle 0x0012
> mpr0: Attempting to reuse target id 66 handle 0x000e
>
> It indicates that the mapping code is preventing some of the drives from
> fully probing because there are collisions in the table.
>
> Unfortunately we have not yet fixed the problem in the other situation.
> (He is running with multipathing, which could be contributing to the
> problem.)
>
> I have a script and utility that will clear the mapping table in the card,
> but that hasn't been enough to fix the other situation.
> If you do have a mapping problem, I can give you the script/utility to
> clear the table and we can see whether it fixes your problem.
>
> If not, it'll probably have to wait until Steve gets back from vacation.
>
> Ken

I added the "hw.mpr.debug_level" tunable and collected logs for the whole
connect -> disconnect -> connect problem.

Logs collected:

first connect log:
https://paste.docker.ist.ac.at/?6ec80dde0e1f236f#NufbXSs6o+dTDTPgZgWbU8vRQ6B47tMbQ8LHPkMXfIg=

first connect sesutil:
https://paste.docker.ist.ac.at/?256810338f87adc1#/N3m6iFH304SxSxpnHCt0ocOeAU8zkBennul2/BcKpQ=

disconnected shelf log:
https://paste.docker.ist.ac.at/?07ff1129a6cb6117#8WH8AjO1sO2hZlHE39h314CoQxxFZmBVZNo+Q8+qp4Q=

disconnected shelf mprutil:
https://paste.docker.ist.ac.at/?eebaee72dc9e1cfe#WTlnO5vlPb7997lJCMswWfwtcq1rN04CaFbxmMWHqrU=

second connect log:
https://paste.docker.ist.ac.at/?684ff32c6dae185b#nZ32x023ApRvNKrVUhvCr7xi5cYJnPhs9XNTfEW6sMw=

second connect sesutil:
https://paste.docker.ist.ac.at/?f0302ce3aa8e55d7#+ZaJsCUiLh/7VsqBJ5oPHxZtRbM1dVS2RankrXePikw=

second connect mprutil:
https://paste.docker.ist.ac.at/?4b8d347aed941c1f#wX7y0cjtb2gYKLU99IIftmDcFpKiV2QqjcC7YN96nB0=

If you are interested in investigating this further, I can try to organize a
"test environment", as I'm pretty sure this issue is not limited to my
hardware.

best regards,
Oliver