Subject: Re: ZFS stalled after some mirror disks were lost
To: freebsd-fs@freebsd.org
From: Steven Hartland
Date: Mon, 2 Oct 2017 20:47:09 +0100

On 02/10/2017 20:10, Ben RUBSON wrote:
>> On 02 Oct 2017, at 20:41, Steven Hartland wrote:
>>
>> I'm guessing that the devices haven't disconnected cleanly, so they are just stalling all requests to them and hence to the pool.
> I even tried to ifconfig down the network interface serving the iscsi targets, but it did not help.
>
>> I'm not that familiar with iscsi. Does it still show under camcontrol or geom?
> # geom disk list
> (...)
> Geom name: da13
> Providers:
> 1. Name: da13
>    Mediasize: 3999688294912 (3.6T)
>    Sectorsize: 512
>    Mode: r1w1e2
>    wither: (null)
>
> Geom name: da15
> Providers:
> 1. Name: da15
>    Mediasize: 3999688294912 (3.6T)
>    Sectorsize: 512
>    Mode: r1w1e2
>    wither: (null)
>
> Geom name: da16
> Providers:
> 1. Name: da16
>    Mediasize: 3999688294912 (3.6T)
>    Sectorsize: 512
>    Mode: r1w1e2
>    wither: (null)
>
> Geom name: da19
> Providers:
> 1. Name: da19
>    Mediasize: 3999688294912 (3.6T)
>    Sectorsize: 512
>    Mode: r1w1e2
>    wither: (null)
>
> # camcontrol devlist
> // does not show the above disks
So these daXX devices represent your iscsi devices? If so, it looks like your problem is at the iscsi layer: they haven't disconnected properly, so as far as ZFS is concerned it's still waiting for them.
>
>> Does iscsid have any options on how to treat failed devices?
> iSCSI has some tuning regarding how to treat failing devices, and I have set it:
> kern.iscsi.ping_timeout=5
> kern.iscsi.iscsid_timeout=5
> kern.iscsi.login_timeout=85
> kern.iscsi.fail_on_disconnection=1
>
> However, as I disconnected the targets from the server hosting the zpool,
> they should not have been needed.

Regards
Steve
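The check Ben did by eye (disks still listed by `geom disk list` but missing from `camcontrol devlist`) can be scripted. What follows is a minimal sketch, assuming the plain-text `geom disk list` format shown in the quoted output and assuming each `camcontrol devlist` line ends with a parenthesised, comma-separated peripheral list such as `(pass0,da0)`. The helper names are hypothetical and are not part of any FreeBSD tool.

```python
from __future__ import annotations

import re


def geom_disk_names(geom_output: str) -> set[str]:
    """Collect disk names from `geom disk list` text ("Geom name: daNN" lines)."""
    return set(re.findall(r"^Geom name:\s*(\S+)", geom_output, re.MULTILINE))


def cam_device_names(devlist_output: str) -> set[str]:
    """Collect peripheral names from `camcontrol devlist` text.

    Assumption: every device line ends with a list like "(pass0,da0)".
    """
    names: set[str] = set()
    for match in re.finditer(r"\(([^)]*)\)\s*$", devlist_output, re.MULTILINE):
        names.update(part.strip() for part in match.group(1).split(","))
    return names


def stale_disks(geom_output: str, devlist_output: str) -> list[str]:
    """Return disks GEOM still knows about but CAM no longer lists, sorted."""
    return sorted(geom_disk_names(geom_output) - cam_device_names(devlist_output))


if __name__ == "__main__":
    # Hypothetical sample input: da13 is known to GEOM only, da0 to both.
    geom_sample = (
        "Geom name: da13\n"
        "Providers:\n"
        "1. Name: da13\n"
        "Geom name: da0\n"
        "Providers:\n"
        "1. Name: da0\n"
    )
    cam_sample = "<ATA DISK 1.0>  at scbus0 target 0 lun 0 (pass0,da0)\n"
    print(stale_disks(geom_sample, cam_sample))  # ['da13']
```

Fed the real output from Ben's machine, this would be expected to flag da13, da15, da16 and da19, since the quoted `camcontrol devlist` does not show any of them.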