From owner-freebsd-fs@FreeBSD.ORG Sun Jun 21 08:21:04 2015 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 82CAF395 for ; Sun, 21 Jun 2015 08:21:04 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6CFF1D8B for ; Sun, 21 Jun 2015 08:21:04 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id t5L8L4MM012752 for ; Sun, 21 Jun 2015 08:21:04 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 198242] [zfs] L2ARC degraded. Checksum errors, I/O errors Date: Sun, 21 Jun 2015 08:21:01 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.1-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: avg@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: avg@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Jun 2015 08:21:04 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198242 Andriy Gapon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-fs@FreeBSD.org |avg@FreeBSD.org -- You are receiving this mail because: You are the assignee for the bug. 
From owner-freebsd-fs@FreeBSD.ORG Sun Jun 21 14:01:28 2015 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C8E52DDE for ; Sun, 21 Jun 2015 14:01:28 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (smtp.digiware.nl [31.223.170.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6DB9D8B3 for ; Sun, 21 Jun 2015 14:01:28 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (unknown [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id E453516A407; Sun, 21 Jun 2015 16:01:23 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id oMI3wgyVHyb9; Sun, 21 Jun 2015 16:00:53 +0200 (CEST) Received: from [IPv6:2001:4cb8:3:1:ccdf:1bc4:d42f:fddb] (unknown [IPv6:2001:4cb8:3:1:ccdf:1bc4:d42f:fddb]) by smtp.digiware.nl (Postfix) with ESMTPA id AF83016A402; Sun, 21 Jun 2015 16:00:53 +0200 (CEST) Message-ID: <5586C396.9010100@digiware.nl> Date: Sun, 21 Jun 2015 16:00:54 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Daryl Richards , freebsd-fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS References: <5585767B.4000206@digiware.nl> <558590BD.40603@isletech.net> In-Reply-To: <558590BD.40603@isletech.net> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Jun 2015 14:01:28 -0000 On 20/06/2015 18:11, Daryl Richards wrote: > Check the failmode setting on your pool. From man zpool: > > failmode=wait | continue | panic > > Controls the system behavior in the event of catastrophic > pool failure. This condition is typically a > result of a loss of connectivity to the underlying storage > device(s) or a failure of all devices within > the pool. The behavior of such an event is determined as > follows: > > wait Blocks all I/O access until the device > connectivity is recovered and the errors are cleared. > This is the default behavior. > > continue Returns EIO to any new write I/O requests but > allows reads to any of the remaining healthy > devices. Any write requests that have yet to be > committed to disk would be blocked. > > panic Prints out a message to the console and generates > a system crash dump. 'mmm Did not know about this setting. Nice one, but alas my current setting is: zfsboot failmode wait default zfsraid failmode wait default So either the setting is not working, or something else is up? Is waiting only meant to wait a limited time? And then panic anyways? But then still I wonder why even in the 'continue'-case the ZFS system ends in a state where the filesystem is not able to continue in its standard functioning ( read and write ) and disconnects the disk??? All failmode settings result in a seriously handicapped system... 
On a raidz2 system I would perhaps expected this to occur when the second disk goes into thin space?? The other question is: The man page talks about 'Controls the system behavior in the event of catastrophic pool failure' And is a hung disk a 'catastrophic pool failure'? Still very puzzled? --WjW > > > On 2015-06-20 10:19 AM, Willem Jan Withagen wrote: >> Hi, >> >> Found my system rebooted this morning: >> >> Jun 20 05:28:33 zfs kernel: sonewconn: pcb 0xfffff8011b6da498: Listen >> queue overflow: 8 already in queue awaiting acceptance (48 occurrences) >> Jun 20 05:28:33 zfs kernel: panic: I/O to pool 'zfsraid' appears to be >> hung on vdev guid 18180224580327100979 at '/dev/da0'. >> Jun 20 05:28:33 zfs kernel: cpuid = 0 >> Jun 20 05:28:33 zfs kernel: Uptime: 8d9h7m9s >> Jun 20 05:28:33 zfs kernel: Dumping 6445 out of 8174 >> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >> >> Which leads me to believe that /dev/da0 went out on vacation, leaving >> ZFS into trouble.... But the array is: >> ---- >> NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP >> zfsraid 32.5T 13.3T 19.2T - 7% 41% 1.00x >> ONLINE - >> raidz2 16.2T 6.67T 9.58T - 8% 41% >> da0 - - - - - - >> da1 - - - - - - >> da2 - - - - - - >> da3 - - - - - - >> da4 - - - - - - >> da5 - - - - - - >> raidz2 16.2T 6.67T 9.58T - 7% 41% >> da6 - - - - - - >> da7 - - - - - - >> ada4 - - - - - - >> ada5 - - - - - - >> ada6 - - - - - - >> ada7 - - - - - - >> mirror 504M 1.73M 502M - 39% 0% >> gpt/log0 - - - - - - >> gpt/log1 - - - - - - >> cache - - - - - - >> gpt/raidcache0 109G 1.34G 107G - 0% 1% >> gpt/raidcache1 109G 787M 108G - 0% 0% >> ---- >> >> And thus I'd would have expected that ZFS would disconnect /dev/da0 and >> then switch to DEGRADED state and continue, letting the operator fix the >> broken disk. >> Instead it chooses to panic, which is not a nice thing to do. :) >> >> Or do I have to high hopes of ZFS? >> >> Next question to answer is why this WD RED on: >> >> arcmsr0@pci0:7:14:0: class=0x010400 card=0x112017d3 chip=0x112017d3 >> rev=0x00 hdr=0x00 >> vendor = 'Areca Technology Corp.' >> device = 'ARC-1120 8-Port PCI-X to SATA RAID Controller' >> class = mass storage >> subclass = RAID >> >> got hung, and nothing for this shows in SMART.... 
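The failmode property discussed above is a per-pool setting managed with zpool(8). As a minimal sketch of how it can be inspected and changed, using the pool names from this thread and assuming smartmontools is installed for the SMART check (the Areca channel number is a guess and depends on where the disk is cabled):
----
# Show the current failmode for both pools (matches the output quoted above)
zpool get failmode zfsboot zfsraid

# Let reads continue and return EIO on new writes instead of blocking
zpool set failmode=continue zfsraid

# Ask ZFS whether it considers the pool degraded
zpool status -x zfsraid

# Disks behind the Areca arcmsr(4) controller are usually queried through
# the controller device rather than through the daN device
smartctl -a -d areca,1 /dev/arcmsr0
----
As later replies in this thread point out, the panic itself comes from the ZFS deadman check, which fires independently of the failmode setting.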
From owner-freebsd-fs@FreeBSD.ORG Sun Jun 21 14:30:44 2015 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 826EEA1E for ; Sun, 21 Jun 2015 14:30:44 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id ED166F48 for ; Sun, 21 Jun 2015 14:30:43 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2D5BABUyYZV/95baINbg2RfBoMYvEkKhS5KAoFYEQEBAQEBAQGBCoQiAQEBAwEBAQEgBCcgCwUWGBEZAgQlAQkmBggHBAEcBIgGCA2xJ5V1AQEBAQEBAQMBAQEBAQEBAQEZi0WENAEBBRcZGweCaIFDBYwOh2+CJIIyhC+EA0GDTIgoikImY4FZgVkiMQeBBTqBAgEBAQ X-IronPort-AV: E=Sophos;i="5.13,654,1427774400"; d="scan'208";a="219649125" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 21 Jun 2015 10:30:35 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id ED97AB3F84; Sun, 21 Jun 2015 10:30:34 -0400 (EDT) Date: Sun, 21 Jun 2015 10:30:34 -0400 (EDT) From: Rick Macklem To: "alex.burlyga.ietf alex.burlyga.ietf" Cc: freebsd-fs@freebsd.org Message-ID: <1969046464.61534041.1434897034960.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: [nfs][client] - Question about handling of the NFS3_EEXIST error in SYMLINK rpc MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_61534039_1057484672.1434897034958" X-Originating-IP: [172.17.95.10] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Jun 2015 14:30:44 -0000 ------=_Part_61534039_1057484672.1434897034958 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Alex Burlyga wrote: > Hi, > > NFS client code in nfsrpc_symlink() masks server returned NFS3_EEXIST > error > code > by returning 0 to the upper layers. I'm assuming this was an attempt > to > work around > some server's broken replay cache out there, however, it breaks a > more > common > case where server is returning EEXIST for legitimate reason and > application > is expecting this error code and equipped to deal with it. > > To fix it I see three ways of doing this: > * Remove offending code > * Make it optional, sysctl? > * On NFS3_EEXIST send READLINK rpc to make sure symlink content is > right > > Which of the ways will maximize the chances of getting this fix > upstream? > I've attached a patch for testing/review that does essentially #2. It has no effect on trivial tests, since the syscall does a Lookup before trying to create the symlink and fails with EEXIST. Do you have a case where competing clients are trying to create the symlink or something like that, which runs into this? Please test the attached patch, since I don't know how to do that, rick > One more point, old client circa FreeBSD 7.0 does not exhibit this > problem. 
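For anyone wanting to try the patch Rick attaches below: it gates the old EEXIST-to-success mapping for SYMLINK and MKDIR behind a new sysctl, vfs.nfs.ignore_eexist, which defaults to 0 so the server's EEXIST reaches the application. A rough test sketch, assuming the patch is applied and the client kernel rebuilt (the mount point is only an example):
----
# Default after the patch: the server's EEXIST is passed through
sysctl vfs.nfs.ignore_eexist

# As Rick notes, repeating ln on one client fails in the local Lookup before
# any RPC is sent; exercising the RPC path needs two clients racing to
# create the same name, e.g. running this on both at once
ln -s target /mnt/nfs/link || echo "EEXIST propagated to userland"

# Re-enable the old kludge only for servers with a broken replay cache;
# per the patch it is never applied on NFSv4.1 or later mounts, where
# sessions already give exactly-once semantics
sysctl vfs.nfs.ignore_eexist=1
----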
> > Alex > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > ------=_Part_61534039_1057484672.1434897034958 Content-Type: text/x-patch; name=eexist.patch Content-Disposition: attachment; filename=eexist.patch Content-Transfer-Encoding: base64 LS0tIGZzL25mc2NsaWVudC9uZnNfY2xycGNvcHMuYy5zYXYyCTIwMTUtMDYtMjEgMDk6Mjc6Mzgu NjQwOTQ3MDAwIC0wNDAwCisrKyBmcy9uZnNjbGllbnQvbmZzX2NscnBjb3BzLmMJMjAxNS0wNi0y MSAwOTo1Mzo0Mi43MjMwODUwMDAgLTA0MDAKQEAgLTQ2LDYgKzQ2LDEzIEBAIF9fRkJTRElEKCIk RnJlZUJTRDogaGVhZC9zeXMvZnMvbmZzY2xpZW4KICNpbmNsdWRlICJvcHRfaW5ldDYuaCIKIAog I2luY2x1ZGUgPGZzL25mcy9uZnNwb3J0Lmg+CisjaW5jbHVkZSA8c3lzL3N5c2N0bC5oPgorCitT WVNDVExfREVDTChfdmZzX25mcyk7CisKK3N0YXRpYyBpbnQJbmZzaWdub3JlX2VleGlzdCA9IDA7 CitTWVNDVExfSU5UKF92ZnNfbmZzLCBPSURfQVVUTywgaWdub3JlX2VleGlzdCwgQ1RMRkxBR19S VywKKyAgICAmbmZzaWdub3JlX2VleGlzdCwgMCwgIk5GUyBpZ25vcmUgRUVYSVNUIHJlcGxpZXMg Zm9yIG1rZGlyL3N5bWxpbmsiKTsKIAogLyoKICAqIEdsb2JhbCB2YXJpYWJsZXMKQEAgLTI1MzAs OCArMjUzNywxMiBAQCBuZnNycGNfc3ltbGluayh2bm9kZV90IGR2cCwgY2hhciAqbmFtZSwgCiAJ bWJ1Zl9mcmVlbShuZC0+bmRfbXJlcCk7CiAJLyoKIAkgKiBLbHVkZ2U6IE1hcCBFRVhJU1QgPT4g MCBhc3N1bWluZyB0aGF0IGl0IGlzIGEgcmVwbHkgdG8gYSByZXRyeS4KKwkgKiBPbmx5IGRvIHRo aXMgaWYgdmZzLm5mcy5pZ25vcmVfZWV4aXN0IGlzIHNldC4KKwkgKiBOZXZlciBkbyB0aGlzIGZv ciBORlN2NC4xIG9yIGxhdGVyIG1pbm9yIHZlcnNpb25zLCBzaW5jZSBzZXNzaW9ucworCSAqIHNo b3VsZCBndWFyYW50ZWUgImV4YWN0bHkgb25jZSIgUlBDIHNlbWFudGljcy4KIAkgKi8KLQlpZiAo ZXJyb3IgPT0gRUVYSVNUKQorCWlmIChlcnJvciA9PSBFRVhJU1QgJiYgbmZzaWdub3JlX2VleGlz dCAhPSAwICYmICghTkZTSEFTTkZTVjQobm1wKSB8fAorCSAgICBubXAtPm5tX21pbm9ydmVycyA9 PSAwKSkKIAkJZXJyb3IgPSAwOwogCXJldHVybiAoZXJyb3IpOwogfQpAQCAtMjU1MCwxMCArMjU2 MSwxMiBAQCBuZnNycGNfbWtkaXIodm5vZGVfdCBkdnAsIGNoYXIgKm5hbWUsIGluCiAJbmZzYXR0 cmJpdF90IGF0dHJiaXRzOwogCWludCBlcnJvciA9IDA7CiAJc3RydWN0IG5mc2ZoICpmaHA7CisJ c3RydWN0IG5mc21vdW50ICpubXA7CiAKIAkqbmZocHAgPSBOVUxMOwogCSphdHRyZmxhZ3AgPSAw OwogCSpkYXR0cmZsYWdwID0gMDsKKwlubXAgPSBWRlNUT05GUyh2bm9kZV9tb3VudChkdnApKTsK IAlmaHAgPSBWVE9ORlMoZHZwKS0+bl9maHA7CiAJaWYgKG5hbWVsZW4gPiBORlNfTUFYTkFNTEVO KQogCQlyZXR1cm4gKEVOQU1FVE9PTE9ORyk7CkBAIC0yNjA1LDkgKzI2MTgsMTMgQEAgbmZzcnBj X21rZGlyKHZub2RlX3QgZHZwLCBjaGFyICpuYW1lLCBpbgogbmZzbW91dDoKIAltYnVmX2ZyZWVt KG5kLT5uZF9tcmVwKTsKIAkvKgotCSAqIEtsdWRnZTogTWFwIEVFWElTVCA9PiAwIGFzc3VtaW5n IHRoYXQgeW91IGhhdmUgYSByZXBseSB0byBhIHJldHJ5LgorCSAqIEtsdWRnZTogTWFwIEVFWElT VCA9PiAwIGFzc3VtaW5nIHRoYXQgaXQgaXMgYSByZXBseSB0byBhIHJldHJ5LgorCSAqIE9ubHkg ZG8gdGhpcyBpZiB2ZnMubmZzLmlnbm9yZV9lZXhpc3QgaXMgc2V0LgorCSAqIE5ldmVyIGRvIHRo aXMgZm9yIE5GU3Y0LjEgb3IgbGF0ZXIgbWlub3IgdmVyc2lvbnMsIHNpbmNlIHNlc3Npb25zCisJ ICogc2hvdWxkIGd1YXJhbnRlZSAiZXhhY3RseSBvbmNlIiBSUEMgc2VtYW50aWNzLgogCSAqLwot CWlmIChlcnJvciA9PSBFRVhJU1QpCisJaWYgKGVycm9yID09IEVFWElTVCAmJiBuZnNpZ25vcmVf ZWV4aXN0ICE9IDAgJiYgKCFORlNIQVNORlNWNChubXApIHx8CisJICAgIG5tcC0+bm1fbWlub3J2 ZXJzID09IDApKQogCQllcnJvciA9IDA7CiAJcmV0dXJuIChlcnJvcik7CiB9Cg== ------=_Part_61534039_1057484672.1434897034958-- From owner-freebsd-fs@FreeBSD.ORG Sun Jun 21 18:24:02 2015 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4A1C3FB4 for ; Sun, 21 Jun 2015 18:24:02 +0000 (UTC) (envelope-from nobody@ws1.emirates.net.ae) Received: from dsrmail2.emirates.net.ae (dsrmail2.emirates.net.ae 
[194.170.201.252]) by mx1.freebsd.org (Postfix) with ESMTP id 9F37ABD3 for ; Sun, 21 Jun 2015 18:24:00 +0000 (UTC) (envelope-from nobody@ws1.emirates.net.ae) Received: from ws1.emirates.net.ae ([194.170.187.5]) by dsrmail2.emirates.net.ae (I&ES Mail Server 4.2) with ESMTP id <0NQB00GHH4FXM4C0@dsrmail2.emirates.net.ae> for freebsd-fs@freebsd.org; Sun, 21 Jun 2015 22:23:57 +0400 (GST) Received: from ws1.emirates.net.ae (localhost [127.0.0.1]) by ws1.emirates.net.ae (8.14.5+Sun/8.14.5) with ESMTP id t5LINvkG023894 for ; Sun, 21 Jun 2015 22:23:57 +0400 (GST) Received: (from nobody@localhost) by ws1.emirates.net.ae (8.14.5+Sun/8.14.5/Submit) id t5LINvph023890; Sun, 21 Jun 2015 22:23:57 +0400 (GST) To: freebsd-fs@freebsd.org Subject: Notice to Appear Date: Sun, 21 Jun 2015 22:23:57 +0400 From: State Court Reply-to: State Court Message-id: <134a52e2ce858f5170de33ee9f15f858@wmc-e.ae> X-Priority: 3 MIME-version: 1.0 Content-Type: text/plain; charset=us-ascii X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Jun 2015 18:24:02 -0000 Notice to Appear, You have to appear in the Court on the June 25. Please, do not forget to bring all the documents related to the case. Note: The case will be heard by the judge in your absence if you do not come. The copy of Court Notice is attached to this email. Yours faithfully, Karl Weber, Court Secretary. From owner-freebsd-fs@FreeBSD.ORG Sun Jun 21 19:50:18 2015 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 691AA10A for ; Sun, 21 Jun 2015 19:50:18 +0000 (UTC) (envelope-from thomasrcurry@gmail.com) Received: from mail-oi0-x230.google.com (mail-oi0-x230.google.com [IPv6:2607:f8b0:4003:c06::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2E9B097 for ; Sun, 21 Jun 2015 19:50:18 +0000 (UTC) (envelope-from thomasrcurry@gmail.com) Received: by oigx81 with SMTP id x81so109591111oig.1 for ; Sun, 21 Jun 2015 12:50:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=L8a9ADRzLN8Rsm2g877jh9MfaJbptDYU8BEsnPlKCmo=; b=flwiczJEzUQ3ryLHHdZayrlC1l4zhvkmUgM5tR6I9OE1a7FSP98eTYB+P55eVtcBa1 F8eigYnL2EsQ6dex2Vkik2tLcXUWjL31tOYHahb/fRU1j6qSgvb4OfYy9wXOTGiWzUYT O4vFdTPBIxxDq/ffWg+mxxN62j4A2fX5dfKrRCm4k+FbKIo4GsPg6xBoEGngkXmjNGJd Tc6qbkThjmZt7igPPCqvtelOM2Tg4cipooZ9/Osp3DBuKUPZpPh+WmCyiJpGtNl0iErL k7T8pt58E+Icl8nbQWAZ+I17JwYiIERsiV3wGKvfDbMEH9tGIwJtaZy1SHCEDWqyJEzI LJrQ== MIME-Version: 1.0 X-Received: by 10.60.155.132 with SMTP id vw4mr8044581oeb.51.1434916217248; Sun, 21 Jun 2015 12:50:17 -0700 (PDT) Received: by 10.202.77.138 with HTTP; Sun, 21 Jun 2015 12:50:17 -0700 (PDT) In-Reply-To: <5586C396.9010100@digiware.nl> References: <5585767B.4000206@digiware.nl> <558590BD.40603@isletech.net> <5586C396.9010100@digiware.nl> Date: Sun, 21 Jun 2015 15:50:17 -0400 Message-ID: Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS From: Tom Curry To: Willem Jan 
Withagen Cc: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Jun 2015 19:50:18 -0000 Was there by chance a lot of disk activity going on when this occurred? On Sun, Jun 21, 2015 at 10:00 AM, Willem Jan Withagen wrote: > On 20/06/2015 18:11, Daryl Richards wrote: > > Check the failmode setting on your pool. From man zpool: > > > > failmode=wait | continue | panic > > > > Controls the system behavior in the event of catastrophic > > pool failure. This condition is typically a > > result of a loss of connectivity to the underlying storage > > device(s) or a failure of all devices within > > the pool. The behavior of such an event is determined as > > follows: > > > > wait Blocks all I/O access until the device > > connectivity is recovered and the errors are cleared. > > This is the default behavior. > > > > continue Returns EIO to any new write I/O requests but > > allows reads to any of the remaining healthy > > devices. Any write requests that have yet to be > > committed to disk would be blocked. > > > > panic Prints out a message to the console and generates > > a system crash dump. > > 'mmm > > Did not know about this setting. Nice one, but alas my current setting is: > zfsboot failmode wait default > zfsraid failmode wait default > > So either the setting is not working, or something else is up? > Is waiting only meant to wait a limited time? And then panic anyways? > > But then still I wonder why even in the 'continue'-case the ZFS system > ends in a state where the filesystem is not able to continue in its > standard functioning ( read and write ) and disconnects the disk??? > > All failmode settings result in a seriously handicapped system... > On a raidz2 system I would perhaps expected this to occur when the > second disk goes into thin space?? > > The other question is: The man page talks about > 'Controls the system behavior in the event of catastrophic pool failure' > And is a hung disk a 'catastrophic pool failure'? > > Still very puzzled? > > --WjW > > > > > > > On 2015-06-20 10:19 AM, Willem Jan Withagen wrote: > >> Hi, > >> > >> Found my system rebooted this morning: > >> > >> Jun 20 05:28:33 zfs kernel: sonewconn: pcb 0xfffff8011b6da498: Listen > >> queue overflow: 8 already in queue awaiting acceptance (48 occurrences) > >> Jun 20 05:28:33 zfs kernel: panic: I/O to pool 'zfsraid' appears to be > >> hung on vdev guid 18180224580327100979 at '/dev/da0'. > >> Jun 20 05:28:33 zfs kernel: cpuid = 0 > >> Jun 20 05:28:33 zfs kernel: Uptime: 8d9h7m9s > >> Jun 20 05:28:33 zfs kernel: Dumping 6445 out of 8174 > >> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% > >> > >> Which leads me to believe that /dev/da0 went out on vacation, leaving > >> ZFS into trouble.... 
But the array is: > >> ---- > >> NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP > >> zfsraid 32.5T 13.3T 19.2T - 7% 41% 1.00x > >> ONLINE - > >> raidz2 16.2T 6.67T 9.58T - 8% 41% > >> da0 - - - - - - > >> da1 - - - - - - > >> da2 - - - - - - > >> da3 - - - - - - > >> da4 - - - - - - > >> da5 - - - - - - > >> raidz2 16.2T 6.67T 9.58T - 7% 41% > >> da6 - - - - - - > >> da7 - - - - - - > >> ada4 - - - - - - > >> ada5 - - - - - - > >> ada6 - - - - - - > >> ada7 - - - - - - > >> mirror 504M 1.73M 502M - 39% 0% > >> gpt/log0 - - - - - - > >> gpt/log1 - - - - - - > >> cache - - - - - - > >> gpt/raidcache0 109G 1.34G 107G - 0% 1% > >> gpt/raidcache1 109G 787M 108G - 0% 0% > >> ---- > >> > >> And thus I'd would have expected that ZFS would disconnect /dev/da0 and > >> then switch to DEGRADED state and continue, letting the operator fix the > >> broken disk. > >> Instead it chooses to panic, which is not a nice thing to do. :) > >> > >> Or do I have to high hopes of ZFS? > >> > >> Next question to answer is why this WD RED on: > >> > >> arcmsr0@pci0:7:14:0: class=0x010400 card=0x112017d3 chip=0x112017d3 > >> rev=0x00 hdr=0x00 > >> vendor = 'Areca Technology Corp.' > >> device = 'ARC-1120 8-Port PCI-X to SATA RAID Controller' > >> class = mass storage > >> subclass = RAID > >> > >> got hung, and nothing for this shows in SMART.... > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 00:43:27 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CF390506 for ; Mon, 22 Jun 2015 00:43:27 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:1900:2254:206c::16:88]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hub.freebsd.org", Issuer "hub.freebsd.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4727396B for ; Mon, 22 Jun 2015 00:11:02 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: by hub.freebsd.org (Postfix) id 2BA80272; Mon, 22 Jun 2015 00:11:02 +0000 (UTC) Delivered-To: fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2876A270 for ; Mon, 22 Jun 2015 00:11:02 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 60DD61F6F for ; Mon, 22 Jun 2015 00:10:28 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (unknown [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id 1E81516A409; Mon, 22 Jun 2015 01:30:56 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id q1W7ZvfcA2rj; Mon, 22 Jun 2015 01:30:45 +0200 (CEST) Received: from [IPv6:2001:4cb8:3:1:a079:ce8f:c2bf:e69] (unknown [IPv6:2001:4cb8:3:1:a079:ce8f:c2bf:e69]) by 
smtp.digiware.nl (Postfix) with ESMTPA id 7CF3916A407; Mon, 22 Jun 2015 01:30:45 +0200 (CEST) Message-ID: <55874927.80807@digiware.nl> Date: Mon, 22 Jun 2015 01:30:47 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Quartz CC: fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS References: <5585767B.4000206@digiware.nl> <5587236A.6020404@sneakertech.com> In-Reply-To: <5587236A.6020404@sneakertech.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 00:43:27 -0000 On 21/06/2015 22:49, Quartz wrote: > Also: > >> And thus I'd would have expected that ZFS would disconnect /dev/da0 and >> then switch to DEGRADED state and continue, letting the operator fix the >> broken disk. > >> Next question to answer is why this WD RED on: > >> got hung, and nothing for this shows in SMART.... > > You have a raidz2, which means THREE disks need to go down before the > pool is unwritable. The problem is most likely your controller or power > supply, not your disks. But still I would expect the volume to become degraded if one of the disks goes into the error state? It is real nice that it still has 'raidz1'-level redundancy, but it does need to get fixed... > Also2: don't rely too much on SMART for determining drive health. Google > released a paper a few years ago revealing that half of all drives die > without reporting SMART errors. > > http://research.google.com/archive/disk_failures.pdf This article is mainly about forcasting disk failure based on SMART numbers.... Because the first "failures" in SMART do not require one to immediately replace the disk. The common idea is, if the numbers grow, expect the device to break. I was just looking at the counters to see if the disk had logged just any fact of info/warning/error that could have anything to do with the problem I have.
Thanx, --WjW From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 00:46:48 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 27F0B571 for ; Mon, 22 Jun 2015 00:46:48 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (unknown [IPv6:2607:f440::d144:5b3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 00971F56 for ; Mon, 22 Jun 2015 00:46:47 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id C03A33F70D for ; Sun, 21 Jun 2015 17:06:18 -0400 (EDT) Message-ID: <5587274A.2020205@sneakertech.com> Date: Sun, 21 Jun 2015 17:06:18 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: ZFS pool restructuring and emergency repair References: <5584C0BC.9070707@sneakertech.com> <5584F83D.1040702@egr.msu.edu> In-Reply-To: <5584F83D.1040702@egr.msu.edu> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 00:46:48 -0000 > man gvirstor which lets you create an arbitrarily large storage device > backed by chunks of storage based on how much you are actually using. I > have not used it. So, effectively a sparse disk image? 
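Roughly yes. A minimal illustration of the same idea with a plain sparse file and a vnode-backed md(4) device (paths, sizes and the pool name here are made up); gvirstor(8) provides the equivalent at the GEOM layer, allocating real chunks from its backing providers only as they are written:
----
# A 10 TB file that consumes space only as blocks are actually written
truncate -s 10T /storage/big.img

# Attach it as md10
mdconfig -a -t vnode -f /storage/big.img -u 10

# md10 can then be partitioned or handed to ZFS like any other disk
zpool create -f testpool /dev/md10
----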
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 00:46:48 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 28457572 for ; Mon, 22 Jun 2015 00:46:48 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (unknown [IPv6:2607:f440::d144:5b3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 009C3F57 for ; Mon, 22 Jun 2015 00:46:47 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id 9371B3F715; Sun, 21 Jun 2015 20:28:27 -0400 (EDT) Message-ID: <558756AB.405@sneakertech.com> Date: Sun, 21 Jun 2015 20:28:27 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Willem Jan Withagen CC: freebsd-fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS References: <5585767B.4000206@digiware.nl> <558590BD.40603@isletech.net> <5586C396.9010100@digiware.nl> <55871F4C.5010103@sneakertech.com> <55874772.4090607@digiware.nl> In-Reply-To: <55874772.4090607@digiware.nl> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 00:46:48 -0000 > But especially the hung disk during reading Writing is the issue moreso. At least, if you set your failmode to 'continue' ZFS will to try to honor reads as long as it's able, but writes will block. (In practice though it'll usually only give you an extra minute or so before everything locks up). > We'll the pool did not die, (at least not IMHO) Sorry, that's bad wording on my part. What I meant was that IO to the pool died. >just one disk stopt > working.... It would have to be 3+ disks in your case, with a raidz2. > I guess that if I like to live dangerously, I could set enabled to 0, > and run the risk... ?? Well, that will just disable the auto panic. If the IO disappeared into a black hole due to a hardware issue the machine will just stay hung forever until you manually press the reset button on the front. ZFS will prevent any major corruption of the pool so it's not really "dangerous". (Outside of further hardware failures). > But still I would expect the volume to become degraded if one of the > disks goes into the error state? If *one* of the disks drops out, yes. If a second drops out later, also yes, because ZFS can still handle IO to the pool. But as soon as that third disk drops out in a way that locks up IO, ZFS freezes. For reference, I had a raidz2 test case with 6 drives. I could yank the sata cable off two of the drives and the pool would be marked as degraded, but as soon as I yanked that third drive everything froze. This is why I heavily suspect in your case that your controller or PSU is failing and dropping multiple disks at a time. The fact that the log reports da0 is probably just because that was the last disk ZFS tried to fall back on when they all dropped out at once. 
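The 'enabled' setting referred to above appears to be the ZFS deadman timer, which is what printed the 'appears to be hung on vdev' panic in the original report. A sketch of inspecting it and turning the panic off, with the caveat given above (a truly hung controller then blocks the pool forever instead of rebooting the box); the tunable names are as found in stable/10:
----
# Current deadman settings
sysctl vfs.zfs.deadman_enabled vfs.zfs.deadman_synctime_ms

# Disable the automatic panic from the next boot on
echo 'vfs.zfs.deadman_enabled=0' >> /boot/loader.conf
----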
Ideally, the system *should* handle this situation gracefully, but the reality is that it doesn't. If the last disk fails in a way that hangs IO, it takes the whole machine with it. No system configuration change can prevent this, not with how things are currently designed. > This article is mainly about forcasting disk failure based on SMART > numbers.... > I was just looking at the counters to see if the disk had logged just > any fact of info/warning/error What Google found out is that a lot of disks *don't* report errors or warnings before experiencing problems. In other words, SMART saying "all good" doesn't really mean much in practice, so you shouldn't really rely on it for diagnostics. From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 00:57:18 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C0A317A2 for ; Mon, 22 Jun 2015 00:57:18 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from hub.freebsd.org (hub.freebsd.org [8.8.178.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hub.freebsd.org", Issuer "hub.freebsd.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id A413D656 for ; Mon, 22 Jun 2015 00:57:18 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: by hub.freebsd.org (Postfix) id 99E837A1; Mon, 22 Jun 2015 00:57:18 +0000 (UTC) Delivered-To: fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 996167A0 for ; Mon, 22 Jun 2015 00:57:18 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6187A653; Mon, 22 Jun 2015 00:57:18 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id t5LL1tkE017983; Sun, 21 Jun 2015 16:01:55 -0500 (CDT) Date: Sun, 21 Jun 2015 16:01:55 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Steve Wills cc: Willem Jan Withagen , fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS In-Reply-To: <20150620221431.GB26416@mouf.net> Message-ID: References: <5585767B.4000206@digiware.nl> <20150620221431.GB26416@mouf.net> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Sun, 21 Jun 2015 16:01:56 -0500 (CDT) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 00:57:18 -0000 On Sat, 20 Jun 2015, Steve Wills wrote: >> rev=0x00 hdr=0x00 >> vendor = 'Areca Technology Corp.' 
>> device = 'ARC-1120 8-Port PCI-X to SATA RAID Controller' >> class = mass storage >> subclass = RAID > > You may be hitting the zfs deadman panic, which is triggered when the > controller hangs. This can in some cases be caused by disks that die in unusual > ways. Notice that the RAID controller is a PCI-X device (shared parallel, not dedicated serial like PCIe). The whole PCI backplane could have hung. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 00:57:22 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 06FE87D4 for ; Mon, 22 Jun 2015 00:57:22 +0000 (UTC) (envelope-from thomasrcurry@gmail.com) Received: from mail-oi0-x235.google.com (mail-oi0-x235.google.com [IPv6:2607:f8b0:4003:c06::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6D707668 for ; Mon, 22 Jun 2015 00:57:21 +0000 (UTC) (envelope-from thomasrcurry@gmail.com) Received: by oigb199 with SMTP id b199so70083251oig.3 for ; Sun, 21 Jun 2015 17:57:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=ciGPUTCfvAP5nIIsgQvFQ5r+Kj4lpYwTsH7ktql4oqM=; b=gtI7WfHf0MvGceYSBQ2CdSVBJeq90e6790k7DGRE7psTjofmn/ayXMAZlOOdZ6/GwL mBnxmBgr1WmFDeb0army6aFnD/dRfibnlOpYuWWcd1++VzozWdleUpc6YYf7/LMGUi75 GG0x0yDxp2x7xEeLcxBGj/X4OmKNAe3wpQg5RoBuJcqar6OSGK8XfQDGVBuqOZQFDUEt 73oT9SqYjm1aVNUsJ9euxQCzCbsxWXrqq6nDdpN2JnF8Znpc/HJGysxIwdeFlMB3BkBq RWy5uAE2QZDbMxjZR4xHxROSYwagp+wSyC2s+GsbgDyF3hV6btTU6OobwWB5ATybDfAu dXKg== MIME-Version: 1.0 X-Received: by 10.60.118.193 with SMTP id ko1mr7671514oeb.38.1434932779902; Sun, 21 Jun 2015 17:26:19 -0700 (PDT) Received: by 10.202.77.138 with HTTP; Sun, 21 Jun 2015 17:26:19 -0700 (PDT) In-Reply-To: <55874C8A.4090405@digiware.nl> References: <5585767B.4000206@digiware.nl> <558590BD.40603@isletech.net> <5586C396.9010100@digiware.nl> <55873E1D.9010401@digiware.nl> <55874C8A.4090405@digiware.nl> Date: Sun, 21 Jun 2015 20:26:19 -0400 Message-ID: Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS From: Tom Curry To: Willem Jan Withagen Cc: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 00:57:22 -0000 Yes, currently I am not using the patch from that PR. But I have lowered the ARC max size, I am confident if I left it default I would have panics again. On Sun, Jun 21, 2015 at 7:45 PM, Willem Jan Withagen wrote: > On 22/06/2015 01:34, Tom Curry wrote: > > I asked because recently I had similar trouble. Lots of kernel panics, > > sometimes they were just like yours, sometimes they were general > > protection faults. 
But they would always occur when my nightly backups > > took place where VMs on iSCSI zvol luns were read and then written over > > smb to another pool on the same machine over 10GbE. > > > > I nearly went out of my mind trying to figure out what was going on, > > I'll spare you the gory details, but I stumbled across this PR > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 and as I read > > So this is "the Karl Denninger ZFS patch".... > I tried to follow the discussion at the moment, keeping it in the back > of my head..... > I concluded that the ideas where sort of accepted, but a different > solution was implemented? > > > through it little light bulbs starting coming on. Luckily it was easy > > for me to reproduce the problem so I kicked off the backups and watched > > the system memory. Wired would grow, ARC would shrink, and then the > > system would start swapping. If I stopped the IO right then it would > > recover after a while. But if I let it go it would always panic, and > > half the time it would be the same message as yours. So I applied the > > patch from that PR, rebooted, and kicked off the backup. No more panic. > > Recently I rebuilt a vanilla kernel from stable/10 but explicitly set > > vfs.zfs.arc_max to 24G (I have 32G) and ran my torture tests and it is > > stable. > > So you've (almost) answered my question, but English is not my native > language and hence my question for certainty: You did not add the patch > to your recently build stable/10 kernel... > > > So I don't want to send you on a wild goose chase, but it's entirely > > possible this problem you are having is not hardware related at all, but > > is a memory starvation issue related to the ARC under periods of heavy > > activity. > > Well rsync will do that for you... And since a few months I've also > loaded some iSCSI zvols as remote disks to some windows stations. > > Your suggestions are highly appreciated. Especially since I do not have > space PCI-X parts... (It the current hardware blows up, I'm getting > monder new stuff.) So other than checking some cabling and likes there > is very little I could swap. > > Thanx, > --WjW > > > On Sun, Jun 21, 2015 at 6:43 PM, Willem Jan Withagen > > wrote: > > > > On 21/06/2015 21:50, Tom Curry wrote: > > > Was there by chance a lot of disk activity going on when this > occurred? > > > > Define 'a lot'?? > > But very likely, since the system is also a backup location for > several > > external service which backup thru rsync. And they can generate > generate > > quite some traffic. Next to the fact that it also serves a NVR with a > > ZVOL trhu iSCSI... > > > > --WjW > > > > > > > > On Sun, Jun 21, 2015 at 10:00 AM, Willem Jan Withagen < > wjw@digiware.nl > > > >> wrote: > > > > > > On 20/06/2015 18:11, Daryl Richards wrote: > > > > Check the failmode setting on your pool. From man zpool: > > > > > > > > failmode=wait | continue | panic > > > > > > > > Controls the system behavior in the event of > > catastrophic > > > > pool failure. This condition is typically a > > > > result of a loss of connectivity to the > > underlying storage > > > > device(s) or a failure of all devices within > > > > the pool. The behavior of such an event is > > determined as > > > > follows: > > > > > > > > wait Blocks all I/O access until the device > > > > connectivity is recovered and the errors are cleared. > > > > This is the default behavior. 
> > > > > > > > continue Returns EIO to any new write I/O > > requests but > > > > allows reads to any of the remaining healthy > > > > devices. Any write requests that have > > yet to be > > > > committed to disk would be blocked. > > > > > > > > panic Prints out a message to the console > > and generates > > > > a system crash dump. > > > > > > 'mmm > > > > > > Did not know about this setting. Nice one, but alas my current > > > setting is: > > > zfsboot failmode wait > default > > > zfsraid failmode wait > default > > > > > > So either the setting is not working, or something else is up? > > > Is waiting only meant to wait a limited time? And then panic > > anyways? > > > > > > But then still I wonder why even in the 'continue'-case the > > ZFS system > > > ends in a state where the filesystem is not able to continue > > in its > > > standard functioning ( read and write ) and disconnects the > > disk??? > > > > > > All failmode settings result in a seriously handicapped > system... > > > On a raidz2 system I would perhaps expected this to occur when > the > > > second disk goes into thin space?? > > > > > > The other question is: The man page talks about > > > 'Controls the system behavior in the event of catastrophic > > pool failure' > > > And is a hung disk a 'catastrophic pool failure'? > > > > > > Still very puzzled? > > > > > > --WjW > > > > > > > > > > > > > > > On 2015-06-20 10:19 AM, Willem Jan Withagen wrote: > > > >> Hi, > > > >> > > > >> Found my system rebooted this morning: > > > >> > > > >> Jun 20 05:28:33 zfs kernel: sonewconn: pcb > > 0xfffff8011b6da498: Listen > > > >> queue overflow: 8 already in queue awaiting acceptance (48 > > > occurrences) > > > >> Jun 20 05:28:33 zfs kernel: panic: I/O to pool 'zfsraid' > > appears > > > to be > > > >> hung on vdev guid 18180224580327100979 at '/dev/da0'. > > > >> Jun 20 05:28:33 zfs kernel: cpuid = 0 > > > >> Jun 20 05:28:33 zfs kernel: Uptime: 8d9h7m9s > > > >> Jun 20 05:28:33 zfs kernel: Dumping 6445 out of 8174 > > > >> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% > > > >> > > > >> Which leads me to believe that /dev/da0 went out on > > vacation, leaving > > > >> ZFS into trouble.... But the array is: > > > >> ---- > > > >> NAME SIZE ALLOC FREE EXPANDSZ FRAG > > CAP DEDUP > > > >> zfsraid 32.5T 13.3T 19.2T - 7% > > 41% 1.00x > > > >> ONLINE - > > > >> raidz2 16.2T 6.67T 9.58T - 8% > 41% > > > >> da0 - - - - - > - > > > >> da1 - - - - - > - > > > >> da2 - - - - - > - > > > >> da3 - - - - - > - > > > >> da4 - - - - - > - > > > >> da5 - - - - - > - > > > >> raidz2 16.2T 6.67T 9.58T - 7% > 41% > > > >> da6 - - - - - > - > > > >> da7 - - - - - > - > > > >> ada4 - - - - - > - > > > >> ada5 - - - - - > - > > > >> ada6 - - - - - > - > > > >> ada7 - - - - - > - > > > >> mirror 504M 1.73M 502M - 39% > 0% > > > >> gpt/log0 - - - - - > - > > > >> gpt/log1 - - - - - > - > > > >> cache - - - - - - > > > >> gpt/raidcache0 109G 1.34G 107G - 0% > 1% > > > >> gpt/raidcache1 109G 787M 108G - 0% > 0% > > > >> ---- > > > >> > > > >> And thus I'd would have expected that ZFS would disconnect > > > /dev/da0 and > > > >> then switch to DEGRADED state and continue, letting the > > operator > > > fix the > > > >> broken disk. > > > >> Instead it chooses to panic, which is not a nice thing to > > do. :) > > > >> > > > >> Or do I have to high hopes of ZFS? 
> > > >> > > > >> Next question to answer is why this WD RED on: > > > >> > > > >> arcmsr0@pci0:7:14:0: class=0x010400 card=0x112017d3 > > > chip=0x112017d3 > > > >> rev=0x00 hdr=0x00 > > > >> vendor = 'Areca Technology Corp.' > > > >> device = 'ARC-1120 8-Port PCI-X to SATA RAID > > Controller' > > > >> class = mass storage > > > >> subclass = RAID > > > >> > > > >> got hung, and nothing for this shows in SMART.... > > > > > > _______________________________________________ > > > freebsd-fs@freebsd.org > > > > > mailing list > > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > To unsubscribe, send any mail to " > freebsd-fs-unsubscribe@freebsd.org > > > > > > >" > > > > > > > > > > > > From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 01:05:32 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 13483AB6 for ; Mon, 22 Jun 2015 01:05:32 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (smtp.digiware.nl [31.223.170.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id AC4CEC99 for ; Mon, 22 Jun 2015 01:05:31 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (unknown [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id ACC1F16A40B; Mon, 22 Jun 2015 01:45:28 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id QpwdSG2hukCH; Mon, 22 Jun 2015 01:45:16 +0200 (CEST) Received: from [IPv6:2001:4cb8:3:1:a079:ce8f:c2bf:e69] (unknown [IPv6:2001:4cb8:3:1:a079:ce8f:c2bf:e69]) by smtp.digiware.nl (Postfix) with ESMTPA id 03F8416A40A; Mon, 22 Jun 2015 01:45:16 +0200 (CEST) Message-ID: <55874C8A.4090405@digiware.nl> Date: Mon, 22 Jun 2015 01:45:14 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Tom Curry CC: freebsd-fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS References: <5585767B.4000206@digiware.nl> <558590BD.40603@isletech.net> <5586C396.9010100@digiware.nl> <55873E1D.9010401@digiware.nl> In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 01:05:32 -0000 On 22/06/2015 01:34, Tom Curry wrote: > I asked because recently I had similar trouble. Lots of kernel panics, > sometimes they were just like yours, sometimes they were general > protection faults. But they would always occur when my nightly backups > took place where VMs on iSCSI zvol luns were read and then written over > smb to another pool on the same machine over 10GbE. > > I nearly went out of my mind trying to figure out what was going on, > I'll spare you the gory details, but I stumbled across this PR > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 and as I read So this is "the Karl Denninger ZFS patch".... I tried to follow the discussion at the moment, keeping it in the back of my head..... 
I concluded that the ideas where sort of accepted, but a different solution was implemented? > through it little light bulbs starting coming on. Luckily it was easy > for me to reproduce the problem so I kicked off the backups and watched > the system memory. Wired would grow, ARC would shrink, and then the > system would start swapping. If I stopped the IO right then it would > recover after a while. But if I let it go it would always panic, and > half the time it would be the same message as yours. So I applied the > patch from that PR, rebooted, and kicked off the backup. No more panic. > Recently I rebuilt a vanilla kernel from stable/10 but explicitly set > vfs.zfs.arc_max to 24G (I have 32G) and ran my torture tests and it is > stable. So you've (almost) answered my question, but English is not my native language and hence my question for certainty: You did not add the patch to your recently build stable/10 kernel... > So I don't want to send you on a wild goose chase, but it's entirely > possible this problem you are having is not hardware related at all, but > is a memory starvation issue related to the ARC under periods of heavy > activity. Well rsync will do that for you... And since a few months I've also loaded some iSCSI zvols as remote disks to some windows stations. Your suggestions are highly appreciated. Especially since I do not have space PCI-X parts... (It the current hardware blows up, I'm getting monder new stuff.) So other than checking some cabling and likes there is very little I could swap. Thanx, --WjW > On Sun, Jun 21, 2015 at 6:43 PM, Willem Jan Withagen > wrote: > > On 21/06/2015 21:50, Tom Curry wrote: > > Was there by chance a lot of disk activity going on when this occurred? > > Define 'a lot'?? > But very likely, since the system is also a backup location for several > external service which backup thru rsync. And they can generate generate > quite some traffic. Next to the fact that it also serves a NVR with a > ZVOL trhu iSCSI... > > --WjW > > > > > On Sun, Jun 21, 2015 at 10:00 AM, Willem Jan Withagen > > >> wrote: > > > > On 20/06/2015 18:11, Daryl Richards wrote: > > > Check the failmode setting on your pool. From man zpool: > > > > > > failmode=wait | continue | panic > > > > > > Controls the system behavior in the event of > catastrophic > > > pool failure. This condition is typically a > > > result of a loss of connectivity to the > underlying storage > > > device(s) or a failure of all devices within > > > the pool. The behavior of such an event is > determined as > > > follows: > > > > > > wait Blocks all I/O access until the device > > > connectivity is recovered and the errors are cleared. > > > This is the default behavior. > > > > > > continue Returns EIO to any new write I/O > requests but > > > allows reads to any of the remaining healthy > > > devices. Any write requests that have > yet to be > > > committed to disk would be blocked. > > > > > > panic Prints out a message to the console > and generates > > > a system crash dump. > > > > 'mmm > > > > Did not know about this setting. Nice one, but alas my current > > setting is: > > zfsboot failmode wait default > > zfsraid failmode wait default > > > > So either the setting is not working, or something else is up? > > Is waiting only meant to wait a limited time? And then panic > anyways? 
> > > > But then still I wonder why even in the 'continue'-case the > ZFS system > > ends in a state where the filesystem is not able to continue > in its > > standard functioning ( read and write ) and disconnects the > disk??? > > > > All failmode settings result in a seriously handicapped system... > > On a raidz2 system I would perhaps expected this to occur when the > > second disk goes into thin space?? > > > > The other question is: The man page talks about > > 'Controls the system behavior in the event of catastrophic > pool failure' > > And is a hung disk a 'catastrophic pool failure'? > > > > Still very puzzled? > > > > --WjW > > > > > > > > > > > On 2015-06-20 10:19 AM, Willem Jan Withagen wrote: > > >> Hi, > > >> > > >> Found my system rebooted this morning: > > >> > > >> Jun 20 05:28:33 zfs kernel: sonewconn: pcb > 0xfffff8011b6da498: Listen > > >> queue overflow: 8 already in queue awaiting acceptance (48 > > occurrences) > > >> Jun 20 05:28:33 zfs kernel: panic: I/O to pool 'zfsraid' > appears > > to be > > >> hung on vdev guid 18180224580327100979 at '/dev/da0'. > > >> Jun 20 05:28:33 zfs kernel: cpuid = 0 > > >> Jun 20 05:28:33 zfs kernel: Uptime: 8d9h7m9s > > >> Jun 20 05:28:33 zfs kernel: Dumping 6445 out of 8174 > > >> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% > > >> > > >> Which leads me to believe that /dev/da0 went out on > vacation, leaving > > >> ZFS into trouble.... But the array is: > > >> ---- > > >> NAME SIZE ALLOC FREE EXPANDSZ FRAG > CAP DEDUP > > >> zfsraid 32.5T 13.3T 19.2T - 7% > 41% 1.00x > > >> ONLINE - > > >> raidz2 16.2T 6.67T 9.58T - 8% 41% > > >> da0 - - - - - - > > >> da1 - - - - - - > > >> da2 - - - - - - > > >> da3 - - - - - - > > >> da4 - - - - - - > > >> da5 - - - - - - > > >> raidz2 16.2T 6.67T 9.58T - 7% 41% > > >> da6 - - - - - - > > >> da7 - - - - - - > > >> ada4 - - - - - - > > >> ada5 - - - - - - > > >> ada6 - - - - - - > > >> ada7 - - - - - - > > >> mirror 504M 1.73M 502M - 39% 0% > > >> gpt/log0 - - - - - - > > >> gpt/log1 - - - - - - > > >> cache - - - - - - > > >> gpt/raidcache0 109G 1.34G 107G - 0% 1% > > >> gpt/raidcache1 109G 787M 108G - 0% 0% > > >> ---- > > >> > > >> And thus I'd would have expected that ZFS would disconnect > > /dev/da0 and > > >> then switch to DEGRADED state and continue, letting the > operator > > fix the > > >> broken disk. > > >> Instead it chooses to panic, which is not a nice thing to > do. :) > > >> > > >> Or do I have to high hopes of ZFS? > > >> > > >> Next question to answer is why this WD RED on: > > >> > > >> arcmsr0@pci0:7:14:0: class=0x010400 card=0x112017d3 > > chip=0x112017d3 > > >> rev=0x00 hdr=0x00 > > >> vendor = 'Areca Technology Corp.' > > >> device = 'ARC-1120 8-Port PCI-X to SATA RAID > Controller' > > >> class = mass storage > > >> subclass = RAID > > >> > > >> got hung, and nothing for this shows in SMART.... 
> > > > _______________________________________________ > > freebsd-fs@freebsd.org > > > mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org > > > >" > > > > > > From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 01:06:01 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2A0B0B34 for ; Mon, 22 Jun 2015 01:06:01 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 015F0EDD for ; Mon, 22 Jun 2015 01:06:01 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id t5LL0BnU027262 for ; Sun, 21 Jun 2015 21:00:11 GMT (envelope-from bugzilla-noreply@FreeBSD.org) Message-Id: <201506212100.t5LL0BnU027262@kenobi.freebsd.org> From: bugzilla-noreply@FreeBSD.org To: freebsd-fs@FreeBSD.org Subject: Problem reports for freebsd-fs@FreeBSD.org that need special attention X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 Date: Sun, 21 Jun 2015 21:00:11 +0000 Content-Type: text/plain; charset="UTF-8" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 01:06:01 -0000 To view an individual PR, use: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id). The following is a listing of current problems submitted by FreeBSD users, which need special attention. These represent problem reports covering all versions including experimental development code and obsolete releases. Status | Bug Id | Description ------------+-----------+--------------------------------------------------- Open | 136470 | [nfs] Cannot mount / in read-only, over NFS Open | 139651 | [nfs] mount(8): read-only remount of NFS volume d Open | 144447 | [zfs] sharenfs fsunshare() & fsshare_main() non f 3 problems total for which you should take action. 
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 01:10:26 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 29DD2CC5 for ; Mon, 22 Jun 2015 01:10:26 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (smtp.digiware.nl [31.223.170.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C69D8A7D for ; Mon, 22 Jun 2015 01:10:25 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (unknown [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id B891D16A403; Mon, 22 Jun 2015 00:43:51 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id gUsaS4ddV81U; Mon, 22 Jun 2015 00:43:39 +0200 (CEST) Received: from [IPv6:2001:4cb8:3:1:a079:ce8f:c2bf:e69] (unknown [IPv6:2001:4cb8:3:1:a079:ce8f:c2bf:e69]) by smtp.digiware.nl (Postfix) with ESMTPA id 91B3D16A402; Mon, 22 Jun 2015 00:43:39 +0200 (CEST) Message-ID: <55873E1D.9010401@digiware.nl> Date: Mon, 22 Jun 2015 00:43:41 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Tom Curry CC: freebsd-fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS References: <5585767B.4000206@digiware.nl> <558590BD.40603@isletech.net> <5586C396.9010100@digiware.nl> In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 01:10:26 -0000 On 21/06/2015 21:50, Tom Curry wrote: > Was there by chance a lot of disk activity going on when this occurred? Define 'a lot'?? But very likely, since the system is also a backup location for several external service which backup thru rsync. And they can generate generate quite some traffic. Next to the fact that it also serves a NVR with a ZVOL trhu iSCSI... --WjW > > On Sun, Jun 21, 2015 at 10:00 AM, Willem Jan Withagen > wrote: > > On 20/06/2015 18:11, Daryl Richards wrote: > > Check the failmode setting on your pool. From man zpool: > > > > failmode=wait | continue | panic > > > > Controls the system behavior in the event of catastrophic > > pool failure. This condition is typically a > > result of a loss of connectivity to the underlying storage > > device(s) or a failure of all devices within > > the pool. The behavior of such an event is determined as > > follows: > > > > wait Blocks all I/O access until the device > > connectivity is recovered and the errors are cleared. > > This is the default behavior. > > > > continue Returns EIO to any new write I/O requests but > > allows reads to any of the remaining healthy > > devices. Any write requests that have yet to be > > committed to disk would be blocked. > > > > panic Prints out a message to the console and generates > > a system crash dump. > > 'mmm > > Did not know about this setting. 
Nice one, but alas my current > setting is: > zfsboot failmode wait default > zfsraid failmode wait default > > So either the setting is not working, or something else is up? > Is waiting only meant to wait a limited time? And then panic anyways? > > But then still I wonder why even in the 'continue'-case the ZFS system > ends in a state where the filesystem is not able to continue in its > standard functioning ( read and write ) and disconnects the disk??? > > All failmode settings result in a seriously handicapped system... > On a raidz2 system I would perhaps expected this to occur when the > second disk goes into thin space?? > > The other question is: The man page talks about > 'Controls the system behavior in the event of catastrophic pool failure' > And is a hung disk a 'catastrophic pool failure'? > > Still very puzzled? > > --WjW > > > > > > > On 2015-06-20 10:19 AM, Willem Jan Withagen wrote: > >> Hi, > >> > >> Found my system rebooted this morning: > >> > >> Jun 20 05:28:33 zfs kernel: sonewconn: pcb 0xfffff8011b6da498: Listen > >> queue overflow: 8 already in queue awaiting acceptance (48 > occurrences) > >> Jun 20 05:28:33 zfs kernel: panic: I/O to pool 'zfsraid' appears > to be > >> hung on vdev guid 18180224580327100979 at '/dev/da0'. > >> Jun 20 05:28:33 zfs kernel: cpuid = 0 > >> Jun 20 05:28:33 zfs kernel: Uptime: 8d9h7m9s > >> Jun 20 05:28:33 zfs kernel: Dumping 6445 out of 8174 > >> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% > >> > >> Which leads me to believe that /dev/da0 went out on vacation, leaving > >> ZFS into trouble.... But the array is: > >> ---- > >> NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP > >> zfsraid 32.5T 13.3T 19.2T - 7% 41% 1.00x > >> ONLINE - > >> raidz2 16.2T 6.67T 9.58T - 8% 41% > >> da0 - - - - - - > >> da1 - - - - - - > >> da2 - - - - - - > >> da3 - - - - - - > >> da4 - - - - - - > >> da5 - - - - - - > >> raidz2 16.2T 6.67T 9.58T - 7% 41% > >> da6 - - - - - - > >> da7 - - - - - - > >> ada4 - - - - - - > >> ada5 - - - - - - > >> ada6 - - - - - - > >> ada7 - - - - - - > >> mirror 504M 1.73M 502M - 39% 0% > >> gpt/log0 - - - - - - > >> gpt/log1 - - - - - - > >> cache - - - - - - > >> gpt/raidcache0 109G 1.34G 107G - 0% 1% > >> gpt/raidcache1 109G 787M 108G - 0% 0% > >> ---- > >> > >> And thus I'd would have expected that ZFS would disconnect > /dev/da0 and > >> then switch to DEGRADED state and continue, letting the operator > fix the > >> broken disk. > >> Instead it chooses to panic, which is not a nice thing to do. :) > >> > >> Or do I have to high hopes of ZFS? > >> > >> Next question to answer is why this WD RED on: > >> > >> arcmsr0@pci0:7:14:0: class=0x010400 card=0x112017d3 > chip=0x112017d3 > >> rev=0x00 hdr=0x00 > >> vendor = 'Areca Technology Corp.' > >> device = 'ARC-1120 8-Port PCI-X to SATA RAID Controller' > >> class = mass storage > >> subclass = RAID > >> > >> got hung, and nothing for this shows in SMART.... 
> > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org > " > > From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 01:19:19 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C6368ECD for ; Mon, 22 Jun 2015 01:19:19 +0000 (UTC) (envelope-from thomasrcurry@gmail.com) Received: from mail-oi0-x22b.google.com (mail-oi0-x22b.google.com [IPv6:2607:f8b0:4003:c06::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 895951479 for ; Mon, 22 Jun 2015 01:19:19 +0000 (UTC) (envelope-from thomasrcurry@gmail.com) Received: by oiyy130 with SMTP id y130so94848515oiy.0 for ; Sun, 21 Jun 2015 18:19:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=Hd++X6M5ph1MSGvLPQPjsRhO2SKJlqnpgaiU00lvJ1E=; b=YD1bEby2rb+sCf2Cghph1Pm/J5U0BFSLTusNnDkwh2+J//epeje94c+ELRT3GFydl+ 6I1ydFMgLh7BPGTH+PXX0kZmZVWxLuLz+T2IAvdd6r7adarlIFLz4giFXv7TIuuTT4Xr cRIcHTnMl3J2vzyETv+HrYB0TfpIJQqLGRi0hT1VfCh4xqMMXKJ/nNZ1aRLexHvIE0l7 U2LeevDge6Q4NM+BYYTNGf7E/6b+FOjEAr61BvPcHM1W0rEQ6NEWdA5pfo4Q7WXz1luQ GVd6vU0AABUgUOuWnsL521yFqeZZ+BaXgno9TF5Y09zSnPWiD2x39Jw0IkfNcOly9qF5 5bbA== MIME-Version: 1.0 X-Received: by 10.182.22.33 with SMTP id a1mr22568524obf.41.1434929697833; Sun, 21 Jun 2015 16:34:57 -0700 (PDT) Received: by 10.202.77.138 with HTTP; Sun, 21 Jun 2015 16:34:57 -0700 (PDT) In-Reply-To: <55873E1D.9010401@digiware.nl> References: <5585767B.4000206@digiware.nl> <558590BD.40603@isletech.net> <5586C396.9010100@digiware.nl> <55873E1D.9010401@digiware.nl> Date: Sun, 21 Jun 2015 19:34:57 -0400 Message-ID: Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS From: Tom Curry To: Willem Jan Withagen Cc: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 01:19:19 -0000 I asked because recently I had similar trouble. Lots of kernel panics, sometimes they were just like yours, sometimes they were general protection faults. But they would always occur when my nightly backups took place where VMs on iSCSI zvol luns were read and then written over smb to another pool on the same machine over 10GbE. I nearly went out of my mind trying to figure out what was going on, I'll spare you the gory details, but I stumbled across this PR https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 and as I read through it little light bulbs starting coming on. Luckily it was easy for me to reproduce the problem so I kicked off the backups and watched the system memory. Wired would grow, ARC would shrink, and then the system would start swapping. If I stopped the IO right then it would recover after a while. But if I let it go it would always panic, and half the time it would be the same message as yours. 
So I applied the patch from that PR, rebooted, and kicked off the backup. No more panic. Recently I rebuilt a vanilla kernel from stable/10 but explicitly set vfs.zfs.arc_max to 24G (I have 32G) and ran my torture tests and it is stable. So I don't want to send you on a wild goose chase, but it's entirely possible this problem you are having is not hardware related at all, but is a memory starvation issue related to the ARC under periods of heavy activity. On Sun, Jun 21, 2015 at 6:43 PM, Willem Jan Withagen wrote: > On 21/06/2015 21:50, Tom Curry wrote: > > Was there by chance a lot of disk activity going on when this occurred? > > Define 'a lot'?? > But very likely, since the system is also a backup location for several > external service which backup thru rsync. And they can generate generate > quite some traffic. Next to the fact that it also serves a NVR with a > ZVOL trhu iSCSI... > > --WjW > > > > > On Sun, Jun 21, 2015 at 10:00 AM, Willem Jan Withagen > > wrote: > > > > On 20/06/2015 18:11, Daryl Richards wrote: > > > Check the failmode setting on your pool. From man zpool: > > > > > > failmode=wait | continue | panic > > > > > > Controls the system behavior in the event of > catastrophic > > > pool failure. This condition is typically a > > > result of a loss of connectivity to the underlying > storage > > > device(s) or a failure of all devices within > > > the pool. The behavior of such an event is determined as > > > follows: > > > > > > wait Blocks all I/O access until the device > > > connectivity is recovered and the errors are cleared. > > > This is the default behavior. > > > > > > continue Returns EIO to any new write I/O > requests but > > > allows reads to any of the remaining healthy > > > devices. Any write requests that have yet > to be > > > committed to disk would be blocked. > > > > > > panic Prints out a message to the console and > generates > > > a system crash dump. > > > > 'mmm > > > > Did not know about this setting. Nice one, but alas my current > > setting is: > > zfsboot failmode wait default > > zfsraid failmode wait default > > > > So either the setting is not working, or something else is up? > > Is waiting only meant to wait a limited time? And then panic anyways? > > > > But then still I wonder why even in the 'continue'-case the ZFS > system > > ends in a state where the filesystem is not able to continue in its > > standard functioning ( read and write ) and disconnects the disk??? > > > > All failmode settings result in a seriously handicapped system... > > On a raidz2 system I would perhaps expected this to occur when the > > second disk goes into thin space?? > > > > The other question is: The man page talks about > > 'Controls the system behavior in the event of catastrophic pool > failure' > > And is a hung disk a 'catastrophic pool failure'? > > > > Still very puzzled? > > > > --WjW > > > > > > > > > > > On 2015-06-20 10:19 AM, Willem Jan Withagen wrote: > > >> Hi, > > >> > > >> Found my system rebooted this morning: > > >> > > >> Jun 20 05:28:33 zfs kernel: sonewconn: pcb 0xfffff8011b6da498: > Listen > > >> queue overflow: 8 already in queue awaiting acceptance (48 > > occurrences) > > >> Jun 20 05:28:33 zfs kernel: panic: I/O to pool 'zfsraid' appears > > to be > > >> hung on vdev guid 18180224580327100979 at '/dev/da0'. 
> > >> Jun 20 05:28:33 zfs kernel: cpuid = 0 > > >> Jun 20 05:28:33 zfs kernel: Uptime: 8d9h7m9s > > >> Jun 20 05:28:33 zfs kernel: Dumping 6445 out of 8174 > > >> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% > > >> > > >> Which leads me to believe that /dev/da0 went out on vacation, > leaving > > >> ZFS into trouble.... But the array is: > > >> ---- > > >> NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP > DEDUP > > >> zfsraid 32.5T 13.3T 19.2T - 7% 41% > 1.00x > > >> ONLINE - > > >> raidz2 16.2T 6.67T 9.58T - 8% 41% > > >> da0 - - - - - - > > >> da1 - - - - - - > > >> da2 - - - - - - > > >> da3 - - - - - - > > >> da4 - - - - - - > > >> da5 - - - - - - > > >> raidz2 16.2T 6.67T 9.58T - 7% 41% > > >> da6 - - - - - - > > >> da7 - - - - - - > > >> ada4 - - - - - - > > >> ada5 - - - - - - > > >> ada6 - - - - - - > > >> ada7 - - - - - - > > >> mirror 504M 1.73M 502M - 39% 0% > > >> gpt/log0 - - - - - - > > >> gpt/log1 - - - - - - > > >> cache - - - - - - > > >> gpt/raidcache0 109G 1.34G 107G - 0% 1% > > >> gpt/raidcache1 109G 787M 108G - 0% 0% > > >> ---- > > >> > > >> And thus I'd would have expected that ZFS would disconnect > > /dev/da0 and > > >> then switch to DEGRADED state and continue, letting the operator > > fix the > > >> broken disk. > > >> Instead it chooses to panic, which is not a nice thing to do. :) > > >> > > >> Or do I have to high hopes of ZFS? > > >> > > >> Next question to answer is why this WD RED on: > > >> > > >> arcmsr0@pci0:7:14:0: class=0x010400 card=0x112017d3 > > chip=0x112017d3 > > >> rev=0x00 hdr=0x00 > > >> vendor = 'Areca Technology Corp.' > > >> device = 'ARC-1120 8-Port PCI-X to SATA RAID Controller' > > >> class = mass storage > > >> subclass = RAID > > >> > > >> got hung, and nothing for this shows in SMART.... > > > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org > > " > > > > > > From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 01:21:53 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 74001F09 for ; Mon, 22 Jun 2015 01:21:53 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 51A2A18E2 for ; Mon, 22 Jun 2015 01:21:53 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id F2B173F71B; Sun, 21 Jun 2015 16:32:12 -0400 (EDT) Message-ID: <55871F4C.5010103@sneakertech.com> Date: Sun, 21 Jun 2015 16:32:12 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Willem Jan Withagen CC: freebsd-fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS References: <5585767B.4000206@digiware.nl> <558590BD.40603@isletech.net> <5586C396.9010100@digiware.nl> In-Reply-To: <5586C396.9010100@digiware.nl> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 01:21:53 -0000 > Or do I have to high hopes of ZFS? > And is a hung disk a 'catastrophic pool failure'? Yes to both. I encountered this exact same issue a couple years ago (and complained about it to this list as well, although I didn't get a complete answer at the time. I can provide links to the conversation if interested). Basically, the heart of the issue is the way the kernel/drivers/ZFS deals with IO and DMA. There's currently no way to tell what's going on with the disks and what outstanding IO to the pool can be dropped or ignored. As-currently-designed there's no safe way to just kick out the pool and keep going, so the only options are to wait, panic, or wait and then panic. Fixing this would require a major rewrite of a lot of code, which isn't going to happen any time soon. The failmode setting and deadman timer were implemented as a bandage to prevent the system from hanging forever. See this page for more info: http://comments.gmane.org/gmane.os.illumos.zfs/61 > All failmode settings result in a seriously handicapped system... Yes. Again, this is a design issue/flaw with how DMA works. There's no real way to continue on gracefully when a pool completely dies due to hung IO. We're all pretty much stuck with this problem, at least for quite a while. > Is waiting only meant to wait a limited time? And then panic anyways? By default yes. However, if you know that on your system the issue will eventually resolve itself given several hours (and you want to wait that long) you can change the deadman timeout or disable it completely. Look at "vfs.zfs.deadman_enabled" and "vfs.zfs.deadman_synctime". 
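
For anyone who wants to poke at the knobs named above, a minimal sketch of where they live follows. The pool name zfsraid and the 24G ARC figure are taken from this thread; the values shown are examples, not recommendations, and the synctime tunable may only be settable from /boot/loader.conf on 10.x:

    # failmode is an ordinary pool property (wait is the default)
    zpool get failmode zfsraid
    zpool set failmode=continue zfsraid

    # the deadman switch is exposed as sysctls
    sysctl vfs.zfs.deadman_enabled vfs.zfs.deadman_synctime_ms
    sysctl vfs.zfs.deadman_enabled=0      # live dangerously: no panic on hung I/O

    # capping the ARC (the workaround Tom Curry describes earlier in the thread)
    # is a boot-time tunable, effective after reboot
    echo 'vfs.zfs.arc_max="24G"' >> /boot/loader.conf
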
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 01:36:47 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 95E8C73 for ; Mon, 22 Jun 2015 01:36:47 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from hub.freebsd.org (hub.freebsd.org [8.8.178.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hub.freebsd.org", Issuer "hub.freebsd.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 798DE1E3E for ; Mon, 22 Jun 2015 01:36:47 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: by hub.freebsd.org (Postfix) id 6EFFD72; Mon, 22 Jun 2015 01:36:47 +0000 (UTC) Delivered-To: fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6D37071 for ; Mon, 22 Jun 2015 01:36:47 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4B0B71E3D for ; Mon, 22 Jun 2015 01:36:47 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id CABD33F71F; Sun, 21 Jun 2015 16:49:46 -0400 (EDT) Message-ID: <5587236A.6020404@sneakertech.com> Date: Sun, 21 Jun 2015 16:49:46 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Willem Jan Withagen CC: fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS References: <5585767B.4000206@digiware.nl> In-Reply-To: <5585767B.4000206@digiware.nl> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 01:36:47 -0000 Also: > And thus I'd would have expected that ZFS would disconnect /dev/da0 and > then switch to DEGRADED state and continue, letting the operator fix the > broken disk. > Next question to answer is why this WD RED on: > got hung, and nothing for this shows in SMART.... You have a raidz2, which means THREE disks need to go down before the pool is unwritable. The problem is most likely your controller or power supply, not your disks. Also2: don't rely too much on SMART for determining drive health. Google released a paper a few years ago revealing that half of all drives die without reporting SMART errors. 
http://research.google.com/archive/disk_failures.pdf From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 01:49:49 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DF46D22E for ; Mon, 22 Jun 2015 01:49:49 +0000 (UTC) (envelope-from michelle@sorbs.net) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:1900:2254:206c::16:88]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hub.freebsd.org", Issuer "hub.freebsd.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id C4D18A2F for ; Mon, 22 Jun 2015 01:49:49 +0000 (UTC) (envelope-from michelle@sorbs.net) Received: by hub.freebsd.org (Postfix) id BAAE122D; Mon, 22 Jun 2015 01:49:49 +0000 (UTC) Delivered-To: fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B8E0B22C for ; Mon, 22 Jun 2015 01:49:49 +0000 (UTC) (envelope-from michelle@sorbs.net) Received: from hades.sorbs.net (hades.sorbs.net [67.231.146.201]) by mx1.freebsd.org (Postfix) with ESMTP id A6D6DA2E for ; Mon, 22 Jun 2015 01:49:49 +0000 (UTC) (envelope-from michelle@sorbs.net) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from isux.com (firewall.isux.com [213.165.190.213]) by hades.sorbs.net (Oracle Communications Messaging Server 7.0.5.29.0 64bit (built Jul 9 2013)) with ESMTPSA id <0NQB00IYKPCDW900@hades.sorbs.net> for fs@freebsd.org; Sun, 21 Jun 2015 18:55:27 -0700 (PDT) Message-id: <558769B5.601@sorbs.net> Date: Mon, 22 Jun 2015 03:49:41 +0200 From: Michelle Sullivan User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.24) Gecko/20100301 SeaMonkey/1.1.19 To: Quartz Cc: Willem Jan Withagen , fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS References: <5585767B.4000206@digiware.nl> <5587236A.6020404@sneakertech.com> In-reply-to: <5587236A.6020404@sneakertech.com> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 01:49:50 -0000 Quartz wrote: > Also: > >> And thus I'd would have expected that ZFS would disconnect /dev/da0 and >> then switch to DEGRADED state and continue, letting the operator fix the >> broken disk. > >> Next question to answer is why this WD RED on: > >> got hung, and nothing for this shows in SMART.... > > You have a raidz2, which means THREE disks need to go down before the > pool is unwritable. The problem is most likely your controller or > power supply, not your disks. > Never make such assumptions... I have worked in a professional environment where 9 of 12 disks failed within 24 hours of each other.... They were all supposed to be from different batches but due to an error they came from the same batch and the environment was so tightly controlled and the work-load was so similar that MTBF was almost identical on all 11 disks in the array... the only disk that lasted more than 2 weeks over the failure was the hotspare...! 
-- Michelle Sullivan http://www.mhix.org/ From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 01:50:25 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A0B8A267 for ; Mon, 22 Jun 2015 01:50:25 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (smtp.digiware.nl [31.223.170.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 30C85AFF for ; Mon, 22 Jun 2015 01:50:24 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (unknown [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id 1DF1416A408; Mon, 22 Jun 2015 01:23:40 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id lZryYmhesiXL; Mon, 22 Jun 2015 01:23:28 +0200 (CEST) Received: from [IPv6:2001:4cb8:3:1:a079:ce8f:c2bf:e69] (unknown [IPv6:2001:4cb8:3:1:a079:ce8f:c2bf:e69]) by smtp.digiware.nl (Postfix) with ESMTPA id BE2CF16A407; Mon, 22 Jun 2015 01:23:28 +0200 (CEST) Message-ID: <55874772.4090607@digiware.nl> Date: Mon, 22 Jun 2015 01:23:30 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Quartz CC: freebsd-fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS References: <5585767B.4000206@digiware.nl> <558590BD.40603@isletech.net> <5586C396.9010100@digiware.nl> <55871F4C.5010103@sneakertech.com> In-Reply-To: <55871F4C.5010103@sneakertech.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 01:50:25 -0000 On 21/06/2015 22:32, Quartz wrote: >> Or do I have to high hopes of ZFS? >> And is a hung disk a 'catastrophic pool failure'? > > Yes to both. > > I encountered this exact same issue a couple years ago (and complained > about it to this list as well, although I didn't get a complete answer > at the time. I can provide links to the conversation if interested). > > Basically, the heart of the issue is the way the kernel/drivers/ZFS > deals with IO and DMA. There's currently no way to tell what's going on > with the disks and what outstanding IO to the pool can be dropped or > ignored. As-currently-designed there's no safe way to just kick out the > pool and keep going, so the only options are to wait, panic, or wait and > then panic. Fixing this would require a major rewrite of a lot of code, > which isn't going to happen any time soon. The failmode setting and > deadman timer were implemented as a bandage to prevent the system from > hanging forever. > > See this page for more info: > http://comments.gmane.org/gmane.os.illumos.zfs/61 > Yes, I know of this discussion already as long as I'm working with ZFS. But reading it like this does make it much more sense. Perhaps the text should suggest some thing more painfull :), since now it suggest that chopping the disk would help... 
(Hence my reaction) But especially the hung disk during reading is sort of a ticking timebomb... On the other hand, if it is already outstanding for 100's of seconds, then it is never going to arrive. So the chance of running into corrupted memory is going to be 0.00000...... But I do agree that in this case a panic might be the next best solution. From another response I conclude that there could be something in the driver/hardware combo that could run me into trouble... >> All failmode settings result in a seriously handicapped system... > > Yes. Again, this is a design issue/flaw with how DMA works. There's no > real way to continue on gracefully when a pool completely dies due to > hung IO. > > We're all pretty much stuck with this problem, at least for quite a while. Well, the pool did not die (at least not IMHO), just one disk stopped working.... The pool is still resilient, so it could continue, alert the operator, and have the operator fix/reboot/..... But it is the fact that a stalled DMA action could "corrupt" memory after the fact, just because a command that has been outstanding for way too long all of a sudden completes. >> Is waiting only meant to wait a limited time? And then panic anyways? > > By default yes. However, if you know that on your system the issue will > eventually resolve itself given several hours (and you want to wait that > long) you can change the deadman timeout or disable it completely. Look > at "vfs.zfs.deadman_enabled" and "vfs.zfs.deadman_synctime". I see: vfs.zfs.deadman_enabled: 1 vfs.zfs.deadman_checktime_ms: 5000 vfs.zfs.deadman_synctime_ms: 1000000 So the "hung" I/O action has taken 1000 secs, and did not complete... I guess that if I like to live dangerously, I could set enabled to 0, and run the risk... ?? Probably the better solution is to see if it occurs more often, and in that case upgrade to modern hardware with newer HD controllers. The current config has worked for me for quite some time already. For the time being I'll offload a few disks to an mvs controller in the only PCIe slot on the MB.
Thanx for all the info, --WjW From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 01:51:47 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 58BB82AF for ; Mon, 22 Jun 2015 01:51:47 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 36951D52 for ; Mon, 22 Jun 2015 01:51:46 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id 5053B3F721 for ; Sun, 21 Jun 2015 17:05:23 -0400 (EDT) Message-ID: <55872712.2090800@sneakertech.com> Date: Sun, 21 Jun 2015 17:05:22 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Freebsd fs Subject: Re: ZFS pool restructuring and emergency repair References: <5584C0BC.9070707@sneakertech.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 01:51:47 -0000 > You can use 'send' and 'receive' to send all the data and the metadata > associated with that data. I believe that you are correct that > filesystem properties (like 'compression') are not preserved. Hmm... I may have to just create a bunch of dummy pools and see what is and isn't copied. Was trying to avoid that. 
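
One low-risk way to answer the "what survives a send/receive" question is a throwaway experiment on file-backed pools; everything below is a sketch with made-up pool and file names:

    truncate -s 128m /tmp/src.img /tmp/dst.img
    zpool create srcpool /tmp/src.img
    zpool create dstpool /tmp/dst.img
    zfs create -o compression=lz4 srcpool/data
    zfs snapshot srcpool/data@t1

    # plain stream: locally-set properties are not carried over
    zfs send srcpool/data@t1 | zfs receive dstpool/plain

    # replication stream: properties, snapshots and descendants are preserved
    zfs send -R srcpool/data@t1 | zfs receive dstpool/repl

    zfs get -r compression dstpool        # compare the two copies
    zpool destroy srcpool && zpool destroy dstpool
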
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 02:31:49 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EE8A3545 for ; Mon, 22 Jun 2015 02:31:49 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:1900:2254:206c::16:88]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hub.freebsd.org", Issuer "hub.freebsd.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id D3D5215F for ; Mon, 22 Jun 2015 02:31:49 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: by hub.freebsd.org (Postfix) id C9969544; Mon, 22 Jun 2015 02:31:49 +0000 (UTC) Delivered-To: fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C9027543 for ; Mon, 22 Jun 2015 02:31:49 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A693C15B for ; Mon, 22 Jun 2015 02:31:49 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id DD3E83F6E0; Sun, 21 Jun 2015 22:31:47 -0400 (EDT) Message-ID: <55877393.3040704@sneakertech.com> Date: Sun, 21 Jun 2015 22:31:47 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Michelle Sullivan CC: fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS References: <5585767B.4000206@digiware.nl> <5587236A.6020404@sneakertech.com> <558769B5.601@sorbs.net> In-Reply-To: <558769B5.601@sorbs.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 02:31:50 -0000 >> You have a raidz2, which means THREE disks need to go down before the >> pool is unwritable. The problem is most likely your controller or >> power supply, not your disks. >> > Never make such assumptions... > > I have worked in a professional environment where 9 of 12 disks failed > within 24 hours of each other.... Right... but if that was his problem there should be some logs of the other drives going down first, and typically ZFS would correctly mark the pool as degraded (at least, it would in my testing). The fact that ZFS didn't get a chance to log anything and the pool came back up healthy leads me to believe the controller went south, taking several disks with it all at once and totally borking all IO. (Either that or what Tom Curry mentioned about the Arc issue, which I wasn't previously aware of). Of course, if it issue isn't repeatable then who knows.... 
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 07:16:45 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2794240B for ; Mon, 22 Jun 2015 07:16:45 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from cu1176c.smtpx.saremail.com (cu1176c.smtpx.saremail.com [195.16.148.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DA52CFC for ; Mon, 22 Jun 2015 07:16:44 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from [172.16.2.2] (izaro.sarenet.es [192.148.167.11]) by proxypop02.sare.net (Postfix) with ESMTPSA id C45F49DC6A7; Mon, 22 Jun 2015 09:16:34 +0200 (CEST) Subject: Re: ZFS pool restructuring and emergency repair Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=utf-8 From: Borja Marcos In-Reply-To: <5584F83D.1040702@egr.msu.edu> Date: Mon, 22 Jun 2015 09:16:31 +0200 Cc: freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <5584C0BC.9070707@sneakertech.com> <5584F83D.1040702@egr.msu.edu> To: Adam McDougall X-Mailer: Apple Mail (2.1283) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 07:16:45 -0000 On Jun 20, 2015, at 7:21 AM, Adam McDougall wrote: > The manpage for zfs says: (under zfs send) >=20 > -R Generate a replication stream package, which will replicate > the specified filesystem, and all descendent file systems, up > to the named snapshot. When received, all properties, snap=E2=80=90= > shots, descendent file systems, and clones are preserved. And that includes compression. However, the send format was committed = before snapshot holds were introduced.=20 If you rely on holds to avoid accidental snapshot deletion, remember = that holds will not be replicated. Borja. 
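
The hold caveat is easy to check directly; the dataset names below are invented for the example:

    zfs hold keep tank/data@t1            # user hold on the source snapshot
    zfs holds tank/data@t1                # shows the 'keep' hold
    zfs send -R tank/data@t1 | zfs receive backup/data
    zfs holds backup/data@t1              # empty: the hold did not travel
    zfs hold keep backup/data@t1          # re-apply it on the receive side if needed
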
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 07:43:20 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8C9B25CB for ; Mon, 22 Jun 2015 07:43:20 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6B8D59F for ; Mon, 22 Jun 2015 07:43:20 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id AD7E63F6AB for ; Mon, 22 Jun 2015 03:43:18 -0400 (EDT) Message-ID: <5587BC96.9090601@sneakertech.com> Date: Mon, 22 Jun 2015 03:43:18 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Freebsd fs Subject: Re: ZFS pool restructuring and emergency repair References: <5584C0BC.9070707@sneakertech.com> In-Reply-To: <5584C0BC.9070707@sneakertech.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 07:43:20 -0000 > - A server is set up with a pool created a certain way, for the sake of > argument let's say it's a raidz-2 comprised of 6x 2TB disks. There's > only actually ~1TB of data currently on the server though. Let's say > there's a catastrophic emergency where one of the disks needs to be > replaced, but the only available spare is an old 500GB. As I understand > it, you're basically SOL. Even though a 6x500 (really 4x500) is more > than enough to hold 1Tb of data, you can't do anything in this situation > since although ZFS can expand a pool to fit larger disks, it can't > shrink one under any circumstance. Is my understanding still correct or > is there a way around this issue now? So I take it that, aside from messing with a gvirstor/ sparse disk image, there's still no way to really handle this because there's still no way to shrink a pool after creation? 
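
There is still no in-place shrink, so the usual escape hatch remains building a second, smaller pool and replicating into it. A rough sketch, with invented pool and device names, assuming the data actually fits on the smaller vdev:

    zpool create newpool raidz2 da10 da11 da12 da13 da14 da15
    zfs snapshot -r oldpool@migrate
    zfs send -R oldpool@migrate | zfs receive -F newpool
    # verify the copy, then retire the old pool
    zpool export oldpool
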
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 08:14:57 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7771187D for ; Mon, 22 Jun 2015 08:14:57 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5839C1148 for ; Mon, 22 Jun 2015 08:14:57 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id 47A2C3F6CF for ; Mon, 22 Jun 2015 04:14:56 -0400 (EDT) Message-ID: <5587C3FF.9070407@sneakertech.com> Date: Mon, 22 Jun 2015 04:14:55 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: FreeBSD FS Subject: ZFS raid write performance? Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 08:14:57 -0000 What's sequential write performance like these days for ZFS raidzX? Someone suggested to me that I set up a single not-raid disk to act as a fast 'landing pad' for receiving files, then move them to the pool later in the background. Is that actually necessary? (Assume generic sata drives, 250mb-4gb sized files, and transfers are across a LAN using single unbonded GigE). 
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 08:38:25 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1F8AB941 for ; Mon, 22 Jun 2015 08:38:25 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [64.62.153.212]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "anubis.delphij.net", Issuer "StartCom Class 1 Primary Intermediate Server CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 05E111C66 for ; Mon, 22 Jun 2015 08:38:24 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from Xins-MBP.home.us.delphij.net (c-71-202-112-39.hsd1.ca.comcast.net [71.202.112.39]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by anubis.delphij.net (Postfix) with ESMTPSA id 510071A0B9; Mon, 22 Jun 2015 01:38:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphij.net; s=anubis; t=1434962304; x=1434976704; bh=Y3NEldf89aiDa/+PxsZYkArRrEneAOaFr7bHntnjBaA=; h=Date:From:To:Subject:References:In-Reply-To; b=wWg/4XlgMZs4ce/LtrISQnKuJyvV9Dz71WP7pKvbwh8rTlnxLjKhIrYrjvByXfbbl OvpvXgJEmLku9OUoWX9qO0zYnlZGHRJUFOS6nFXAJ49eiR7DCuwzyRJWvoVGUsfvUN hdbr0ot9Vk14FKTyQD318dlCDUAbEh/KXMxF01k8= Message-ID: <5587C97F.2000407@delphij.net> Date: Mon, 22 Jun 2015 01:38:23 -0700 From: Xin Li User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Quartz , FreeBSD FS Subject: Re: ZFS raid write performance? References: <5587C3FF.9070407@sneakertech.com> In-Reply-To: <5587C3FF.9070407@sneakertech.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 08:38:25 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 6/22/15 01:14, Quartz wrote: > What's sequential write performance like these days for ZFS > raidzX? Someone suggested to me that I set up a single not-raid > disk to act as a fast 'landing pad' for receiving files, then move > them to the pool later in the background. Is that actually > necessary? (Assume generic sata drives, 250mb-4gb sized files, and > transfers are across a LAN using single unbonded GigE). That sounds really weird recommendation IMHO. Did "someone" explained with the reasoning/benefit of that "landing pad"? I don't have hardware for testing handy, but IIRC even with 10,000 RPM hard drives, a single hard drive won't do much beyond 100MB/s (maybe 120MB/s max) for sequential 128kB blocks, so that "landing pad" would probably not very helpful assuming you can saturate your GigE network (and keep in mind that with a file system in place it's not a perfect sequential operation; plus if there is something wrong with that hard drive you will have to start over rather than just replacing the bad drive). 
Cheers, -----BEGIN PGP SIGNATURE----- iQIcBAEBCgAGBQJVh8l+AAoJEJW2GBstM+nsD8EP/RHR8Oiqf6FFVG4LT+CSqXLc GIsSqaR/6/l04Ah0ixTkaubNvOELPlFZdFKQDtNd2u71G2Z7XtMbNvOK3G7whOxC 6a5xdNfdIYs7lq3jatN79BP9dygtgICsb1oMrCyAzd/tQc+cTvPabC/OxR4TtEJn ZumP6LworIDGp1ruMrmQ7VvcOKhCxzs4VO7G8Lcj/WkhzR3TDEsZuzzqefWg1RlO SBWJEwMGUugKWOCvgm8eQ2Hmw3btYbee1wfzuojtRN+d+IS8PtmsFpGBo8PCRSb8 lPz1Cf1fY4/zwruiG4EI+0CFvfr/05rN6DBRolyctdCGY1zX4rgKu6DT62kFkUR7 1nQdwxQ9slsQck1vyfAv2nIlGU530E696ZoS8/Ppqi/P8IqktYDLXKMn9+l0s+y+ EDzfvITasvwa6GRp5oxD2wagMjhvJ9iwELBLsppbjNH2i6n6k7EUSD1WGDHyQI2O irzm7ecRd5mym14Ruk0PxOAkuRrWhIdkSEHWrK1V5MZolIMw7MTf/gzNJPDIG0tZ MP4JmaOlysmHwIxoDLwAVlfuwweT3496miRbDvjzBrexkBvOVcIQdtymhZJmGe/z DoejzWQvub5CbsDVbNAVW6HBppbW2MEqby4zyzl/Ae/IzsvYKAdVTQdmICO7wqNz XWCqRSAjysOM5RDHoyXf =Newc -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 12:21:57 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 27EFC510 for ; Mon, 22 Jun 2015 12:21:57 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-wi0-x22e.google.com (mail-wi0-x22e.google.com [IPv6:2a00:1450:400c:c05::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A4928686 for ; Mon, 22 Jun 2015 12:21:56 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: by wibdq8 with SMTP id dq8so73456303wib.1 for ; Mon, 22 Jun 2015 05:21:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=wBzx43O/qR5X0aWL9TieLLYZUa+aZXD0oGWkEMHv0XQ=; b=KB7x/ECJa+uaKCGiwQGjru99p0Nmh/c//gGsKaytPv3ZBrMR7Nxb9XTNOB+kUJEKki Wn2cKfjFqspeqCOC5nGEzqw8c8eam6EM92bwUBZz1KKZHUIDe0SBhvbILTqP1MYOLygV HSaQ8OZfx0V/gK90MzOKTmn+WlHX+0Yw7b/gRjkC3Knlyg8mmHGnYlFpZajhuQXsjBc0 inbGAvMQJGeB3iGtkEgKD/rgyaFgJ/HHVZnh6ex4uyRKH4qkPZaZvq3xN7IjWzOOof12 3CblWvYgESyiTJ2jHfxgCPw6Z8/qnXRAsvQ9fsGxVENw3WNo56R65gl1+fdnwV3spm65 WG7Q== MIME-Version: 1.0 X-Received: by 10.194.176.68 with SMTP id cg4mr51298644wjc.106.1434975715106; Mon, 22 Jun 2015 05:21:55 -0700 (PDT) Received: by 10.180.73.5 with HTTP; Mon, 22 Jun 2015 05:21:55 -0700 (PDT) In-Reply-To: <20150622121343.GB60684@neutralgood.org> References: <5587C3FF.9070407@sneakertech.com> <20150622121343.GB60684@neutralgood.org> Date: Mon, 22 Jun 2015 13:21:55 +0100 Message-ID: Subject: Re: ZFS raid write performance? From: krad To: kpneal@pobox.com Cc: Quartz , FreeBSD FS Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 12:21:57 -0000 also ask yourself how big the data transfer is going to be. If its only a few gigs or 10s of gigs at a time and not streaming you could well find its all dumped to the ram on the box anyhow before its committed to the disk. With regards to 10k disks, be careful there as more modern higher platter capacity 7k disks might give better throughput due to the higher data density. 
On 22 June 2015 at 13:13, wrote: > On Mon, Jun 22, 2015 at 04:14:55AM -0400, Quartz wrote: > > What's sequential write performance like these days for ZFS raidzX? > > Someone suggested to me that I set up a single not-raid disk to act as a > > fast 'landing pad' for receiving files, then move them to the pool later > > in the background. Is that actually necessary? (Assume generic sata > > drives, 250mb-4gb sized files, and transfers are across a LAN using > > single unbonded GigE). > > Tests were posted to ZFS lists a few years ago. That was a while ago, but > at a fundamental level ZFS hasn't changed since then so the results should > still be valid. > > For both reads and writes all levels of raidz* perform slightly faster > than the speed of a single drive. _Slightly_ faster, like, the speed of > a single drive * 1.1 or so roughly speaking. > > For mirrors, writes perform about the same as a single drive, and as more > drives are added they get slightly worse. But reads scale pretty well as > you add drives because reads can be spread across all the drives in the > mirror in parallel. > > Having multiple vdevs helps because ZFS does striping across the vdevs. > However, this striping only happens with writes that are done _after_ new > vdevs are added. There is no rebalancing of data after new vdevs are added. > So adding new vdevs won't change the read performance of data already on > disk. > > ZFS does try to strip across vdevs, but if your old vdevs are nearly full > then adding new ones results in data mostly going to the new, nearly empty > vdevs. So if you only added a single new vdev to expand the pool then > you'll see write performance roughly equal to the performance of that > single vdev. > > Rebalancing can be done roughly with "zfs send | zfs receive". If you do > this enough times, and destroy old, sent datasets after an iteration, then > you can to some extent rebalance a pool. You won't achieve a perfect > rebalance, though. > > We can thank Oracle for the destruction of the archives at sun.com which > made it pretty darn difficult to find those posts. > > Finally, single GigE is _slow_. I see no point in a "landing pad" when > using unbonded GigE. > > -- > Kevin P. Neal http://www.pobox.com/~kpn/ > > Seen on bottom of IBM part number 1887724: > DO NOT EXPOSE MOUSE PAD TO DIRECT SUNLIGHT FOR EXTENDED PERIODS OF TIME. 
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 12:30:35 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3979F5B8 for ; Mon, 22 Jun 2015 12:30:35 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from hub.freebsd.org (hub.freebsd.org [8.8.178.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hub.freebsd.org", Issuer "hub.freebsd.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 1A4FFBF6 for ; Mon, 22 Jun 2015 12:30:35 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: by hub.freebsd.org (Postfix) id 0FA475B7; Mon, 22 Jun 2015 12:30:35 +0000 (UTC) Delivered-To: fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0ED265B6 for ; Mon, 22 Jun 2015 12:30:35 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C718CBF4 for ; Mon, 22 Jun 2015 12:30:34 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (unknown [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id B8E3516A403; Mon, 22 Jun 2015 14:30:29 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id rV8O8kcg-_kh; Mon, 22 Jun 2015 14:30:02 +0200 (CEST) Received: from [192.168.101.176] (vpn.ecoracks.nl [31.223.170.173]) by smtp.digiware.nl (Postfix) with ESMTPA id 0AFAB16A401; Mon, 22 Jun 2015 14:30:02 +0200 (CEST) Message-ID: <5587FFCC.3080100@digiware.nl> Date: Mon, 22 Jun 2015 14:30:04 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Quartz , Michelle Sullivan CC: fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS References: <5585767B.4000206@digiware.nl> <5587236A.6020404@sneakertech.com> <558769B5.601@sorbs.net> <55877393.3040704@sneakertech.com> In-Reply-To: <55877393.3040704@sneakertech.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 12:30:35 -0000 On 22/06/2015 04:31, Quartz wrote: >>> You have a raidz2, which means THREE disks need to go down before the >>> pool is unwritable. The problem is most likely your controller or >>> power supply, not your disks. >>> >> Never make such assumptions... >> >> I have worked in a professional environment where 9 of 12 disks failed >> within 24 hours of each other.... > > Right... 
but if that was his problem there should be some logs of the > other drives going down first, and typically ZFS would correctly mark > the pool as degraded (at least, it would in my testing). The fact that > ZFS didn't get a chance to log anything and the pool came back up > healthy leads me to believe the controller went south, taking several > disks with it all at once and totally borking all IO. (Either that or > what Tom Curry mentioned about the Arc issue, which I wasn't previously > aware of). > > Of course, if it issue isn't repeatable then who knows.... I do not think it was a full out failure, but just one transaction that got hit by an alpha-particle... Well, remember that the hung-diagnostics timeout is 1000 sec. In the time-span before the panic nothing else was logged about disks/controllers/etc... not functioning.. Only the few secs before the panic ctl/iSCSI and the network interface started complaining that the was a memory shortage and the networkinterafce started dumping packets.... But all that was logged really nicely in syslog. So I think that in the 1000sec it took for the deadman switch to trigger, the zpool just functioned as was expected.... And the hardware somewhere lost one transaction. So I'll be crossing my fingers, and we'll see when/what/where the next crash in going to occur. And work from there.... --WjW From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 12:53:29 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 445E76F0 for ; Mon, 22 Jun 2015 12:53:29 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: from mail-wg0-f51.google.com (mail-wg0-f51.google.com [74.125.82.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D5C328D0 for ; Mon, 22 Jun 2015 12:53:28 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: by wguu7 with SMTP id u7so68475535wgu.3 for ; Mon, 22 Jun 2015 05:53:26 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-type :content-transfer-encoding; bh=+gWXM2SXFUDiF+0aE9RCIdvoancTM7m3gVVrfhTTINw=; b=ZA64jQLRMOnI8SgrnRI/OCrAgTYMmZnhyL93RjfEgdARMxcVgWWTil/yDc+f7XFgRI w5xh9DAL63LlRUkz79hxt/Vo4XT9db/uLs3FHkmlrekA/Wvdozlb7PSYshQRVauW1/Ii nZa5oJC89eTHk9pOuQ4x/yuVwrJCUGdTe6z8pUcW4OHO1EUlCBQ4wO3FZr7dWufkTcHO InZAO2qAQl1asKB++7mAGxyNQaEL+McyJMqxmtw38teGmo1kgv/WUWiEuP1bqKZqZtme O6YoihwxUF7sA0DIbLOYQSKVjfPQDvDVBIFAy/kNqvnX3zzU4x0JIiW9HR1m5sbFBmVm hEfg== X-Gm-Message-State: ALoCoQltwhKJ+g9Z6A7gu1sAuw89l9Loefwvyv7KU4HR2m9JEu0c+O4hGYSV9JqzNweJxc8lv28M X-Received: by 10.180.36.4 with SMTP id m4mr31616356wij.34.1434977606470; Mon, 22 Jun 2015 05:53:26 -0700 (PDT) Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk. [82.69.141.170]) by mx.google.com with ESMTPSA id hn7sm30422531wjc.16.2015.06.22.05.53.25 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 22 Jun 2015 05:53:25 -0700 (PDT) Subject: Re: ZFS raid write performance? 
To: freebsd-fs@freebsd.org References: <5587C3FF.9070407@sneakertech.com> <20150622121343.GB60684@neutralgood.org> From: Steven Hartland Message-ID: <55880544.70907@multiplay.co.uk> Date: Mon, 22 Jun 2015 13:53:24 +0100 User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Thunderbird/38.0.1 MIME-Version: 1.0 In-Reply-To: <20150622121343.GB60684@neutralgood.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 12:53:29 -0000 On 22/06/2015 13:13, kpneal@pobox.com wrote: > On Mon, Jun 22, 2015 at 04:14:55AM -0400, Quartz wrote: >> What's sequential write performance like these days for ZFS raidzX? >> Someone suggested to me that I set up a single not-raid disk to act as a >> fast 'landing pad' for receiving files, then move them to the pool later >> in the background. Is that actually necessary? (Assume generic sata >> drives, 250mb-4gb sized files, and transfers are across a LAN using >> single unbonded GigE). > Tests were posted to ZFS lists a few years ago. That was a while ago, but > at a fundamental level ZFS hasn't changed since then so the results should > still be valid. > > For both reads and writes all levels of raidz* perform slightly faster > than the speed of a single drive. _Slightly_ faster, like, the speed of > a single drive * 1.1 or so roughly speaking. > > For mirrors, writes perform about the same as a single drive, and as more > drives are added they get slightly worse. But reads scale pretty well as > you add drives because reads can be spread across all the drives in the > mirror in parallel. > > Having multiple vdevs helps because ZFS does striping across the vdevs. > However, this striping only happens with writes that are done _after_ new > vdevs are added. There is no rebalancing of data after new vdevs are added. > So adding new vdevs won't change the read performance of data already on > disk. > > ZFS does try to strip across vdevs, but if your old vdevs are nearly full > then adding new ones results in data mostly going to the new, nearly empty > vdevs. So if you only added a single new vdev to expand the pool then > you'll see write performance roughly equal to the performance of that > single vdev. > > Rebalancing can be done roughly with "zfs send | zfs receive". If you do > this enough times, and destroy old, sent datasets after an iteration, then > you can to some extent rebalance a pool. You won't achieve a perfect > rebalance, though. > > We can thank Oracle for the destruction of the archives at sun.com which > made it pretty darn difficult to find those posts. > > Finally, single GigE is _slow_. I see no point in a "landing pad" when > using unbonded GigE. > Actually it has had some significant changes which are likely to effect the results as it now has an entirely new IO scheduler, so retesting would be wise. 
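If you want a quick feel for it on your own hardware, something along these
lines is usually enough (a rough sketch only -- the pool name and sizes are
made up, and if compression is enabled on the dataset use random data rather
than /dev/zero, otherwise the numbers are meaningless):

    # sequential write, ~8 GB so it comfortably exceeds RAM/ARC
    dd if=/dev/zero of=/tank/ddtest bs=1m count=8192

    # sequential read; export/import the pool (or reboot) first so the
    # ARC doesn't just hand the file back from memory
    dd if=/tank/ddtest of=/dev/null bs=1m

    # benchmarks/iozone from ports gives more detail if you prefer:
    # iozone -s 8g -r 128k -i 0 -i 1 -f /tank/iozone.tmp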
Regards Steve From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 13:17:38 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 312E1804 for ; Mon, 22 Jun 2015 13:17:38 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id ECB842B1 for ; Mon, 22 Jun 2015 13:17:37 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id t5MDHZlJ004931; Mon, 22 Jun 2015 08:17:36 -0500 (CDT) Date: Mon, 22 Jun 2015 08:17:35 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Quartz cc: FreeBSD FS Subject: Re: ZFS raid write performance? In-Reply-To: <5587C3FF.9070407@sneakertech.com> Message-ID: References: <5587C3FF.9070407@sneakertech.com> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Mon, 22 Jun 2015 08:17:36 -0500 (CDT) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 13:17:38 -0000 On Mon, 22 Jun 2015, Quartz wrote: > What's sequential write performance like these days for ZFS raidzX? Someone > suggested to me that I set up a single not-raid disk to act as a fast > 'landing pad' for receiving files, then move them to the pool later in the > background. Is that actually necessary? (Assume generic sata drives, > 250mb-4gb sized files, and transfers are across a LAN using single unbonded > GigE). The primary determinant of write performance is if the writes are synchronous or not, With synchronous writes, the data is comitted to non-volatile storage before responding to the requestor. With asyncronous writes, the data only needs to be written into RAM before responding to the requestor. Writes over NFS 3 are synchronous. Writes over CIFS/Samba are likely not. For good performance with synchronous writes, some sort of non-volatile write cache (e.g. dedicated zfs intent log "slog", controller NVRAM) is advised. Use multiple sets of mirrors for maximum write performance with multiple clients. Even 10 years old hardware should be able to keep up with gigabit Ethernet rates (< 100MB/s) given a reasonable disk subsystem. 
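If you want to check whether synchronous writes are what is limiting you,
something along these lines works (sketch only; "tank/dump" and "gpt/slog0"
are placeholders for your dataset and log device):

    # per-vdev I/O while a client is writing
    zpool iostat -v tank 1

    # current sync policy (standard / always / disabled)
    zfs get sync tank/dump

    # quick -- and unsafe for data you care about -- way to see if sync
    # writes are the bottleneck: disable them, re-test, then put it back
    zfs set sync=disabled tank/dump
    zfs set sync=standard tank/dump

    # if that made the difference, a dedicated slog is the safe fix,
    # ideally a power-loss-protected SSD
    zpool add tank log gpt/slog0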
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 14:40:32 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B79F4CC9; Mon, 22 Jun 2015 14:40:32 +0000 (UTC) (envelope-from ler@lerctr.org) Received: from thebighonker.lerctr.org (thebighonker.lerctr.org [IPv6:2001:470:1f0f:3ad:223:7dff:fe9e:6e8a]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "thebighonker.lerctr.org", Issuer "COMODO RSA Domain Validation Secure Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 35D5DF75; Mon, 22 Jun 2015 14:40:32 +0000 (UTC) (envelope-from ler@lerctr.org) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lerctr.org; s=lerami; h=Message-ID:References:In-Reply-To:Subject:Cc:To:From:Date:Content-Transfer-Encoding:Content-Type:MIME-Version; bh=Yi41SDOQERjINs7dO92ot8H+LI5VBrnGPXNmmPBB1UY=; b=FGf2pUx52k7tgYZPoZfvQrZgxCwk9Zu7dYgMLNC/xc7VBERSasW77qovs9b5ZBbLCC+IOVr+8w/S+C2c2ZCTkNNZnd7ww5u0hSY6fKhvLsq+cOVLNn4r5iHf/ry81NMfWioV21oirGrGYES26wFAxF6wst9bmWXtpA4kw51KH/U=; Received: from thebighonker.lerctr.org ([2001:470:1f0f:3ad:223:7dff:fe9e:6e8a]:43395 helo=webmail.lerctr.org) by thebighonker.lerctr.org with esmtpsa (TLSv1:DHE-RSA-AES128-SHA:128) (Exim 4.85 (FreeBSD)) (envelope-from ) id 1Z72th-0005w4-8N; Mon, 22 Jun 2015 09:40:29 -0500 Received: from 104-54-221-134.lightspeed.austtx.sbcglobal.net ([104.54.221.134]) by webmail.lerctr.org with HTTP (HTTP/1.1 POST); Mon, 22 Jun 2015 09:40:29 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Mon, 22 Jun 2015 09:40:29 -0500 From: Larry Rosenman To: Rick Macklem Cc: Freebsd fs , rmacklem@freebsd.org, Freebsd current Subject: Re: NFS Mount and LARGE amounts of "INACT" memory In-Reply-To: <7f8b3449973cff790d996bb1f169b8e0@thebighonker.lerctr.org> References: <228350188.61172889.1434758295576.JavaMail.root@uoguelph.ca> <7f8b3449973cff790d996bb1f169b8e0@thebighonker.lerctr.org> Message-ID: <06abcbf4fab73f3c0ba711269934e0ea@thebighonker.lerctr.org> X-Sender: ler@lerctr.org User-Agent: Roundcube Webmail/1.1.1 X-Spam-Score: -1.0 (-) X-LERCTR-Spam-Score: -1.0 (-) X-Spam-Report: SpamScore (-1.0/5.0) ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 X-LERCTR-Spam-Report: SpamScore (-1.0/5.0) ALL_TRUSTED=-1, SHORTCIRCUIT=-0.0001 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 14:40:32 -0000 On 2015-06-19 19:30, Larry Rosenman wrote: > On 2015-06-19 19:00, Larry Rosenman wrote: >> On 2015-06-19 18:58, Rick Macklem wrote: >>> Larry Rosenman wrote: >>>> On 2015-06-17 07:26, Larry Rosenman wrote: >>>> > I have a 64G memory FreeBSD 11-CURRENT system that has a couple of >>>> > mounts to a FreeNAS (FreeBSD 9.3) system. >>>> > >>>> > When my rsync from a different system to one of the NFS mounts >>>> > runs, I >>>> > get like 48G of Inactive memory that goes back to >>>> > free if I umount the share. >>>> > >>>> > I'm wondering why this memory moves from ZFS ARC to INACT. >>>> > >>>> > And, is this expected? >>> A wild ass guess would be yes. 
Assuming you are referring to the NFS >>> client (and not FreeNAS server) and guessing that rsync uses mmap'd >>> I/O... >>> - The pages will be associated with the file's vnode until that vnode >>> is recycled. (mmap'd I/O can continue after the file is closed.) >>> This could take a long time. >>> I am not knowledgible w.r.t. the VM subsystem, but I'm guessing that >>> there is some way for these pages to be reused if memory is limited? >>> (Hopefully someone with VM knowledge can comment on this?) >>> >> Yes, this is the NFS Client, not sure on mmap(2), but that would make >> sense >> >> BUT, I don't like that it kills my ZFS ARC.... >> >> VM Guys? >> > BTW, a quick grep if the rsync sources shows it does NOT use mmap, but > has some mmap-like routines, > so I'm at a loss.... Adding in -CURRENT for the VM guys..... > >>> rick >>> >>>> I've posted screenshots at: >>>> >>>> http://www.lerctr.org/~ler/FreeBSD_inact/ >>>> >>>> >>>> -- >>>> Larry Rosenman http://www.lerctr.org/~ler >>>> Phone: +1 214-642-9640 E-Mail: ler@lerctr.org >>>> US Mail: 108 Turvey Cove, Hutto, TX 78634-5688 >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> To unsubscribe, send any mail to >>>> "freebsd-fs-unsubscribe@freebsd.org" >>>> -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: ler@lerctr.org US Mail: 108 Turvey Cove, Hutto, TX 78634-5688 From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 15:50:55 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9DE6E1DB for ; Mon, 22 Jun 2015 15:50:55 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:1900:2254:206c::16:88]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hub.freebsd.org", Issuer "hub.freebsd.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 7F2A4625 for ; Mon, 22 Jun 2015 15:50:55 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: by hub.freebsd.org (Postfix) id 74C031DA; Mon, 22 Jun 2015 15:50:55 +0000 (UTC) Delivered-To: fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 73E0F1D9 for ; Mon, 22 Jun 2015 15:50:55 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (smtp.digiware.nl [31.223.170.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3512A623 for ; Mon, 22 Jun 2015 15:50:54 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (unknown [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id 11BA916A401; Mon, 22 Jun 2015 17:50:51 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id WXS0Lf198po6; Mon, 22 Jun 2015 17:50:23 +0200 (CEST) Received: from [192.168.101.176] (vpn.ecoracks.nl [31.223.170.173]) by smtp.digiware.nl (Postfix) with ESMTPA id E44FB16A402; Mon, 22 Jun 2015 17:41:17 +0200 (CEST) Message-ID: <55882C9F.8020507@digiware.nl> Date: 
Mon, 22 Jun 2015 17:41:19 +0200 From: Willem Jan Withagen User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Michelle Sullivan , Quartz CC: fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS References: <5585767B.4000206@digiware.nl> <5587236A.6020404@sneakertech.com> <558769B5.601@sorbs.net> In-Reply-To: <558769B5.601@sorbs.net> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 15:50:55 -0000 On 22/06/2015 03:49, Michelle Sullivan wrote: > Quartz wrote: >> Also: >> >>> And thus I'd would have expected that ZFS would disconnect /dev/da0 and >>> then switch to DEGRADED state and continue, letting the operator fix the >>> broken disk. >> >>> Next question to answer is why this WD RED on: >> >>> got hung, and nothing for this shows in SMART.... >> >> You have a raidz2, which means THREE disks need to go down before the >> pool is unwritable. The problem is most likely your controller or >> power supply, not your disks. >> > Never make such assumptions... > > I have worked in a professional environment where 9 of 12 disks failed > within 24 hours of each other.... They were all supposed to be from > different batches but due to an error they came from the same batch and > the environment was so tightly controlled and the work-load was so > similar that MTBF was almost identical on all 11 disks in the array... > the only disk that lasted more than 2 weeks over the failure was the > hotspare...! > Scary (non)-statistics.... Theories are always nice, but this sort of experiences make your hair go grey overnight. 
--WjW From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 16:04:53 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9BA7B2E9 for ; Mon, 22 Jun 2015 16:04:53 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: from mail-wg0-f49.google.com (mail-wg0-f49.google.com [74.125.82.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 38A50D90 for ; Mon, 22 Jun 2015 16:04:52 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: by wguu7 with SMTP id u7so73463155wgu.3 for ; Mon, 22 Jun 2015 09:04:45 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-type :content-transfer-encoding; bh=nKSgCEWwSNgLooc2Y/DvM1oukiioE6h8hy1ROiLHXJk=; b=hr31UO5j5gWk6gqVDjB631G/j+Whhlz1jO5sXUCJoVWklPhCwaxFQ4tQDqG4Xv1amV QHr8iN3pMlskaiLdq+iqpEEh7kDVt++s2ESa/baw/Y8Jebo96TR4EROpkp6ZRwRN/erN m300+vF8DBppFBOOyuv4hsZ7rkl3YLtzVDbhimKX78eK7qKHpmM5v2ydbZeXx9FmRpd+ DKDW0A1PNHwt0lB00PLDrT4d4hcNmwLuPXZ3FScq4mATD+IlVkIYA23l1vtCB/2Q31Te IT4TJMSuMbxAmCDD2et3VRy4XfZVszSGpiKcPYXt4bjYK0db2FfAoqLKX8BWjJUJZmGy /alg== X-Gm-Message-State: ALoCoQmsC8i+sFc0uSIzO4MuPxOXg0Tvl8c/I2qrK1yFRKtuAVEbB+DfAHeuj/sPIuqNJujgz27T X-Received: by 10.194.109.36 with SMTP id hp4mr51614154wjb.4.1434989085051; Mon, 22 Jun 2015 09:04:45 -0700 (PDT) Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk. [82.69.141.170]) by mx.google.com with ESMTPSA id fo13sm17870049wic.0.2015.06.22.09.04.43 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 22 Jun 2015 09:04:43 -0700 (PDT) Subject: Re: ZFS raid write performance? To: kpneal@pobox.com References: <5587C3FF.9070407@sneakertech.com> <20150622121343.GB60684@neutralgood.org> <55880544.70907@multiplay.co.uk> <20150622153056.GA96798@neutralgood.org> Cc: freebsd-fs@freebsd.org From: Steven Hartland Message-ID: <5588321A.4060102@multiplay.co.uk> Date: Mon, 22 Jun 2015 17:04:42 +0100 User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Thunderbird/38.0.1 MIME-Version: 1.0 In-Reply-To: <20150622153056.GA96798@neutralgood.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 16:04:53 -0000 On 22/06/2015 16:30, kpneal@pobox.com wrote: > On Mon, Jun 22, 2015 at 01:53:24PM +0100, Steven Hartland wrote: >> On 22/06/2015 13:13, kpneal@pobox.com wrote: >>> On Mon, Jun 22, 2015 at 04:14:55AM -0400, Quartz wrote: >>>> What's sequential write performance like these days for ZFS raidzX? >>>> Someone suggested to me that I set up a single not-raid disk to act as a >>>> fast 'landing pad' for receiving files, then move them to the pool later >>>> in the background. Is that actually necessary? (Assume generic sata >>>> drives, 250mb-4gb sized files, and transfers are across a LAN using >>>> single unbonded GigE). >>> Tests were posted to ZFS lists a few years ago. 
That was a while ago, but >>> at a fundamental level ZFS hasn't changed since then so the results should >>> still be valid. >>> >>> For both reads and writes all levels of raidz* perform slightly faster >>> than the speed of a single drive. _Slightly_ faster, like, the speed of >>> a single drive * 1.1 or so roughly speaking. >>> >>> For mirrors, writes perform about the same as a single drive, and as more >>> drives are added they get slightly worse. But reads scale pretty well as >>> you add drives because reads can be spread across all the drives in the >>> mirror in parallel. >>> >>> Having multiple vdevs helps because ZFS does striping across the vdevs. >>> However, this striping only happens with writes that are done _after_ new >>> vdevs are added. There is no rebalancing of data after new vdevs are added. >>> So adding new vdevs won't change the read performance of data already on >>> disk. >>> >>> ZFS does try to strip across vdevs, but if your old vdevs are nearly full >>> then adding new ones results in data mostly going to the new, nearly empty >>> vdevs. So if you only added a single new vdev to expand the pool then >>> you'll see write performance roughly equal to the performance of that >>> single vdev. >>> >>> Rebalancing can be done roughly with "zfs send | zfs receive". If you do >>> this enough times, and destroy old, sent datasets after an iteration, then >>> you can to some extent rebalance a pool. You won't achieve a perfect >>> rebalance, though. >>> >>> We can thank Oracle for the destruction of the archives at sun.com which >>> made it pretty darn difficult to find those posts. >>> >>> Finally, single GigE is _slow_. I see no point in a "landing pad" when >>> using unbonded GigE. >>> >> Actually it has had some significant changes which are likely to effect >> the results as it now has >> an entirely new IO scheduler, so retesting would be wise. > And this affects which parts of my post? > > Reading and writing to a raidz* requires touching all or almost all of > the disks. > > Writing to a mirror requires touching all the disks. Reading from a mirror > requires touching one disk. Yes however if you get say a 10% improvement on scheduling said writes / reads then the overall impact will be noticeable. > That hasn't changed. I'm skeptical that a new way of doing the same thing > would change the results that much, especially for a large stream of > data. > > I can see a new I/O scheduler being more _fair_, but that only applies > when the box has multiple things going on. A concrete example for mirrors will performance when dealing with 3 readers demonstrated an increased in throughput from 168MB/s to 320MB/s with prefetch and without prefetch that was 95MB/s increased to 284MB/s in our testing, so significant differences. 
This is a rather extreme example, but there's never any harm in re-testing to avoid using incorrect assumptions ;-) Regards Steve From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 18:58:23 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 08A01F12 for ; Mon, 22 Jun 2015 18:58:23 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C5625339 for ; Mon, 22 Jun 2015 18:58:22 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id t5MIwJar014990; Mon, 22 Jun 2015 13:58:20 -0500 (CDT) Date: Mon, 22 Jun 2015 13:58:19 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: kpneal@pobox.com cc: freebsd-fs@freebsd.org Subject: Re: ZFS raid write performance? In-Reply-To: <20150622153056.GA96798@neutralgood.org> Message-ID: References: <5587C3FF.9070407@sneakertech.com> <20150622121343.GB60684@neutralgood.org> <55880544.70907@multiplay.co.uk> <20150622153056.GA96798@neutralgood.org> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Mon, 22 Jun 2015 13:58:20 -0500 (CDT) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 18:58:23 -0000 On Mon, 22 Jun 2015, kpneal@pobox.com wrote: > Reading and writing to a raidz* requires touching all or almost all of > the disks. > > Writing to a mirror requires touching all the disks. Reading from a mirror > requires touching one disk. Keep in mind that for the same number of disks, using mirrors results in more vdevs and less use of precious IOPS. Also, using mirrors results in larger I/O requests since zfs blocks don't need to be fragmented. 
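To put that in concrete terms (hypothetical device names, six disks either
way):

    # one raidz2 vdev: capacity of 4 disks, but only one vdev's worth of IOPS
    zpool create tank raidz2 da0 da1 da2 da3 da4 da5

    # three mirror vdevs: capacity of 3 disks, roughly three vdevs' worth of IOPS
    zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5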
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 20:46:59 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6B3DF723 for ; Mon, 22 Jun 2015 20:46:59 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 46E271C8 for ; Mon, 22 Jun 2015 20:46:58 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id 903A63F732; Mon, 22 Jun 2015 16:46:57 -0400 (EDT) Message-ID: <55887441.70605@sneakertech.com> Date: Mon, 22 Jun 2015 16:46:57 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Todd Russell CC: FreeBSD FS Subject: Re: ZFS pool restructuring and emergency repair References: <5584C0BC.9070707@sneakertech.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 20:46:59 -0000 > I hate to jump in and be "that guy" but, seriously, if you are using > this for something crucial, are you really going to risk it all with "an > old 500GB"? Drives aren't so expensive that you can't afford to buy a > spare match to keep on the side until such a day occurs. That's a fair point, but the question here is "catastrophic emergency" that takes out all of your spares and you have to limp by on something you found under the couch for a day or two until you can get new drives in. You can imagine whatever contrived situation is most likely in your case. My main concern is that ZFS is just kinda inflexible about some things, (especially disk/pool configurations) and that has the potential to cause real problems in some situations. I've seen a lot of things happen against the odds through the years, so I like to plan ahead as much as possible and try to figure out what my options are for mitigating those risks. Part of that means periodically asking around to see what's changed that I might have missed. 
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 21:02:23 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5C5B19C2 for ; Mon, 22 Jun 2015 21:02:23 +0000 (UTC) (envelope-from alex.burlyga.ietf@gmail.com) Received: from mail-yk0-x233.google.com (mail-yk0-x233.google.com [IPv6:2607:f8b0:4002:c07::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 17760FE9 for ; Mon, 22 Jun 2015 21:02:23 +0000 (UTC) (envelope-from alex.burlyga.ietf@gmail.com) Received: by ykfy125 with SMTP id y125so22371148ykf.1 for ; Mon, 22 Jun 2015 14:02:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=BvHgdACUUfkUJRRFCYXCAPPQfz+knpZwZkYbMrBS34g=; b=TQ964hBhA15gIHnC3fVQM1HX/3iJGUSxVzd5xpQ854Js7S6OsMm3o8DZavUOscoAMU snvOYePogNLJAXPlCx5/3fNcsuWjIXXnF0GjPoBzbrCiKkjySuVf4iNjERf5ApXc6fYR w9QKjW6/Hs5WmZwlQ+7JSblE6qO5jLvTuaqJgFo0dTFnLpZGJ8BQ2415Zv/0ySIHXxKt dJrfKKyC5CL/ENUDepBSf1rq09mP7vf+NMRMq3Z2tYjzP+ZCj0V8TuFmQQfaHi2AMiHN aePS4JimbQHKkiABvH74mqj35bKnb8fvH6NCokkJo7ZrHobh+HYK+qm8OjqSn24hKGSr 7uoA== MIME-Version: 1.0 X-Received: by 10.170.223.131 with SMTP id p125mr38768155ykf.47.1435006942126; Mon, 22 Jun 2015 14:02:22 -0700 (PDT) Received: by 10.13.244.65 with HTTP; Mon, 22 Jun 2015 14:02:22 -0700 (PDT) In-Reply-To: <1969046464.61534041.1434897034960.JavaMail.root@uoguelph.ca> References: <1969046464.61534041.1434897034960.JavaMail.root@uoguelph.ca> Date: Mon, 22 Jun 2015 14:02:22 -0700 Message-ID: Subject: Re: [nfs][client] - Question about handling of the NFS3_EEXIST error in SYMLINK rpc From: "alex.burlyga.ietf alex.burlyga.ietf" To: Rick Macklem Cc: freebsd-fs Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 21:02:23 -0000 Rick, Thank you for a quick turn around, see answers inline: On Sun, Jun 21, 2015 at 7:30 AM, Rick Macklem wrote: > Alex Burlyga wrote: >> Hi, >> >> NFS client code in nfsrpc_symlink() masks server returned NFS3_EEXIST >> error >> code >> by returning 0 to the upper layers. I'm assuming this was an attempt >> to >> work around >> some server's broken replay cache out there, however, it breaks a >> more >> common >> case where server is returning EEXIST for legitimate reason and >> application >> is expecting this error code and equipped to deal with it. >> >> To fix it I see three ways of doing this: >> * Remove offending code >> * Make it optional, sysctl? >> * On NFS3_EEXIST send READLINK rpc to make sure symlink content is >> right >> >> Which of the ways will maximize the chances of getting this fix >> upstream? >> > I've attached a patch for testing/review that does essentially #2. > It has no effect on trivial tests, since the syscall does a Lookup > before trying to create the symlink and fails with EEXIST. > Do you have a case where competing clients are trying to create > the symlink or something like that, which runs into this? That's exactly failing test case we are running into. 
> > Please test the attached patch, since I don't know how to do that, rick Great! I'll test it. I was leaning towards option 3 for SYMLINK and option 2 for MKDIR. This will work. Thanks for taking your time to generate the patch! > >> One more point, old client circa FreeBSD 7.0 does not exhibit this >> problem. >> >> Alex >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 21:03:14 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 59B78A1C for ; Mon, 22 Jun 2015 21:03:14 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3838FFC for ; Mon, 22 Jun 2015 21:03:14 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id 237F93F6E8; Mon, 22 Jun 2015 17:03:13 -0400 (EDT) Message-ID: <55887810.3080301@sneakertech.com> Date: Mon, 22 Jun 2015 17:03:12 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Xin Li CC: FreeBSD FS Subject: Re: ZFS raid write performance? References: <5587C3FF.9070407@sneakertech.com> <5587C97F.2000407@delphij.net> In-Reply-To: <5587C97F.2000407@delphij.net> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 21:03:14 -0000 >> What's sequential write performance like these days for ZFS >> raidzX? Someone suggested to me that I set up a single not-raid >> disk to act as a fast 'landing pad' for receiving files, then move >> them to the pool later in the background. Is that actually >> necessary? (Assume generic sata drives, 250mb-4gb sized files, and >> transfers are across a LAN using single unbonded GigE). > > That sounds really weird recommendation IMHO. Did "someone" explained > with the reasoning/benefit of that "landing pad"? Sort of. Something about the checksum calculations causing too much overhead. I think they were confused about sequential write vs random write, and possibly mdadm vs zfs. It was just something mentioned in passing that I didn't want to start a debate about at the time, since I wasn't 100% sure. >a single hard drive won't do much beyond 100MB/s (maybe > 120MB/s max) for sequential 128kB blocks, so that "landing pad" would > probably not very helpful assuming you can saturate your GigE network Wait, I'm confused. A single GigE has a theoretical max of like 100mb/sec. That would imply the drive is probably about the same speed? 
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 21:10:54 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8DD00A95 for ; Mon, 22 Jun 2015 21:10:54 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6B8EB349 for ; Mon, 22 Jun 2015 21:10:54 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id 8D7A43F725 for ; Mon, 22 Jun 2015 17:10:53 -0400 (EDT) Message-ID: <558879DD.2090005@sneakertech.com> Date: Mon, 22 Jun 2015 17:10:53 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: FreeBSD FS Subject: Re: ZFS raid write performance? References: <5587C3FF.9070407@sneakertech.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 21:10:54 -0000 > Writes over NFS 3 are synchronous. Writes over CIFS/Samba are likely > not. > write performance with multiple > clients. > Finally, single GigE is _slow_. I realize I've left out some possibly critical information. This box is a dump space that needs to receive files from widely mixed "clients" (*nix/Win/Mac/desktop/laptop/etc) across a LAN, so the file share software is Samba and the client machines will be connecting with (at most) a single GigE. Dunno if that changes anything. From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 21:15:40 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2D7A0B0B for ; Mon, 22 Jun 2015 21:15:40 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0A5E28D7 for ; Mon, 22 Jun 2015 21:15:39 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id 8FBEA3F740 for ; Mon, 22 Jun 2015 17:15:38 -0400 (EDT) Message-ID: <55887AFA.30101@sneakertech.com> Date: Mon, 22 Jun 2015 17:15:38 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: FreeBSD FS Subject: Re: ZFS raid write performance? 
References: <5587C3FF.9070407@sneakertech.com> <558879DD.2090005@sneakertech.com> In-Reply-To: <558879DD.2090005@sneakertech.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 21:15:40 -0000 >> Writes over NFS 3 are synchronous. Writes over CIFS/Samba are likely >> not. > >> write performance with multiple >> clients. > >> Finally, single GigE is _slow_. > > > I realize I've left out some possibly critical information. > > This box is a dump space that needs to receive files from widely mixed > "clients" (*nix/Win/Mac/desktop/laptop/etc) across a LAN, so the file > share software is Samba and the client machines will be connecting with > (at most) a single GigE. ... and, these files will not need to be read or copied back off the server for a few days. From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 21:19:37 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A8BFBB5E for ; Mon, 22 Jun 2015 21:19:37 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 83895A62 for ; Mon, 22 Jun 2015 21:19:37 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id CEA843F740 for ; Mon, 22 Jun 2015 17:19:36 -0400 (EDT) Message-ID: <55887BE8.2090305@sneakertech.com> Date: Mon, 22 Jun 2015 17:19:36 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Freebsd fs Subject: Re: ZFS pool restructuring and emergency repair References: <5584C0BC.9070707@sneakertech.com> <5587BC96.9090601@sneakertech.com> <20150622115856.GA60684@neutralgood.org> In-Reply-To: <20150622115856.GA60684@neutralgood.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 21:19:37 -0000 >> So I take it that, aside from messing with a gvirstor/ sparse disk >> image, there's still no way to really handle this because there's still >> no way to shrink a pool after creation? > > Correct. There's no way to shrink a pool ever. Drat, that's what I thought. Oh well. 
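(The only workaround I'm aware of is still the brute-force one: replicate
into a freshly created smaller pool and swap them over. Roughly the sketch
below, with invented pool/snapshot names -- it obviously needs enough spare
disks and a maintenance window:

    zfs snapshot -r bigpool@migrate
    zfs send -R bigpool@migrate | zfs receive -F smallpool
    # verify the copy and point clients at the new pool, then
    zpool destroy bigpool

Not exactly shrinking in place, but it gets you there.)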
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 21:46:46 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C8224DDE for ; Mon, 22 Jun 2015 21:46:46 +0000 (UTC) (envelope-from m.seaman@infracaninophile.co.uk) Received: from smtp.infracaninophile.co.uk (smtp.infracaninophile.co.uk [IPv6:2001:8b0:151:1:3cd3:cd67:fafa:3d78]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.infracaninophile.co.uk", Issuer "infracaninophile.co.uk" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 69E56C12 for ; Mon, 22 Jun 2015 21:46:46 +0000 (UTC) (envelope-from m.seaman@infracaninophile.co.uk) Received: from liminal.local ([192.168.100.2]) (authenticated bits=0) by smtp.infracaninophile.co.uk (8.15.1/8.15.1) with ESMTPSA id t5MLkc10092529 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO) for ; Mon, 22 Jun 2015 22:46:38 +0100 (BST) (envelope-from m.seaman@infracaninophile.co.uk) Authentication-Results: smtp.infracaninophile.co.uk; dmarc=none header.from=infracaninophile.co.uk DKIM-Filter: OpenDKIM Filter v2.9.2 smtp.infracaninophile.co.uk t5MLkc10092529 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=infracaninophile.co.uk; s=201001-infracaninophile; t=1435009598; bh=R60+BhHbggf+N0YBN5nZ9fdo2f/+NeGA4scLZYXrB3Y=; h=Date:From:To:Subject:References:In-Reply-To; z=Date:=20Mon,=2022=20Jun=202015=2022:46:29=20+0100|From:=20Matthew =20Seaman=20|To:=20freebsd-fs@fre ebsd.org|Subject:=20Re:=20ZFS=20pool=20restructuring=20and=20emerg ency=20repair|References:=20<5584C0BC.9070707@sneakertech.com>=20< 5587BC96.9090601@sneakertech.com>=20<20150622115856.GA60684@neutra lgood.org>=20<55887BE8.2090305@sneakertech.com>|In-Reply-To:=20<55 887BE8.2090305@sneakertech.com>; b=gRoYfF7VNm2IzDC6eAXeq9Azgnrm2kL25dCWYKHkzP5x8+yRwefcWIKpyLW7K8jnE yx9rW6rA5WU1xBNsyhM3w6aBjtCaEwKDqOp6tgQ6kqGSJyJ+m30M5OX51Uu7N0JC5Z y8IAUkt2gFS+smZdZOU2uQmZn2WHpoSERyOKPGH0= X-Authentication-Warning: lucid-nonsense.infracaninophile.co.uk: Host [192.168.100.2] claimed to be liminal.local Message-ID: <55888235.5000100@infracaninophile.co.uk> Date: Mon, 22 Jun 2015 22:46:29 +0100 From: Matthew Seaman User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: ZFS pool restructuring and emergency repair References: <5584C0BC.9070707@sneakertech.com> <5587BC96.9090601@sneakertech.com> <20150622115856.GA60684@neutralgood.org> <55887BE8.2090305@sneakertech.com> In-Reply-To: <55887BE8.2090305@sneakertech.com> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="9DMiUK28hUiFvVtASlkbdsOo5nBiBRee5" X-Virus-Scanned: clamav-milter 0.98.7 at lucid-nonsense.infracaninophile.co.uk X-Virus-Status: Clean X-Spam-Status: No, score=-1.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on lucid-nonsense.infracaninophile.co.uk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 21:46:47 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) 
--9DMiUK28hUiFvVtASlkbdsOo5nBiBRee5 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 22/06/2015 22:19, Quartz wrote: >>> So I take it that, aside from messing with a gvirstor/ sparse disk >>> image, there's still no way to really handle this because there's sti= ll >>> no way to shrink a pool after creation? >> >> Correct. There's no way to shrink a pool ever. >=20 > Drat, that's what I thought. Oh well. Although in one of Matt Ahrens talks at BSDCan he spoke of plans to change this. Essentially you'ld be able to offline a vdev, and a background process (like scrub) would copy all the data blocks from that device to elsewhere in the pool. Once finished, the devices making up the vdev could be physically removed. Cheers, Matthew --9DMiUK28hUiFvVtASlkbdsOo5nBiBRee5 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.20 (Darwin) iQJ8BAEBCgBmBQJViII7XxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQ2NTNBNjhCOTEzQTRFNkNGM0UxRTEzMjZC QjIzQUY1MThFMUE0MDEzAAoJELsjr1GOGkATGHIP/2TlXQYW+nZbX3t1Sm7evmC7 zxN48bsUk3j1vIIR9X0OVvDcfAuuVAq5gxvBisteBOV3jJCEvIofjmbxx+4bEkDe NLg1hp4tvqKyoFEDrwq/pOzorgKCd8JOXEKXIvNthTRKM4LLkZUebQ09yIykSYy1 ldSs/5YPL3taN/L9aTs+ibuS+FIpCdprZ7qhm9o434KkuagIo4GwqOM/kd0fzpAg m7uxIytfw7mtDydCGDJ+tDjjcPEnToNkd2Xkl6QyEfpG3oUHpaqsZZuIDgRDlIY7 9RbUcSWym8cLqjpxYmeQbxLdCmNaxuhTZARiFx33N5oD0C7btce8A5+YuVzHu/0n YO1ETXvTgHy0C96wLsd/jx22rROvRIB79YoY28nZfrKB6l4pAJuwGFQfx9oeJBjT NQ8NdoLFlGmvhcQ4L66fEbeYDvnG1m64UpbvYeiNKX3NkNjcBV4NrIRbSRds79t9 +9reSjshk0bht0AfWSiABJeikzXan/JAoDV+4P3WvIbdFRcADD0dmkxrnQRPdrv0 P1y9ksh1WDNN95mofQ0U+i/UZuB9lsX42ciVV1/JL2VzNoP5oxF3qYX6V5IFAqjR hxF3Beui0Ut2vmfBl5HeHW5eHtwzaq4RFStrYXmwVP7a4TVTcnxIfLemV4I9PnxS JsgD47/wW3vuBoN0ZjqI =W/cs -----END PGP SIGNATURE----- --9DMiUK28hUiFvVtASlkbdsOo5nBiBRee5-- From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 21:53:19 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 713D2E4E for ; Mon, 22 Jun 2015 21:53:19 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4E2F6FAC for ; Mon, 22 Jun 2015 21:53:18 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id B6C3C3F753; Mon, 22 Jun 2015 17:53:17 -0400 (EDT) Message-ID: <558883CD.3080006@sneakertech.com> Date: Mon, 22 Jun 2015 17:53:17 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Matthew Seaman CC: freebsd-fs@freebsd.org Subject: Re: ZFS pool restructuring and emergency repair References: <5584C0BC.9070707@sneakertech.com> <5587BC96.9090601@sneakertech.com> <20150622115856.GA60684@neutralgood.org> <55887BE8.2090305@sneakertech.com> <55888235.5000100@infracaninophile.co.uk> In-Reply-To: <55888235.5000100@infracaninophile.co.uk> Content-Type: text/plain; charset=windows-1252; format=flowed 
Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 21:53:19 -0000 > Although in one of Matt Ahrens talks at BSDCan he spoke of plans to > change this. Essentially you'ld be able to offline a vdev, and a > background process (like scrub) would copy all the data blocks from that > device to elsewhere in the pool. Once finished, the devices making up > the vdev could be physically removed. Oh, that would be nice. Was there a timeline guesstimate for when that would be implemented, or was it more a "maybe someday" thing? From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 22:37:07 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 25E522AE for ; Mon, 22 Jun 2015 22:37:07 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 03DF7C8 for ; Mon, 22 Jun 2015 22:37:06 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id F08D83F760; Mon, 22 Jun 2015 18:37:05 -0400 (EDT) Message-ID: <55888E0D.6040704@sneakertech.com> Date: Mon, 22 Jun 2015 18:37:01 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: kpneal@pobox.com CC: FreeBSD FS Subject: Re: ZFS raid write performance? References: <5587C3FF.9070407@sneakertech.com> <5587C97F.2000407@delphij.net> <55887810.3080301@sneakertech.com> <20150622221422.GA71520@neutralgood.org> In-Reply-To: <20150622221422.GA71520@neutralgood.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 22:37:07 -0000 >>> a single hard drive won't do much beyond 100MB/s (maybe >>> 120MB/s max) for sequential 128kB blocks, so that "landing pad" would >>> probably not very helpful assuming you can saturate your GigE network >> >> Wait, I'm confused. A single GigE has a theoretical max of like >> 100mb/sec. That would imply the drive is probably about the same speed? > > You won't get the theoretical max what with the overhead of Ethernet > packets, TCP/IP overhead, and SMB protocol overhead. Right, I know that, that's why I don't understand what Xin Li was trying to say. I guess a better way to word the question is: would a raidzX using generic drives, samba, and 500mb-4gb files be notably slower at writing than ~70mb/sec. I have a feeling not, but I wanted to double check. 
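My back-of-envelope, for what it's worth: GigE is 1000 Mbit/s, i.e. 125 MB/s
on the wire, and after Ethernet/TCP/SMB overhead maybe 100-115 MB/s of actual
payload, so the pool only has to sustain roughly that for the network to stay
the bottleneck. A crude local check (path made up; use random data instead of
/dev/zero if the dataset has compression enabled):

    dd if=/dev/zero of=/tank/dump/testfile bs=1m count=4096   # ~4 GB sequential write

If that reports comfortably above ~110 MB/s I'd expect the landing pad to buy
nothing, but that's exactly what I'm hoping someone can confirm.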
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 22:51:08 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 77D6038D for ; Mon, 22 Jun 2015 22:51:08 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [64.62.153.212]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "anubis.delphij.net", Issuer "StartCom Class 1 Primary Intermediate Server CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 578058D9 for ; Mon, 22 Jun 2015 22:51:08 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from zeta.ixsystems.com (c-71-202-112-39.hsd1.ca.comcast.net [71.202.112.39]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by anubis.delphij.net (Postfix) with ESMTPSA id 027EC182F3; Mon, 22 Jun 2015 15:51:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphij.net; s=anubis; t=1435013467; x=1435027867; bh=2+Av8ovk6mp7Ug5J3pZg31q1KXM38hnVZVkmPlL323k=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=V4O55+zj4iUSqFK6PHt7FCkf8Y7sCV60mvq5rpIs80sXEWaV+rg5Ot6NY+d/9pjny cNB/EtImMmc31gmVbF3+thssEHAfA4jPw/NWMwCskWCQc8tLMoxW1XQZJ1UJD4koVt zRymYQ65KnZtVhRkLIxfhIC2O6UC+xiMctcyQHEo= Message-ID: <5588915A.700@delphij.net> Date: Mon, 22 Jun 2015 15:51:06 -0700 From: Xin Li Reply-To: d@delphij.net Organization: The FreeBSD Project MIME-Version: 1.0 To: Quartz CC: FreeBSD FS Subject: Re: ZFS raid write performance? References: <5587C3FF.9070407@sneakertech.com> <5587C97F.2000407@delphij.net> <55887810.3080301@sneakertech.com> In-Reply-To: <55887810.3080301@sneakertech.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 22:51:08 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 06/22/15 14:03, Quartz wrote: >>> What's sequential write performance like these days for ZFS >>> raidzX? Someone suggested to me that I set up a single not-raid >>> disk to act as a fast 'landing pad' for receiving files, then >>> move them to the pool later in the background. Is that actually >>> necessary? (Assume generic sata drives, 250mb-4gb sized files, >>> and transfers are across a LAN using single unbonded GigE). >> >> That sounds really weird recommendation IMHO. Did "someone" >> explained with the reasoning/benefit of that "landing pad"? > > Sort of. Something about the checksum calculations causing too much > overhead. I think they were confused about sequential write vs > random There are some overhead but it won't be the bottleneck if you are using one GigE connection (not to mention that the default ZFS checksum is not SHA256 but a much faster algorithm), where network is the bottleneck. > write, and possibly mdadm vs zfs. It was just something mentioned > in passing that I didn't want to start a debate about at the time, > since I wasn't 100% sure. > >> a single hard drive won't do much beyond 100MB/s (maybe 120MB/s >> max) for sequential 128kB blocks, so that "landing pad" would >> probably not very helpful assuming you can saturate your GigE >> network > > Wait, I'm confused. 
A single GigE has a theoretical max of like > 100mb/sec. That would imply the drive is probably about the same > speed? No, what I'm trying to say is that since reading from the single drive can't do much better than the network, it's likely that you wouldn't be benefited by having it. If the drive is much faster than the network and RAID-Z is much slower than it, you could get some benefits because the unit you are using as source of the replication can be re-purposed once data is on that drive (which if I was to do the operation, I would probably never do because that means less redundancy during migration). Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.1.5 (FreeBSD) iQIcBAEBCgAGBQJViJFXAAoJEJW2GBstM+nsr3IP/AhXwYqg0MwXg+hbhl2kFhh2 w0lIbWwGe2KzttphJZDv+FORlnkUtynOS5YULiwavldup91DHyOZvru4HjuugBeR BOW3Bkq4xOBVlzn9oW/BTMWbutevhmTBXG18iVxj2qwsy9NIuGL+1wyrYI5r5bl/ BaHBHYF6UXtr8Um77qZ8neKuv+ePGCCqYLei/paTc56XRnq5nlreulW8fxuHN4Pz b3JPLzoaPdQOkcXtBe9V6ZlmdLvfBAmrCbD0gL0BDAeLsvkjRlQifwl+ZTSLeOtF ja3bJ8tfCMFeGuRsL0RginiIn21if2rjZRuhWfUY0cDsPXgLVjseLLxc7F8NMwDt rigkEuTTIfZy6UKD+70g05O2suN963Orqy1L6tfoAG0bEk9qH5ZoNl50F/fboRu/ 68bAwTEMNo0x7h7XlCgB2GYS5qdDgsIeNbJLcDmXHmgTAyK/XM5/5pvSvXY2dYWN /z/cYVHB8cVSwugcYZP/NQk8Eeldy2P+uZlUVqUSiWmk3m0x51VPyFJUtnnNIEf+ E4TupH/kyfZoiTgbsdCvfYqWm6YViNrjeZ8qa5qeGQnjDiNf1hCqyd/YbaCE0rHX ACV4PyDkyW56uf+89uoKbn6QQMwb3FsL/6epODzLlSQYYDI+hvwN7PpKHQTnVFAS gQutdcHlR3ZNiiLIO0ji =bigs -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 23:04:58 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3FE845AB for ; Mon, 22 Jun 2015 23:04:58 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [64.62.153.212]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "anubis.delphij.net", Issuer "StartCom Class 1 Primary Intermediate Server CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 26B2B132 for ; Mon, 22 Jun 2015 23:04:57 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from zeta.ixsystems.com (c-71-202-112-39.hsd1.ca.comcast.net [71.202.112.39]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by anubis.delphij.net (Postfix) with ESMTPSA id 522071839B; Mon, 22 Jun 2015 16:04:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphij.net; s=anubis; t=1435014297; x=1435028697; bh=CwINWeWEfXWL5C6yjw1YlEYPjk/+4gqrCFXiwHzrN88=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=U4hdbUqBJjwrvgsKB5jFDJzTqHxpcOvktenTbDAiFN8C4nWvapL4oR2vlUc59DZtt kq2HrrHWIlhVN7IMKKahMqHBOrtAlysP96Kf7kSpGuG9p8vb4L0hNlUb08UZV6LTi2 FpgD0PNGTsFfC/JqWEB/j+46e2ipvymCpNQE1eOY= Message-ID: <55889498.3090405@delphij.net> Date: Mon, 22 Jun 2015 16:04:56 -0700 From: Xin Li Reply-To: d@delphij.net Organization: The FreeBSD Project MIME-Version: 1.0 To: kpneal@pobox.com, Quartz CC: FreeBSD FS Subject: Re: ZFS raid write performance? 
References: <5587C3FF.9070407@sneakertech.com> <20150622121343.GB60684@neutralgood.org> In-Reply-To: <20150622121343.GB60684@neutralgood.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 23:04:58 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 06/22/15 05:13, kpneal@pobox.com wrote: > For both reads and writes all levels of raidz* perform slightly > faster than the speed of a single drive. _Slightly_ faster, like, > the speed of a single drive * 1.1 or so roughly speaking. How big is the data block for each read-write? For large blocks RAID-Z is likely to perform nearly as well as stripped disks (e.g. 3 disks RAID-Z is slightly slower than 2 disk stripped, but would be much better than single disk pool). Typically copying data would use larger data blocks. For smaller writes it's likely to have worse results. > Finally, single GigE is _slow_. I see no point in a "landing pad" > when using unbonded GigE. How will a "landing pad" help when let's say we have 10GigE or even faster network connection? Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.1.5 (FreeBSD) iQIcBAEBCgAGBQJViJSYAAoJEJW2GBstM+nsVR0QAILGeNt4iT+mT1NeEEBiFtng wcdmzNHeUueSjRl/ecJl4O6UbDH/OAxrUwLTyj6/mP8J60JhfIZisrcnSYXCSYQL 6INTAFy8u+eD7ewMNYXr0PddDku3bsTKSC7zlSZKURctlkqX1gEatGLJDDhDMqJj KCcGpBnNX5CFS9y6UrCxbezoPwYlGf1CrEQooin5s5bLKWBwjBnG+XsaURtCOvXo aY6ctTHyKDhuDWfBlaSU73eaFAw6zjcjVvJh6BHVA3JZSwx5F4vFT9ahjpPSimvS h2byxrtSEi6PAIF+f7T+4zRoCqy+i2yYmnZlqHRQtGBtipF1cnzFlGQsGQtussE/ mamcXhcZDm2HbmxLyoUV15vNG4m/zvgMJK6VpMJrdbO5u/DfCDer/zuJyWJt6N/B Ytldb/a24WLpKEDtdUtkFw774GPOgXk8YEU/TN6lyxRx5Ua6wb8kB66npEZi3eMN tvdD45gKKVXmB5ooQjAiRzuOanKhDR40OBpCD1ZgNl513mSGJ0iNeJVGMzz2gakj 2r1GcRi+5DZTfcupc2NOLwe+8JM5B0QQzXCmuHS/eTdTGBBR4tfyIX8D5uxZs3wq 2CHPRg3yQxy0JOk14+q6g1uJfdiBjBt1SKF+gFD0TMuIFGEnREx6DpcNrJp3uPMF INYxVG0U2UUmTYIeM2iJ =CIqL -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Mon Jun 22 23:13:30 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7FA165FF for ; Mon, 22 Jun 2015 23:13:30 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 471BC655 for ; Mon, 22 Jun 2015 23:13:29 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id t5MNDRhR023038; Mon, 22 Jun 2015 18:13:27 -0500 (CDT) Date: Mon, 22 Jun 2015 18:13:27 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Quartz cc: freebsd-fs@freebsd.org Subject: Re: ZFS pool restructuring and emergency repair In-Reply-To: <558883CD.3080006@sneakertech.com> Message-ID: References: <5584C0BC.9070707@sneakertech.com> <5587BC96.9090601@sneakertech.com> <20150622115856.GA60684@neutralgood.org> <55887BE8.2090305@sneakertech.com> 
<55888235.5000100@infracaninophile.co.uk> <558883CD.3080006@sneakertech.com> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Mon, 22 Jun 2015 18:13:27 -0500 (CDT) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 23:13:30 -0000 On Mon, 22 Jun 2015, Quartz wrote: >> Although in one of Matt Ahrens talks at BSDCan he spoke of plans to >> change this. Essentially you'ld be able to offline a vdev, and a >> background process (like scrub) would copy all the data blocks from that >> device to elsewhere in the pool. Once finished, the devices making up >> the vdev could be physically removed. > > Oh, that would be nice. Was there a timeline guesstimate for when that would > be implemented, or was it more a "maybe someday" thing? This has been planned for perhaps 8 years already. Still in the original status. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Tue Jun 23 00:40:16 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5B027A43 for ; Tue, 23 Jun 2015 00:40:16 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "anubis.delphij.net", Issuer "StartCom Class 1 Primary Intermediate Server CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 3E941188 for ; Tue, 23 Jun 2015 00:40:16 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from zeta.ixsystems.com (c-71-202-112-39.hsd1.ca.comcast.net [71.202.112.39]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by anubis.delphij.net (Postfix) with ESMTPSA id B20E2186DE; Mon, 22 Jun 2015 17:40:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphij.net; s=anubis; t=1435020014; x=1435034414; bh=s2MrOc4nMp2Xx+QZgChKN+UjBSQ1Vhq5ptkj2Ce9LPk=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=g+9c6NbahS2nmCYJLY+ieFw+frM9V/O/mGo1Wev80S6sx2stfALCMXaew9FndDbv/ K4Wm5R6YH7fuGHCOaV8mR+yO1gthXfbD1fgpHRbwjnniGK5GuKZsi2ts9aBo8L53H2 6Y+v1H1GwZBGiD9RgSHLaxcwBkgZLEPLOf2ZbEvs= Message-ID: <5588AAED.9030003@delphij.net> Date: Mon, 22 Jun 2015 17:40:13 -0700 From: Xin Li Reply-To: d@delphij.net Organization: The FreeBSD Project MIME-Version: 1.0 To: kpneal@pobox.com, Bob Friesenhahn CC: freebsd-fs@freebsd.org, Quartz Subject: Re: ZFS pool restructuring and emergency repair References: <5584C0BC.9070707@sneakertech.com> <5587BC96.9090601@sneakertech.com> <20150622115856.GA60684@neutralgood.org> <55887BE8.2090305@sneakertech.com> <55888235.5000100@infracaninophile.co.uk> <558883CD.3080006@sneakertech.com> <20150623000453.GA92931@neutralgood.org> In-Reply-To: <20150623000453.GA92931@neutralgood.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org 
X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Jun 2015 00:40:16 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 06/22/15 17:04, kpneal@pobox.com wrote: > On Mon, Jun 22, 2015 at 06:13:27PM -0500, Bob Friesenhahn wrote: >> On Mon, 22 Jun 2015, Quartz wrote: >> >>>> Although in one of Matt Ahrens talks at BSDCan he spoke of >>>> plans to change this. Essentially you'ld be able to offline >>>> a vdev, and a background process (like scrub) would copy all >>>> the data blocks from that device to elsewhere in the pool. >>>> Once finished, the devices making up the vdev could be >>>> physically removed. >>> >>> Oh, that would be nice. Was there a timeline guesstimate for >>> when that would be implemented, or was it more a "maybe >>> someday" thing? >> >> This has been planned for perhaps 8 years already. Still in the >> original status. > > Is this via "block pointer rewrite"? Actually the vdev removal feature is implemented back in last December (bcc'ed Matt in case he want to chime in) by Delphix. If I remember correctly, it's almost finished at the time we had OpenZFS developer summit last year. The initial changeset is about 5000 or 5500 lines of changes and is not integrated into Illumos repository yet. == The block pointer rewrite is something that would complicate the ZFS code quite a lot (and possibly also break many layering design) so don't expect it happening anytime soon. Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.1.5 (FreeBSD) iQIcBAEBCgAGBQJViKrtAAoJEJW2GBstM+nspCYP/RSXZT9Ni/Asc17hkuro/0jR lwDrkQkDrGin8/ACZ8MKNnVpdRIysuMvPD9fsi5pq7N9/nnGFf1Xq0EF7dYDn+bl UpxnXJ678lnpwTls0NXo93RoPxzsBEzAbMjmJ4YWEWOe0iKnwj+hL4d7WoHYu0tM mqFWpBM4kefd0QDjMLOMK58z20qdNqIPFxTMP+pTiVycl4x8lb284hLEWmi6u1g/ 1u57PowRwCOWPxISuunUgeKpkz2c05YTG4vQzm2p9kzhjV2lrqNiNLSxPMv4FEfI NTKSoscyfznm6GAOT+yV9HfepzZiWDQaG2l8epRA9hn+KhzMUsium3kX/3JHwL97 ybFqvPj46QzkVjnaTgAw2rsYqaYlDcBmJ6xKU/J+u+aq55VKnyN2sLYLYxD576QS IgN7LYgMCp+6YCU+oOGhmwzcAlF4kykjeW//om3Kjr4VY7Fk7jEBC20vMn5bBobj jtluxyDk2t3ccjbdNzAjHsgmzDSwQodgfsMjj7U35pTI6YkWG3Ywc/D7oLoc9C6K oVZSJsh11tjCO0D6XZx2Nv3hy1Y3Lr8AAZ7SJnpm4zEBKx3HYyPWCtwjA3quSPxx OSW3I7AlUUYaDfYrTIM3mrm4XOd5IBxGKfAbgdF/hQDTRQZUQXchqMxzfC6rEtv/ Djz/XVE1Ad9RgST3gzA+ =e4aP -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Tue Jun 23 03:29:29 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B968DDCB for ; Tue, 23 Jun 2015 03:29:29 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9432CF5 for ; Tue, 23 Jun 2015 03:29:29 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id 2703B3F6DD; Mon, 22 Jun 2015 23:29:22 -0400 (EDT) Message-ID: <5588D291.4030806@sneakertech.com> Date: Mon, 22 Jun 2015 23:29:21 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) 
Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: kpneal@pobox.com CC: FreeBSD FS Subject: Re: ZFS raid write performance? References: <5587C3FF.9070407@sneakertech.com> <5587C97F.2000407@delphij.net> <55887810.3080301@sneakertech.com> <20150622221422.GA71520@neutralgood.org> <55888E0D.6040704@sneakertech.com> <20150623002854.GB96928@neutralgood.org> In-Reply-To: <20150623002854.GB96928@neutralgood.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Jun 2015 03:29:29 -0000 >> I guess a better way to word the question is: would a raidzX using >> generic drives, samba, and 500mb-4gb files be notably slower at writing >> than ~70mb/sec. I have a feeling not, but I wanted to double check. > > Gut feeling: I'm sticking with the network being the limiting factor. > But the only real way to know is to setup a test system and, well, test. Question: I'm not super familiar with the way freebsd+zfs handles IO and caching under the hood, especially not when you throw drive caching into the mix too. Does something simple like "dd if=/dev/zero of=/pool/foo bs=1m count=500" give me a reasonably accurate number for write speed? From owner-freebsd-fs@FreeBSD.ORG Tue Jun 23 06:07:40 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 76446944 for ; Tue, 23 Jun 2015 06:07:40 +0000 (UTC) (envelope-from m.seaman@infracaninophile.co.uk) Received: from smtp.infracaninophile.co.uk (smtp.infracaninophile.co.uk [IPv6:2001:8b0:151:1:3cd3:cd67:fafa:3d78]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.infracaninophile.co.uk", Issuer "infracaninophile.co.uk" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 16AED209 for ; Tue, 23 Jun 2015 06:07:39 +0000 (UTC) (envelope-from m.seaman@infracaninophile.co.uk) Received: from liminal.local ([192.168.100.2]) (authenticated bits=0) by smtp.infracaninophile.co.uk (8.15.1/8.15.1) with ESMTPSA id t5N67PcA003958 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 23 Jun 2015 07:07:27 +0100 (BST) (envelope-from m.seaman@infracaninophile.co.uk) Authentication-Results: smtp.infracaninophile.co.uk; dmarc=none header.from=infracaninophile.co.uk DKIM-Filter: OpenDKIM Filter v2.9.2 smtp.infracaninophile.co.uk t5N67PcA003958 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=infracaninophile.co.uk; s=201001-infracaninophile; t=1435039648; bh=gT7CALiFZrOUYcvHS7/IH59TOCU3NGjn9WYZ4C3vP4E=; h=Date:From:To:CC:Subject:References:In-Reply-To; z=Date:=20Tue,=2023=20Jun=202015=2007:07:15=20+0100|From:=20Matthew =20Seaman=20|To:=20Quartz=20|CC:=20freebsd-fs@freebsd.org|Subject:=20Re:=20 ZFS=20pool=20restructuring=20and=20emergency=20repair|References:= 20<5584C0BC.9070707@sneakertech.com>=20<5587BC96.9090601@sneakerte ch.com>=20<20150622115856.GA60684@neutralgood.org>=20<55887BE8.209 0305@sneakertech.com>=20<55888235.5000100@infracaninophile.co.uk>= 20<558883CD.3080006@sneakertech.com>|In-Reply-To:=20<558883CD.3080 006@sneakertech.com>; b=ski5sHqUy6P/KHCfyjUgRBH1MogQz6cIvseG7zECEhJTb5x1UiZ3znwU0BJeZk0E+ 
/eXhxPhROUhyyEBcBkCzyyrf42vckN4LujXZF2ZISEZ9fuc8fFXcqJ0JS7t1Et8PRy XtIRWsXww9n0DczZ5vnL8O0N+SGZR57wBCNWcFMY= X-Authentication-Warning: lucid-nonsense.infracaninophile.co.uk: Host [192.168.100.2] claimed to be liminal.local Message-ID: <5588F793.4050802@infracaninophile.co.uk> Date: Tue, 23 Jun 2015 07:07:15 +0100 From: Matthew Seaman User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Quartz CC: freebsd-fs@freebsd.org Subject: Re: ZFS pool restructuring and emergency repair References: <5584C0BC.9070707@sneakertech.com> <5587BC96.9090601@sneakertech.com> <20150622115856.GA60684@neutralgood.org> <55887BE8.2090305@sneakertech.com> <55888235.5000100@infracaninophile.co.uk> <558883CD.3080006@sneakertech.com> In-Reply-To: <558883CD.3080006@sneakertech.com> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="nrFQqdjcIEWsOmaCfcTeJw2phUG9c0iMl" X-Virus-Scanned: clamav-milter 0.98.7 at lucid-nonsense.infracaninophile.co.uk X-Virus-Status: Clean X-Spam-Status: No, score=-1.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on lucid-nonsense.infracaninophile.co.uk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Jun 2015 06:07:40 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --nrFQqdjcIEWsOmaCfcTeJw2phUG9c0iMl Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 22/06/2015 22:53, Quartz wrote: >> Although in one of Matt Ahrens talks at BSDCan he spoke of plans to >> change this. Essentially you'ld be able to offline a vdev, and a >> background process (like scrub) would copy all the data blocks from th= at >> device to elsewhere in the pool. Once finished, the devices making up= >> the vdev could be physically removed. >=20 > Oh, that would be nice. Was there a timeline guesstimate for when that > would be implemented, or was it more a "maybe someday" thing? He didn't specify any sort of timeline I'm afraid. 
Cheers, Matthew --nrFQqdjcIEWsOmaCfcTeJw2phUG9c0iMl Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.20 (Darwin) iQJ8BAEBCgBmBQJViPebXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQ2NTNBNjhCOTEzQTRFNkNGM0UxRTEzMjZC QjIzQUY1MThFMUE0MDEzAAoJELsjr1GOGkAT/6kP/RbBSBKTWOUOKaWGmjaZeksC kSUajB43dEthUdfTIGmEShyuv7f8qhGjpFHUvI3V9B3//SQvoiWxeTOaVtjgS9sX XxdNckK8jqk6xC14kqwLoU0yraARv0rftiC90p8b31OOHVJq3xMm3V/byBRWK5nH cX4PvDX4q9FURutLPISAXxoHFm56jPWZ3HtjAYl5Ecu3SLTJi/5GcDQv8ELI3a62 Av9VNhJKMFjZIS2as+yjHWzGhMWMnWiBfo/ZZCNzp0q5jOtT8AOI3luby2AHLxbe S/U+t2E7o4eIUUd5GztTtLRDCsi1eZ6AQuHmZ91LtKGgThalZ3MyJH8Pfr32QAA5 dzVNlMZEApavheD7NdP9W62av4qAqL6JfAUHga4qt+5MPQB0tAWxyqUp0plncwek ThKezhgLfHEe76y3VsS7PAObUwNtTyfYnHKnQ7pnqq2Ct6qML+iGYbOXz3NhVobt giF1Q4Hhb1Il47zGEFyHa9QzAwkulx4sySMgJbtS/KjUMC4g6tIifgH5oGMAnrT7 uK6wqhqPbcUJqSvU4Y/B8Y6tuNlx5hRqk3CK1Q/pOfHhmR2kVPqAGQrKSkRDsW3o qdUyqnUVaSCWKXYODyv2BUrxKUWKHLOXj4FSPNsI8MybVdMxOV1IgxTAGJRVCfc8 jsJk1veTzROlLmyzO/kP =e3AH -----END PGP SIGNATURE----- --nrFQqdjcIEWsOmaCfcTeJw2phUG9c0iMl-- From owner-freebsd-fs@FreeBSD.ORG Tue Jun 23 10:06:56 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0A2C2B50 for ; Tue, 23 Jun 2015 10:06:56 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DA93C2E9 for ; Tue, 23 Jun 2015 10:06:55 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id F1ECB3F6DD; Tue, 23 Jun 2015 06:06:53 -0400 (EDT) Message-ID: <55892FBD.7030204@sneakertech.com> Date: Tue, 23 Jun 2015 06:06:53 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: kpneal@pobox.com CC: FreeBSD FS Subject: Re: ZFS raid write performance? References: <5587C3FF.9070407@sneakertech.com> <5587C97F.2000407@delphij.net> <55887810.3080301@sneakertech.com> <20150622221422.GA71520@neutralgood.org> <55888E0D.6040704@sneakertech.com> <20150623002854.GB96928@neutralgood.org> <5588D291.4030806@sneakertech.com> <20150623042234.GA66734@neutralgood.org> In-Reply-To: <20150623042234.GA66734@neutralgood.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Jun 2015 10:06:56 -0000 > I'd go with something similar to how the machine will be used in production. Hrmm, I was hoping for a quickie synthetic test just to gauge if the write speed was anywhere near as low as the network. > Also, ZFS has the ability to detect sequences of zeros and optimize the > writing of them. Yeah, I had a vague memory of something like that, which is why I asked. 
>Make sure the amount copied to the machine is, oh, say, at least > twice (or maybe thrice?) the amount of RAM in the server just to be sure > you've defeated any caching. That's... not going to be easy. I don't have that much data broken into files of that size kicking around yet. It won't necessarily be a "production" test either, given that most clients will be transferring only a few gigs over at a time. From owner-freebsd-fs@FreeBSD.ORG Tue Jun 23 10:55:04 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D23C1D6B for ; Tue, 23 Jun 2015 10:55:04 +0000 (UTC) (envelope-from karli.sjoberg@slu.se) Received: from exch2-4.slu.se (webmail.slu.se [77.235.224.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "webmail.slu.se", Issuer "TERENA SSL CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 606CBC82 for ; Tue, 23 Jun 2015 10:55:04 +0000 (UTC) (envelope-from karli.sjoberg@slu.se) Received: from exch2-4.slu.se (77.235.224.124) by exch2-4.slu.se (77.235.224.124) with Microsoft SMTP Server (TLS) id 15.0.1076.9; Tue, 23 Jun 2015 12:39:23 +0200 Received: from exch2-4.slu.se ([fe80::3117:818f:aa48:9d9b]) by exch2-4.slu.se ([fe80::3117:818f:aa48:9d9b%22]) with mapi id 15.00.1076.000; Tue, 23 Jun 2015 12:39:23 +0200 From: =?utf-8?B?S2FybGkgU2rDtmJlcmc=?= To: Quartz CC: "kpneal@pobox.com" , FreeBSD FS Subject: Re: ZFS raid write performance? Thread-Topic: ZFS raid write performance? Thread-Index: AQHQraDeZ5iISfttbk29ogJBEcFyIA== Date: Tue, 23 Jun 2015 10:39:23 +0000 Message-ID: Accept-Language: sv-SE, en-US Content-Language: sv-SE X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-transport-fromentityheader: Hosted MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Jun 2015 10:55:04 -0000 DQpEZW4gMjMganVuIDIwMTUgMTI6MDYgZW0gc2tyZXYgUXVhcnR6IDxxdWFydHpAc25lYWtlcnRl Y2guY29tPjoNCj4NCj4gPiBJJ2QgZ28gd2l0aCBzb21ldGhpbmcgc2ltaWxhciB0byBob3cgdGhl IG1hY2hpbmUgd2lsbCBiZSB1c2VkIGluIHByb2R1Y3Rpb24uDQo+DQo+IEhybW0sIEkgd2FzIGhv cGluZyBmb3IgYSBxdWlja2llIHN5bnRoZXRpYyB0ZXN0IGp1c3QgdG8gZ2F1Z2UgaWYgdGhlDQo+ IHdyaXRlIHNwZWVkIHdhcyBhbnl3aGVyZSBuZWFyIGFzIGxvdyBhcyB0aGUgbmV0d29yay4NCg0K YmVuY2htYXJrcy9ib25uaWUrKw0KDQpHb29kIHN5bnRoZXRpYyB0ZXN0aW5nIHRvb2wgZm9yIGZp bGVzeXN0ZW0gcGVyZm9ybWFuY2UuDQoNCi9LDQoNCj4NCj4gPiBBbHNvLCBaRlMgaGFzIHRoZSBh YmlsaXR5IHRvIGRldGVjdCBzZXF1ZW5jZXMgb2YgemVyb3MgYW5kIG9wdGltaXplIHRoZQ0KPiA+ IHdyaXRpbmcgb2YgdGhlbS4NCj4NCj4gWWVhaCwgSSBoYWQgYSB2YWd1ZSBtZW1vcnkgb2Ygc29t ZXRoaW5nIGxpa2UgdGhhdCwgd2hpY2ggaXMgd2h5IEkgYXNrZWQuDQo+DQo+ID5NYWtlIHN1cmUg dGhlIGFtb3VudCBjb3BpZWQgdG8gdGhlIG1hY2hpbmUgaXMsIG9oLCBzYXksIGF0IGxlYXN0DQo+ ID4gdHdpY2UgKG9yIG1heWJlIHRocmljZT8pIHRoZSBhbW91bnQgb2YgUkFNIGluIHRoZSBzZXJ2 ZXIganVzdCB0byBiZSBzdXJlDQo+ID4geW91J3ZlIGRlZmVhdGVkIGFueSBjYWNoaW5nLg0KPg0K PiBUaGF0J3MuLi4gbm90IGdvaW5nIHRvIGJlIGVhc3kuIEkgZG9uJ3QgaGF2ZSB0aGF0IG11Y2gg ZGF0YSBicm9rZW4gaW50bw0KPiBmaWxlcyBvZiB0aGF0IHNpemUga2lja2luZyBhcm91bmQgeWV0 LiBJdCB3b24ndCBuZWNlc3NhcmlseSBiZSBhDQo+ICJwcm9kdWN0aW9uIiB0ZXN0IGVpdGhlciwg 
Z2l2ZW4gdGhhdCBtb3N0IGNsaWVudHMgd2lsbCBiZSB0cmFuc2ZlcnJpbmcNCj4gb25seSBhIGZl dyBnaWdzIG92ZXIgYXQgYSB0aW1lLg0KPg0KPg0KPiBfX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fX19fX19fXw0KPiBmcmVlYnNkLWZzQGZyZWVic2Qub3JnIG1haWxpbmcg bGlzdA0KPiBodHRwOi8vbGlzdHMuZnJlZWJzZC5vcmcvbWFpbG1hbi9saXN0aW5mby9mcmVlYnNk LWZzDQo+IFRvIHVuc3Vic2NyaWJlLCBzZW5kIGFueSBtYWlsIHRvICJmcmVlYnNkLWZzLXVuc3Vi c2NyaWJlQGZyZWVic2Qub3JnIg0K From owner-freebsd-fs@FreeBSD.ORG Tue Jun 23 13:32:33 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 004C39BA for ; Tue, 23 Jun 2015 13:32:32 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9F6C8BAD for ; Tue, 23 Jun 2015 13:32:31 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id t5NDWO69019797; Tue, 23 Jun 2015 08:32:24 -0500 (CDT) Date: Tue, 23 Jun 2015 08:32:24 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: kpneal@pobox.com cc: FreeBSD FS Subject: Re: ZFS raid write performance? In-Reply-To: <20150623042234.GA66734@neutralgood.org> Message-ID: References: <5587C3FF.9070407@sneakertech.com> <5587C97F.2000407@delphij.net> <55887810.3080301@sneakertech.com> <20150622221422.GA71520@neutralgood.org> <55888E0D.6040704@sneakertech.com> <20150623002854.GB96928@neutralgood.org> <5588D291.4030806@sneakertech.com> <20150623042234.GA66734@neutralgood.org> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Tue, 23 Jun 2015 08:32:24 -0500 (CDT) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Jun 2015 13:32:33 -0000 On Tue, 23 Jun 2015, kpneal@pobox.com wrote: > > When I was testing read speeds I tarred up a tree that was 700+GB in size > on a server with 64GB of memory. Tar (and cpio) are only single-threaded. They open and read input files one by one. Zfs's read-ahead algorithm ramps up the amount of read-ahead each time the program goes to read data and it is not already in memory. Due to this ramp-up, input file size has a significant impact on the apparent read performance. The ramp-up occurs on a per-file basis. Large files (still much smaller than RAM) will produce a higher data rate than small files. If read requests are pending for several files at once (or several read requests for different parts of the same file), then the observed data rate would be higher. Tar/cpio read tests are often more impacted by disk latencies and zfs read-ahead algorithms than the peak performance of the data path. A very large server with many disks may produce similar timings to a very small server. 
Long ago I wrote a test script (http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh) which was intended to expose a zfs bug existing at that time, but is still a very useful test for zfs caching and read-ahead by testing initial sequential read performance from a filesystem. This script was written for Solaris and might need some small adaptation to be used for FreeBSD. Extracting a tar file (particularly on a network client) is a very interesting test of network server write performance. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Tue Jun 23 15:17:15 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DAA0BE1 for ; Tue, 23 Jun 2015 15:17:15 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-ig0-f172.google.com (mail-ig0-f172.google.com [209.85.213.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A76F7D8D for ; Tue, 23 Jun 2015 15:17:15 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by igblr2 with SMTP id lr2so55825120igb.0 for ; Tue, 23 Jun 2015 08:17:14 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:user-agent:mime-version:to :subject:references:in-reply-to:content-type :content-transfer-encoding; bh=H3lD+6PGax4zErBlAI70lqdJ2FBQCAZ7WoLYx/KjR/g=; b=dF58tihMVHqO7NWZYWGtVlSyRRtwX1UUOhxG1dCsZMje+wz0ZUobW4ZIkqHGYpTBNt 1kVhufoywSbrPPVyCtmnLDjm5V89S4PkzIY5dAz5JevukCF/KRGe61f2nRbMucN8hF1v 1y51gFxCP8jUrAEiKLfTuZEnBaD6QTCTx6PnnyTnT+IQxA091zOybRdYuCWQtjXkKhBs aZCMHVRC5IrvIuBiyRgjC9Q/GslNk693J/Se9zv/dFcliv3Gp0HQ6GdBjXYDYtKHwGre eVGnBsYpFRbxtH07d+gqN2YHjXEhXC5Vi4f9mncOIOCCcoEmZsukl7A8JffVx6yd67ew 2TCg== X-Gm-Message-State: ALoCoQnqNS8LiYXbBAUuBOl1Y8Rnz8WDfBzzF+AXRnBKk+k2LNLkoUqFoPDn9m75qo6yYB4jXdSm X-Received: by 10.107.28.202 with SMTP id c193mr45802637ioc.90.1435072634336; Tue, 23 Jun 2015 08:17:14 -0700 (PDT) Received: from kateleycoimac.local ([63.231.252.189]) by mx.google.com with ESMTPSA id c12sm13487992ioj.39.2015.06.23.08.17.13 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 23 Jun 2015 08:17:13 -0700 (PDT) Message-ID: <55897878.30708@kateley.com> Date: Tue, 23 Jun 2015 10:17:12 -0500 From: Linda Kateley User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: ZFS raid write performance? 
References: <5587C3FF.9070407@sneakertech.com> <5587C97F.2000407@delphij.net> <55887810.3080301@sneakertech.com> <20150622221422.GA71520@neutralgood.org> <55888E0D.6040704@sneakertech.com> <20150623002854.GB96928@neutralgood.org> <5588D291.4030806@sneakertech.com> <20150623042234.GA66734@neutralgood.org> In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Jun 2015 15:17:16 -0000 Is it possible that the suggestion for the "landing pad" could be recommending a smaller ssd pool? Then replicating back to a slower pool? I actually do that kind of architecture once in awhile, especially for uses like large cad drawings, where there is a tendency to work on one big file at a time... With lower costs and higher densities of ssd, this is a nice way to use them On 6/23/15 8:32 AM, Bob Friesenhahn wrote: > On Tue, 23 Jun 2015, kpneal@pobox.com wrote: >> >> When I was testing read speeds I tarred up a tree that was 700+GB in >> size >> on a server with 64GB of memory. > > Tar (and cpio) are only single-threaded. They open and read input > files one by one. Zfs's read-ahead algorithm ramps up the amount of > read-ahead each time the program goes to read data and it is not > already in memory. Due to this ramp-up, input file size has a > significant impact on the apparent read performance. The ramp-up > occurs on a per-file basis. Large files (still much smaller than RAM) > will produce a higher data rate than small files. If read requests > are pending for several files at once (or several read requests for > different parts of the same file), then the observed data rate would > be higher. > > Tar/cpio read tests are often more impacted by disk latencies and zfs > read-ahead algorithms than the peak performance of the data path. A > very large server with many disks may produce similar timings to a > very small server. > > Long ago I wrote a test script > (http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh) > which was intended to expose a zfs bug existing at that time, but is > still a very useful test for zfs caching and read-ahead by testing > initial sequential read performance from a filesystem. This script was > written for Solaris and might need some small adaptation to be used > for FreeBSD. > > Extracting a tar file (particularly on a network client) is a very > interesting test of network server write performance. 
> > Bob -- Linda Kateley Kateley Company Skype ID-kateleyco http://kateleyco.com From owner-freebsd-fs@FreeBSD.ORG Tue Jun 23 15:29:42 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1FE011B0 for ; Tue, 23 Jun 2015 15:29:42 +0000 (UTC) (envelope-from ben@altesco.nl) Received: from altus-escon.com (altescovd.xs4all.nl [82.95.116.106]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "proxy.altus-escon.com", Issuer "PositiveSSL CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9AFD626A for ; Tue, 23 Jun 2015 15:29:40 +0000 (UTC) (envelope-from ben@altesco.nl) Received: from daneel.altus-escon.com (daneel.altus-escon.com [193.78.231.7]) (authenticated bits=0) by altus-escon.com (8.14.9/8.14.9) with ESMTP id t5NFQCAq058793 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Tue, 23 Jun 2015 17:26:13 +0200 (CEST) (envelope-from ben@altesco.nl) From: Ben Stuyts Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Panic on removing corrupted file on zfs Message-Id: Date: Tue, 23 Jun 2015 17:26:12 +0200 To: freebsd-fs@freebsd.org Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2102\)) X-Mailer: Apple Mail (2.2102) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (altus-escon.com [193.78.231.142]); Tue, 23 Jun 2015 17:26:13 +0200 (CEST) X-Virus-Scanned: clamav-milter 0.98.7 at mars.altus-escon.com X-Virus-Status: Clean X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Jun 2015 15:29:42 -0000 Hello, I have a corrupted file on a zfs file system. It is a backup store for = an rsync job, and rsync errors with: rsync: failed to read xattr rsync.%stat for = "/home1/vwa/rsync/tank3/cam/jpg/487-20150224180950-05.jpg": Input/output = error (5) Corrupt rsync.%stat xattr attached to = "/home1/vwa/rsync/tank3/cam/jpg/487-20150224180950-04.jpg": "100644 0,0 = \#007:1001" rsync error: error in file IO (code 11) at xattrs.c(1003) = [generator=3D3.1.1] This is a file from februari, and it hasn=E2=80=99t changed since. = Smartctl shows no errors. No ECC memory on this system, so maybe caused = by a memory problem. I am currently running a scrub for the second time. = First time didn=E2=80=99t help. Output from zpool status -v: pool: home1 state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://illumos.org/msg/ZFS-8000-8A scan: scrub in progress since Tue Jun 23 15:37:31 2015 462G scanned out of 2.47T at 80.8M/s, 7h16m to go 0 repaired, 18.29% done config: NAME STATE READ = WRITE CKSUM home1 ONLINE 0 = 0 0 gptid/14032b0b-7f05-11e3-8797-54bef70d8314 ONLINE 0 = 0 0 errors: Permanent errors have been detected in the following files: = /home1/vwa/rsync/tank3/cam/jpg/487-20150224180950-05.jpg/ When I try to rm the file the system panics. 
From /var/crash:

tera8 dumped core - see /var/crash/vmcore.1

Tue Jun 23 15:37:11 CEST 2015

FreeBSD tera8 10.1-STABLE FreeBSD 10.1-STABLE #2 r284317: Fri Jun 12 17:07:21 CEST 2015 root@tera8:/usr/obj/usr/src/sys/GENERIC amd64

panic: acl_from_aces: a_type is 0x4d00

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: acl_from_aces: a_type is 0x4d00
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff8097d890 at kdb_backtrace+0x60
#1 0xffffffff809410e9 at vpanic+0x189
#2 0xffffffff80940f53 at panic+0x43
#3 0xffffffff81aaa209 at acl_from_aces+0x1c9
#4 0xffffffff81b61546 at zfs_freebsd_getacl+0xa6
#5 0xffffffff80e5de77 at VOP_GETACL_APV+0xa7
#6 0xffffffff809c7a3c at vacl_get_acl+0xdc
#7 0xffffffff809c7bd2 at sys___acl_get_link+0x72
#8 0xffffffff80d35817 at amd64_syscall+0x357
#9 0xffffffff80d1a89b at Xfast_syscall+0xfb

Is there any other way of getting rid of this file (except destroying the fs/pool)?

Thanks,
Ben
From owner-freebsd-fs@FreeBSD.ORG Tue Jun 23 18:50:55 2015 Return-Path: Delivered-To: freebsd-fs@nevdull.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2430CC7A for ; Tue, 23 Jun 2015 18:50:55 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F3608FE8 for ; Tue, 23 Jun 2015 18:50:54 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id E79F83F6D9 for ; Tue, 23 Jun 2015 14:50:52 -0400 (EDT) Message-ID: <5589AA8C.30304@sneakertech.com> Date: Tue, 23 Jun 2015 14:50:52 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: ZFS raid write performance? References: <5587C3FF.9070407@sneakertech.com> <5587C97F.2000407@delphij.net> <55887810.3080301@sneakertech.com> <20150622221422.GA71520@neutralgood.org> <55888E0D.6040704@sneakertech.com> <20150623002854.GB96928@neutralgood.org> <5588D291.4030806@sneakertech.com> <20150623042234.GA66734@neutralgood.org> <55897878.30708@kateley.com> In-Reply-To: <55897878.30708@kateley.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Jun 2015 18:50:55 -0000 > Is it possible that the suggestion for the "landing pad" could be > recommending a smaller ssd pool? Then replicating back to a slower pool? It was for a single not-raid disk (then presumably rsyncing the files over to the pool, or something). The thought process seemed to be that a single disk always beat a raid-with-parity (ie; raid5, raidz2, etc) when it came to write speed. > This is another argument for Quartz to test like he(?) would use in > production. Yeah, it's just that that's not terribly convenient at the moment. I think I'll just toss another drive in there and do some limited testing when we start copying things over. 
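For a quick synthetic write number that sidesteps the two caveats raised earlier in this thread (ZFS optimizing runs of zeros away, and the ARC absorbing anything that fits in RAM), one option is to aim dd at a throwaway dataset with compression disabled and write noticeably more than the machine's memory, syncing before the timer stops. This is only a rough sketch: the pool/dataset names (tank/scratch) and the roughly 100 GB size are placeholders to adjust for the actual system.

  # throwaway dataset with compression off, so zero-filled blocks from
  # /dev/zero are really written to the disks instead of being compressed away
  zfs create -o compression=off tank/scratch
  # write well past the size of RAM, and sync before `time` stops so the
  # buffered data has actually reached the disks
  time sh -c 'dd if=/dev/zero of=/tank/scratch/bigfile bs=1m count=100000 && sync'
  # (optional, in another terminal) watch per-vdev throughput while it runs:
  #   zpool iostat -v tank 5
  zfs destroy tank/scratch

A dedicated benchmark (e.g. benchmarks/bonnie++ from ports) or a copy of real data over the wire will still give a more representative answer, but the above is usually enough to tell whether the pool or the single GigE link is the bottleneck.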
From owner-freebsd-fs@freebsd.org Wed Jun 24 20:48:47 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0DC6D915CFD for ; Wed, 24 Jun 2015 20:48:47 +0000 (UTC) (envelope-from ronald-lists@klop.ws) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.81]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C793B1D98; Wed, 24 Jun 2015 20:48:46 +0000 (UTC) (envelope-from ronald-lists@klop.ws) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.72) (envelope-from ) id 1Z7rax-0005D2-To; Wed, 24 Jun 2015 22:48:37 +0200 Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes To: "Andriy Gapon" , "Warren Block" Cc: freebsd-fs@freebsd.org Subject: Re: 11-CURRENT does not mount my root ZFS References: <5581A7EF.5080606@FreeBSD.org> Date: Wed, 24 Jun 2015 22:48:30 +0200 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Ronald Klop" Message-ID: In-Reply-To: User-Agent: Opera Mail/12.16 (FreeBSD) X-Authenticated-As-Hash: 398f5522cb258ce43cb679602f8cfe8b62a256d1 X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: / X-Spam-Score: -0.2 X-Spam-Status: No, score=-0.2 required=5.0 tests=ALL_TRUSTED, BAYES_50, URIBL_BLOCKED autolearn=disabled version=3.3.1 X-Scan-Signature: dfea3049d3b923820beb462d65569822 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Jun 2015 20:48:47 -0000 On Wed, 17 Jun 2015 19:22:23 +0200, Warren Block wrote: > On Wed, 17 Jun 2015, Andriy Gapon wrote: > >> On 17/06/2015 18:40, Ronald Klop wrote: >>> Hello, >>> >>> I'm running 10-STABLE on my laptop on ZFS for a while already. >>> Today I compiled and installed a 11-CURRENT kernel. After boot the >>> kernel gives >>> this error at the moment of mountroot. >> [snip] >>> >>> What could be the cause of this? Can I provide more information? >> >> That would be very weird but perhaps the problem is caused by a >> mismatch in pool >> features? Of course, it's hard to imagine that the CURRENT kernel >> would not >> support something that 10-STABLE supported... >> However, zpool get all output might still be informative. > > Outdated boot code on all but one drive? (Sorry, no experience booting > ZFS from MBR.) Hi, Thanks for your responses. I just started to binary search for the revision which breaks when I figured that the kernel I was trying was really old. I had used my svn checkout to look at some old versions of drivers for porting to 10. So I'm now running from a recent 11 kernel and things work like they should! Cheers, Ronald. 
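For readers who hit the same mountroot symptom when it is not simply a stale kernel, the two hypotheses raised above (a pool-feature mismatch and outdated boot blocks) can be checked along these lines. This assumes a GPT layout and uses placeholder names (zroot, ada0, boot partition index 1) that must be substituted for the real ones.

  # what feature flags are enabled/active on the pool...
  zpool get all zroot | grep feature@
  # ...versus what the installed ZFS code supports
  zpool upgrade -v
  # refresh the GPT+ZFS boot blocks on every disk the machine can boot from,
  # so the bootcode is at least as new as the pool it has to read
  gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0

For MBR-style layouts the boot blocks are different (boot0/zfsboot); see gpart(8) and zfsboot(8).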
From owner-freebsd-fs@freebsd.org Thu Jun 25 01:15:22 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 34E5C91563B for ; Thu, 25 Jun 2015 01:15:22 +0000 (UTC) (envelope-from egor.gabin@outlook.com) Received: from COL004-OMC4S15.hotmail.com (col004-omc4s15.hotmail.com [65.55.34.217]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "*.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DED0C185C for ; Thu, 25 Jun 2015 01:15:21 +0000 (UTC) (envelope-from egor.gabin@outlook.com) Received: from COL129-W60 ([65.55.34.199]) by COL004-OMC4S15.hotmail.com over TLS secured channel with Microsoft SMTPSVC(7.5.7601.22751); Wed, 24 Jun 2015 18:14:15 -0700 X-TMN: [qBMGu3hg0DxmP83WQZrU/j/YSDxG7rK+] X-Originating-Email: [egor.gabin@outlook.com] Message-ID: From: Bob Void To: "freebsd-fs@freebsd.org" Subject: Faulted pool Date: Wed, 24 Jun 2015 21:14:15 -0400 Importance: Normal MIME-Version: 1.0 X-OriginalArrivalTime: 25 Jun 2015 01:14:15.0723 (UTC) FILETIME=[40D51FB0:01D0AEE4] Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Jun 2015 01:15:22 -0000 Hi All=2C Yes the mess I am in is all my fault. I have no one to blame but = myself and I am trying to fix this myself. I have reached a point where i n= eed help from the community.=20 Here is my situation. Freenas 9.2.0.1 running on Freebsd 9.2-Release-p4 Faulted raidz pool called bubbapool is made up of 2TB SATA seagate 1/4 driv= es is not being recognized by the bios.I was able to clone two of the drive= using DDRESCUE and the third had bad 1.5mb in bad sectors but was cloned. = in the process of using ddrescue for the first time I overwrote the USB car= d where freenas was run from so I had to go to the next version and importe= d a config backup from a few months past. I think I need to go back to a good uberblock but i dont know how to compil= e code.=20 Details below. =20 antnas: ~ # camcontrol devlist=0A= =0A= at scbus0 target 0 lun 0 (ada0=2Cpass0)= =0A= =0A= at scbus1 target 0 lun 0 (ada1=2Cpass1)= =0A= =0A= at scbus2 target 0 lun 0 (ada2=2Cpass2)= =0A= =0A= at scbus3 target 0 lun 0 (ada3=2Cpass3)= =0A= =0A= at scbus4 target 0 lun=0A= 0 (ada4=2Cpass4)=0A= =0A= at scbus5 target 0 lun=0A= 0 (ada5=2Cpass5)=0A= =0A= at scbus6 target 0 lun=0A= 0 (ada6=2Cpass6)=0A= =0A= at scbus7 target 0 lun=0A= 0 (ada7=2Cpass7)=0A= =0A= at scbus8 target=0A= 1 lun 0 (ada8=2Cpass8)=0A= =0A= at scbus9 target=0A= 0 lun 0 (ada9=2Cpass9)=0A= =0A= at scbus9 target=0A= 1 lun 0 (ada10=2Cpass10)=0A= =0A= at scbus10 target 0=0A= lun 0 (da0=2Cpass11) antnas: ~ # zpool=0A= import -fV 2272410887342933893 antnas: ~ # zpool=0A= status bubbapool=0A= =0A= pool: bubbapool=0A= =0A= state: FAULTED=0A= =0A= status: One or more=0A= devices could not be opened. 
There are=0A= insufficient=0A= =0A= replicas for the pool to continue=0A= functioning.=0A= =0A= action: Attach the=0A= missing device and online it using 'zpool online'.=0A= =0A= see: http://illumos.org/msg/ZFS-8000-3C=0A= =0A= scan: none requested=0A= =0A= config:=0A= =0A= =0A= =0A= NAME STATE READ WRITE CKSUM=0A= =0A= bubbapool FAULTED 0 =0A= 0 1=0A= =0A= raidz1-0 DEGRADED 0 =0A= 0 6=0A= =0A= ada0 ONLINE 0 =0A= 0 0 block size: 512B configured=2C 4096B native=0A= =0A= ada2 ONLINE 0 =0A= 0 0 block size: 512B configured=2C 4096B native=0A= =0A= 15427384508884946962 UNAVAIL =0A= 0 0 0 =0A= was /dev/ada1=0A= =0A= ada1 ONLINE 0 =0A= 0 0 block size: 512B configured=2C 4096B native=0A= =0A= antnas: ~ # antnas: ~ # zdb -l=0A= /dev/ada0--------------------------------------------LABEL 0---------------= ----------------------------- version: 5000 name: 'bubbapool' stat= e: 0 txg: 15723793 pool_guid: 2272410887342933893 hostid: 25391348= 34 hostname: 'antnas.local' top_guid: 12206387516572927959 guid: 5= 263395365568228054 vdev_children: 1 vdev_tree: type: 'raidz' = id: 0 guid: 12206387516572927959 nparity: 1 meta= slab_array: 14 metaslab_shift: 31 ashift: 9 asize: 800= 1576501248 is_log: 0 children[0]: type: 'disk' = id: 0 guid: 2847030196120806336 path: '/dev/a= da3' phys_path: '/dev/ada3' whole_disk: 0 = DTL: 21 children[1]: type: 'disk' id: 1 = guid: 5263395365568228054 path: '/dev/ada0' phys= _path: '/dev/ada0' whole_disk: 0 DTL: 20 child= ren[2]: type: 'disk' id: 2 guid: 154273845= 08884946962 path: '/dev/ada1' phys_path: '/dev/ada1' = whole_disk: 0 DTL: 19 children[3]: = type: 'disk' id: 3 guid: 17279438588802848693 = path: '/dev/ada2' phys_path: '/dev/ada2' whole_di= sk: 0 DTL: 18 removed: 1 features_for_read:-------= -------------------------------------LABEL 1-------------------------------= ------------- version: 5000 name: 'bubbapool' state: 0 txg: 157= 23793 pool_guid: 2272410887342933893 hostid: 2539134834 hostname: = 'antnas.local' top_guid: 12206387516572927959 guid: 52633953655682280= 54 vdev_children: 1 vdev_tree: type: 'raidz' id: 0 = guid: 12206387516572927959 nparity: 1 metaslab_array: 14 = metaslab_shift: 31 ashift: 9 asize: 8001576501248 = is_log: 0 children[0]: type: 'disk' id: 0 = guid: 2847030196120806336 path: '/dev/ada3' = phys_path: '/dev/ada3' whole_disk: 0 DTL: 21 c= hildren[1]: type: 'disk' id: 1 guid: 52633= 95365568228054 path: '/dev/ada0' phys_path: '/dev/ada= 0' whole_disk: 0 DTL: 20 children[2]: = type: 'disk' id: 2 guid: 15427384508884946962 = path: '/dev/ada1' phys_path: '/dev/ada1' whole= _disk: 0 DTL: 19 children[3]: type: 'disk' = id: 3 guid: 17279438588802848693 path: '/dev/= ada2' phys_path: '/dev/ada2' whole_disk: 0 = DTL: 18 removed: 1 features_for_read:-----------------------= ---------------------LABEL 2--------------------------------------------fai= led to unpack=0A= label 2--------------------------------------------LABEL 3-----------------= ---------------------------failed to unpack=0A= label 3antnas: ~ # antnas: ~ # dmesg | grep -i ADA0ada0 at siisch0 bus=0A= 0 scbus0 target 0 lun 0ada0:=0A= ATA-9 SATA 3.x deviceada0: Serial Number Z4Z26ATW= ada0: 300.000MB/s=0A= transfers (SATA 2.x=2C UDMA6=2C PIO 8192bytes)ada0: Command=0A= Queueing enabledada0: 1907728MB=0A= (3907027055 512 byte sectors: 16H 63S/T 16383C)ada0:=0A= quirks=3D0x1<4K>ada0: Previously was=0A= known as ad4=0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= 
=0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= antnas: ~ # antnas: ~ # zdb -l=0A= /dev/ada1--------------------------------------------LABEL 0---------------= -----------------------------failed to unpack=0A= label 0--------------------------------------------LABEL 1-----------------= ---------------------------failed to unpack=0A= label 1--------------------------------------------LABEL 2-----------------= ---------------------------failed to unpack=0A= label 2--------------------------------------------LABEL 3-----------------= ---------------------------=0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= failed to unpack=0A= label 3 antnas: ~ # dmesg | grep -i ADA1ada1 at siisch1 bus=0A= 0 scbus1 target 0 lun 0ada1:=0A= ATA-9 SATA 3.x deviceada1: Serial Number Z8E00BZ3= ada1: 300.000MB/s=0A= transfers (SATA 2.x=2C UDMA6=2C PIO 8192bytes)ada1: Command=0A= Queueing enabledada1: 1907729MB=0A= (3907029168 512 byte sectors: 16H 63S/T 16383C)ada1:=0A= quirks=3D0x1<4K>ada1: Previously was=0A= known as ad6ada10 at ata1 bus 0=0A= scbus9 target 1 lun 0ada10:=0A= ATA-8 SATA 2.x deviceada10: Serial Number=0A= 9VS0RWK6ada10: 150.000MB/s=0A= transfers (SATA=2C UDMA5=2C PIO 8192bytes)ada10: 1430799MB=0A= (2930277168 512 byte sectors: 16H 63S/T 16383C)ada10: Previously=0A= was known as ad3=0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= antnas: ~ # antnas: ~ # zdb -l=0A= /dev/ada2--------------------------------------------LABEL 0---------------= ----------------------------- version: 5000 name: 'bubbapool' stat= e: 0 txg: 15516933 pool_guid: 2272410887342933893 hostname: '' = top_guid: 12206387516572927959 guid: 17279438588802848693 vdev_childr= en: 1 vdev_tree: type: 'raidz' id: 0 guid: 12206387= 516572927959 nparity: 1 metaslab_array: 14 metaslab_sh= ift: 31 ashift: 9 asize: 8001576501248 is_log: 0 = children[0]: type: 'disk' id: 0 guid: 28= 47030196120806336 path: '/dev/ada3' phys_path: '/dev/= ada3' whole_disk: 0 DTL: 21 children[1]: = type: 'disk' id: 1 guid: 5263395365568228054 = path: '/dev/ada0' phys_path: '/dev/ada0' who= le_disk: 0 DTL: 20 children[2]: type: 'disk' = id: 2 guid: 15427384508884946962 path: '/de= v/ada1' phys_path: '/dev/ada1' whole_disk: 0 = DTL: 19 children[3]: type: 
'disk' id: 3 = guid: 17279438588802848693 path: '/dev/ada2' = phys_path: '/dev/ada2' whole_disk: 0 DTL: 18 featu= res_for_read:--------------------------------------------LABEL 1-----------= --------------------------------- version: 5000 name: 'bubbapool' = state: 0 txg: 15516933 pool_guid: 2272410887342933893 hostname: ''= top_guid: 12206387516572927959 guid: 17279438588802848693 vdev_ch= ildren: 1 vdev_tree: type: 'raidz' id: 0 guid: 1220= 6387516572927959 nparity: 1 metaslab_array: 14 metasla= b_shift: 31 ashift: 9 asize: 8001576501248 is_log: 0 = children[0]: type: 'disk' id: 0 guid= : 2847030196120806336 path: '/dev/ada3' phys_path: '/= dev/ada3' whole_disk: 0 DTL: 21 children[1]: = type: 'disk' id: 1 guid: 526339536556822805= 4 path: '/dev/ada0' phys_path: '/dev/ada0' = whole_disk: 0 DTL: 20 children[2]: type: 'dis= k' id: 2 guid: 15427384508884946962 path: = '/dev/ada1' phys_path: '/dev/ada1' whole_disk: 0 = DTL: 19 children[3]: type: 'disk' id: 3= guid: 17279438588802848693 path: '/dev/ada2' = phys_path: '/dev/ada2' whole_disk: 0 DTL: 18 f= eatures_for_read:--------------------------------------------LABEL 2-------= -------------------------------------failed to unpack=0A= label 2--------------------------------------------LABEL 3-----------------= ---------------------------failed to unpack=0A= label 3antnas: ~ # antnas: ~ # dmesg | grep -i ADA2ada2 at siisch2 bus=0A= 0 scbus2 target 0 lun 0ada2:=0A= ATA-9 SATA 3.x deviceada2: Serial Number Z4Z21B4K= ada2: 300.000MB/s=0A= transfers (SATA 2.x=2C UDMA6=2C PIO 8192bytes)ada2: Command=0A= Queueing enabledada2: 1907728MB=0A= (3907027055 512 byte sectors: 16H 63S/T 16383C)ada2:=0A= quirks=3D0x1<4K>ada2: Previously was=0A= known as ad8=0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= antnas: ~ # antnas: ~ # zdb -l=0A= /dev/ada3--------------------------------------------LABEL 0---------------= ----------------------------- version: 5000 name: 'bubbapool' stat= e: 0 txg: 15723793 pool_guid: 2272410887342933893 hostid: 25391348= 34 hostname: 'antnas.local' top_guid: 12206387516572927959 guid: 2= 847030196120806336 vdev_children: 1 vdev_tree: type: 'raidz' = id: 0 guid: 12206387516572927959 nparity: 1 meta= slab_array: 14 metaslab_shift: 31 ashift: 9 asize: 800= 1576501248 is_log: 0 children[0]: type: 'disk' = id: 
0 guid: 2847030196120806336 path: '/dev/a= da3' phys_path: '/dev/ada3' whole_disk: 0 = DTL: 21 children[1]: type: 'disk' id: 1 = guid: 5263395365568228054 path: '/dev/ada0' phys= _path: '/dev/ada0' whole_disk: 0 DTL: 20 child= ren[2]: type: 'disk' id: 2 guid: 154273845= 08884946962 path: '/dev/ada1' phys_path: '/dev/ada1' = whole_disk: 0 DTL: 19 children[3]: = type: 'disk' id: 3 guid: 17279438588802848693 = path: '/dev/ada2' phys_path: '/dev/ada2' whole_di= sk: 0 DTL: 18 removed: 1 features_for_read:-------= -------------------------------------LABEL 1-------------------------------= ------------- version: 5000 name: 'bubbapool' state: 0 txg: 157= 23793 pool_guid: 2272410887342933893 hostid: 2539134834 hostname: = 'antnas.local' top_guid: 12206387516572927959 guid: 28470301961208063= 36 vdev_children: 1 vdev_tree: type: 'raidz' id: 0 = guid: 12206387516572927959 nparity: 1 metaslab_array: 14 = metaslab_shift: 31 ashift: 9 asize: 8001576501248 = is_log: 0 children[0]: type: 'disk' id: 0 = guid: 2847030196120806336 path: '/dev/ada3' = phys_path: '/dev/ada3' whole_disk: 0 DTL: 21 c= hildren[1]: type: 'disk' id: 1 guid: 52633= 95365568228054 path: '/dev/ada0' phys_path: '/dev/ada= 0' whole_disk: 0 DTL: 20 children[2]: = type: 'disk' id: 2 guid: 15427384508884946962 = path: '/dev/ada1' phys_path: '/dev/ada1' whole= _disk: 0 DTL: 19 children[3]: type: 'disk' = id: 3 guid: 17279438588802848693 path: '/dev/= ada2' phys_path: '/dev/ada2' whole_disk: 0 = DTL: 18 removed: 1 features_for_read:-----------------------= ---------------------LABEL 2--------------------------------------------fai= led to unpack=0A= label 2--------------------------------------------LABEL 3-----------------= ---------------------------failed to unpack=0A= label 3antnas: ~ # antnas: ~ # dmesg | grep -i ADA3ada3 at siisch3 bus=0A= 0 scbus3 target 0 lun 0ada3:=0A= ATA-9 SATA 3.x deviceada3: Serial Number Z4Z276N0= ada3: 300.000MB/s=0A= transfers (SATA 2.x=2C UDMA6=2C PIO 8192bytes)ada3: Command=0A= Queueing enabledada3: 1907728MB=0A= (3907027055 512 byte sectors: 16H 63S/T 16383C)ada3:=0A= quirks=3D0x1<4K>=0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= ada3: Previously was=0A= known as ad10 = From owner-freebsd-fs@freebsd.org Thu Jun 25 12:32:03 2015 Return-Path: 
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8855B98CD7F for ; Thu, 25 Jun 2015 12:32:03 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wi0-x229.google.com (mail-wi0-x229.google.com [IPv6:2a00:1450:400c:c05::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id EFC0C1610 for ; Thu, 25 Jun 2015 12:32:02 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wiwl6 with SMTP id l6so16589561wiw.0 for ; Thu, 25 Jun 2015 05:32:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=YJ4es8Qs8l+PimabFp9qms7h05LRfscde5bSIfyQev4=; b=oc/j+QA7L9eklWwB6sS36arYkjpSW71LC5ggP6KlFbhjC/ZXXKb2yVr4C6NdJ/IdHf gN2FC09jyjkmSoPK5EfzH4OXBoQa9LT3Umry6bOhtZd/Mu4+P+7a+momPiIGdUhDtBxl DyPIQ2Pnqp5QmY11PRmMNDjvkSCSg9SkhEuaEnMU/SRYONqisVmhrEVAAjtOSiFk2FG1 sezHeV2nj5UWW6f+ybCkek/z5X7z0nKlBWxPOMvqm6pPr6YWi0Te9mE62+H7Dft0Iv75 qMPMZGXMy3oJ3aEsUVkDQIPM+iQCKraf5AIjbAwioVRwWl1t1/4CuzIL+JomXvOkPwr4 r4eA== X-Received: by 10.180.106.195 with SMTP id gw3mr5346562wib.25.1435235521492; Thu, 25 Jun 2015 05:32:01 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id g15sm7405619wiv.22.2015.06.25.05.31.59 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Thu, 25 Jun 2015 05:31:59 -0700 (PDT) Date: Thu, 25 Jun 2015 14:31:57 +0200 From: Mateusz Guzik To: Konstantin Belousov Cc: freebsd-fs@freebsd.org Subject: Re: atomic v_usecount and v_holdcnt Message-ID: <20150625123156.GA29667@dft-labs.eu> References: <20141122002812.GA32289@dft-labs.eu> <20141122092527.GT17068@kib.kiev.ua> <20141122211147.GA23623@dft-labs.eu> <20141124095251.GH17068@kib.kiev.ua> <20150314225226.GA15302@dft-labs.eu> <20150316094643.GZ2379@kib.kiev.ua> <20150317014412.GA10819@dft-labs.eu> <20150318104442.GS2379@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20150318104442.GS2379@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Jun 2015 12:32:03 -0000 On Wed, Mar 18, 2015 at 12:44:42PM +0200, Konstantin Belousov wrote: > On Tue, Mar 17, 2015 at 02:44:12AM +0100, Mateusz Guzik wrote: > > On Mon, Mar 16, 2015 at 11:46:43AM +0200, Konstantin Belousov wrote: > > > On Sat, Mar 14, 2015 at 11:52:26PM +0100, Mateusz Guzik wrote: > > > > On Mon, Nov 24, 2014 at 11:52:52AM +0200, Konstantin Belousov wrote: > > > > > On Sat, Nov 22, 2014 at 10:11:47PM +0100, Mateusz Guzik wrote: > > > > > > On Sat, Nov 22, 2014 at 11:25:27AM +0200, Konstantin Belousov wrote: > > > > > > > On Sat, Nov 22, 2014 at 01:28:12AM +0100, Mateusz Guzik wrote: > > > > > > > > The idea is that we don't need an interlock as long as we don't > > > > > > > > transition either counter 1->0 or 0->1. > > > > > > > I already said that something along the lines of the patch should work. 
> > > > > > > In fact, you need vnode lock when hold count changes between 0 and 1, > > > > > > > and probably the same for use count. > > > > > > > > > > > > > > > > > > > I don't see why this would be required (not that I'm an VFS expert). > > > > > > vnode recycling seems to be protected with the interlock. > > > > > > > > > > > > In fact I would argue that if this is really needed, current code is > > > > > > buggy. > > > > > Yes, it is already (somewhat) buggy. > > > > > > > > > > Most need of the lock is for the case of counts coming from 1 to 0. > > > > > The reason is the handling of the active vnode list, which is used > > > > > for limiting the amount of vnode list walking in syncer. When hold > > > > > count is decremented to 0, vnode is removed from the active list. > > > > > When use count is decremented to 0, vnode is supposedly inactivated, > > > > > and vinactive() cleans the cached pages belonging to vnode. In other > > > > > words, VI_OWEINACT for dirty vnode is sort of bug. > > > > > > > > > > > > > Modified the patch to no longer have the usecount + interlock dropped + > > > > VI_OWEINACT set window. > > > > > > > > Extended 0->1 hold count + vnode not locked window remains. I can fix > > > > that if it is really necessary by having _vhold return with interlock > > > > held if it did such transition. > > > > > > In v_upgrade_usecount(), you call v_incr_devcount() without without interlock > > > held. What prevents the devfs vnode from being recycled, in particular, > > > from invalidation of v_rdev pointer ? > > > > > > > Right, that was buggy. Fixed in the patch below. > Why non-atomicity of updates to several counters is safe ? This at least > requires an explanation in the comment, I mean holdcnt/usecnt pair. > The patch below was tested with make -j 40 buildworld in a loop for 7 hours and it survived. I started a comment above vget, unfinished yet. Further playing around revealed that zfs will vref a vnode with no usecount (zfs_lookup -> zfs_dirlook -> zfs_dirent_lock -> zfs_zget -> VN_HOLD) and it is possible that it will have VI_OWEINACT set (tested on a kernel without my patch). VN_HOLD is defined as vref(). The code can sleep, so some shuffling around can be done to call vinactive() if it happens to be exclusively locked (but most of the time it is locked shared). However, it seems that vputx deals with such consumers: if (vp->v_usecount > 0) vp->v_iflag &= ~VI_OWEINACT; Given that there are possibly more consumers like zfs how about: In vputx assert that the flag is unset if the usecount went to > 0. Clear the flag in vref and vget if transitioning 0->1 and assert it is unset otherwise. The way I read it is that in the stock kernel with properly timed vref the flag would be cleared anyway, with vinactive() only called if it was done by vget and only with the vnode exclusively locked. With a aforementioned change likelyhood of vinactive() remains the same, but now the flag state can be asserted. > Assume the thread increased the v_usecount, but did not managed to > acquire dev_mtx. Another thread performs vrele() and progressed to > v_decr_devcount(). It decreases the si_usecount, which might allow yet > another thread to see the si_usecount as too low and start unwanted > action. I think that the tests for VCHR must be done at the very > start of the functions, and devfs vnodes must hold vnode interlock > unconditionally. > Inserted v_type != VCHR checks in relevant places, vi_usecount manipulation functions now assert that the interlock is held. 
> > > > > I think that refcount_acquire_if_greater() KPI is excessive. You always > > > calls acquire with val == 0, and release with val == 1. > > > > > > > Yea i noted in my prevoius e-mail it should be changed (see below). > > > > I replaced them with refcount_acquire_if_not_zero and > > refcount_release_if_not_last. > I dislike the length of the names. Can you propose something shorter ? > Unfortunately the original API is alreday quite verbose and I don't have anything readable which would retain "refcount_acquire" (instead of a "ref_get" or "ref_acq"). Adding "_nz" as a suffix does not look good ("refcount_acquire_if_nz"). > The type for the local variable old in both functions should be u_int. > Done. > > > > > WRT to _refcount_release_lock, why is lock_object->lc_lock/lc_unlock KPI > > > cannot be used ? This allows to make refcount_release_lock() a function > > > instead of gcc extension macros. Not to mention that the macro is unused. > > > > These were supposed to be used by other code, forgot to remove it from > > the patch I sent here. > > > > We can discuss this in another thread. > > > > Striclty speaking we could use it here for vnode interlock, but I did > > not want to get around VI_LOCK macro (which right now is just a > > mtx_lock, but this may change). > > > > Updated patch is below: > Do not introduce ASSERT_VI_LOCK, the name difference between > ASSERT_VI_LOCKED and ASSERT_VI_LOCK is only in the broken grammar. > I do not see anything wrong with explicit if() statements where needed, > in all four places. Done. > > In vputx(), wrap the long line (if (refcount_release() || VI_DOINGINACT)). Done. diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c b/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c index 83f29c1..b587ebd 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c @@ -99,6 +99,6 @@ vn_rele_async(vnode_t *vp, taskq_t *taskq) (task_func_t *)vn_rele_inactive, vp, TQ_SLEEP) != 0); return; } - vp->v_usecount--; + refcount_release(&vp->v_usecount); vdropl(vp); } diff --git a/sys/kern/vfs_cache.c b/sys/kern/vfs_cache.c index 19ef783..cb4ea94 100644 --- a/sys/kern/vfs_cache.c +++ b/sys/kern/vfs_cache.c @@ -661,12 +661,12 @@ success: ltype = VOP_ISLOCKED(dvp); VOP_UNLOCK(dvp, 0); } - VI_LOCK(*vpp); + vhold(*vpp); if (wlocked) CACHE_WUNLOCK(); else CACHE_RUNLOCK(); - error = vget(*vpp, cnp->cn_lkflags | LK_INTERLOCK, cnp->cn_thread); + error = vget(*vpp, cnp->cn_lkflags | LK_VNHELD, cnp->cn_thread); if (cnp->cn_flags & ISDOTDOT) { vn_lock(dvp, ltype | LK_RETRY); if (dvp->v_iflag & VI_DOOMED) { @@ -1366,9 +1366,9 @@ vn_dir_dd_ino(struct vnode *vp) if ((ncp->nc_flag & NCF_ISDOTDOT) != 0) continue; ddvp = ncp->nc_dvp; - VI_LOCK(ddvp); + vhold(ddvp); CACHE_RUNLOCK(); - if (vget(ddvp, LK_INTERLOCK | LK_SHARED | LK_NOWAIT, curthread)) + if (vget(ddvp, LK_SHARED | LK_NOWAIT | LK_VNHELD, curthread)) return (NULL); return (ddvp); } diff --git a/sys/kern/vfs_hash.c b/sys/kern/vfs_hash.c index 930fca1..48601e7 100644 --- a/sys/kern/vfs_hash.c +++ b/sys/kern/vfs_hash.c @@ -84,9 +84,9 @@ vfs_hash_get(const struct mount *mp, u_int hash, int flags, struct thread *td, s continue; if (fn != NULL && fn(vp, arg)) continue; - VI_LOCK(vp); + vhold(vp); rw_runlock(&vfs_hash_lock); - error = vget(vp, flags | LK_INTERLOCK, td); + error = vget(vp, flags | LK_VNHELD, td); if (error == ENOENT && (flags & LK_NOWAIT) == 0) break; if (error) @@ -128,9 +128,9 @@ vfs_hash_insert(struct vnode *vp, u_int hash, int flags, 
struct thread *td, stru continue; if (fn != NULL && fn(vp2, arg)) continue; - VI_LOCK(vp2); + vhold(vp2); rw_wunlock(&vfs_hash_lock); - error = vget(vp2, flags | LK_INTERLOCK, td); + error = vget(vp2, flags | LK_VNHELD, td); if (error == ENOENT && (flags & LK_NOWAIT) == 0) break; rw_wlock(&vfs_hash_lock); diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c index 1f1a7b6..a8cd2cb 100644 --- a/sys/kern/vfs_subr.c +++ b/sys/kern/vfs_subr.c @@ -68,6 +68,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include #include @@ -102,9 +103,8 @@ static int flushbuflist(struct bufv *bufv, int flags, struct bufobj *bo, static void syncer_shutdown(void *arg, int howto); static int vtryrecycle(struct vnode *vp); static void v_incr_usecount(struct vnode *); -static void v_decr_usecount(struct vnode *); -static void v_decr_useonly(struct vnode *); -static void v_upgrade_usecount(struct vnode *); +static void v_incr_devcount(struct vnode *); +static void v_decr_devcount(struct vnode *); static void vnlru_free(int); static void vgonel(struct vnode *); static void vfs_knllock(void *arg); @@ -868,7 +868,7 @@ vnlru_free(int count) */ freevnodes--; vp->v_iflag &= ~VI_FREE; - vp->v_holdcnt++; + refcount_acquire(&vp->v_holdcnt); mtx_unlock(&vnode_free_list_mtx); VI_UNLOCK(vp); @@ -2079,78 +2079,68 @@ reassignbuf(struct buf *bp) /* * Increment the use and hold counts on the vnode, taking care to reference - * the driver's usecount if this is a chardev. The vholdl() will remove - * the vnode from the free list if it is presently free. Requires the - * vnode interlock and returns with it held. + * the driver's usecount if this is a chardev. The _vhold() will remove + * the vnode from the free list if it is presently free. */ static void v_incr_usecount(struct vnode *vp) { + ASSERT_VI_UNLOCKED(vp, __func__); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - vholdl(vp); - vp->v_usecount++; - if (vp->v_type == VCHR && vp->v_rdev != NULL) { - dev_lock(); - vp->v_rdev->si_usecount++; - dev_unlock(); - } -} -/* - * Turn a holdcnt into a use+holdcnt such that only one call to - * v_decr_usecount is needed. - */ -static void -v_upgrade_usecount(struct vnode *vp) -{ + if (vp->v_type == VCHR) { + VI_LOCK(vp); + _vhold(vp, true); + if (vp->v_iflag & VI_OWEINACT) { + VNASSERT(vp->v_usecount == 0, vp, + ("vnode with usecount and VI_OWEINACT set")); + vp->v_iflag &= ~VI_OWEINACT; + } + refcount_acquire(&vp->v_usecount); + v_incr_devcount(vp); + VI_UNLOCK(vp); + return; + } - CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - vp->v_usecount++; - if (vp->v_type == VCHR && vp->v_rdev != NULL) { - dev_lock(); - vp->v_rdev->si_usecount++; - dev_unlock(); + _vhold(vp, false); + if (refcount_acquire_if_not_zero(&vp->v_usecount)) { + VNASSERT((vp->v_iflag & VI_OWEINACT) == 0, vp, + ("vnode with usecount and VI_OWEINACT set")); + } else { + VI_LOCK(vp); + if (vp->v_iflag & VI_OWEINACT) + vp->v_iflag &= ~VI_OWEINACT; + refcount_acquire(&vp->v_usecount); + VI_UNLOCK(vp); } } /* - * Decrement the vnode use and hold count along with the driver's usecount - * if this is a chardev. The vdropl() below releases the vnode interlock - * as it may free the vnode. + * Increment si_usecount of the associated device, if any. 
*/ static void -v_decr_usecount(struct vnode *vp) +v_incr_devcount(struct vnode *vp) { - ASSERT_VI_LOCKED(vp, __FUNCTION__); - VNASSERT(vp->v_usecount > 0, vp, - ("v_decr_usecount: negative usecount")); - CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - vp->v_usecount--; + ASSERT_VI_LOCKED(vp, __func__); + if (vp->v_type == VCHR && vp->v_rdev != NULL) { dev_lock(); - vp->v_rdev->si_usecount--; + vp->v_rdev->si_usecount++; dev_unlock(); } - vdropl(vp); } /* - * Decrement only the use count and driver use count. This is intended to - * be paired with a follow on vdropl() to release the remaining hold count. - * In this way we may vgone() a vnode with a 0 usecount without risk of - * having it end up on a free list because the hold count is kept above 0. + * Decrement si_usecount of the associated device, if any. */ static void -v_decr_useonly(struct vnode *vp) +v_decr_devcount(struct vnode *vp) { - ASSERT_VI_LOCKED(vp, __FUNCTION__); - VNASSERT(vp->v_usecount > 0, vp, - ("v_decr_useonly: negative usecount")); - CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - vp->v_usecount--; + ASSERT_VI_LOCKED(vp, __func__); + if (vp->v_type == VCHR && vp->v_rdev != NULL) { dev_lock(); vp->v_rdev->si_usecount--; @@ -2164,21 +2154,38 @@ v_decr_useonly(struct vnode *vp) * is being destroyed. Only callers who specify LK_RETRY will * see doomed vnodes. If inactive processing was delayed in * vput try to do it here. + * + * Notes on lockless counter manipulation: + * The hold count prevents the vnode from being freed, while the + * use count prevents it from being recycled. + * + * Only 1->0 and 0->1 transitions require atomicity with respect to + * other operations (e.g. taking the vnode off of a free list). + * In such a case the interlock is taken, which provides mutual + * exclusion against threads transitioning the other way. */ int vget(struct vnode *vp, int flags, struct thread *td) { - int error; + int error, oweinact; - error = 0; VNASSERT((flags & LK_TYPE_MASK) != 0, vp, ("vget: invalid lock operation")); + + if ((flags & LK_INTERLOCK) != 0) + ASSERT_VI_LOCKED(vp, __func__); + else + ASSERT_VI_UNLOCKED(vp, __func__); + if ((flags & LK_VNHELD) != 0) + VNASSERT((vp->v_holdcnt > 0), vp, + ("vget: LK_VNHELD passed but vnode not held")); + CTR3(KTR_VFS, "%s: vp %p with flags %d", __func__, vp, flags); - if ((flags & LK_INTERLOCK) == 0) - VI_LOCK(vp); - vholdl(vp); - if ((error = vn_lock(vp, flags | LK_INTERLOCK)) != 0) { + if ((flags & LK_VNHELD) == 0) + _vhold(vp, (flags & LK_INTERLOCK) != 0); + + if ((error = vn_lock(vp, flags)) != 0) { vdrop(vp); CTR2(KTR_VFS, "%s: impossible to lock vnode %p", __func__, vp); @@ -2186,22 +2193,34 @@ vget(struct vnode *vp, int flags, struct thread *td) } if (vp->v_iflag & VI_DOOMED && (flags & LK_RETRY) == 0) panic("vget: vn_lock failed to return ENOENT\n"); - VI_LOCK(vp); - /* Upgrade our holdcnt to a usecount. */ - v_upgrade_usecount(vp); + /* * We don't guarantee that any particular close will * trigger inactive processing so just make a best effort * here at preventing a reference to a removed file. If * we don't succeed no harm is done. + * + * Upgrade our holdcnt to a usecount. 
*/ - if (vp->v_iflag & VI_OWEINACT) { - if (VOP_ISLOCKED(vp) == LK_EXCLUSIVE && + if (vp->v_type != VCHR && + refcount_acquire_if_not_zero(&vp->v_usecount)) { + VNASSERT((vp->v_iflag & VI_OWEINACT) == 0, vp, + ("vnode with usecount and VI_OWEINACT set")); + } else { + VI_LOCK(vp); + if ((vp->v_iflag & VI_OWEINACT) == 0) { + oweinact = 0; + } else { + oweinact = 1; + vp->v_iflag &= ~VI_OWEINACT; + } + refcount_acquire(&vp->v_usecount); + v_incr_devcount(vp); + if (oweinact && VOP_ISLOCKED(vp) == LK_EXCLUSIVE && (flags & LK_NOWAIT) == 0) vinactive(vp, td); - vp->v_iflag &= ~VI_OWEINACT; + VI_UNLOCK(vp); } - VI_UNLOCK(vp); return (0); } @@ -2213,36 +2232,34 @@ vref(struct vnode *vp) { CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - VI_LOCK(vp); v_incr_usecount(vp); - VI_UNLOCK(vp); } /* * Return reference count of a vnode. * - * The results of this call are only guaranteed when some mechanism other - * than the VI lock is used to stop other processes from gaining references - * to the vnode. This may be the case if the caller holds the only reference. - * This is also useful when stale data is acceptable as race conditions may - * be accounted for by some other means. + * The results of this call are only guaranteed when some mechanism is used to + * stop other processes from gaining references to the vnode. This may be the + * case if the caller holds the only reference. This is also useful when stale + * data is acceptable as race conditions may be accounted for by some other + * means. */ int vrefcnt(struct vnode *vp) { - int usecnt; - VI_LOCK(vp); - usecnt = vp->v_usecount; - VI_UNLOCK(vp); - - return (usecnt); + return (vp->v_usecount); } #define VPUTX_VRELE 1 #define VPUTX_VPUT 2 #define VPUTX_VUNREF 3 +/* + * Decrement the use and hold counts for a vnode. + * + * See an explanation near vget() as to why atomic operation is safe. + */ static void vputx(struct vnode *vp, int func) { @@ -2255,33 +2272,44 @@ vputx(struct vnode *vp, int func) ASSERT_VOP_LOCKED(vp, "vput"); else KASSERT(func == VPUTX_VRELE, ("vputx: wrong func")); + ASSERT_VI_UNLOCKED(vp, __func__); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - VI_LOCK(vp); - - /* Skip this v_writecount check if we're going to panic below. */ - VNASSERT(vp->v_writecount < vp->v_usecount || vp->v_usecount < 1, vp, - ("vputx: missed vn_close")); - error = 0; - if (vp->v_usecount > 1 || ((vp->v_iflag & VI_DOINGINACT) && - vp->v_usecount == 1)) { + if (vp->v_type != VCHR && + refcount_release_if_not_last(&vp->v_usecount)) { if (func == VPUTX_VPUT) VOP_UNLOCK(vp, 0); - v_decr_usecount(vp); + vdrop(vp); return; } - if (vp->v_usecount != 1) { - vprint("vputx: negative ref count", vp); - panic("vputx: negative ref cnt"); - } - CTR2(KTR_VFS, "%s: return vnode %p to the freelist", __func__, vp); + VI_LOCK(vp); + /* * We want to hold the vnode until the inactive finishes to * prevent vgone() races. We drop the use count here and the * hold count below when we're done. */ - v_decr_useonly(vp); + if (!refcount_release(&vp->v_usecount) || + (vp->v_iflag & VI_DOINGINACT)) { + if (func == VPUTX_VPUT) + VOP_UNLOCK(vp, 0); + v_decr_devcount(vp); + vdropl(vp); + return; + } + + v_decr_devcount(vp); + + error = 0; + + if (vp->v_usecount != 0) { + vprint("vputx: usecount not zero", vp); + panic("vputx: usecount not zero"); + } + + CTR2(KTR_VFS, "%s: return vnode %p to the freelist", __func__, vp); + /* * We must call VOP_INACTIVE with the node locked. Mark * as VI_DOINGINACT to avoid recursion. 
@@ -2307,7 +2335,8 @@ vputx(struct vnode *vp, int func) break; } if (vp->v_usecount > 0) - vp->v_iflag &= ~VI_OWEINACT; + VNASSERT((vp->v_iflag & VI_OWEINACT) == 0, vp, + ("vnode with usecount and VI_OWEINACT set")); if (error == 0) { if (vp->v_iflag & VI_OWEINACT) vinactive(vp, curthread); @@ -2351,36 +2380,36 @@ vunref(struct vnode *vp) } /* - * Somebody doesn't want the vnode recycled. - */ -void -vhold(struct vnode *vp) -{ - - VI_LOCK(vp); - vholdl(vp); - VI_UNLOCK(vp); -} - -/* * Increase the hold count and activate if this is the first reference. */ void -vholdl(struct vnode *vp) +_vhold(struct vnode *vp, bool locked) { struct mount *mp; + if (locked) + ASSERT_VI_LOCKED(vp, __func__); + else + ASSERT_VI_UNLOCKED(vp, __func__); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); -#ifdef INVARIANTS - /* getnewvnode() calls v_incr_usecount() without holding interlock. */ - if (vp->v_type != VNON || vp->v_data != NULL) - ASSERT_VI_LOCKED(vp, "vholdl"); -#endif - vp->v_holdcnt++; - if ((vp->v_iflag & VI_FREE) == 0) + if (!locked && refcount_acquire_if_not_zero(&vp->v_holdcnt)) { + VNASSERT((vp->v_iflag & VI_FREE) == 0, vp, + ("_vhold: vnode with holdcnt is free")); return; - VNASSERT(vp->v_holdcnt == 1, vp, ("vholdl: wrong hold count")); - VNASSERT(vp->v_op != NULL, vp, ("vholdl: vnode already reclaimed.")); + } + + if (!locked) + VI_LOCK(vp); + if ((vp->v_iflag & VI_FREE) == 0) { + refcount_acquire(&vp->v_holdcnt); + if (!locked) + VI_UNLOCK(vp); + return; + } + VNASSERT(vp->v_holdcnt == 0, vp, + ("%s: wrong hold count", __func__)); + VNASSERT(vp->v_op != NULL, vp, + ("%s: vnode already reclaimed.", __func__)); /* * Remove a vnode from the free list, mark it as in use, * and put it on the active list. @@ -2396,18 +2425,9 @@ vholdl(struct vnode *vp) TAILQ_INSERT_HEAD(&mp->mnt_activevnodelist, vp, v_actfreelist); mp->mnt_activevnodelistsize++; mtx_unlock(&vnode_free_list_mtx); -} - -/* - * Note that there is one less who cares about this vnode. - * vdrop() is the opposite of vhold(). - */ -void -vdrop(struct vnode *vp) -{ - - VI_LOCK(vp); - vdropl(vp); + refcount_acquire(&vp->v_holdcnt); + if (!locked) + VI_UNLOCK(vp); } /* @@ -2416,20 +2436,28 @@ vdrop(struct vnode *vp) * (marked VI_DOOMED) in which case we will free it. */ void -vdropl(struct vnode *vp) +_vdrop(struct vnode *vp, bool locked) { struct bufobj *bo; struct mount *mp; int active; - ASSERT_VI_LOCKED(vp, "vdropl"); + if (locked) + ASSERT_VI_LOCKED(vp, __func__); + else + ASSERT_VI_UNLOCKED(vp, __func__); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - if (vp->v_holdcnt <= 0) + if ((int)vp->v_holdcnt <= 0) panic("vdrop: holdcnt %d", vp->v_holdcnt); - vp->v_holdcnt--; - VNASSERT(vp->v_holdcnt >= vp->v_usecount, vp, - ("hold count less than use count")); - if (vp->v_holdcnt > 0) { + if (refcount_release_if_not_last(&vp->v_holdcnt)) { + if (locked) + VI_UNLOCK(vp); + return; + } + + if (!locked) + VI_LOCK(vp); + if (refcount_release(&vp->v_holdcnt) == 0) { VI_UNLOCK(vp); return; } diff --git a/sys/sys/lockmgr.h b/sys/sys/lockmgr.h index ff0473d..a74d5f5 100644 --- a/sys/sys/lockmgr.h +++ b/sys/sys/lockmgr.h @@ -159,6 +159,7 @@ _lockmgr_args_rw(struct lock *lk, u_int flags, struct rwlock *ilk, #define LK_SLEEPFAIL 0x000800 #define LK_TIMELOCK 0x001000 #define LK_NODDLKTREAT 0x002000 +#define LK_VNHELD 0x004000 /* * Operations for lockmgr(). 
diff --git a/sys/sys/refcount.h b/sys/sys/refcount.h index 4611664..d3f817c 100644 --- a/sys/sys/refcount.h +++ b/sys/sys/refcount.h @@ -64,4 +64,32 @@ refcount_release(volatile u_int *count) return (old == 1); } +static __inline int +refcount_acquire_if_not_zero(volatile u_int *count) +{ + u_int old; + + for (;;) { + old = *count; + if (old == 0) + return (0); + if (atomic_cmpset_int(count, old, old + 1)) + return (1); + } +} + +static __inline int +refcount_release_if_not_last(volatile u_int *count) +{ + u_int old; + + for (;;) { + old = *count; + if (old == 1) + return (0); + if (atomic_cmpset_int(count, old, old - 1)) + return (1); + } +} + #endif /* ! __SYS_REFCOUNT_H__ */ diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h index 36ef8af..9286a4e 100644 --- a/sys/sys/vnode.h +++ b/sys/sys/vnode.h @@ -162,8 +162,8 @@ struct vnode { daddr_t v_lastw; /* v last write */ int v_clen; /* v length of cur. cluster */ - int v_holdcnt; /* i prevents recycling. */ - int v_usecount; /* i ref count of users */ + u_int v_holdcnt; /* i prevents recycling. */ + u_int v_usecount; /* i ref count of users */ u_int v_iflag; /* i vnode flags (see below) */ u_int v_vflag; /* v vnode flags */ int v_writecount; /* v ref count of writers */ @@ -652,13 +652,15 @@ int vaccess_acl_posix1e(enum vtype type, uid_t file_uid, struct ucred *cred, int *privused); void vattr_null(struct vattr *vap); int vcount(struct vnode *vp); -void vdrop(struct vnode *); -void vdropl(struct vnode *); +#define vdrop(vp) _vdrop((vp), 0) +#define vdropl(vp) _vdrop((vp), 1) +void _vdrop(struct vnode *, bool); int vflush(struct mount *mp, int rootrefs, int flags, struct thread *td); int vget(struct vnode *vp, int lockflag, struct thread *td); void vgone(struct vnode *vp); -void vhold(struct vnode *); -void vholdl(struct vnode *); +#define vhold(vp) _vhold((vp), 0) +#define vholdl(vp) _vhold((vp), 1) +void _vhold(struct vnode *, bool); void vinactive(struct vnode *, struct thread *); int vinvalbuf(struct vnode *vp, int save, int slpflag, int slptimeo); int vtruncbuf(struct vnode *vp, struct ucred *cred, off_t length, From owner-freebsd-fs@freebsd.org Thu Jun 25 17:22:15 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2239B98E520 for ; Thu, 25 Jun 2015 17:22:15 +0000 (UTC) (envelope-from javocado@gmail.com) Received: from mail-lb0-x232.google.com (mail-lb0-x232.google.com [IPv6:2a00:1450:4010:c04::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9C0531D5B for ; Thu, 25 Jun 2015 17:22:14 +0000 (UTC) (envelope-from javocado@gmail.com) Received: by lbbvz5 with SMTP id vz5so49762981lbb.0 for ; Thu, 25 Jun 2015 10:22:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=5hm0urbBmUiJjGYeExsM33yf2d49LyNC//I3OqKxf0g=; b=xmi3lVwO4/E3CQYg4/b+Ukb9dKkH1N26x+LJlN019VXimpkQ18scNuvPdRcZXhu99T kQIkYZ7aP/BdObUD8KX3QRtcOeO4lTJoNBKmdc77lc0h4jZl3XiDQuBVX0xrQ/7HDjiw hQqLYTvsvXBi79ksC4ihoNoyMnC/Jf2+YADiiiaF5DSOMjja71YeDEtyPlcti/3S+Yz4 Q3ceQtIPSPJ2OP9JmDXjyrT9VkGzcPi4dg4XYQkouz330GrrYkRMxal6G9QvvYntNQ3u g7vWuq07R9LWG6lNUZErMjPZKiG27r37YPzOrIGgH4iw9Nksqh2nP8h7vzNuHC2xzYHk muqw== MIME-Version: 1.0 X-Received: by 10.112.154.71 with SMTP id 
vm7mr44934253lbb.96.1435252932451; Thu, 25 Jun 2015 10:22:12 -0700 (PDT) Received: by 10.114.96.8 with HTTP; Thu, 25 Jun 2015 10:22:12 -0700 (PDT) In-Reply-To: References: Date: Thu, 25 Jun 2015 10:22:12 -0700 Message-ID: Subject: Fwd: ZFS pool within FreeBSD bhyve guest From: javocado To: FreeBSD Filesystems Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Jun 2015 17:22:15 -0000 (I'm posting here because I think this may be more of a zfs issue rather than a bhyve issue) Hi, I would like to create a zfs filesystem within my bhyve (FreeBSD 10.1 as the guest and host) allowing users of the VM to run zfs send/receive commands on the zfs filesystem within their bhyve VM. Is this possible and what is/are the methods and options for creating the zfs filesystem (or volume) within the VM? If there is a way to do this, would any of the proposed methods depend on whether the VM lives in a file versus a zfs volume? My VM is file-based. Thanks! From owner-freebsd-fs@freebsd.org Thu Jun 25 17:41:52 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D72C198C72E for ; Thu, 25 Jun 2015 17:41:52 +0000 (UTC) (envelope-from rah.lists@gmail.com) Received: from mail-lb0-x232.google.com (mail-lb0-x232.google.com [IPv6:2a00:1450:4010:c04::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5DA851658 for ; Thu, 25 Jun 2015 17:41:52 +0000 (UTC) (envelope-from rah.lists@gmail.com) Received: by lbnk3 with SMTP id k3so50106129lbn.1 for ; Thu, 25 Jun 2015 10:41:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=m5Ye4TMqxSpIJu61UrbTJo/Dy+xNpI58mtzWHlC2QMM=; b=SlZZDqmHKTDkrkwIbxUmEWiG+umb+CCn2a2rh5TcrrqzI2SsaluS5tdMZ3z2+MFGz/ hYK22rh00tIGm8I1lKgqLsaXD4ji3/oc0Lq5wcR1FgK9+iXmD5ZNJahRP/RrWvgIWrNN fexIV3YSwt8E2Oo4i7V95dQJc0QEq3Us35xTMGfvqigMc8HjVkP78WXmSgiZB/j7A9kR wM6/EJToxFp4qk/is5KRl1Hxz9X50vK3FVmjlxBDXCFdYtvxYZouGbbTjds8Hq7h21ku 1N3uPTSnaQvyaJOdodkg0CQAGdi/jJ1TQgGrRckx+x6/1mCfWFUqlZrwSQ8+HDqCRPVO PNmg== MIME-Version: 1.0 X-Received: by 10.153.4.12 with SMTP id ca12mr4616968lad.20.1435254110519; Thu, 25 Jun 2015 10:41:50 -0700 (PDT) Received: by 10.25.218.66 with HTTP; Thu, 25 Jun 2015 10:41:50 -0700 (PDT) Date: Thu, 25 Jun 2015 13:41:50 -0400 Message-ID: Subject: VFS buffering issues with UFS + soft-updates journaling From: RA H To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Jun 2015 17:41:52 -0000 I was directed here by a moderator on the official forums, who suggested the behaviour I'm seeing may be a bug. I don't have much experience with mailing lists, so please be gentle :) I'm experiencing data loss on a UFS filesystem on an iSCSI disk when the iSCSI connection is terminated abruptly. 
I know the issue isn't that the data doesn't have time to flush to disk; unmounting the filesystem right after the copy completes always returns immediately.

Details:
  FreeBSD 10.1-RELEASE
  iscsictl(8) (i.e. the new iSCSI initiator)
  single GPT partition on disk
  UFS with soft-update journaling

I mount the fs, copy a 1G file (have tried source file on tmpfs and a local SATA disk), wait ~10 seconds, then pull the Ethernet cable on the NIC which is connected to the iSCSI disk. I then reboot gracefully with shutdown -r now. After the system comes back up an fsck is necessary; I answer y to all the questions. After mounting, I either find no evidence the file ever existed, a file of zero size, or a truncated file. Even calling sync before terminating the connection does not prevent data loss.

The first indication that the problem had something to do with buffering was that during the reboot, the buffer sync (i.e. "Syncing disks, buffers remaining...") always indicates something in the range of 20-50 buffers that need syncing, all of which are eventually given up on.

As a workaround, I set the sysctl variable vfs.lodirtybuffers to 1. With this setting, it takes 2-3 seconds for the sysctl variable vfs.numdirtybuffers to return to the level it was at before I started the copy. At that point I can pull the Ethernet cable, reboot (there are still a few buffers that don't get synced), fsck, etc. and the file is intact. I haven't seen any side effects from this, but I expect setting it so low is not exactly best practice.

Another workaround is using UFS without soft-updates or journaling, provided I sync before terminating the iSCSI connection. The sync actually does what's expected in this case, and although fsck is still required, AFAICT it only needs to mark the fs clean. Initial testing with UFS and gjournal seems to work out of the box, but I'm not sure I want to go that route.
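For reference, a minimal sketch of the workaround described above, assuming a stock 10.1 system; the mount point and test file paths are placeholders:

  # Drop the dirty-buffer threshold so the syncer starts flushing almost
  # immediately (aggressive; as noted above, probably not best practice).
  sysctl vfs.lodirtybuffers=1

  # Copy onto the iSCSI-backed UFS filesystem, then watch the dirty-buffer
  # count drain back to its pre-copy level before pulling the cable.
  cp /tmp/test-1g /mnt/iscsi/
  sysctl -n vfs.numdirtybuffers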
From owner-freebsd-fs@freebsd.org Thu Jun 25 18:29:11 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A107D98C5C7 for ; Thu, 25 Jun 2015 18:29:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 631F21FDD for ; Thu, 25 Jun 2015 18:29:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id BCEA7D43C1C; Fri, 26 Jun 2015 04:29:09 +1000 (AEST) Date: Fri, 26 Jun 2015 04:29:07 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Mateusz Guzik cc: Konstantin Belousov , freebsd-fs@freebsd.org Subject: Re: atomic v_usecount and v_holdcnt In-Reply-To: <20150625123156.GA29667@dft-labs.eu> Message-ID: <20150626042546.Q2820@besplex.bde.org> References: <20141122002812.GA32289@dft-labs.eu> <20141122092527.GT17068@kib.kiev.ua> <20141122211147.GA23623@dft-labs.eu> <20141124095251.GH17068@kib.kiev.ua> <20150314225226.GA15302@dft-labs.eu> <20150316094643.GZ2379@kib.kiev.ua> <20150317014412.GA10819@dft-labs.eu> <20150318104442.GS2379@kib.kiev.ua> <20150625123156.GA29667@dft-labs.eu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=XMDNMlVE c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=dfNNiiqOaOqD_QZmin8A:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Jun 2015 18:29:11 -0000 On Thu, 25 Jun 2015, Mateusz Guzik wrote: > On Wed, Mar 18, 2015 at 12:44:42PM +0200, Konstantin Belousov wrote: >> On Tue, Mar 17, 2015 at 02:44:12AM +0100, Mateusz Guzik wrote: >>> I replaced them with refcount_acquire_if_not_zero and >>> refcount_release_if_not_last. >> I dislike the length of the names. Can you propose something shorter ? > > Unfortunately the original API is alreday quite verbose and I don't have > anything readable which would retain "refcount_acquire" (instead of a > "ref_get" or "ref_acq"). Adding "_nz" as a suffix does not look good > ("refcount_acquire_if_nz"). refcount -> rc acquire -> acq The "acq" abbreviation is already used a lot for atomic ops. 
Bruce From owner-freebsd-fs@freebsd.org Thu Jun 25 19:53:57 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D9A3898D67B for ; Thu, 25 Jun 2015 19:53:57 +0000 (UTC) (envelope-from etnapierala@gmail.com) Received: from mail-wi0-x230.google.com (mail-wi0-x230.google.com [IPv6:2a00:1450:400c:c05::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6D901185F for ; Thu, 25 Jun 2015 19:53:57 +0000 (UTC) (envelope-from etnapierala@gmail.com) Received: by wiwl6 with SMTP id l6so27630167wiw.0 for ; Thu, 25 Jun 2015 12:53:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:mail-followup-to :references:mime-version:content-type:content-disposition :in-reply-to:user-agent; bh=2xGe5pJY7xYT8p5exM2iKaKua5/XJeO2FIp6KLBwemY=; b=kiIIva1sHh1Cf94KuBnmSP5tNQUohvU25fTPfh4A2CGLpJsN4F7Cr0OQFUuLLjHHyl riyrGaZ/KaKx1flJGOVUqxUsnWRtNVHP8Ni+B0HCesJSZ1tnfjrkjEHt/TQSCgUuRGpT Dn3WWQFSMH2AsJoPw1NDXOsQ7rmmXrO+oanzUg0L6QkHQ+nBkAXwftACytL/95/3DxB7 +R+DTO41gw8O9Q9NlZIX4XqiyvCdMrAK9InGOnhDVFdu1tnKMZiLypH+el/CkKa68D4x Zzn1KbtRJqP1y/8/3yHJc+z+09NxJAlyTF/tTTUe1B1CgkSlNwQ/AjfZQR5P70vgKBc6 +C8w== X-Received: by 10.180.88.8 with SMTP id bc8mr8563493wib.19.1435262035910; Thu, 25 Jun 2015 12:53:55 -0700 (PDT) Received: from brick.home (adje188.neoplus.adsl.tpnet.pl. [79.184.212.188]) by mx.google.com with ESMTPSA id d3sm9080449wic.1.2015.06.25.12.53.54 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 25 Jun 2015 12:53:55 -0700 (PDT) Sender: =?UTF-8?Q?Edward_Tomasz_Napiera=C5=82a?= Date: Thu, 25 Jun 2015 21:53:52 +0200 From: Edward Tomasz =?utf-8?Q?Napiera=C5=82a?= To: Bruce Evans Cc: Mateusz Guzik , freebsd-fs@freebsd.org Subject: Re: atomic v_usecount and v_holdcnt Message-ID: <20150625195352.GB1042@brick.home> Mail-Followup-To: Bruce Evans , Mateusz Guzik , freebsd-fs@freebsd.org References: <20141122002812.GA32289@dft-labs.eu> <20141122092527.GT17068@kib.kiev.ua> <20141122211147.GA23623@dft-labs.eu> <20141124095251.GH17068@kib.kiev.ua> <20150314225226.GA15302@dft-labs.eu> <20150316094643.GZ2379@kib.kiev.ua> <20150317014412.GA10819@dft-labs.eu> <20150318104442.GS2379@kib.kiev.ua> <20150625123156.GA29667@dft-labs.eu> <20150626042546.Q2820@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150626042546.Q2820@besplex.bde.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Jun 2015 19:53:57 -0000 On 0626T0429, Bruce Evans wrote: > On Thu, 25 Jun 2015, Mateusz Guzik wrote: > > > On Wed, Mar 18, 2015 at 12:44:42PM +0200, Konstantin Belousov wrote: > >> On Tue, Mar 17, 2015 at 02:44:12AM +0100, Mateusz Guzik wrote: > > >>> I replaced them with refcount_acquire_if_not_zero and > >>> refcount_release_if_not_last. > >> I dislike the length of the names. Can you propose something shorter ? > > > > Unfortunately the original API is alreday quite verbose and I don't have > > anything readable which would retain "refcount_acquire" (instead of a > > "ref_get" or "ref_acq"). 
Adding "_nz" as a suffix does not look good > > ("refcount_acquire_if_nz"). > > refcount -> rc > acquire -> acq > > The "acq" abbreviation is already used a lot for atomic ops. How about refcount_acquire_gt_0() and refcount_release_gt_1()1? From owner-freebsd-fs@freebsd.org Thu Jun 25 22:15:10 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 824CA98C569 for ; Thu, 25 Jun 2015 22:15:10 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 677A41CDA for ; Thu, 25 Jun 2015 22:15:10 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id t5PMFAqH058864 for ; Thu, 25 Jun 2015 22:15:10 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 200663] zfs allow/unallow doesn't show numeric UID when the ID no longer exists in the password file Date: Thu, 25 Jun 2015 22:15:09 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: delphij@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Jun 2015 22:15:10 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200663 Xin LI changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |delphij@FreeBSD.org --- Comment #2 from Xin LI --- I have submitted an issue at Illumos: https://www.illumos.org/issues/6037 with a proposed patch against FreeBSD. -- You are receiving this mail because: You are on the CC list for the bug. 
From owner-freebsd-fs@freebsd.org Fri Jun 26 07:01:24 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9477F98C536 for ; Fri, 26 Jun 2015 07:01:24 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6F34A1263 for ; Fri, 26 Jun 2015 07:01:23 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id 0E6623F743 for ; Fri, 26 Jun 2015 03:01:21 -0400 (EDT) Message-ID: <558CF8BC.6050807@sneakertech.com> Date: Fri, 26 Jun 2015 03:01:16 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: FreeBSD FS Subject: The "myth" of zfs stripe width Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Jun 2015 07:01:24 -0000 http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/ This blog claims that the old rule of "power of two, plus parity" when considering the number of disks in a raid doesn't really apply to zfs. What does everyone else think? From owner-freebsd-fs@freebsd.org Fri Jun 26 13:40:14 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BD01F98CAF1 for ; Fri, 26 Jun 2015 13:40:14 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8B2EA1ADE for ; Fri, 26 Jun 2015 13:40:14 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id t5QDe6L5004701; Fri, 26 Jun 2015 08:40:06 -0500 (CDT) Date: Fri, 26 Jun 2015 08:40:06 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Quartz cc: FreeBSD FS Subject: Re: The "myth" of zfs stripe width In-Reply-To: <558CF8BC.6050807@sneakertech.com> Message-ID: References: <558CF8BC.6050807@sneakertech.com> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Fri, 26 Jun 2015 08:40:06 -0500 (CDT) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Jun 2015 13:40:14 -0000 On Fri, 26 Jun 2015, Quartz wrote: > http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/ > > This blog claims that the old rule of "power of two, plus parity" when > considering the number of disks in a raid 
doesn't really apply to zfs. What
> does everyone else think?

Are you suggesting that we might question the authority (the person who invented the technology) on this topic? This happens to be a blog that you can trust.

Bob
--
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@freebsd.org Fri Jun 26 15:39:14 2015
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8389298D05A for ; Fri, 26 Jun 2015 15:39:14 +0000 (UTC) (envelope-from ben@altesco.nl)
Received: from altus-escon.com (altescovd.xs4all.nl [82.95.116.106]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "proxy.altus-escon.com", Issuer "PositiveSSL CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2631011FD for ; Fri, 26 Jun 2015 15:39:13 +0000 (UTC) (envelope-from ben@altesco.nl)
Received: from daneel.altus-escon.com (daneel.altus-escon.com [193.78.231.7]) (authenticated bits=0) by altus-escon.com (8.14.9/8.14.9) with ESMTP id t5QFd3eb014942 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Fri, 26 Jun 2015 17:39:03 +0200 (CEST) (envelope-from ben@altesco.nl)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2102\))
Subject: Re: Panic on removing corrupted file on zfs
From: Ben Stuyts
In-Reply-To:
Date: Fri, 26 Jun 2015 17:39:03 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <2CC1E621-687B-4F4A-97D4-2DCCB620E17A@altesco.nl>
References:
To: freebsd-fs@freebsd.org
X-Mailer: Apple Mail (2.2102)
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (altus-escon.com [193.78.231.142]); Fri, 26 Jun 2015 17:39:03 +0200 (CEST)
X-Virus-Scanned: clamav-milter 0.98.7 at mars.altus-escon.com
X-Virus-Status: Clean
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 26 Jun 2015 15:39:14 -0000

Anybody? Otherwise I'll just wipe the pool (it's a backup so no big loss).

(To be safe I also ran memtest86 on this system, but it found no errors.)

Ben

> On 23 Jun 2015, at 17:26, Ben Stuyts wrote:
>
> Hello,
>
> I have a corrupted file on a zfs file system. It is a backup store for an rsync job, and rsync errors with:
>
> rsync: failed to read xattr rsync.%stat for "/home1/vwa/rsync/tank3/cam/jpg/487-20150224180950-05.jpg": Input/output error (5)
> Corrupt rsync.%stat xattr attached to "/home1/vwa/rsync/tank3/cam/jpg/487-20150224180950-04.jpg": "100644 0,0 \#007:1001"
> rsync error: error in file IO (code 11) at xattrs.c(1003) [generator=3.1.1]
>
> This is a file from February, and it hasn't changed since. Smartctl shows no errors. No ECC memory on this system, so maybe caused by a memory problem. I am currently running a scrub for the second time. First time didn't help.
>
> Output from zpool status -v:
>
>   pool: home1
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
> see: http://illumos.org/msg/ZFS-8000-8A
> scan: scrub in progress since Tue Jun 23 15:37:31 2015
>       462G scanned out of 2.47T at 80.8M/s, 7h16m to go
>       0 repaired, 18.29% done
> config:
>
>       NAME                                          STATE     READ WRITE CKSUM
>       home1                                         ONLINE       0     0     0
>         gptid/14032b0b-7f05-11e3-8797-54bef70d8314  ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>       /home1/vwa/rsync/tank3/cam/jpg/487-20150224180950-05.jpg/
>
> When I try to rm the file the system panics. From /var/crash:
>
> tera8 dumped core - see /var/crash/vmcore.1
>
> Tue Jun 23 15:37:11 CEST 2015
>
> FreeBSD tera8 10.1-STABLE FreeBSD 10.1-STABLE #2 r284317: Fri Jun 12 17:07:21 CEST 2015 root@tera8:/usr/obj/usr/src/sys/GENERIC amd64
>
> panic: acl_from_aces: a_type is 0x4d00
>
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
> panic: acl_from_aces: a_type is 0x4d00
> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffff8097d890 at kdb_backtrace+0x60
> #1 0xffffffff809410e9 at vpanic+0x189
> #2 0xffffffff80940f53 at panic+0x43
> #3 0xffffffff81aaa209 at acl_from_aces+0x1c9
> #4 0xffffffff81b61546 at zfs_freebsd_getacl+0xa6
> #5 0xffffffff80e5de77 at VOP_GETACL_APV+0xa7
> #6 0xffffffff809c7a3c at vacl_get_acl+0xdc
> #7 0xffffffff809c7bd2 at sys___acl_get_link+0x72
> #8 0xffffffff80d35817 at amd64_syscall+0x357
> #9 0xffffffff80d1a89b at Xfast_syscall+0xfb
>
> Is there any other way of getting rid of this file (except destroying the fs/pool)?
>
> Thanks,
> Ben
>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

From owner-freebsd-fs@freebsd.org Fri Jun 26 16:50:16 2015
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3494798DBC8 for ; Fri, 26 Jun 2015 16:50:16 +0000 (UTC) (envelope-from schittenden@groupon.com)
Received: from mail-yk0-x230.google.com (mail-yk0-x230.google.com [IPv6:2607:f8b0:4002:c07::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E441A1C92 for ; Fri, 26 Jun 2015 16:50:15 +0000 (UTC) (envelope-from schittenden@groupon.com)
Received: by ykdt186 with SMTP id t186so63645685ykd.0 for ; Fri, 26 Jun 2015 09:50:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=groupon.com; s=google; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=66WPSAAiYcKHuQPcltY0o8SO0V8YtWAhYbrSPD9To7M=; b=WoXJTGTtV6HAH/Gq4zCK5lDPfs5mbLftsfmSCEhCf0bu0DqOqiYiEDzwIAASKu2Yev Ce/BaazYwZ3pWi0tEdeD6HDSt5wKthofTlRF0LvBsJl3VKidkUMVKeL0USbNvnipnkGY /r2ab4cNJH9C2auKDViEaT6/Xr+92Vr3K0L3Y=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date
:message-id:subject:from:to:cc:content-type; bh=66WPSAAiYcKHuQPcltY0o8SO0V8YtWAhYbrSPD9To7M=; b=J+LAYYdtucJfD9WRFCIQe5bJO28t2ug/xy6LyH4JNTALO+7wNgMe/PB30cSjH3s/Tw SI7Yst90fqidJI43L1J1Dzku47bprWp7I2rHjnhLR0asUJTZNi0gFSiv2SmcLu1tBe/u HzaQjZDGkOPTuNM5V0kwAJAqZdViL6VwOmIVxRO80/4uDU062oyHRjxHL3irpoCVP3cZ kMbrxTv6n4JePm3xfrTy9ED5/5DXBOJuFGd9qfYCPNoS4DqmY+B/BYl4PjkbVoxtZzQD RnP6QlETVrYGAEBkPyc0JL2PiX3ik/riVeTsQlDl+/gBOYz1q+luLQJ2e/LVlOdych0L LNiQ== X-Gm-Message-State: ALoCoQnqUywkHYvqaWcA4zuytqcI+U3i/G5rKxRwLdUwxZveormWArYJ6ojmoWNuIZ5bApcw644F MIME-Version: 1.0 X-Received: by 10.13.236.5 with SMTP id v5mr3147227ywe.138.1435337414398; Fri, 26 Jun 2015 09:50:14 -0700 (PDT) Received: by 10.13.242.7 with HTTP; Fri, 26 Jun 2015 09:50:14 -0700 (PDT) In-Reply-To: References: Date: Fri, 26 Jun 2015 09:50:14 -0700 Message-ID: Subject: Re: ZFS pool within FreeBSD bhyve guest From: Sean Chittenden To: javocado Cc: FreeBSD Filesystems Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Jun 2015 16:50:16 -0000 Look at iohyve as an easy way to do this. You need to export a zvol to your guest. You can't export a "file system" but you can export a ZFS-backed volume. -sc https://github.com/pr1ntf/iohyve On Thu, Jun 25, 2015 at 10:22 AM, javocado wrote: > (I'm posting here because I think this may be more of a zfs issue rather > than a bhyve issue) > > Hi, > > I would like to create a zfs filesystem within my bhyve (FreeBSD 10.1 as > the guest and host) allowing users of the VM to run zfs send/receive > commands on the zfs filesystem within their bhyve VM. > > Is this possible and what is/are the methods and options for creating the > zfs filesystem (or volume) within the VM? If there is a way to do this, > would any of the proposed methods depend on whether the VM lives in a file > versus a zfs volume? My VM is file-based. > > Thanks! 
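To make the zvol route concrete, a minimal sketch assuming a host pool named tank; the volume size, tap interface, slot layout and VM name are placeholders, and the bhyveload/boot step is omitted:

  # on the host: create a volume and pass it to the guest as a virtio disk
  zfs create -V 64G tank/vmdisk0
  bhyve -c 2 -m 2G -H \
      -s 0,hostbridge -s 1,lpc -l com1,stdio \
      -s 2,virtio-net,tap0 \
      -s 3,virtio-blk,/dev/zvol/tank/vmdisk0 \
      guest0

  # inside the guest the volume shows up as a vtbd disk, and the guest can
  # run zpool create / zfs send / zfs receive on it as on real hardware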
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > -- Sean Chittenden From owner-freebsd-fs@freebsd.org Fri Jun 26 22:00:51 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2393998DFB2 for ; Fri, 26 Jun 2015 22:00:51 +0000 (UTC) (envelope-from javocado@gmail.com) Received: from mail-la0-x22a.google.com (mail-la0-x22a.google.com [IPv6:2a00:1450:4010:c03::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 972D01CEE for ; Fri, 26 Jun 2015 22:00:50 +0000 (UTC) (envelope-from javocado@gmail.com) Received: by lagx9 with SMTP id x9so71112847lag.1 for ; Fri, 26 Jun 2015 15:00:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=DIytguKNpnjWPJtJ9W9xr3nT4XGKh2ePybL33LvY0KU=; b=yNF42nYaypa8BTA/zISE9WpSN0TyjQWNDvPS4QEijTvNCo8xyHwOlaO23i3D6gkaf1 GElmolYDFLTIq3TLYExE6r04vh7WmgZwelJTf8TFTt4FwW/PlLt++B+iQO7p1i8EM62E bQprXAKeGfyXO7eR6F/7gBU5hMeVjEbi4+HgL7HX/guyoTGPhOqfMbUoxd63u6NsK7Ie a5PTI8jYY6hBCAk42suE4rucmxlQ5kb5Y7lJ7uvZfnvD+aExHEfoRzvU3S4hxk9saAYY 3lzLqrxJh8gXZhlXov71JQzyhW1RuUuE5jvgmLk3B/3lw+Z9VWubt0z7oOcPMUlyJ6aJ nFWg== MIME-Version: 1.0 X-Received: by 10.112.162.38 with SMTP id xx6mr3517498lbb.110.1435356048550; Fri, 26 Jun 2015 15:00:48 -0700 (PDT) Received: by 10.114.96.8 with HTTP; Fri, 26 Jun 2015 15:00:48 -0700 (PDT) In-Reply-To: <20150613094244.GC37870@brick.home> References: <20150613094244.GC37870@brick.home> Date: Fri, 26 Jun 2015 15:00:48 -0700 Message-ID: Subject: Re: growfs failure From: javocado To: javocado , FreeBSD Filesystems Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Jun 2015 22:00:51 -0000 Thanks for the suggestion, the original system is 8.3 amd64. The growfs did work when I moved the image file over to a 10.1 amd64 system On Sat, Jun 13, 2015 at 2:42 AM, Edward Tomasz Napiera=C5=82a wrote: > On 0603T1619, javocado wrote: > > While trying to growfs a filesystem, I receive the following error: > > > > growfs: rdfs: read error: 5812093147771869908: Input/output error > > > > Here were the steps taken leading up to this point: > > > > (original file is 300 GB, growing to 500 GB) > > > > (the filesystem is clean with fsck_ufs /dev/md1) > > > > geli detach /dev/md1.eli > > > > mdconfig -d -u 1 > > > > truncate -s +200G geli.img > > > > mdconfig -f geli.img -u 1 > > > > geli resize -s 300G /dev/md1 > > > > geli attach /dev/md1 > > > > growfs /dev/md1.eli > > > > new file systemsize is: 262143999 frags > > Warning: 326780 sector(s) cannot be allocated. > > growfs: 511840.4MB (1048249216 sectors) block size 16384, fragment size > 2048 > > using 2786 cylinder groups of 183.72MB, 11758 blks, 23552 inode= s. 
> > super-block backups (for fsck -b #) at: > > 629476448, 629852704, 630228960, 630605216, 630981472, 631357728, > > 631733984, 632110240, > > .... > > growfs: rdfs: read error: 5812093147771869908: Input/output error > > I can't reproduce it. What's the FreeBSD version? The output messages > above don't match current versions of growfs(8); could you try to upgrade > and see if the problem is fixed? > > From owner-freebsd-fs@freebsd.org Sat Jun 27 18:37:03 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9C83498CE32 for ; Sat, 27 Jun 2015 18:37:03 +0000 (UTC) (envelope-from postmaster+1557035@post.webmailer.de) Received: from cg6-p07-ob.smtp.rzone.de (cg6-p07-ob.smtp.rzone.de [IPv6:2a01:238:20a:202:5317::8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.smtp.rzone.de", Issuer "TeleSec ServerPass DE-2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 39C401692 for ; Sat, 27 Jun 2015 18:36:59 +0000 (UTC) (envelope-from postmaster+1557035@post.webmailer.de) X-RZG-CLASS-ID: cg07 Received: from coates.store ([192.168.42.140]) by jored.store (RZmta 37.8 OK) with ESMTP id e0588ar5RIapBvQ for ; Sat, 27 Jun 2015 20:36:51 +0200 (CEST) Received: (from Unknown UID 1557035@localhost) by post.webmailer.de (8.13.7/8.13.7) id t5RIanVn013242; Sat, 27 Jun 2015 18:36:49 GMT To: freebsd-fs@freebsd.org Subject: Unable to deliver your item, #0000983036 Date: Sat, 27 Jun 2015 20:36:49 +0200 From: "FedEx 2Day A.M." Reply-To: "FedEx 2Day A.M." Message-ID: <5be2a11df09a5f8a18d3037b5f8bdb44@w80.rzone.de> X-Priority: 3 MIME-Version: 1.0 X-RZG-SCRIPT: :P28WfFC8JrA0JY4UkyfhUWv+YuCloWhyOLk77zZraDNPI4MwvWp5TFVn98vE2ZAeOn0rJsg57o36NfRf8EGnT2ai4NheKJSwUcX6sHUGkwZOJLxuC0pIvLlmH4ZdoyOrMyb5poN6A7VVbVwP5SYqnE0RvUDrmA/N Content-Type: text/plain; charset=us-ascii X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 27 Jun 2015 18:37:03 -0000 Dear Customer, Courier was unable to deliver the parcel to you. Please, open email attachment to print shipment label. Yours faithfully, Walter Downs, Sr. Support Agent.