From: Emil Muratov
Date: Fri, 13 Mar 2015 20:07:54 +0300
To: Alexander Motin, freebsd-fs@freebsd.org
Subject: Re: CAM Target over FC and UNMAP problem
Message-ID: <5503196A.3090504@hotplug.ru>
In-Reply-To: <54F98135.5000908@FreeBSD.org>

Hi Alexander, thanks for your comments, I'm still working on this.

On 06.03.2015 13:28, Alexander
Motin wrote:
> On 06.03.2015 11:49, Emil Muratov wrote:
>> On 05.03.2015 22:16, Alexander Motin wrote:
>>> What's about the large amount of reads during UNMAP, I have two guesses:
>>> 1) it may be read of metadata absent in ARC. Though I doubt that there
>>> are so much metadata to read them during several minutes.
>> Just to be sure I setup SSD card, made L2 ARC cache over it and set the
>> vol properties to 'secondarycache=metadata'. Then run the tests again -
>> according to gstat ssd is almost idle both for reads and writes but hdds
>> are still heavily loaded for reads.
> L2ARC is empty on boot and filled at limited rate. You may need to read
> the file several times before deleting it to make metadata get into L2ARC.

I ran more tests with L2ARC. Warming up L2ARC gives a small improvement
(if any), but the I/O blocking timeouts are still there. Watching gstat
while I/O is blocked, I see the HDDs slowly reading something at about
100-200 IOPS for several seconds (disk queue rises to 10, 100% busy),
then for an instant there is a burst of L2ARC reads at 3000-4000 IOPS,
and then again many seconds of slow HDD reads. Maybe I should dig into
L2ARC hits and misses, but I think there is more to this than a queue of
metadata reads, see below. I'm not sure whether I should disable L2ARC
and run clean tests or continue improving the caching.

>>> 2) if UNMAP ranges were not aligned to ZVOL block, I guess ZFS could try
>>> to read blocks that need partial "unmap". I've made experiment with
>>> unmapping 512 bytes of 8K ZVOL block, and it indeed zeroed specified 512
>>> bytes, from SCSI perspective while it would be fine to just ignore the
>>> request.
>> Maybe I should take a closer look into this. Although I've tried to do
>> best to align upper layer fs to zvol blocks, I've put GPT over LUN,
>> win2012 should align it to 1M boundaries, then formatted NTFS partition
>> with 8K cluster.
>> As far as I can see during heavy writes there is no
>> reads at the same time from the zvol, but I will do some more tests
>> investigating this point.
> You should check for reads not only during writes, but also during
> REwrites. If initiator actively practices UNMAP, then even misaligned
> initial write may not cause read-modify-write cycle, since there is just
> nothing to read.

A simple overwrite test with large files shows interesting results: the
system writes a chunk of several gigabytes to the disks for a minute or
two, and gstat shows constant high-speed disk writes with a low disk
queue and almost no reads, so there shouldn't be any misalignment
problem. Then, for a very short period, the blocking behavior shows up:
writes stop, the queue rises to 10 and guest I/O blocks for several
seconds. These blocks are clearly visible but short, and they don't
produce guest OS timeouts. It looks like the issue arises when CoW
releases unused blocks from the zvol. I should repeat this test with
UNMAP disabled; I'm not sure whether this is ZFS CoW or UNMAP behavior.

>
>> Besides this why there should be so a lot of reads at the first place?
>> Isn't it enough to just update metadata to mark unmapped blocks as free?
> As I can see in ZFS code, if UNMAP is not aligned to zvol blocks, then
> first and last blocks are not unmapped, but instead affected parts are
> written with zeroes. Those partial writes may trigger read-modify-write
> cycle, if data are not already in cache. SCSI spec allows device to skip
> such zero writes, and I am thinking about implementing such filtering on
> CTL level.
>
>> And what is the most annoying is that all IO blocks for a time, I'm not
>> an expert in this area but isn't there any way to reorder or delay those
>> unmap op's or even drop it out if there are a lot of other pending IOs?
> That was not easy to do, but CTL should be clever about this now. It
> should now block only access to blocks that are affected by specific
> UNMAP command.
> From the other side after fixing this issue on CTL level
> I've noticed that in ZFS UNMAP also significantly affects performance of
> other commands to the same zvol.
>
> To check possible CTL role in this blocking you may try to add to your
> LUN configuration `option reordering unrestricted`. It makes CTL to not
> track any potential request collisions. If after that UNMAP will still
> block other I/Os, then all questions to ZFS.

I've tried 'reordering=unrestricted'. It is indeed not much help for a
single zvol, but working with two zvols simultaneously gives different
results. Reading, writing and unmapping data on the same zvol blocks
everything very quickly. Reading/writing one zvol while unmapping files
on another zvol blocks only the zvol where the unmap is in progress. I/O
to the other zvol is still processed, with a performance penalty and in
a more bursty manner, but at least without timeouts on the guest or the
tons of ctl errors in the log on the target.

I've made another test: I attached a 3rd zvol to the guest and started a
large EXTENDED_COPY from the guest system, from the 2nd zvol to the 3rd
zvol. Monitoring gstat, I saw that high-speed disk reads and writes were
in progress, then I started unmapping lots of large files from the first
zvol. At the beginning, when the 1st zvol blocked completely, both
operations (EXTENDED_COPY and UNMAP) worked in parallel, the disk queue
rose to 10, and read/write speed decreased (but still stayed at
fast-copy level). As I kept pushing with more large unmaps, the disk
queue rose to 20-30 and then everything went into a state where all 3
zvols blocked, the fast disk reads and writes stopped, and disk I/O fell
into the previously mentioned pattern of long slow HDD reads and short
L2ARC read bursts. It lasted for a minute or two, until all unmaps were
finished and EXTENDED_COPY continued.

More and more confusion with all of this.
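
To make the alignment discussion above concrete, here is a rough sketch
of how an UNMAP byte range splits against zvol blocks, following
Alexander's description: whole blocks can be freed, while the partial
first and last pieces get written with zeroes (and so may need a
read-modify-write). This is only a simplified model, not the actual ZFS
or CTL code; the function name split_unmap and the 8K default block
size are my own assumptions for illustration.

```python
def split_unmap(offset, length, block_size=8192):
    """Split a byte range into (zeroed_pieces, freed_range).

    zeroed_pieces: list of (offset, length) covering partial blocks,
                   which in this model would be written with zeroes.
    freed_range:   (offset, length) of whole blocks that can be freed,
                   or None if the range covers no whole block.
    Hypothetical helper; it only models the behaviour described in the
    thread and is not taken from ZFS.
    """
    end = offset + length
    # First block boundary at or after the start (round up).
    first_full = -(-offset // block_size) * block_size
    # Last block boundary at or before the end (round down).
    last_full = (end // block_size) * block_size

    if first_full >= last_full:
        # No whole block is covered: everything is a partial write.
        return ([(offset, length)] if length else []), None

    zeroed = []
    if offset < first_full:
        zeroed.append((offset, first_full - offset))   # partial head
    if last_full < end:
        zeroed.append((last_full, end - last_full))    # partial tail
    return zeroed, (first_full, last_full - first_full)


if __name__ == "__main__":
    # 512 bytes inside an 8K block: nothing is freed, only zeroed.
    print(split_unmap(0, 512))
    # Range starting at 1M, four whole 8K blocks: fully freed.
    print(split_unmap(1048576, 4 * 8192))
    # Range shifted by 4K: head and tail are partial.
    print(split_unmap(4096, 3 * 8192))
```

If the guest really issues UNMAPs on 8K cluster boundaries inside a
1M-aligned partition, every call should look like the second case, with
an empty zeroed list. Any non-empty zeroed list would point to the
read-modify-write path Alexander mentions.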