Date: Fri, 06 Mar 2015 12:28:05 +0200
From: Alexander Motin <mav@FreeBSD.org>
To: Emil Muratov <gpm@hotplug.ru>, freebsd-fs@freebsd.org
Subject: Re: CAM Target over FC and UNMAP problem
Message-ID: <54F98135.5000908@FreeBSD.org>
In-Reply-To: <54F9782A.90501@hotplug.ru>
References: <54F88DEA.2070301@hotplug.ru> <54F8AB96.6080600@FreeBSD.org> <54F9782A.90501@hotplug.ru>
On 06.03.2015 11:49, Emil Muratov wrote:
> On 05.03.2015 22:16, Alexander Motin wrote:
>> On 05.03.2015 19:10, Emil Muratov wrote:
>>> I've got an issue with CTL UNMAP and zvol backends. It seems that an
>>> UNMAP from the initiator, passed down to the underlying disks (which
>>> have no TRIM support), causes IO blocking for the whole pool. Not
>>> sure where to address this problem.
>> There is no direct relation between an UNMAP sent to a ZVOL and the
>> UNMAP/TRIM sent to the underlying disks. A ZVOL UNMAP only frees some
>> pool space, which may later be trimmed if the disks support it.
> So, as far as I understood, it must be purely a ZFS issue, not related
> to CTL at all?

I think so. CTL just tells ZFS to free the specified range of the ZVOL,
and so far nobody has shown that it does that incorrectly.

>> There were a number of complaints about UNMAP performance on the
>> Illumos lists too. Six months ago some fixes were committed and
>> merged to stable/10 that substantially improved the situation. Since
>> that time I haven't observed problems with it in my tests.
> Have you tried UNMAP on zvols with non-SSD backends too? I'm actively
> testing this scenario now, but this issue makes it impossible to use
> UNMAP in production: the blocking timeouts turn into IO failures for
> the initiator OS.

My primary test system is indeed all-SSD, but I do some testing on an
HDD-based system and will do more of it for UNMAP.

>> As for the large amount of reads during UNMAP, I have two guesses:
>> 1) it may be reads of metadata absent from the ARC, though I doubt
>> there is so much metadata that reading it takes several minutes.
> Just to be sure I set up an SSD card, made an L2ARC cache on it and
> set the volume property 'secondarycache=metadata'. Then I ran the
> tests again: according to gstat the SSD is almost idle for both reads
> and writes, but the HDDs are still heavily loaded with reads.

The L2ARC is empty on boot and is filled at a limited rate. You may need
to read the file several times before deleting it to get its metadata
into the L2ARC.

>> 2) if the UNMAP ranges were not aligned to the ZVOL block size, I
>> guess ZFS could try to read the blocks that need a partial "unmap". I
>> experimented with unmapping 512 bytes of an 8K ZVOL block, and it
>> indeed zeroed the specified 512 bytes, while from a SCSI perspective
>> it would be fine to just ignore the request.
> Maybe I should take a closer look at this. I've tried to do my best to
> align the upper-layer filesystem to the zvol blocks: I've put GPT over
> the LUN, win2012 should align it to 1M boundaries, then formatted an
> NTFS partition with an 8K cluster. As far as I can see there are no
> reads from the zvol during heavy writes, but I will do some more tests
> investigating this point.

You should check for reads not only during writes, but also during
REwrites. If the initiator actively practices UNMAP, then even a
misaligned initial write may not cause a read-modify-write cycle, since
there is simply nothing to read yet.

> Besides this, why should there be so many reads in the first place?
> Isn't it enough to just update metadata to mark the unmapped blocks as
> free?

As I can see in the ZFS code, if an UNMAP is not aligned to the zvol
blocks, then the first and last blocks are not unmapped; instead the
affected parts are overwritten with zeroes. Those partial writes may
trigger a read-modify-write cycle if the data are not already in cache.
The SCSI spec allows a device to skip such zero writes, and I am
thinking about implementing such filtering at the CTL level.
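For reference, a minimal way to check the alignment discussed above from
both ends; the pool/zvol name and the Windows drive letter below are
only placeholders:

    # On the FreeBSD target: the block size that UNMAP ranges should align to
    zfs get volblocksize pool/vol0

    # Watch per-provider I/O, including delete (BIO_DELETE) operations,
    # while the initiator is unmapping
    gstat -d

    # On the win2012 initiator: "Bytes Per Cluster" of the NTFS volume
    fsutil fsinfo ntfsinfo D:

If the NTFS cluster size matches the volblocksize and the partition
start is 1M-aligned, UNMAP ranges issued by the initiator should land on
zvol block boundaries.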
> And what is most annoying is that all IO blocks for a while. I'm not
> an expert in this area, but isn't there any way to reorder or delay
> those UNMAP ops, or even drop them, if there are a lot of other
> pending IOs?

That was not easy to do, but CTL should be clever about this now. It
should now block access only to the blocks that are affected by the
specific UNMAP command. On the other hand, after fixing this issue at
the CTL level I've noticed that in ZFS an UNMAP also significantly
affects the performance of other commands to the same zvol. To check
CTL's possible role in this blocking you may try adding `option
reordering unrestricted` to your LUN configuration. It makes CTL not
track any potential request collisions. If UNMAP still blocks other
I/Os after that, then all remaining questions go to ZFS.

-- 
Alexander Motin
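P.S. In case it helps, a minimal ctl.conf(5) sketch of where that option
goes when the LUN is defined through ctld; the target name and zvol path
are only placeholders, and a LUN created with ctladm should accept the
same setting passed as '-o reordering=unrestricted':

    target iqn.2015-03.org.example:target0 {
        lun 0 {
            path /dev/zvol/pool/vol0
            option reordering unrestricted
        }
    }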