Date:      Fri, 06 Mar 2015 12:49:30 +0300
From:      Emil Muratov <gpm@hotplug.ru>
To:        Alexander Motin <mav@FreeBSD.org>, freebsd-fs@freebsd.org
Subject:   Re: CAM Target over FC and UNMAP problem
Message-ID:  <54F9782A.90501@hotplug.ru>
In-Reply-To: <54F8AB96.6080600@FreeBSD.org>
References:  <54F88DEA.2070301@hotplug.ru> <54F8AB96.6080600@FreeBSD.org>

On 05.03.2015 22:16, Alexander Motin wrote:
> Hi.
>
> On 05.03.2015 19:10, Emil Muratov wrote:
>> I've got an issue with CTL UNMAP and zvol backends.
>> It seems that UNMAP requests from the initiator, passed down to the
>> underlying disks (which have no TRIM support), cause IO blocking for
>> the whole pool. I'm not sure where to address this problem.
> There is no direct relation between an UNMAP sent to a ZVOL and
> UNMAP/TRIM sent to the underlying disks. A ZVOL UNMAP only frees some
> pool space, which may later be trimmed if the disks support it.
So, as far as I understand, this must be purely a ZFS issue, not
related to CTL at all?
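If that's the case, I can probably reproduce the free pipeline load
without CTL at all by destroying a throwaway zvol on the same pool and
watching the backing disks. A rough sketch of what I have in mind
(names are from my test box; this assumes compression is off, so the
zeroes actually allocate space):

  zfs create -V 10G wd/scratch
  dd if=/dev/zero of=/dev/zvol/wd/scratch bs=1m
  zfs destroy wd/scratch     # frees the blocks, much like a big UNMAP
  gstat -d                   # -d also shows delete (BIO_DELETE) requests

If the pool stalls the same way during the destroy, then CTL is clearly
out of the picture.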

>
>> Create a new LUN with a zvol backend
>>
>> ctladm realsync off
> Are you sure you need this? Is your data so uncritical that even
> explicit cache flushes can be ignored?
No, that's just for the test lab scenario. I wasn't sure whether UNMAP
commands imply a sync or not, so I decided to take the chance, but it
made no difference anyway.

>
>> ctladm port -o on -p 5
>> ctladm create -b block -o file=/dev/zvol/wd/tst1 -o unmap=on -l 0 -d
>> wd.tst1 -S tst1
> Just to note, this configuration can now alternatively be done via
> ctld and /etc/ctl.conf.
>
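Good to know, thanks. For the record, I suppose the same LUN could be
described in /etc/ctl.conf roughly like this (untested on my side, the
target name is just an example, and I assume the FC port itself still
has to be enabled with ctladm, since ctld is iSCSI-oriented):

  target iqn.2015-03.ru.hotplug:tst1 {
          lun 0 {
                  path /dev/zvol/wd/tst1
                  serial "tst1"
                  device-id "wd.tst1"
                  option unmap "on"
          }
  }
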
>> But as soon as I try to delete large files, all IO to the LUN blocks
>> and the initiator system just sits in iowait. gstat on the target
>> shows the underlying disk load jumping to 100% with a queue of up to
>> 10, but no writes are actually in progress, only a decent amount of
>> reads. After a minute or so IO unblocks for a second or two, then
>> blocks again, and so on until all the UNMAPs are done; it can take up
>> to 5 minutes to delete a 10 GB file. I can see from the 'logicalused'
>> property of the zvol that the deleted space was actually released.
>> The system log is filled with CTL messages:
>>
>>
>> kernel: (ctl2:isp1:0:0:3): ctlfestart: aborted command 0x12aaf4 discarded
>> kernel: (2:5:3/3): WRITE(10). CDB: 2a 00 2f d4 74 b8 00 00 08 00
>> kernel: (2:5:3/3): Tag: 0x12ab24, type 1
>> kernel: (2:5:3/3): ctl_process_done: 96 seconds
>> kernel: (ctl2:isp1:0:0:3): ctlfestart: aborted command 0x12afa4 discarded
>> kernel: (ctl2:isp1:0:0:3): ctlfestart: aborted command 0x12afd4 discarded
>> kernel: ctlfedone: got XPT_IMMEDIATE_NOTIFY status 0x36 tag 0xffffffff
>> seq 0x121104
>> kernel: (ctl2:isp1:0:0:3): ctlfe_done: returning task I/O tag 0xffffffff
>> seq 0x1210d4
>>
>>
>> I've tried tweaking some sysctls, but with no success so far.
>>
>> vfs.zfs.vdev.bio_flush_disable: 1
>> vfs.zfs.vdev.bio_delete_disable: 1
>> vfs.zfs.trim.enabled=0
>>
>>
>> Disabling UNMAP in CTL (-o unmap=off) resolves the issue completely,
>> but then there is no space reclamation for the zvol.
>>
>> Any hints would be appreciated.
> There were a number of complaints about UNMAP performance on the
> Illumos lists too. Six months ago some fixes were committed and merged
> to stable/10 that substantially improved the situation. Since then I
> haven't observed problems with it in my tests.
Have you tried UNMAP on zvols with non-SSD backends too? I'm actively
testing this scenario now, but this issue makes it impossible to use
UNMAP in production: the blocking timeouts turn into IO failures on the
initiator OS.
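For reference, this is how I watch the backing disks while reproducing
it (the pool name is from my setup); gstat's -d flag adds the delete
(BIO_DELETE) requests to the per-disk view, so it's easy to see when
the frees actually hit the disks:

  zpool iostat -v wd 1     # per-vdev read/write load on the pool
  gstat -dI 1s             # per-provider stats, including deletes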

> As for the large amount of reads during UNMAP, I have two guesses:
> 1) it may be reads of metadata absent from the ARC, though I doubt
> there is so much metadata that reading it would take several minutes.
Just to be sure, I set up an SSD card, created an L2ARC cache on it and
set the volume property 'secondarycache=metadata'. Then I ran the tests
again: according to gstat the SSD is almost idle for both reads and
writes, but the HDDs are still heavily loaded with reads.
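For the record, the setup was just this (the SSD device name is from my
test box):

  zpool add wd cache ada4
  zfs set secondarycache=metadata wd/tst1
  sysctl kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses

The last line is how I check whether the L2ARC is being hit at all.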

> 2) if UNMAP ranges are not aligned to the ZVOL block size, I guess ZFS
> could try to read blocks that need a partial "unmap". I've experimented
> with unmapping 512 bytes of an 8K ZVOL block, and it indeed zeroed the
> specified 512 bytes, while from a SCSI perspective it would be fine to
> just ignore the request.
Maybe I should take a closer look into this, although I've done my best
to align the upper-layer filesystem to the zvol blocks: I put a GPT over
the LUN (win2012 should align it to 1M boundaries), then formatted an
NTFS partition with an 8K cluster size. As far as I can see there are no
reads from the zvol during heavy writes, but I will do some more tests
to investigate this point.
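To double-check the alignment on the target side, I'm going to compare
the zvol block size with what CTL exposes for the LUN, something like:

  zfs get volblocksize wd/tst1     # should be 8K here
  ctladm devlist -v                # LUN options, incl. the unmap setting

With a 1M partition offset (2048 x 512-byte LBAs) and an 8K NTFS
cluster, every cluster boundary falls on a multiple of 16 LBAs, i.e. on
a zvol block boundary, so in theory the UNMAPs should already arrive
aligned.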
Besides this, why should there be so many reads in the first place?
Isn't it enough to just update metadata to mark the unmapped blocks as
free? And the most annoying part is that all IO blocks for a while. I'm
not an expert in this area, but isn't there any way to reorder or delay
those unmap ops, or even drop them, when there are a lot of other
pending IOs?

I'll be back with more test results later.



