Date: Fri, 06 Mar 2015 12:49:30 +0300
From: Emil Muratov <gpm@hotplug.ru>
To: Alexander Motin <mav@FreeBSD.org>, freebsd-fs@freebsd.org
Subject: Re: CAM Target over FC and UNMAP problem
Message-ID: <54F9782A.90501@hotplug.ru>
In-Reply-To: <54F8AB96.6080600@FreeBSD.org>
References: <54F88DEA.2070301@hotplug.ru> <54F8AB96.6080600@FreeBSD.org>
On 05.03.2015 22:16, Alexander Motin wrote:
> Hi.
>
> On 05.03.2015 19:10, Emil Muratov wrote:
>> I've got an issue with CTL UNMAP and zvol backends. It seems that
>> UNMAP from the initiator, passed down to the underlying disks (which
>> have no TRIM support), causes IO blocking for the whole pool. Not sure
>> where to address this problem.
>
> There is no direct relation between an UNMAP sent to a ZVOL and
> UNMAP/TRIM sent to the underlying disks. A ZVOL UNMAP only frees some
> pool space, which may later be trimmed if the disks support it.

So, as far as I understand, this must be purely a ZFS issue, not related
to CTL at all?

>> Create a new LUN with a zvol backend:
>>
>> ctladm realsync off
>
> Are you sure you need this? Are your data so uncritical that you can
> ignore even explicit cache flushes?

No, that's just for the test lab scenario. I'm not sure whether UNMAP
commands imply a sync or not, so I decided to take a chance, but with no
success anyway.

>> ctladm port -o on -p 5
>> ctladm create -b block -o file=/dev/zvol/wd/tst1 -o unmap=on -l 0 \
>>        -d wd.tst1 -S tst1
>
> Just for the record, this configuration can now alternatively be done
> via ctld and /etc/ctl.conf.

>> But as soon as I try to delete large files, all IO to the LUN blocks
>> and the initiator system just iowaits. gstat on the target shows the
>> underlying disk load jumping to 100% with a queue of up to 10, but no
>> writes are actually in progress, only a fair amount of reads. After a
>> minute or so IO unblocks for a second or two, then blocks again, and
>> so on until all the UNMAPs are done; it can take up to 5 minutes to
>> delete a 10 GB file. I can see from the 'logicalused' property of the
>> zvol that the deleted space was actually released. The system log is
>> filled with CTL messages:
>>
>> kernel: (ctl2:isp1:0:0:3): ctlfestart: aborted command 0x12aaf4 discarded
>> kernel: (2:5:3/3): WRITE(10). CDB: 2a 00 2f d4 74 b8 00 00 08 00
>> kernel: (2:5:3/3): Tag: 0x12ab24, type 1
>> kernel: (2:5:3/3): ctl_process_done: 96 seconds
>> kernel: (ctl2:isp1:0:0:3): ctlfestart: aborted command 0x12afa4 discarded
>> kernel: (ctl2:isp1:0:0:3): ctlfestart: aborted command 0x12afd4 discarded
>> kernel: ctlfedone: got XPT_IMMEDIATE_NOTIFY status 0x36 tag 0xffffffff seq 0x121104
>> kernel: (ctl2:isp1:0:0:3): ctlfe_done: returning task I/O tag 0xffffffff seq 0x1210d4
>>
>> I've tried tweaking some sysctls, but with no success so far:
>>
>> vfs.zfs.vdev.bio_flush_disable: 1
>> vfs.zfs.vdev.bio_delete_disable: 1
>> vfs.zfs.trim.enabled=0
>>
>> Disabling UNMAP in CTL (-o unmap=off) resolves the issue completely,
>> but then there is no space reclamation for the zvol.
>>
>> Any hints would be appreciated.
>
> There were a number of complaints about UNMAP performance on the
> Illumos lists too. Six months ago some fixes were committed and merged
> to stable/10 that substantially improved the situation. Since then I
> haven't observed problems with it in my tests.

Have you tried UNMAP on zvols with non-SSD backends too? I'm actively
testing this scenario now, but this issue makes it impossible to use
UNMAP in production: the blocking timeouts turn into IO failures for the
initiator OS.

> As for the large amount of reads during UNMAP, I have two guesses:
> 1) it may be reads of metadata absent from the ARC, though I doubt
> there is so much metadata that reading it would take several minutes.

Just to be sure, I set up an SSD card, made an L2ARC cache on it and set
the volume property 'secondarycache=metadata'.
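For reference, the cache setup amounts to roughly the following (a
minimal sketch; 'da0' as the SSD device name is just an example, the
pool and zvol names are the ones from above):

zpool add wd cache da0
zfs set secondarycache=metadata wd/tst1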
Then I ran the tests again. According to gstat the SSD is almost idle
for both reads and writes, but the HDDs are still heavily loaded with
reads.

> 2) if UNMAP ranges were not aligned to the ZVOL block, I guess ZFS
> could try to read blocks that need a partial "unmap". I did an
> experiment unmapping 512 bytes of an 8K ZVOL block, and it indeed
> zeroed the specified 512 bytes, while from a SCSI perspective it would
> be fine to just ignore the request. Maybe I should take a closer look
> into this.

I've tried to do my best to align the upper-layer filesystem to the zvol
blocks: I've put GPT over the LUN, win2012 should align it to 1 MB
boundaries, then I formatted the NTFS partition with an 8K cluster. As
far as I can see there are no reads from the zvol during heavy writes,
but I will do some more tests to investigate this point (the checks I'm
using are sketched at the end of this mail).

Besides, why should there be so many reads in the first place? Isn't it
enough to just update metadata to mark the unmapped blocks as free? And
what is most annoying is that all IO blocks for a while. I'm not an
expert in this area, but isn't there any way to reorder or delay those
UNMAP ops, or even drop them, when there are a lot of other pending IOs?

Will be back with more test results later.
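For completeness, this is roughly how I watch the delete/UNMAP load on
the vdevs and confirm the zvol block size and space reclamation (a
minimal sketch; 'wd/tst1' is the test zvol from above):

gstat -d
zfs get volblocksize,logicalused,referenced wd/tst1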