Date: Sat, 4 Jan 2014 09:56:50 +0100
From: "O. Hartmann"
To: "Steven Hartland"
Cc: FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: Re: ZFS command can block the whole ZFS subsystem!

On Fri, 3 Jan 2014 17:04:00 -0000 "Steven Hartland" wrote:

> ----- Original Message -----
> From: "O. Hartmann"
>
> > On Fri, 3 Jan 2014 14:38:03 -0000
> > "Steven Hartland" wrote:
> >
> > > ----- Original Message -----
> > > From: "O. Hartmann"
> > > >
> > > > For security reasons I dumped, via "dd", a large file onto a
> > > > 3 TB disk. The system is 11.0-CURRENT #1 r259667 (Fri Dec 20
> > > > 22:43:56 CET 2013), amd64. The filesystem in question is a
> > > > single ZFS pool.
> > > >
> > > > Issuing the command
> > > >
> > > >   rm dumpfile.txt
> > > >
> > > > and then hitting Ctrl-Z to put the rm command into the
> > > > background via "bg" (I use FreeBSD's csh in that console)
> > > > locks up the entire command and even worse - it seems to wind
> > > > up the pool in question for being exported!
> > >
> > > I can't think of any reason why backgrounding a shell command
> > > would export a pool.
> >
> > I sent the job "rm" into the background; I didn't say that this
> > implies an export of the pool!
> >
> > I said that the pool cannot be exported once the bg command has
> > been issued.
>
> Sorry, I'm confused then, as you said "locks up the entire command
> and even worse - it seems to wind up the pool in question for being
> exported!"
>
> Which to me read like you were saying the pool ended up being
> exported.
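To make the sequence unambiguous, this is roughly what I typed in the
csh session (reconstructed from memory, so take the exact spacing as
an illustration only; the file is the dump file mentioned above):

   rm dumpfile.txt
   ^Z        <- suspend the running rm; top then shows it in state STOP
   bg        <- try to resume the suspended job in the background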
I'm not a native English speaker. My intention was, in short, to remove
the dummy file. Having issued the command in the foreground of the
terminal, I decided a second after hitting return to send it to the
background by suspending the rm command and then issuing "bg".

> > > > I expect the command to go into the background just as every
> > > > other UNIX command does when Ctrl-Z is sent in the console.
> > > > Obviously, the ZFS-related parts of FreeBSD don't comply.
> > > >
> > > > The file has been removed from the pool, but the console is
> > > > still stuck with "^Z fg" (as I typed it in). The process list
> > > > tells me:
> > > >
> > > >   top
> > > >   17790 root  1  20  0  8228K  1788K  STOP  10  0:05  0.00% rm
> > > >
> > > > for the particular "rm" command issued.
> > >
> > > That's not backgrounded yet, otherwise it wouldn't be in the
> > > state STOP.
> >
> > As I said - the job never went to the background; it locked up the
> > terminal and makes the whole pool unresponsive.
>
> Have you tried sending a continue signal to the process?

No, not intentionally. Since the operation started to slow down the
whole box and seemed to affect nearly every ZFS pool operation I
attempted (zpool status, zpool import of the faulty pool, zpool
export), I rebooted the machine. After the reboot, when ZFS came up,
the drive started working like crazy again and the system stalled
while recognizing the ZFS pools. I then did a hard reset, restarted in
single-user mode, exported the pool successfully, and rebooted. But
the moment I did a "zpool import POOL", the heavy disk activity
continued.

> > > > Now, having the file deleted, I'd like to export the pool for
> > > > further maintenance
> > >
> > > Are you sure the delete is complete? Also don't forget ZFS has
> > > TRIM on by default, so depending on support in the underlying
> > > devices you could be seeing deletes occurring.
> >
> > Quite sure it didn't! It has been taking hours (~ 8 now) and the
> > drive is still working, although I tried to stop it.
>
> A delete of a file shouldn't take 8 hours, but you don't say how
> large the file actually is?

The drive has a capacity of ~ 2.7 TiB (a Western Digital 3 TB drive).
The file I created was - do not laugh, please - 2.7 TB :-( From what I
read about ZFS's copy-on-write technique in this thread and others, I
guess that is the culprit: there is no space left to delete the file
safely. By the way, the box is still working at 100% on that drive :-(
That's now > 12 hours.

> > > You can check that with gstat -d
> >
> > The command reports 100% activity on the drive. I exported the
> > pool in question in single-user mode and am now trying to import
> > it back while in multi-user mode.
>
> Sorry, you seem to be stating conflicting things:
> 1. The delete hasn't finished
> 2. The pool export hung
> 3. You have exported the pool

Not conflicting, but in my non-expert terminology not quite as accurate
and precise as you may expect.

ad 1) I terminated (by the brute force of the mighty RESET button) the
rm command. As far as I can see it hasn't finished its operation on
the pool, but what is in progress now might be some kind of recovery
mechanism, not the rm command anymore.

ad 2) Yes, first it hung, then I reset the box, then in single-user
mode I did the export to avoid further interaction, then I tried to
import the pool again ...
ad 3) Yes, successfully, after the reset. Now I have imported the pool
again, and the terminal in which I issued the command is stuck once
more while the pool is under heavy load.

> What exactly is gstat -d reporting, can you paste the output please.

I think it is boring to look at 100% activity, but here it is ;-)

dT: 1.047s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada1
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada2
   10    114    114    455   85.3      0      0    0.0      0      0    0.0  100.0| ada3
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada4
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| cd0
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p1
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p2
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p3
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p4
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p5
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p6
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p7
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p8
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p9
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p10
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p11
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p12
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p13
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p14
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/boot
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gptid/c130298b-046a-11e0-b2d6-001d60a6fa74
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/root
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/swap
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gptid/fa3f37b1-046a-11e0-b2d6-001d60a6fa74
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/var
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/var.tmp
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/usr
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/usr.src
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/usr.obj
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/usr.ports
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/data
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/compat
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/var.mail
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/usr.local
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada1p1
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada2p1
   10    114    114    455   85.3      0      0    0.0      0      0    0.0  100.0| ada3p1
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada4p1

> > Shortly after issuing the command
> >
> >   zpool import POOL00
> >
> > the terminal is stuck again; the drive has been working at 100% for
> > two hours now, and it seems the great ZFS is deleting every single
> > block one at a time. Is this supposed to last days, or a week?
>
> What controller and what drive?

The hardware is as follows:

CPU: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz (3201.89-MHz K8-class CPU)
real memory  = 34359738368 (32768 MB)
avail memory = 33252507648 (31712 MB)
ahci1: port 0xf090-0xf097,0xf080-0xf083,0xf070-0xf077,0xf060-0xf063,0xf020-0xf03f mem 0xfb520000-0xfb5207ff irq 20 at device 31.2 on pci0
ahci1: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich8: at channel 0 on ahci1
ahcich9: at channel 1 on ahci1
ahcich10: at channel 2 on ahci1
ahcich11: at channel 3 on ahci1
ahcich12: at channel 4 on ahci1
ahcich13: at channel 5 on ahci1
ahciem0: on ahci1

> What does the following report:
> sysctl kstat.zfs.misc.zio_trim

sysctl kstat.zfs.misc.zio_trim
kstat.zfs.misc.zio_trim.bytes: 0
kstat.zfs.misc.zio_trim.success: 0
kstat.zfs.misc.zio_trim.unsupported: 507
kstat.zfs.misc.zio_trim.failed: 0

> > > > but that doesn't work with
> > > >
> > > >   zpool export -f poolname
> > > >
> > > > This command is now also stuck, blocking the terminal and the
> > > > pool from further actions.
> > >
> > > If the delete hasn't completed and is stuck in the kernel, this
> > > is to be expected.
> >
> > At this moment I don't even want to imagine what will happen if I
> > ever have to delete several tens of terabytes. If the weird
> > behaviour of the current system can be extrapolated, then this is
> > a no-go.
>
> As I'm sure you'll appreciate, that depends on whether the file is
> simply being unlinked or each sector is being erased; the answers to
> the above questions should help determine that :)

You're right about that. But sometimes I would appreciate having the
choice.

> Regards
> Steve

Regards,
Oliver
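P.S. For anyone following along: the stopped rm could in principle be
inspected and resumed from a second terminal instead of resetting the
box. A minimal sketch (PID 17790 is the one from the top output above;
whether the signals actually get through while the process sleeps
inside ZFS in the kernel is exactly the open question of this thread):

   ps -l -p 17790        # state "T" = stopped by ^Z, "D" = uninterruptible wait in the kernel
   procstat -kk 17790    # kernel stack of the process, i.e. where inside ZFS it is sleeping
   kill -CONT 17790      # send SIGCONT, the "continue signal" suggested above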