From: Konstantin Belousov
To: Shrikanth Kamath
Cc: freebsd-hackers@freebsd.org
Date: Fri, 16 Aug 2019 21:59:52 +0300
Subject: Re: Reclaiming "dirty buffers" after seeing "fsync: giving up on dirty..." / Unplugging USB while copy in progress
Message-ID: <20190816185952.GF71821@kib.kiev.ua>

On Fri, Aug 16, 2019 at 10:16:05AM -0700, Shrikanth Kamath wrote:
> How do “lingering” dirty buffers get reclaimed? In the function
> vop_stdfsync there is logic to retry but eventually fail after “maxretry”
> and print “fsync: giving up on dirty (error “ while returning the error. In
> a scenario where a USB stick is plugged in and a large file (> 1.5G) is
> being copied to it from the host filesystem when the USB device is abruptly
> removed.
> I see the fsync function retrying for a number of times before
> returning with the below error
>
> fsync: giving up on dirty 0xfffff8058091d1d8: tag devfs, type VCHR
>     usecount 1, writecount 0, refcount 1070 mountedhere 0xfffff805808af800
>     flags (VI_DOOMED|VI_ACTIVE)
>     v_object 0xfffff807a6efe948 ref 0 pages 1069 cleanbuf 893 dirtybuf 174
>     lock type devfs: EXCL by thread 0xfffff8009aebb560 (pid 6463, chassisd,
>     tid 100270)
>
> What is eventually happening is there are other processes that start
> appearing to be stuck waiting in “flswai” state (including the copy
> operation to the USB stick).
>
> # ps jaux -o mwchan -o command | grep flswai
> 6423 1 6423 6423 0 Ds - 0:02.35 /usr/sbin/eventd
>     0.0 0.0 744768 12916 06:22 flswai /usr/sbin/eventd -r -s -A
> 6463 6428 6427 6427 0 D - 8:25.69 /usr/sbin/chassi 0.0
>     0.1 862940 56472 06:22 flswai /usr/sbin/chassisd -N
> 19753 19195 19753 6453 1 D+ u0 0:01.08 cp junos-vmhost- 0.0
>     0.0 8164 2968 12:13 flswai cp
>     junos-vmhost-install-mx-x86-64-19.3I-14062-TB-130172-_cd-builder.tgz /mnt/
>
> Looking at the code, this seems to be coming from the “bwillwrite” function
> (sys/kern/vfs_bio.c) where it explains it will block prior to “…locking of
> any vnodes we attempt to avoid the situation where a locked vnode prevents
> the various system daemons from flushing related buffers…” How does the
> dirty buffers in this scenario get reclaimed?
>
> The dmesg log is from a Juniper device running stable/11 (closer to
> 11.1ish) based Junos.
>
> Jul 23 12:06:31.740 da0 at umass-sim0 bus 0 scbus3 target 0 lun 0
> Jul 23 12:06:31.740 da0: s/n AA04012700046751 detached
> Jul 23 12:06:31.740 g_vfs_done():da0p1[WRITE(offset=272711680, length=65536)]error = 6
> ...
> Jul 23 12:06:31.943 g_vfs_done():da0p1[WRITE(offset=277626880, length=65536)]error = 6
> ...
> Jul 23 12:06:31.992 g_vfs_done():da0p1[WRITE(offset=281624576, length=65536)]error = 6
> ...
> Jul 23 12:06:32.144 g_vfs_done():da0p1[WRITE(offset=285687808, length=65536)]error = 6
> Jul 23 12:06:32.144 (da0:umass-sim0:0:0:0): Periph destroyed
> Jul 23 12:06:32.144 umass0: detached
> Jul 23 12:06:36.672 fsync: giving up on dirty 0xfffff8058091d1d8: tag devfs, type VCHR
> Jul 23 12:06:36.672     usecount 1, writecount 0, refcount 1070 mountedhere 0xfffff805808af800
> Jul 23 12:06:36.672     flags (VI_DOOMED|VI_ACTIVE)
> Jul 23 12:06:36.672     v_object 0xfffff807a6efe948 ref 0 pages 1069 cleanbuf 893 dirtybuf 174

What I describe below is relevant for HEAD, and might be absent in 11.

After the I/O finishes, with whatever result, brelse(9) is called by
some means.  There, if the I/O finished with an error and the error is
ENXIO, which is taken to mean that the device went away, the buffer is
marked B_INVAL and truncated.  The normal flow in brelse() then returns
the buffer to the freelist.

A large unsolved issue is that if the buffer was used by UFS with
soft updates (SU) and there are unfinished dependencies hanging from
the buffer, the system checks for that and panics.  You should not use
SU on a USB stick anyway.
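
For illustration only, the release path described above can be sketched as a
small user-space model.  This is not the kernel's brelse(); struct model_buf,
model_brelse() and the MODEL_* outcomes are hypothetical names made up for
the sketch.  It is also an assumption of the sketch (not stated above) that
failed writes with any error other than ENXIO are treated as transient and
leave the buffer dirty.  On FreeBSD, ENXIO is errno 6, which matches the
"error = 6" in the quoted g_vfs_done() lines.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of releasing a buffer in this model. */
enum model_outcome { MODEL_REDIRTIED, MODEL_FREED, MODEL_PANIC };

/* Hypothetical, heavily simplified stand-in for the kernel's struct buf. */
struct model_buf {
	bool	is_write;	/* the finished I/O was a write */
	int	error;		/* 0 on success, otherwise an errno value */
	bool	invalid;	/* models B_INVAL */
	bool	dirty;		/* models a buffer still awaiting write-out */
	size_t	size;		/* bytes of data still attached to the buffer */
	int	ndeps;		/* models unfinished soft-updates dependencies */
};

/*
 * Model of the release path: decide what happens to a buffer once its
 * I/O has completed.
 */
enum model_outcome
model_brelse(struct model_buf *bp)
{
	assert(bp != NULL);

	/*
	 * Assumption of this sketch: a failed write with any error other
	 * than ENXIO is considered transient, so the data is kept and the
	 * buffer stays dirty for a later retry.
	 */
	if (bp->is_write && bp->error != 0 && bp->error != ENXIO) {
		bp->error = 0;
		bp->dirty = true;
		return (MODEL_REDIRTIED);
	}

	if (bp->error == ENXIO) {
		/* Device is gone; pending SU dependencies cannot complete. */
		if (bp->ndeps > 0)
			return (MODEL_PANIC);
		/* Mark invalid and truncate, so nothing is left to flush. */
		bp->invalid = true;
		bp->dirty = false;
		bp->size = 0;
	}

	/* Normal flow: the buffer goes back to the freelist. */
	return (MODEL_FREED);
}
```

In this model a 64K write buffer that failed with ENXIO and has no
dependencies comes back as MODEL_FREED with size 0, while the same buffer
with ndeps > 0 yields MODEL_PANIC, which corresponds to the soft-updates
problem mentioned above.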