Date: Fri, 28 Jul 2023 16:37:07 +0300
From: Konstantin Belousov
To: Mateusz Guzik
Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject: Re: git: 5b353925ff61 - main - vnode read(2)/write(2): acquire rangelock regardless of do_vn_io_fault()
References: <202307242203.36OM3IwQ009522@gitrepo.freebsd.org>
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all

On Fri, Jul 28, 2023 at 02:17:51AM +0200, Mateusz Guzik wrote:
> On 7/25/23, Konstantin Belousov wrote:
> > The branch main has been updated by kib:
> >
> > URL:
> > https://cgit.FreeBSD.org/src/commit/?id=5b353925ff61b9ddb97bb453ba75278b578ed7d9
> >
> > commit 5b353925ff61b9ddb97bb453ba75278b578ed7d9
> > Author: Konstantin Belousov
> > AuthorDate: 2023-07-23 15:55:50 +0000
> > Commit: Konstantin Belousov
> > CommitDate: 2023-07-24 22:02:59 +0000
> >
> > vnode read(2)/write(2): acquire rangelock regardless of
> > do_vn_io_fault()
> >
> > To ensure atomicity of reads against parallel writes and truncates,
> > vnode lock was not enough at least since introduction of vn_io_fault().
> > That code only take rangelock when it was possible that vn_read() and
> > vn_write() could drop the vnode lock.
> >
> > At least since the introduction of VOP_READ_PGCACHE() which generally
> > does not lock the vnode at all, rangelocks become required even
> > for filesystems that do not need vn_io_fault() workaround. For
> > instance, tmpfs.
> >
>
> Is there a bug with pgcache reads disabled (as in when the vnode lock
> is held for reads?)
>
> Note this patch adds 2 lock trips which were previously not present,
> which has to slow things down single-threaded, but I did not bother
> measuring that part.
>
> As this adds to vnode-wide *lock* acquires this has to very negatively
> affect scalability.
>
> This time around I ran: ./readseek3_processes -t 10 (10 workers
> reading from *disjoint* offsets from the same vnode. this in principle
> can scale perfectly)
>
> I observed a 90% drop in performance:
> before: total:25723459 ops/s
> after: total:2455794 ops/s
>
> Going to an unpatched kernel and disabling pgcache reads instead:
> disabled: total:6522480 ops/s
>
> or about 2.6x of performance of the current kernel
>
> In other words I think the thing to do here is to revert the patch and
> instead flip pgcache reads to off by default until a better fix can be
> implemented.

The purpose of the rangelock is to ensure atomicity of reads in the
presence of writes. In other words, taking the rangelock there is
architecturally right. Also, it fixes issues with truncation that are
not fixable with the vnode lock on tmpfs vnodes anyway.

That said, disabling the pgcache vop on tmpfs means that the regular
read vop is always used, which takes the vnode lock around reads. So I
doubt that the changed disposition would gain much in your test.

The proper future fix would be to improve the scalability of the
rangelocks, whose naive stop-gap implementation, which I did initially
around the time of current-7 or -8, has not been changed at all.

BTW, it seems that file offset locks are no longer needed, but I need
to recheck it. Removing them should shave off four atomics on the read
and write paths.
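
For readers unfamiliar with the trade-off discussed in this thread, here is a toy userspace model of a naive range lock. It is only a sketch under stated assumptions and is not the FreeBSD kernel implementation: the class name `NaiveRangeLock`, its methods, and the `mutex_trips` counter are all hypothetical and exist purely for illustration. It shows the two properties the thread talks about: overlapping readers and writers are kept apart (atomicity of reads against writes), yet every lock and unlock passes through one shared mutex, so even readers at disjoint offsets contend on it.

```python
import threading


class NaiveRangeLock:
    """Toy model: a single mutex guards the list of granted ranges."""

    def __init__(self):
        self._cv = threading.Condition()  # wraps the one shared mutex
        self._granted = []                # (start, end, is_write) tuples
        self.mutex_trips = 0              # hypothetical counter, for illustration

    def _conflicts(self, start, end, is_write):
        # Half-open ranges [start, end) conflict if they overlap and
        # at least one side is a writer.
        return any(start < e and s < end and (is_write or w)
                   for s, e, w in self._granted)

    def try_lock(self, start, end, is_write):
        """Grant the range if possible; return a cookie or None."""
        with self._cv:
            self.mutex_trips += 1
            if self._conflicts(start, end, is_write):
                return None
            cookie = (start, end, is_write)
            self._granted.append(cookie)
            return cookie

    def lock(self, start, end, is_write):
        """Block until the range can be granted; return a cookie."""
        with self._cv:
            self.mutex_trips += 1
            while self._conflicts(start, end, is_write):
                self._cv.wait()
            cookie = (start, end, is_write)
            self._granted.append(cookie)
            return cookie

    def unlock(self, cookie):
        with self._cv:
            self.mutex_trips += 1
            self._granted.remove(cookie)
            self._cv.notify_all()


if __name__ == "__main__":
    rl = NaiveRangeLock()
    # Two readers at disjoint offsets never conflict with each other...
    r1 = rl.lock(0, 4096, is_write=False)
    r2 = rl.lock(8192, 12288, is_write=False)
    rl.unlock(r1)
    rl.unlock(r2)
    # ...but each of them still took the shared mutex twice.
    print("mutex trips:", rl.mutex_trips)  # prints "mutex trips: 4"
```

In this model each read costs two trips through the shared mutex (one to lock the range, one to unlock it), which corresponds to the "2 lock trips" mentioned in the quoted message. A more scalable design would have to avoid that single point of contention for non-overlapping requests.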