From: Alan Somers
Date: Sat, 2 Jan 2021 15:08:56 -0700
Subject: Re: cp(1) of large files is causing 100% CPU utilization and poor transfer
To: Rick Macklem
Cc: Matthias Apitz, FreeBSD CURRENT, Konstantin Belousov, Kirk McKusick
List-Id: Discussions about the use of FreeBSD-current

LGTM! This patch also fixes another problem: the previous version of cp, when
copying a large sparse file on UFS, would create some UFS indirect blocks
(because it kept truncating the file to larger sizes). The output file
would still be sparse, but it would take up more space than the original.
IIRC about 0.2% of the empty space would get used by UFS indirect blocks.
But your patch fixes it.

What I said earlier about needing to modify vn_generic_copy_file_range wasn't
quite correct. I confused len with xfer when I was reading the code. The
change I proposed to vn_generic_copy_file_range would only make a difference
if the process receives many interrupts.

And here's some background for other people reading the thread: the initial
copy_file_range implementation in cp only used a 2 MB block size because
vn_generic_copy_file_range wasn't always interruptible, and we didn't want cp
to block for minutes or even hours during a long transfer. Subsequently
rmacklem made vn_generic_copy_file_range interruptible, but we never raised
the block size in cp.

-Alan

On Sat, Jan 2, 2021 at 2:42 PM Rick Macklem wrote:

> The attached small patch seems to fix the problem.
> My hunch is that, for a large non-sparse file, SEEK_DATA
> SEEK_HOLE takes a fairly long time.
> These are done for each copy_file_range(2) syscall.
>
> cp was doing lots of them because of the small len argument.
> Bumping the len up to SSIZE_MAX results in far fewer syscalls
> and, therefore, fewer SEEK_DATAs and SEEK_HOLEs.
>
> Without the patch, cp took 6 times as long as dd.
> With the patch, cp takes less time than dd.
>
> I'll put the patch on the bug report. Matthias, can you test
> the patch?
>
> Thanks for reporting this, rick
> ps: All my test programs use SSIZE_MAX unless they were
>     not supposed to copy to eof, which explains why I
>     missed this. My bad, for the testing. ;-)
>
> ________________________________________
> From: owner-freebsd-current@freebsd.org
> on behalf of Matthias Apitz
> Sent: Saturday, January 2, 2021 3:05 PM
> To: Alan Somers
> Cc: Rick Macklem; FreeBSD CURRENT; Konstantin Belousov; Kirk McKusick
> Subject: Re: cp(1) of large files is causing 100% CPU utilization and poor
> transfer
>
> On Saturday, January 02, 2021 at 11:29:36 a.m. -0700, Alan Somers
> wrote:
>
> > > On Saturday, January 02, 2021 at 05:06:05 p.m. +0000, Rick Macklem
> > > wrote:
> > >
> > > > Just fyi, I've reproduced the problem.
> > > > All I did was create a 20Gbyte file
> > > > on UFS on a slow (4Gbyte of RAM,
> > > > slow spinning disk) laptop.
> > > > (The UFS file system is just what the installer creates these days.)
> > > >
> > > > cp still hasn't finished and is definitely
> > > > taking a looott longer than dd did.
> > > >
> > > > I'll start drilling down later today.
> > > >
> > > > I'll admit doing lots of testing of copy_file_range(2)
> > > > with large sparse files, but I may have missed testing
> > > > a large non-sparse file.
> > > >
> > > > rick
> > > > ps: I've added Kostik and Kirk to the cc.
> > >
> > > As the problem seems to be clear now, should I still file a PR?
> > > I'm happy to do so.
> > >
> >
> > Yes please. That will help ensure that we don't lose track of it.
>
> Here we go: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252358
>
> Thanks
>
> matthias
>
> --
> Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/
> +49-176-38902045
> Public GnuPG key: http://www.unixarea.de/key.pub
> ¡Con Cuba no te metas! «» Don't mess with Cuba! «» Leg Dich nicht mit
> Kuba an!
> http://www.cubadebate.cu/noticias/2020/12/25/en-video-con-cuba-no-te-metas/
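
A rough back-of-the-envelope sketch of why the len argument matters, for
anyone reading the thread later. It only counts how many copy_file_range(2)
calls would carry data for a given file size and per-call len. It is not
taken from cp or from the patch. The helper name data_calls is made up for
illustration, "20Gbyte" and "2 MB" are read as 20 GiB and 2 MiB, SSIZE_MAX is
taken as 2**63 - 1 (a 64-bit platform), and every call is assumed to copy the
full len it was asked for. Short copies would only push the count higher.

```python
MIB = 2**20
GIB = 2**30
SSIZE_MAX = 2**63 - 1  # assumption: 64-bit platform


def data_calls(file_size: int, max_len: int) -> int:
    """Estimate the number of copy_file_range(2) calls that carry data.

    Hypothetical helper: assumes each call copies exactly max_len bytes
    until the file is exhausted. The final call that returns 0 at EOF
    is not counted.
    """
    if file_size < 0 or max_len <= 0:
        raise ValueError("file_size must be >= 0 and max_len must be > 0")
    # Integer ceiling division, so huge values stay exact.
    return (file_size + max_len - 1) // max_len


# The 20 GiB test file with the old 2 MiB len in cp:
old_calls = data_calls(20 * GIB, 2 * MIB)    # 10240
# The same file with len bumped to SSIZE_MAX:
new_calls = data_calls(20 * GIB, SSIZE_MAX)  # 1
```

If Rick's hunch holds and each call does its own SEEK_DATA and SEEK_HOLE on
the file, the old len means on the order of ten thousand of those seeks for
this file, against roughly one with SSIZE_MAX.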