List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
References: <2100145914.14642.1762672441817@localhost>
From: Rick Macklem
Date: Mon, 10 Nov 2025 01:09:36 -0800
Subject: Re: RFC: Should copy_file_range(2) return after a few seconds?
To: Bakul Shah
Cc: Ronald Klop, "Peter 'PMc' Much", FreeBSD CURRENT

On Mon, Nov 10, 2025 at 12:58 AM Bakul Shah wrote:
>
> On Nov 9, 2025, at 12:52 AM, Rick Macklem wrote:
> >
> > On Sat, Nov 8, 2025 at 11:14 PM Ronald Klop wrote:
> >>
> >> Why is this locking needed?
> >> AFAIK Unix has advisory locking, so if you read a file somebody else is writing the result is your own problem. It is up to the applications to adhere to the locking.
> >> Is this a lock different than file locking from user space?
> > Yes. A rangelock is used for a byte range during a read(2) or
> > write(2) to ensure that they are serialized. This is a POSIX
> > requirement. (See this post by kib@ in the original email
> > discussion. https://lists.freebsd.org/archives/freebsd-fs/2025-October/004704.html)
> >
> > Since there is no POSIX standard for copy_file_range(), it could
> > be argued that range locking isn't required for copy_file_range(),
> > but that makes it inconsistent with read(2)/write(2) behaviour.
> > (I, personally, am more comfortable with a return after N sec
> > than removing the range locking, but that's just my opinion.)
>
> Traditionally reads/writes on Unix were atomic but that is not the
> case for NFS, right? That is, while I am reading a file over NFS
> someone else can modify it from another host (if they have write
> permission). That is, AFAIK, the POSIX atomicity requirement for
> read / write is broken by NFS except for another reader/writer on
> the same host.

Yes. NFS is not a POSIX compliant file system (and cannot be, given
various aspects of the protocol).
The client can only attempt to approximate POSIX semantics.

>
> Another issue is that a kernel lock that is held for a very very
> long time is asking for trouble. Ideally one spends as little time
> as possible in the supervisor state and any optimization hacks
> that push logic into the kernel should strive to not hold locks
> for very long so that things don't grind to a complete halt.
>
> That is, copy_file_range() use in cat(1) seems excessive. The only
> reason for its use seems to be for improving performance. Why not
> break it up in smaller chunks?

As I noted, for ZFS with block cloning enabled (now the default),
the entire copy happens quickly no matter how large the file, since
it does a copy on write. To break it up into smaller chunks defeats
a lot of this, although it still might be able to do block cloning
so long as the offsets and lengths are exact multiples of the
recordsize (default of 128K these days, but can vary per file, so an
application cannot easily know what the correct read/write size is
to make it work).

I did not do the "cat" commit, but I think it is a good idea, rick

> That way you still get the benefit
> of reducing syscall overhead (which pales in comparison to any
> network reads in any case) + the same skipping over holes. Small
> reads/writes is what we did before this syscall raised its head!
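
A minimal sketch of the chunked approach discussed above, for illustration only. It assumes a fixed 128K recordsize (the thread notes the real value can vary per file, so this is a guess and not something an application can rely on), and the helper names aligned_chunks() and chunked_copy() are hypothetical. It uses Python's os.copy_file_range(), whose availability depends on the platform; the fallback to pread/pwrite when the call is missing or fails is an addition of this sketch, not something proposed in the thread.

```python
import os

# Assumption: the ZFS default mentioned in the thread. The real recordsize
# can differ per file, so this constant is only a guess.
RECORDSIZE = 128 * 1024


def aligned_chunks(size, max_chunk, recordsize=RECORDSIZE):
    """Yield (offset, length) pairs covering [0, size).

    Every offset, and every length except possibly the last one, is an
    exact multiple of recordsize.
    """
    # Round the chunk size down to a whole number of records (at least one).
    chunk = max(recordsize, (max_chunk // recordsize) * recordsize)
    offset = 0
    while offset < size:
        length = min(chunk, size - offset)
        yield offset, length
        offset += length


def chunked_copy(src_path, dst_path, max_chunk=8 * RECORDSIZE):
    """Copy src_path to dst_path one aligned chunk at a time.

    Returns the number of bytes copied.
    """
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        sfd, dfd = src.fileno(), dst.fileno()
        size = os.fstat(sfd).st_size
        for offset, length in aligned_chunks(size, max_chunk):
            done = 0
            while done < length:
                pos = offset + done
                try:
                    # Each call covers at most one chunk, so any kernel-side
                    # range lock is held for a bounded amount of work.
                    n = os.copy_file_range(sfd, dfd, length - done,
                                           offset_src=pos, offset_dst=pos)
                except (AttributeError, OSError):
                    # Sketch-only fallback: plain read/write when the call
                    # is unavailable or not supported for these files.
                    n = os.pwrite(dfd, os.pread(sfd, length - done, pos), pos)
                if n == 0:
                    # Source ended early (for example, it was truncated).
                    return total
                done += n
                total += n
    return total
```

For example, chunked_copy("in.dat", "out.dat", max_chunk=1024 * 1024) would issue one call per 1 MiB of input. Whether each call is actually satisfied by block cloning still depends on the file system and on the file's real recordsize.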