From: Conrad Meyer
Reply-To: cem@freebsd.org
Date: Tue, 11 Dec 2018 11:57:55 -0800
Subject: Re: svn commit: r341803 - head/libexec/rc
To: Warner Losh
Cc: src-committers, svn-src-all@freebsd.org, svn-src-head@freebsd.org

On Tue, Dec 11, 2018 at 10:04 AM Warner Losh wrote:
> On Tue, Dec 11, 2018, 9:55 AM John Baldwin wrote:
>> The 'read' builtin in sh can't use buffering, so it is always going
>> to be slow
>
> It can't use it because of pipes. The example from the parts of this
> that was on IRC was basically:
>
> foo | (read bar; baz)
>
> Which reads one line into the bar variable and then sends the rest to
> the baz command.

It can't trivially, but it's not impossible. sh could play games and
buffer its own use of stdin, and then open a fresh pipe for stdin of
subsequent non-builtins, writing out unused portions of the buffer.[A]

Some other alternatives that would require kernel support but are
things we've talked about doing in the kernel before anyway:

* If we had something like eBPF programs attached to IO, maybe sh's
  read built-in could push a small eBPF program into the kernel that
  determined how many bytes could be read from the pipe in a single
  syscall without reading too far. The program itself is fairly
  trivial: simply returning a number of bytes up to and including the
  first '\n' would be a fine, if sometimes conservative, amount.
  (Input lines can be continued with a trailing backslash, except in
  -r mode, but as a first-cut approximation, reading-until-newline is
  probably good enough.)[B]

* Heck, even just a read_until_newline(2) syscall would work, and
  would probably be useful beyond just sh(1).
  I don't think it passes the sniff test (not general enough, and
  probably not something you want beginners stumbling across instead
  of fgets(3)), but it'd be fine, and there are pipe-abusing programs
  other than sh(1) that care about reading ASCII text lines without
  overconsuming input.[C]

* If we had something like Linux's tee(2) system call (which is as it
  sounds: tee(1) for pipes), sh(1)'s read built-in could tee(2) for
  buffering without impacting stdin, and read(2) stdin only when it
  knew how many bytes were consumed (or when the pipe buffer became
  full).[D]

I suspect (C) would be the easiest to implement correctly, followed by
(D). (B) requires some architectural design and bikeshedding, and the
details on the kernel side are tricky. (A) would be a little tricky
and probably require extensive changes to sh(1) itself, which is a
risk to the base system. But it would not impact the kernel.

Is there any interest in a tee(2)-like syscall?

Thanks,
Conrad
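
For readers following the thread, the constraint being discussed can be
shown with a small sketch. This is Python rather than sh's actual C
source, and it only assumes the behavior described above: reading one
byte per read(2) never consumes past the newline, while a single
buffered read pulls later lines out of the pipe so that a subsequent
reader of the same pipe never sees them. The helper names
(read_line_unbuffered, read_line_buffered, make_pipe, demo) are
hypothetical and exist only for this illustration; backslash
continuation is ignored, as with read -r.

```python
import os


def read_line_unbuffered(fd):
    """Read one line from fd, one byte per read(2), stopping at the first newline."""
    out = bytearray()
    while True:
        b = os.read(fd, 1)  # one syscall per byte: slow, but never over-reads
        if not b:           # EOF before a newline was seen
            break
        out += b
        if b == b"\n":
            break
    return bytes(out)


def read_line_buffered(fd, bufsize=4096):
    """Read one chunk and split off the first line.

    Whatever follows the newline is now held in this process's memory,
    not in the pipe, so a later reader of the same pipe cannot see it.
    """
    chunk = os.read(fd, bufsize)
    line, sep, leftover = chunk.partition(b"\n")
    return line + sep, leftover


def make_pipe(data):
    """Hypothetical helper: return the read end of a pipe preloaded with data."""
    r, w = os.pipe()
    os.write(w, data)
    os.close(w)
    return r


def demo():
    data = b"first\nsecond\nthird\n"

    # Byte-at-a-time: the remaining lines are still in the pipe.
    r = make_pipe(data)
    line = read_line_unbuffered(r)
    rest_unbuffered = os.read(r, 4096)
    os.close(r)

    # Buffered: the remaining lines were consumed along with the first.
    r = make_pipe(data)
    line2, leftover = read_line_buffered(r)
    rest_buffered = os.read(r, 4096)  # pipe is already drained
    os.close(r)

    return line, rest_unbuffered, line2, leftover, rest_buffered
```

In the buffered case the leftover bytes would have to be handed back
somehow, which is what option (A) proposes doing through a fresh pipe;
options (B) through (D) instead try to learn where the newline is
before consuming anything.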