From nobody Fri Nov 11 18:11:34 2022
Date: Fri, 11 Nov 2022 18:11:34 GMT
Message-Id: <202211111811.2ABIBY28073006@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
 dev-commits-src-branches@FreeBSD.org
From: Kyle Evans
Subject: git: f6a842313ca2 - stable/13 - split: switch to getline() for line/pattern matching
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
Sender: owner-dev-commits-src-all@freebsd.org
X-BeenThere:
 dev-commits-src-all@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: kevans
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/13
X-Git-Reftype: branch
X-Git-Commit: f6a842313ca28d300beb36c0b765c10e0970b2ca
Auto-Submitted: auto-generated
X-ThisMailContainsUnwantedMimeParts: N

The branch stable/13 has been updated by kevans:

URL: https://cgit.FreeBSD.org/src/commit/?id=f6a842313ca28d300beb36c0b765c10e0970b2ca

commit f6a842313ca28d300beb36c0b765c10e0970b2ca
Author:     Kyle Evans
AuthorDate: 2022-08-23 02:05:58 +0000
Commit:     Kyle Evans
CommitDate: 2022-11-11 18:08:46 +0000

    split: switch to getline() for line/pattern matching

    Get rid of split's home-grown logic for growing the buffer; arbitrarily
    breaking at LONG_MAX bytes instead of 65536 bytes gives us much more
    wiggle room.  Additionally, we'll actually fail out entirely if we can't
    fit a line, which makes noticing this class of problem much easier.

    Reviewed by:    bapt, emaste, pauamma
    Sponsored by:   Klara, Inc.

    (cherry picked from commit 5c053aa3c5e907bdd1ac466ce9b58611781c2c20)
---
 usr.bin/split/split.1 |  8 +++++---
 usr.bin/split/split.c | 25 ++++++++++++-------------
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/usr.bin/split/split.1 b/usr.bin/split/split.1
index 8f287a4163dd..684cad57d4fc 100644
--- a/usr.bin/split/split.1
+++ b/usr.bin/split/split.1
@@ -28,7 +28,7 @@
 .\" @(#)split.1 8.3 (Berkeley) 4/16/94
 .\" $FreeBSD$
 .\"
-.Dd May 9, 2013
+.Dd October 25, 2022
 .Dt SPLIT 1
 .Os
 .Sh NAME
@@ -213,5 +213,7 @@ A
 .Nm
 command appeared in
 .At v3 .
-.Sh BUGS
-The maximum line length for matching patterns is 65536.
+.Pp
+Before
+.Fx 14 ,
+pattern matching and only operated on lines shorter than 65,536 bytes.
diff --git a/usr.bin/split/split.c b/usr.bin/split/split.c
index 9028b29d1c69..008b614f4946 100644
--- a/usr.bin/split/split.c
+++ b/usr.bin/split/split.c
@@ -70,7 +70,6 @@ static off_t chunks = 0; /* Chunks count to split into. */
 static long numlines; /* Line count to split on. */
 static int file_open; /* If a file open. */
 static int ifd = -1, ofd = -1; /* Input/output file descriptors. */
-static char bfr[MAXBSIZE]; /* I/O buffer. */
 static char fname[MAXPATHLEN]; /* File name prefix. */
 static regex_t rgx;
 static int pflag;
@@ -203,6 +202,7 @@ main(int argc, char **argv)
 static void
 split1(void)
 {
+	static char bfr[MAXBSIZE];
 	off_t bcnt;
 	char *C;
 	ssize_t dist, len;
@@ -211,7 +211,7 @@ split1(void)
 	nfiles = 0;

 	for (bcnt = 0;;)
-		switch ((len = read(ifd, bfr, MAXBSIZE))) {
+		switch ((len = read(ifd, bfr, sizeof(bfr)))) {
 		case 0:
 			exit(0);
 		case -1:
@@ -264,46 +264,45 @@ split1(void)
 static void
 split2(void)
 {
+	char *buf;
+	size_t bufsize;
+	ssize_t len;
 	long lcnt = 0;
 	FILE *infp;

+	buf = NULL;
+	bufsize = 0;
+
 	/* Stick a stream on top of input file descriptor */
 	if ((infp = fdopen(ifd, "r")) == NULL)
 		err(EX_NOINPUT, "fdopen");

 	/* Process input one line at a time */
-	while (fgets(bfr, sizeof(bfr), infp) != NULL) {
-		const int len = strlen(bfr);
-
-		/* If line is too long to deal with, just write it out */
-		if (bfr[len - 1] != '\n')
-			goto writeit;
-
+	while ((len = getline(&buf, &bufsize, infp)) > 0) {
 		/* Check if we need to start a new file */
 		if (pflag) {
 			regmatch_t pmatch;

 			pmatch.rm_so = 0;
 			pmatch.rm_eo = len - 1;
-			if (regexec(&rgx, bfr, 0, &pmatch, REG_STARTEND) == 0)
+			if (regexec(&rgx, buf, 0, &pmatch, REG_STARTEND) == 0)
 				newfile();
 		} else if (lcnt++ == numlines) {
 			newfile();
 			lcnt = 1;
 		}

-writeit:
 		/* Open output file if needed */
 		if (!file_open)
 			newfile();

 		/* Write out line */
-		if (write(ofd, bfr, len) != len)
+		if (write(ofd, buf, len) != len)
 			err(EX_IOERR, "write");
 	}

 	/* EOF or error? */
-	if (ferror(infp))
+	if ((len == -1 && errno != 0) || ferror(infp))
 		err(EX_IOERR, "read");
 	else
 		exit(0);
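
For readers unfamiliar with the getline() idiom that split2() adopts above, the following is a minimal standalone sketch of the same loop shape. It is not code from split.c: count_lines() is a hypothetical helper invented for illustration, and the _POSIX_C_SOURCE define is an assumption needed only on systems that hide getline() by default. The sketch shows getline() growing its own buffer, so no fixed line-length limit applies, and how a read error can be told apart from a normal end of file.

```c
#define _POSIX_C_SOURCE 200809L	/* assumption: exposes getline() on strict libcs */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of split(1): count the lines in fp and
 * record the length of the longest one (including its trailing newline).
 * Returns the line count, or -1 if reading failed.
 */
long
count_lines(FILE *fp, size_t *longest)
{
	char *buf = NULL;	/* getline() allocates and grows this buffer */
	size_t bufsize = 0;
	ssize_t len;
	long nlines = 0;

	assert(fp != NULL && longest != NULL);
	*longest = 0;

	/* A final line without a trailing newline is still returned. */
	while ((len = getline(&buf, &bufsize, fp)) > 0) {
		nlines++;
		if ((size_t)len > *longest)
			*longest = (size_t)len;
	}
	free(buf);

	/* getline() returns -1 at EOF and on error; ferror() tells them apart. */
	return (ferror(fp) ? -1 : nlines);
}
```

A caller would pass any readable stream, for example one obtained from fdopen() as split2() does, and treat a return value of -1 as a failed read rather than a short file.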