From: Sysadmin Lists
To: Freebsd Questions
Subject: BSD-awk print() Behavior
Date: Tue, 21 Feb 2023 01:24:41 +0100 (CET)
Message-ID: <1600449078.170379.1676939080787@fidget.co-bxl>
List-Archive: https://lists.freebsd.org/archives/freebsd-questions

Trying to wrap my head around what BSD awk is doing here. Although the behavior
is unwanted for this exercise, it seems like a possibly useful feature or hack
for future projects. Either way I'd like to understand what's going on.

I extracted a list of URLs from my browser's history sql file, and when
iterating over the list with awk got some strange results. file_1 has the
sql-extracted URLs, and file_2 is a copy-paste of that file's contents using
vim's yank-and-paste.

$ cat file_{1,2}
https://github.com/
https://github.com/
https://github.com/
https://github.com/
$ diff file_{1,2}
1,2c1,2
< https://github.com/
< https://github.com/
---
> https://github.com/
> https://github.com/
$ awk '{ print $0 " abc " }' file_{1,2}
 abc ://github.com/
 abc ://github.com/
https://github.com/ abc 
https://github.com/ abc 

The sql-extracted URLs cause awk's print() to replace the front of the string
with text following $0. file_2 does not. I used vim's `:set list' option to
view hidden chars, but there's no apparent difference between the two --
although `diff' clearly thinks so. Both files show this when `list' is set:

https://github.com/$
https://github.com/$

Here's more background if needed:
I extracted the URLs using sqlite3 like so:

for f in History-16768665*
do
	sqlite3 --bail $f <<-HEREDOC
	.mode csv
	.output ${f}.csv
	select * from urls where url like '%github%';
	HEREDOC
done

Then tried to create a list of unique URLs using `sort -u' but it broke because
of special chars in the extracted lines (so it claimed). I used awk to get a
unique list instead:

for f in *.csv; do [[ -s $f ]] && list="${list} $f"; done; echo $list
awk '{ u[$0] } END { for (e in u) print e > "file_1" }' $list
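
One possible cause, stated here as an assumption rather than something the
output above confirms: sqlite3's csv mode ends each row with CR+LF, and vim
hides the CR when it detects a DOS-format file, so `:set list' would show
nothing unusual. A trailing CR sends the terminal cursor back to column 0, and
the 5 characters of " abc " then overwrite "https". The sketch below reproduces
this with two throwaway files (demo_crlf and demo_lf are hypothetical stand-ins
for file_1 and file_2), shows the bytes with od, and strips the CR in awk.

```shell
#!/bin/sh
# Sketch only: assumes the odd output comes from a CR at the end of each line.
tmp=$(mktemp -d "${TMPDIR:-/tmp}/crdemo.XXXXXX")

# demo_crlf stands in for file_1, demo_lf for file_2 (hypothetical names).
printf 'https://github.com/\r\n' > "$tmp/demo_crlf"
printf 'https://github.com/\n'   > "$tmp/demo_lf"

# od -c prints every byte, so a trailing \r is shown explicitly.
od -c "$tmp/demo_crlf"

# Count the lines that end in a carriage return in each file.
cr=$(printf '\r')
crlf_lines=$(grep -c "${cr}\$" "$tmp/demo_crlf")
lf_lines=$(grep -c "${cr}\$" "$tmp/demo_lf")
echo "lines ending in CR: demo_crlf=$crlf_lines demo_lf=$lf_lines"

# Strip a trailing CR before appending text, so the terminal is not
# sent back to column 0 in the middle of the output line.
fixed=$(awk '{ sub(/\r$/, ""); print $0 " abc " }' "$tmp/demo_crlf")
echo "$fixed"

rm -rf "$tmp"
```

If `od -c file_1' shows \r \n at the end of each line, the same sub() call
could be added to the original awk one-liner before the print.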