Date:      Tue, 21 Feb 2023 11:14:21 +0100
From:      Andreas Kusalananda Kähäri <andreas.kahari@abc.se>
To:        Sysadmin Lists <sysadmin.lists@mailfence.com>
Cc:        Freebsd Questions <freebsd-questions@freebsd.org>
Subject:   Re: BSD-awk print() Behavior
Message-ID:  <Y/SZfSO1CdhIvVUD@harpo.local>
In-Reply-To: <1600449078.170379.1676939080787@fidget.co-bxl>
References:  <1600449078.170379.1676939080787@fidget.co-bxl>

On Tue, Feb 21, 2023 at 01:24:41AM +0100, Sysadmin Lists wrote:
> Trying to wrap my head around what BSD awk is doing here. Although the behavior
> is unwanted for this exercise, it seems like a possibly useful feature or hack
> for future projects. Either way I'd like to understand what's going on.
> 
> I extracted a list of URLs from my browser's history sql file, and when
> iterating over the list with awk got some strange results.
> 
> file_1 has the sql-extracted URLs, and file_2 is a copy-paste of that file's
> contents using vim's yank-and-paste.
> 
> $ cat file_{1,2}
> https://github.com/
> https://github.com/
> https://github.com/
> https://github.com/
> 
> $ diff file_{1,2}  
> 1,2c1,2
> < https://github.com/
> < https://github.com/
> ---
> > https://github.com/
> > https://github.com/
> 
> $ awk '{ print $0 " abc " }' file_{1,2}  
>  abc ://github.com/
>  abc ://github.com/
> https://github.com/ abc 
> https://github.com/ abc 

file_1 is a DOS text file, while file_2 is a Unix text file.  The DOS
text file, when interpreted by tools expecting Unix text, has an extra
carriage-return character at the end of each line.  This carriage-return
character will be part of $0 in the awk code and causes the cursor to be
moved back to the start of the line when printing it, giving the effect
that you are seeing.
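
As a minimal sketch of how one might confirm and work around this (the
helper name strip_cr is my own invention for illustration, and the sample
line is recreated with printf rather than read from your actual file_1):

```shell
# strip_cr is a hypothetical helper, not something from the original post.
# It removes one trailing carriage return from each input line.
strip_cr() {
    awk '{ sub(/\r$/, ""); print }' "$@"
}

# A DOS-style line (CR LF), like the ones in file_1.
# od -c makes the otherwise invisible carriage return show up as \r.
printf 'https://github.com/\r\n' | od -c

# With the CR stripped first, the appended text ends up after the URL.
printf 'https://github.com/\r\n' | strip_cr | awk '{ print $0 " abc " }'
```

The same sub() call could presumably be placed directly at the top of the
original awk program instead of using a separate filter.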

This has nothing to do with awk's print keyword.  You would get a
similarly strange result if you simply pasted the data side by side:

	$ paste file_{1,2}
	https://https://github.com/
	https://https://github.com/

Here, "https://github.com/" is first printed from the DOS text file,
after which the cursor is returned to the start of the line.  Then,
paste inserts a tab character, which moves the cursor to the next tab
stop and so "steps over" the first eight characters that had already
been output ("https://"), and then outputs "https://github.com/" from
the Unix text file.

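To look at what paste really writes, rather than at how the terminal
renders it, the bytes can be dumped with od.  This is only a sketch: the
two files are recreated in a scratch directory with printf, on the
assumption that they match the ones in your post.

```shell
# Work in a scratch directory; the file names mirror the original post.
dir=$(mktemp -d)
printf 'https://github.com/\r\n' > "$dir/file_1"   # DOS text (CR LF)
printf 'https://github.com/\n'   > "$dir/file_2"   # Unix text (LF)

# Capture the bytes that paste writes, as rendered by od -c.
bytes=$(paste "$dir/file_1" "$dir/file_2" | od -c)
printf '%s\n' "$bytes"

rm -r "$dir"
```

In the dump, the \r from file_1 should be followed directly by the \t
that paste inserts between the two columns.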

> 
> The sql-extracted URLs cause awk's print() to replace the front of the string
> with text following $0. file_2 does not. I used vim's `:set list' option to
> view hidden chars, but there's no apparent difference between the two --
> although `diff' clearly thinks so. Both files show this when `list' is set:
> 
> https://github.com/$
> https://github.com/$

Yes, because Vim automatically detects DOS text files and treats the
CR LF pair as the line ending, so the carriage returns are not shown.
I'm assuming that while editing file_1 in Vim, you see "[dos]" at the
bottom of the screen?


> 
> 
> Here's more background if needed:
[cut]

-- 
Andreas (Kusalananda) Kähäri
SciLifeLab, NBIS, ICM
Uppsala University, Sweden
