Date:      Sat, 11 Jul 2020 03:48:10 +0300
From:      Yuri Pankov <yuripv@yuripv.dev>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        Steve Wills <swills@FreeBSD.org>, "daichi@freebsd.org" <daichi@FreeBSD.org>, FreeBSD Current <freebsd-current@freebsd.org>, Hiroki Sato <hrs@FreeBSD.org>
Subject:   Re: svn commit: r352558 - head/usr.bin/top
Message-ID:  <a0b9b38b-d9d1-62d8-e708-3d29ba524dc5@yuripv.dev>
In-Reply-To: <CEE74E37-1C7F-4CA5-B12C-5BFB5E77027D@yahoo.com>
References:  <1BDFB387-930D-4F4D-8729-A5850F1C15B9.ref@yahoo.com> <1BDFB387-930D-4F4D-8729-A5850F1C15B9@yahoo.com> <61107ecc-6f9b-a4db-7b1e-ec75f73939ee@FreeBSD.org> <f8c8e434-39d7-4c7b-d33d-ef8a6b196eb9@yuripv.dev> <BE3A2B48-D593-4733-8EAC-4C70F3F0B9B4@yahoo.com> <d6b7193c-0cfe-a0c2-be94-e261b59b2dd1@yuripv.dev> <CEE74E37-1C7F-4CA5-B12C-5BFB5E77027D@yahoo.com>

Mark Millard wrote:
> 
> 
> On 2020-Jul-10, at 16:12, Yuri Pankov <yuripv at yuripv.dev> wrote:
> 
>> Mark Millard wrote:
>>> On 2020-Jul-10, at 11:05, Yuri Pankov <yuripv at yuripv.dev> wrote:
>>>> Steve Wills wrote:
>>>>> On 11/28/19 4:08 PM, Mark Millard via svn-src-head wrote:
>>>>>>> Author: daichi
>>>>>>> Date: Fri Sep 20 17:37:23 2019
>>>>>>> New Revision: 352558
>>>>>>> URL:
>>>>>>> https://svnweb.freebsd.org/changeset/base/352558
>>>>>>>
>>>>>>>
>>>>>>> Log:
>>>>>>>     top(1): support multibyte characters in command names (ARGV array)
>>>>>>>     depending on locale.
>>>>>>>      - add setlocale()
>>>>>>>      - remove printable() function
>>>>>>>      - add VIS_OCTAL and VIS_SAFE to the flag of strvisx() to display
>>>>>>>        non-printable characters that do not use C-style backslash sequences
>>>>>>>        in three digit octal sequence, or remove it
>>>>>>>     This change allows multibyte characters to be displayed according to
>>>>>>>     locale. If it is recognized as a non-display character according to the
>>>>>>>     locale, it is displayed in three digit octal sequence.
>>>>>>>
>>>>>>
>>>>>> Initially picking on tab characters as an example of what is
>>>>>> probably a somewhat broader issue . . .
>>>>>>
>>>>>> Ever since this change, characters such as tabs, which are
>>>>>> !isprintable(...) and do not simply occupy the next character
>>>>>> cell when output, mess up the top display. Again using tab as
>>>>>> an example: lines wrap because the text has been shifted over
>>>>>> by more than one character cell. top does not account for the
>>>>>> resulting line wrapping when it decides what to output for
>>>>>> the following display updates.
>>>>>>
>>>>> Apologies for the way late reply here, but I just now bothered tracking this down. This commit seems to be the cause of some corruption I'm seeing in long running top(1) as well. As Mark mentions, if I use "hh" it clears up. Should I open a bugzilla bug? I can share screenshots of the corruption, such as:
>>>>> https://i.imgur.com/Xqlwf9h.png
>>>>> https://i.imgur.com/Jv0d5NU.png
>>>>
>>>> Does removing VIS_SAFE fix the issue for you?
>>>>
>>>> As for Mark's original report (which I missed), removing isprintable() doesn't look wrong, as vis(3) should take care of its functionality (and in a multibyte-aware way).
>>> vis (as used) and the old isprintable logic are not
>>> equivalent when multi-byte is not needed/involved.
>>> Otherwise I'd not have had anything to ever report.
>>> If vis can do what is needed, more work needed to
>>> be done when the change was made in order to avoid
>>> messed up displays in single-byte contexts.
>>>> Also, is there an easy way to reproduce this?
>>> The following sort of command (the empty spaces inside the quoted
>>> text are tab characters):
>>> # tr '0\n      1\n     2\n     3\n     4\n     5\n     6\n     7\n     8\n' '\t0       \t1     \t2     \t3     \t4     \t5     \t6     \t7     \t8' < /dev/zero > /dev/null
>>> causes my 200 character wide window running top to show:
>>> 32920 root        100    0  12764Ki    2420Ki CPU3     3   2:22  99.87% tr 0\\n	1\\n	2\\n	3\\n	4\\n	5\\n	6\\n	7\\n	8\\n \\t0	\\t1	\\t2	\\t3	\\t4	\\t5	\\t6	\\t733   \\t8       20        7172      5448Ki CPU23   23   0:00   0.04% top -HiSCazopid
>>> But that does not show where the lines wrap at the edges of the window,
>>> so breaking it up explicitly after the first "\" in \\t7:
>>> 32920 root        100    0  12764Ki    2420Ki CPU3     3   2:22  99.87% tr 0\\n	1\\n	2\\n	3\\n	4\\n	5\\n	6\\n	7\\n	8\\n \\t0	\\t1	\\t2	\\t3	\\t4	\\t5	\\t6	\
>>> \t733   \\t8       20        7172      5448Ki CPU23   23   0:00   0.04% top -HiSCazopid
>>> Note how \n turned into \\n , taking an extra character for
>>> each \n . Similarly for \t vs. \\t . (Other examples do
>>> similarly.)
>>> The tab characters really do use more than one character cell
>>> on the display (sometimes).
>>> The text from the tr command ends up spread across 2 lines
>>> as seen in the window where top is running.
>>> I ran top in another ssh session first and then the tr command.
>>> Before running the tr command, top showed as:
>>> 33019 root         20    0  17172Ki    5448Ki CPU24   24   0:00   0.05% top -HiSCazopid
>>> If you do not end up with top listed just after tr in top's output,
>>> then it will not be top's line that ends up partially overwritten.
>>> If you have wider windows, you may need more text in the tr quoted
>>> strings.
>>> In another experiment I inserted a large number of backspace characters
>>> (control-H's) at the front of the first quoted string in the tr command.
>>> The top output displayed:
>>> 0\\n5 ro1\\n    2\\93   3\\n12764\\n   25\\ni CP6\\n   97\\n:12 100.00\t0r \nHiS\\t1pid \\t2	\\t3	\\t4	\\t5	\\t6	\\t
>>> 33094 root         20    0  17172Ki    5488Ki CPU21   21   0:00   0.06% top -HiSCazopid
>>> In other words, backspace moved the cursor position back over prior
>>> fields on the line and then the later line content overwrote those
>>> fields instead of being after "tr" someplace (or truncated off).
>>> Note that part of "-HiSCazopid" shows up on both lines. The extra
>>> is from when top was running but tr had not started yet. top is
>>> not managing text replacement correctly for output characters that
>>> end up not being just "in" the next character-cell on the terminal.
>>> The same sort of result happens when adding just one
>>> carriage return (control-M) in front of that first quoted
>>> string instead:
>>> 0\\n8 ro1\\n    2\\92   3\\n12764\\n   25\\ni CP6\\n  117\\n:11 100.00\t0r \nHiS\\t1pid \\t2	\\t3	\\t4	\\t5	\\t6	\\t
>>> 33094 root         20    0  17172Ki    5488Ki CPU23   23   0:00   0.04% top -HiSCazopid
>>> I do not intend to try to find all examples of characters that
>>> cause problems but used to not cause problems.
>>> From what I've seen, cursor positioning escape character sequences
>>> seem to be sent through and cause overwrites at arbitrary places
>>> on screen, based on the escape sequence content. There are command
>>> lines around that contain such sequences. So I sometimes see the
>>> first few lines of top's output have garbage text from commands
>>> that were listed below at some point overwriting the top text.
>>> Part of what is going on is top avoiding rewriting characters
>>> that its tracking indicates have not been updated. When the
>>> actual display and that supposed-tracking mismatch, the
>>> display ends up wrong when updated (bad text continues to
>>> display).
>>> The text in commands should not make "top -a" output mess up
>>> the display of other lines in top's output, nor of other
>>> top output fields on the same line. In my view, if some usage
>>> contexts need otherwise, it should take an extra command line
>>> option to put top in a mode that might do such things. The
>>> default behavior should strictly avoid having such things
>>> happen.
>>
>> Thanks.
>>
>> The attached diff seems to take care of the issue for me, adding VIS_TAB and removing VIS_SAFE, which can be blamed for passing through the following:
>>
>> VIS_SAFE   Currently this form allows space, tab, newline, backspace,
>>            bell, and return — in addition to all graphic characters —
>>            unencoded.
>> <top.txt>
> 
> A quick test suggests agreement. We will see how it
> looks for on-going use.
> 
> But I'll note that top's man page should document the
> translations that are being used: it is not the same
> text that top produced before -r352558 and one should
> be able to read the man page to find out how to
> interpret what top reports for the likes of top -a .

I think this was taken care of in r352568, and what it says now is correct:

Non-printable characters in the command line are encoded in C-style
backslash sequences or three-digit octal sequences.
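
To make that encoding concrete, here is a rough Python model of it. It is
only a sketch and not the vis(3) implementation: the function name
encode_command, the C_STYLE table and the passthrough parameter are all
made up for illustration, and "printable" here means Python's
str.isprintable() rather than the C locale machinery top uses. The
SAFE_PASSTHROUGH set mirrors the characters the VIS_SAFE text quoted
above says are left unencoded.

```python
# Simplified model of the encoding described in the man page text.
# NOT vis(3): all names below are hypothetical and exist only for this sketch.

# C-style backslash sequences for a few common control characters.
C_STYLE = {
    "\n": "\\n", "\t": "\\t", "\b": "\\b",
    "\a": "\\a", "\r": "\\r", "\v": "\\v", "\f": "\\f",
}

# Characters the quoted VIS_SAFE description says pass through unencoded
# (in addition to graphic characters).
SAFE_PASSTHROUGH = frozenset(" \t\n\b\a\r")


def encode_command(text, passthrough=frozenset()):
    """Return text with non-printable characters made visible."""
    out = []
    for ch in text:
        if ch == "\\":
            # A literal backslash is doubled, which is why \n on a
            # command line shows up as \\n in the thread.
            out.append("\\\\")
        elif ch in passthrough or ch.isprintable():
            out.append(ch)
        elif ch in C_STYLE:
            out.append(C_STYLE[ch])
        else:
            # Anything else becomes three-digit octal, one group per byte.
            out.extend("\\%03o" % b for b in ch.encode("utf-8"))
    return "".join(out)
```

With the default empty passthrough set, a tab or backspace in the input
comes out as the two visible characters \t or \b, and an ESC byte comes
out as \033. Passing passthrough=SAFE_PASSTHROUGH leaves tab, backspace
and return untouched, which is the behaviour the diff is meant to get
rid of.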

> (It does not appear that escape sequences or vertical
> tab would have gone through unencoded. So I'm still
> unclear how I ever had the top few lines of top's
> output messed up by command text. So it is also
> unclear that this change would make a difference
> for such. We will see over time if that text is
> ever messed up.)
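
For what it's worth, the mismatch described earlier in the thread, where
top's idea of the screen and the real cursor position drift apart once
tab, backspace or carriage return reach the terminal unencoded, can be
shown with a small Python model. This is only a sketch under stated
assumptions: cursor_column is a made-up helper, tab stops are assumed to
be every 8 columns, every other character is assumed to take exactly one
cell, and line wrapping is ignored.

```python
# Toy model of where a terminal cursor ends up after writing text.
# Hypothetical helper for illustration; not taken from top(1).

def cursor_column(text, start=0, tab_width=8):
    """Return the cursor column after writing text, starting at column start."""
    col = start
    for ch in text:
        if ch == "\t":
            # Jump to the next tab stop (assumed every tab_width columns).
            col = (col // tab_width + 1) * tab_width
        elif ch == "\b":
            # Backspace moves left one cell, stopping at the left edge.
            col = max(col - 1, 0)
        elif ch == "\r":
            # Carriage return goes back to the start of the line.
            col = 0
        else:
            # Assumption: every other character occupies one cell.
            col += 1
    return col
```

A program that assumes one cell per character would expect "ab\tc" to
advance the cursor by len("ab\tc"), which is 4, while this model puts
the cursor at column 9. With backspace or return the cursor ends up to
the left of where the text started, so later output lands on top of
earlier fields.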



