From owner-freebsd-stable@FreeBSD.ORG Wed Apr 18 01:26:13 2012 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4AA0D106566B for ; Wed, 18 Apr 2012 01:26:13 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta07.westchester.pa.mail.comcast.net (qmta07.westchester.pa.mail.comcast.net [76.96.62.64]) by mx1.freebsd.org (Postfix) with ESMTP id 06B078FC08 for ; Wed, 18 Apr 2012 01:26:12 +0000 (UTC) Received: from omta09.westchester.pa.mail.comcast.net ([76.96.62.20]) by qmta07.westchester.pa.mail.comcast.net with comcast id zD6M1i0050SCNGk57DR6DP; Wed, 18 Apr 2012 01:25:06 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta09.westchester.pa.mail.comcast.net with comcast id zDR41i0101t3BNj3VDR5DX; Wed, 18 Apr 2012 01:25:05 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id B9976102C1E; Tue, 17 Apr 2012 18:25:03 -0700 (PDT) Date: Tue, 17 Apr 2012 18:25:03 -0700 From: Jeremy Chadwick To: freebsd-stable@freebsd.org Message-ID: <20120418012503.GA25161@icarus.home.lan> References: <20120417182242.GA58449@icarus.home.lan> <20120417195115.GA60447@icarus.home.lan> <20120417200252.GA60718@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=unknown-8bit Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20120417200252.GA60718@icarus.home.lan> User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: top not restoring terminal echo/icanon correctly X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 01:26:13 -0000 On Tue, Apr 17, 2012 at 01:02:52PM -0700, Jeremy Chadwick wrote: > On Tue, Apr 17, 2012 at 12:51:15PM -0700, Jeremy Chadwick wrote: > > On Tue, Apr 17, 2012 at 11:22:42AM -0700, Jeremy Chadwick wrote: > > > (Please keep me CC'd as I'm not subscribed to the list) > > > > > > I'd like to request that folks running RELENG_8 (and RELENG_9, though I > > > do not use it) please check the behaviour of their terminal after each > > > of following commands are run (check terminal after each command): > > > > > > top -a (press "q" after 1 screen refresh) > > > top -b > > > > > > If you find that your input characters in your shell aren't being echo'd > > > back after one of the above commands, blindly type "stty icanon echo" > > > and hit and things should be back to normal. > > > > > > What I'm looking for is confirmation from others of the problem. > > > > > > Also very important: please provide uname -a output, specifically world > > > rebuild date. It greatly matters, because a commit was recently done > > > where now -b functions fine (was previously busted in this way), but now > > > -a behaves like -b did. So src/world date matters. > > > > > > All of this is documented in PR 161739. I urge anyone experiencing this > > > problem to read that PR in full, as I spent many hours today writing a > > > debug routine to confirm that top is sometimes not calling tcsetattr() > > > with the original terminal parameters when it exits, and what the > > > condition seems to be. > > > > > > http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/161739 > > > > > > Finally, if anyone want to tackle the problem (work out the logic bug > > > that is in there which causes it), please be my guest. I have other > > > things going on right now (doctors appointments) so I don't have as much > > > time as I'd like. > > > > > > Thanks everyone. > > > > Thanks to all those who have responded, including kib@. > > > > The problem is very odd and appears specific to the bash shell, but > > only "somewhat". Below is a part of what I sent kib@ on the matter. > > A chart showing what I've found: > > > > Location Username TERM Shell bug? > > ============================================================= > > VGA console root cons25 /bin/csh no > > VGA console jdc cons25 /usr/local/bin/bash no > > SSH (PuTTY) root xterm /bin/csh no > > SSH (PuTTY) jdc xterm /bin/csh no > > SSH (PuTTY) jdc xterm /usr/local/bin/bash yes > > ============================================================= > > > > In the last case (and only that case): if I move my dotfiles (.bashrc > > and .bash_profile) aside and log in (SSH), top behaves normally. > > Naturally this made me think "something is wonky with my dotfiles!"... > > > > But the problem *doesn't* happen if my dotfiles are intact and I > > log in via VGA console (cons25) as myself with a bash shell. So it's > > almost like there's some bizarre combination of things that causes this > > problem. > > > > I'll continue to try and narrow it down. > > Boy this is a weird one. > > In my .bashrc I've been using the following statement to show all > arguments and set the update interval to 1 second: > > export TOP="-a -s 1" > > Removing this completely and logging back in results in no problems > when running "top" in any way (top -a, top -s 1, or top -a -s 1). > > However, the problem I describe with icanon/echo never getting restored > rears its ugly head when running "top -a -b", but only under bash > (with no dotfiles too). It never happens under csh. > > I can also reproduce this on VGA console when logged in (regardless of > user), e.g. log in as root (get csh shell), run bash, "top -a -b", bam, > issue happens. > > I don't know what's different about the two shells at this level that > would cause oddities like this, but um, yeah..... *blink* After further conversations with kib@, and my own prodding at the code at his request, I did what's usually required these days: down the rabbit hole. First, it actually has nothing to do with the shell. It's interesting that I can reproduce it but only with bash and when I have specific environment settings or flags used when passed to top, and not with csh/zsh. Maybe it's luck of the draw. Maybe it has to do with my terminal size (132x43). Maybe it's something else. Not sure. Anyway, kib@ pointed me to the fact that top(1) corrupts memory, meaning it overwrites/destroys its own variables in some cases. When top is run, it does a series of variables to determine the environment its being run from. It makes use of termios(4), and uses tcgetattr(3) to save terminal device settings (stuff you'd see via stty -a) before modifying them -- specifically disabling the ICANON and ECHO capabilities. Normally top restores these on exit. However, as my PR and Email suggests, there are times where it doesn't. How top decides whether or not to do this is via a simple if statement based on a global variable called is_a_terminal. If this is true/1, then it issues tcsetattr(3) with the original terminal device settings it fetched from when top was run initially. If false, it doesn't. kib@ recommended that I set a hardware watchpoint in gdb for is_a_terminal and see if it was changing while top ran. Sure enough that's the case. (I had to use a software watchpoint since a hardware one wasn't permitted by gdb, but that's not important; software ones just result in the program running significantly slower). Here's the results, which confirms his claims. (gdb) watch is_a_terminal Watchpoint 1: is_a_terminal (gdb) x/x &is_a_terminal 0x511de0 : 0x00000000 is_a_terminal defaults to 0. It's also very important to note that the variable is at address 0x511de0. (gdb) p &fmt $1 = (char (*)[128]) 0x511d60 (gdb) p fmt $2 = '\0' The above will make more sense in a moment... (gdb) run -a -b Starting program: /home/jdc/usr.bin/top/top -a -b Watchpoint 1: is_a_terminal Watchpoint 1: is_a_terminal Watchpoint 1: is_a_terminal Watchpoint 1: is_a_terminal Watchpoint 1: is_a_terminal Watchpoint 1: is_a_terminal Old value = 0 '\0' New value = 1 '\001' init_screen () at /home/jdc/usr.bin/top/../../contrib/top/screen.c:317 317 putcap(terminal_init); Here init_screen() has set is_a_terminal to 1. Good. Now we continue... (gdb) c Continuing. Watchpoint 1: is_a_terminal Old value = 1 '\001' New value = 0 '\0' 0x00000000a0aa1448 in vsprintf () from /lib/libc.so.7 (gdb) x/x &is_a_terminal 0x511de0 : 0x00000000 Eh? Something just changed its value to zero? vsprintf? Stack trace time. (gdb) bt #0 0x00000000a0aa1448 in vsprintf () from /lib/libc.so.7 #1 0x00000000a0aa04c8 in sprintf () from /lib/libc.so.7 #2 0x0000000000407170 in format_next_process (handle=0x511d50 "\230\003са", get_userid=0x40c100 , flags=1) at machine.c:963 #3 0x000000000040af1c in main (argc=3, argv=0x7fffffffea18) at /home/jdc/usr.bin/top/../../contrib/top/top.c:678 Frame 2 looks suspicious... (gdb) f 2 #2 0x0000000000407170 in format_next_process (handle=0x511d50 "\230\003са", get_userid=0x40c100 , flags=1) at machine.c:963 963 sprintf(fmt, proc_fmt, Uh oh, sprintf(), not even snprintf(). Living dangerously... Let's see how dangerous. (gdb) p fmt $3 = " 6482 halbot 1 44 0 90940K 75912K select 0 0:22 0.00% /usr/local/bin/perl /home/halbot/hal/halbot.pl /home/halbot/h" (gdb) x/s fmt 0x511d60 : " 6482 halbot 1 44 0 90940K 75912K select 0 0:22 0.00% /usr/local/bin/perl /home/halbot/hal/halbot.pl /home/halbot/h" (gdb) p strlen(fmt) $4 = 128 (gdb) x/s proc_fmt 0x40d828 <_fini+2980>: "%5d%s %-*.*s %s%3d %4s%7s %6s %-6.6s %2d%7s %5.2f%% %.*s" The fmt variable is actually a "static char fmt[128]". So what comes after that? (gdb) x/x 0x511d60+128 0x511de0 : 0x0000000000000000 Ah ha... yep: 0x511de0 - 0x511d60 = 128. Thus as shown, the ECHO and ICANON settings never get restored to the terminal device because the sprintf() resulted in writing past the end of the buffer. We can easily solve this by making use of snprintf() instead, with a sizeof(fmt), limiting the total string length printed to 128. And I have confirmed that indeed solves the problem. There are lots and lots of warnings in this code too, due to lack of proper #include statements and some other minor annoyances. I could clean these up as well. Thoughts/comments? Should I close my existing PR and open a new one? -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |