From owner-freebsd-current@freebsd.org Wed Jul 19 13:00:21 2017 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0DBF8C7CD29 for ; Wed, 19 Jul 2017 13:00:21 +0000 (UTC) (envelope-from amdmi3@amdmi3.ru) Received: from ihor-3.amdmi3.ru (ihor-3.amdmi3.ru [185.117.152.228]) by mx1.freebsd.org (Postfix) with ESMTP id B9C357E3E9; Wed, 19 Jul 2017 13:00:20 +0000 (UTC) (envelope-from amdmi3@amdmi3.ru) Received: from hive.panopticon (unknown [78.153.152.119]) by ihor-3.amdmi3.ru (Postfix) with ESMTPS id E2D701AF123; Wed, 19 Jul 2017 16:00:12 +0300 (MSK) Received: from hades.panopticon (hades.panopticon [192.168.0.32]) by hive.panopticon (Postfix) with ESMTP id 6C343202D; Wed, 19 Jul 2017 15:53:03 +0300 (MSK) Received: by hades.panopticon (Postfix, from userid 1000) id E60FEB1; Tue, 18 Jul 2017 23:57:00 +0300 (MSK) Date: Tue, 18 Jul 2017 23:57:00 +0300 From: Dmitry Marakasov To: sjg@FreeBSD.org Cc: rum1cro@yandex.ru, ian@FreeBSD.org, kib@FreeBSD.org, freebsd-current@FreeBSD.org Subject: [bmake] bmake sigint handling causing tty corruption Message-ID: <20170718205700.GA2131@hades.panopticon> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline User-Agent: Mutt/1.8.3 (2017-05-23) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Jul 2017 13:00:21 -0000 Hi! Me and Ilya Arkhipov were investigating the cause of this bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=215572 In short, when FreeBSD ports options dialog is interrupted by Ctrl+C, there's chance of sporadic terminal corruption. They are not always reproducible and seem to be dependent on a machine, shell, terminal, tmux used, but are not tied to any specific configuration. The investigation led us to the following conclusion: - the corruption is caused by dialog4ports program (which handles ports options dialogs) not being able to restore terminal state on exit - dialog4ports does indeed try to restore terminal state, but the corresponding ioctl (TIOCSETAW) fails with EIO - examining kern/tty.c suggests that this happens likely because the make which is the session leader or something dies before dialog4ports - which led us to bmake as a culprit Here's the ktrace of the problem (the process hierarchy here is make -> sh -> dialog4ports) --- 78337 dialog4ports CALL sigaction(SIGTSTP,0x800a80228,0) 78337 dialog4ports RET sigaction 0 78337 dialog4ports CALL clock_gettime(0xd,0x7fffffffde08) 78337 dialog4ports RET clock_gettime 0 78337 dialog4ports CALL gettimeofday(0x7fffffffdc90,0) 78337 dialog4ports RET gettimeofday 0 78337 dialog4ports CALL poll(0x7fffffffdca0,0x2,0xffffffff) (make and sh receive SIGINT first) 78265 make RET wait4 RESTART 78335 sh RET wait4 -1 errno 4 Interrupted system call 78265 make PSIG SIGINT caught handler=0x402530 mask=0x0 code=SI_KERNEL 78335 sh PSIG SIGINT caught handler=0x41b950 mask=0x0 code=SI_KERNEL 78265 make CALL lstat(0x800ab9900,0x7fffffffd1f0) 78265 make NAMI "do-config" 78335 sh CALL sigreturn(0x7fffffffd280) 78335 sh RET sigreturn JUSTRETURN 78265 make RET lstat -1 errno 2 No such file or directory 78335 sh CALL wait4(0xffffffff,0x7fffffffd6ec,0,0) 78265 make CALL sigaction(SIGINT,0x7fffffffd250,0x7fffffffd230) 78265 make RET sigaction 0 78265 make CALL kill(0x131b9,SIGINT) 78265 make RET kill 0 78265 make CALL sigreturn(0x7fffffffd2d0) 78265 make RET sigreturn JUSTRETURN (make kills itself) 78265 make PSIG SIGINT SIG_DFL code=SI_USER (dialog4ports finally starts to process the signal) 78337 dialog4ports RET poll -1 errno 4 Interrupted system call 78337 dialog4ports PSIG SIGINT caught handler=0x800855e00 mask=0x0 code=SI_KERNEL 78337 dialog4ports CALL sigaction(SIGINT,0x7fffffffd7c0,0) 78337 dialog4ports RET sigaction 0 78337 dialog4ports CALL ioctl(0x1,TIOCGETA,0x7fffffffd770) 78337 dialog4ports RET ioctl 0 78337 dialog4ports CALL write(0x1,0x801676a00,0x17) 78337 dialog4ports GIO fd 1 wrote 23 bytes 78337 dialog4ports RET write 23/0x17 (this call should restore terminal state, but it fails) 78337 dialog4ports CALL ioctl(0x1,TIOCSETAW,0x80161604c) 78337 dialog4ports RET ioctl -1 errno 5 Input/output error 78337 dialog4ports CALL exit(0x1) --- Here's the ktrace of the case which didn't cause terminal corruption: --- 79506 dialog4ports CALL poll(0x7fffffffdc00,0x2,0xffffffff) 79506 dialog4ports RET poll -1 errno 4 Interrupted system call (dialog4ports is lucky enough to start processing the signal before make) 79506 dialog4ports PSIG SIGINT caught handler=0x800855e00 mask=0x0 code=SI_KERNEL 79506 dialog4ports CALL sigaction(SIGINT,0x7fffffffd720,0) 79506 dialog4ports RET sigaction 0 79506 dialog4ports CALL ioctl(0x1,TIOCGETA,0x7fffffffd6d0) 79506 dialog4ports RET ioctl 0 79506 dialog4ports CALL write(0x1,0x801676a00,0x17) 79506 dialog4ports GIO fd 1 wrote 23 bytes 79506 dialog4ports RET write 23/0x17 (and cleanup succeeds) 79506 dialog4ports CALL ioctl(0x1,TIOCSETAW,0x80161604c) 79506 dialog4ports RET ioctl 0 79506 dialog4ports CALL exit(0x1) 79433 make RET wait4 RESTART 79433 make PSIG SIGINT caught handler=0x402530 mask=0x0 code=SI_KERNEL 79433 make CALL lstat(0x800ab4980,0x7fffffffd140) 79433 make NAMI "do-config" 79433 make RET lstat -1 errno 2 No such file or directory 79433 make CALL sigaction(SIGINT,0x7fffffffd1a0,0x7fffffffd180) 79433 make RET sigaction 0 79433 make CALL kill(0x13649,SIGINT) 79433 make RET kill 0 79433 make CALL sigreturn(0x7fffffffd220) 79433 make RET sigreturn JUSTRETURN 79433 make PSIG SIGINT SIG_DFL code=SI_USER 79504 sh RET wait4 -1 errno 4 Interrupted system call 79504 sh PSIG SIGINT caught handler=0x41b950 mask=0x0 code=SI_KERNEL 79504 sh CALL sigreturn(0x7fffffffd1d0) 79504 sh RET sigreturn JUSTRETURN --- For reference, here's the program which demonstrates the tty layer behaviour which causes this: --- #include #include #include #include #include #include #include #include int main() { struct termios t; int ret; // save terminal state ret = ioctl(1, TIOCGETA, &t); fprintf(stderr, "ioctl(1, TIOCGETA) -> %d / %s\n", ret, strerror(errno)); pid_t p = fork(); if (p > 0) { // parent would die from SIGTERM early kill(getpid(), SIGTERM); } else if (p == 0) { // child tries to restore terminal state with some delay usleep(1000); // because parent is dead now, this will fail with EIO ret = ioctl(1, TIOCSETAW, &t); fprintf(stderr, "ioctl(1, TIOCSETAW) -> %d / %s\n", ret, strerror(errno)); } return 0; } --- Now to fix this, I suggest that instead of killing itself, make should signal all its childs carefully and wait() on them, only then die itself. Now after a quick glance at bmake sources it seems like the jobs control code https://svnweb.freebsd.org/base/head/contrib/bmake/job.c?revision=317239&view=markup#l2633 does the very same thing that I've just described, however bmake is run in compat mode by default, and CompatInterrupt does exactly what ktrace shows - it just kills itself. https://svnweb.freebsd.org/base/head/contrib/bmake/compat.c?revision=310304&view=markup#l180 So, to fix this problem it seems that CompatInterrupt should be improved as described above. Also wanted to ask kib@, ian@ (as recent committers to tty.c) it this behavior of tty layer is correct and if it could be improved. -- Dmitry Marakasov . 55B5 0596 FF1E 8D84 5F56 9510 D35A 80DD F9D2 F77D amdmi3@amdmi3.ru ..: jabber: amdmi3@jabber.ru http://amdmi3.ru