From: Ilya A. Arkhipov
Envelope-From: rum1cro@yandex.ru
To: Dmitry Marakasov, "sjg@FreeBSD.org"
Cc: "ian@FreeBSD.org", "kib@FreeBSD.org", "freebsd-current@FreeBSD.org"
Subject: Re: [bmake] bmake sigint handling causing tty corruption
Date: Tue, 25 Jul 2017 20:27:12 +0300
Message-Id: <408041501003632@web54g.yandex.ru>
In-Reply-To: <20170718205700.GA2131@hades.panopticon>
References: <20170718205700.GA2131@hades.panopticon>
List-Id: Discussions about the use of FreeBSD-current

19.07.2017, 16:00, "Dmitry Marakasov":
> Hi!
>
> Ilya Arkhipov and I were investigating the cause of this bug:
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=215572
>
> In short, when the FreeBSD ports options dialog is interrupted by
> Ctrl+C, there's a chance of sporadic terminal corruption. The
> corruption is not always reproducible and seems to depend on the
> machine, shell, terminal, and whether tmux is used, but it is not
> tied to any specific configuration.
>
> The investigation led us to the following conclusion:
>
> - the corruption is caused by the dialog4ports program (which handles
>   ports options dialogs) not being able to restore terminal state on
>   exit
> - dialog4ports does indeed try to restore terminal state, but the
>   corresponding ioctl (TIOCSETAW) fails with EIO
> - examining kern/tty.c suggests that this likely happens because make
>   (which is the session leader or something similar) dies before
>   dialog4ports
> - which led us to bmake as the culprit
>
> Here's the ktrace of the problem (the process hierarchy here is make ->
> sh -> dialog4ports)
>
> ---
> 78337 dialog4ports CALL sigaction(SIGTSTP,0x800a80228,0)
> 78337 dialog4ports RET sigaction 0
> 78337 dialog4ports CALL clock_gettime(0xd,0x7fffffffde08)
> 78337 dialog4ports RET clock_gettime 0
> 78337 dialog4ports CALL gettimeofday(0x7fffffffdc90,0)
> 78337 dialog4ports RET gettimeofday 0
> 78337 dialog4ports CALL poll(0x7fffffffdca0,0x2,0xffffffff)
>
> (make and sh receive SIGINT first)
>
> 78265 make RET wait4 RESTART
> 78335 sh RET wait4 -1 errno 4 Interrupted system call
> 78265 make PSIG SIGINT caught handler=0x402530 mask=0x0 code=SI_KERNEL
> 78335 sh PSIG SIGINT caught handler=0x41b950 mask=0x0 code=SI_KERNEL
> 78265 make CALL lstat(0x800ab9900,0x7fffffffd1f0)
> 78265 make NAMI "do-config"
> 78335 sh CALL sigreturn(0x7fffffffd280)
> 78335 sh RET sigreturn JUSTRETURN
> 78265 make RET lstat -1 errno 2 No such file or directory
> 78335 sh CALL wait4(0xffffffff,0x7fffffffd6ec,0,0)
> 78265 make CALL sigaction(SIGINT,0x7fffffffd250,0x7fffffffd230)
> 78265 make RET sigaction 0
> 78265 make CALL kill(0x131b9,SIGINT)
> 78265 make RET kill 0
> 78265 make CALL sigreturn(0x7fffffffd2d0)
> 78265 make RET sigreturn JUSTRETURN
>
> (make kills itself)
>
> 78265 make PSIG SIGINT SIG_DFL code=SI_USER
>
> (dialog4ports finally starts to process the signal)
>
> 78337 dialog4ports RET poll -1 errno 4 Interrupted system call
> 78337 dialog4ports PSIG SIGINT caught handler=0x800855e00 mask=0x0 code=SI_KERNEL
> 78337 dialog4ports CALL sigaction(SIGINT,0x7fffffffd7c0,0)
> 78337 dialog4ports RET sigaction 0
> 78337 dialog4ports CALL ioctl(0x1,TIOCGETA,0x7fffffffd770)
> 78337 dialog4ports RET ioctl 0
> 78337 dialog4ports CALL write(0x1,0x801676a00,0x17)
> 78337 dialog4ports GIO fd 1 wrote 23 bytes
> 78337 dialog4ports RET write 23/0x17
>
> (this call should restore terminal state, but it fails)
>
> 78337 dialog4ports CALL ioctl(0x1,TIOCSETAW,0x80161604c)
> 78337 dialog4ports RET ioctl -1 errno 5 Input/output error
> 78337 dialog4ports CALL exit(0x1)
> ---
>
> Here's the ktrace of the case which didn't cause terminal corruption:
>
> ---
> 79506 dialog4ports CALL poll(0x7fffffffdc00,0x2,0xffffffff)
> 79506 dialog4ports RET poll -1 errno 4 Interrupted system call
>
> (dialog4ports is lucky enough to start processing the signal before make)
>
> 79506 dialog4ports PSIG SIGINT caught handler=0x800855e00 mask=0x0 code=SI_KERNEL
> 79506 dialog4ports CALL sigaction(SIGINT,0x7fffffffd720,0)
> 79506 dialog4ports RET sigaction 0
> 79506 dialog4ports CALL ioctl(0x1,TIOCGETA,0x7fffffffd6d0)
> 79506 dialog4ports RET ioctl 0
> 79506 dialog4ports CALL write(0x1,0x801676a00,0x17)
> 79506 dialog4ports GIO fd 1 wrote 23 bytes
> 79506 dialog4ports RET write 23/0x17
>
> (and cleanup succeeds)
>
> 79506 dialog4ports CALL ioctl(0x1,TIOCSETAW,0x80161604c)
> 79506 dialog4ports RET ioctl 0
> 79506 dialog4ports CALL exit(0x1)
> 79433 make RET wait4 RESTART
> 79433 make PSIG SIGINT caught handler=0x402530 mask=0x0 code=SI_KERNEL
> 79433 make CALL lstat(0x800ab4980,0x7fffffffd140)
> 79433 make NAMI "do-config"
> 79433 make RET lstat -1 errno 2 No such file or directory
> 79433 make CALL sigaction(SIGINT,0x7fffffffd1a0,0x7fffffffd180)
> 79433 make RET sigaction 0
> 79433 make CALL kill(0x13649,SIGINT)
> 79433 make RET kill 0
> 79433 make CALL sigreturn(0x7fffffffd220)
> 79433 make RET sigreturn JUSTRETURN
> 79433 make PSIG SIGINT SIG_DFL code=SI_USER
> 79504 sh RET wait4 -1 errno 4 Interrupted system call
> 79504 sh PSIG SIGINT caught handler=0x41b950 mask=0x0 code=SI_KERNEL
> 79504 sh CALL sigreturn(0x7fffffffd1d0)
> 79504 sh RET sigreturn JUSTRETURN
> ---
>
> For reference, here's a program which demonstrates the tty layer
> behaviour that causes this:
>
> ---
> #include <errno.h>
> #include <signal.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/ioctl.h>
> #include <sys/types.h>
> #include <termios.h>
> #include <unistd.h>
>
> int main() {
>         struct termios t;
>         int ret;
>
>         // save terminal state
>         ret = ioctl(1, TIOCGETA, &t);
>         fprintf(stderr, "ioctl(1, TIOCGETA) -> %d / %s\n", ret, strerror(errno));
>
>         pid_t p = fork();
>         if (p > 0) {
>                 // parent dies from SIGTERM right away
>                 kill(getpid(), SIGTERM);
>         } else if (p == 0) {
>                 // child tries to restore terminal state after a short delay
>                 usleep(1000);
>
>                 // because parent is dead now, this will fail with EIO
>                 ret = ioctl(1, TIOCSETAW, &t);
>                 fprintf(stderr, "ioctl(1, TIOCSETAW) -> %d / %s\n", ret, strerror(errno));
>         }
>
>         return 0;
> }
> ---
>
> Now to fix this, I suggest that instead of killing itself right away,
> make should signal all its children carefully, wait() on them, and
> only then die itself.
>
> After a quick glance at the bmake sources, it seems that the job
> control code
>
> https://svnweb.freebsd.org/base/head/contrib/bmake/job.c?revision=317239&view=markup#l2633
>
> does the very same thing that I've just described. However, bmake is
> run in compat mode by default, and CompatInterrupt does exactly what
> the ktrace shows: it just kills itself.
>
> https://svnweb.freebsd.org/base/head/contrib/bmake/compat.c?revision=310304&view=markup#l180
>
> So, to fix this problem it seems that CompatInterrupt should be improved
> as described above.
>
> I also wanted to ask kib@ and ian@ (as recent committers to tty.c) if
> this behavior of the tty layer is correct and if it could be improved.
>
> --
> Dmitry Marakasov . 55B5 0596 FF1E 8D84 5F56 9510 D35A 80DD F9D2 F77D
> amdmi3@amdmi3.ru ..: jabber: amdmi3@jabber.ru http://amdmi3.ru

FYI: this issue was fixed in
https://svnweb.freebsd.org/base?view=revision&revision=321410

--
With Best Regards,
Ilya A. Arkhipov
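
For illustration only, the ordering proposed in the quoted message (signal the child, wait() for it, and only then die) could look roughly like the C sketch below. This is not bmake's CompatInterrupt and not the change committed in r321410. The function names signal_and_reap and die_from_signal are hypothetical, and a single child is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not bmake code): forward `sig` to one child and
 * block until that child has exited, so that it gets the chance to run
 * its own cleanup (for example restoring the tty state) while the
 * parent is still alive.
 * Returns 0 and stores the wait status in *status, or -1 on error.
 */
int
signal_and_reap(pid_t child, int sig, int *status)
{
	assert(child > 0 && status != NULL);

	/* ESRCH means the child is already gone; still try to reap it. */
	if (kill(child, sig) == -1 && errno != ESRCH)
		return -1;

	/* Retry when the wait itself is interrupted by another signal. */
	while (waitpid(child, status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	return 0;
}

/*
 * Hypothetical helper: once all children are reaped, die from `sig`
 * with the default disposition, so that the invoking shell still sees
 * a death by signal.
 */
void
die_from_signal(int sig)
{
	signal(sig, SIG_DFL);
	kill(getpid(), sig);
	_exit(128 + sig);	/* only reached if the signal is blocked */
}
```

In an interrupt path, a make-like parent would call signal_and_reap(child, SIGINT, &status) first and die_from_signal(SIGINT) afterwards, instead of killing itself immediately as in the first ktrace above.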