From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 203162] when close(fd) on a fifo fails with EINTR, the file descriptor is not really closed
Date: Wed, 16 Sep 2015 22:34:14 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203162

            Bug ID: 203162
           Summary: when close(fd) on a fifo fails with EINTR, the file
                    descriptor is not really closed
           Product: Base System
           Version: 10.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: victor.stinner@gmail.com

Created attachment 161126
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=161126&action=edit
Program to reproduce the bug

tl;dr The close() syscall does not correctly close a FIFO file descriptor when close() is interrupted by a signal.

Hi,

I'm working on the Python project. Python 3.5 now retries syscalls when a syscall fails with EINTR. This change is described in https://www.python.org/dev/peps/pep-0475/

The associated unit test "test_eintr" sometimes hangs on our FreeBSD buildbots (FreeBSD 9 and 10). It took me some days to identify that it is test_open() of test_eintr that sometimes hangs. It looks like the test hangs when the close() syscall fails with EINTR.

By the way, the BSD C library ignores EINTR in this case, so the caller of the close() function is not aware that the syscall failed with EINTR. In Python, it was also decided to ignore EINTR on close() and dup2() because the file descriptor is closed anyway. This is explained in PEP 475 (see the link above).

The test ensures that Python correctly retries open() when the function fails with EINTR. The test uses two processes; I will call them the parent process and the child process. To get an EINTR on open(), the test uses a FIFO created by mkfifo(). The parent calls mkfifo() and immediately tries to open the FIFO for writing: open() blocks until the child opens the FIFO for reading. Both processes use setitimer() to inject SIGALRM signals every 10 ms. The child process sleeps 100 ms, opens the FIFO for reading and then closes it.

The attached tarball contains a C program based on the Python unit test. To reproduce the bug, run ./test.sh multiple times in different terminals, passing a different number to each run (it is used to name the truss log file): the program should hang after 1 to 5 minutes. You may have to stop and restart the script: truss creates a ghost process for the child process which becomes , so we quickly reach the limit on the number of processes.

I noticed two cases: either the test hangs (no more output), or the test slowly fills the terminal with "@". The "@" character is written each time open() fails with EINTR in the child process (only in the child process; this case does not produce output in the parent process).

I'm quite sure that truss has bugs and fails to log syscalls correctly in the parent and the child process. To work around the truss bugs, I wrote my program to ensure that open(path, O_WRONLY) returns fd 3 in the parent process and open(path, O_RDONLY) returns fd 4 in the child process. So from the fd number you can tell whether a logged syscall belongs to the parent or the child process.

When the close() syscall fails with EINTR, fstat(fd) fails with EBADF, so the file descriptor seems to be really closed.

Note: I reproduced the bug in a VM running FreeBSD 10.1-RELEASE-p6 with a single core (1 virtual CPU in fact).

Note: I have been following the evolution of the FreeBSD kernel through the Python test suite. I noticed that FreeBSD has made *huge* progress on handling threads and signals. Congrats :-)
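
The fstat() check mentioned above can be sketched as follows. This is a minimal illustration, not code from the attached program: fd_is_open and close_and_probe are hypothetical helper names, and the probe is only meaningful in a single-threaded process, where no other thread can reuse the descriptor number between close() and fstat().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if fd is an open descriptor, 0 if fstat() reports EBADF,
 * and -1 on any other fstat() error. */
int fd_is_open(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == 0)
        return 1;
    return errno == EBADF ? 0 : -1;
}

/* Call close() exactly once (no retry), store its errno in *close_errno
 * (0 on success, for example EINTR on interruption), and report whether
 * the descriptor is still open afterwards, as fd_is_open() does. */
int close_and_probe(int fd, int *close_errno)
{
    assert(close_errno != NULL);
    *close_errno = (close(fd) == 0) ? 0 : errno;
    return fd_is_open(fd);
}
```

If close_and_probe() stores EINTR in *close_errno and returns 0, the descriptor number is gone from the process even though close() reported a failure, which matches the observation above.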