From owner-freebsd-python@freebsd.org Tue May 9 18:23:48 2017 Return-Path: Delivered-To: freebsd-python@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B04A4D65545 for ; Tue, 9 May 2017 18:23:48 +0000 (UTC) (envelope-from pmhausen@me.com) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 94632A3F for ; Tue, 9 May 2017 18:23:48 +0000 (UTC) (envelope-from pmhausen@me.com) Received: by mailman.ysv.freebsd.org (Postfix) id 93AE8D65544; Tue, 9 May 2017 18:23:48 +0000 (UTC) Delivered-To: python@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 92D12D65543 for ; Tue, 9 May 2017 18:23:48 +0000 (UTC) (envelope-from pmhausen@me.com) Received: from pv38p43im-ztdg05061201.me.com (pv38p43im-ztdg05061201.me.com [17.133.183.25]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7972BA3E for ; Tue, 9 May 2017 18:23:48 +0000 (UTC) (envelope-from pmhausen@me.com) Received: from process-dkim-sign-daemon.pv38p43im-ztdg05061201.me.com by pv38p43im-ztdg05061201.me.com (Oracle Communications Messaging Server 7.0.5.38.0 64bit (built Feb 26 2016)) id <0OPP00N0045YJ700@pv38p43im-ztdg05061201.me.com> for python@freebsd.org; Tue, 09 May 2017 17:23:42 +0000 (GMT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=me.com; s=4d515a; t=1494350622; bh=BF7AIP2YpI9L75Uu8FFR4SW4495E2dQjlEjMhTQTexU=; h=From:Content-type:MIME-version:Subject:Message-id:Date:To; b=dXeHRE/vtUFig5QBqHN2hO6W+4ZFSto1xU45pChNbyLf2UCAxY6ASThqZEe7TtdsU SSInezyrm3JSB5+2xqRXf2RItOxsu1wcpj1V7k6Ssts7IOzE8OFv88hFgQ+P84CvlJ t9LEUQ68J5BBKW+cgeqpUtl5w1K/EWu3cmemTU7e5ZNbNO7AimpC7z0WTaWchyVSQZ VbZQG7i9Y27ccodVrzeQh1qHOkGmKsWpTE0FbyJP2zLc6nPZotlcRzloIIG8UlCgK0 EA5xWsQbhy+apy7Ak0mqFrpeIS64sEOZkcvwYMW9m+EFHL6oADms2DOw/itru2bbmu XJxEfJ2eaCv6Q== Received: from icloud.com ([127.0.0.1]) by pv38p43im-ztdg05061201.me.com (Oracle Communications Messaging Server 7.0.5.38.0 64bit (built Feb 26 2016)) with ESMTPSA id <0OPP007R94BFBW30@pv38p43im-ztdg05061201.me.com> for python@freebsd.org; Tue, 09 May 2017 17:23:42 +0000 (GMT) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-05-09_14:,, signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 clxscore=1034 suspectscore=14 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1701120000 definitions=main-1705090094 From: pmhausen Content-type: text/plain; charset=utf-8 Content-transfer-encoding: quoted-printable MIME-version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Proposal for pkg-message for python 3.x ports Message-id: <36BF7EEF-EE06-4A70-B0F2-69D1454CD1FF@me.com> Date: Tue, 09 May 2017 19:23:38 +0200 To: python@freebsd.org X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-python@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: FreeBSD-specific Python issues List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 May 2017 18:23:48 -0000 Dear python port maintainers, I discovered an odd performance problem with python 3.6 on servers with large amounts of memory. You can find all the details in a thread of the help@python.org mailinglist with subject "Python bug? Closing *all* file descriptors on fork() instead of only = open ones?" Unfortunately I cannot get at the list archives. Otherwise I would have = sent you a direct link. Short version: since python 3.3 (if my google-fu worked) python tries to close all open file descriptors in the child after fork(). Since there = is no standard POSIX compliant way to do this for *open* descriptors only, it tries to use /dev/fd on FreeBSD and if that is not available it = closes *all* descriptors (resulting in ERR#9 for the majority). On a 64 bit = FreeBSD 11 system with 128 GB of RAM the auto-tuned kern.maxfilesperproc is between 3 and 4 million. That's a heck of a lot of system calls for = each fork() ;-) I suggest adding a hint to mount fdescfs and add it to /etc/fstab in the pkg-message for the affected python ports. I'll attach my first message to the list below. The solution is to mount = /dev/fd. Kind regards, Patrick ----------------------- Hi, all, last week we encountered an odd behaviour of a particular python = application on a freshly deployed server. The application in question is: = https://github.com/iocage/iocage On our test system (SMALLBOX) everything seemed to be working fine while = on the new designated production system (BIGBOX) every invocation of iocage = was incredibly slow =E2=80=A6 This was suprising since BIGBOX features 20 cores, 128 GB of RAM, SSD = only while SMALLBOX is a run-off-the-mill server with 4 cores, 16 GB of RAM = and spinning drives we used for development. I finally found that the problem may not lie with iocage. To reproduce = with just python installed: SMALLBOX: $ uname -a FreeBSD testhost.proserver.punkt.de 11.0-RELEASE-p10 FreeBSD = 11.0-RELEASE-p10 #2 r317823: Fri May 5 11:41:45 CEST 2017 = root@testhost.proserver.punkt.de:/usr/obj/usr/src/sys/VIMAGE amd64 $ python3.6 -V Python 3.6.1 $ time pip3 list [=E2=80=A6] real 0m0.982s user 0m0.614s sys 0m0.370s BIGBOX: $ uname -a FreeBSD ph002.proserver.punkt.de 11.0-RELEASE-p10 FreeBSD = 11.0-RELEASE-p10 #2 r317823: Fri May 5 10:55:27 CEST 2017 = root@ph002.proserver.punkt.de:/usr/obj/usr/src/sys/VIMAGE amd64 $ python3.6 -V Python 3.6.1 $ time pip3 list real 0m4.140s user 0m1.242s sys 0m2.900s Well =E2=80=A6 something=E2=80=99s fishy. After double checking OS and = python versions, SSD configuration, CPU and I/O load, being a sysadmin and not really a developer I decided to = use truss to have a look at what the systems were actually *doing*. Result: on every fork of a python process we get: 77906: close(4) ERR#9 'Bad file = descriptor' 77906: close(5) ERR#9 'Bad file = descriptor' 77906: close(6) ERR#9 'Bad file = descriptor' 77906: close(7) ERR#9 'Bad file = descriptor' 77906: close(9) ERR#9 'Bad file = descriptor' 77906: close(10) ERR#9 'Bad file = descriptor' 77906: close(11) ERR#9 'Bad file = descriptor=E2=80=98 [=E2=80=A6 continued for literally millions of calls ...] I=E2=80=99m perfectly familiar with the concept of closing *open* file = descriptors on fork/exec but closing *all* of them regardless of whether they are in use or not means that on = the two systems we get this number of system calls: SMALLBOX: $ truss -f pip3 list 2>&1 | grep 'ERR#9' | wc -l 1877703 BIGBOX: $ truss -f pip3 list 2>&1 | grep 'ERR#9' | wc -l 15082323 Which rather nicely corresponds to the number of file descriptors per = process on each system multiplied by 4, since the initial python process seems to fork 4 times = to get the pip3 job done: SMALLBOX: $ sysctl kern.maxfilesperproc kern.maxfilesperproc: 469431 $ dc 469431 4 * p 1877724 BIGBOX: $ sysctl kern.maxfilesperproc kern.maxfilesperproc: 3770586 $ dc 3770586 4 * p 15082344 OK - obviously even on a system like BIGBOX 15 million invocations of = close() take a couple of seconds. And this will get worse with 256 or 512 GB of RAM since = kernel tables will dynamically increase accordingly. What now? Is this a bug? It should be sufficient to close descriptors 0 = to 3 after fork. Which is precisely what the invoking shell does before exec()ing python3.6 =E2=80= =A6 Thanks in advance for any hints Patrick=