From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 221743] mountd at 100% CPU for 24+ hours - getmntinfo() inefficient with thousands of filesystems and snapshots
Date: Wed, 23 Aug 2017 15:45:06 +0000
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221743

Bug ID: 221743
Summary: mountd at 100% CPU for 24+ hours - getmntinfo() inefficient with thousands of filesystems and snapshots
Product: Base System
Version: CURRENT
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: bin
Assignee: freebsd-bugs@FreeBSD.org
Reporter: peter@ifm.liu.se

Created attachment 185694
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=185694&action=edit
Fixed getmntinfo.c.diff

I noticed that mountd on two of our file servers was running at 100% CPU for over 24 hours.

These systems are running FreeBSD 11.0 and are Dell 730xd servers with 256GB RAM and around 140TB of disk, holding around 16000 user filesystems (with around 20-40 hourly snapshots per filesystem).

A "truss" of one of them indicated it was busy in a loop calling getfsstat(), munmap() and mmap(), slowly trying to load more and more filesystems (one more per loop) into a dynamically allocated buffer.

At the time I looked it was up at 280000 filesystems+snapshots out of the 360000 available ones (16000 filesystems, the rest snapshots).

Looking at the code for getmntinfo() in /usr/src/lib/libc/gen/getmntinfo.c, I see that it calls getfsstat() and tries to load the list of filesystems - and if it sees that it could load more filesystems than expected, it loops back and retries with the buffer resized to fit one more filesystem.

The problem seems to be that at around 250000-300000 filesystems+snapshots the loop took so long that, due to the 16000 new snapshots created every hour, it never really caught up...
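The retry behaviour described above can be illustrated with a small, portable model. This is only a sketch under stated assumptions, not the libc source: fake_getfsstat() is a hypothetical stand-in for the real system call (getfsstat(2) in FreeBSD's libc), the number of mounts is held constant, and the caller starts from a stale estimate. It counts how many list-fetching calls are needed when the buffer grows by a single entry per retry.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the system call that lists mounts: it copies
 * at most `capacity` entries and returns how many it copied.
 */
static long fake_getfsstat(long actual, long capacity)
{
    return actual < capacity ? actual : capacity;
}

/*
 * Model of a loop that retries with room for just one more entry whenever
 * the previous call filled the buffer completely.
 * `estimate` is the (possibly stale) mount count the caller starts from,
 * `actual` is the number of mounts really present.
 * Returns the number of list-fetching calls that were needed.
 */
long calls_growing_by_one(long estimate, long actual)
{
    long calls = 0;
    long capacity = 0;   /* buffer size, in entries */
    long got = estimate; /* entries reported by the last call */

    while (capacity <= got) {
        capacity = got + 1;                     /* room for one more entry */
        got = fake_getfsstat(actual, capacity); /* refetch the whole list */
        calls++;
    }
    return calls;
}
```

In this model a caller that starts 80000 entries short needs 80001 full refetches, each copying a list of several hundred thousand entries, so the total work grows with the product of the shortfall and the list size.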
In the attached patch I've modified the getmntinfo() function to call getfsstat() inside the loop to get the new number of available filesystems, to allocate a larger "extra" space, and to give up after 3 rounds in the loop and return the list it has at that time...

Btw, we also noticed that the snapshots were only sometimes included in the list from getfsstat(). It seems a snapshot must be accessed to show up in the list (ls -l in ".zfs/snapshot" triggers a "mount", or, as in our case, an rsync backup job does).

I include a patch for a modified getmntinfo() function.
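The approach described for the patch (re-query the count each round, allocate extra room, stop after 3 rounds) might look roughly like the following portable model. It is a sketch and not the attached diff: struct fake_kernel, fake_count(), fake_fill() and the SLACK value of 64 are hypothetical names and numbers chosen for illustration. Only the limit of 3 rounds comes from the report.

```c
#include <stddef.h>

#define MAX_ROUNDS 3 /* the report says the patch gives up after 3 rounds */
#define SLACK 64     /* hypothetical size of the "extra" space, in entries */

/* Hypothetical model of a system whose mount count keeps growing. */
struct fake_kernel {
    long mounts; /* mounts currently present */
    long growth; /* new mounts appearing after every call */
};

/* Models a count-only query: returns the current number of mounts. */
static long fake_count(struct fake_kernel *k)
{
    long n = k->mounts;
    k->mounts += k->growth;
    return n;
}

/* Models a list fetch: returns how many entries fit into `capacity`. */
static long fake_fill(struct fake_kernel *k, long capacity)
{
    long n = k->mounts < capacity ? k->mounts : capacity;
    k->mounts += k->growth;
    return n;
}

/*
 * Fetch the list with a bounded number of retries. Each round re-reads
 * the current count and sizes the buffer with SLACK spare entries. After
 * MAX_ROUNDS the possibly incomplete list is returned as it is.
 * Stores the number of rounds used in *rounds_used and returns the
 * number of entries obtained.
 */
long bounded_fetch(struct fake_kernel *k, int *rounds_used)
{
    long got = 0;
    int round;

    *rounds_used = 0;
    for (round = 1; round <= MAX_ROUNDS; round++) {
        long capacity = fake_count(k) + SLACK;

        got = fake_fill(k, capacity);
        *rounds_used = round;
        if (got < capacity)
            break; /* buffer not full, so the list is complete */
    }
    return got;
}
```

The trade-off in this design is that a caller may receive a slightly incomplete list when mounts appear faster than the slack can absorb, in exchange for a hard upper bound on the number of full refetches.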