Date: Wed, 04 Dec 2019 14:29:16 +0000 From: bugzilla-noreply@freebsd.org To: bugs@FreeBSD.org Subject: [Bug 242427] pmap_remove() sometimes is very slow causing 10+ minutes long reboots Message-ID: <bug-242427-227@https.bugs.freebsd.org/bugzilla/>
next in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D242427 Bug ID: 242427 Summary: pmap_remove() sometimes is very slow causing 10+ minutes long reboots Product: Base System Version: 11.3-RELEASE Hardware: amd64 OS: Any Status: New Severity: Affects Some People Priority: --- Component: kern Assignee: bugs@FreeBSD.org Reporter: pen@lysator.liu.se I've noticed that on our file servers it can take a _very_ long time to reb= oot - where most of the time spent is during the "shutdown" phase - time spent after the "Uptime: " line is printed. After a long debugging session I've pinpointed it to ZFS freeing the "zio" cache, which, after many levels of function calls, ends up it in pmap_remov= e() where some calls to it takes approximately 1 second. On a basically idle te= st server it can take up to 10-20 minutes for the server to "shut down" (or ev= en more - the time seems to be relative to the server uptime). We've seen production machines that seem to be "hung" (atleast an hour or more) so we = gave up and sent them a "hard reset" via IPMI. Hardware: Dell PowerEdge R730xd with dual Intel Xeon E5-2620v4 CPUs (32 "cpus") and 2= 56GB of RAM. No swap. Software: FreeBSD 11.3-RELEASE-p5. ZFS on boot & data. ZFS ARC limited to 128GB. Appr= ox 24000 ZFS filesystems (empty on this test server). Snapshots taken every ho= ur. An example of how long it can take (I'm only printing timing info for calls that take >=3D1s (the top 4) or >=3D2s (the rest): kmem_unback: pmap_remove(kernel_pmap, 18446741877714755584, 18446741877714767872) took 1 seconds kmem_free: kmem_unback(kmem_object, 18446741877714755584, 12288) took 1 sec= onds page_free: kmem_free(kmem_arena, 18446741877714755584, 12288) took 1 seconds keg_free_slab: keg->uk_freef(mem) {page_free} took 1 seconds keg_drain: while-keg_free_slab-loop took 14 seconds [20021 loops, 14 slow calls] zone_drain_wait: zone_foreach_keg(zone, &keg_drain) took 14 seconds zone_dtor: zone_drain_wait(zone, M_WAITOK) took 14 seconds zone_free_item(zone=3DUMA Zones): zone->uz_dtor() took 14 seconds uma_zdestroy(zio_buf_12288) took 14 seconds kmem_cache_destroy: uma_zdestroy(0xfffff803467c8ac0) [zio_buf_12288] took 14 seconds kmem_cache_destroy(zio_buf_cache[20]) took 14 seconds Called from kern_shutdown() -> EVENTHANDLER_INVOKE(shutdown_post_sync) -> zfsshutdown() Ie, ~14 out of 20021 calls to keg_free_slab() takes 1 second instead of executing really quickly (in this case). But some kmem_cache:s are much bigger causing delays for 300-600 seconds (or more). (I use "time_second" for time measurements, should probably use something w= ith better granularity for the top 4 calls :-) I added a sysctl() kern.shutdown.verbose that I can set to a number to make= it be more verbose (and added a lot of printf()s to get this info)... With it I now can see the number of filesystems being unmounted (since that too can t= ake a little while - nothing close to the times above though). - Peter --=20 You are receiving this mail because: You are the assignee for the bug.=
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-242427-227>