Date: Mon, 2 Dec 2019 15:32:39 +0100
From: Peter Eriksson <pen@lysator.liu.se>
To: freebsd-fs@freebsd.org
Subject: Slow reboots due to ZFS cleanup in kern_shutdown() .. zio_fini()
Message-ID: <AD17E454-6A51-436D-A853-07F04A406EC9@lysator.liu.se>
I’ve been trying to figure out why our servers take so long to reboot; most of the time is spent in the “shutdown” phase. We’ve seen cases where it has taken 10-20 minutes (or more).

These are Dell PowerEdge R730xd servers with 256GB RAM and ~140TB of disks, running FreeBSD 11.3, with ~24000 filesystems per server. We normally cap ARC to 128GB RAM.

Adding a lot of debugging printf() calls to the relevant parts of the code points to:

  kern_shutdown() -> EVENTHANDLER_INVOKE(shutdown_post_sync) -> zfsshutdown() -> zfs__fini() -> spa_fini() -> zio_fini()

Debug output from a test run:

  zio_fini: kmem_cache_destroy(zio_buf_cache & zio_data_buf_cache):
  kmem_cache_destroy: uma_zfree_arg(0xfffff803465eec00) [zio_buf_12288] took 16 seconds
  kmem_cache_destroy(zio_buf_cache[20]) took 16 seconds
  kmem_cache_destroy: uma_zfree_arg(0xfffff803465eeb00) [zio_buf_16384] took 61 seconds
  kmem_cache_destroy(zio_buf_cache[28]) took 61 seconds
  kmem_cache_destroy: uma_zfree_arg(0xfffff8034c9018c0) [zio_buf_131072] took 87 seconds
  kmem_cache_destroy(zio_buf_cache[224]) took 87 seconds
  kmem_cache_destroy: uma_zfree_arg(0xfffff8034c901880) [zio_data_buf_131072] took 5 seconds
  kmem_cache_destroy(zio_data_buf_cache[224]) took 5 seconds

(I modified the code here to print the time spent only if it took 2 seconds or more.)

This is on a newly rebooted server (with all filesystems mounted).

It seems like uma_zfree_arg() is taking really long to execute. Now, that code isn’t exactly easy to read (for me at least)… Lots of barriers/locks and stuff. I wonder why this code should take so long? There shouldn’t be any disk I/O involved and it’s just a cache, so I wonder if there might be some way to get rid of it quicker?

Any UMA experts online? :-)

The reason for asking is that I’d like to be able to make sure a server reboots more quickly in case of problems. Now with the parallel ZFS mount stuff being done at boot time, that part is much quicker :-).

- Peter
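
For reference, the threshold-based timing mentioned above (print only when a call takes 2 seconds or more) can be sketched in userland C roughly as follows. This is not the actual kernel patch: timed_call(), example_work() and SLOW_THRESHOLD_SEC are hypothetical names, and the standard time() call is assumed here in place of whatever kernel time source the real instrumentation used.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical threshold: report only calls that take at least this long. */
#define SLOW_THRESHOLD_SEC 2

/*
 * Run fn(arg) and print a line only if the call was slow.
 * Returns the elapsed whole seconds. time() is wall-clock time with
 * one-second resolution, which is enough for a 2-second threshold but
 * is not monotonic (an assumption made for this userland sketch).
 */
static long
timed_call(const char *label, void (*fn)(void *), void *arg)
{
	time_t t0, t1;
	long sec;

	t0 = time(NULL);
	fn(arg);
	t1 = time(NULL);

	sec = (long)difftime(t1, t0);
	if (sec >= SLOW_THRESHOLD_SEC)
		printf("%s took %ld seconds\n", label, sec);
	return (sec);
}

/* Hypothetical stand-in for the call being measured. */
static void
example_work(void *arg)
{
	int *counter = arg;

	(*counter)++;
}
```

A caller would wrap each suspect call, for example timed_call("example_work", example_work, &n), so that fast calls stay silent and only the slow ones show up in the output.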
