Date: Tue, 12 Nov 2013 14:28:30 +0200
From: Ivan Dimitrov <zlobber@gmail.com>
To: freebsd-fs@freebsd.org
Subject: Strange lock/crash - 100% cpu with basic command line utils
Message-ID: <52821EEE.5040502@gmail.com>
Hello list,

This is my first time reporting a problem, so please excuse me if this is not the right place or format. I also apologize for my poor English.

Last month we started experiencing strange locks on some of our servers. On semi-random occasions, when typing `cd`, `ls`, or `pwd`, the server crashes and starts behaving strangely. Sometimes the problem is reproducible; sometimes all commands work as expected.

All servers have Intel or AMD CPUs and run FreeBSD 9.2. They netboot the latest kernel and load the OS into RAM. All our servers use ZFS with an SSD for cache. We also tested with both preemptive and non-preemptive kernels.

Here is an example server:

==========================================
[root@ph3storage5 ~]# zpool status -v
  pool: zstorage5p1
 state: ONLINE
  scan: scrub repaired 0 in 39h36m with 0 errors on Mon Nov 4 05:11:48 2013
config:

        NAME         STATE     READ WRITE CKSUM
        zstorage5p1  ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            ada0     ONLINE       0     0     0
            ada1     ONLINE       0     0     0
        cache
          ada4p1     ONLINE       0     0     0

errors: No known data errors

  pool: zstorage5p2
 state: ONLINE
  scan: scrub repaired 0 in 14h59m with 0 errors on Sun Nov 3 04:41:50 2013
config:

        NAME         STATE     READ WRITE CKSUM
        zstorage5p2  ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            ada2     ONLINE       0     0     0
            ada3     ONLINE       0     0     0
        cache
          ada4p2     ONLINE       0     0     0

errors: No known data errors
==========================================

The typical lock looks like the following:

    cd ~userdir/ ; ls

At this point, the `ls` command freezes and cannot be interrupted with Ctrl+C. When we open another console, we see that the `ls` process is using 100% CPU.

Also, some disk operations randomly start taking 1 to 2 minutes to complete. For example, we ran `camcontrol` a few times, and at one point it froze. While the server was in the crashed state, we also used `zpool` to remove the SSD cache from the pool and then re-added it; when we then ran `zpool status`, the command froze for a minute.
We managed to collect some data from two different incidents:

Incident 1: http://pastebin.com/EkCeSwY9
Incident 2: http://pastebin.com/5rj9BV68

Since the problem is reproducible, we welcome suggestions for further tests.

Thanks in advance.

Best Regards,
Ivan Dimitrov