Date: Tue, 31 May 2011 18:08:59 +0300
From: Daniel Kalchev <daniel@digsys.bg>
To: Mikolaj Golub <trociny@freebsd.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: HAST instability
Message-ID: <4DE5048B.3080206@digsys.bg>
In-Reply-To: <86zkm3t11g.fsf@in138.ua3>
References: <4DE21C64.8060107@digsys.bg> <4DE3ACF8.4070809@digsys.bg> <86d3j02fox.fsf@kopusha.home.net> <4DE4E43B.7030302@digsys.bg> <86zkm3t11g.fsf@in138.ua3>

On 31.05.11 17:08, Mikolaj Golub wrote:
> As I wrote privately, it would be nice to see both netstat and hast logs
> (from both nodes) for the same rather long period, when several cases
> occurred. It would be good to place them somewhere on web so other guys
> could access them too, as I will be offline for 7-10 days and will not be
> able to help you until I am back.

The test has finished after running for almost three hours; here is the
collected data:

(for the duration of the test, on the secondary node)

systat -if

                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average

      Interface           Traffic               Peak                Total
            lo0  in      0.000 KB/s          0.000 KB/s            1.126 KB
                 out     0.000 KB/s          0.000 KB/s            1.126 KB

            ix1  in      0.003 KB/s        230.590 MB/s          614.688 GB
                 out     0.054 KB/s          7.425 MB/s           19.910 GB

           igb0  in      0.025 KB/s          3.636 KB/s          566.897 KB
                 out     0.072 KB/s          4.296 KB/s            1.091 MB

The primary node is b1a, the secondary node is b1b.

kernel (built just after csup update):

FreeBSD b1a 8.2-STABLE FreeBSD 8.2-STABLE #1: Mon May 30 14:17:50 EEST 2011
    root@b1a:/usr/obj/usr/src/sys/GENERIC  amd64

from primary

messages:    http://news.digsys.bg/~admin/hast/test31may/b1a-messages
netstat -in: http://news.digsys.bg/~admin/hast/test31may/b1a-netstat -in
netstat-s:   http://news.digsys.bg/~admin/hast/test31may/b1a-netstat-s

from secondary

messages:    http://news.digsys.bg/~admin/hast/test31may/b1b-messages
netstat -in: http://news.digsys.bg/~admin/hast/test31may/b1b-netstat -in
netstat-s:   http://news.digsys.bg/~admin/hast/test31may/b1b-netstat-s

> DK> One additional note: while playing with this setup, I tried to
> DK> simulate local disk going away in the hope HAST will switch to using
> DK> the remote disk. Instead of asking someone at the site to pull out the
> DK> drive, I just issued on the primary
>
> DK> hastctl role init data0
>
> DK> which resulted in kernel panic. Unfortunately, there was no sufficient
> DK> dump space for 48GB. I will re-run this again with more drives for the
> DK> crash dump. Anything you want me to look for in particular? (kernels
> DK> have no KDB compiled in yet)
>
> Well, removing physical disk (device /dev/gpt/data0 consumed by hastd
> disappears) and switching a resource to init role (device /dev/hast/data0
> consumed by FS disappears) are two different things. Sure you should not
> normally change the resource role (destroy hast device) before unmounting
> (exporting) FS.

Then how do I proceed with a failed drive? Or with a flaky drive that is
still visible to the OS, which I want to remove from HAST and replace with a
different one? How do I ask HAST to switch I/O to the secondary?

Is there another way to get a drive out of HAST?

In any case, even if this is not an allowed operation, it should not panic.

I am now going to reboot and run the same tests without checksums.

Daniel