Date: Mon, 17 Sep 2007 16:18:42 -0700 (PDT)
From: Barnabas <barnabasdk@gmail.com>
To: freebsd-current@freebsd.org
Subject: Re: Testers wanted: Gvinum patches of SoC 2007 work
Message-ID: <12747042.post@talk.nabble.com>
In-Reply-To: <20070813204035.GA5338@stud.ntnu.no>
References: <20070813204035.GA5338@stud.ntnu.no>

Hi Ulf

I also use gvinum in a pretty complex setup, and would be happy to help
with testing, bug finding and whatever else is needed. I work as a
developer - not exactly with kernel hacking - but I am not completely at a
loss.

I have had a lot of issues with the semi-finished version of gvinum that is
currently available through the FreeBSD distro. I have had a lot of DMA
read/write timeout issues lately, especially under heavy load, but only on
my ATA drives - the SCSI drives run perfectly. At first I thought it was a
dead disk, but I have been seeing the issue on my new raid5 setup as well.
The same issue on two disks at the same time is not very likely.

If you have any info on how to patch the current vinum driver with the new
changes you made, that would be great. I really appreciate that someone is
looking at this excellent software. It has not really worked 100% since the
port to GEOM was started.

Here's my config:

FreeBSD sauron.barnabas.dk 6.2-RELEASE-p6 FreeBSD 6.2-RELEASE-p6 #0: Wed Jul 18 23:33:58 CEST 2007 root@sauron.barnabas.dk:/usr/src/sys/i386/compile/KERNEL_6_2 i386

7 drives:
D elben                 State: up       /dev/da1s1h     A: 0/7825 MB (0%)
D donau                 State: up       /dev/da0s1h     A: 0/7825 MB (0%)
D raid5_4               State: up       /dev/ad11a      A: 6/194480 MB (0%)
D raid5_3               State: up       /dev/ad10a      A: 6/194480 MB (0%)
D raid5_2               State: up       /dev/ad9a       A: 6/194480 MB (0%)
D raid5_1               State: up       /dev/ad8a       A: 6/194480 MB (0%)
D spree                 State: up       /dev/ad4a       A: 3/114473 MB (0%)

6 volumes:
V raid5                 State: up       Plexes:       1 Size:        569 GB
V data01                State: up       Plexes:       1 Size:        111 GB
V usr                   State: up       Plexes:       2 Size:       5625 MB
V home                  State: up       Plexes:       2 Size:       1000 MB
V tmp                   State: up       Plexes:       2 Size:        600 MB
V var                   State: up       Plexes:       2 Size:        600 MB

10 plexes:
P raid5.p0           R5 State: degraded Subdisks:     4 Size:        569 GB
P data01.p0           C State: up       Subdisks:     1 Size:        111 GB
P usr.p1              C State: up       Subdisks:     1 Size:       5625 MB
P home.p1             C State: up       Subdisks:     1 Size:       1000 MB
P tmp.p1              C State: up       Subdisks:     1 Size:        600 MB
P var.p1              C State: up       Subdisks:     1 Size:        600 MB
P usr.p0              C State: up       Subdisks:     1 Size:       5625 MB
P home.p0             C State: up       Subdisks:     1 Size:       1000 MB
P tmp.p0              C State: up       Subdisks:     1 Size:        600 MB
P var.p0              C State: up       Subdisks:     1 Size:        600 MB

13 subdisks:
S raid5.p0.s3           State: stale    D: raid5_4      Size:        189 GB
S raid5.p0.s2           State: up       D: raid5_3      Size:        189 GB
S raid5.p0.s1           State: up       D: raid5_2      Size:        189 GB
S raid5.p0.s0           State: up       D: raid5_1      Size:        189 GB
S data01.p0.s0          State: up       D: spree        Size:        111 GB
S usr.p1.s0             State: up       D: elben        Size:       5625 MB
S home.p1.s0            State: up       D: elben        Size:       1000 MB
S tmp.p1.s0             State: up       D: elben        Size:        600 MB
S var.p1.s0             State: up       D: elben        Size:        600 MB
S usr.p0.s0             State: up       D: donau        Size:       5625 MB
S home.p0.s0            State: up       D: donau        Size:       1000 MB
S tmp.p0.s0             State: up       D: donau        Size:        600 MB
S var.p0.s0             State: up       D: donau        Size:        600 MB

# Vinum configuration of sauron.barnabas.dk, saved at Tue Sep 18 01:11:49 2007
# Current configuration:
# drive elben device /dev/da1s1h
# drive donau device /dev/da0s1h
# drive raid5_4 device /dev/ad11a
# drive raid5_3 device /dev/ad10a
# drive raid5_2 device /dev/ad9a
# drive raid5_1 device /dev/ad8a
# drive spree device /dev/ad4a
# volume raid5
# volume data01
# volume usr
# volume home
# volume tmp
# volume var
# plex name raid5.p0 org raid5 2048s vol raid5
# plex name data01.p0 org concat vol data01
# plex name usr.p1 org concat vol usr
# plex name home.p1 org concat vol home
# plex name tmp.p1 org concat vol tmp
# plex name var.p1 org concat vol var
# plex name usr.p0 org concat vol usr
# plex name home.p0 org concat vol home
# plex name tmp.p0 org concat vol tmp
# plex name var.p0 org concat vol var
# sd name raid5.p0.s3 drive raid5_4 len 398282752s driveoffset 265s plex raid5.p0 plexoffset 6144s
# sd name raid5.p0.s2 drive raid5_3 len 398282752s driveoffset 265s plex raid5.p0 plexoffset 4096s
# sd name raid5.p0.s1 drive raid5_2 len 398282752s driveoffset 265s plex raid5.p0 plexoffset 2048s
# sd name raid5.p0.s0 drive raid5_1 len 398282752s driveoffset 265s plex raid5.p0 plexoffset 0s
# sd name data01.p0.s0 drive spree len 234434560s driveoffset 265s plex data01.p0 plexoffset 0s
# sd name usr.p1.s0 drive elben len 11521427s driveoffset 4505865s plex usr.p1 plexoffset 0s
# sd name home.p1.s0 drive elben len 2048000s driveoffset 2457865s plex home.p1 plexoffset 0s
# sd name tmp.p1.s0 drive elben len 1228800s driveoffset 1229065s plex tmp.p1 plexoffset 0s
# sd name var.p1.s0 drive elben len 1228800s driveoffset 265s plex var.p1 plexoffset 0s
# sd name usr.p0.s0 drive donau len 11521427s driveoffset 4505865s plex usr.p0 plexoffset 0s
# sd name home.p0.s0 drive donau len 2048000s driveoffset 2457865s plex home.p0 plexoffset 0s
# sd name tmp.p0.s0 drive donau len 1228800s driveoffset 1229065s plex tmp.p0 plexoffset 0s
# sd name var.p0.s0 drive donau len 1228800s driveoffset 265s plex var.p0 plexoffset 0s

As you can tell, the raid setup is not well. Here are some of the errors I
have experienced:

Sep 17 16:54:03 sauron kernel: subdisk10: detached
Sep 17 16:54:03 sauron kernel: ad10: detached
Sep 17 16:54:03 sauron kernel: ad11: FAILURE - device detached
Sep 17 16:54:03 sauron kernel: subdisk11: detached
Sep 17 16:54:03 sauron kernel: ad11: detached
Sep 17 16:54:03 sauron kernel: GEOM_VINUM: subdisk raid5.p0.s3 state change: up -> down
Sep 17 16:54:03 sauron kernel: GEOM_VINUM: plex raid5.p0 state change: up -> degraded
Sep 17 16:54:03 sauron kernel: GEOM_VINUM: subdisk raid5.p0.s2 state change: up -> down
Sep 17 16:54:03 sauron kernel: GEOM_VINUM: plex raid5.p0 state change: degraded -> down
Sep 17 16:54:03 sauron kernel: GEOM_VINUM: lost drive 'raid5_3'
Sep 17 23:20:58 sauron sshd[15009]: refused connect from host11-69-static.30-87-b.business.telecomitalia.it (87.30.69.11)
Sep 17 23:22:12 sauron sshd[15039]: refused connect from host11-69-static.30-87-b.business.telecomitalia.it (87.30.69.11)
Sep 18 00:00:42 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=536025710592, length=16384)]error = 6
Sep 18 00:00:42 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=536025726976, length=49152)]error = 6
Sep 18 00:00:42 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=536025776128, length=131072)]error = 6
Sep 18 00:00:44 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=535918395392, length=49152)]error = 6
Sep 18 00:00:53 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=536025710592, length=16384)]error = 6
Sep 18 00:00:53 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=536025726976, length=49152)]error = 6
Sep 18 00:00:53 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=536025776128, length=131072)]error = 6
Sep 18 00:00:53 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=535918395392, length=49152)]error = 6
Sep 18 00:01:00 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=528017391616, length=32768)]error = 6
Sep 18 00:01:00 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=528108421120, length=49152)]error = 6
Sep 18 00:01:10 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343853568, length=16384)]error = 6
Sep 18 00:01:10 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343869952, length=49152)]error = 6
Sep 18 00:01:10 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343919104, length=131072)]error = 6
Sep 18 00:01:11 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343853568, length=16384)]error = 6
Sep 18 00:01:11 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343869952, length=49152)]error = 6
Sep 18 00:01:11 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343919104, length=131072)]error = 6
Sep 18 00:01:11 sauron kernel: g_vfs_done():gvinum/raid5[WRITE(offset=511212584960, length=16384)]error = 6
Sep 18 00:01:11 sauron kernel: g_vfs_done():gvinum/raid5[WRITE(offset=527976808448, length=16384)]error = 6
Sep 18 00:01:11 sauron kernel: g_vfs_done():gvinum/raid5[WRITE(offset=535877189632, length=16384)]error = 6
Sep 18 00:01:13 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343853568, length=16384)]error = 6
Sep 18 00:01:13 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343869952, length=49152)]error = 6
Sep 18 00:01:13 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343919104, length=131072)]error = 6
Sep 18 00:01:15 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343853568, length=16384)]error = 6
Sep 18 00:01:15 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343869952, length=49152)]error = 6
Sep 18 00:01:15 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343919104, length=131072)]error = 6

I have seen exactly the same on the data01 disk, which is standalone.

Hope I am able to help.

Nikolaj Hansen


Ulf Lilleengen-6 wrote:
>
> Hi,
>
> It's here! The new and hopefully better gvinum patch. This is perhaps my final
> patch of the work I've done during GSoC 2007 (the patch will be updated when I
> fix a bug). This doesn't mean I'll stop work on gvinum, but rather that I'm not
> adding more features until this gets into the tree. But, for this to get into
> the tree, I need people to test it. _ALL_ reports on how it works is good.
>
> So, what should you test?
>
> * Plain normal use.
>
> * Mirror synchronization, rebuild if raid-5 arrays, growing of raid-5 arrays
>   etc. These should work, and probably is the most tested, but some weird
>   combinations that I have not forseen might show itself.
>
> * Try weird combinations to check if it crashes.
>
> * Test mirror, concat, stripe and raid5 commands.
>
> * If there are any issues with the usability aspect. E.g. if the information
>   gvinum gives you is good enough for you to understand what it's doing, if one
>   way to do things seems unnatural to you etc. I'd like to hear all of this, no
>   matter how bikshedish it might sound, it might be something that have been
>   overlooked. These things are hard to test for the people that have been
>   developing it, since we know how it "should" be used.
>
> Before you head on, beware that the new gvinum does not give messages back to
> the userland gvinum (so you won't get them into your terminal). This is because
> it's not very simple to do with the new event system.
> !! This means you'll have to look after messages in /var/log/messages !!
>
> And thanks to people for comments and help that I've been getting during the
> summer.
>
> --
> Ulf Lilleengen
>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>
>

--
View this message in context: http://www.nabble.com/Testers-wanted%3A-Gvinum-patches-of-SoC-2007-work-tf4263568.html#a12747042
Sent from the freebsd-current mailing list archive at Nabble.com.