Date: Tue, 15 Jul 2008 22:29:28 -0400
From: Steve Bertrand <steve@ibctech.ca>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: taskqueue timeout
Message-ID: <487D5D08.9070102@ibctech.ca>
In-Reply-To: <200807151955.m6FJtf77008969@apollo.backplane.com>
References: <487CCD46.8080506@ibctech.ca> <200807151711.m6FHBgVO007481@apollo.backplane.com> <487CF077.2040201@ibctech.ca> <487CFA08.5000308@ibctech.ca> <200807151955.m6FJtf77008969@apollo.backplane.com>
Matthew Dillon wrote:
> :Went from 10->15, and it took quite a bit longer into the backup before
> :the problem cropped back up.

Jumping right into it: there is another post after this one, but I'm
going to try to reply inline.

>     Try 30 or longer.  See if you can make the problem go away entirely.
>     Then fall back to 5 and see if the problem resumes at its earlier
>     pace.

I'm sure 30 will either push the issue out further or make it disappear
entirely, but are there any developers here who can say what this timer
does? i.e., how does changing this timer affect the performance of the
disk subsystem (aside from allowing it to work, of course)?

After I'm done responding to this message, I'll be setting the sysctl
to 30.

>     It could be temperature related.  The drives are being exercised
>     a lot, they could very well be overheating.  To find out add more
>     airflow (a big house fan would do the trick).

Temperature is a good thought, but my current physical setup is this:

- 2U chassis
- multiple fans in the case
- located in my lab (which is essentially beside my desk)
- the case has no lid
- the room is 64 degrees (F), with A/C and circulating fans in this area
- the hard drives are separated reasonably well inside the case

>     It could be that errors are accumulating on the drives, but it seems
>     unlikely that four drives would exhibit the same problem.

That's what I'm thinking. All four drives are exhibiting the same
errors... or, for all intents and purposes, the machine is coughing up
the same errors for all the drives.

>     Also make sure the power supply can handle four drives.  Most power
>     supplies that come with consumer boxes can't under full load if you
>     also have a mid or high-end graphics card installed.  Power supplies
>     that come with OEM slap-together enclosures are not usually much better.

I currently have a 550W PSU in the 2U chassis, which, again, is sitting
open. I have other hardware running in worse conditions on lower-wattage
PSUs that doesn't exhibit this behavior. I need to determine whether this
problem is SATA, ZFS, the motherboard, or code.

>     Specifically, look at the +5V and +12V amperage maximums on the power
>     supply, then check the disk labels to see what they draw, then
>     multiply by 2.  e.g. if your power supply can do 30A@12V and you have
>     four drives each taking 2A@12V (and typically ~half that at 5V), that's
>     4x2x2 = 16A@12V and you would probably be ok.

I'm well within specs, even after V/A tests with the meter; the power
supply is providing ample wattage to each device. (A rough sketch of that
power-budget arithmetic follows below, after this message.)

>     To test, remove two of the four drives, reformat the ZFS to use just 2,
>     and see if the problem recurs with just two drives.

... I knew that was going to come up... my response is "I worked so hard
to get this system with ZFS configured *exactly* how I wanted it".

To test, I'm going to flip the sysctl to 30 as per Matthew's
recommendation and see how far that takes me. At the moment I'm only
testing by backing up one machine on the network. If it fails, I'll clock
the time, and then 'reformat' with two drives.

Is there a technical reason this may work better with only two drives?

Is anyone interested to the point where remote login would be helpful?

Steve
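[Editor's sketch of the 12V power-budget rule of thumb quoted above: drive count, times per-drive 12V draw from the label, times a 2x spin-up margin, compared against the PSU's 12V rail rating. The figures used are the illustrative numbers from Matthew's example (2A@12V per drive, 30A rail), not measurements from the hardware in this thread.]

    # Rough 12V power-budget check, per the rule of thumb quoted above:
    # multiply each drive's labelled 12V draw by 2 (spin-up margin) and
    # compare the total against the PSU's 12V rail rating.
    # Numbers below are the illustrative values from the quoted example,
    # not readings from the actual drives or PSU discussed in the thread.

    def twelve_volt_budget(num_drives: int, amps_per_drive_12v: float,
                           psu_12v_rail_amps: float, margin: float = 2.0) -> None:
        required = num_drives * amps_per_drive_12v * margin
        print(f"Estimated worst-case draw: {required:.1f} A @ 12V "
              f"(rail provides {psu_12v_rail_amps:.1f} A)")
        if required <= psu_12v_rail_amps:
            print("Probably OK on the 12V rail.")
        else:
            print("12V rail looks marginal; suspect the PSU.")

    # Example from the quoted text: four drives at 2A@12V against a 30A rail
    # -> 4 x 2 x 2 = 16A, comfortably under 30A.
    twelve_volt_budget(num_drives=4, amps_per_drive_12v=2.0, psu_12v_rail_amps=30.0)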