Date: Sun, 07 Dec 2014 12:05:17 -0500
From: Paul Pathiakis <pathiaki2@yahoo.com>
To: freebsd-questions@freebsd.org
Subject: Re: Probably Hardware Trouble But What Is It?
Message-ID: <548488CD.50207@yahoo.com>
In-Reply-To: <5483A639.2050704@mykitchentable.net>
References: <5483A639.2050704@mykitchentable.net>
Drew,

Just trying to assist....

From the look of it, something is definitely failing, and it is either the controller or the disk. FreeBSD is trying to stay alive. (I've had something similar happen in the past. When I rebooted, a disk showed up as faulted and inaccessible.)

I'd theorize that the first line, about kern.maxfiles being exceeded by root (assuming you haven't changed the setting), is a side effect of the failure: file handles keep being allocated for requests that can never complete.

If you have access to the console and another drive, you may want to connect the second drive, configure it to mirror the first, and hope that the mirror can complete. If it works, great. BTW, don't forget to install bootblocks if this is your boot drive.

Now, if it doesn't start to mirror the drive after being attached, you're going to have to reboot. That's probably going to show you the real failure. :-(

If the controller is onboard, there's not much you can do. If it's a PCIe card, try to re-seat it. Sometimes things get pulled on or hit inadvertently and aren't sitting in the slot correctly any more. I agree with the other post about replacing the connecting cables and/or re-seating them.

If, after all this, it doesn't work, it's probably the disk itself.

Now comes the patient part. If it's the drive, it's probably pretty hot from failing and trying to do its job. Don't laugh at this; it's worked for me 5 out of 7 times. Remove it from the machine and let it cool to room temperature on an anti-static bag. Once cool, put it in the bag and put it in your freezer for at least three hours. Re-insert it into the machine. (At this point, you should have that other drive for the mirror connected.) If the drive isn't a catastrophic loss, it will work for a short time. I recommend you allow it to mirror. Ask the drive to do NOTHING but sit and mirror while in single-user mode.

However, before going to that last 'iffy' part, check everything before that.

P.
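For reference, the mirroring step suggested above could look roughly like the sketch below. It only prints the commands and runs nothing. The device name ada1 for the second drive is an assumption, as is the idea that zback is a single-disk pool on ada0 (which is what the quoted 'zpool status' output suggests). The function name plan_mirror is made up for illustration.

```shell
#!/bin/sh
# Sketch only: prints the commands for turning a single-disk pool into a
# two-way mirror. Nothing is executed against the pool.
# Assumptions (not from the thread): the new disk shows up as ada1, and
# zback consists of the single device ada0.

plan_mirror() {
    pool=$1
    old=$2
    new=$3
    # Confirm the kernel sees both disks before touching the pool.
    echo "camcontrol devlist"
    # Attach the new disk to the existing one; ZFS then starts a resilver.
    echo "zpool attach $pool $old $new"
    # Check resilver progress; the mirror only helps once it completes.
    echo "zpool status $pool"
}

plan_mirror zback ada0 ada1
```

If the failing disk were a boot drive with a GPT layout, the bootblocks step would typically be something like 'gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1', where the partition index 1 is again an assumption that depends on how the disk was partitioned. Note that while the pool is hung, zpool commands may block just as 'zpool clear' did.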
On 12/06/2014 19:58, Drew Tomlinson wrote:
> I'm running FBS 9.1 RELEASE that I built several years ago. It's
> mostly a Samba server and has "just worked" so I've never done much
> more with it. However recently, I find it "locked up" with thousands
> of these messages on the console:
>
> kernel: kern.maxfiles limit exceeded by uid 0, please see tuning(7)
>
> I've looked in /var/log/messages and also see lots of messages like
> these:
>
> Dec 6 13:55:53 vm kernel: siisch0: ... waiting for slots 18000000
> Dec 6 13:55:53 vm kernel: siisch0: Timeout on slot 28
> Dec 6 13:55:53 vm kernel: siisch0: siis_timeout is 00040000 ss
> 78000000 rs 78000000 es 00000000 sts 801b0000 serr 00000000
> Dec 6 13:55:53 vm kernel: siisch0: ... waiting for slots 08000000
> Dec 6 13:55:55 vm kernel: siisch0: Timeout on slot 27
> Dec 6 13:55:55 vm kernel: siisch0: siis_timeout is 00040000 ss
> 78000000 rs 78000000 es 00000000 sts 801b0000 serr 00000000
> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): FLUSHCACHE48. ACB: ea
> 00 00 00 00 40 00 00 00 00 00 00
> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): CAM status: Command
> timeout
> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): Retrying command
> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): READ_FPDMA_QUEUED.
> ACB: 60 01 fe d8 74 40 39 00 00 00 00 00
> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): CAM status: Command
> timeout
> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): Retrying command
> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): READ_FPDMA_QUEUED.
> ACB: 60 0a a5 7f 00 40 4c 00 00 00 00 00
>
> This machine uses zfs. I have two pools:
>
> # zpool list
> NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> zback  1.81T   848G  1008G    45%  1.00x  ONLINE  -
> zroot  1.81T  1.16T   666G    64%  1.00x  ONLINE  -
>
> Then I tried this and my ssh window is now stuck:
>
> # zpool status
>   pool: zback
>  state: ONLINE
> status: One or more devices are faulted in response to IO failures.
> action: Make sure the affected devices are connected, then run 'zpool
> clear'.
>    see: http://illumos.org/msg/ZFS-8000-HC
>   scan: none requested
> config:
>
>     NAME        STATE     READ WRITE CKSUM
>     zback       ONLINE       3     0     0
>       ada0      ONLINE       4     0     0
>
> I opened another ssh window and tried 'zpool clear zback' as suggested
> but it appears stuck too.
>
> I'm sure I haven't provided all the relevant information so please ask
> and I will do so. I'd appreciate any guidance on how to take a proper
> backup of ada0 and what I should do next. I think this zback pool is
> just the one disk which is a 2TB drive. I'd like to know how to
> confirm that if possible since it seems the zpool commands aren't able
> to complete.
>
> I appreciate any suggestions or guidance.
>
> Thanks,
>
> Drew
>