Date: Sun, 07 Dec 2014 15:08:00 -0800
From: Drew Tomlinson <drew@mykitchentable.net>
To: Paul Pathiakis <pathiaki2@yahoo.com>, freebsd-questions@freebsd.org
Subject: Re: Probably Hardware Trouble But What Is It?
Message-ID: <5484DDD0.2090005@mykitchentable.net>
In-Reply-To: <548488CD.50207@yahoo.com>
References: <5483A639.2050704@mykitchentable.net> <548488CD.50207@yahoo.com>

On 12/7/2014 9:05 AM, Paul Pathiakis via freebsd-questions wrote:
> Drew,
>
> Just trying to assist....
>
> From the look of it, something is definitely failing and it is either
> the controller or the disk. FreeBSD is trying to stay alive. (I've
> had something similar happen in the past. When I rebooted, a disk
> showed to be faulted and inaccessible.)
>
> I'd theorize that the first line about the kernel maxfiles being
> exceeded by root (assuming you haven't changed the setting) is due to
> the failure trying to allocate file handles to handle the requests
> that can't be completed due to the failure.
>
> If you have access to the console and another drive, you may want to
> connect a second drive, configure it to mirror the first and hope that
> it can mirror the first. If it works, great. BTW, don't forget to
> install bootblocks if this is your boot drive.
>
> Now, if it doesn't start to mirror the drive after being attached,
> you're going to have to reboot. That's probably going to show you the
> real failure. :-(
>
> If the controller card is onboard, not much you can do. If it's a
> PCIe bus card, try to re-seat it. Sometimes things get pulled on, or
> hit inadvertently and aren't sitting in the slot correctly any more.
>
> I agree with the other post in either replacing the connecting cables
> and/or re-seating them.
>
> If, after all this, it doesn't work, it's probably the disk itself.
>
> Now comes the patient part. If it's the drive, it's probably pretty
> hot from failing and trying to do its job. Don't laugh at this; it's
> worked for me 5 out of 7 times. Remove it from the machine, let it
> cool to room temperature on an anti-static bag. Once cool, put it in
> the bag, put it in your freezer for at least three hours. Re-insert
> into the machine. (At this point, you should have that other drive for
> the mirror connected.) If the drive isn't a catastrophic loss, it will
> work for a short time. I recommend you allow it to mirror. Ask the
> drive to do NOTHING but let it sit and mirror while in single-user mode.
>
> However, before going to that last 'iffy' part, check everything
> before that.

Thank you for your suggestions. Funny you mention the freezer trick. I
was just telling a co-worker about that as he's having trouble with a
drive.

My problem was that because of the failing drive, I couldn't verify
which drive was causing the problem. Every time I'd try to issue a zpool
or zfs command, it would just hang.

I actually have 4 drives internally in the box and they are all together
in a raidz1 pool and this pool contains my full FBSD system. Then I have
another drive in an external SATA dock which I've put in its own pool
and mounted just to use for backups. I disconnected this drive and
rebooted. Now I can access my system and have been able to verify that
this is the failing drive.

So I am lucky. All I have lost are backups. And thus all I need to do is
replace this drive and then resume my backups.

Thanks for your suggestions!

Cheers,

Drew

-- 
Like card tricks?

Visit The Alchemist's Warehouse to
learn card magic secrets for free!

http://alchemistswarehouse.com

>
>
> On 12/06/2014 19:58, Drew Tomlinson wrote:
>> I'm running FreeBSD 9.1 RELEASE that I built several years ago. It's
>> mostly a Samba server and has "just worked" so I've never done much
>> more with it. However recently, I find it "locked up" with thousands
>> of these messages on the console:
>>
>> kernel: kern.maxfiles limit exceeded by uid 0, please see tuning(7)
>>
>> I've looked in /var/log/messages and also see lots of messages like
>> these:
>>
>> Dec 6 13:55:53 vm kernel: siisch0: ... waiting for slots 18000000
>> Dec 6 13:55:53 vm kernel: siisch0: Timeout on slot 28
>> Dec 6 13:55:53 vm kernel: siisch0: siis_timeout is 00040000 ss 78000000 rs 78000000 es 00000000 sts 801b0000 serr 00000000
>> Dec 6 13:55:53 vm kernel: siisch0: ... waiting for slots 08000000
>> Dec 6 13:55:55 vm kernel: siisch0: Timeout on slot 27
>> Dec 6 13:55:55 vm kernel: siisch0: siis_timeout is 00040000 ss 78000000 rs 78000000 es 00000000 sts 801b0000 serr 00000000
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): CAM status: Command timeout
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): Retrying command
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 01 fe d8 74 40 39 00 00 00 00 00
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): CAM status: Command timeout
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): Retrying command
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 0a a5 7f 00 40 4c 00 00 00 00 00
>>
>> This machine uses zfs. I have two pools:
>>
>> # zpool list
>> NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
>> zback  1.81T   848G  1008G  45%  1.00x  ONLINE  -
>> zroot  1.81T  1.16T   666G  64%  1.00x  ONLINE  -
>>
>> Then I tried this and my ssh window is now stuck:
>>
>> # zpool status
>>   pool: zback
>>  state: ONLINE
>> status: One or more devices are faulted in response to IO failures.
>> action: Make sure the affected devices are connected, then run 'zpool clear'.
>>    see: http://illumos.org/msg/ZFS-8000-HC
>>   scan: none requested
>> config:
>>
>>         NAME      STATE   READ WRITE CKSUM
>>         zback     ONLINE     3     0     0
>>           ada0    ONLINE     4     0     0
>>
>> I opened another ssh window and tried 'zpool clear zback' as
>> suggested but it appears stuck too.
>>
>> I'm sure I haven't provided all the relevant information so please
>> ask and I will do so. I'd appreciate any guidance on how to take a
>> proper backup of ada0 and what I should do next. I think this zback
>> pool is just the one disk which is a 2TB drive. I'd like to know how
>> to confirm that if possible since it seems the zpool commands aren't
>> able to complete.
>>
>> I appreciate any suggestions or guidance.
>>
>> Thanks,
>>
>> Drew
>>
>
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to
> "freebsd-questions-unsubscribe@freebsd.org"
>
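
The kernel log excerpt quoted above names both the device (ada0) and the
controller channel (siisch0) in every CAM timeout line. As a rough
illustration, not something posted in the thread, the Python sketch below
tallies such lines per device, which may help point at the failing disk
when zpool commands hang. The function name count_cam_timeouts, the
regular expression, and the embedded SAMPLE lines (copied from the log
above) are assumptions made for this example only.

```python
import re
from collections import Counter

# Assumed pattern for CAM peripheral messages such as
# "(ada0:siisch0:0:0:0): CAM status: Command timeout".
# Group 1 is the device (ada0), group 2 the controller channel (siisch0).
CAM_TIMEOUT = re.compile(
    r"\((\w+):(\w+):\d+:\d+:\d+\): CAM status: Command timeout"
)


def count_cam_timeouts(lines):
    """Count 'Command timeout' messages per (device, channel) pair."""
    counts = Counter()
    for line in lines:
        match = CAM_TIMEOUT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# A few lines taken from the log excerpt in the quoted message.
SAMPLE = [
    "Dec 6 13:55:53 vm kernel: siisch0: Timeout on slot 28",
    "Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): FLUSHCACHE48. "
    "ACB: ea 00 00 00 00 40 00 00 00 00 00 00",
    "Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): CAM status: Command timeout",
    "Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): Retrying command",
    "Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): CAM status: Command timeout",
]

if __name__ == "__main__":
    for (device, channel), n in count_cam_timeouts(SAMPLE).most_common():
        print(f"{device} on {channel}: {n} command timeouts")
```

To run it against a real log instead of SAMPLE, one could pass an open
file, for example count_cam_timeouts(open("/var/log/messages")). Reading
the log is plain file I/O and does not touch ZFS, so it should still work
while zpool commands are stuck, provided the log itself lives on a
healthy pool.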