Date: Wed, 11 Jun 2014 14:52:28 -0400 (EDT)
From: Benjamin Kaduk
To: "O. Hartmann"
Cc: FreeBSD CURRENT
Subject: Re: Panic String: ffs_alloccg: map corrupted [/dev/gpt/tmp]
In-Reply-To: <20140611195504.3d03de36.ohartman@zedat.fu-berlin.de>
References: <20140611195504.3d03de36.ohartman@zedat.fu-berlin.de>
List-Id: Discussions about the use of FreeBSD-current

It is rather difficult to determine what sort of response you are expecting to this message, as it seems to cover several different (but maybe related) topics, and include some exposition and supposition that do not include clear questions.

On Wed, 11 Jun 2014, O.
Hartmann wrote:

> Running FreeBSD
>
> Version String: FreeBSD 11.0-CURRENT #3 r267294: Mon Jun 9 22:07:15 CEST 2014 amd64
>
> crashes without panic message and /var/crash/info.0 contains this message:
>
> Dump header from device /dev/gpt/swap
> Architecture: amd64
> Architecture Version: 2
> Dump Length: 968962048B (924 MB)
> Blocksize: 512
> Dumptime: Wed Jun 11 19:19:19 2014
> Hostname: thor.sb211.zbv
> Magic: FreeBSD Kernel Dump
> Version String: FreeBSD 11.0-CURRENT #3 r267294: Mon Jun 9 22:07:15 CEST 2014
> root@thor.sb211.zbv:/usr/obj/usr/src/sys/THOR
> Panic String: ffs_alloccg: map corrupted
> Dump Parity: 3034136388
> Bounds: 0
> Dump Status: good
>
> I'm very confused about the panic string, since it seems to tell me something is bad with
> FFS/UFS.

ffs is encountering "bad" data while searching through the free block map. I am not an ffs/ufs expert, but I think this could be the result of corrupt data on-disk [from a previous crash?] that does not get cleaned up by fsck. If that is the case, re-running newfs should clear things up. Since this is /tmp, which, as you note, usually holds just ephemeral files, that is probably one of the first things I would try.

> More disturbing is the fact that the boot process into multi user stops at a complaint
> about unclean /dev/gpt/tmp filesystem (mount to /tmp): The OS stops at the PAsswd: prompt
> for single user-mode maintenance.

If error(s) are encountered during the mounting of filesystems, the OS always drops to single-user mode. There is no special-casing for /tmp or anything else. See the calls to stop_boot() from /etc/rc.d/mountcritlocal, etc.

> I can not understand why the system is stopping complaining about a broken /tmp
> filesystem. I consider especially /tmp infill corrupt after a fault and I'd like to ask
> whether there is a way to overrun this corruption and force a repair and mount, even if
> the data contained in /tmp is after forced cleaning corrupt.
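
For what it's worth, the fsck-then-newfs approach mentioned above could be sketched roughly as follows, to be run from the single-user shell while /tmp is not mounted. This is only a sketch under assumptions: the function name repair_tmp and the RUN and TMPDEV variables are made up for illustration, the device name is the one from your subject line, and newfs destroys everything on the partition.

```shell
#!/bin/sh
# Sketch only, not a tested procedure: repair or recreate a scratch /tmp.
# Assumptions: the filesystem is not mounted, and nothing on it is worth keeping.
# RUN and TMPDEV are hypothetical knobs; RUN defaults to "echo", so the
# commands are only printed. Set RUN to an empty string to execute them.
dev="${TMPDEV:-/dev/gpt/tmp}"
RUN="${RUN-echo}"

repair_tmp() {
    # First try a forced, non-interactive check of the filesystem.
    if $RUN fsck_ffs -fy "$dev"; then
        $RUN mount "$dev" /tmp
    else
        # fsck could not repair it: recreate the filesystem with soft
        # updates enabled (this wipes all data on $dev), then mount it.
        $RUN newfs -U "$dev" && $RUN mount "$dev" /tmp
    fi
}
```

With the default RUN=echo, calling repair_tmp only prints the fsck_ffs and mount commands it would run, which is a cheap way to double-check the device name before doing anything destructive.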
>
> When using tmpfs backed /tmp there shouldn't be any stop/fault of that kind so it would
> be canonical to have it also for a hard-drive backed /tmp, or am I wrong?

I don't think you're obviously correct. You may not be wrong, but this is not how the system is currently expected to behave; there would need to be some discussion if it were to change.

> It is not the first time that I receive this kind of crash under heavy load (box is a
> 8GB system with this CPU specs:
>
> FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final
> 208032) 20140512 CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz (2999.72-MHz
> K8-class CPU) Origin="GenuineIntel" Id=0x10676 Family=0x6 Model=0x17 Stepping=6
> Features=0xbfebfbff
> Features2=0x8e3fd
> AMD Features=0x20100800
> AMD Features2=0x1
> TSC: P-state invariant, performance statistics
> real memory = 8589934592 (8192 MB)
> avail memory = 8278880256 (7895 MB)
> Event timer "LAPIC" quality 400
> ACPI APIC Table:
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> FreeBSD/SMP: 1 package(s) x 2 core(s)
> cpu0 (BSP): APIC ID: 0
> cpu1 (AP): APIC ID: 1
> [...]
>
> The not-so-funny-part is that I have those crashes under heavy load very frequent on ALL
> C2D systems (one E8400 as shown, another has a Q4400 CPU, but also 8 GB RAM, same
> motherboard). In all cases of a sudden crash, /tmp gets corrupted and the system refuses
> to boot into multiuser mode complaining about the broken /tmp filesystem which can not be
> repaired automatically.
>
> Apart from this specific question about an unclean /tmp, this kind of crash under heavy
> load on a specific hardware architecture with most recent CURRENT is puzzling (and
> occurred within the past 8 weeks several times with the same stupid blocking at the
> broken /tmp partition).
> I also checked the hardware with tools like memtest86 to ensure
> having no faulty memory, but I can not exclude some kind of overheating the CPU since I
> realized with CLANG and -O3 (which is supposed to optimise for vector units if available,
> if I'm right) this increases the average CPU temperature by ~ 3 - 5 degree Celsius. This
> is more obvious on a Dell Latitude E6510 with a first-generation Sandy Bridge mobile CPU
> and FreeBSD 9.2/9.3: compiling the OS with gcc 4.2 (base compiler in that system), the
> temperature is 2 - 4 degrees lower than using CLANG 3.4.1 with -O3 enabled (reading the
> ACPI reported temperature via "sysctl -a|grep tempe"). This is funny, isn't it?

I don't feel like there is anything I can say in reply to this bit.

-Ben