From: Arnaud Lacombe
To: FreeBSD Stable
Date: Mon, 5 Mar 2012 17:12:40 -0500
Subject: Heavy fs corruption with 9.0-RELEASE

Hi,

I've been running a couple of systems with 9.0-RELEASE since it came out. All the systems were installed through the standard installation procedure. After an unclean reboot (either a crash or a power failure), I get a huge amount of really bad filesystem corruption (read: "silent", fs-wide corruption). This happens with both the i386 and the amd64 builds. The systems involved use CompactFlash as their permanent system storage medium.

Typical symptoms are:

[during rc startup]

Starting sshd.
/usr/lib/libkrb5.so.10: invalid file format
/etc/rc: WARNING: failed to start sshd
/usr/libexec/sendmail/sendmail: Undefined symbol "SSL_library_init"
/usr/libexec/sendmail/sendmail: Undefined symbol "SSL_library_init"
Starting cron.

[after startup: dropped to single-user mode, remounted / read-only, ran `fsck -y /' and went back to multi-user mode]

Starting sshd.
Segmentation fault
Mar 5 18:07:38 test kernel: Failed to write core file for process sshd (error 14)
/etc/rc: WARNING: failed to start sshd
Segmentation fault
Segmentation fault
Starting cron.
/usr/lib/libgnuregex.so.5: invalid file format
/usr/lib/libgnuregex.so.5: invalid file format
Starting background file system checks in 60 seconds.

Well, something looks broken, let's investigate...
# file /usr/include/* | tail -20
/usr/include/ulog.h: broken symbolic link to `liblzma.so.5'
/usr/include/unctrl.h: ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), dynamically linked, stripped
/usr/include/unistd.h: ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), dynamically linked, stripped
/usr/include/usb.h: current ar archive
/usr/include/usbhid.h: current ar archive
/usr/include/utempter.h: current ar archive
/usr/include/utime.h: broken symbolic link to `librt.so.1'
/usr/include/utmpx.h: current ar archive
/usr/include/uuid.h: current ar archive
/usr/include/varargs.h: ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), dynamically linked, stripped
/usr/include/vgl.h: ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), dynamically linked, stripped
/usr/include/vis.h: broken symbolic link to `librtld_db.so.2'
/usr/include/vm: directory
/usr/include/wchar.h: current ar archive
/usr/include/wctype.h: current ar archive
/usr/include/wordexp.h: ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), dynamically linked, stripped
/usr/include/x86: directory
/usr/include/ypclnt.h: ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), dynamically linked, stripped
/usr/include/zconf.h: current ar archive
/usr/include/zlib.h: current ar archive

"Since when does /usr/include contain mostly binary files?"

# ssh
Undefined symbol "ssh_compat20" referenced from COPY relocation in /usr/bin/ssh
# file /usr/lib/libssh.so.5
/usr/lib/libssh.so.5: symbolic link to `libopie.so.7'

"What?"
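A minimal sketch, not part of the original report, of how the header damage shown by `file /usr/include/*' above could be scanned for automatically instead of reading the listing by eye. `scan_headers' is a hypothetical shell function: it flags `*.h' entries that begin with ELF magic ("\177ELF") or ar(1) archive magic ("!<arch>\n"), or that are symlinks to a shared library. It assumes od(1), tr(1) and readlink(1) are available, which is the case on FreeBSD and typical Linux systems.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not from the report or from FreeBSD):
# flag *.h entries that look like library data instead of C headers.

# scan_headers DIR: print "PATH: REASON" for each suspicious header.
scan_headers() {
    for f in "$1"/*.h; do
        if [ -L "$f" ]; then
            # A header should never be a symlink to a shared library.
            target=$(readlink "$f")
            case "$target" in
                *.so|*.so.*) echo "$f: symlink to $target" ;;
            esac
        elif [ -f "$f" ]; then
            # First 8 bytes as one hex string, e.g. "7f454c4602010109".
            magic=$(od -An -tx1 -N8 "$f" | tr -d ' \n')
            case "$magic" in
                7f454c46*)        echo "$f: ELF object" ;;  # "\177ELF"
                213c617263683e0a) echo "$f: ar archive" ;;  # "!<arch>\n"
            esac
        fi
    done
}
```

After sourcing it, `scan_headers /usr/include' would be expected to list entries such as unistd.h (ELF) and zlib.h (ar archive) on the damaged system; an intact /usr/include should produce no output.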
# file /usr/lib/snmp*
/usr/lib/snmp_atm.so: ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_atm.so.6: symbolic link to `pam_deny.so.5'
/usr/lib/snmp_bridge.so: ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_bridge.so.6: symbolic link to `pam_echo.so.5'
/usr/lib/snmp_hostres.so: ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_hostres.so.6: symbolic link to `pam_exec.so.5'
/usr/lib/snmp_mibII.so: ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_mibII.so.6: symbolic link to `pam_ftpusers.so.5'
/usr/lib/snmp_netgraph.so: ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), dynamically linked, stripped
/usr/lib/snmp_netgraph.so.6: symbolic link to `pam_login_access.so.5'
[...]

"Why would `snmp_netgraph.so.6' be linked to `pam_login_access.so.5'?"

Unsurprisingly, fsck (still) detects a lot of inconsistencies:

# fsck -f /
** /dev/ada0p2 (NO WRITE)
USE JOURNAL? no
** Skipping journal, falling through to full fsck
SETTING DIRTY FLAG IN READ_ONLY MODE
UNEXPECTED SOFT UPDATE INCONSISTENCY
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
124184 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY
124185 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY
124186 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY
124187 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY
124188 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY
124189 DUP I=31488
UNEXPECTED SOFT UPDATE INCONSISTENCY
[...]
EXCESSIVE DUP BLKS I=31494
CONTINUE? [yn]

I do not see this behavior when running 9.0-RELEASE on top of a 7.4-RELEASE userland (including the filesystem). I've seen this behavior on various CF cards, so a single bad card is unlikely to be the culprit.
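The cross-linked libraries in the `file /usr/lib/snmp*' and `file /usr/lib/libssh.so.5' output above follow a recognizable pattern: a versioned name such as snmp_netgraph.so.6 points at an unrelated library. A minimal sketch, not from the report, of a heuristic check for that pattern: the hypothetical function `check_lib_links' reports every `*.so' or `*.so.*' symlink whose base name (everything before the first ".so") differs from that of its target.

```shell
#!/bin/sh
# Hypothetical heuristic (not from the report): a library symlink such as
# snmp_atm.so.6 or libssh.so.5 is expected to point at a file with the
# same base name (snmp_atm.so*, libssh.so*), not at an unrelated library.

# stem PATH: drop the directory and everything from the first ".so" on,
# so "/lib/libc.so.7" -> "libc" and "pam_deny.so.5" -> "pam_deny".
stem() {
    base=${1##*/}
    echo "${base%%.so*}"
}

# check_lib_links DIR: print "LINK -> TARGET" for mismatched symlinks.
check_lib_links() {
    for f in "$1"/*.so "$1"/*.so.*; do
        [ -L "$f" ] || continue
        target=$(readlink "$f")
        if [ "$(stem "$f")" != "$(stem "$target")" ]; then
            echo "$f -> $target"
        fi
    done
}
```

Run as `check_lib_links /usr/lib'. This is only a heuristic: legitimate compatibility aliases also have differing names and would be listed as well, so the output is a list to review rather than proof of corruption.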
Here are the currently mounted filesystems on the machine, along with their mount options:

# mount
/dev/ada0p2 on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)

Any hints appreciated.

Thanks,
 - Arnaud