From: Dieter BSD <dieterbsd@gmail.com>
To: freebsd-hackers@freebsd.org, freebsd-fs@freebsd.org
Date: Wed, 15 Jul 2015 10:37:19 -0700
Subject: Re: format/newfs larger external consumer drives

[ freebsd-fs@ added ]

>> If the average file size will be large, use large block/frag sizes.
>> I use 64 KiB / 8 KiB. And reduce the number of inodes. I reduce
>> inodes as much as newfs allows and there are still way too many.
>
> Can you think of an algorithmic way to express this? I.e., you don't
> want blocks to get *too* large as you risk greater losses in "partial
> fragments", etc. Likewise, you don't want to run out of inodes.

I look at df -i on existing filesystems holding similar file sizes.

My data filesystems usually get an entire disk (..., 2 TB, 3 TB, recently
5 TB), and with 64/8 block/frag sizes and as few inodes as newfs will
allow, df still reports numbers like 97% full while using only 0% or 1%
of the inodes:

    density reduced from 67108864 to 14860288
    /dev/ada1: 4769307.0MB (9767541168 sectors) block size 65536, fragment size 8192
        using 1315 cylinder groups of 3628.00MB, 58048 blks, 256 inodes.
        with soft updates

I should take another look at increasing the size of cylinder groups.
Newfs likes very small cylinder groups, which made sense 30 years ago
when disks were around 40 MB and file sizes were much smaller. IIRC,
each cylinder group gets at least one block of inodes, so with file
sizes of 1-20 GB I end up with far too many inodes.

Yes, a larger frag size wastes some space in the last frag of each file,
but smaller block and frag sizes use a lot of space to keep track of all
those blocks and frags, and make more work for fsck.
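For reference, a newfs invocation along these lines produces output like
the above. This is only a sketch reconstructed from the numbers shown
(the mount point is made up), and newfs clamps whatever -i density you
request to fit the cylinder-group geometry, hence the "density reduced"
message:

    # 64 KiB blocks, 8 KiB frags, soft updates, and a requested inode
    # density of one inode per 64 MiB; newfs reduces the density to
    # what the cylinder groups can actually hold.
    newfs -U -b 65536 -f 8192 -i 67108864 /dev/ada1
    # After mounting, df -i shows how few of those inodes ever get used.
    df -i /mnt/data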
> "risk" of loss/cost of recovery (when the medium > *is* unceremoniously dismounted Some panics don't sync the disks. Sometimes disks just go into a coma. Soft updates is supposed to limit problems to those that fsck -p will automagicly fix. (assuming the disk's write cache is turned off) There is at least one case where it does not. See PR 166499 (from 2012, still not fixed). As long as I'm whining about unfixed filesystem PRs, see also bin/170676: Newfs creates a filesystem that does not pass fsck. (also from 2012) > I am concerned with the fact that users can so easily/carelessly "unplug" > a USB device without the proper incantations beforehand. of course, *their* > mistake is seen as a "product design flaw"! :-/ Superglue the cable in place? :-) Perhaps print up something like "Unmount filesystem(s) before unplugging or powering off external disk, or you might lose your data.", laminate it and attach it to the cables? > The "demo app" that I'm working on is a sort of (low performance) NAS > built on SBC's and external drives. I assume that the drives *have* to be external? Do they have to be usb? Could they be e-sata? E-sata is faster and avoids the various usb problems. They used to sell external drives where the sata-to-usb bridge was in a separate little module box. They had alternate modules with e-sata, firewire, etc. The disk box had a standard internal ('L') sata connector, except a standard sata connector was too large to fit. So I took out my Swiss Army Knife and carved off some plastic from the connector on a standard sata cable so that it would fit. You could also put a standard sata drive into an enclosure (with a small fan) and use your choice of connection to the computer. >> USB specific stuff: There is an off by 1 sector problem, which will >> bite you if you switch a drive between using the sata-usb bridge >> and connecting the drive directly to a sata controller. I had to > > Ouch! I've not seen that with PATA-USB bridges. OTOH, I tend not > to pull a drive *from* an external enclosure but, rather, rely on > the external enclosures to provide portability. E.g., easier to > move 500G of files from machineA to machineB by physically moving > the volume containing them! Apparently they vary, see the message from Warren. Mine was missing the first sector, so I had to have the kernel hunt for the partitioning info. The external drives I've seen do not have fans, and have little or no ventilation. If the drive will be spinning for awhile I worry about it overheating. > The "demo app" will try to use the large multi-TB drives of which I > have little long-term experience. OTOH, the usage model is "fire it > up, pull off whichever files you need, then spin everything down"... > until the next time you might need to retrieve an ISO (a week later?) With this usage model it sounds like you could use a read-only mount. Would an optical drive work for this application? >> If the drive disappears with filesystem(s) mounted. the kernel might >> very well panic. There was a discussion of this problem recently. >> I thought that FUSE was suggested as a possible solution, but I >> can't find the discussion. This problem is not limited to users >> disconnecting usb drives without unmounting them. The problem >> happens all by itself with internal drives, as the drive, port >> multiplier, controller, or device driver decides to go out to lunch, >> and the kernel panics. This happens *far* too often, and *kills* >> reliability. We really need a solution for this. 
>
> I think it's hard to back-port these sorts of things. Much easier
> to consider the possibility of failure when initially designing the
> system, interfaces, etc.

I wonder how hard it would be to create a FUSE version of FFS? Any
thoughts from the filesystem wizards?

Alternatively, instead of panicking, could the filesystem just
umount -f the offending filesystem (and whine to log(9))? I am very
tired of having an entire machine panic just because one disk decided
to take a nap. This is not how you get five nines. :-(
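For now the manual workaround, when the machine stays up long enough to
let you type it, is a forced unmount from another terminal. The mount
point below is just an example:

    # Forcibly detach the filesystem whose disk has gone out to lunch,
    # roughly what I wish the kernel would do instead of panicking.
    umount -f /mnt/extdisk
    # Leave a note in the logs (a kernel-side handler would use log(9)).
    logger -p kern.err "ada1 disappeared; /mnt/extdisk force-unmounted"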