Date:      Thu, 20 Apr 2017 22:43:14 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        freebsd-current@freebsd.org, freebsd-fs@freebsd.org, freebsd-ports@freebsd.org
Cc:        emaste@freebsd.org, Kirk McKusick <mckusick@mckusick.com>
Subject:   64-bit inodes (ino64) Status Update and Call for Testing
Message-ID:  <20170420194314.GI1788@kib.kiev.ua>

Inodes are data structures corresponding to objects in a file system,
such as files and directories. FreeBSD has historically used 32-bit
values to identify inodes, which limits file systems to somewhat under
2^32 objects. Many modern file systems internally use 64-bit identifiers
and FreeBSD needs to follow suit to properly and fully support these
file systems.

The 64-bit inode project, also known as ino64, started life many years
ago as a project by Gleb Kurtsou (gleb@).  Since then, mckusick@ picked
up and updated the patch and acted as a flag-waver, and several people
have had a hand in updating it and addressing regressions.

Sponsored by the FreeBSD Foundation, I have spent significant effort
on outstanding issues and integration: fixing the compat32 ABI, NFS and
ZFS, addressing ABI compat issues, and investigating and fixing ports
failures.  rmacklem@ provided feedback on NFS changes, emaste@ and
jhb@ provided feedback and review on the ABI transition support. pho@
performed extensive testing and identified a number of issues that
have now been fixed.  kris@ performed an initial ports investigation
followed by an exp-run by antoine@. emaste@ helped with organization
of the process.

This note explains how to perform useful testing of the ino64 branch,
beyond typical smoke tests.

1. Overview.

The ino64 branch extends the basic system types ino_t and dev_t from
32-bit to 64-bit, and nlink_t from 16-bit to 64-bit.  The struct dirent
layout is modified due to the larger size of ino_t, and also gains a
d_off (directory offset) member. As ino64 implies an ABI change anyway,
the length MNAMELEN of the struct statfs arrays f_mntfromname[] and
f_mntonname[] is increased from 88 to 1024, to allow for longer mount
path names.
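
One quick way to see which widths a given world was built with is to
print the sizes of the affected types.  The sketch below is only an
illustration: print_type_widths and narrow_ino32 are hypothetical
helper names, not part of the branch.  The second helper shows the
check that a 32-bit field effectively imposes on a 64-bit file
identifier.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Print the width, in bits, of each type that the branch widens. */
void print_type_widths(void)
{
    printf("ino_t:   %zu bits\n", sizeof(ino_t) * CHAR_BIT);
    printf("dev_t:   %zu bits\n", sizeof(dev_t) * CHAR_BIT);
    printf("nlink_t: %zu bits\n", sizeof(nlink_t) * CHAR_BIT);
}

/*
 * Hypothetical helper: store a 64-bit file identifier in a 32-bit
 * field.  Returns false when the value does not fit, which is the
 * case a 32-bit ino_t cannot represent.
 */
bool narrow_ino32(uint64_t fileid, uint32_t *out)
{
    assert(out != NULL);
    if (fileid > UINT32_MAX)
        return false;
    *out = (uint32_t)fileid;
    return true;
}
```

Running print_type_widths in a pristine and in a patched world should
show the difference directly; the exact numbers depend on the platform
the program is built on.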

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and by
employing other tricks.  Unfortunately, not everything can be fixed,
especially outside the base system.  For instance, third-party APIs
which pass struct stat around are broken in a backward- and
forward-incompatible way.

2. Motivation.

The main risk of the ino64 change is uncontrolled ABI breakage.
Due to expansion of the basic types dev_t, ino_t and struct dirent,
the impact is not limited to one part of the system, but affects:
- kernel/userspace interface (syscalls ABI, mostly stat(2), kinfo
  and more)
- libc interface (mostly related to readdir(3) and FTS(3))
- collateral damage in other libraries that happen to use the changed
  types in their interfaces.  See, for instance, libprocstat, for which
  compat was provided using symbol versioning, and libutil, whose shlib
  version was bumped.
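
To make the libc part concrete, here is a minimal sketch of a
readdir(3) consumer.  The function name list_inodes is hypothetical.
The inode number is printed through uintmax_t, so the same source
builds unchanged whether ino_t is 32 or 64 bits wide; a binary built
against the old struct dirent layout, however, relies on the compat
symbols.

```c
#include <assert.h>
#include <dirent.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Print the inode number and name of every entry in a directory.
 * Returns the number of entries seen, or -1 if the directory cannot
 * be opened.
 */
long list_inodes(const char *path)
{
    DIR *dir;
    struct dirent *entry;
    long count = 0;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        /* The cast keeps the format string independent of ino_t's width. */
        printf("%ju\t%s\n", (uintmax_t)entry->d_ino, entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```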

3. Quirks.

We handled kinfo sysctl MIBs, but other MIBs which report structures
that depend on the changed types are not handled in general.  The
breakage was considered to be either in the management interfaces,
where we usually allow ABI slip, or not important.

Struct xvnode changed layout; no compat shims are provided.

For struct xtty, the dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful than
reporting a 64-bit dev_t, for the sake of pstat.

4. Testing procedure.

The ino64 project can be tested by cloning the project branch from
GitHub or by applying the patch from the Phabricator review to a
working tree.  The authoritative source is the GitHub branch; I do not
promise to update the review for each update.

To clone from GitHub:
% git clone -b ino64 https://github.com/FreeBSDFoundation/freebsd.git ino64

To fetch the patch from Phabricator:
- Visit https://reviews.freebsd.org/D10439
- Click "Download Raw Diff" at the upper right of the page

Or
% arc patch D10439

After that, in the checkout directory do
% (cd sys/kern && touch syscalls.master && make sysent)
% (cd sys/compat/freebsd32 && touch syscalls.master && make sysent)
If you use a custom kernel configuration, ensure that
	options COMPAT_FREEBSD11
is included in the config.  Then build world and kernel in the
usual way, install the kernel, reboot, and install the new world.  Do
not take shortcuts in the update procedure.

4.1 New kernel, old world.

Build and install a pristine HEAD world, apply the patch, and build and
install only the updated kernel. The system must work the same as with
the pristine kernel.

4.2 New kernel, new world, old third-party applications.

Build and install the patched kernel and world.  Applications compiled
on pristine HEAD (e.g. installed by pkg from the regular portmgr builds)
must work without regression.

4.3 32bit compat.

Same as 4.1 and 4.2, but for 32-bit (i386) binaries on an amd64 host.
Note that a big-endian host, like powerpc, might expose additional
bugs in the 32-bit compat code with the patch, but that testing is too
cumbersome to arrange.

4.4 Targeted tests.

Useful programs to check items 4.1, 4.2 and 4.3 are versions of the
following programs, taken from the pristine system:

  stat(8). Use it on a regular file, a file in /dev, a socket, a pipe
  and so on. For both native and 32-bit compat, stat(8) must print
  reasonable information.

  procstat(1). Use it with the -f option to examine the open files of
  processes.  kinfo(9) data must be returned in a format acceptable to
  older apps.

  find(1). Use the pristine find(1) binary with many arbitrary options
  on a system with the patched world installed, in particular libc.
  find exercises FTS(3), and the compat shims in libc are non-trivial.
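
For a test that goes beyond the stock utilities, a small program built
on the pristine system and then run on the patched one exercises the
same paths.  The sketch below is a hypothetical example, not part of
the branch: stat_ids calls stat(2) directly and walk_ids traverses a
tree with FTS(3), and both print the device, inode and link count
through uintmax_t.

```c
#include <assert.h>
#include <fts.h>
#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Print the identifiers whose types are widened by the branch. */
static void print_ids(const char *path, const struct stat *sb)
{
    printf("%s: dev=%ju ino=%ju nlink=%ju\n", path,
        (uintmax_t)sb->st_dev, (uintmax_t)sb->st_ino,
        (uintmax_t)sb->st_nlink);
}

/* stat(2) one path; returns 0 on success, -1 on failure. */
int stat_ids(const char *path)
{
    struct stat sb;

    assert(path != NULL);
    if (stat(path, &sb) != 0)
        return -1;
    print_ids(path, &sb);
    return 0;
}

/* Walk a tree with FTS(3); returns the number of entries printed. */
long walk_ids(const char *root)
{
    char *roots[] = { (char *)root, NULL };
    FTS *fts;
    FTSENT *ent;
    long visited = 0;

    assert(root != NULL);
    fts = fts_open(roots, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
    if (fts == NULL)
        return -1;

    while ((ent = fts_read(fts)) != NULL) {
        switch (ent->fts_info) {
        case FTS_DP:    /* directory, second (post-order) visit */
        case FTS_DNR:   /* unreadable directory */
        case FTS_ERR:   /* error, see fts_errno */
        case FTS_NS:    /* no stat(2) information available */
            continue;
        default:
            print_ids(ent->fts_path, ent->fts_statp);
            visited++;
        }
    }
    fts_close(fts);
    return visited;
}
```

Comparing the output of the same binary before and after the upgrade,
for both native and 32-bit builds, is one way to spot a compat shim
that returns garbage.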

4.5 NFS server and client test.

Check that the NFS server in the patched kernel operates correctly and
without performance regressions.  Same for the client.  NFS should be
checked for all four combinations of patched/unpatched kernel on the
server and client, because the filehandle format includes the inode
number.

4.6 Other filesystems

Generally, filesystems should see no change in system behaviour, since
the goal of the patch is to provide space to grow in the ABI.  In
particular, the local filesystem layout must stay the same.  Of course,
it is possible that some reliance on the exact sizes of the changed
types went unnoticed during the patch review, in which case e.g. the
on-disk format would be broken.  We do not expect this to slip in, but
it is possible and should be watched for.

4.7 Test accounting

The process accounting records, as documented in acct(5), changed
format due to the dev_t increase.  Verify that programs such as
sa(8) and accton(8) work correctly with both old and new accounting
records.

5. Ports Status with ino64

A ports exp-run for ino64 is open in PR 218320. The failing ports that
are each responsible for more than one skipped port, with the number of
ports skipped, are:

lang/ghc			497
multimedia/webcamd		62
lang/gcc6-aux			54
devel/libgtop			39
sysutils/py-psutil		13
devel/llvm38			6
lang/rust			4
sysutils/py-psutil121		3

Patches are available for devel/llvm39, devel/llvm40, lang/ghc, and
lang/rust in the topic branch as ports.patch, and devel/llvm38 can be
fixed in the same way as llvm39 and llvm40. Assistance with
investigating and fixing the port failures will be greatly appreciated.

Below is an overview of the problems and proposed solutions, probably
mostly relevant to the ports maintainers.

5.1. LLVM

LLVM includes a component called Address Sanitizer or ASAN, which tries
to intercept syscalls, and contains knowledge of the layout of many
system structures.  Since the stat and lstat syscalls were removed and
several types and structures changed, this has to be reflected in the
ASAN hacks.

5.2. lang/ghc

The ghc compiler and parts of the runtime are written in Haskell,
which means that to compile ghc, you need a working Haskell compiler
for bootstrap.  By default for ghc, the runtime is provided in the form
of static libraries.  Static libraries reference default versions of
libc symbols, which are assigned the ELF symbol version at the final
linking stage.  As a result, code linked with such libraries calls the
updated syscalls but internally still uses the old system types.  The
end result is random memory corruption, because both libc and the
kernel assume the new types.
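
The failure mode can be illustrated with a toy model.  The two structs
below are hypothetical, heavily simplified stand-ins and are not the
real struct stat layouts; demo_fill_new and demo_read_size_as_old are
made-up names playing the roles of the new syscall and of code compiled
against the old headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "old" layout: 32-bit ids and a 16-bit link count. */
struct demo_stat_old {
    uint32_t dev;
    uint32_t ino;
    uint16_t nlink;
    uint16_t mode;
    int64_t  size;
};

/* Hypothetical "new" layout: the same fields, widened to 64 bits. */
struct demo_stat_new {
    uint64_t dev;
    uint64_t ino;
    uint64_t nlink;
    uint16_t mode;
    int64_t  size;
};

/* Stand-in for the updated syscall: writes the new layout into buf. */
void demo_fill_new(void *buf, int64_t size)
{
    struct demo_stat_new st = {
        .dev = 1, .ino = 2, .nlink = 1, .mode = 0644, .size = size
    };

    assert(buf != NULL);
    memcpy(buf, &st, sizeof(st));
}

/* Stand-in for old code: reads the same bytes using the old layout. */
int64_t demo_read_size_as_old(const void *buf)
{
    struct demo_stat_old st;

    assert(buf != NULL);
    memcpy(&st, buf, sizeof(st));
    return st.size;
}
```

In this model the old reader finds unrelated bytes where it expects the
size field.  The new structure is also larger than the old one, so a
caller that allocated only the old size would have memory beyond its
buffer overwritten.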

This situation cannot be fixed by symbol versioning, because versioning
acts too late.  Instead, we hacked the bootstrap compiler by providing,
in the shipped static libraries, symbols for the modified syscalls that
direct execution to the compat variants of the syscalls.  This allows
the bootstrap compiler to generate working code.  After stage0, the
compiler operates on the new structures and things stabilize.

The real solution is, of course, to re-package the bootstrap compiler,
but for some time we need to support pre-ino64 HEAD in ports.  Also,
learning the full scope of GHC maintenance duties, required for that,
is too much for the ino64 task.

5.3. lang/rust

Rustc has a similar structure to GHC, and the same issue.  The same
solution of patching the bootstrap was applied.

Also, Rust's libstd and liblibc provide rustified definitions of the
system structures, which were updated to reflect the new layout.
I failed to understand why e.g. struct stat has to be defined in at
least 3 places, but all locations found were patched.

6. Next Steps

The tentative schedule for the ino64 project:

2017-04-20 Post wide call for testing

           Investigate and address port failures with maintainer support

2017-05-05 Request second exp-run with initial patches applied

           Investigate and address port failures with maintainer support

2017-05-19 Commit to HEAD

           Address post-commit failures where feasible


