Date: Thu, 20 Apr 2017 22:43:14 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org, freebsd-ports@freebsd.org
Cc: emaste@freebsd.org, Kirk McKusick <mckusick@mckusick.com>
Subject: 64-bit inodes (ino64) Status Update and Call for Testing
Message-ID: <20170420194314.GI1788@kib.kiev.ua>

Inodes are data structures corresponding to objects in a file system,
such as files and directories. FreeBSD has historically used 32-bit
values to identify inodes, which limits file systems to somewhat under
2^32 objects. Many modern file systems internally use 64-bit
identifiers and FreeBSD needs to follow suit to properly and fully
support these file systems.

The 64-bit inode project, also known as ino64, started life many years
ago as a project by Gleb Kurtsou (gleb@). Since then several people
have had a hand in updating it and addressing regressions, after
mckusick@ picked up and updated the patch and acted as a flag-waver.

Sponsored by the FreeBSD Foundation, I have spent significant effort
on outstanding issues and integration: fixing the compat32 ABI, NFS
and ZFS, addressing ABI compat issues, and investigating and fixing
ports failures. rmacklem@ provided feedback on the NFS changes; emaste@
and jhb@ provided feedback and review on the ABI transition support.
pho@ performed extensive testing and identified a number of issues
that have now been fixed. kris@ performed an initial ports
investigation, followed by an exp-run by antoine@. emaste@ helped with
organization of the process.

This note explains how to perform useful testing of the ino64 branch,
beyond typical smoke tests.

1. Overview.

The ino64 branch extends the basic system types ino_t and dev_t from
32-bit to 64-bit, and nlink_t from 16-bit to 64-bit. The struct dirent
layout is modified due to the larger size of ino_t, and also gains a
d_off (directory offset) member. As ino64 implies an ABI change anyway,
the length MNAMELEN of the struct statfs f_mntfromname[] and
f_mntonname[] arrays is increased from 88 to 1024, to allow for longer
mount path names.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and by
employing other tricks. Unfortunately, not everything can be fixed,
especially outside the base system.
For instance, third-party APIs which pass struct stat around are broken
in a backward- and forward-incompatible way.

2. Motivation.

The main risk of the ino64 change is uncontrolled ABI breakage. Due to
the expansion of the basic types dev_t and ino_t and of struct dirent,
the impact is not limited to one part of the system, but affects:

- the kernel/userspace interface (syscalls ABI, mostly stat(2), kinfo
  and more)
- the libc interface (mostly related to readdir(3) and fts(3))
- collateral damage in other libraries that happen to use the changed
  types in their interfaces. See, for instance, libprocstat, for which
  compat was provided using symbol versioning, and libutil, whose
  shlib version was bumped.

3. Quirks.

We handled the kinfo sysctl MIBs, but other MIBs that report structures
depending on the changed types are not handled in general. The
breakage there was judged to be either in management interfaces, where
we usually allow ABI slip, or unimportant.

Struct xvnode changed layout, and no compat shims are provided.

For struct xtty, the dev_t tty device member was reduced to uint32_t.
Keeping ABI compat in this case was judged more useful than reporting
a 64-bit dev_t, for the sake of pstat.

4. Testing procedure.

The ino64 project can be tested by cloning the project branch from
GitHub or by applying the patch from the Phabricator review to a
working tree. The authoritative source is GitHub; I do not promise to
update the review for each update.

To clone from GitHub:

% git clone -b ino64 https://github.com/FreeBSDFoundation/freebsd.git ino64

To fetch the patch from Phabricator:

- Visit https://reviews.freebsd.org/D10439
- Click "Download Raw Diff" at the upper right of the page

Or

% arc patch D10439

After that, in the checkout directory do

% (cd sys/kern && touch syscalls.master && make sysent)
% (cd sys/compat/freebsd32 && touch syscalls.master && make sysent)

If you use a custom kernel configuration, ensure that
options COMPAT_FREEBSD11 is included in the config.

Then build world and kernel in the usual way, install the kernel,
reboot, and install the new world. Do not take shortcuts in the update
procedure.

4.1 New kernel, old world.

Build and install a pristine HEAD world, apply the patch, and build and
install only the updated kernel. The system must work the same as with
the pristine kernel.

4.2 New kernel, new world, old third-party applications.

Build and install the patched kernel and world. Applications compiled
on the pristine HEAD (e.g. installed by pkg from the regular portmgr
builds) must work without regression.

4.3 32bit compat.

Same as 4.1 and 4.2, but for 32bit (i386) binaries on an amd64 host.
Note that a big-endian host, like powerpc, might expose additional bugs
in the 32bit compat with the patch, but that testing is too cumbersome
to arrange.

4.4 Targeted tests.

Useful programs to check items 4.1, 4.2 and 4.3 are versions of the
following programs, taken from the pristine system:

stat(1). Use it on a regular file, a file in /dev, a socket, a pipe
and so on. For both native and 32bit compat, stat(1) must print
reasonable information.

procstat(1). Use it with the -f option to examine the files of
processes. kinfo(9) data must be returned in a format acceptable to
older apps.

Use a pristine find(1) binary with many arbitrary options on a system
with the patched world, in particular libc, installed. find exercises
fts(3), and the compat shims in libc are non-trivial.

4.5 NFS server and client test.

Check that the NFS server in the patched kernel operates correctly and
without performance regressions. Same for the client. NFS should be
checked for all four combinations of patched/unpatched kernel on the
server and client, because the filehandle format includes the inode
number.

4.6 Other filesystems

Generally, filesystems should see no change in system behaviour, since
the goal of the patch is to provide space to grow in the ABI. In
particular, the local filesystem layout must stay the same. Of course,
it is possible that some reliance on the exact sizes of the changed
types went unnoticed during the patch review, in which case e.g. the
on-disk format would be broken. We do not expect this to slip in, but
it is possible and should be watched for.

4.7 Test accounting

Process accounting, as documented in acct(5), changed the format of
its records due to the dev_t increase. Verify that programs like sa(8)
and accton(8) work correctly with both old and new accounting records.

5. Ports Status with ino64

A ports exp-run for ino64 is open in PR 218320. The failing ports each
responsible for more than 1 skipped port are:

lang/ghc                497
multimedia/webcamd       62
lang/gcc6-aux            54
devel/libgtop            39
sysutils/py-psutil       13
devel/llvm38              6
lang/rust                 4
sysutils/py-psutil121     3

Patches are available for lang/llvm39, lang/llvm40, lang/ghc, and
lang/rust in the topic branch as ports.patch, and llvm38 can be fixed
in the same way as llvm39 and llvm40. Assistance with investigating
and fixing the port failures will be greatly appreciated.

Below is an overview of the problems and proposed solutions, probably
mostly relevant to the ports maintainers.

5.1. LLVM

LLVM includes a component called Address Sanitizer or ASAN, which
tries to intercept syscalls, and contains knowledge of the layout of
many system structures. Since the stat and lstat syscalls were removed
and several types and structures changed, this has to be reflected in
the ASAN hacks.

5.2. lang/ghc

The ghc compiler and parts of the runtime are written in Haskell,
which means that to compile ghc, you need a working Haskell compiler
for bootstrap. By default for ghc, the runtime is provided in the form
of static libraries.

Static libraries reference default versions of libc symbols, which are
assigned the ELF symbol version at the final linking stage. As a
result, code linked against such pre-built libraries calls the updated
syscalls while internally still using the old system types. The end
result is random memory corruption, because both libc and the kernel
assume the new types.

This situation cannot be fixed by symbol versioning, because
versioning acts too late. Instead, we hacked the bootstrap compiler by
providing symbols for the modified syscalls in the shipped static
libraries; these symbols direct execution to the compat variants of
the syscalls. This allows the bootstrap compiler to generate working
code. After stage0, the compiler operates on the new structures and
things stabilize.

The real solution is, of course, to re-package the bootstrap compiler,
but for some time we need to support pre-ino64 HEAD in ports. Also,
learning the full scope of GHC maintenance duties required for that is
too much for the ino64 task.

5.3. lang/rust

Rustc has a similar structure to GHC, and the same issue. The same
solution of patching the bootstrap was applied. In addition, rust
libstd and liblibc provide rustified definitions of the system
structures, which were updated to reflect the new layout. I failed to
understand why e.g. struct stat has to be defined in at least 3
places, but all locations found were patched.

6. Next Steps

The tentative schedule for the ino64 project:

2017-04-20  Post wide call for testing
            Investigate and address port failures with maintainer support
2017-05-05  Request second exp-run with initial patches applied
            Investigate and address port failures with maintainer support
2017-05-19  Commit to HEAD
            Address post-commit failures where feasible