d in the 'doc' directory in +a number of different formats. + +You should also read the copious comments in `archive.h` and the +source code for the sample programs for more details. Please let us +know about any errors or omissions you find. + +## Supported Formats + +Currently, the library automatically detects and reads the following formats: + + * Old V7 tar archives + * POSIX ustar + * GNU tar format (including GNU long filenames, long link names, and sparse files) + * Solaris 9 extended tar format (including ACLs) + * POSIX pax interchange format + * POSIX octet-oriented cpio + * SVR4 ASCII cpio + * Binary cpio (big-endian or little-endian) + * PWB binary cpio + * ISO9660 CD-ROM images (with optional Rockridge or Joliet extensions) + * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives) + * ZIPX archives (with support for bzip2, zstd, ppmd8, lzma and xz compressed entries) + * GNU and BSD 'ar' archives + * 'mtree' format + * 7-Zip archives (including archives that use zstandard compression) + * Microsoft CAB format + * LHA and LZH archives + * RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status) + * WARC archives + * XAR archives + +The library also detects and handles any of the following before evaluating the archive: + + * uuencoded files + * files with RPM wrapper + * gzip compression + * bzip2 compression + * compress/LZW compression + * lzma, lzip, and xz compression + * lz4 compression + * lzop compression + * zstandard compression + +The library can create archives in any of the following formats: + + * POSIX ustar + * POSIX pax interchange format + * "restricted" pax format, which will create ustar archives except for + entries that require pax extensions (for long filenames, ACLs, etc). + * Old GNU tar format + * Old V7 tar format + * POSIX octet-oriented cpio + * SVR4 "newc" cpio + * Binary cpio (little-endian) + * PWB binary cpio + * shar archives + * ZIP archives (with uncompressed or "deflate" compressed entries) + * ZIPX archives (with bzip2, zstd, lzma or xz compressed entries) + * GNU and BSD 'ar' archives + * 'mtree' format + * ISO9660 format + * 7-Zip archives (including archives that use zstandard compression) + * WARC archives + * XAR archives + +When creating archives, the result can be filtered with any of the following: + + * uuencode + * base64 + * gzip compression + * bzip2 compression + * compress/LZW compression + * lzma, lzip, and xz compression + * lz4 compression + * lzop compression + * zstandard compression + +## Notes about the Library Design + +The following notes address many of the most common +questions we are asked about libarchive: + +* This is a heavily stream-oriented system. That means that + it is optimized to read or write the archive in a single + pass from beginning to end. For example, this allows + libarchive to process archives too large to store on disk + by processing them on-the-fly as they are read from or + written to a network or tape drive. This also makes + libarchive useful for tools that need to produce + archives on-the-fly (such as webservers that provide + archived contents of a users account). + +* In-place modification and random access to the contents + of an archive are not directly supported. For some formats, + this is not an issue: For example, tar.gz archives are not + designed for random access. In some other cases, libarchive + can re-open an archive and scan it from the beginning quickly + enough to provide the needed abilities even without true + random access. Of course, some applications do require true + random access; those applications should consider alternatives + to libarchive. + +* The library is designed to be extended with new compression and + archive formats. The only requirement is that the format be + readable or writable as a stream and that each archive entry be + independent. There are articles on the libarchive Wiki explaining + how to extend libarchive. + +* On read, compression and format are always detected automatically. + +* The same API is used for all formats; it should be very + easy for software using libarchive to transparently handle + any of libarchive's archiving formats. + +* Libarchive's automatic support for decompression can be used + without archiving by explicitly selecting the "raw" and "empty" + formats. + +* I've attempted to minimize static link pollution. If you don't + explicitly invoke a particular feature (such as support for a + particular compression or format), it won't get pulled in to + statically-linked programs. In particular, if you don't explicitly + enable a particular compression or decompression support, you won't + need to link against the corresponding compression or decompression + libraries. This also reduces the size of statically-linked + binaries in environments where that matters. + - * The library is generally _thread safe_ depending on the platform: ++* The library is generally _thread-safe_ depending on the platform: + it does not define any global variables of its own. However, some + platforms do not provide fully thread-safe versions of key C library + functions. On those platforms, libarchive will use the non-thread-safe + functions. Patches to improve this are of great interest to us. + +* The function `archive_write_disk_header()` is _not_ thread safe on + POSIX machines and could lead to security issue resulting in world + writeable directories. Thus it must be mutexed by the calling code. + This is due to calling `umask(oldumask = umask(0))`, which sets the + umask for the whole process to 0 for a short time frame. + In case other thread calls the same function in parallel, it might + get interrupted by it and cause the executable to use umask=0 for the + remaining execution. + This will then lead to implicitly created directories to have 777 + permissions without sticky bit. + +* In particular, libarchive's modules to read or write a directory + tree do use `chdir()` to optimize the directory traversals. This + can cause problems for programs that expect to do disk access from + multiple threads. Of course, those modules are completely + optional and you can use the rest of libarchive without them. + - * The library is _not_ thread aware, however. It does no locking ++* The library is _not_ thread-aware, however. It does no locking + or thread management of any kind. If you create a libarchive + object and need to access it from multiple threads, you will + need to provide your own locking. + +* On read, the library accepts whatever blocks you hand it. + Your read callback is free to pass the library a byte at a time + or mmap the entire archive and give it to the library at once. + On write, the library always produces correctly-blocked output. + +* The object-style approach allows you to have multiple archive streams + open at once. bsdtar uses this in its "@archive" extension. + +* The archive itself is read/written using callback functions. + You can read an archive directly from an in-memory buffer or + write it to a socket, if you wish. There are some utility + functions to provide easy-to-use "open file," etc, capabilities. + +* The read/write APIs are designed to allow individual entries + to be read or written to any data source: You can create + a block of data in memory and add it to a tar archive without + first writing a temporary file. You can also read an entry from + an archive and write the data directly to a socket. If you want + to read/write entries to disk, there are convenience functions to + make this especially easy. + +* Note: The "pax interchange format" is a POSIX standard extended tar + format that should be used when the older _ustar_ format is not + appropriate. It has many advantages over other tar formats + (including the legacy GNU tar format) and is widely supported by + current tar implementations.