
The File Size Limit Legacy Tar Can Accommodate

A quote from Wikipedia:
Numeric values are encoded in octal numbers using ASCII digits, with leading zeroes. For historical reasons, a final NUL or space character should also be used. Thus although there are 12 bytes reserved for storing the file size, only 11 octal digits can be stored. This gives a maximum file size of 8 gigabytes on archived files.

To overcome this limitation, star in 2001 introduced a base-256 coding that is indicated by setting the high-order bit of the leftmost byte of a numeric field. GNU-tar and BSD-tar followed this idea. Additionally, versions of tar from before the first POSIX standard from 1988 pad the values with spaces instead of zeroes.

Key implementations in order of origin:
  • Solaris tar, based on the original UNIX V7 tar and comes as the default on the Solaris operating system
  • star (unique standard tape archiver), written in 1982 by Jörg Schilling, is published under the CDDL-license. A test of star, reported in 1999, achieved a throughput of more than 14 MB/s giving it the label of "fastest known implementation of a tar archiver"
  • GNU tar is the default on most GNU/Linux distributions. It is based on the public domain implementation pdtar which started in 1987. Recent versions can use various formats, including ustar, pax, GNU and v7 formats.
  • FreeBSD tar (also BSD tar) has become the default tar on most Berkeley Software Distribution-based operating systems including Mac OS X. The core functionality is available as libarchive for inclusion in other applications. This implementation automatically detects the format of the file and can extract from tar, pax, cpio, zip, jar, ar, xar, rpm and ISO 9660 cdrom images.
Additionally, most pax implementations can read and create many types of tar files.
That is, for legacy tar, 8GB is the maximum size of any single archived file (not of the tar file itself). As the quote above suggests, we should use GNU tar or BSD tar instead, since both support the base-256 extension. Nowadays "tar" usually means GNU tar, although on Solaris GNU tar is installed as gtar.
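To make the arithmetic concrete, here is a minimal Python sketch of how a 12-byte size field can be written and read. The helper names encode_size and decode_size are hypothetical, and the base-256 branch assumes the layout Python's tarfile module uses for GNU archives: a 0x80 marker byte followed by 11 big-endian bytes. It shows why 8^11 - 1 = 8,589,934,591 bytes (one byte short of 8 GiB) is the largest size plain octal can express.

```python
# Sketch of a tar header's 12-byte size field (helper names are hypothetical).
FIELD_LEN = 12
OCTAL_DIGITS = FIELD_LEN - 1             # the last byte is kept for a NUL terminator
MAX_OCTAL_SIZE = 8 ** OCTAL_DIGITS - 1   # 8589934591 bytes, one byte short of 8 GiB


def encode_size(size: int) -> bytes:
    """Encode a non-negative file size as a 12-byte header field."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if size <= MAX_OCTAL_SIZE:
        # Legacy encoding: zero-padded ASCII octal digits plus a trailing NUL.
        return f"{size:0{OCTAL_DIGITS}o}".encode("ascii") + b"\0"
    if size < 256 ** OCTAL_DIGITS:
        # Base-256 extension: high bit of the leftmost byte set as a flag,
        # value stored big-endian in the remaining 11 bytes.
        return b"\x80" + size.to_bytes(OCTAL_DIGITS, "big")
    raise ValueError("size does not fit in a 12-byte field")


def decode_size(field: bytes) -> int:
    """Decode a size field written in either octal or base-256 form."""
    if field[0] & 0x80:
        # Clear the flag bit, then read the whole field as a big-endian number.
        return int.from_bytes(bytes([field[0] & 0x7F]) + field[1:], "big")
    # Octal: drop NUL/space padding (pre-POSIX tars also pad with leading spaces).
    digits = field.strip(b"\0 ")
    return int(digits, 8) if digits else 0


if __name__ == "__main__":
    for n in (1024, MAX_OCTAL_SIZE, MAX_OCTAL_SIZE + 1):
        field = encode_size(n)
        print(n, field, decode_size(field))
```

Running it prints each size, its 12-byte encoding and the decoded value; only the last one, 8589934592, needs the base-256 form. The decoder clears the flag bit and reads the whole field, so it also accepts the space-padded octal written by pre-POSIX tars.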

Here is a table that summarizes the limitations of each of these formats:

Format        Max UID    Max file size   Max file name length   Devn (bits)
gnu           1.8e19     Unlimited       Unlimited              63
oldgnu        1.8e19     Unlimited       Unlimited              63
v7 (Solaris)  2097151    8GB             99                     n/a
ustar         2097151    8GB             256                    21
posix         Unlimited  Unlimited       Unlimited              Unlimited

(Source: GNU tar 1.28: 8. Controlling the Archive Format)
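The numeric ustar entries follow from the same octal rule: the 8-byte UID field holds 7 octal digits, so its maximum is 8^7 - 1 = 2097151, and the 12-byte size field gives the 8GB limit. As a rough check, the sketch below uses Python's standard tarfile module (not discussed above), which can write ustar, GNU and pax headers. TarInfo.tobuf() builds only the header, so no large file is needed. It assumes CPython's tarfile raises ValueError when a number does not fit the chosen format; the member name big.bin and the helper header_ok are hypothetical. The v7 and oldgnu formats are not covered, since tarfile does not write them.

```python
import tarfile

# Boundaries derived from the octal rule for ustar headers.
MAX_USTAR_SIZE = 8 ** 11 - 1  # 12-byte size field, 11 octal digits
MAX_USTAR_UID = 8 ** 7 - 1    # 8-byte uid field, 7 octal digits (2097151)

FORMATS = {
    "ustar": tarfile.USTAR_FORMAT,
    "gnu": tarfile.GNU_FORMAT,
    "pax": tarfile.PAX_FORMAT,
}


def header_ok(fmt: int, **attrs: int) -> bool:
    """Return True if tarfile can build a header with these attributes in format fmt."""
    info = tarfile.TarInfo("big.bin")  # hypothetical member; only the header is built
    for name, value in attrs.items():
        setattr(info, name, value)
    try:
        info.tobuf(format=fmt)
    except ValueError:  # tarfile raises this when a value does not fit the format
        return False
    return True


if __name__ == "__main__":
    for label, fmt in FORMATS.items():
        print(
            label,
            header_ok(fmt, size=MAX_USTAR_SIZE + 1),
            header_ok(fmt, uid=MAX_USTAR_UID + 1),
        )
```

Under these assumptions, ustar should reject both values, while gnu (base-256 numeric fields) and pax (extended header records) should accept them, which matches the table.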
