The freedom of choice has always been one of the biggest advantages of Linux. Developed by an ever-growing community of IT enthusiasts, the kernel gives everyone the opportunity to contribute to its source code, and the ecosystem around it offers a great many options and alternatives when it comes to hardware, software and configuration. The file system is no exception to this rule: while a proprietary OS is likely to leave you stuck with its single default file system, in the world of Linux you are not limited in this way and have a variety of them to choose from. But with choice comes responsibility, and although file systems are hardly the most exciting thing in the world, you should at least be acquainted with the most common ones, as each of them has its own peculiarities to take into account.
As noted, Linux supports a range of file systems, each with its own set of rules for allocating disk space to data and for associating information about each file (referred to as metadata) with the file it belongs to. Yet the most widespread one is the Extended File System, or simply Ext. In fact, it is an entire family of file systems that has served as a de facto standard for many Linux distributions and is still very common.
Ext was developed by Remy Card specifically for Linux and came out in 1992. It was meant to replace the Minix file system, which had been borrowed from the Minix operating system and had serious limitations, such as a maximum partition size of just 64 MB and short file names. The original Ext is long outdated and is no longer supported by the kernel, but it was followed by three successors, Ext2, Ext3 and Ext4, all of which are still in use.
Ext2, the Second Extended File System, was released in 1993. A disk formatted with Ext2 is divided into small units called “blocks”, which are grouped into larger units called block groups. Each block group is in turn made up of the following sections:
-the superblock contains metadata that describes the whole file system (the primary copy sits in the first block group, and backup copies are kept in some of the others);
-the group descriptor table contains an entry for each block group in the file system, recording where that group’s bitmaps and inode table are located and how many free blocks and inodes it has;
-the data block bitmap indicates which data blocks within this block group are “occupied” and which are “free”;
-the inode bitmap specifies which inodes within this block group are “occupied” and which are “free” and can be allocated;
-the inode table contains a set of inodes, special structures that keep information about a file and point to the blocks that actually contain its data;
-data blocks store the actual contents of the files.
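To make this layout more concrete, here is a minimal Python sketch of two lookups a driver performs constantly: finding which block group holds a given inode, and testing a bit in a block or inode bitmap. The function names are hypothetical; the arithmetic assumes the Ext2 conventions that inode numbers start at 1 and that bit 0 of a bitmap is the least significant bit of its first byte. On a real disk, the number of inodes per group is read from the superblock; 2048 is only an example value.

```python
from typing import Tuple


def inode_location(inode_no: int, inodes_per_group: int) -> Tuple[int, int]:
    """Return (block_group, index_in_inode_table) for a 1-based inode number."""
    if inode_no < 1:
        raise ValueError("Ext2 inode numbers start at 1")
    # Inode 1 is the first entry of block group 0's inode table.
    return divmod(inode_no - 1, inodes_per_group)


def is_allocated(bitmap: bytes, index: int) -> bool:
    """Return True if bit `index` of a block or inode bitmap marks an occupied entry."""
    # Bit 0 is the least significant bit of byte 0, bit 8 the LSB of byte 1, and so on.
    return bool(bitmap[index // 8] & (1 << (index % 8)))


# Inode 2 is the root directory in Ext2; inode 2049 is the first inode of group 1.
print(inode_location(2, 2048))                # (0, 1)
print(inode_location(2049, 2048))             # (1, 0)
print(is_allocated(bytes([0b00000101]), 2))   # True
```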
Compared with its predecessor, Ext2 supports files of up to 2 TB and volumes of up to 32 TB (depending on the block size), is faster, and offers additional features such as tracking the file system state and storing the last access and last inode modification times, along with a special field that marks the file system as “dirty” or “clean”. Yet, in contrast to Ext3 and Ext4, it is not a journaling file system, so it cannot minimize data corruption after an unexpected failure, which is why it is rarely used on modern personal computers and servers. However, this drawback does not prevent it from being used on portable flash-based storage media such as USB flash drives and memory cards: the lack of journaling means fewer write operations, which helps to increase performance and extend the lifespan of devices that typically support a limited number of write cycles. Still, remember that unlike Windows’ FAT, Ext2 is natively supported mainly by Linux and a few other Unix-like systems; Windows and macOS cannot read it without third-party drivers.
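The 2 TB file size limit can be reproduced with a little arithmetic. An Ext2 inode holds 12 direct block pointers plus one single-, one double- and one triple-indirect pointer, and each pointer is 4 bytes, so a block of B bytes holds B/4 pointers. Separately, the inode’s 32-bit i_blocks field counts 512-byte sectors, which caps any file at 2^32 × 512 bytes = 2 TiB. The sketch below (with a hypothetical function name) takes the smaller of the two limits; it ignores a few rarer constraints, so treat the results as approximations.

```python
def ext2_max_file_size(block_size: int) -> int:
    """Approximate Ext2 maximum file size in bytes for a given block size."""
    ptrs = block_size // 4  # 32-bit block numbers that fit in one indirect block
    # 12 direct blocks + single-, double- and triple-indirect mappings
    mapped_blocks = 12 + ptrs + ptrs**2 + ptrs**3
    mapping_limit = mapped_blocks * block_size
    sector_limit = 2**32 * 512  # 32-bit i_blocks counter in 512-byte units
    return min(mapping_limit, sector_limit)


for bs in (1024, 2048, 4096, 8192):
    print(f"{bs:>5}-byte blocks: {ext2_max_file_size(bs) / 2**30:,.1f} GiB")
```

With 1 KiB blocks the indirect-block mapping is the bottleneck (about 16 GiB), while with 4 KiB and larger blocks the sector counter wins, which is where the 2 TB figure comes from.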
Ext3 was introduced by Stephen Tweedie in 2001. Its structure is very similar to that of Ext2, the main difference being the addition of journaling. In the event of an unexpected system shutdown, the journal makes it far more likely that the file system remains consistent and lets it be brought back to a working state without a lengthy full consistency check (fsck). Due to its relative simplicity and wide testing base, Ext3 is considered stable and works fine on most hardware. However, as it was designed to be backward compatible with Ext2, it lacks many features that are typical of modern file systems.
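The principle behind journaling can be shown with a toy model: changes are first written to the journal, a commit record marks the transaction as complete, and only then are the blocks copied to their final locations (checkpointing). After a crash, committed transactions are replayed and incomplete ones are discarded. This is a simplified illustration of write-ahead logging, not Ext3’s actual journal format; the class, method names and the `crash` parameter used to simulate a power loss are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class ToyJournal:
    """Toy write-ahead journal: block updates reach the 'disk' only via committed transactions."""

    def __init__(self) -> None:
        self.disk: Dict[int, bytes] = {}  # blocks at their final locations
        self.log: List[Tuple[str, Optional[Tuple[int, bytes]]]] = []  # the journal area

    def write(self, updates: Dict[int, bytes], crash: Optional[str] = None) -> None:
        # 1. Record the new block contents in the journal first.
        for block, data in updates.items():
            self.log.append(("data", (block, data)))
        if crash == "before_commit":
            return  # power lost: no commit record was written
        # 2. The commit record marks the transaction as complete.
        self.log.append(("commit", None))
        if crash == "before_checkpoint":
            return  # power lost: journal is complete, final locations are not updated
        # 3. Checkpoint: copy the blocks to their final locations.
        self.disk.update(updates)

    def recover(self) -> None:
        """Replay committed transactions and drop incomplete ones (e.g. at mount time)."""
        pending: Dict[int, bytes] = {}
        for kind, payload in self.log:
            if kind == "data" and payload is not None:
                block, data = payload
                pending[block] = data
            elif kind == "commit":
                self.disk.update(pending)
                pending = {}
        self.log.clear()  # anything still in `pending` was never committed


j = ToyJournal()
j.write({1: b"inode"})
j.write({2: b"bitmap"}, crash="before_checkpoint")
j.recover()            # block 2 is replayed from the journal
print(sorted(j.disk))  # [1, 2]
j.write({3: b"dirent"}, crash="before_commit")
j.recover()            # block 3 was never committed, so it is discarded
print(sorted(j.disk))  # [1, 2]
```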
Ext4 was declared stable in 2008 as the successor of Ext3. It is also a journaling file system (although journaling can be disabled manually) and supports volumes of up to one exabyte and individual files of up to 16 terabytes (with the common 4 KiB block size). Ext4 brought considerable improvements over Ext3 thanks to a number of modern techniques, such as:
Extents. The traditional block mapping scheme used by its predecessors stores an entry for every single block that makes up a file, which is very inefficient for large files: the mappings become enormous and require a lot of resources. Ext4 uses extents instead. An extent is, in essence, a run of contiguous physical blocks, so only the address of its first block and its length need to be stored, which helps to improve performance, reduce fragmentation and save storage space by reducing the amount of metadata.
Multiblock allocation. This technique allows many blocks to be allocated in a single operation instead of one block per operation, considerably reducing CPU overhead and making it easier to place a file’s blocks contiguously.
Journal checksumming. The journal is written to constantly, which makes it the most heavily used part of the disk and thus particularly prone to failures, and a corrupted journal may in turn corrupt the whole file system. Ext4 checksums the journal data so that corrupted journal blocks can be detected promptly, before they are replayed.
Persistent preallocation. This feature allows applications to reserve in advance the disk space they know they will need for their data. As all the needed blocks are allocated at once and as contiguously as possible, fragmentation is kept to a minimum and applications are guaranteed the space they have reserved.
Inode reservation. When a directory is created, several inodes are reserved for it to be used in the future. This improves performance and makes creating and deleting files more efficient.
Delayed allocation. Ext4 does not allocate disk blocks for new data immediately but keeps the data in memory until the file is actually written out to disk (after about 30-150 seconds). This reduces file fragmentation and improves performance but, on the other hand, can lead to data loss in the event of a power outage or server crash: if it happens during this delay, the file may end up empty, that is, zero bytes long.
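The extents feature described above can be illustrated by compressing a file’s list of physical block numbers into (first block, length) pairs. This is a simplified model with a hypothetical function name: real Ext4 extents also record the logical block offset within the file and are stored in an extent tree. The cap of 32768 blocks per extent is the Ext4 limit for an initialized extent.

```python
from typing import List, Tuple

MAX_EXTENT_LEN = 32768  # Ext4 limit for an initialized extent, in blocks


def to_extents(blocks: List[int]) -> List[Tuple[int, int]]:
    """Compress a file's physical block list into (first_block, length) extents."""
    extents: List[Tuple[int, int]] = []
    for block in blocks:
        if extents:
            start, length = extents[-1]
            # Extend the current extent if this block directly follows it.
            if block == start + length and length < MAX_EXTENT_LEN:
                extents[-1] = (start, length + 1)
                continue
        extents.append((block, 1))
    return extents


print(to_extents([100, 101, 102, 103, 500, 501, 7]))  # [(100, 4), (500, 2), (7, 1)]
# 100000 contiguous blocks need 4 extents instead of 100000 per-block entries.
print(len(to_extents(list(range(1000, 101000)))))     # 4
```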
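Multiblock allocation can likewise be sketched as a single call that scans a block bitmap once, finds a run of free blocks of the requested size and marks them all as used, instead of calling a one-block allocator repeatedly. This is a toy first-fit search over a list of booleans with a hypothetical function name; Ext4’s real multiblock allocator (mballoc) uses considerably more elaborate strategies.

```python
from typing import List, Optional


def alloc_run(bitmap: List[bool], count: int) -> Optional[int]:
    """Allocate `count` contiguous free blocks in one call; return the first, or None."""
    if count <= 0:
        raise ValueError("count must be positive")
    run_start, run_len = 0, 0
    for i, used in enumerate(bitmap):
        if used:
            run_len = 0  # an occupied block breaks the current run
            continue
        if run_len == 0:
            run_start = i
        run_len += 1
        if run_len == count:
            # Mark the whole run as occupied in one step.
            for j in range(run_start, run_start + count):
                bitmap[j] = True
            return run_start
    return None


bitmap = [True, False, True, False, False, False, True]
print(alloc_run(bitmap, 3))  # 3
print(alloc_run(bitmap, 2))  # None: no two adjacent free blocks remain
```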
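The idea behind journal checksumming can be sketched as follows: a checksum is computed when a block is written to the journal and verified before the block is trusted during recovery. The real Ext4 journal uses its own on-disk format and checksum algorithm; this sketch uses zlib.crc32 only because it ships with Python, and the function names are hypothetical.

```python
import zlib
from typing import List, Tuple

# A journal block paired with the checksum stored alongside it.
SealedBlock = Tuple[bytes, int]


def seal(data: bytes) -> SealedBlock:
    """Compute a checksum when the block is written to the journal."""
    return data, zlib.crc32(data)


def corrupted_blocks(journal: List[SealedBlock]) -> List[int]:
    """Return indices of journal blocks whose contents no longer match their checksum."""
    return [i for i, (data, checksum) in enumerate(journal) if zlib.crc32(data) != checksum]


journal = [seal(b"inode 12 update"), seal(b"block bitmap update")]
journal[1] = (b"block bitmap updatX", journal[1][1])  # simulate a damaged block
print(corrupted_blocks(journal))  # [1]
```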
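On Linux, applications request persistent preallocation through the fallocate(2) system call or the POSIX posix_fallocate(3) function; Python exposes the latter as os.posix_fallocate (Unix only). A minimal sketch, with a hypothetical helper name and file name:

```python
import os
import tempfile


def preallocate(path: str, size: int) -> None:
    """Reserve `size` bytes of disk space for `path` before any data is written."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Allocates the blocks up front; raises OSError if the space is unavailable.
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "recording.bin")  # hypothetical file
    preallocate(target, 16 * 1024 * 1024)             # reserve 16 MiB in advance
    print(os.path.getsize(target))                    # 16777216
```

Because the space is reserved at once, a later write into this range cannot fail for lack of free space, and the file system has a chance to lay the blocks out contiguously.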
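Applications that cannot afford the zero-length-file problem described under delayed allocation usually write to a temporary file, force it to disk with fsync, and then atomically rename it over the original. The following is a minimal sketch of that common pattern (the helper name and file name are hypothetical), not an Ext4-specific API:

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # same file system as the target
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force block allocation and write-out now
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        os.unlink(tmp_path)
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # make the rename itself durable
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as tmp_dir:
    config = os.path.join(tmp_dir, "settings.conf")  # hypothetical file
    durable_write(config, b"theme=dark\n")
    durable_write(config, b"theme=light\n")
    with open(config, "rb") as f:
        print(f.read())  # b'theme=light\n'
```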
It is important to know that, despite all of its strengths, Ext4, like its predecessors Ext2 and Ext3, has a serious drawback: data lost from these file systems has very low chances of being recovered, especially with the correct file names. The reason lies in the way files are deleted: the information in the file’s inode that indicates where its content is located simply gets wiped. The journal in Ext3 and Ext4 improves things slightly, but even with quality data recovery software you will never get a 100% result with these file systems.
Ext4 is the default file system on many Linux distributions (Debian and Ubuntu, for example), but if you need to determine exactly which file system your drive uses, you can run the df command with the -T option: $ df -T. The Type column shows the name of each mounted file system.
The performance, reliability and wide support of Ext make it extremely popular among Linux users. On the flip side, however, it comes with a fair share of disadvantages, so if it is vital that your data always remain safe, you may be better off staying away from this file system.