Types of Linux file systems and their features. Physical Organization ext2 Filesystem ext2 3

As with any UNIX file system, ext2 can be divided into the following components:

− blocks and groups of blocks;

− index descriptor;

− superblock.

The entire disk partition space is divided into fixed-size blocks that are multiples of the sector size: 1024, 2048, 4096, or 8192 bytes. The block size is specified when creating a file system on a disk partition. A smaller block size saves hard disk space, but also limits the maximum file system size. All blocks have serial numbers. In order to reduce fragmentation and the number of hard disk head movements when reading large data arrays, blocks are combined into block groups.

The basic concept of a file system is an inode, or inode (information node). This is a special structure that contains information about the attributes and physical location of a file. The inodes are organized into a table that is contained at the beginning of each group of blocks.

Figure 10 - Generalized block diagram of FS ext2

The superblock is the main element of the ext2 file system. It contains general information about the file system:

the total number of blocks and inodes in the file system,

the number of free blocks and inodes in the file system,

file system block size,

the number of blocks and inodes in a block group,

inode size,

file system identifier.

The superblock is 1024 bytes from the start of the section. The health of the file system directly depends on the integrity of the superblock. The operating system creates several backup copies of the superblock in case the partition is corrupted. The next block after the superblock contains the global descriptor table - a description of block groups, which is an array containing general information about all groups of blocks in the section.

All blocks on an ext2 partition are divided into block groups. For each group, a separate entry is created in the global descriptor table, which stores the main parameters:

block number in the block bitmap,

the block number in the inode bitmap,

block number in the inode table,

the number of free blocks in the group,

the number of inodes containing directories.

A block bitmap is a structure, each bit of which indicates whether the corresponding block is assigned to any file. If the bit is 1, then the block is busy. A similar function is performed by the inode bitmap, which shows which inodes are occupied and which are not. Linux kernel, using the number of inodes containing directories, tries to evenly distribute directory inodes among groups, and tries to move file inodes to the group with the parent directory if possible. All remaining space, indicated in the table as data, is reserved for storing files.

File system ext2 uses the following file block addressing scheme. To store the file address, 15 fields are allocated, each of which consists of 4 bytes. If the file fits in 12 blocks, then the numbers of the corresponding clusters are directly listed in the first twelve fields of the address. If the file size exceeds 12 blocks, then the next field contains the address of the cluster in which the numbers of the next blocks of the file can be located. Thus, the 13th field is used for indirect addressing.

With a maximum block size of 4096 bytes, the cluster corresponding to the 13th field can contain up to 1024 next block numbers in the file. If the file size exceeds 12+1024 blocks, then the 14th field is used, which contains the address of a cluster containing 1024 cluster numbers, each of which refers to 1024 blocks of the file. Here double indirect addressing is used. Finally, if the file contains more than 12+1024+1048576 blocks, then the last 15th field is used for triple indirection.

This system addressing allows, with a maximum block size of 4096 bytes, to have files larger than 2 TB.

ext3 or ext3fs is a journaling file system used in operating systems based on the Linux kernel. Based on FS ext2.

The main difference from ext2 is that ext3 is journaled, that is, it provides for writing some data that allows you to restore the file system if the computer crashes.

The standard provides three logging modes:

writeback: only the metadata of the file system, that is, information about its change, is written to the journal. Cannot guarantee data integrity, but already noticeably reduces verification time compared to ext2;

ordered: the same as writeback, but the data is written to the file before writing information about changes to this file. Slightly reduces performance, also cannot guarantee the integrity of the data (although it increases the likelihood of their safety when appending to the end of an existing file);

journal: Full journaling of both FS metadata and user data. The slowest but also the most safe mode; can guarantee data integrity when storing the log on a separate partition (or better, on a separate hard disk).

The ext3 file system can support files up to 1 TB in size. With the Linux kernel 2.4, the size of the file system is limited by the maximum block device size, which is 2 terabytes. In Linux 2.6 (for 32-bit processors), the maximum block device size is 16 TB, however ext3 only supports up to 4 TB.

ext4 is a file system based on and compatible with ext3 (forward and backwards). It differs from ext3 in the support of extents, groups of contiguous physical blocks managed as a whole; increased integrity check speed and a number of other improvements.

New ext4 features (compared to ext3):

Using extents. In the ext3 filesystem, data was addressed in the traditional way, block by block. This addressing method becomes less efficient as the file size grows. Extents allow you to address a large number (up to 128 MB) of contiguous blocks with a single descriptor. Up to 4 extent pointers can be placed directly in the inode, which is sufficient for small to medium sized files.

48-bit block numbers. With a block size of 4K, this allows addressing up to one exabyte (2 48 *4KB = 2 50 *1KB = 2 60 B = 1 EB).

Allocation of blocks in groups (multiblock allocation). The file system stores not only information about the location of free blocks, but also the number of free blocks that follow each other. When allocating space, the file system finds a fragment into which data can be written without fragmentation. This reduces the level of fragmentation of the file system as a whole.

Delayed allocation of blocks. The allocation of blocks for storing file data occurs immediately before the physical write to disk (for example, when calling sync), and not when calling write. As a result, block allocation operations can be done not one at a time, but in groups, which in turn minimizes fragmentation and speeds up the block allocation process. On the other hand, it increases the risk of data loss in the event of a sudden power failure.

The limit of 32000 directories has been exceeded.

Reservation of inodes when creating a directory (directory inodes reservation). When creating a directory, several inodes are reserved. Subsequently, when creating files in this directory, reserved inodes are used first, and if there are none left, the usual procedure is followed.

inode size. Inode size (default) increased from 128 bytes to 256 bytes. This made it possible to realize the benefits listed below.

Timestamps with nanosecond precision (nanosecond timestamps). Higher precision of times stored in inode. The range of stored times has also been expanded: if earlier the upper limit of the stored time was January 18, 2038, now it is April 25, 2514.

inode version. The inode has a number that increases each time the inode of the file is changed.

Storing extended attributes in an inode (EA in inode). Storing extended attributes such as ACLs, SELinux attributes, and so on can improve performance. Attributes for which there is not enough inode space are stored in a separate 4KB block.

Journal checksumming. Checksums of log transactions. Allows you to better find and (sometimes) fix errors when checking system integrity after a crash.

Preliminary allocation (persistent preallocation). Now, in order for the application to be guaranteed to take up space in the file system, it fills it with zeros. In ext4, it became possible to reserve many blocks for writing and not spend too much time on initialization. If the application tries to read the data, it will receive a message that it has not been initialized. Thus, unauthorized reading of deleted data will not work.

Defragmentation without unmounting (online Defragmentation).

Uninitialized blocks. Allows you to speed up the file system check. Blocks marked as unused are checked in groups, and a detailed check is performed only if the check of the group has shown that there is damage inside.

Lecture 12

Subject: Directory systems

The link between the file management system and the set of files is file directory. The simplest form of a directory system is that there is one directory that contains all the files. The directory contains information about files, including attributes, location, ownership. Users refer to files by symbolic names. However, the capacity of human memory limits the number of object names that a user can refer to by name. The hierarchical organization of the namespace allows you to significantly expand these boundaries. That is why directory systems have hierarchical structure. A graph describing a directory hierarchy can be a tree or a network. Directories form a tree if a file is allowed to be in only one directory (Figure 7.11), and a network if a file can be in more than one directory.

For example, in Ms-Dos and Windows directories form a tree structure, and in UNIX - a network structure. In general computing system can have several disk devices, even a PC always has several disks: flexible, hard drive, CD-ROM (DVD). How to organize file storage in this case?

Rice. Directory systems

The first solution is to host an offline file system on each device, i.e. files on this device are described by a directory tree that has nothing to do with directory trees on other devices. In this case, to uniquely identify the file, the user must specify the logical device identifier along with the compound symbolic file name. An example of such autonomous existence is MS-DOS, Windows 95/98/Me/XP.

Another solution is to organize file storage in such a way that the user is given the opportunity to combine file systems located on different devices, into a single file system described by a single directory tree. This operation is called mounting.

On UNIX, mounting is done as follows. Among all the available logical disk devices, one stands out, called the system. Let there be two file systems located on different logical disks, and one of the disks is a system one (Fig. 7.12).

The file system located on system drive, is called the root. To link the file hierarchies in the root file system, some existing directory is selected, in this example the loc directory. Once the mount is complete, the selected loc directory becomes the root directory of the second file system. Through this directory, the mounted file system is connected as a subtree to the general tree.

Rice. Mounting

Attribute

The concept of a file includes not only the data it stores and its name, but also information that describes the properties of the file. This information constitutes the attributes of the file. The list of attributes may be different in different operating systems. An example of possible attributes is shown below.

Attribute	Meaning
File type	Regular, catalog, special, etc.
File owner	Current owner
File Creator	ID of the user who created the file
Password	Password to access the file
Time	Creation, last access, last change
Current file size	Number of bytes per entry
Maximum size	The number of bytes to which the file size can be increased
Read-only flag	0 - read / write, 1 - read only
Flag "hidden"	0 - normal, 1 - do not show in the catalog file list
Flag "system"	0 - normal, 1 - system
Flag "archived"	0 - archived, 1 - archive required
ASCII flag/binary	0 - ASCII, 1 - binary
Random access flag	0 - sequential access only, 1 - random access
Flag "temporary"	0 - normal, 1 - deletion after the end of the process
Key position	Offset to the key in the entry
Key length	Number of bytes in the key field

The user can access the attributes using the facilities provided for this purpose by the file system. It is usually allowed to read the value of any attributes, and change only some.

File attribute values can be contained in directories, as is done, for example, in MS-DOS (Figure 7.7). Another option is to place the attributes in special tables, in which case the directories contain links to these tables.

Rice. 7. MS DOS File Attributes

ext2(also referred to as ext2fs) - Second Extended File System (Second Extended File System) is a file system built on the Linux kernel. The creator and developer of ext2 is Remy Card. The ext2 file system was built by him to replace the old one, previous version- ext.

In terms of such indicators as speed and performance, this file system can serve as a benchmark. This is evidenced by the results of file system performance tests. For example, in the Dell Tech Center's sequential read/write speed tests, ext2 outperforms ext3, and only outperforms the more modern ext4 in read speed.

The main disadvantage of ext2 is that it is not a journaling filesystem. However, this shortcoming was eliminated in the next file system - ext3.

ext2 is used on flash cards and solid state drives(SSD) since the lack of journaling is an advantage when working with drives with write cycle limits.

History of ext2

At the time of the rapid development of the Linux system, it used the Minix OS file system. It was quite stable, but it was 16-bit. As a result, it had a hard limit of 64 Mb per partition. In addition, there was a limit on the maximum length of a file name, which was 14 characters.

These limitations combined were the reason for the development of the "extended file system" (hence the term " Extended File System»). She was given the task of solving two of Minix's key problems. The new file system was made public in April 1992. It was Ext, which extended the file size limit to 2 gigabytes and set the file name limit to 255 characters.

However, despite the success of the new file system, there were still quite a few unsolved problems. For example, there was no support for separate access, there were no timestamps for data modification. The need to solve these problems was the motive for creating the next version of the extended ext2 file system (" Second Extended File System). ext2 was developed in January 1993 and also implemented POSIX-compliant ACLs and extended file attributes.

Logical organization of ext2

The ext2 directory hierarchy graph is represented as a network. This is due to the fact that one file can be included in several directories at once.

All file types have symbolic names. In hierarchically organized file systems, three types of names are generally used: simple, compound, and relative. Same with ext2. In the case of a simple name, the limitation is that its length should not exceed 255 characters, in addition, the name should not contain a NULL character and a slash.

As for the NULL character, the limitations are related to the representation of strings in the C language, in the case of the slash character, everything lies in the fact that it is used as a separator character between directories.

The fully qualified name is a chain of simple symbolic names of all directories through which the path passes from the root to given file. In ext2, a file can be in multiple directories, which means it can have multiple full names (one file, multiple full names). But anyway, the full name defines the file.

ext2 attributes:

file type and permissions,
owner, access group,
information on authorized transactions,
creation time, last access date, last modification date and last deletion time,
current file size,
file specification:

regular file,
catalog,
byte-oriented device file,
block device file,
named pipe,
symlink,

number of blocks occupied,
others

File attributes are stored in special tables, not in directories, as is common in simple file systems. As a result, the catalog has a very simple structure, consisting of two parts: an inode number and a name.

Physical organization of ext2

Disk partition structure

As part of ext2, the following can be distinguished:

blocks and groups of blocks;
inode;
superblock.

The entire disk partition space is divided into fixed-size blocks, the blocks being a multiple of the sector size (1024, 2048, 4096 or 8192 bytes). The block size is specified when creating a file system on a disk partition. All blocks are assigned serial numbers. To reduce fragmentation and head movement hard drive when reading large data arrays, blocks are combined into groups.

The basic concept of a file system is an inode (also called inode - information node). This is a special structure containing information about the attributes and physical location of a file. The index decryptors are combined into a table contained at the beginning of each group of blocks. The superblock is the main element of the ext2 file system. It contains general information about the file system. The superblock is located 1024 bytes from the start of the section. The integrity of the superblock determines the health of the file system. The OS creates several backup copies of the superblock - in case the partition is damaged. The next block after the superblock contains a global descriptor table - a description of block groups in the form of an array with general information about all block groups.

Block group

All blocks on an ext2 partition are divided into groups. A separate entry is created for each group in the global descriptor table. This entry stores the basic parameters, such as: the block number in bitmaps and tables, the number of free blocks in the group, the number of inodes containing directories.

Block bitmap is a system in which each bit informs whether the block corresponding to it is assigned to any file. If the bit is 1, then the block is busy. An inode bitmap performs a similar function: it shows which inodes are occupied and which are not. The Linux kernel tries to evenly distribute directory inodes into groups, and moves file inodes to the group with the parent directory. All remaining space, which appears in the table as data, is allocated for storing files.

Data Addressing System

The data addressing system is one of the most serious and key components of the file system. Thanks to it, the desired file is located among the many empty or busy blocks on the disk.

ext2 uses the following file block addressing scheme. To store the file address, 15 fields are allocated, each of which consists of 4 bytes. If the file fits in 12 blocks, then the numbers of the corresponding clusters are listed in the first twelve fields of the address. If the file size exceeds 12 blocks, then the next field contains the address of the cluster in which the numbers of the next blocks of the file can be located. Thus, the thirteenth field is used for indirect addressing.

With a maximum block size of 4096 bytes, the cluster corresponding to the 13th field can contain up to 1024 next block numbers in the file. If the file size exceeds 12+1024 blocks, then the 14th field is used, which contains the address of a cluster containing 1024 cluster numbers, each of which refers to 1024 blocks of the file. Here double indirect addressing is used. And if the file includes more than 12 + 1024 + 1048576 blocks, then the last 15th field is used for triple indirect addressing.

Such an addressing system allows, with a maximum block size of 4096 bytes, to have files larger than 2 TB.

Consider the logical structure of the ext2fs file system. Physically HDD divided into sectors of 512 bytes. The first sector of a disk partition in any file system is considered the boot area. In the primary partition, this area contains a boot entry, a piece of code that initiates the boot process of the operating system at startup. This area is not used on other sections. The remaining sectors are combined into logical blocks of 1, 2 or 4 kilobytes. A logical block is the smallest addressable chunk of data: each file's data occupies an integer number of blocks. Blocks, in turn, are combined into groups of blocks. Block groups and blocks within a group are numbered sequentially, starting from 1.

The data structures used when working with the ext2fs file system are described in the /usr/include/linux/ext2fs .h header file.

The superblock serves as the starting point of the file system and stores all

information about her. It has a size of 1024 bytes and is located at an offset of 1024 bytes from the beginning of the file system. In each group of blocks, it is duplicated, which allows you to quickly restore it after failures. The superblock determines the size of the file system, the maximum number of files in the partition, the amount of free space, and contains information about where to look for unallocated areas. When the OS starts, the superblock is read into memory, and all changes to the file system are first reflected in the copy of the superblock located in the operating system, and are written to disk only periodically. This improves system performance because many users and processes are constantly updating files. On the other hand, when the system is stopped, the superblock must be written to disk, which does not allow you to turn off the computer by simply turning off the power. Otherwise, at the next boot, the information recorded in the sunerblock will not be

corresponding to the actual state of the file system.

The description (descriptor) of the group of blocks follows the superblock. The information stored in it allows you to find block and inode bitmaps, as well as an inode table.

A block bitmap is a structure, each bit of which indicates whether a block of the same number is assigned to any file. A value of 1 indicates that the block is busy. This map is used to search for free blocks in cases where you need to allocate space for a file..

The inode bitmap performs a similar function to the inode table: it shows which inodes are busy.

Each file has one and only one inode (inode, i-node, information node), which is identified by its serial number - the file index. The inode contains the file's metadata. Among them are all attributes of the file, except for its name, and a pointer to the file's data.

For a regular file or directory, this pointer is an array of 15 block addresses. The first 12 addresses in this array are direct links to the block numbers where the file data is stored. If the data does not fit into 12 blocks, then the indirect addressing mechanism is activated. The next address in this array is an indirect link, that is, the address of a block that stores a list of addresses of the next blocks with data from this file.

How many blocks of data can be addressed like this? The block address takes 4 bytes, the block, as already mentioned, is 1, 2 or 4 kilobytes. This means that 256 - 1024 blocks can be placed by indirect addressing.

What if the file is even longer? The next address in the pointer array points to the double indirection block. (double block indirect). This block contains a list of block addresses, which in turn contain lists of addresses of the next data blocks.

And finally, the last address in the pointer array specifies the address of a triple indirection block, that is, a block with a list of block addresses that are double indirection blocks.

So far, it remains unclear where the file name is located if it is neither among the file data nor among its metadata. In UNIX-like systems, the filename is not an attribute of the file itself, but of the file system, understood as a logical directory structure. The file name is stored only in the directory to which the file is assigned, and nowhere else. Curious consequences follow from this.

First, any number of names assigned to different directories can correspond to one inode, and all of them are real. The number of names (hard links) is counted in the inode. This is the amount you can see with the Is -1 command.

Second, deleting a file simply means deleting its entry from the directory data and decrementing the link count by 1.

Thirdly, a name can only be matched to an inode number within the same file system, which is why you cannot create a hard link to another file system (symbolic - you can, it has a different storage mechanism).

The directory itself is likewise assigned to its parent directory. The root directory is always written to inode number 2 (number 1 is reserved for the bad block address list). Each directory stores a link to itself and to its parent directory - these are the pseudo-subdirectories "." And "..".

Thus, the number of links to a directory is equal to the number of its subdirectories plus two.

The directory data is a linked list with variable length entries and looks something like this:

Directory structure in ext2fs

What about physical device files? They can be located in the same directories as regular files: there is no data in the directory that indicates whether the name belongs to a file on disk or a device. The difference is at the inode level. If the i-node of a regular file points to the disk blocks where its data is stored, then the i-node of the device file contains a pointer to the list of device drivers in the kernel - the element of the list that corresponds to the major device number:

Difference between regular file and device file

Properties of the ext2fs file system:

The maximum file system size is 4 TB.

The maximum file size is 2 GB.

The maximum length of a file name is 255 characters.

The minimum block size is 1024 bytes.

The number of allocated inodes is 1 per 4096 bytes of the partition.

VLADIMIR MESHKOV

ext2 file system architecture

The article discusses the logical structure of ext2, the file system of the Linux operating system.

Basic components of an ext2 file system

As in any UNIX file system, the following components can be distinguished in the ext2 file system:

blocks and groups of blocks;
information node (information node);
superblock.

Blocks and block groups

The entire disk partition space is divided into fixed-size blocks, multiples of the sector size - 1024, 2048 and 4096 bytes. The block size is specified when creating a file system on a hard disk partition. A smaller block size saves hard disk space, but also limits the maximum file system size. All blocks have serial numbers. In order to reduce fragmentation and the number of hard disk head movements when reading large data arrays, blocks are combined into groups.

Information node

The basic concept of a file system is an information node, information node, or inode. This is a special structure that contains information about the attributes and physical location of a file. File attributes are its type (regular file, directory, etc.), access rights to it, owner ID, size, creation time. The physical location information is a sequence of absolute block numbers containing file data.

Superblock

The superblock is the main element of the ext2 file system. It contains the following information about the file system (the list is not complete):

the total number of blocks and inodes in the file system;
the number of free blocks and inodes in the file system;
file system block size;
the number of blocks and inodes in the group;
inode size;
file system identifier;
number of the first data block.

In other words, this is the number of the block containing the superblock. This number is always 0 if the file system block size is greater than 1024 bytes, and 1 if the block size is 1024 bytes.

The health of the file system directly depends on the integrity of the superblock. The operating system creates several backup copies of the superblock so that it can be restored in case of damage. The master copy is located at offset 1024 bytes from the beginning of the partition on which the file system was created (the first 1024 bytes are reserved for the operating system loader).

Early versions of the ext2 filesystem created superblock copies at the beginning of each block group. This resulted in big losses. disk space, so later the number of superblock backups was reduced, and block groups 0, 1, 3, 5, and 7 were allocated to accommodate them.

Block group format

A generalized block diagram of the ext2 file system is shown in fig. 1.

Almost all block groups have the same format. In each group, in addition to information blocks, information about the occupancy of blocks and inodes of the group is stored in the form of a bitmap. Block group 0 also includes a super block and a group descriptor table, which we will discuss below.

The block busy bitmap is usually located in the first block of the group. If there is a backup superblock in the group, the bitmap is located in the second block of the group. The bitmap size is one block. Each bit of this map indicates the state of the block. If the bit is set (1), then the block is busy; if it is reset (0), the block is free. The first block of the group corresponds to the zero bit of the map, the second block corresponds to the first bit, and so on.

Inodes that are within the same group are collected in a table. In a group's inode occupancy bitmap, each bit represents the state of an element in the group's inode table.

Each block group is described by a block group descriptor. A group descriptor is a structure that contains information about the addresses of the block busy bitmap, busy inode bitmap, and inode table of the corresponding group. All group descriptors are collected in a group descriptor table, which is stored in block group 0. Same as for the superblock, operating system creates backups group descriptor tables.

File Reading Algorithm

Each inode, like a block, has a sequence number that is unique within the file system and contains information about only one file. Thus, in order to access the contents of a file, it is necessary to know the ordinal number of the inode corresponding to it.

As mentioned above, information about the physical location of the file is contained in the inode. This information is a sequence of 32-bit block numbers containing the file data (Figure 1). The first 12 numbers are direct links to information blocks (direct blocks number). The 13th number is an indirect block number. It contains the address of the block, which stores the addresses of information blocks. The 14th number is a double indirect link (double blocks number), the 15th number is a triple indirect link (triple blocks number).

The file name is not part of the inode, the mapping between file names and inode sequence numbers is done through directories.

Catalogs

Files on UNIX and POSIX systems are stored in a tree-like hierarchical file system. The root of the file system is the root directory, denoted by the "/" symbol. Each intermediate node in the file system tree is a directory. The leaf nodes of the filesystem tree are either empty directories or files. The absolute pathname of a file consists of the names of all directories leading to the specified file, starting from the root directory. For example, the pathname /home/test.file means that the file test.file is located in the home directory, which in turn is located in the root "/" directory.

A directory, like a file, is described with an inode. The contents of a directory is an array of entries, each containing information about a file that is "inside" the current directory.

The directory entry has the following format:

the inode number of the file;
record length in bytes;
file name;
filename length.

Searching for the inode number of a file always starts at the root directory. For example, to get the inode number of a file located in the root directory, the operating system must get the contents of the root directory, find an entry in it with the name of this file, and extract the file's inode number from this entry.

The first few inodes are reserved by the file system and are listed in the header file:

*Special inode numbers

#define EXT2_BAD_INO 1 /* Bad blocks inode */

#define EXT2_ROOT_IN 2 /* Root inode */

#define EXT2_ACL_IDX_IN 3 /* ACL inode */

#define EXT2_ACL_DATA_INO 4 /* ACL inode */

#define EXT2_BOOT_LOADER_INO 5 /* Boot loader inode */

#define EXT2_UNDEL_DIR_INO 6 /* Undelete directory inode */

Inode number 2 (root inode) is reserved for the root directory entry. This inode is in block group 0 and is the second position in the group's inode table. The number of the first unreserved inode is stored in the super block.

Having determined the sequence number of the file's inode, the kernel calculates the number of the group in which this inode is located and its position in the group's inode table. By reading from this inode position, the operating system obtains complete information about the file, including the addresses of the blocks in which the contents of the file are stored.

The block group number in which the inode is located is calculated by the formula:

group = (inode_num - 1) / inodes_per_group

Where:

group– desired block group number;
inode_num– sequence number of the inode that defines the file;
inodes_per_group– the number of inodes in the group (this information is in the superblock).

The position of the inode in the group's inode table is determined by the formula:

index = (inode_num - 1) % inodes_per_groupe

where index is the inode position in the table.

Consider an example of getting the contents of the test.file file located in the root directory. To read the /test.file file, you need to:

find an entry about this file in the array of entries in the root directory;
extract the sequence number of the inode of the file, calculate the number of the group in which this inode is located;
extract the group's inode table address from the group's descriptor;
calculate the inode position in this table;
read the inode of a file;
extract the addresses of information blocks from the inode and read the information contained in these blocks.

On fig. Figure 2 shows the steps for reading the /test.file in detail.

Steps 1-6 - reading the root directory:

From block group 0, the group descriptor table is read.
The block group descriptor 0 is retrieved from the group descriptor table and the address of the group 0 inode table is read from it.
The inode table is read from block group 0.
The root directory's inode number is fixed at 2, so the second element is read from the group 0 inode table, which contains the address of the block containing the contents of the root directory. Suppose this block is located in block group A.
From block group A, the block containing the root directory entries is read.
The entry named "test.file" is searched for. If such an entry is found, the inode number of "test.file" is retrieved from it.

By specifying the inode number, the information blocks of the file can be accessed (steps 7-11):

The number of the group the given inode is in and its position in the group's inode table are calculated (assuming the group number is B and the position in the table is X).
From the group descriptor table, we retrieve the block group B descriptor, and read the address of the inode table of this block group from it.
The inode table is read from block group B.
From the inode table of block group B, the inode at position X is read.
From the read inode, the block addresses with the contents of the /test.file file are extracted and information is read from the block with the specified address.

Software implementation of the file reading algorithm

Initial data: there is a hard disk partition on which the ext2 file system is created. This partition corresponds to the /dev/hda3 device file. The home subdirectory has been created in the root directory of the partition, and it contains the test.file file with the following content:

Would citrus live in the thickets of the south?

Yes, but a fake copy!

1234567890-=

Do not think bad, this is not nonsense, but a test exercise from the training course for telegraph operators in the signal troops of the former USSR!

Attention! One should take into account important point. The created file will not be immediately written to the disk, but will first go to the disk buffer. An attempt to immediately get the contents of the file using the above algorithm will not lead to anything, since there is no information about this file physically on the disk. You need to "force" the system to write the disk buffer to disk. The easiest way to do this is to perform a reboot operation. Therefore, after the file is created, reboot the system.

Our task is to use the /dev/hda3 device file to read the /home/test.file file by directly accessing its information blocks.

Consider the software implementation of the module that performs this operation.

Header files:

#include

The header file defines structural types that describe the main components of the ext2 file system - superblock, block group descriptor, information node, directory entry.

Let's briefly consider the fields that are included in each of these structures:

Superblock structure struct ext2_super_block:
- __u32 s_inodes_count is the total number of inodes in the file system;
- __u32 s_blocks_count is the total number of blocks in the file system;
- __u32 s_free_blocks_count– number of free blocks;
- __u32 s_free_inodes_count– number of free inodes;
- __u32 s_first_data_block– number of the first data block (number of the block in which the superblock is located);
- __u32 s_log_block_size- This value is used to calculate the block size. The block size is determined by the formula: block size = 1024<< s_log_block_size;
- __u32 s_blocks_per_group– the number of blocks in the group;
- __u32 s_inodes_per_group– number of inodes in the group;
- __u16 s_magic– identifier of the ext2 file system (signature 0xEF53);
- __u16 s_inode_size– size of information node (inode);
- __u32s_first_ino is the number of the first unreserved inode.
Block group descriptor structure struct ext2_group_desc:
- __u32 bg_block_bitmap– bitmap of group block occupancy;
- __u32 bg_inode_bitmap– group inode busy bitmap;
- __u32 bg_inode_table– the address of the group's inode table.
Information node structure struct ext2_inode:
- __u16 i_mode - file type and access rights to it. The file type is determined by bits 12-15 of this field:
  - 0xA000– symbolic link;
  - 0x8000– regular file;
  - 0x6000– block device file;
  - 0x4000– directory;
  - 0x2000– character device file;
  - 0x1000– FIFO channel.
- __u32 i_size– size in bytes;
- __u32 i_atime– time of the last access to the file;
- __u32 i_ctime– file creation time;
- __u32 i_mtime– time of last modification;
- __u32 i_blocks– the number of blocks occupied by the file;
- __u32 i_block– addresses of information blocks (including all indirect references).

The value of EXT2_N_BLOCKS is defined in the file:

* Constants relative to the data blocks

#define EXT2_NDIR_BLOCKS 12

#define EXT2_IND_BLOCK EXT2_NDIR_BLOCKS

#define EXT2_DIND_BLOCK (EXT2_IND_BLOCK + 1)

#define EXT2_TIND_BLOCK (EXT2_DIND_BLOCK + 1)

#define EXT2_N_BLOCKS(EXT2_TIND_BLOCK + 1)

Directory entry structure struct ext2_dir_entry_2:

#define EXT2_NAME_LEN 255

__u32 inode– file inode number;
__u16 rec_len– length of the directory entry;
__u8 name_len– length of the file name;
character name – file name.

Let's determine the name of the partition on which the file system was created, global structures and variables.

#define PART_NAME "/dev/hda3"

struct ext2_super_block sb;

/* buffer to store group descriptor table */

unsigned char buff_grp;

unsigned char buff; /* information buffer */

int indev; /* device file descriptor */

int BLKSIZE; /* filesystem block size */

Let's define a few functions that we need to work:

Superblock read function:

void read_sb()

memset(&sb,0,1024);

We shift 1024 bytes from the beginning of the section and read the superblock into the struct ext2_super_block sb structure:

If(lseek(indev,1024,0)< 0) {

error("lseek");

Exit(-1);

If(read(indev,(char *)&sb,sizeof(sb))< 0) {

error("read");

Exit(-1);

Checking the file system ID:

If(sb.s_magic != EXT2_SUPER_MAGIC) (

Printf("Unknown file system type!");

Exit(-1);

The EXT2_SUPER_MAGIC value is defined in the header file.

We display information about the file system that is in the superblock:

printf(" Superblock info ----------- ");

Printf("Inodes count - %u ",sb.s_inodes_count);

Printf("Blocks count - %u ",sb.s_blocks_count);

Printf("Block size - %u ",1024<< sb.s_log_block_size);

Printf("First inode - %d ",sb.s_first_ino);

Printf("Magic - 0x%X ",sb.s_magic);

Printf("Inode size - %d ",sb.s_inode_size);

Printf("Inodes per group - %u ",sb.s_inodes_per_group);

Printf("Blosks per group - %u ",sb.s_blocks_per_group);

Printf("First data block - %u ",sb.s_first_data_block);

return;

Group descriptor table read function:

void read_gdt()

Calculate the block size of the file system:

BLKSIZE = 1024<< sb.s_log_block_size

The group descriptor table is located in the block immediately following the first data block (behind the superblock).

Reading the table:

If(lseek(indev, (sb.s_first_data_block + 1) * BLKSIZE, 0)< 0) {

error("lseek");

Exit(-1);

If(read(indev,buff_grp,BLKSIZE)< 0) {

error("read");

Exit(-1);

return;

The function to get the contents of an inode by its number:

void get_inode(int inode_num, struct ext2_inode *in)

The input parameters of the function are the inode number and the struct ext2_inode structure.

Struct ext2_group_desc gd;

U64 group, index, pos;

We calculate the number of the group of blocks in which the inode with the ordinal number inode_num is located:

Group = (inode_num - 1) / sb.s_inodes_per_group;

Extract the group descriptor from the group descriptor table and copy it to the struct ext2_group_desc gd structure:

Memset((void *)&gd, 0, sizeof(gd));

Memcpy((void *)&gd, buff_grp + (group * (sizeof(gd))), sizeof(gd));

We calculate the position of the inode with the ordinal number inode_num in the inode table of the group group and read this inode into the struct ext2_inode structure:

index = (inode_num - 1) % sb.s_inodes_per_group;

Pos = ((__u64)gd.bg_inode_table) * BLKSIZE + (index * sb.s_inode_size);

Preread64(indev, in, sb.s_inode_size, pos);

return;

Data Block Read Function:

void read_iblock(struct ext2_inode *in, int blk_num)

U64pos;

The input parameters of the function are the inode structure and the block number (meaning the number from the sequence of address blocks located in the inode).

We calculate the offset to the information block on the partition and read this block into the global buff buffer:

Pos = ((__u64)in->i_block) * BLKSIZE;

Preread64(indev, buff, BLKSIZE, pos);

return;

The function to get the contents of the root directory:

void get_root_dentry()

Struct ext2_inode in;

The root directory inode number is known, so we get the contents of the root directory inode and read its contents into the buff buffer:

get_inode(EXT2_ROOT_INO, &in);

Read_iblock(&in, 0);

The buff buffer will contain the contents of the root directory.

return;

Function to get inode number from file name:

int get_i_num(char *name)

The input parameters of the function are the file name. The return value is the inode number of the file.

int i = 0, rec_len = 0;

Struct ext2_dir_entry_2dent;

The buff buffer contains an array of directory entries. To determine the inode number of a file, you need to find an entry in this array with the name of this file:

For(;i< 700; i++) {

Memcpy((void *)&dent, (buff + rec_len), sizeof(dent));

If(!memcmp(dent.name, name, dent.name_len)) break;

Rec_len += dent.rec_len;

Return dent.inode;

Now let's write the main function:

int main()

Variables and structures:

struct ext2_inode in;

// absolute pathname of the file

Unsigned char *full_path = "/home/test.file";

unsigned char buff1;

Static int i = 1;

int n, i_num, outf, type;

The first character in an absolute pathname of a file must be a forward slash (/). Let's check it out:

If(full_path != "/") (

error("slash");

Exit(-1);

Open the device file, read the superblock and group descriptor table:

indev = open(PART_NAME,O_RDONLY);

if(indev< 0) {

error("open");

Exit(-1);

Read_sb();

Read_gdt();

Get the contents of the root directory:

get_root_dentry();

The buff buffer now contains all entries in the root directory (you can save them in a separate file if you like). Now that we have root directory entries, we can get to the contents of test.file using the file reading algorithm above. To this end, we organize a cycle. In the body of the loop, we will parse the absolute path name of the file, highlighting its elements - subdirectories (we have one, home) and the name of the desired file (test.file). For each element, we determine the ordinal number of the inode, read this inode and then get the contents of the zero block (from the sequence of address blocks located in the inode):

while(1) (

memset(buff1,0,sizeof(buff1));

For(n = 0 ; n< EXT2_NAME_LEN; n++, i++) {

Buff1[n] = full_path[i];

If((buff1[n] == "/") || (buff1[n] == "?")) (

I++;

break;

buff1[n] = "?";

For each element of the absolute pathname of the file, we determine the sequence number of the inode, read this inode into memory, and then get the contents of the zero block:

I_num = get_i_num(buff1);

Get_inode(i_num, &in);

Read_iblock(&in, 0);

Let's display information about the file (name, inode number, file size and type):

Printf("Inode number - %u ", i_num);

Printf("File name - %s ", buff1);

Printf("File size - %u ",in.i_size);

The file type is determined by the upper four bits of the i_mode field of the struct ext2_inode structure:

type = ((in.i_mode & 0xF000) >> 12);

Printf("Type - %d ",type);

switch(type) (

Case(0x04) :

Printf("(directory)");

break;

Case(0x08) :

Printf("(regular file)");

break;

Case(0x06) :

Printf("(block device file)");

break;

Case(0x02) :

Printf("(char device file) ");

break;

default:

Printf("(unknown type)");

break;

Checking the file type. If this is a regular file, we break the loop:

If(type & 0x08) (

The buff buffer will contain the information read from the information blocks of the /home/test.file file. Let's write this information to a file:

Outf = open("out",O_CREAT|O_RDWR,0600);

Write(outf, buff, sizeof(buff));

close(outf);

break;

Exit:

close(indev);

return 0;

This concludes our consideration of the logical structure of the ext2 file system.

We will now describe the most popular Linux disk file system, ext2. The first release of Linux used the MINIX 1 file system, which had short file names and a maximum file size of 64 MB. The MINIX 1 file system was eventually replaced by the first extended file system, ext, which allowed for longer file names and larger file sizes. Due to its low efficiency (in terms of performance), the ext system was replaced by its successor ext2, which is still widely used today.

The disk partition with ext2 contains the file system shown in fig. 10.17 layout. Block 0 is not used by the Linux system and contains the computer's boot code. Following block 0, the disk partition is divided into groups of blocks (ignoring disk cylinder boundaries). Each group is organized as follows.

The first block is a superblock, which stores information about the layout of the file system, including the number of i-nodes, the number of disk blocks, the beginning of the list of free disk blocks (usually several hundred elements). This is followed by a group descriptor containing information about the location of the bitmaps, the number of free blocks and i-nodes in the group, and the number of directories in the group. This information is important because the ext2 filesystem attempts to distribute directories evenly across the disk.

The two bitmaps keep track of free blocks and free i-nodes (this is also inherited from the MINIX 1 file system and distinguishes it from most UNIX file systems, which use a list for free blocks). The size of each bitmap is one block. With a block size of 1 KB, this scheme limits the block group size to 8192 blocks and 8192 i-nodes. The first number is a real limit, and the second is practically none. With 4 KB blocks, the numbers are four times larger.

Then the i-nodes themselves are located. They are numbered from 1 to some maximum. Each i-node is 128 bytes in size and describes exactly one file. The i-node contains accounting information (including everything returned by the stat call, which simply takes it from the i-node), as well as enough information to locate all disk blocks that contain file data.

The i-nodes are followed by data blocks. All files and directories are stored here. If a file or directory consists of more than one block, then those blocks need not be contiguous on disk. In fact, the blocks big file are likely to be scattered all over the disk.

The i-nodes corresponding to the directories are scattered across all disk block groups. Ext2 tries to place regular files in the same block group as the parent directory, and data files in the same block as the i-node of the source file (assuming there is enough space there). This idea was borrowed from the Berkeley Fast File System (McKusick et al., 1984). Bitmaps are used to receive quick fixes regarding the selection

places for new file system data.

When new file blocks are allocated, ext2 also preallocates a few (eight) additional blocks for the same file (to minimize file fragmentation due to future writes). This scheme spreads the file system across the disk. It also has good performance (due to its contiguous nature and reduced fragmentation).

To access a file, you must first use one of the Linux system calls (such as open), which requires you to specify the path to the file. This path is parsed and its constituent directories are extracted from it. If a relative path is specified, then the search starts from the current directory of the process, otherwise, from the root directory. In any case, the i-node for the first directory is easy to find: there is a pointer to it in the process descriptor, or (in the case of the root directory) it is stored in a specific block on disk.

The directory allows filenames up to 255 characters long (Figure 10.18). Each directory consists of a number of disk blocks (so that the directory can be written to disk atomically). In a directory, the elements for files and directories are in unsorted order (each element immediately follows the previous one). Elements cannot cross block boundaries, so there is usually some amount of unused bytes at the end of each disk block.

Each directory entry in Fig. 10.18 consists of four fixed-length fields and one variable-length field. The first field is the i-node number, which is 19 for the colossal file, 42 for the voluminous file, and 88 for the bigdir directory. Next comes the rec_len field, which gives the size of the entire directory entry in bytes (possibly with additional padding bytes after the name). This field is needed to find the next entry (in the case where the file name is padded with an unknown number of bytes). In the figure, this field is indicated by an arrow. Then there is a field of type file, directory, etc. The last fixed-length field contains the length of the file name in bytes (8, 10 and 6 for this example). Finally, there is the filename itself, terminated by a null byte and padded to a 32-bit boundary. It may be followed by additional padding bytes.

On fig. 10.18b shows the same directory after the element for voluminous has been removed. All that is done in the directory is to increase the number in the record size field. previous file colossal, and the directory entry bytes for remote file voluminous turn into placeholders for the first record. Subsequently, these bytes can be used for writing when creating a new file.

Since directories are searched linearly, it can take a long time to find an entry that is at the end of a large directory. Therefore, the system maintains a cache of recently accessed directories. The cache is searched by file name, and if it is found, then the expensive linear search is no longer needed. A dentry object is entered into the directory entry cache for each of the path components, and (via its i-node) the directory is searched for subsequent path entries (until the actual i-node of the file is found).

For example, to find a file specified by an absolute path (such as /usr/ast/file), the following steps must be followed. First of all, the system finds the root directory that i-node number 2 normally uses (especially when i-node number 1 is reserved for bad blocks). It places the appropriate entry into the directory entry cache (for future searches of the root directory). It then searches the root directory for the string "usr" to obtain the i-node number for the /usr directory (which is also entered into the directory's item cache). This i-node is then read and the disk blocks are extracted from it, so that the /usr directory can be read and searched for the string "ast". Once a matching entry has been found, the i-node number for the /usr/ast directory can be determined from it. Given this i-node number, it can be read and directory blocks found. Finally, we search for "file" and find its i-node number. Thus, using a relative path is not only more convenient for the user, but also reduces the amount of work for the system.

If the file is present, the system retrieves the i-node number and uses it as an index of the i-node table (on disk) to look up the corresponding i-node and read it into memory. This i-node is placed in the i-node table, a kernel data structure that contains all i-nodes for open this moment files and directories. The i-node element format must contain (at a minimum) all the fields returned by the stat system call for the stat call to work (see Table 10.10). In table. Figure 10-13 shows some of the fields in the i-node structure supported by the Linux file system. The actual i-node structure contains many more fields, since the same structure is used to represent directories, devices, and other special files. The i-node structure also contains fields reserved for future use. History has shown that unused bits don't stay idle for long.

Now let's see how the system reads the file. Remember that a typical library procedure call to run the read system call looks like this:

n = read(fd, buffer, nbytes);

When the kernel takes control, all it can start with are these three parameters and the information in its internal tables (pertaining to the user). One of the elements of these internal tables is an array of file descriptors. It is indexed by file descriptors and contains one element per open file (up to some maximum number, usually 32 by default).

The idea is to start at this file descriptor and end with the corresponding bnode. Let's look at one possible scheme: put a pointer to a node in a table of file descriptors. Despite the simplicity this method(unfortunately) doesn't work. The problem is this. Each file descriptor must have an associated file pointer that specifies the byte in the file where the next read or write operation will begin. Where should this pointer be stored? One option is to place it in the node table. However, this approach will not work if several unrelated processes open the same file at the same time, because each process must have its own pointer.

The second solution is to place a pointer in the file descriptor table. Each process that opens a file has its own position in the file. Unfortunately, this scheme also does not work, but the reason for the failure is this case not so obvious and related to nature sharing files on a Linux system. Consider shell script 5, which consists of two commands (p1 and p2) to be executed in turn. If the script is called from the command line

then p1 is expected to write its output to file x, and then p2 will also write its output to file x, starting where p1 left off.

When the shell starts p1, file x will be empty at first, so p1 will simply start writing to the file at position 0. However, when p1 is done, some mechanism is needed to ensure that p2 sees the start position not 0 (but this is exactly what happens if the position in the file is stored in the file descriptor table), but the value at which pi stopped.

How this is done is shown in Fig. 10.19. The trick is to introduce a new table - the description table open files(open file description table) - between the file descriptor table and the i-node table and store the file pointer (as well as the read/write bit) in it. In the figure, the parent process is the shell, and the child is first the pi process and then the p2 process. When the shell creates the pi process, its user structure (including the file descriptor table) is an exact copy of the same shell structure, so they both contain pointers to the same open file description table. When the pi process exits, the shell's file descriptor continues to point to the open file description table, which contains the p1 process's position in the file. When the shell now creates the p2 process, the new child process automatically inherits the position in the file, without new process, nor is the shell required to know the current value of that position.

If some extraneous process opens the file, then it will get its own entry in the open file description table with its position in the file, which is exactly what is needed. Thus, the purpose of the open file description table is to allow parent and child processes to share a single pointer in a file, but to allocate personal pointers for foreign processes.

So (returning to the problem of performing a read), we have shown how the position in the file and the i-node are determined. The I-node contains the disk addresses of the first 12 blocks of the file. If the position in the file falls within its first 12 blocks, then it reads desired block file and data are copied to the user. For files longer than 12 blocks, the i-node contains the disk address of a single indirect block (Figure 10.19). This block contains the disk addresses of additional disk blocks. For example, if the block size is 1 KB and the disk address is 4 bytes, then a single indirect block can store up to 256 disk addresses. This scheme allows you to support files up to 268 KB in size.

Thematic materials: