chkdsk is a utility from Microsoft that tries to fix a volume’s file system without losing data. It checks for both logical and physical errors. chkdsk usually checks the file system metadata information first and applies file system repair functions if necessary. At Datto, we mostly deal with NTFS, so in this post, I'll only discuss the NTFS file system. So let’s begin with some basics about NTFS.
NTFS was started as a joint venture between Microsoft and IBM to create a new file system. They initially came up with the High Performance File System (HPFS), which introduced the idea of extents and using B+Trees, and later led to the implementation of NTFS. IBM was the major contributor as far as design was concerned. They came up with the B+Tree based structure, but then left the project due to some conflicts with Microsoft. Despite IBM’s exit, Microsoft continued the research and created the NT file system (NTFS). NTFS was made for a combination of robustness and performance, which was lacking in the FAT file system.
In NTFS, everything is a file. A volume contains a Boot Sector, which is the zero sector of the volume, followed by some system files (aka metadata files) and user data files.
Some of the major metadata files are the Boot Sector ($BOOT), Master File Table ($MFT), Master File Table Mirror ($MFTMirr), Root Directory ($DOT) and Bitmap ($BITMAP). There are a total of 26 system files including reserved FILE records.
$BOOT is always at the zero sector of the volume. A duplicate copy of $BOOT is also available at the last sector of the volume. Most importantly, $BOOT contains information on where exactly the $MFT is located.
$MFT is a record-keeping structure for all the files on the NTFS volume, including $MFT itself and $BOOT. When an NTFS volume is formatted, typically 12.5% of the volume is reserved for the $MFT, and this reservation can be increased or reduced based on how many more records are required or how big an individual file can be. The default reservation for $MFT is made such that the file system driver doesn't need to reserve space for it, unless required. For a huge file that uses 90% of a volume's space, the $MFT reservation is reduced by the NTFS driver, whereas for smaller files that fit in the $MFT records, the NTFS driver increases the space for the $MFT in order to keep all the records in the $MFT.
Each file system record may point to one or many data runs (also called extents) which contain the actual data of a file. Normally the $MFT and other records have a single data run, unless there is fragmentation.
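To make the idea of data runs concrete, here is a minimal Python sketch of how a run list is commonly decoded. The byte layout used here (a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, with offsets relative to the previous run) comes from publicly available NTFS documentation rather than from anything above, and `decode_data_runs` is a hypothetical helper name.

```python
def decode_data_runs(runlist: bytes):
    """Decode an NTFS run list into (start_lcn, cluster_count) pairs.

    A start_lcn of None marks a sparse run that has no clusters on disk.
    """
    runs = []
    pos = 0
    lcn = 0  # each offset is relative to the start of the previous run
    while pos < len(runlist) and runlist[pos] != 0x00:  # 0x00 ends the list
        header = runlist[pos]
        length_size = header & 0x0F   # bytes used by the cluster count
        offset_size = header >> 4     # bytes used by the LCN offset
        pos += 1
        count = int.from_bytes(runlist[pos:pos + length_size], "little")
        pos += length_size
        if offset_size == 0:
            runs.append((None, count))
            continue
        # The offset is signed, so a run can sit before the previous one.
        delta = int.from_bytes(runlist[pos:pos + offset_size], "little", signed=True)
        pos += offset_size
        lcn += delta
        runs.append((lcn, count))
    return runs
```

A file with a single run decodes to one pair; a fragmented file decodes to several, and a negative offset simply means the next fragment lives earlier on the volume.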
Each record of the $MFT has a fixed size that is set when the NTFS volume is formatted, which is typically 1024 bytes. This is separate from the allocation unit (cluster) size, for which modern operating systems use 4096 bytes as a default value. A typical $MFT record will have standard information, which is mostly timestamps for a record and flags. It also has an attribute list for the record, which is a variable-length list of attributes that can be added to a file, along with name information, a security descriptor and data.
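As a rough illustration of what "a record made of attributes" means, the sketch below walks the attributes of a single FILE record. The offsets and attribute type codes are taken from public NTFS documentation and are not described above; `list_attributes` is a hypothetical name, and the sketch deliberately skips the update sequence fix-ups that a real parser has to apply before trusting a record.

```python
import struct

# Attribute type codes as commonly documented for NTFS.
ATTRIBUTE_NAMES = {
    0x10: "$STANDARD_INFORMATION",
    0x20: "$ATTRIBUTE_LIST",
    0x30: "$FILE_NAME",
    0x50: "$SECURITY_DESCRIPTOR",
    0x80: "$DATA",
    0x90: "$INDEX_ROOT",
    0xA0: "$INDEX_ALLOCATION",
    0xB0: "$BITMAP",
}
END_MARKER = 0xFFFFFFFF


def list_attributes(record: bytes) -> dict:
    """Return the in-use/directory flags and attribute list of a FILE record."""
    if record[0:4] != b"FILE":
        raise ValueError("not a FILE record")
    # Offset 0x14: offset of the first attribute; offset 0x16: record flags.
    first_attr_offset, flags = struct.unpack_from("<HH", record, 0x14)
    attributes = []
    pos = first_attr_offset
    while pos + 8 <= len(record):
        attr_type, attr_length = struct.unpack_from("<II", record, pos)
        if attr_type == END_MARKER or attr_length == 0:
            break
        non_resident = record[pos + 8] != 0
        attributes.append((ATTRIBUTE_NAMES.get(attr_type, hex(attr_type)), non_resident))
        pos += attr_length
    return {
        "in_use": bool(flags & 0x01),
        "is_directory": bool(flags & 0x02),
        "attributes": attributes,
    }
```

Each tuple in `attributes` pairs the attribute name with a flag saying whether its content lives outside the record (non-resident) or inside it (resident).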
$MFTMirr is a copy of the first four records of $MFT, kept in case $MFT is moved somewhere or lost. This is important for chkdsk, and we will discuss later on how it helps.
$DOT is a B+Tree structure that keeps all information about the directory structure starting from the root of the volume. Each entry in this structure is known as an INDEX_ENTRY. $DOT is a special INDEX in that it also points to itself; no other entry points to itself. Each INDEX can be resident or non-resident: $INDEX_ROOT is always resident, whereas $INDEX_ALLOCATION is non-resident. Each record in $INDEX_ALLOCATION is known as an INDX record.
How chkdsk fixes NTFS volumes
There are two major stages in the chkdsk process. First, a file system analysis is performed. If that finds any issue with the file system, the second stage is a file system repair. Both stages do the same job of going through the metadata and finding issues; the first only reports them, whereas the second tries to fix them.
As discussed earlier, everything is a file on NTFS. The analysis starts with processing the file system $BOOT file, which tells chkdsk whether the file system is NTFS, some other file system, or a raw volume. The $BOOT file contains the magic identifier (NTFS followed by four spaces) that determines if the file system is NTFS. There is other important information in the $BOOT file, such as Bytes Per Sector, Sectors Per Cluster, Sectors Per Track, Number of Heads, Hidden Sectors, Total Sectors, Clusters Per File Record, and Clusters Per Index. These fields are required in order to locate the metadata on the volume itself. chkdsk then finds the $MFT file itself from the $MFT Logical Cluster Number (LCN). Additionally, the $MFTMirr LCN will point chkdsk to the $MFTMirr.
[Screenshot: the $BOOT record as shown in the Active Disk Editor tool]
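The $BOOT fields mentioned in this section can be read straight out of the first 512 bytes of the volume. The following Python sketch shows one way to do that; the byte offsets are the ones commonly documented for the NTFS boot sector, and `parse_ntfs_boot_sector` and the dictionary keys are names chosen for illustration. It assumes the classic encoding of Sectors Per Cluster as a plain count.

```python
import struct


def parse_ntfs_boot_sector(sector: bytes) -> dict:
    """Decode the $BOOT fields needed to locate $MFT and $MFTMirr."""
    if len(sector) < 512 or sector[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot sector")
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", sector, 0x0B)
    sectors_per_track, heads, hidden_sectors = struct.unpack_from("<HHI", sector, 0x18)
    total_sectors, mft_lcn, mftmirr_lcn = struct.unpack_from("<QQQ", sector, 0x28)
    (raw_record_size,) = struct.unpack_from("<b", sector, 0x40)

    cluster_size = bytes_per_sector * sectors_per_cluster
    # A positive value counts clusters; a negative value n means 2**(-n) bytes.
    if raw_record_size > 0:
        record_size = raw_record_size * cluster_size
    else:
        record_size = 1 << -raw_record_size

    return {
        "cluster_size": cluster_size,
        "sectors_per_track": sectors_per_track,
        "heads": heads,
        "hidden_sectors": hidden_sectors,
        "total_sectors": total_sectors,
        "mft_offset": mft_lcn * cluster_size,          # byte offset of $MFT
        "mftmirr_offset": mftmirr_lcn * cluster_size,  # byte offset of $MFTMirr
        "record_size": record_size,
    }
```

Multiplying an LCN by the cluster size gives the byte offset at which the first FILE record of $MFT (or $MFTMirr) should be found, which is exactly the lookup described above.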
If chkdsk can't find valid FILE records for $MFT and $MFTMirr at their respective LCNs fetched from $BOOT (i.e. the $BOOT record is corrupt), there is always a second copy of $BOOT available on the last sector of the volume. So how would we find the last sector of the volume if the $BOOT file is corrupt? We can find that information from the partition table, which could be either GPT or MBR.
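For the MBR case, the lookup can be sketched in a few lines of Python. The partition entry layout (four 16-byte entries starting at byte 446, with the starting LBA and sector count stored as little-endian 32-bit values) is the standard MBR format; `backup_boot_sector_lba` is a hypothetical helper, and GPT disks would need a different parser that is not shown here.

```python
import struct


def backup_boot_sector_lba(mbr: bytes, partition_index: int = 0) -> int:
    """Return the LBA of the last sector of an MBR partition.

    That sector is where the second copy of $BOOT is expected to live.
    """
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("missing MBR signature")
    if not 0 <= partition_index < 4:
        raise ValueError("an MBR holds four primary partition entries")
    entry_offset = 446 + 16 * partition_index
    # Bytes 8..11 of the entry: first sector; bytes 12..15: sector count.
    start_lba, sector_count = struct.unpack_from("<II", mbr, entry_offset + 8)
    if sector_count == 0:
        raise ValueError("partition entry is empty")
    return start_lba + sector_count - 1
```

Reading 512 bytes at the returned LBA (multiplied by the sector size) should yield a sector that starts with the same NTFS identifier as the primary $BOOT.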
If the information from $BOOT isn't correct, it is corrected by fetching the information from the second copy of $BOOT, assuming the copy is correct.
Once the $BOOT record is traversed and/or fixed, the $MFT and $MFTMirr records are compared. If they are not the same, the mismatch is reported and then fixed. The fix could come from either $MFT or $MFTMirr, depending on which one has logical or physical errors. If $MFT is faulty, $MFTMirr will be used to fix it, and vice versa.
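The compare-and-repair step can be pictured with a small Python sketch. It is only a model: the validity test here is just the FILE signature, standing in for whatever logical and physical checks chkdsk really performs, and `reconcile_mirror` and its report strings are invented for illustration.

```python
def reconcile_mirror(mft: bytes, mirror: bytes, record_size: int = 1024, count: int = 4):
    """Compare the first `count` records of $MFT and $MFTMirr.

    Returns (repaired_mft, repaired_mirror, report).
    """
    mft, mirror = bytearray(mft), bytearray(mirror)
    report = []
    for i in range(count):
        span = slice(i * record_size, (i + 1) * record_size)
        primary, copy = bytes(mft[span]), bytes(mirror[span])
        if primary == copy:
            continue
        # Stand-in validity check: a healthy record starts with b"FILE".
        primary_ok = primary[:4] == b"FILE"
        copy_ok = copy[:4] == b"FILE"
        if primary_ok and not copy_ok:
            mirror[span] = primary
            report.append((i, "$MFTMirr record repaired from $MFT"))
        elif copy_ok and not primary_ok:
            mft[span] = copy
            report.append((i, "$MFT record repaired from $MFTMirr"))
        else:
            # Both look valid (or both look broken): needs deeper checks.
            report.append((i, "mismatch left for further analysis"))
    return bytes(mft), bytes(mirror), report
```

The interesting design point is the last branch: when the signature alone cannot decide which copy is right, a real tool has to fall back on more detailed consistency checks instead of blindly copying one side over the other.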
This operation fixes the mount process: once the $MFT is correct, the device can be mounted without any problems.
Bad sectors are physically damaged areas of the hard disk, which can lead to failed I/O on the volume. If a sector is malfunctioning and NTFS detects it, NTFS keeps a record of that in the $BadClus (bad cluster) file, so that no I/O is directed towards that sector. Bad sectors mostly never go away, and their number tends to increase with time. But for a while, NTFS can keep your data in healthy physical locations, so you can keep using the volume.
If chkdsk can't find the $BadClus file, that is considered a failure in the file system. chkdsk recreates the $BadClus file and, once it is recreated, tries to look for bad sectors on the volume.
Next comes the $JOURNAL file, also known as $LogFile. This is the transaction log file that contains all the transactions that happened on the NTFS volume in a circular structure. Transactions in this file are marked as “OK” roughly every two seconds. If a machine failure or a power failure occurs, some of the transactions will not be marked as complete. chkdsk reads this file, finds the incomplete transactions, and fixes them by either undoing them or, in some cases, completing them. If chkdsk has problems opening or reading the $JOURNAL file, it simply recreates the file in order to make the NTFS driver happy.
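The undo/complete idea is easier to see on a toy model than on the real log format, which I won't attempt to reproduce. In the Python sketch below, the "volume" is a dictionary, the log is a list of write and commit entries, and `recover` is a hypothetical function; it also assumes that no two in-flight transactions touch the same key.

```python
def recover(state: dict, log: list) -> dict:
    """Bring a toy key/value 'volume' back to a consistent state.

    Each log entry is either
      {"txn": id, "op": "write", "key": k, "old": before, "new": after}
    or
      {"txn": id, "op": "commit"}.
    """
    committed = {entry["txn"] for entry in log if entry["op"] == "commit"}
    state = dict(state)  # work on a copy, leave the input untouched

    # Redo pass: make sure every committed write is really applied.
    for entry in log:
        if entry["op"] == "write" and entry["txn"] in committed:
            state[entry["key"]] = entry["new"]

    # Undo pass: roll back writes of transactions that never committed,
    # newest first, so each key ends up with its oldest logged value.
    for entry in reversed(log):
        if entry["op"] == "write" and entry["txn"] not in committed:
            state[entry["key"]] = entry["old"]

    return state
```

For example, if transaction 1 committed but its write never reached the "disk", and transaction 2 wrote its data but never committed, `recover` applies the first write and reverts the second.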
Each file on an NTFS volume is associated with a list of attributes. The attributes of a file can be its name, data, security information, etc. Each attribute is given a specific identifier and it may have a name as well. $AttrDef contains a list of all the available attribute names, their respective identifiers and their description. If chkdsk detects that attributes are invalid/faulty, it regenerates them by adding default attributes.
ROOT Directory ($DOT/$ROOT)
Once the attribute definitions are fixed, chkdsk moves on to traverse the B+Tree structure of the directory, starting with the root directory $DOT, a special system file with $MFT record number 5. If chkdsk is not able to open the $DOT file, it recreates it, and then traverses the $MFT records to see which files were part of the root directory ($DOT file) in order to put their links back. The same process is performed on its subtrees.
Once $DOT is all good, chkdsk opens its subtrees recursively and starts looking for available items in each of them. Each item has parent information, and the same information can also be found in the $MFT record of the same file. If a directory is found to be corrupt, it is regenerated. Each tree entry consists of standard information, name, security info, index root (the $MFT-resident entry of an item), index allocation (the non-resident entry of an item, stored outside the $MFT record) and an associated bitmap.
Traversing and fixing the B+Tree structure of the root directory is the most time-consuming part of the process. chkdsk has to traverse resident and non-resident data. It also has to fetch information from the $MFT records in order to verify whether the files in a directory structure are correctly placed, or whether they belong to some other directory. Missing files are added to a directory, incorrect children are placed in the correct directory, and child entries that appear in an incorrect order are also fixed as part of this process.
Once the B+Tree is fixed, chkdsk traverses all the $MFT records, opens them to read all their attributes, finds their parent information and tries to find missing references. Some of the files can be re-associated with their parent directory. However, if a file's parent is no longer available, that file is considered orphaned and is moved to a folder created in the root of the volume. chkdsk names such files uniquely, keeping some information in the file name, so that the user can check the files manually and decide what to do with them.
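Here is a small Python model of the orphan handling described above. Records are plain dictionaries keyed by record number, and the folder name `found.000` and the `NNNN_name` renaming scheme are placeholders chosen for this sketch, not a statement about chkdsk's exact naming; `find_orphans` and `rescue_orphans` are likewise hypothetical.

```python
def find_orphans(records: dict, root: int = 5) -> list:
    """Return record numbers whose parent is missing or is not a directory.

    `records` maps record number -> {"name": str, "parent": int, "is_dir": bool}.
    """
    orphans = []
    for number, record in records.items():
        if number == root:
            continue  # the root directory is its own parent
        parent = records.get(record["parent"])
        if parent is None or not parent["is_dir"]:
            orphans.append(number)
    return sorted(orphans)


def rescue_orphans(records: dict, root: int = 5) -> dict:
    """Re-parent orphans under a new folder in the root and rename them uniquely."""
    orphans = find_orphans(records, root)
    fixed = {number: dict(record) for number, record in records.items()}
    if not orphans:
        return fixed
    folder = max(fixed) + 1  # next free record number in this toy model
    fixed[folder] = {"name": "found.000", "parent": root, "is_dir": True}
    for index, number in enumerate(orphans):
        fixed[number]["parent"] = folder
        fixed[number]["name"] = f"{index:04d}_{fixed[number]['name']}"
    return fixed
```

Running `find_orphans` again on the result of `rescue_orphans` should return an empty list, which mirrors the goal of this step: every record ends up reachable from the root.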
During this process, chkdsk also deletes invalid records if they belong to files that were already deleted, or if their information isn't valid.
Cluster Allocation, BITMAP
Once the directory structure is fixed and orphaned files are taken care of, the B+Tree structure and $MFT records are guaranteed to point to each other correctly. The next step is for chkdsk to look at the $Bitmap of the volume, build a bitmap of its own from all file entries for comparison, and of course fix any issues it encounters.
For this step, chkdsk first reads the data attribute of the $Bitmap file, then walks through all the files in the $MFT and traverses their attributes to create a bitmap-like structure. When that's done, it compares the two bitmaps. If they do not match, chkdsk writes out the bitmap it just created rather than relying on the $Bitmap data stream itself, since the file system has changed and the original $Bitmap could be incorrect.
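The rebuild-and-compare step can be sketched as follows. This is a simplified model in Python: each file is reduced to a list of (start cluster, cluster count) runs, one bit represents one cluster with the lowest bit of each byte standing for the lowest cluster number, and `build_bitmap` and `diff_bitmaps` are hypothetical helper names.

```python
def build_bitmap(files: dict, total_clusters: int) -> bytearray:
    """Build a cluster bitmap from the runs of every file.

    `files` maps a file name to a list of (start_cluster, cluster_count) runs.
    """
    bitmap = bytearray((total_clusters + 7) // 8)
    for runs in files.values():
        for start, count in runs:
            for cluster in range(start, start + count):
                bitmap[cluster // 8] |= 1 << (cluster % 8)
    return bitmap


def bit_is_set(bitmap: bytes, cluster: int) -> bool:
    """Tell whether the bit for `cluster` is set."""
    return bool((bitmap[cluster // 8] >> (cluster % 8)) & 1)


def diff_bitmaps(on_disk: bytes, rebuilt: bytes, total_clusters: int) -> list:
    """List the clusters on which the stored and rebuilt bitmaps disagree."""
    return [
        cluster
        for cluster in range(total_clusters)
        if bit_is_set(on_disk, cluster) != bit_is_set(rebuilt, cluster)
    ]
```

If `diff_bitmaps` returns anything, the rebuilt bitmap is the one to write back, for the reason given above: it reflects the file system as it looks after the earlier repairs.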
Cross linked clusters
With the bitmap fixed, we might still have cross-linked clusters, which are clusters that are pointed to by two different files or entries. Suppose an INDEX is pointing to a cluster that is already marked as used. In this case, chkdsk will throw out all the entries of the index and recreate it, which will assign new bits in the BITMAP for the entries that were pointing to clusters already in use.
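Detecting cross-links boils down to asking, for every cluster, how many owners claim it. A minimal Python sketch of that check, again with files reduced to lists of (start cluster, cluster count) runs and with `find_cross_links` as a hypothetical name:

```python
from collections import defaultdict


def find_cross_links(files: dict) -> dict:
    """Map each cluster claimed by more than one file to the files claiming it.

    `files` maps a file name to a list of (start_cluster, cluster_count) runs.
    """
    owners = defaultdict(set)
    for name, runs in files.items():
        for start, count in runs:
            for cluster in range(start, start + count):
                owners[cluster].add(name)
    return {
        cluster: sorted(names)
        for cluster, names in sorted(owners.items())
        if len(names) > 1
    }
```

An empty result means no cluster is shared. Any entry in the result names the files whose allocations overlap and therefore need one side to be rebuilt on fresh clusters.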
Once cluster allocations are fixed, chkdsk opens the bitmap associated with the $MFT file itself in order to find all active records. It does that by reading all records that correspond to bits set to 1 in the bitmap. By opening each record and checking if it exists, chkdsk can reset all the bits to the correct state. This corrects any errors in the $MFT bitmap, which is then re-written to disk if there were changes.
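As a sketch of this last consistency check, the Python function below recomputes each bit of the $MFT bitmap from the record it describes. It treats a record as active when it starts with the FILE signature and has the in-use flag set (bit 0 of the flags field at offset 0x16, per public NTFS documentation). Unlike the description above, it checks every record rather than only those whose bit is set, and `check_mft_bitmap` is a hypothetical name.

```python
import struct


def check_mft_bitmap(mft: bytes, bitmap: bytes, record_size: int = 1024):
    """Recompute the $MFT bitmap from the records themselves.

    Returns (corrected_bitmap, changed_record_numbers).
    """
    fixed = bytearray(bitmap)
    changed = []
    for number in range(len(mft) // record_size):
        record = mft[number * record_size:(number + 1) * record_size]
        flags = struct.unpack_from("<H", record, 0x16)[0]
        in_use = record[:4] == b"FILE" and bool(flags & 0x01)
        bit_set = bool((fixed[number // 8] >> (number % 8)) & 1)
        if in_use != bit_set:
            fixed[number // 8] ^= 1 << (number % 8)  # flip the wrong bit
            changed.append(number)
    return bytes(fixed), changed
```

The bitmap only needs to be written back when `changed` is non-empty, which matches the "re-written to disk if there were changes" behaviour described above.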
Remove DIRTY flag
Once everything is fixed, chkdsk re-runs the whole process to make sure there are no further issues. If there are none, it clears the DIRTY flag on the volume and exits successfully.
In this post, I briefly explained the fundamentals of NTFS, and explained how Microsoft's chkdsk can be used to fix problems with corrupt NTFS volumes. I walked through the individual steps that chkdsk takes to ensure consistency across the $MFT, the B+Tree structure and the various bitmaps.
Of course, Microsoft chkdsk is closed source, so naturally this is just a rough idea of how chkdsk works. However, I hope it helped you understand it a little better.