Sunday, September 20, 2015

Who's your Master? : MFT Parsers Reviewed

The Master File Table (MFT) contains the information related to folders and files on an NTFS system. Brian Carrier (2005) stated “The Master File Table is the heart of NTFS because it contains the information about all files and directories” (p. 274). Many forensic tools, such as EnCase, FTK and X-Ways, parse the MFT to display the file and folder structure to the user.

During Incident Response, there could be hundreds if not thousands of computers to examine. A way to quickly review these systems for Indicators of Compromise (IOCs) is to grab the MFT file rather than take a full disk image. The MFT file is much smaller in size than a disk image and can be parsed to show existing as well as deleted files on a system.

During a case, I noted some anomalies with a tool that I use to accomplish this task, AnalyzeMFT. This led me to test and verify several MFT parsers, and I was a little surprised by the results. First and foremost, I would like to say that I am appreciative of all the authors of these tools. My intent with this post is to draw attention to understanding the outputs of these tools so that the examiner can correctly interpret the results.

Many of the differences and issues arose due to the handling of deleted files. The documentation of one of the tools I tested, MFTDump, explains the issues with deleted files in the MFT:
"Since MFTDump only has access to the $MFT file, it is not possible to “chase” down deleted files through the $INDEX_ALLOC structures to determine if the file is an orphan. Instead, the tool uses the resident $FILE_NAME attribute to determine its parent folder, and follows the folder path to the root folder. In the case of deleted files, this information may or may not be accurate. To determine the exact status of a deleted file, you need to analyze the file system in a forensic tool."
Some of the tools did not notify the examiner that the file path associated with the deleted file may be incorrect  – which could lead to some false conclusions.
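One plausible way a parser ends up with the wrong parent: the $FILE_NAME attribute stores the parent as a file reference (a record number plus a sequence number), and a record freed by a deleted folder can later be reused by an unrelated file. The sketch below is a simplified illustration of following parent references and checking the sequence number along the way. It is not how any of the tested tools are implemented. `Rec`, `resolve_path`, the `/$ORPHAN/` prefix and the sample record numbers are all hypothetical, and it assumes the MFT records have already been parsed into a dictionary.

```python
from typing import Dict, NamedTuple, Tuple

ROOT = 5  # MFT record number of the NTFS root directory


class Rec(NamedTuple):
    """Hypothetical, already-parsed view of one MFT record."""
    name: str
    seq: int         # sequence number currently stored in the record header
    parent: int      # parent record number taken from $FILE_NAME
    parent_seq: int  # parent sequence number taken from $FILE_NAME


def resolve_path(records: Dict[int, Rec], number: int) -> Tuple[str, bool]:
    """Walk parent references toward the root and return (path, trusted).

    trusted is False when a parent record is missing or carries a different
    sequence number than the child expects. Simplification: real volumes may
    change the sequence number when a record is freed, so a production check
    needs more care than a strict comparison.
    """
    parts = []
    seen = set()
    current = number
    while current != ROOT:
        if current in seen:  # guard against reference loops
            return "/$ORPHAN/" + "/".join(reversed(parts)), False
        seen.add(current)
        rec = records[current]
        parts.append(rec.name)
        parent = records.get(rec.parent)
        if rec.parent != ROOT and (parent is None or parent.seq != rec.parent_seq):
            # The parent slot no longer holds the folder this file lived in.
            return "/$ORPHAN/" + "/".join(reversed(parts)), False
        current = rec.parent
    return "/" + "/".join(reversed(parts)), True


# Made-up records: slot 200 used to hold a folder (sequence 6) and has since
# been reused by an unrelated file (sequence 7).
records = {
    100: Rec("Users", 1, ROOT, 5),
    200: Rec("BCState.xml", 7, 100, 1),
    300: Rec("Pornography", 3, 200, 6),
    400: Rec("048002.jpg", 2, 300, 3),
}
print(resolve_path(records, 400))  # ('/$ORPHAN/Pornography/048002.jpg', False)
```

A parser that skips the sequence check would keep walking through record 200 and report the deleted file underneath a path that belongs to a different, active file.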

There are a lot of tools that parse the MFT. For this testing, I focused on tools that are free, run from the command line and output the results in Bodyfile format. I chose these criteria because when I parse the MFT, I am using it to create a timeline, usually in an automated fashion. The one exception was MFTDump. Its output is a TSV file, so I wrote a parser to convert it into Bodyfile format.

There were four “things” that I was checking each tool for: File Size, Deleted Files, Deleted File Paths and Speed. These criteria may not be important to everyone, but I’ll explain why they are important to me.
  1. File Size
    When looking for IOCs, file size can be used to distinguish a legitimate file from malware that has the same name. It could also be used in lieu of file hashes. Instead of hashing every file on the computer, which can be time consuming, the size of the file behind a known hash can be compared against the file sizes recorded in the MFT to flag suspect files (thanks to @rdormi for that idea).
  2. Deleted Files
    MFT records can contain deleted file information. Does the output show deleted files? In some cases the attacker’s tools and malware have been removed from the system, so being able to see deleted files is nice.
  3. Deleted File Paths
    Is the tool able to resolve and display any portion of the previous file path for the deleted file? Knowing the parent path helps give context to the file. For example, it may be located under a user account, or a suspicious location, like a temp folder.
  4. Speed
    If I am processing thousands of machines, I need a tool that will parse the MFT relatively quickly. 10 minutes per machine or 1 hour per machine can make a big difference.
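The file size check described in item 1 can be sketched in a few lines of Python. This is a minimal example, assuming the parser output is in the TSK 3.x bodyfile layout (MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime). The function names, the sample lines and the size 4995 used as an "IOC size" are made up for illustration. A size match is only a lead and would still need to be confirmed, for example by hashing the flagged file.

```python
from typing import Iterable, Iterator, Set, Tuple


def parse_bodyfile_line(line: str) -> Tuple[str, int]:
    """Return (name, size) from one TSK 3.x bodyfile line."""
    _md5, rest = line.rstrip("\n").split("|", 1)
    # Split from the right so a '|' inside the file name does not shift fields.
    name, _inode, _mode, _uid, _gid, size, *_times = rest.rsplit("|", 9)
    return name, int(size)


def flag_by_size(lines: Iterable[str], ioc_sizes: Set[int]) -> Iterator[Tuple[str, int]]:
    """Yield (name, size) for every entry whose size matches a known-bad size."""
    for line in lines:
        if not line.strip():
            continue
        name, size = parse_bodyfile_line(line)
        if size in ioc_sizes:
            yield name, size


# Illustrative bodyfile lines, not real case data.
sample = [
    "0|C:/Windows/System32/svchost.exe|1234|r/rrwxrwxrwx|0|0|27136|1382992717|1382992717|1382992717|1382992717",
    "0|C:/Users/SpeedRacer/AppData/Local/Temp/svchost.exe|5678|r/rrwxrwxrwx|0|0|4995|1389403545|1389403545|1389403545|1389403545",
]
for name, size in flag_by_size(sample, {4995}):
    print(size, name)
```

Note that this check is only as good as the sizes the parser reports: a tool that writes 0 for every file would never produce a match.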

Findings

The tools I tested were AnalyzeMFT, log2timeline.pl, list-mft and MFTDump. Below is a summary of the findings. Further below, I explain the results in more detail, along with some sample data.

AnalyzeMFT
  1. Many files, both deleted and existing, show an incorrect file size of 0
  2. Deleted files were not designated as deleted in the output
  3. Deleted files were prepended with incorrect file paths
  4. Time to parse MFT: 11 minutes
list-mft
  1.  File sizes were shown in the output
  2.  Deleted files were designated as deleted
  3.  No file paths were shown for deleted files
  4. Time to parse MFT: 1 hour, 49 minutes
Log2timeline.pl 
  1. No file sizes were shown in the output
  2. Deleted files were designated as deleted
  3. Deleted files were shown with correct file paths
  4. Time to parse MFT: 39 minutes
MFTDump 
  1.  File sizes were shown in the output
  2. Deleted files were designated as deleted
  3. Deleted file paths were enclosed in ‘?’ to alert the examiner that they may be inaccurate
  4. Time to parse MFT: 7 minutes
Please note, I did not cross reference and verify every single file in the output. The observations made above were for the files that I reviewed.

What does this mean, or why are these results important?

No file size reported
The file size can help give context to a file. Having the file size can help determine if a file is suspect or not. If no file size is provided, this context is lost.

‘0’ file size reported
The incorrect file size of ‘0’ can be misleading to an investigator. Take into consideration a RAM scraper output file. If an examiner is checking various systems and they see a file size of ‘0’, they might think the file is empty, when in fact, it could have thousands of credit card numbers written to it.

Files are not being reported/noted as deleted
Since there is no designation that the file is deleted, malware might appear to exist on a system, when in fact, it has been deleted. A suspect may have deleted a file and it is still showing as active in the output.

Deleted files are being associated with the wrong parent path
As noted above, due to issues with looking up the parent folder for deleted files, incorrect file paths were found to be prepended to deleted files. Even though a portion of the path may be correct, the prepended path could cause the examiner to draw an incorrect conclusion.

For example, many times a malware file will have a legitimate Windows system name, such as svchost.exe. What flags the file as suspicious is where it was/is located. If the parent path is reported incorrectly, a malicious file may be missed. Or, a file may be attributed to the wrong user account because the path is listed incorrectly.

Conclusion

Based on my testing and criteria, MFTDump seems to be the best fit for my process. It contains the file sizes, and distinguishes between an active file and a deleted file. In the event that it recovers a file path for a deleted file, it lets the examiner know that it might be inaccurate by making a notation in the output. If any important files are found using any of these tools, it would be prudent for the examiner to verify them against a full disk image.

Sample Test Data

Below, I show some examples from the output for each tool. Although I did some testing and verification, it is up to each examiner to test their tools – I accept no liability or responsibility for using these tools and relying on my results. For demonstrative purposes only. :)

I used FLS from the Sleuthkit and X-Ways to check a deleted file. I then compared how this deleted file was handled with the different tools. I also used Harlan Carvey’s tools (bodyfile.exe and parse.exe) to convert the bodyfile generated by the tool into TLN format for readability.

The deleted file I reviewed was “048002.jpg”.  The path was shown as C:/$OrphanFiles/Pornography/048002.jpg (deleted) in both FLS and X-Ways.

Each output was grepped for the file 048002.jpg, and the entries located are displayed below in TLN format. I omitted the "Type" (File), "Host" (Computer1) and "User" (blank) columns in order to better display the results.
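For readers who want to reproduce the filtering and conversion without the tools mentioned above, here is a rough Python sketch. It is not Harlan Carvey's code. It assumes TSK 3.x bodyfile input and the five-field TLN layout (Time|Source|Host|User|Description), and it collapses equal timestamps into a single MACB line in the same style as the listings below. The function name and the sample line (including its timestamps and inode) are illustrative.

```python
from collections import defaultdict
from typing import List


def bodyfile_to_tln(line: str, host: str = "", user: str = "") -> List[str]:
    """Convert one bodyfile line into one TLN line per distinct timestamp."""
    _md5, rest = line.rstrip("\n").split("|", 1)
    name, _inode, _mode, _uid, _gid, size, atime, mtime, ctime, crtime = rest.rsplit("|", 9)
    # Group the four timestamps so equal values share one MACB string.
    flags = defaultdict(lambda: ["."] * 4)
    for pos, (letter, value) in enumerate(zip("MACB", (mtime, atime, ctime, crtime))):
        ts = int(value)
        if ts > 0:  # 0 means the timestamp is not set
            flags[ts][pos] = letter
    return [
        f"{ts}|FILE|{host}|{user}|{''.join(f)} [{size}] {name}"
        for ts, f in sorted(flags.items(), reverse=True)
    ]


# Illustrative line: atime, mtime and crtime are equal, ctime differs.
sample = (
    "0|C:/$OrphanFiles/Pornography/048002.jpg (deleted)|781789|r/rrwxrwxrwx"
    "|0|0|4995|1389403545|1389403545|1382992717|1389403545"
)
for entry in [sample]:
    if "048002.jpg" in entry:  # the "grep" step
        for tln in bodyfile_to_tln(entry, host="Computer1"):
            print(tln)
```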

I have also included how long each process took. The system I used was Windows 7 with an Intel i7 and 16GB of RAM. The size of the MFT was about 1.8GB (which is much larger than most systems I process).
  
FLS Output
fls -m C: -f ntfs -r \\.\[Mounted Drive] >> C:\path\to\bodyfile

Date Description
2076-11-29 08:54:34 MA.B [4995] C:/$OrphanFiles/Pornography/048002.jpg (deleted)
2014-01-11 01:25:45 ..C. [4995] C:/$OrphanFiles/Pornography/048002.jpg (deleted)
2013-10-28 20:38:37 MACB [124] C:/$OrphanFiles/Pornography/048002.jpg ($FILE_NAME) (deleted)

FLS was used as the baseline for the test, and the output was verified with X-Ways. It shows the file as a deleted Orphan file, with a partially recovered directory listing of "/Pornography/048002.jpg". According to The Sleuthkit documentation on orphan files:
"Orphan files are deleted files that still have file metadata in the file system, but that cannot be accessed from the root directory."
Fls took about 20 minutes to run across the mounted image.

AnalyzeMFT Output
analyzeMFT.py -f "C:\path\to\$MFT" -b "C:\path\to\output\bodyfile.txt" --bodyfull -p

Date Description
2013-10-28 20:38:37 MACB [0] /Users/SpeedRacer/AppData/Roaming/Scooter Software/Beyond Compare 3/BCState.xml/Pornography/048002.jpg

AnalyzeMFT showed 0 for the file size. Its output has no designation that flags whether the file is deleted or active. Although it was able to recover the deleted file path "/Pornography/", it prepended the file path with a folder that currently exists on the system rather than identifying it as an Orphan file.

This makes it appear to the examiner that this is an active file, under the location "Users/SpeedRacer/AppData/Roaming/Scooter Software/Beyond Compare 3/BCState.xml/Pornography",  when in fact, it is a deleted Orphan file.

During my review of the outputs, I noticed quite a few files were showing an incorrect file size of '0', including active files. Based on a review of the open issues on GitHub, these problems appear to have been noted.

I also ran AnalyzeMFT with the default output, a CSV file. In that output, the file did have a flag designating it as deleted. The bodyfile output, however, does not.

Log2Timeline.pl Output
log2timeline -z local -f mft -o tln -w /path/to/bodyfile.txt


Date Description
2014-01-11 01:25:45 FILE,-,-,[$SI ..C.] /Pornography/048002.jpg (deleted)|UTC|  inode:781789

The “old” version of log2timeline has an -f mft option that parses an MFT file into bodyfile format. The “new” version of log2timeline with Plaso does not have the option to parse the MFT separately (at least I couldn't find it). log2timeline.pl was run from a SIFT virtual machine, which I gave about 11GB of RAM and 6 CPUs. With this setup, it took about 39 minutes to parse the MFT.

No file size was provided in the log2timeline output for any files. The file is flagged as deleted, and the output includes the correct partially recovered path "/Pornography/". Out of all the MFT tools I tested, this one most accurately depicts the deleted file path. However, it's interesting to note that it did not include the FileName attribute.

list-mft Output
list-mft.py "C:\path\to\$MFT" >> "C:\path\to\output\bodyfile.txt"

Date Description
2014-01-11 01:25:45 ,..C. [4995] \\$ORPHAN\048002.jpg (inactive)
2013-10-28 20:38:37 ,MACB [4995] \\$ORPHAN\048002.jpg (filename, inactive)

list-mft provided the file size, and a designation that the file was deleted (inactive). It also identified the file as an Orphan. However, it did not recover the partial path of /Pornography/. This may be important, as the partial path can help provide context for the deleted file.

This program took the longest to run at 1 hour and 49 minutes. There is a -c (cache) option that can be increased for better performance, but I just used the default settings.

MFTDump Output
mftdump.exe "C:\path\to\$MFT" /o "C:\path\to\output\mftdump-output.txt"

Date Description
2076-11-29 08:54:34 MA.B [4995] ?\Users\SpeedRacer\AppData\Roaming\Scooter Software\Beyond Compare 3\BCState.xml\Pornography\048002.jpg?(DELETED)
2014-01-11 01:25:45 ..C. [4995] ?\Users\SpeedRacer\AppData\Roaming\Scooter Software\Beyond Compare 3\BCState.xml\Pornography\048002.jpg?(DELETED)
2013-10-28 20:38:37 MACB [4995] ?\Users\SpeedRacer\AppData\Roaming\Scooter Software\Beyond Compare 3\BCState.xml\Pornography\048002.jpg? (DELETED)(FILENAME)

The file sizes are displayed, and a designation is included showing that the file has been deleted. Deleted file paths were enclosed in ‘?’ to alert the examiner that they may be incorrect. This tool ran the fastest, clocking 7 minutes for a 1.8 GB MFT file. The output from this tool is a TSV file. I wrote a Python script to parse it into bodyfile format.
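A conversion script along those lines might look like the sketch below. This is not the script I used, and the column names in `COLUMNS` are hypothetical placeholders: they need to be checked against the header row of the actual TSV before use. It also assumes timestamps are written as 'YYYY-MM-DD HH:MM:SS' in UTC, and it emits the TSK 3.x bodyfile field order (MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime).

```python
import csv
import io
from datetime import datetime, timezone
from typing import Iterator, TextIO

# Hypothetical column names: verify against the header row of the real TSV.
COLUMNS = {
    "name": "FullPath",
    "inode": "RecNo",
    "size": "Size",
    "atime": "Accessed",
    "mtime": "Modified",
    "ctime": "RecordChanged",
    "crtime": "Created",
}


def to_epoch(text: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS' (assumed UTC) to Unix time; 0 if blank."""
    if not text.strip():
        return 0
    parsed = datetime.strptime(text.strip(), "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def tsv_to_bodyfile(handle: TextIO) -> Iterator[str]:
    """Yield one bodyfile line per TSV row."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        times = [to_epoch(row[COLUMNS[k]]) for k in ("atime", "mtime", "ctime", "crtime")]
        fields = ["0", row[COLUMNS["name"]], row[COLUMNS["inode"]], "", "0", "0",
                  row[COLUMNS["size"]]]
        yield "|".join(fields + [str(t) for t in times])


# Illustrative input with a single made-up row.
demo = io.StringIO(
    "FullPath\tRecNo\tSize\tAccessed\tModified\tRecordChanged\tCreated\n"
    "?\\Pornography\\048002.jpg?\t781789\t4995\t2014-01-11 01:25:45\t"
    "2014-01-11 01:25:45\t2014-01-11 01:25:45\t2013-10-28 20:38:37\n"
)
for line in tsv_to_bodyfile(demo):
    print(line)
```

Keeping the ‘?’ markers in the name field means the warning about possibly inaccurate paths survives the conversion into the timeline.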

To keep this post relatively short, I just demonstrated the output for one file. However, I used the same process on several files and the results were consistent. Whatever tool an examiner chooses to use will depend on their particular needs. For example, an examiner may not be interested in file sizes, and in this case they may choose to use log2timeline. However, if speed is an issue, MFTDump might make more sense. As long as the examiner knows what information the output is portraying, and can verify the results independently, any of these tools can get the job done.

Carrier, B. (2005). File System Forensic Analysis. Upper Saddle River, NJ: Pearson Education

6 comments:

  1. Great work! this has been a really helpful post. It's a shame mftdump doesnt output to bodyfile by default.

    Do you have any suggestions for programmatically accessing the orphan files? I have located some files of interest in Encase but want to avoid manually processing them.
    Cheers!

  2. > The tools I tested were AnalyzeMFT, log2timeline.pl, list-mft and MFTDump.

    Some others to try:

    The Sleuthkit:
    http://www.sleuthkit.org/sleuthkit/
    Especially the 'tsk_loaddb' command which gives you a nice SQLite DB.

    Hachoir (cow fork), it's quite extensive:
    https://bitbucket.org/blinkingtwelve/hachoir-cow/src/tip/hachoir-parser/hachoir_parser/file_system/mft.py
    Extracting what you want requires some Python scripting and is left as an exercise.

  3. Great work, Mari...and another excellent post.

    If there's one thing I would suggest to folks within the DFIR "community", it would be to follow your lead.

  4. Great post Mari. It made me check my tools again. I believe most of these tools are determining the file size from the $FILENAME attribute. In some situations there is a value there but most of the time it's 0. In my experience the best place to find the file size is from the Attribute header. Table 13.3 and 13.4 in Brian Carrier's File System and Forensic Analysis provide this information.

  5. https://github.com/jschicht/Mft2Csv/wiki/Mft2Csv
