Pages

Monday, August 5, 2013

MS Office Recent Docs Plist Parser

Recently a post came up at Forensic Focus regarding the timestamps in the com.microsoft.office.plist file. I had a case several months ago where I ran into the same situation - trying to determine the timestamp for the Access Date stored in this file.  I have been meaning to get around to writing about what I found,  so after I saw that post I thought I would get in gear and do it.

When opening documents in Office 2008 and 2010 ( not sure about other versions) on a Mac the user is presented with a dialog box for recent documents, called the Workbook Gallery.  As you can see by the screen shot below, the Recent Documents tracks  the File Name, Last Opened date and File Path of the recent documents:


This information is stored in the com.microsoft.office.plist file under the User's profile : /Users/%Username%/Library/Preferences/.

Some notes about the com.microsoft.office.plist files:
  • It can contain A LOT of entries. I found close to 1500 entries spanning 4 years. 
  • It can also have User Information such as name and email address
  • It has Volume names, so you can see if files were opened from an external drive, etc.
If you do not have a Mac, you can view this file with plist Editor for Windows from icopybot.com. Below is a screen shot of how the com.microsoft.office.plist file looks. There is an Access Date field and File Alias field which both appear to be Base64 encoded:



Sometimes using various tools to see how the information is presented is helpful. Since the plist file is from a Mac, my next choice was to look at the file natively on a Mac. There is a free Plist Editor included in XCode, which is a developer tool published by Apple.

Looking at this plist file through the Xcode Plist Editor shows the following:



Now the Access Date look familiar, Hex values - and the File Alias looks like it's in Hex too.

Time to view the data in a Hex viewer to see what’s going on:



Ah ha! File Paths, File names, and the timestamp information.

Now the trick – figuring out the timestamp. By doing some testing – I.E. opening up files in MS Office on a Mac, checking the changes in the timestamp values, and brainstorming with Brian Moran, we were able to figure out the timestamp appeared to be in HFS+ 32 Bit Little Endian:

B95120CE = Thu, 01 August 2013 10:56:09 -0700

We couldn’t quite figure out what the last two bytes, 0xEB6A, were for – maybe milliseconds? Further testing will need to be done to confirm this.

Time to time it all together. Take the Data Field from File Alias in the Mac Plist Editor and convert the Hex value to ASCII (try this website) to get your File Paths and File Name:

Hex:
00000000 01960002 00000a4d 44544855 4d424452 56000000 00000000 00000000 00000000 00000000 00004244 0001ffff ffff1645 6d706c6f 79656520 53616c61 72696573 2e786c73 78000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000ffff ffff0000 00000000 00000000 0000ffff ffff0000 0a024953 00000000 00000000 00000000 0018436f 6d70616e 7920546f 70205365 63726574 2046696c 65730002 00442f3a 566f6c75 6d65733a 4d445448 554d4244 52563a43 6f6d7061 6e792054 6f702053 65637265 74204669 6c65733a 456d706c 6f796565 2053616c 61726965 732e786c 7378000e 002e0016 0045006d 0070006c 006f0079 00650065 00200053 0061006c 00610072 00690065 0073002e 0078006c 00730078 000f0016 000a004d 00440054 00480055 004d0042 00440052 00560012 00302f43 6f6d7061 6e792054 6f702053 65637265 74204669 6c65732f 456d706c 6f796565 2053616c 61726965 732e786c 73780013 00132f56 6f6c756d 65732f4d 44544855 4d424452 5600ffff 0000

ASCII:
MDTHUMBDRV
Employee Salaries.xlsx
Company Top Secret Files
D/:Volumes:MDTHUMBDRV:Company Top Secret Files:Employee Salaries.xlsx
Employee Salaries.xlsx
MDTHUMBDRV
0/Company Top Secret Files/Employee Salaries.xlsx
/Volumes/MDTHUMBDRV
Mari DeGrazia


(In this example, the volume name of my thumbdrive was MDTHUMVDRV)

Then use Dcode (or whatever you like) to convert the Hex timestamp (remember to remove the last two Bytes):


If you do not have a Mac at your disposal, no worries, you can still use the Windows plist editor.

From the plist Editor for Windows, convert the Access Date from Base64 to Hex (try this website):
AAC5USDO62o=  =  0000B95120CEEB6A

Then use DCode to convert B95120CE as shown above.

For the File Alias, convert the Data field from Base64 to ASCII, try this website for the conversion.

Or go for door number two - you can use the python scrip I wrote, OfficePlistParser to parse the file.

The script will pull the MRU ID (so you can refer back to the plist for verification), Access Date in UTC, and Full Path. Long file names appear to be concatenated with with a random set of numbers, like so:

\long\path\to\my\long\file\name\Supercalifragilistic#6E432C.doc

In Office 2010, the long file names are supplied in the file aliases which are parsed by the script.

It also pulls User information which is output to the screen. A note on the User information. I noticed on some of my test data that the username in the file may be the person who first registered the product, or entered their user information into MS Word first. This did not correspond to the user who opened the file.

For example, on another Mac I created a profile for testing, opened a document then parsed the file. The owner's name was listed in the plist file, not mine or my account user name. Some more research will need to be done here.... If your looking at a carved plist files it's something to be aware of as the username may not be representative of who opened the files.

Since Python does not have native support for reading binary plist files, the library biplist is required. This can be installed on the SIFT workstation using easy install:

sudo easy_install biplist

Here is an example some parsed content:


You can get the scrip here. Enjoy, and any feedback/issues with the script are appreciated. It worked on my test data but I don't know what type of shenanigans your clients may be up to...

Also, quick shout outs to Cheeky4N6Monkey and @brianjmoran for their help. Three minds are better then one, right?

9 comments:

  1. Hello Mari,

    I'd just followed your instructions and you saved me a lot of work, because I just needed to read the com.microsoft.office.plist of a Mac.

    Really, really thank you so much. It works fantastic!!!

    ReplyDelete
  2. This is (still) an extremely useful tool. Just found it and began using it. Only 1 minor change: I had to add in a couple of lines of code to catch keyerror's. The entire script was bombing out on the first bad key it found.

    ReplyDelete
    Replies
    1. I've updated the code to catch some errors - hopefully this will help out for future users.

      Delete
  3. Thanks for the code change! Would you be able to share that with me so I can update the source code?

    ReplyDelete
  4. Those random strings of numbers look suspiciously like RGB color definitions ....

    ReplyDelete
  5. I wonder if there is a length byte in there that would allow us to skip over all of those nulls that look like padding....

    ReplyDelete
  6. It's been awhile since I've looked at this artifact - but a great observation. I'll have to take a look and see.

    ReplyDelete
  7. For anyone coming across this great post in the future, we've released an open source plistutils library for Python which will parse the Alias data and convert Mac dates, among other things: https://github.com/strozfriedberg/plistutils

    ReplyDelete