## Monday, July 21, 2014

### Safari and iPhone Internet History Parser

Back in June, I had the opportunity to speak at the SANS DFIR Summit.  One of the great things about this conference was the ability to meet and socialize with all the attendees and presenters. While I was there, I had a chance to catch up with Sarah Edwards who teaches the Mac 518 class for SANS.

I'm always looking for new projects to work on, and she suggested a script to parse Safari Internet History. So the 4th of July long weekend rolled around and I had some spare time to devote to a project. In between the fireworks and a couple of Netflix shows (OK, maybe 10 shows), I put together a python script that parses out several plist files related to Safari Internet History: History.plist, Bookmarks.plist, TopSites.plist and Downloads.plist.

Since the iPhone also uses Safari, I decided to expand the script to parse some iPhone Safari artifacts: History.plist, Bookmarks.db and RecentSearches.plist. I imagine the iPad also contains Safari Internet History, but I did not have one at my disposal to test. If you want to send one to me for testing, I would be happy to take it off your hands :-).

In this post I'll run through each of the artifacts I located and explain how to use the script to parse out the files.

### Plist Files: A love/hate relationship

First, a little background on plist files. Plist files are awesome because they can contain all sorts of information such as Internet History, Recent Docs, Network IDs etc.  There are free tools for both Windows and OS X that will allow you view the data stored in the plist file. For Windows, you can use plist Editor.  If you have a Mac, a free plist editor is included in Apple's XCode Developer Tools which can be downloaded through the App Store.

However, plist files also stink because while the plist format is standardized, it's entirely up to the programmer to store whatever they want, in whatever format they want.

A (frustrating) example of this is date information. In the Safari History.plist file the date is defined as a "String", and is stored in Mac Absolute time. Mac Absolute time is the number of seconds since June January 1, 2001.  Below is an example of this from a Safari History.plist file viewed in the XCode plist editor:

 History.plist file in XCode plist editor
In the Safari Bookmarks.plist file, the date is stored in a field defined as "Date".The date is stored in a more standard format:

 Bookmarks.plist file in XCode plist editor
This means that each plist file needs to reviewed manually to determine what format the data is in, and how it's stored before it can be parsed.

So, moving on to the artifacts...

### Where's the beef?

On a Mac OS X, the Safari Internet History is located under the folder /Users/%USERNAME%/Library/Safari. As I mentioned before, I located four plist files in this folder containing Internet History: History.plist, Bookmarks.plist, TopSites.plist and Downloads.plist. I've written the script to read either an individual file, or the entire folder at once.

History.plist
This file contains the the last visited date, URL, page title and visit count. To run the parser over this file and get a tsv file use the following syntax:

safari_parser.py --history -f  history.plist -o history-results.tsv

TopSites.plist
The Top Site feature of Safari identifies 12 Top Sites based upon how often and how recent the sites were visited. There are several ways to view the tops sites in Safari, such as starting a new tab or selecting it from the Menu>View>Top Sites. Small thumbnails of each Top Site are displayed. The user has the option to Pin or Delete a site from the Top Sites. Pinning a site keeps it in the Top Sites List, while deleting it removes it. The list can be increased to hold up to 24 sites.

The thumbnails for the webpage previews for Safari can be found under /Users/%Username%/Library/Caches/com.apple.Safari. Below is how the TopSites appear to a user ( this may vary depending on the browser version):

The TopSite.plist file contains the Page Title and URL.  It also stores values to indicate if it's a Pinned or Built in Site. Built in Sites are pre-populated sites such as iCloud or the Apple Website.

TopSites that have been deleted are tracked in the TopSites.plist as "BannedURLStrings".

To parse the TopSites.plist file use the following syntax:

safari_parser.py --topsites -f  TopSites.plist -o topsite-results.tsv

Bookmarks.plist
Safari tracks three different types of bookmarks in the Bookmarks.plist file: Favorites, Bookmarks and the Reading List.

Favorites
The Bookmarks Bar (aka Favorites) is located at the top of the browser:

The Favorites are also displayed on the side bar:

A folder titled "Bookmark Menu" is created by default when a user creates bookmarks. It contains a hierarchical structure of bookmarks and folders - these are shown in the red box below:

The user may add folders, as demonstrated with the "test bookmarks" folder below:

The Reading List is another type of bookmark. According to Safari documentation, "Reading List helps you save webpages and links for you to read later, even when you are not connected to the internet". These items show up when the user selects the Reading List icon:

Safari downloads and stores information such as cached pages related to the Reading List under  /Users/%USERNAME%/Library/Safari/ReadingListArchives. I didn't spend too much time researching this as my parser is focused on the bookmarks.plist file, but keep it in mind as it may turn up some interesting stuff.

All three types of bookmarks (Favorites, Bookmarks and Reading Lists) are stored in the Bookmarks.plist file.

The Bookmarks.plist file tracks the Page Title and URL for the Favorites and the Bookmarks, however, the Reading List entries contain a little bit more information. The Reading Lists also contains a date added, date last fetched, fetch result, and preview text.  There are also a couple of boolean entries, Added Locally and Archived on Disk.

Out of all the plist files mentioned so far, I think this one looks the most confusing in the plist editor programs.  The parent/child relationships of the folders and sub folders can get pretty messy:

To parse the Bookmarks.plist file, use the following syntax:

safari_parser.py --bookmarks -f Bookmarks.plist -o bookmark-results.tsv

The Safari Parser will output this into a spreadsheet with the folder structure rebuilt, which is hopefully more intuitive then viewing in the plist editor:

All Four One and One for All
Instead of parsing each file individually, all four files can be parsed by pointing Safari Parser to a folder containing all four files.  This means you can export out the /Users/%Username%/Library/Safari folder and point the script at it. You could also mount the image and point it to the mounted folder. To parse the folder, use the following syntax:

safari_parser.py -d /Users/maridegrazia/Library/Safari -o /Cases/InternetHistory/Reports

This will create four tsv files with results from each of the above Internet History Files.

### iPhone Internet History

Safari is also installed on the iPhone so I figured while I was at it I might as well expand the script to handle the iPhone Internet History files. I had some test data laying around, and  I was able to locate three files of interest: History.plist, Bookmarks.db and RecentSearches.plist.

While my test data came from an iPhone extraction, these types of files are also located in an iTunes backup on a computer. This means that even if you don't have access to the phone, you could get still get the Internet History. Check in the user's folder under \AppData\Roaming\Apple Computer\MobileSync\Backup, then use a tool like iphonebackupbrowser to browse the backups and export out the files:

History
The location of the History.plist file may vary depending on the model of the iPhone. Check \private\var\mobile\Library\Safari or \data\mobile\Library\Safari for this file.

Luckily, the History.plist file has the same format as the OS X version, so using the script to parse the iPhone History.plist file works the same:

safari_parser.py --history -f  history.plist -o history-results.tsv

Bookmarks
The location of the Bookmarks.db file may vary depending on the model of the iPhone. Check \private\var\mobile\Library\Safari or \data\mobile\Library\Safari for this file. On an iPhone, this file is stored in an SQLite database rather then the plist format used on OS X.  In the test data I had, I did not see any entries for the Reading List. To parse the iPhone Bookmarks.db file, use the following syntax:

safari_parser.py --iPhonebookmarks -f bookmarks.db -o bookmark-results.tsv

Recent Searches
I located a RecentSearches.plist file under the cache folder. The location of this file may vary depending on the model of the iPhone. Check \private\var\mobile\Library\Caches\Safari or \data\mobile\Library\Caches\Safari. This file contained a list of recent searches, about 20 or so. Use the following syntax to parse this file:

safari_parser.py --iPhonerecentsearches -f recentsearches.plist -o recentsearches-results.tsv

You can also point the script to a directory with all three files and parse them at once:

safari_parser.py -d /Users/maridegrazia/iPhoneFiles -o /Cases/InternetHistory/Reports

### The Script

The Safari Parser can be download here. It requires the biplist library which is super easy to install (directions below). However, I've also included a complied .exe file for Windows if you don't want to hassle with installing the library. A thank you  to Harlan Carvey for suggesting the PyInstaller to compile Windows binaries for python - it worked like a charm.

To install biplist in Linux just type the following:

sudo easy_install biplist

For Windows, if you don't already have it installed, you'll need to grab the easy install utility which is included in the setup tools from python.org. The setup tools will place easy_install.exe into your Python directory in the Scripts folder. Change into this directory and run:

easy_install.exe biplist

Remember to look at the plist files to manually to verify your results. I don't have access to every past or future version of Safari or iOS. As always, just shoot me an email or tweet if you need some modifications made.

### References and Tools

safari_parser.py (my script to parse the Safari Internet History)
Safari 5.1 (OS X Lion): View and customize Top Sites
Plist Editor (free plist editor for Windows)
XCode (includes free Plist Editor for OS X)
iphonebackupbrowser ( free iTunes backup browser)

## Thursday, April 24, 2014

### What's the Word - Thunderbird! - Parser that is....

Thunderbird is a free email client by Mozilla (similar to Outlook).  Most of the major Forensic tools support parsing this data in one way or another.  However, I recently came across a Thunderbird profile in a Volume Shadow Copy that was not getting parsed correctly, or in some instances, into a format that I needed it in.

What tipped me off that the profile was not being parsed correctly? Several things. One program I used parsed only 274 messages. Based upon the large size of the profile, this seemed suspect to me.  I tried another program and it parsed over 5,000 emails from the same profile. Quite a discrepancy. When I tried to view the profile natively using Thunderbird, it threw errors.

This caused me to take a closer look at the Thunderbird files, and untimely, write a python parser to extract the emails – including deleted ones.

Testing

Because the email profile was corrupted, I wanted to test the same programs with a "normal" profile. I actually use Thunderbird as my email client, so I had a decent profile for testing with over 7,000 emails in my Inbox and about 3,300 in my sent folder over the course of a couple of years.

I parsed my profile with three forensic programs as well as just viewing it in Thunderbird. I also ran the python script  I wrote over it (noted as TB Parser below). I was surprised by the variety of results - many programs were not getting all the messages. I've listed the major email folders from the Thunderbird profile below and the number of parsed emails from each program:

Tool 1 is a common "all in one" forensic tool. If you look at results from the Inbox, over 4,000 emails were parsed. If an examiner was using this as their only tool, it's easy to see how they might not even realize that an additional 3,000 messages were not parsed.

A possible reason for these discrepancy is the format in which Thunderbird stores its emails. Thunderbird uses a modified version of the MBOX email format, called MBOXRD1.This may account for the partial processing of emails as many of the tools support state support for MBOX. However, Tool 1 states in it's documentation specific support for Thunderbird.

So if the tools states support for Thunderbird, or if you see some emails but they are all not being parsed, is the tool to blame? I think it may be a little misleading that some of the emails are parsed, however,   I believe that it is incumbent upon the examiner to verify the results and understand the way that the tools work. However, that being said, sometimes it's easier said then done. I had a situation where it was pretty obvious all the emails had not been parsed. What if the profile size was 1GB and 5,000 emails were parsed? Is that a reasonable number? What if it was supposed to be 6,000 and your smoking gun is one on the ones that didn't get parsed?

Thunderbird Configuration

First, a little background information on Thunderbird. Thunderbird allows a user to set up both POP and IMAP email. Once a user has set up and configured their profile, it’s stored under the following location (at least on Windows 7):

Unlike Outlook, the data is not stored in one file, but rather a series of files and folders under the profile directory. If you want to view this profile natively with Thunderbird, the easiest way I have found so far it to launch Thunderbird from the command prompt with the –profile switch and point it to the path where you have exported out the profile. Make sure you’re not connected to the Internet if your doing this on an evidence profile. The last thing you want to do is download new email or send out a message that has been sitting in the outbox. This may (and probably will)  modify the file, so only do it on a copy.

Once launched, a typical setup may look like this:

Of course, being forensicators, this may not be the preferred way to review emails - but sometimes it's nice or even necessary to see files in the native viewer/program.

A whole bunch of files are created under the root of the profile directory. These include files like cookies.sqlite, places.sqite and formhistory.sqlite that may warrant a peek. However, I am going to focus on the email files for now.

Email Files

Thunderbird stores the IMAP mail profile in a sub folder named "ImapMail" while POP mail and Local Folders are stored in a sub folder named "Mail":

There are several files that hold information related to emails. The first is the global-messages-db.sqlite file. This file is located in the root of the profile folder:

Global-messages.db.sqlite Database

The global-messages.db.sqlite is an SQLite database that Thunderbird uses to index and search messages.2 This file can be viewed using an SQLite Browser. The "mesagesText_Contents" table contains the Email Body, Subject, Author, Recipients and Attachment Names.

 messagesText_Contents Table
While this database contains email information, the email body is not a true representation of the email. For example, the body field does not contain images or attachments. Also, it does not contain messages that have been deleted, whereas the MBOXRD file can (discussed below).  However, it does contain some useful data, such as the name of the attachments of non-deleted emails. You could browse this database quickly to see if any attachment names are suspicious.

Using "docid" in the messagesText_Contents table, you can link it back to the “messages”table id field. The messages table contains information about each message, such as the headerMessageID and jsonAttributes. The jsonAttirbutes are what stores whether a message has been read, forwarded or replied to among other things.

The headerMessageID is also located in the MBOXRD file - which is what I used to link the raw MBOXRD data back to global-messages.db.sqlite database. You may noticed there is a deleted column here. Based upon limited testing, I believe that this value is used during the synching of the IMAP mail. When a message is deleted, it remains in this database with a 1 until the corresponding message is deleted on the mail sever. Once it has been deleted, the message is removed from the database, but remains in the MBOXRD file. Normally all these values will be '0' unless the user was offline when the message was deleted.

In my particular case, this sqlite file was corrupt and I did not have access to these tables.  This may also be why one of the programs did not parse the email fully - maybe it was relying on the table, who knows.  I have written my parser so that it does not need this database to process the emails. It merely displays "Data not available" for the fields that it can't pull from the table.

Just a heads up, there is more data that could be mined from this database, such as IM Conversations but I am trying to stay focused on email.. so moving on.... (and who uses Thunderbird to IM anyways????)

Thunderbird stores email in an mbox  format called MBOXRD. Basically, it stores email in plain text MIME format. The cool thing is (based upon my testing and some internet research) when an email is deleted, it stays in this file. These deleted emails would not be seen if this profile was viewed using the Thunderbird client. The thunderbird parser pulls all the emails from these files, including deleted ones.

The MBOXRD files are stored in file that is named after the corresponding email folder with no file extension. For example, the Inbox folder stores its emails in the "INBOX" file":

One level deeper, in the .sdb folder are the other folders such as the Sent folder and any user created folders to store email:

.MSF files
For each MBOXRD file, there is a corresponding .msf file. The .msf file contains folder indexes and preference data in Mork format. According to internet research, this file format has taken a lot of heat as being a pain to work with. The pointers for messages marked as Junk by Thunderbird appear to be tracked in here (based upon my limited testing). However, the formatting of the Message-ID's in this file are whacked. They include backslashes and if they are to long, they can also include the newline "\n" character as well.

Deleted Files
As mentioned before, when a file is deleted it is removed from the database yet still remains in the MBOXRD file.  In order to determine if a file is deleted,  the headerMessageID in the MBOXRD file can be cross referenced back to the database. However, emails that have been marked as "Junk" mail by Thunderbird are not stored in the global-messages.db.sqlite either. The "Junk" emails appear to be stored in the corresponding MBOXRD .msf file. So two checks need to be done to determine if a file has been deleted. The logic is as follows:

Thunderbird Email Parser

The python thunderbird email parser does three things:

1) Provides an Excel Sheet with the following information: file the email came from, address information (from, to, cc, bc), subject, raw date, converted date (in UTC) a link to the exported email and a list of attachments:

2) If the corresponding global-messages.db.sqlite is readable, it will provide TRUE/FALSE values for read, replied forwarded and if the message was deleted. If a message was deleted, the database format has changed, or the database is corrupt, these fields will say "Data not available".

3) It exports all the emails into a subfolder named "emails". Each email is named with the timestamp, email subject and unique number.

Normally, when I write a parser, I like to dump the output into a CSV, TSV or a plain text file.  This proved difficult for two main reasons.

First, many of the email addresses and strings within the email body contained tabs and commas which threw the formatting off.

Second,   I needed a way to supply the body of the email. Putting a large body of an email into one cell looked ugly.  Also, html was not displayed as one would see it in an email client making it difficult to read.

For this reason, I decided to put the output into an Excel sheet. So in order to use the parser, the xlwt python libary needs to be installed which is pretty quick and easy to do for either the Windows or Linux platform. For Linux, you can use easy install. For Windows, you can downalod the installer for xlwt at https://pypi.python.org/pypi/xlwt/0.7.2

To use the parser, simply point it at the profile directory and select a directory for the output. The script will recurse through all subdirectories, so if you export out the user profile, make sure it goes in it’s own directory:

A report.xls file will be created along with a log file in output folder. The .eml files will be placed in a subdirectory named “emails”.

Some things to note, you may notice duplicate emails. This is because some emails may be stored in several folders, thus the email is stored in multiple files.  For example, an email may be in the Inbox, as well as the All Email folder.  Why not remove duplicate emails?  Well, there may be significance if you find an email has been stored in a particular folder.

I am using a built in MIME python library to parse the emails. If an email does not follow this standard, the output may not be as expected -weird characters, etc. This is why I put the file name in the Excel sheet. You can always refer back to the original MBOXRD file to verify the results.

Although I have made every effort to test this script, and to make sure it is working accurately, verify your own results - which you should be doing anyways, right? ;-)

For deleted emails, I have made the notation "Deleted (Verify)". I did this because there is not a specific flag or variable to designate that the email has been deleted. I run through several checks to located the Message-ID to determine if the file has been deleted.  It seems to be working pretty good, but I have a limited set of test data.  How can you verify if the message has been deleted?  One way would be to open the profile in Thunderbird and use Thunderbird to search for the email. If the user deleted the email, it would not show up in Thunderbird.

I have tested this on Thunderbird 24.4.0 using Windows 7 and the SIFT workstation with Python 2.7.  If you want a Python 3+ version, I like shiny things and K-cup hot chocolate.

Given the frequency Mozilla tends to update things, there is always a chance that a new version may break the code. If you run into a situation where it doesn't work on a new or older version of Thunderbird, shoot me an email and I'll see what I can do.

As always, feedback and suggestions are welcome (If you're nice about it. Otherwise it goes right in the spam folder).

References:

1. Library of Congress "Sustainability of Digital Formats Planning for Library of Congress Collections, MOBXRD Email Format."

2. Mozilla Foundation. "Rebuilding the Global Database"

## Monday, December 30, 2013

Google Analytics information can include values such as timestamps, page titles, keywords and page referrers which can be located on a user's computer. These values can be located in Cookie files and Browser cache files.
 Artwork by Cheeky4n6Monkey

A while ago I wrote a blog post about the Google Analytic Cookies and the Cache files. Rather then focus on how to parse these artifacts like previous posts, this post will dive into how you can use deleted Google Analytic artifacts to build a much more comprehensive timeline as well as how to recover them using Scalpel.

(Or you can watch me talk about it on the Forensic Lunch if you prefer, but I have more detailed instructions here)

Building out the Timeline

I had a case where the user account was deleted, and the client wanted Internet History recovered to show a pattern of activity - not just that the user had been to a site once, but many times over the course of the time they had access to the computer.

Although I tried two commercial tools to recover deleted Internet History, it was very little and over a short period of time. This is where Google Analytic artifacts stepped in and saved the day.  I was able to recover a large amount of cookies from unallocated space and cache files to build up a timeline that showed a pattern over time - much, much more then the Internet History I recovered with the commercial tools. Even if you are working with an existing user account, adding these artifacts can build out your timeline even more. I'll explain what I mean below.

Normally, when a cookie is viewed through a tool such as NetAnalysis, you are presented with a Last Visited Date, Hit Count and Domain name:

Take the cookie highlighted above in blue for an example. By looking at the displayed information  we know the host name, last time the domain was visited and how many hits it had. But what do the hits mean exactly?  Were all of these 136 hits done in one day? Were they spread out over the course of the year? What about all the days/visits prior, if any?

This is where the power of recovering deleted Google Analytic artifacts can help build out a timeline.  Take for example the _utma cookie. This cookie has three timestamps as opposed to the one timestamp of a "regular" cookie. After recovering some of these __utma cookies with the same host name, we can start to build up a timeline that has way more information. (The following spreadsheets were generated by using GA Cookie Cruncher to parse the recovered cookies)

The __utmb cookie stores session information for each visit to a website. It expires after 30 minutes of inactivity. This __utmb cookie not only stores the time of the session, but how many pages were viewed during that session. This means that if a user visits a website twice in one day, say before and after work,  two separate__utmb cookies would have been created. Once in the morning with a page count for that visit, and once in the evening with a new page count for that visit. Since the only the last cookie is saved, we would only have one count for the page views. If we recover the cookies that existed previously, we can see a session count for those previous visits and add those to the timeline:

The__utmz  cookie stores information related to how a user arrived at a website. Theses include keywords and the source. Once again, recovering these can show various ways a user arrived at a site:

Now if we combine all cookies into one sheet for review:

All Recovered __utm Values

Look at all the information that is now available compared to viewing just the one existing cookie! Instead of being presented with one visit date and one hit count, we now have previous visits, keywords, referral pages and how many pages were viewed in each session. This can further be built upon by adding the __utm?gif cache values which can have over 30 other variables such as page title and referral page. I have also seen values like usernames in the cached URLs which could extremely helpful.

The real power of the Google Analytic artifacts comes into play when deleted artifacts are recovered. By using Scalpel and then parsing the carved files you can have some new data to play with and analyze.

Based on some initial and limited testing with Internet Explorer 11 and Windows 7, it appears the browser deletes then creates a new cookie when visiting a website rather then overwriting the old cookie. This means there could be a lot of cookies waiting to be recovered.

Scalpel is a great program for recovering, or 'carving' for deleted files. It's a command line tool which is included in the Sift Workstation, or it can be downloaded from here.

By default, Scalpel does not carve for Google Analytic Cookies and cache files, but that is easily fixed by adding in a few lines to the Scalpel configuration file (mine was located under /user/local/etc/scalpel.config):

 scalpel.config

I added five entries into the configuration file in order to locate Internet Explorer and Safari Binary Cookies and Cache files. (I'm still working on the best way to carve out then parse cookies that are stored in SQLite databases, such as Firefox and Chrome.)

If your unfamiliar with how to use Scalpel or need a refresher,Cheeky4n6Monkey has a great post on how to use Scalpel, including how to add custom carvers like these.

The configuration file itself has detailed instructions on how to add custom file types, but here is a quick explanation of the entries I've made.  The first column is the file extension. In this case it's arbitrary and you can use whatever you like here. For instance, I could have also used .txt instead of iec (which I chose to stand for Internet Explorer Cookie).

The second column is whether or not the header is case sensitive. In my test data for IE, I have always seen them in lowercase so I used 'yes' to help reduce false positives.

The third column is the max size for the file we are carving. Since each IE cookie should be relatively small, I have used 1000 bytes as the value.

For the header and footer, if we view the Internet Explorer Cookies in a text editor, we can see that each cookies starts with a __utm value and ends with a '*' - these will be the header and footer respectively for each carved cookie:

Safari Binary cookies have "cook" for a file header. Since one Safari Binary Cookie file holds all the cookies for the browser, the file size can be larger then the IE cookies. To be on the safe side I have specified a much larger file size of 1000000 bytes.  The footers on Safari Binary Cookie files are not always the same, so I have left this value blank.

The cache file store the __utm.gif? values in plain text.  The goal is not to recover the entire cache file, but just the __utm_gif? URL and values. Below is a picture of a Firefox cache file:

By using the string "google-analytics.com/__utm.gif?" as a header, and specifying 1000 bytes, it should extract the whole URL plus a little extra for padding to be safe. (To read more about the __utm.gif values in cache files, check out my blog post here.)

When Scalpel carves all these file types, they will each be dropped into their own sub directory automatically.

To run scalpel:
scalpel -c /usr/local/etc/scalpel.conf -o carvedcookies /cases/myimage.dd

Opening up each carved file and manually parsing it for all the _utm values could be pain, especially if you have hundreds of recovered files. To that end, I have updated GA Cookie Cruncher to handle carved cookies for Internet Explorer and Safari Binary Cookie Parser for Safari Binary Cookies.

What do I mean by "handle"?  When recovering files, sometimes the files are fragmented, incomplete or there may be some false positives.  For example, if you were to try and open an incomplete Word Document in Word, it might close and give you an error.  Both of the above programs can handle these situations. If the file is incomplete, it tries to get as much information as it can, then moves on to the next file.

(To that end, it worked on my test data. If it crashes on yours, shoot me an email and I'll see what I can do)

What this means is that once you carve the files, you just need to point the programs at the directories and let them parse as many values as they can.

So in summary, the following steps can be followed to recover and parse deleted Google Analytic values for Internet Explorer and Safari:

• Update the Scalpel config file with the carvers
• Run Scalpel over the image
• For Internet Explorer, point GA Cookie Cruncher to the directory holding the IE recovered cookies
• For Safari, run the Safari Binary Cookie parser with the directory option (-d) to the directory holding the recovered binary cookies.
• For Cache records,point the Gis4Cookie parser with the directory option to the directory holding the recovered cache files
(Side note - the entry made in the scalpel.config file for cache files also works for Chrome and Firefox, however I am still working on how to carve and process fragmented Chrome and Firefox Cookies as they are in a different format. Sqlite files are harder to recover in their entirety.)

There might be a few more steps involved then just pushing a button, but in my case is was worth it.

## Wednesday, November 6, 2013

### Python Parser to Recover Deleted SQLite Database Data

Soooo.... last week I was listening to the Forenisc Lunch  and the topic of parsing deleted
records from SQLite databases came up. These Forensic Lunches are every Friday and cover a wide range of topics relevant to the Forensics Community and are hosted by David Cowen. I highly recommend participating in one if you get the chance. It's actually at 10am my time, so it's more like a Forensic Doughnut for me.

Anyways, back to the SQLite databases....I see a lot of these databases in my mobile phone exams. They can contain emails, text messages, app data and more. It's also not uncommon to run into them on Windows (and Mac) exams as well - think Google Chrome History which is stored in an SQLite database.

SQLite databases can store deleted data within the database itself. There are a couple of commercial tools that can parse this deleted data such as Oxygen Forensics SQLite Viewer.

While a commerical tool is good, its always nice to have an open source alternative. After hearing David mention in the webcast he was not aware of any open source tools that did this, my ears perked and I decided to try my hand at writing a Python script to parse SQLite databases for deleted data.

Luckily, the SQLite file format is nicely documented on the SQLite.org website. I won't go into much detail here as it's laid out very nicely on their website.

Basically the database consists of Pages. Some of these Pages are "leaf table b-trees" which contain the data. In turn, these leaf table b-trees contain cells. According to SQLite.org, SQLite "strives" to place the cell towards the end of the b-tree page (how does a program strive I wonder?).  Because the cells 'strives' to be towards the end  (I keep thinking of Happy Gilmore - Go home ball! Don't you want to be in your home?) the unallocated space is, in essence, the space before the first cell starts. This unallocated space can contain deleted data.

The leaf table b-tree page can also contain freeblocks. Freeblocks are areas of unallocated space tracked by the leaf table b-trees.  So there are two areas within a page that can contain deleted data: unalloacted and freeblocks.

In this example I am going to use the script to parse the Google Chrome History database.  In case you want to play along you can find this file under C:\Users\%USERNAME%\AppData\Local\Google\Chrome\User Data\Default (if you have Chrome installed).

Using the SIFT workstation I ran the script over the History file (by default the Chrome History file does not have a file extension):

sqlparse.py -f  /home/sanforensics/History -o report.tsv

The output includes the Type (Allocated or Freeblock), Offset, Length and Data:

Now, an important note about the deleted data. In order to make the data readable, I have stripped tabs,white spaces and non-printable characters in the output.  As much as I love like looking at hex, it was drowning out the strings I was looking for.

You can also run the script in raw mode, which will dump the data field as is:

sqlparse.py -f mmssms.db -r -o report.txt

This can be helpful if you are looking for timestamps, flags or other data that may be in Hex.

## Thursday, August 29, 2013

### Safari Binary Cookies - Now with more parsing power!

Several people have already done a fantastic job of breaking down the file format and writing scripts to parse these cookies. If Perl is your flavor, check out these handy tools from Jake Cunningham. If you love Python, the script from Satishb3 does a great job of parsing the information.

While both of the above scripts do a fantastic job of parsing and presenting the information for the Cookies.binarycookies file, I wanted a way to parse a directory full of these binarycookies as well as the Google Analytic values from the cookies.

The awesome thing about open source is the ability to not only learn by looking at someone else's code, but to build on top of what they have done and create or tailor something for what you need (then hopefully turn around and share it again with others).

When I was reviewing the Satishb3 python script, I did not see a specific licensing agreement distributed with the code. I reached out to Satishb3 for permission to reuse his code and luckily for me, he graciously wrote back granting me permission.

This saved me a lot of time, and enabled me to focus my efforts on adding in the features that I needed. I sat down with some Dr. Pepper and the handy, dandy SIFT Workstation,and wrote a python script that parses the binarycookies file with the following additions:

1) Parses a directory full of cookies
2) Parses the Google Analytic values from the Cookies (umta, utmb, utmz)
3) Added an option to output into TLN format

Usage Examples

To process one file:

To process a directory of cookies:
To have the output in TLN format (this can be used with the file or directory option):

-f is the binary cookie filename, -o is the output file,  -t means TLN output, -H is the Host (optional) and -u is the username (optional) .

 Full Image

 Full Image

TLN (Timeline Output):

 Full Image

## Monday, August 5, 2013

### MS Office Recent Docs Plist Parser

Recently a post came up at Forensic Focus regarding the timestamps in the com.microsoft.office.plist file. I had a case several months ago where I ran into the same situation - trying to determine the timestamp for the Access Date stored in this file.  I have been meaning to get around to writing about what I found,  so after I saw that post I thought I would get in gear and do it.

When opening documents in Office 2008 and 2010 ( not sure about other versions) on a Mac the user is presented with a dialog box for recent documents, called the Workbook Gallery.  As you can see by the screen shot below, the Recent Documents tracks  the File Name, Last Opened date and File Path of the recent documents:

This information is stored in the com.microsoft.office.plist file under the User's profile : /Users/%Username%/Library/Preferences/.

Some notes about the com.microsoft.office.plist files:
• It can contain A LOT of entries. I found close to 1500 entries spanning 4 years.
• It can also have User Information such as name and email address
• It has Volume names, so you can see if files were opened from an external drive, etc.
If you do not have a Mac, you can view this file with plist Editor for Windows from icopybot.com. Below is a screen shot of how the com.microsoft.office.plist file looks. There is an Access Date field and File Alias field which both appear to be Base64 encoded:

Sometimes using various tools to see how the information is presented is helpful. Since the plist file is from a Mac, my next choice was to look at the file natively on a Mac. There is a free Plist Editor included in XCode, which is a developer tool published by Apple.

Looking at this plist file through the Xcode Plist Editor shows the following:

Now the Access Date look familiar, Hex values - and the File Alias looks like it's in Hex too.

Time to view the data in a Hex viewer to see what’s going on:

Ah ha! File Paths, File names, and the timestamp information.

Now the trick – figuring out the timestamp. By doing some testing – I.E. opening up files in MS Office on a Mac, checking the changes in the timestamp values, and brainstorming with Brian Moran, we were able to figure out the timestamp appeared to be in HFS+ 32 Bit Little Endian:

B95120CE = Thu, 01 August 2013 10:56:09 -0700

We couldn’t quite figure out what the last two bytes, 0xEB6A, were for – maybe milliseconds? Further testing will need to be done to confirm this.

Time to time it all together. Take the Data Field from File Alias in the Mac Plist Editor and convert the Hex value to ASCII (try this website) to get your File Paths and File Name:

Hex:
00000000 01960002 00000a4d 44544855 4d424452 56000000 00000000 00000000 00000000 00000000 00004244 0001ffff ffff1645 6d706c6f 79656520 53616c61 72696573 2e786c73 78000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000ffff ffff0000 00000000 00000000 0000ffff ffff0000 0a024953 00000000 00000000 00000000 0018436f 6d70616e 7920546f 70205365 63726574 2046696c 65730002 00442f3a 566f6c75 6d65733a 4d445448 554d4244 52563a43 6f6d7061 6e792054 6f702053 65637265 74204669 6c65733a 456d706c 6f796565 2053616c 61726965 732e786c 7378000e 002e0016 0045006d 0070006c 006f0079 00650065 00200053 0061006c 00610072 00690065 0073002e 0078006c 00730078 000f0016 000a004d 00440054 00480055 004d0042 00440052 00560012 00302f43 6f6d7061 6e792054 6f702053 65637265 74204669 6c65732f 456d706c 6f796565 2053616c 61726965 732e786c 73780013 00132f56 6f6c756d 65732f4d 44544855 4d424452 5600ffff 0000

ASCII:
MDTHUMBDRV
Employee Salaries.xlsx
Company Top Secret Files
D/:Volumes:MDTHUMBDRV:Company Top Secret Files:Employee Salaries.xlsx
Employee Salaries.xlsx
MDTHUMBDRV
0/Company Top Secret Files/Employee Salaries.xlsx
/Volumes/MDTHUMBDRV
Mari DeGrazia

(In this example, the volume name of my thumbdrive was MDTHUMVDRV)

Then use Dcode (or whatever you like) to convert the Hex timestamp (remember to remove the last two Bytes):

If you do not have a Mac at your disposal, no worries, you can still use the Windows plist editor.

From the plist Editor for Windows, convert the Access Date from Base64 to Hex (try this website):
AAC5USDO62o=  =  0000B95120CEEB6A

Then use DCode to convert B95120CE as shown above.

For the File Alias, convert the Data field from Base64 to ASCII, try this website for the conversion.

Or go for door number two - you can use the python scrip I wrote, OfficePlistParser to parse the file.

The script will pull the MRU ID (so you can refer back to the plist for verification), Access Date in UTC, and Full Path. Long file names appear to be concatenated with with a random set of numbers, like so:

\long\path\to\my\long\file\name\Supercalifragilistic#6E432C.doc

In Office 2010, the long file names are supplied in the file aliases which are parsed by the script.

It also pulls User information which is output to the screen. A note on the User information. I noticed on some of my test data that the username in the file may be the person who first registered the product, or entered their user information into MS Word first. This did not correspond to the user who opened the file.

For example, on another Mac I created a profile for testing, opened a document then parsed the file. The owner's name was listed in the plist file, not mine or my account user name. Some more research will need to be done here.... If your looking at a carved plist files it's something to be aware of as the username may not be representative of who opened the files.

Since Python does not have native support for reading binary plist files, the library biplist is required. This can be installed on the SIFT workstation using easy install:

sudo easy_install biplist

Here is an example some parsed content:

You can get the scrip here. Enjoy, and any feedback/issues with the script are appreciated. It worked on my test data but I don't know what type of shenanigans your clients may be up to...

Also, quick shout outs to Cheeky4N6Monkey and @brianjmoran for their help. Three minds are better then one, right?

## Sunday, July 14, 2013

### Google Analytic Values in Cache Files

A while ago I wrote about Google Analytic Cookies. These cookies can contain information such as keywords, referrer, number of visits and the first and most recent visit.  This information is stored in cookie variables called __utma, __utmb and __utmz.

These __utma and __utmz values are not just stored in Cookies, but also in Google Analytic GIF
requests, which in turn are stored in the browser cache files.

Webmasters using Google Analytics put a piece of code on their page that might look something like this:

<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-12345678-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>

“When all this information is collected, it is sent to the Analytics servers in the form of a long list of parameters attached to a single-pixel GIF image request. The data contained in the GIF request is the data sent to the Google Analytics servers, which then gets processed and ends up in your reports” 1

This GIF request looks something like this - notice the parameter 'utmcc' - this holds the Google Analytic Cookie values (they weren't kidding when they said it was long):

In addition to the utmcc parameter, there are up to 31(!) other parameters that this utm.gif value can hold.1

Some of these are:

Utmdt: Page Title
Utmhn: Host title
Utmp: page request
Utmr: referral with the complete URL

Other variables include the version of Flash used, screen resolution, screen color depth, and language encoding. To see the full list, check out this Google Developers page.

If looking at the above URL makes your eyes cross, here is what a manually parsed version looks
like:

utmhn (Host Title):             www.deviantart.com

The utmcc parameter holds the Goggle Analytic cookie values. So manually parsing these values 2:

utma (First Visit ) =7/5/2013  6:21:40 PM
utma_(Previous)   =7/5/2013  6:21:40 PM
utma (Last Visit)   =7/5/2013  6:21:40 PM

utmctr (keywords that found site) = not provided

(For a refresher on how to parse the GA cookie values see this article on DFI News , or my blog post here)

When I ran into these values on an exam and needed to parse a lot of them, I reached out to Cheeky4n6Monkey who wrote an awesome script to parse them. What is cool about his script is it works with various file formats. For example, it can parse the Safari SQLite cache.db and the Firefox __CACHE_ files.

Speaking of which, here are the locations for some different cache files holding these utm.gif? image requests:

Locations

FireFox

These utm.gif? values can be in the _CACHE_001_, _CACHE_002 etc files and the randomly named files in the sub-folders.

Chrome
data_1, data_2 etc files

Safari

Windows

So in the course of an exam, a quick way to locate and parse the GA GIF urls might be this:
• Run a keyword search for “utm.gif?”
• Filter by unique files
• Export out the files into one directory
• Use Gis4cookie.pl to parse the entire directory with the -p switch: