Saturday, December 29, 2012

Dude, Where's My Data?


Harlan's tweet (view picture to the right) got me thinking, and I would like to share a case example that I feel drove this particular point home for me.

 Many of the 'Swiss Army' forensics tools will parse data for you and automate various tasks. For example, X-Ways will parse link files and EnCase will  parse (or mount, whatever term you prefer) PST files. Instead of exporting out these files and working with them in separate programs, these Swiss Army knives will display the data in a more readable format within their GUI.

Cell phone forensic programs work in a similar fashion. They will first acquire the phone (if you’re lucky that day) then parse typical data such as SMS and MMS messages, call logs and contact information. These programs can also generate a pretty report for you to turn over to your clients (whether it’s a prosecutor,  a defense attorney or your cousin Vinnie). As an examiner, this can be great thing.  No need to locate and export the database, run queries and convert timestamps.

Each of these programs has taken a task that is repetitive and automated it - in most cases, saving the examiner time. But where is the Swiss Army knife getting its data from, how is it interpreting it and  is it getting all the data?
 
Now, to get to my case example.  I had an iPhone where I was tasked with getting the voicemails.  In order to do this I had three “Swiss Army knife” tools at my disposal:
  • Swiss Army Knife A – $$$$$
  • Swiss Army Knife B – $$$
  • Swiss Army Knife C – $
All three programs were able to acquire a file system image of the cell phone as well as parsing the SMS, MMS and call logs, but what I needed were the voicemails.  Only one of these tools (A) automatically parsed the voicemail.db file which contained information regarding the voicemails. I did locate the voicemail.db file within the file system of the two other programs (B and C) - the programs just didn’t parse this database automatically.

Now, if an examiner had just the option of B or C that did not automatically parse the voicemails and point them out – would they have assumed there were no voicemails? Ok, this may not be the best example as voicemails are a pretty common thing, but what if it were a not so common artifact?

I decided to use A to conduct the remainder of the exam since it had already parsed the voicemails saving me the time of exporting out the database, running quires and converting timestamps. I could now get a pretty report.   I went to generate the report and the program threw an error. No report, no exported voicemails.  Dude, where's my data?
 
In my quest to find out why the $$$$$ Swiss Army knife threw an error, I went to view the contents of the voicemail.db file to see if there was some abnormal data causing issues. I opened the voicemail.db file with an SQLite viewer and noted several columns of data NOT displayed by the $$$$$ program.

Included were two columns I thought right off the bat could be important – a "flag" column and a "trashed" column. The flag column designates certain statuses of the voicemail such as heard, unheard or deleted. The trashed column is the date that the voicemails where placed in the deleted folder. What if the examiner needs to prove the suspect had listened to a voicemail?  I know I don't always listen to my voicemails (sorry Kim) and opt to just call the person back instead. (Now I know you can't prove they listened to it, per say. Maybe their speaker was busted,  or their nephew had their phone but this is just an example for illustrative purposes so just roll with me).

After some more testing, I determined it was blank values in the database that were causing errors with the reporting. Was I going to wait for the next software update to get a report? No. Time to work with the data myself. I had three options:

Good → Export the data from the database into an Excel sheet, use formulas to convert the timestamps

Better → Write a script to parse the data as I would probably need it again

Best → Have someone else write a script to parse the data

Now, arguably, Better and Best could be switched.  It’s always good to write your own tools so you gain a deeper understanding of the data. However, in my case I had someone (Cheeky4n6Monkey) reach out to me when I was working on iParser asking if I needed any help.  I know he enjoys learning and is always looking for a forensic project to help on. So rather then write my own parser  I thought this would be a nice project to involve him in.

A few emails later I had a custom tool written by him that gave me the exact data I wanted and hopefully he got to learn something in the process too.

So in summary, I am going to quote Harlan’s tweet again:

 How much do you know about what your tools do for you?  May I make the following suggestions?  Look at the data with different tools to see what your tool may not be doing. Look at the raw data, or look at the data in its native format to see how your tool interprets the data and what it may be missing. Read the forums associated with your tool, see what it may be capable of that you are missing out on based upon how others use it.

Do they get the data you need? In my case, not always. Sometimes I need to roll up my sleeves and do the dirty work by myself (err, in this case I asked someone else to join in with me).

Do you know what you need? How do you know what data you need, if you don’t even know it exists? Keep researching, reading blogs, watching webcasts and asking questions.  Don’t assume that everything will be handed to you on a silver platter by your tools.

Umm, in case you don't get my movie reference, Google "Dude, Where's My Car"  :-)


4 comments:

  1. Great article! Added a link to the Digital Forensics Google Currents feed at http://iadfi.org/go

    ReplyDelete
  2. Great post, Maria! I really liked not only how you approached the problem, but the fact that you willing shared this case study. I don't think that most analysts really understand how valuable this can be, even though they may thoroughly enjoy reading what you posted.

    ReplyDelete
  3. Part of the reason I wrote that tweet is because I regularly see analysts who have an issue or analysis goal they are working toward, and they often seem to be trying to fit the tool they use to the problem, rather than focusing on solving the problem. For example, if a log on a system illustrates during your dead-box analysis that some unusual activity occurred at a certain time, or that anomalous activity occurred at regular intervals for a brief period of time, then generating a timeline of system and user activity would be a great way to nail down what was happening on the system at the time. However, the tool or process you use may not get all of the data you need...much like the example you provided where some tools parse voice mail databases, and other do not. So, is the tool you're using getting the data you need, and if not, how can you go about getting that data?

    ReplyDelete
  4. Maria,

    I've been doing some digging into data structures and database specifications, and what really amazes me is the information that's available, but not necessarily visible to analysts who rely on tools. Tools provide a layer of abstraction over the data itself, often hiding the data from the analyst who is not curious.

    Here's an example...take the Firefox 3 places.sqlite database. The moz_historyvisits table includes a visit_type value, ranging 1-7, which essentially describes how the browser got directed to the URL. Like other analysts, I get questions about whether or not the user was at the keyboard at the time...and this value can provide significant context in answering that question. A brief look at the history database specification for Chrome indicates that there are 10 possible values...and yet of the tools that do illustrate this information, none that I've seen really help the analyst understand what they're looking at.


    ReplyDelete