Wednesday, November 6, 2013

Python Parser to Recover Deleted SQLite Database Data

Soooo.... last week I was listening to the Forenisc Lunch  and the topic of parsing deleted
records from SQLite databases came up. These Forensic Lunches are every Friday and cover a wide range of topics relevant to the Forensics Community and are hosted by David Cowen. I highly recommend participating in one if you get the chance. It's actually at 10am my time, so it's more like a Forensic Doughnut for me.

Anyways, back to the SQLite databases....I see a lot of these databases in my mobile phone exams. They can contain emails, text messages, app data and more. It's also not uncommon to run into them on Windows (and Mac) exams as well - think Google Chrome History which is stored in an SQLite database.

SQLite databases can store deleted data within the database itself. There are a couple of commercial tools that can parse this deleted data such as Oxygen Forensics SQLite Viewer.

While a commerical tool is good, its always nice to have an open source alternative. After hearing David mention in the webcast he was not aware of any open source tools that did this, my ears perked and I decided to try my hand at writing a Python script to parse SQLite databases for deleted data.

Luckily, the SQLite file format is nicely documented on the website. I won't go into much detail here as it's laid out very nicely on their website.

Basically the database consists of Pages. Some of these Pages are "leaf table b-trees" which contain the data. In turn, these leaf table b-trees contain cells. According to, SQLite "strives" to place the cell towards the end of the b-tree page (how does a program strive I wonder?).  Because the cells 'strives' to be towards the end  (I keep thinking of Happy Gilmore - Go home ball! Don't you want to be in your home?) the unallocated space is, in essence, the space before the first cell starts. This unallocated space can contain deleted data.

The leaf table b-tree page can also contain freeblocks. Freeblocks are areas of unallocated space tracked by the leaf table b-trees.  So there are two areas within a page that can contain deleted data: unalloacted and freeblocks.

In this example I am going to use the script to parse the Google Chrome History database.  In case you want to play along you can find this file under C:\Users\%USERNAME%\AppData\Local\Google\Chrome\User Data\Default (if you have Chrome installed).

Using the SIFT workstation I ran the script over the History file (by default the Chrome History file does not have a file extension): -f  /home/sanforensics/History -o report.tsv

 The output includes the Type (Allocated or Freeblock), Offset, Length and Data:


Now, an important note about the deleted data. In order to make the data readable, I have stripped tabs,white spaces and non-printable characters in the output.  As much as I love like looking at hex, it was drowning out the strings I was looking for.

You can also run the script in raw mode, which will dump the data field as is: -f mmssms.db -r -o report.txt

This can be helpful if you are looking for timestamps, flags or other data that may be in Hex.

Download the script here. Tested on Python 2.6.4.

***Update 9/2/2014***
Windows GUI and Windows CLI added. Use the same link as above to download any of these versions.