These __utma and __utmz values are not just stored in cookies, but also in Google Analytics GIF requests.
Webmasters using Google Analytics put a piece of code on their page that might look something like this:
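var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']);
_gaq.push(['_trackPageview']);
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);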
According to Google:
“When all this information is collected, it is sent to the Analytics servers in the form of a long list of parameters attached to a single-pixel GIF image request. The data contained in the GIF request is the data sent to the Google Analytics servers, which then gets processed and ends up in your reports” 1
This GIF request looks something like the following. Notice the utmcc parameter, which holds the Google Analytics cookie values (they weren't kidding when they said it was long):
In addition to the utmcc parameter, there are up to 31(!) other parameters that this utm.gif request can carry.1
Some of these are:
utmdt: Page title
utmhn: Host name
utmp: Page request
utmr: Referral, with the complete URL
Other parameters include the version of Flash used, screen resolution, screen color depth, and language encoding. To see the full list, check out this Google Developers page.
If looking at the above URL makes your eyes cross, here is what a manually parsed version looks like:
utmhn (Host Name): www.deviantart.com
utmdt (Page Title): deviantART : Log In
utmp (Page Request): /users/wrong-password?username=GuiltyAsSin&ref=http%253A%252F%252Fwww.deviantart.com%252F
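The manual parsing above can be scripted. Below is a minimal sketch in JavaScript (a recent Node.js, where URL is a global). The function name parseGaGifUrl, the label table, and the sample request are hypothetical: the request is rebuilt from the values above, not copied from the real deviantART request, and the labels follow my reading of Google's GIF request parameter descriptions.

```javascript
// Minimal sketch: split a Google Analytics __utm.gif request URL into labeled
// fields. parseGaGifUrl and GA_PARAM_LABELS are hypothetical names.
const GA_PARAM_LABELS = {
  utmhn: "Host Name",
  utmdt: "Page Title",
  utmp: "Page Request",
  utmr: "Referral",
  utmcc: "Cookie Values",
  utmfl: "Flash Version",
  utmsr: "Screen Resolution",
  utmsc: "Screen Color Depth",
  utmcs: "Language Encoding",
};

function parseGaGifUrl(rawUrl) {
  const url = new URL(rawUrl);
  const fields = {};
  // searchParams decodes each value once ('+' becomes a space, %XX is decoded).
  for (const [name, value] of url.searchParams) {
    // Parameters missing from the table keep their raw name as the label.
    fields[GA_PARAM_LABELS[name] || name] = value;
  }
  return fields;
}

// Hypothetical request rebuilt from the parsed values above.
const gifUrl =
  "https://ssl.google-analytics.com/__utm.gif?" +
  new URLSearchParams({
    utmhn: "www.deviantart.com",
    utmdt: "deviantART : Log In",
    utmp: "/users/wrong-password?username=GuiltyAsSin",
  }).toString();

for (const [label, value] of Object.entries(parseGaGifUrl(gifUrl))) {
  console.log(`${label}: ${value}`);
}
```

URL parsing strips only one layer of percent-encoding, so a value that was encoded more than once (like the ref value inside utmp above, which still shows %253A) needs further decoding by hand.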
The utmcc parameter holds the Google Analytics cookie values. Here they are, manually parsed 2:
utma (First Visit) = 7/5/2013 6:21:40 PM
utma (Previous Visit) = 7/5/2013 6:21:40 PM
utma (Last Visit) = 7/5/2013 6:21:40 PM
utmcsr (source site) = google
utmctr (keywords used to find the site) = not provided
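The utmcc values can be decoded the same way. This minimal sketch assumes the commonly documented ga.js cookie layouts: __utma is domainHash.visitorId.firstVisit.previousVisit.currentVisit.sessionCount, with Unix timestamps in seconds, and __utmz is domainHash.timestamp.sessionNumber.campaignNumber followed by pipe-separated campaign fields such as utmcsr and utmctr. It also assumes utmcc has already been URL-decoded once. The function names and sample cookie values are hypothetical.

```javascript
// Minimal sketch: decode the __utma and __utmz cookies carried in utmcc.
// Assumes the commonly documented ga.js layouts; all names are hypothetical.

function unixSecondsToDate(s) {
  return new Date(Number(s) * 1000);
}

function parseUtma(value) {
  // domainHash.visitorId.firstVisit.previousVisit.currentVisit.sessionCount
  const [domainHash, visitorId, first, previous, current, sessions] = value.split(".");
  return {
    domainHash,
    visitorId,
    firstVisit: unixSecondsToDate(first),
    previousVisit: unixSecondsToDate(previous),
    lastVisit: unixSecondsToDate(current),
    sessionCount: Number(sessions),
  };
}

function parseUtmz(value) {
  // domainHash.timestamp.sessionNumber.campaignNumber.key=value|key=value...
  // Only the first four dots are separators; campaign data may contain dots.
  const m = /^(\d+)\.(\d+)\.(\d+)\.(\d+)\.(.*)$/.exec(value);
  if (!m) return null;
  const campaign = {};
  for (const pair of m[5].split("|")) {
    const eq = pair.indexOf("=");
    if (eq < 0) continue;
    const raw = pair.slice(eq + 1);
    let decoded = raw;
    try {
      decoded = decodeURIComponent(raw); // e.g. "(not%20provided)"
    } catch (e) {
      // Malformed escapes: keep the raw value.
    }
    campaign[pair.slice(0, eq)] = decoded;
  }
  return {
    domainHash: m[1],
    timestamp: unixSecondsToDate(m[2]),
    sessionNumber: Number(m[3]),
    campaignNumber: Number(m[4]),
    campaign,
  };
}

function parseUtmcc(utmcc) {
  const result = {};
  // Cookies are joined as "__utma=...;+__utmz=...;" inside utmcc.
  for (const part of utmcc.split(";")) {
    const cookie = part.replace(/^[+\s]+/, "");
    const eq = cookie.indexOf("=");
    if (eq < 0) continue;
    const name = cookie.slice(0, eq);
    const value = cookie.slice(eq + 1);
    if (name === "__utma") result.utma = parseUtma(value);
    else if (name === "__utmz") result.utmz = parseUtmz(value);
  }
  return result;
}

// Hypothetical utmcc value (already URL-decoded once).
const utmccSample =
  "__utma=123456789.987654321.1372636800.1372636800.1372636800.1;" +
  "+__utmz=123456789.1372636800.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided);";
const parsed = parseUtmcc(utmccSample);
console.log(parsed.utma.firstVisit.toISOString()); // 2013-07-01T00:00:00.000Z
console.log(parsed.utmz.campaign.utmcsr, parsed.utmz.campaign.utmctr); // google (not provided)
```

toISOString prints UTC, while the times listed above are shown in local time, so note the time zone before comparing the two.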
(For a refresher on how to parse the GA cookie values, see this article on DFI News or my blog post here.)
Cheeky4n6Monkey wrote an awesome script to parse them. What is cool about his script is that it works with various file formats. For example, it can parse the Safari SQLite cache.db and the Firefox _CACHE_ files.
Speaking of which, here are some of the cache files that can hold these utm.gif? image requests:
Firefox: these utm.gif? values can be in the _CACHE_001_, _CACHE_002_, etc. files and in the randomly named files in the sub-folders.
Chrome: the data_1, data_2, etc. files
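For a file that a keyword search flags, a rough way to pull out the GIF URLs themselves is to read the file as raw bytes and match printable runs containing utm.gif?. This is a hypothetical helper, not part of Gis4cookie.pl, and it assumes the URL is stored as uncompressed ASCII in the cache file; compressed entries or URLs split across cache blocks will be missed, and a URL directly followed by other printable bytes may pick up trailing junk.

```javascript
// Minimal sketch (hypothetical helper): extract GA GIF request URLs from an
// arbitrary binary file, assuming the URLs are stored as plain ASCII.
const fs = require("fs");

// A run of printable, non-space ASCII that starts with http(s):// and
// contains "utm.gif?".
const GA_GIF_URL = /https?:\/\/[\x21-\x7e]*?utm\.gif\?[\x21-\x7e]*/g;

function findGaGifUrls(buffer) {
  // latin1 maps each byte to exactly one character, so binary data is safe.
  const text = buffer.toString("latin1");
  return text.match(GA_GIF_URL) || [];
}

function findGaGifUrlsInFile(filePath) {
  return findGaGifUrls(fs.readFileSync(filePath));
}

// Hypothetical cache-file bytes: binary padding around one GIF request URL.
const demoBuffer = Buffer.concat([
  Buffer.from([0x00, 0x01]),
  Buffer.from("https://ssl.google-analytics.com/__utm.gif?utmhn=example.com&utmp=%2F"),
  Buffer.from([0x00, 0xff]),
]);
for (const url of findGaGifUrls(demoBuffer)) {
  console.log(url);
}
```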
So in the course of an exam, a quick way to locate and parse the GA GIF URLs might be this:
- Run a keyword search for “utm.gif?”
- Filter by unique files
- Export the files into one directory
- Use Gis4cookie.pl to parse the entire directory with the -p switch:
Gis4cookie.pl -p /home/sanforensics/exported_files -o All_Cache_Entries.tsv
The script, Gis4Cookie.pl, is still being perfected, so consider this a teaser, but rumor has it that it will be out shortly. Make sure to keep an eye on Adrian's blog.
[Edit 7/17/2013] - Adrian's script is now available.