Sunday, July 14, 2013

Google Analytics Values in Cache Files

A while ago I wrote about Google Analytics cookies. These cookies can contain information such as keywords, the referrer, the number of visits, and the times of the first and most recent visits. This information is stored in cookies named __utma, __utmb and __utmz.

The __utma and __utmz values are not just stored in cookies, but also in Google Analytics GIF requests, which in turn are stored in the browser cache files.

Webmasters using Google Analytics put a piece of code on their page that might look something like this:

<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-12345678-1']);
  _gaq.push(['_trackPageview']);
  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();
</script>

According to Google:

 “When all this information is collected, it is sent to the Analytics servers in the form of a long list of parameters attached to a single-pixel GIF image request. The data contained in the GIF request is the data sent to the Google Analytics servers, which then gets processed and ends up in your reports” 1

This GIF request looks something like this. Notice the parameter 'utmcc', which holds the Google Analytics cookie values (they weren't kidding when they said it was long):

HTTP:http://www.google-analytics.com/__utm.gif?utmwv=5.4.3&utms=4&utmn=1046933378&utmhn=www.deviantart.com&utme=8(user-type)9(visitor)11(1)&utmcs=windows-1252&utmsr=1600x900&utmvp=1583x754&utmsc=24-bit&utmul=en-us&utmje=0&utmfl=11.7%20r700&utmdt=deviantART%20%3A%20Log%20In&utmhid=1021688701&utmr=0&utmp=%2Fusers%2Fwrong-password%3Fusername%3DGuiltyAsSin%26ref%3Dhttp%25253A%25252F%25252Fwww.deviantart.com%25252F&utmht=1373048611329&utmac=UA-322734-1&utmcc=__utma%3D212885643.1758037532.1373048500.1373048500.1373048500.1%3B%2B__utmz%3D212885643.1373048500.1.1.utmcsr%3Dgoogle%7Cutmccn%3D(organic)%7Cutmcmd%3Dorganic%7Cutmctr%3D(not%2520provided)%3B&utmu=qR~

In addition to the utmcc parameter, there are up to 31(!) other parameters that a __utm.gif request can hold.1

Some of these are:

utmdt: page title
utmhn: host name
utmp: page request
utmr: referral, with the complete URL
 
Other variables include the version of Flash used, screen resolution, screen color depth, and language encoding. To see the full list, check out this Google Developers page.

If looking at the above URL makes your eyes cross, here is what a manually parsed version looks like:

utmhn (Host Name):     www.deviantart.com
utmdt (Page Title):    deviantART : Log In
utmp (Page Request):   /users/wrong-password?username=GuiltyAsSin&ref=http%253A%252F%252Fwww.deviantart.com%252F
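
If you would rather not split the URL by hand, the same breakdown can be sketched in a few lines of Python. This is only an illustration and not how any particular tool does it: the function name parse_utm_gif and the shortened sample string are hypothetical, and it assumes the leading "HTTP:" seen in the example is not part of the URL itself.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the request shown earlier (a subset of its parameters).
SAMPLE = (
    "HTTP:http://www.google-analytics.com/__utm.gif?utmwv=5.4.3"
    "&utmhn=www.deviantart.com"
    "&utmdt=deviantART%20%3A%20Log%20In"
    "&utmp=%2Fusers%2Fwrong-password%3Fusername%3DGuiltyAsSin"
    "%26ref%3Dhttp%25253A%25252F%25252Fwww.deviantart.com%25252F"
    "&utmr=0"
)


def parse_utm_gif(url):
    """Return {parameter: value} with one level of percent-decoding applied."""
    # Assumption: the leading "HTTP:" is not part of the URL itself.
    if url.startswith("HTTP:"):
        url = url[len("HTTP:"):]
    query = urlsplit(url).query
    # parse_qs returns a list per key; keep the first value of each.
    pairs = parse_qs(query, keep_blank_values=True)
    return {name: values[0] for name, values in pairs.items()}


params = parse_utm_gif(SAMPLE)
for name in ("utmhn", "utmdt", "utmp", "utmr"):
    print(f"{name}: {params[name]}")
```

Only one level of percent-decoding is applied, so the ref value inside utmp stays encoded, just as in the manual parse.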

The utmcc parameter holds the Google Analytics cookie values. Manually parsing these values gives 2:

utma (First Visit)    = 7/5/2013 6:21:40 PM
utma (Previous Visit) = 7/5/2013 6:21:40 PM
utma (Last Visit)     = 7/5/2013 6:21:40 PM

utmcsr (source site) = google
utmctr (keywords that found site) = (not provided)

(For a refresher on how to parse the GA cookie values, see this article on DFI News or my blog post here.)
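
For a rough idea of what that parsing involves, here is a minimal Python sketch that decodes the utmcc value from the request above. The function names parse_utmcc and to_utc are hypothetical, and the field order assumed for __utma (domain hash, visitor ID, first visit, previous visit, last visit, visit count) follows the usual description of the cookie. Timestamps are treated as Unix epoch seconds and shown in UTC.

```python
from datetime import datetime, timezone
from urllib.parse import unquote

# The utmcc value from the request shown earlier, still percent-encoded.
UTMCC = (
    "__utma%3D212885643.1758037532.1373048500.1373048500.1373048500.1%3B%2B"
    "__utmz%3D212885643.1373048500.1.1.utmcsr%3Dgoogle%7Cutmccn%3D(organic)"
    "%7Cutmcmd%3Dorganic%7Cutmctr%3D(not%2520provided)%3B"
)


def to_utc(epoch_seconds):
    """Convert a Unix epoch string (seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)


def parse_utmcc(raw):
    """Split a utmcc value into its __utma timestamps and __utmz fields."""
    cookies = {}
    for part in unquote(raw).split(";"):
        part = part.strip("+ ")  # cookies are joined with ";+"
        if "=" in part:
            name, value = part.split("=", 1)
            cookies[name] = value

    result = {}
    utma = cookies.get("__utma", "").split(".")
    if len(utma) == 6:
        # Assumed order: hash, visitor id, first, previous, last, visit count
        result["first_visit"] = to_utc(utma[2])
        result["previous_visit"] = to_utc(utma[3])
        result["last_visit"] = to_utc(utma[4])
        result["visit_count"] = int(utma[5])

    # The campaign part may itself contain dots, so split at most 4 times.
    utmz = cookies.get("__utmz", "").split(".", 4)
    if len(utmz) == 5:
        for field in utmz[4].split("|"):
            key, _, value = field.partition("=")
            result[key] = unquote(value)  # second decode: %20 becomes a space
    return result


for key, value in parse_utmcc(UTMCC).items():
    print(f"{key}: {value}")
```

Note that the keyword value is encoded twice in the request (%2520), which is why the sketch decodes the campaign fields a second time.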

When I ran into these values on an exam and needed to parse a lot of them, I reached out to Cheeky4n6Monkey, who wrote an awesome script to parse them. What is cool about his script is that it works with various file formats. For example, it can parse the Safari SQLite cache.db and the Firefox _CACHE_ files.

Speaking of which, here are the locations for some different cache files holding these utm.gif? image requests:

Locations

Firefox
C:\Users\%USERNAME%\AppData\Local\Mozilla\Firefox\Profiles\%RANDOM%.default\Cache

These utm.gif? values can be in the _CACHE_001_, _CACHE_002_, etc. files and in the randomly named files in the sub-folders.

Chrome
    C:\Users\%USERNAME%\AppData\Local\Google\Chrome\User Data\Default\Cache
    data_1, data_2 etc files 

Safari
    /Users/%USERNAME%/Library/Caches/com.apple.Safari/Cache.db
       
Windows
    C:\Users\%USERNAME%\AppData\Local\Microsoft\Windows\WebCache\WebCacheV01.dat

So in the course of an exam, a quick way to locate and parse the GA GIF URLs might be this:
  • Run a keyword search for “utm.gif?”
  • Filter by unique files
  • Export the files into one directory
  • Use Gis4cookie.pl to parse the entire directory with the -p switch:
    Gis4cookie.pl -p /home/sanforensics/exported_files -o All_Cache_Entries.tsv
Click, Parse, Boom. One worksheet with tons of information:


The script, Gis4cookie.pl, is still being perfected, so consider this a teaser, but rumor has it that it will be out shortly. Make sure to keep an eye on Adrian's blog...

[Edit 7/17/2013] - Adrian's script is now available.
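
For anyone curious what the keyword search step in the workflow above boils down to, here is a rough Python sketch that walks a directory of exported cache files and pulls out anything that looks like a __utm.gif request. It is not a substitute for Gis4cookie.pl and says nothing about how that script works: the function name find_utm_gif_urls is hypothetical, and the sketch assumes the URLs are stored as plain ASCII, so it would miss entries stored in other encodings.

```python
import re
from pathlib import Path

# A run of printable ASCII starting at "http(s)://" and containing "__utm.gif?".
# Matching stops at the first whitespace or non-printable byte.
UTM_GIF_RE = re.compile(rb"https?://[\x21-\x7e]*?__utm\.gif\?[\x21-\x7e]+")


def find_utm_gif_urls(directory):
    """Return (file name, URL) pairs for every __utm.gif request found."""
    hits = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        # Reads each file fully into memory; fine for a sketch,
        # but very large cache files would need chunked reads.
        data = path.read_bytes()
        for match in UTM_GIF_RE.finditer(data):
            hits.append((path.name, match.group().decode("ascii")))
    return hits
```

Calling find_utm_gif_urls on the export directory returns a list that could then be fed, one URL at a time, into whatever parsing you prefer.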

References:

1. https://developers.google.com/analytics/resources/concepts/gaConceptsTrackingOverview?hl=es-ES

2. https://developers.google.com/analytics/devguides/collection/gajs/cookie-usage?hl=en