Repressive governments have abused “third party trackers” to attack users and even uninvolved third parties. For example, FinFisher and Hacking Team malware was injected by exploiting unencrypted HTTP connections, and the so-called Chinese “Great Cannon” turned users accessing a popular Chinese website, including international users, into bots participating in a DDoS attack. Given the growing trend of exploiting unencrypted HTTP connections, or the insecurity of some third party services included in everyday web browsing, we need a new way to raise awareness. Additionally, we do not yet know whether third party trackers deploy more invasive tracking technology to a subset of users. This has never been investigated, and my goal is to research how tracking differs depending on the user's country and how it changes over time.
Initially, the analysis set will be the list of the most visited websites per country and per category; at a later stage, the testing list will be opened to community contributions.
The host organisation, CodingRights, is running an extensive anti-surveillance project in Latin America. I chose this NGO to support me on the communication and advocacy side. I do not need technical support from this organisation; my limitation is outreach.
That means explaining the findings, the responsibilities involved, and the available protection tools.
Connecting web tracking for advertising with surveillance is a tricky topic that touches several aspects. I am paying attention to the following:
- Through low-level analysis (done by integrating Thug, the Honeynet Project's honeyclient), the research project will map out which tracking techniques besides cookies are used. The data produced will be open data, allowing other researchers to perform their own analysis. This is the key value of the project: enabling an understanding of tracking technology by providing up-to-date information and raw data to support a campaigning effort.
- If malware or malvertising spreads through compromised third parties, or very invasive fingerprinting techniques appear overnight, an “emergency” communication will be dispatched. This has to be done in conjunction with CodingRights, with the goal of raising awareness in the impacted communities and reporting the actors involved. These intermediate reports will be published and improved during the course of the year. Detecting such an event is unlikely, though: as an individual, I do not have the resources to actively hunt for malvertising campaigns (something even antivirus companies, with far more resources, struggle with). However, the researcher interface will be designed to identify spikes and other anomalies, as well as other privacy-invasive behavior.
- Provide a resource for website owners who include third parties, so they can act more responsibly and make informed decisions about what they should or should not include on their sites.
- Investigate whether third party tracking changes depending on the user's country. For example, does an Indian company serve more “aggressive” trackers to a Pakistani audience? Are certain geopolitical factors linked to serving different trackers to users? To date, nobody has found evidence of this occurring, but it could happen, and if found it would be an example of “algorithmic discrimination”.
- Build a tool flexible enough to react when new conflicts arise. Initially the analysis covers the “most visited websites per country and category”, but the goal is to follow current events and react to what is happening in the world.
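To make the per-country comparison concrete, here is a minimal Python sketch of the core step: reduce the resources a page loads to third party domains, then list, for each vantage point, the domains seen only there. The function names are hypothetical, and the domain logic is a loud simplification: it keeps only the last two hostname labels, which is wrong for suffixes such as "co.uk" (a real pipeline should use the Public Suffix List). The example URLs are made up for illustration.

```python
from urllib.parse import urlparse


def registrable_domain(host: str) -> str:
    # ASSUMPTION: naive heuristic that keeps the last two labels.
    # Wrong for multi-label suffixes such as "co.uk"; use the Public Suffix List in practice.
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def third_party_domains(page_url: str, resource_urls: list[str]) -> set[str]:
    """Domains of loaded resources that differ from the page's own domain."""
    first_party = registrable_domain(urlparse(page_url).hostname or "")
    found = set()
    for url in resource_urls:
        host = urlparse(url).hostname
        if host is None:
            continue  # skip data: URIs and other host-less resources
        domain = registrable_domain(host)
        if domain != first_party:
            found.add(domain)
    return found


def compare_vantage_points(by_country: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each country, the third party domains observed only from that vantage point."""
    result = {}
    for country, domains in by_country.items():
        others = set().union(*(d for c, d in by_country.items() if c != country))
        result[country] = domains - others
    return result


# Hypothetical observations of the same page from two vantage points.
page = "https://news.example.com/"
seen_in = {
    "IN": third_party_domains(page, ["https://cdn.example.com/a.js",
                                     "https://ads.tracker-a.net/p.js"]),
    "PK": third_party_domains(page, ["https://ads.tracker-a.net/p.js",
                                     "https://fp.tracker-b.org/f.js"]),
}
print(compare_vantage_points(seen_in))  # {'IN': set(), 'PK': {'tracker-b.org'}}
```

A set difference only says that a domain appeared in one place and not the other; repeated observations over time would be needed before treating such a difference as country-specific behavior rather than noise.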
The research goals
The end goal of the research project is to provide a daily updated database of tracking technology, enabling researchers and web content managers to understand the security and privacy implications of their third party inclusions.
The mid-term goal is to engage the privacy-aware community to exert pressure on site owners who include highly invasive tracking technologies. The security and privacy implications of third party trackers have never before been assessed in this way. This represents a new way to express critical, technical judgment on decisions about trackers.
The application under development is a pipeline that supports data collection, analysis, minimization and open data publication. Open data is a key value: I want to enable researchers and analysts around the world to understand the impact of tracking and tracking technologies. Understanding the technology behind tracking is generally not that hard; what is hard to communicate is the broader impact of this phenomenon.
This data collection will enable my own research on tracking and surveillance, and that of others, since every result will be open data.
Having a community of supporters will be necessary to provide “local contextual knowledge” that I cannot possibly have for every region involved. This community will also provide lists of websites to test and run campaigns based on the results of my analysis.
Coding Rights will be the first NGO to implement this workflow, which fits into Coding Rights' ongoing investigation of online surveillance practices in Mexico, Argentina and Brazil.
The project will include the following high-level stages of research and implementation:
- Compile a list of sites, both technically selected and community selected, with a flexible methodology to keep the sources updated to reflect current events.
- Collect data on the sites in question from one or more network vantage points. One is enough, but more vantage points allow a more thorough, in-depth analysis.
- Use the analysis to feed a daily updated visualization for researchers, based on parallel coordinates.
- Disseminate the results of the research and analysis with help from Coding Rights.
- Build outreach capacity and assess the “campaign fed by data” concept. Improve the analysis of injected scripts to extract more information. Use more than one observation point, in order to compare the same trackers from two different points in the network. Timing of the second milestone: six months after the beginning of the project (a narrative and a campaign strategy have to be implemented in order to gather supporters around the world).
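Comparing the same trackers from two points in the network could start as simply as fetching each script URL from both vantage points and hashing the bodies. The Python sketch below is an illustration under that assumption; the function names and sample scripts are hypothetical. Raw hashes also differ when a script embeds per-request nonces or timestamps, so a mismatch here marks a candidate for closer inspection (for example with Thug), not proof of targeted tracking.

```python
import hashlib


def script_fingerprint(body: bytes) -> str:
    """SHA-256 of the raw script body, used as a cheap identity check."""
    return hashlib.sha256(body).hexdigest()


def diverging_scripts(point_a: dict[str, bytes], point_b: dict[str, bytes]) -> list[str]:
    """Script URLs fetched from both vantage points whose bodies differ."""
    shared = sorted(point_a.keys() & point_b.keys())
    return [url for url in shared
            if script_fingerprint(point_a[url]) != script_fingerprint(point_b[url])]


# Hypothetical script bodies keyed by URL, as fetched from two vantage points.
a = {"https://t.example/x.js": b"track(1)", "https://t.example/y.js": b"ok()"}
b = {"https://t.example/x.js": b"track(1);fingerprint()", "https://t.example/y.js": b"ok()"}
print(diverging_scripts(a, b))  # ['https://t.example/x.js']
```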
Firm list of technical activities
Each activity is expected to take about one month of work.
- Add data-sharing capability to every node, and look for differences in tracking code between nodes.
- In-browser visualization of the results, usable for monitoring trends or visually identifying anomalies.
- Import a person's browser history to map their exposure profile, and support community-driven input (through files on GitHub). This would allow a more personalized analysis that goes beyond the Alexa top 500 sites for each country.
- Integrate the tracker fingerprinting tool developed by Princeton University; this will provide an intermediate level of detail, still lower than Thug's code analysis capability.
- Research how to identify anomalies and tracking-related functionality based on the dynamic code analysis provided by the Thug integration.
- Research the privacy implications and the device fingerprinting techniques used in tracking.
- Support Latin American communities running the tool, and integrate their results.
- Write a research report
- Work with CodingRights on disseminating the results in Latin American communities.
- Researcher visualization: the difference between this and the in-browser visualization above is the amount of detail provided.
- Wrap up the project with final touches and cleanup.
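Both visualizations are meant to help spot spikes, for instance a sudden jump in the number of third parties a site includes. One simple way to flag such days automatically is a trailing-window z-score. The Python sketch below assumes a daily count per site is already available; the function name, the 7-day window and the threshold of 3 standard deviations are illustrative choices, not tuned values.

```python
from statistics import mean, stdev


def find_spikes(daily_counts: list[int], window: int = 7, threshold: float = 3.0) -> list[int]:
    """Indices of days whose count exceeds the trailing mean by more than
    `threshold` sample standard deviations (upward spikes only)."""
    spikes = []
    for day in range(window, len(daily_counts)):
        history = daily_counts[day - window:day]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any increase as an anomaly.
            if daily_counts[day] > mu:
                spikes.append(day)
        elif (daily_counts[day] - mu) / sigma > threshold:
            spikes.append(day)
    return spikes


# Hypothetical number of third party domains seen on one site, one value per day.
counts = [12, 12, 13, 12, 11, 12, 13, 12, 25, 12]
print(find_spikes(counts))  # [8]
```

A trailing window keeps the baseline adaptive, so gradual growth in trackers is not flagged while an overnight jump is; a flagged day would still need manual review before any “emergency” communication.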
Anticipated outputs and outcomes
The surveillance implications of third party trackers have yet to be explained to a wider audience. The past year's debate around “ad blocking” has shown a degree of misunderstanding about the privacy and security implications of unsafe (non-HTTPS) third party inclusions.
For the security of activists and journalists, the outreach goals of this project are a better understanding of the attack vector and pointers to countermeasures.
Note: TacticalTech, my former employer, has no role in this fellowship. I think the whole project would also benefit my earlier project, Trackography, currently maintained by TacticalTech, since most of my results will be published as open data.
For this project, I will use another domain name to present the results.
Why is the selected host organization best suited to mentor your project?
Coding Rights has been doing research, advocacy and awareness raising on privacy and surveillance practices in Latin America, particularly through antivigilancia.org (with content available in Spanish and Portuguese). I will integrate my results with their communication. Regarding my engagement, Coding Rights said: “We consider that the expanded version of Trackography could be a great tool for visually translating privacy rights into clear abuses in our daily transfer of data while simply browsing. And it particularly fits with our project on story telling entitled ‘Unveiling Surveillance Practices in Latin America’, which will be a platform/repository for investigations and storytelling experimentation on surveillance and privacy rights.”
What do you expect to be the primary outcome of this project for a general audience?
An improved understanding of tracking techniques, through a worldwide assessment of the tracking systems that exist besides cookies.
Reporting the most abusive behavior and stimulating technical, critical judgment of third party trackers. At the moment, website owners carelessly choose which scripts to include on their sites; through this project we hope to raise global awareness of the importance of making an informed choice.
How will you collaborate with other researchers working in this field?
Using the 2014 report on client identification mechanisms as a reference, I will confirm its findings, find variations, and provide numbers to support it. Above all, I want to provide open data that allows other researchers to perform their own analysis on my data. In recent months, interest in web tracking has grown, to the point that tracking has been described as totally out of control, and the debate on ad blocking and security has grown with it.
Princeton University conducted research on 1 million websites, but from my previous experience I know that trackers change quite fast, so researchers should not rely on static data as a reference. The Princeton research does not consider all the subtle ways trackers can fingerprint users. In my case, with the integration of Thug, I can provide a more detailed analysis that other researchers in this field can reuse.
In theory, I am operating in a field in which at least four kinds of researchers may be interested:
- web technologies: technical analysis of web-driven malware, invasive scripting, invisible tracking mechanisms
- privacy activists: surveillance researchers and activists
- policy analysts: to assess whether Terms of Service, EULAs and international policies are aligned with the state of the art
With OONI project lead Arturo Filastò, I discussed the possibility of integrating with the Raspberry Pi network deployed by OONI. This would allow the use of many vantage points in different networks. This is a viable option, but it should be explored only if multiple vantage points become essential to the comparative analysis.