⛅ invi.sible.link, operative reporting

This page is updated near the beginning of every month; to understand the overall picture, consult the Project Plan.

Task list

  1. Improve browser emulation and JavaScript sandboxing by integrating the Honeynet Project's Thug; technically this allows us to get a list of all the JavaScript functions executed, going beyond static source code analysis
  2. Add a data-sharing capability to every node, and look for differences between tracking code
  3. In-browser visualization of the results, usable to monitor trends or visually identify anomalies
  4. Import a person's browser history to map their profile of exposure / support community-driven input (through GitHub files); this approach would allow a more personalized analysis that goes beyond just looking at the Alexa top 500 sites for each country
  5. Integrate the tool developed by Princeton University for tracker fingerprinting; this will provide an intermediate level of detail, still lower than Thug's code analysis capability
  6. Research how to identify anomalies and tracking-related functionality based on the dynamic code analysis provided by point 1
  7. Research into the privacy implications and device fingerprinting used in tracking
  8. Support Latin American communities running the tool, interpolating their results
  9. Write a research report
  10. Work with CodingRights to disseminate the results in Latin American communities
  11. Researcher visualization: the difference from point 3 is the amount of detail provided
  12. Wrap up the project and perform the last touches and cleanups

Final report, March 2018

This report is the final one of my OTF fellowship. It took even more months of delay, because I got involved in a partially-related experiment: a team and I set up a system to monitor the Italian 2018 elections (third-party trackers, algorithmic influence, narratives). It has been an intersectional research effort, and I postponed the end of this project for it. This experiment also explored a new branch for this research; you can read more about it in the report: overcoming a researcher limitation (social network feed vs. defined URL list).

For the project conclusion, I changed the website structure and wrote the conclusive research reports. Each of them introduces a functionality of the code developed.
invi.sible.link will be used to foster investigation into web tracker analysis and accountability. The scenario of automated web testing is changing quite fast. It is uncommon, but still happens, to attract supporters who can use the technology with me, and eventually update/maintain the code.
The findings were used for advocacy and assessment of this otherwise invisible phenomenon. Occasionally, some NGOs or individuals show interest in the data, and try to figure out how these data can be used for advocacy.

The final report is composed of four chapters.

Intended audience of these reports: a privacy researcher, a web developer, a network analyst or a technologist. Inside you'll find boxes with a green background; these sub-chapters are intended for a technical audience and explain the tool's capabilities.

Access the full report, or pick the chapters individually:

November 2017

Besides the exciting OTF summit in Valencia, which opened a couple of potential collaborations, November is the 11th month of my fellowship, and a few updates from the academics involved in privacy research have been published.

External input

Two external products confirm the usefulness of the invi.sible.link pipeline, and they will provide some useful narrative in the final phase of this project.

Princeton university investigation on Session Replay

Quoting Wikipedia:

Session replay is the ability to replay a visitor's journey on a web site or within a web application. Replay can include the user's view (browser or screen output), user input (keyboard and mouse inputs), and logs of network events or console logs. Its main value is to help improve customer experience and to identify obstacles in conversion processes on websites.

The Princeton University research was published in November 2017, based on the attention that such a feature could raise on websites intended for a targeted audience. I've extended the scorecard I'm working on to report this evidence.

As the researchers explain, they released a list of third-party trackers offering session replay services, but this list is not complete and at the moment is a mix of domain names and scripts. (Wired)
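The scorecard check based on such a list can be sketched as a simple match of a page's third-party inclusions against the known provider domains. Everything here is illustrative: the domains below are placeholders, not the published list.

```javascript
// Sketch: flag third-party inclusions that match a list of session-replay
// providers. The domains below are illustrative placeholders, not the
// actual Princeton release.
const SESSION_REPLAY_DOMAINS = ['fullstory.com', 'hotjar.com', 'mouseflow.com'];

// `inclusions` is a list of third-party hostnames collected for one page
function flagSessionReplay(inclusions) {
  return inclusions.filter(host =>
    SESSION_REPLAY_DOMAINS.some(d => host === d || host.endsWith('.' + d)));
}
```

Matching on the registered domain (and its subdomains) rather than the full script URL sidesteps the domain/script mix mentioned above, at the cost of some false positives.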

ProPublica spots malware via crowdsourced social media monitoring

Targeted attacks were the main concern of this fellowship's product: what if a malicious script, or injected malware, is served based on the content you are visiting or on your profile background? This has been confirmed to happen, but to show it, a media organization had to develop a tool, run a campaign, analyze data, and find results. This is a process that does not scale, and my development goal is exactly to make these expert abilities scale for communities.

Privacy and security implications

As in past years, the privacy implications of the unregulated usage of third-party trackers keep increasing. Even if protections are getting better, for example with Ghostery 8, the challenge is not over; the goal of the project here remains making websites more responsible, rather than leveraging only the self-defense techniques that skilled users can adopt.

Final stage of the fellowship

Nearly all the declared points announced in my project plan should be delivered. The research part, in which I was supposed to look for insights into script behavior, has been limited by the campaign and partnership efforts.

TODO list for December, to conclude the fellowship

  1. Complete the expert visualization, with the goal of investigating session replay and canvas fingerprinting. This is the bare minimum.
  2. Complete the scorecards; at the moment the prototype is visible on campaign/irantrex (the experimental campaigns are: Iran, Italy, Poland, NGOs, LatAm clinics, the community-based WTO Argentina monitor, besides the American Muslims campaign, which needs a partnership to continue properly).
  3. Try to launch a collaborative analysis of the World Trade Organization meeting in Argentina: the more people adopt the social extension, the more input can be analyzed.
  4. Complete the clinics analysis, test the bot concept and reach out to other organizations. The goal is to test a campaign tool.

August, September and October 2017

This report, which is supposed to be monthly, is published after three months. I collaborated with a third-party organization experimenting with an analysis based on scraping links from social media and analyzing their trackers. As I write, at the beginning of November 2017, a report is in development (not related to tracker analysis), and the opportunity opened a new branch in this research.

Social media observer and realtime analysis

One of the challenges in web analysis is the impossibility of testing all the conditions under which a user will navigate. By making a common assumption, you can end up doing a mass study but miss some targeted web surveillance, because the script you hope to catch will trigger only on specific content.

The state-of-the-art solutions are:

Considering how Facebook and Google are the de facto gatekeepers of the WWW, the most appropriate solution to monitor the quality of web pages is to look at what is effectively served by these platforms. But, in order to do so, you have to observe the personalized experience of the users. With a browser extension it is possible to collect the links appearing in a user's timeline (or those shared by a selected community), and analyze them.
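The collection step such an extension could perform can be sketched as follows. This is a minimal sketch under assumptions: the function name and the batching are mine, and in the real extension the de-duplicated batch would then be sent to a collector endpoint for analysis.

```javascript
// Keep only external links from a batch of timeline hrefs, normalised and
// de-duplicated, dropping the platform's own navigation links.
function collectTimelineLinks(hrefs, platformHost) {
  const seen = new Set();
  for (const href of hrefs) {
    let url;
    try { url = new URL(href); } catch (e) { continue; } // skip malformed hrefs
    // keep only http(s) links pointing outside the platform, without
    // query strings or fragments
    if (url.protocol.startsWith('http') && url.hostname !== platformHost) {
      seen.add(url.origin + url.pathname);
    }
  }
  return [...seen];
}
```

Stripping query strings avoids storing per-user tracking parameters while still identifying the shared page.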

This approach is currently in use, and will provide output during the month.

LatAm realtime tracking experiment

A work in progress is the analysis of the links shared by Argentinian media and political figures; this permits testing a new branch of research, in which the links are collected dynamically instead of being picked by researchers.

Analysis of javascript tracking behavior

The usage of Chrome with PrivacyBadger has been implemented as part of the testing suite; this permits the first JavaScript fingerprinting and visual reporting.

"VP": "virtual", 
"_id": "59b3ef87f50635667354b8b6", 
"acquired": "2017-09-09T13:06:06.072Z", 
"campaign": "germany", 
"href": "http://www.gulli.com/", 
"id": "2db2fd4cbf447b63c7be59cb0d0993f95164b68e", 
"inclusion": "imagesrv.adition.com", 
"navigator_plugins": 7, 
"navigator_userAgent": 1, 
"needName": "badger", 
"promiseId": "e70987de9f779c3924cffa601cd53f2c926d7973", 
"screen_width": 1, 
"scriptHash": "32f51deedfd3dd9a2f3a52a352367f6d6bae6a67", 
"scriptacts": 9, 
"subjectId": "a6074ec62e6d2d3775919ed4a8b53ba4f990e885", 
"version": 1, 
"when": "2017-09-09T13:41:24.310Z"
"VP": "virtual", 
"_id": "59b3ef87f50635667354b8b7", 
"acquired": "2017-09-09T13:06:06.072Z", 
"campaign": "germany", 
"href": "http://www.gulli.com/", 
"id": "2db2fd4cbf447b63c7be59cb0d0993f95164b68e", 
"inclusion": "pagead2.googlesyndication.com", 
"navigator_userAgent": 7, 
"needName": "badger", 
"promiseId": "e70987de9f779c3924cffa601cd53f2c926d7973", 
"screen_availWidth": 2, 
"screen_width": 2, 
"scriptHash": "63d92bc0c16414f12fd8bb04e6e8189fb9a62ea6", 
"scriptacts": 18, 
"subjectId": "a6074ec62e6d2d3775919ed4a8b53ba4f990e885", 
"version": 1, 
"when": "2017-09-09T13:41:24.310Z", 
"window_localStorage": 7

These results can be fetched via the API (e.g.: https://invi.sible.link/api/v1/details/itatopex ); they contain most of the JavaScript calls usable in browser fingerprinting.
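As a sketch of how this output can be consumed, the per-record counters (field names taken from the sample above) can be summed per inclusion to profile what each third party touches. The helper name and field list are my own; only the record shape comes from the sample.

```javascript
// Sum the fingerprinting-related counters for one third-party inclusion,
// over records shaped like the API sample above. The field list follows the
// sample; any other naming here is an assumption.
const CALL_FIELDS = ['navigator_plugins', 'navigator_userAgent', 'screen_width',
                     'screen_availWidth', 'window_localStorage', 'scriptacts'];

function summariseInclusion(records, inclusion) {
  const totals = {};
  for (const r of records) {
    if (r.inclusion !== inclusion) continue;
    for (const f of CALL_FIELDS) {
      if (typeof r[f] === 'number') totals[f] = (totals[f] || 0) + r[f];
    }
  }
  return totals;
}
```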

JS details visualization

The interface wants to show graphically what the third parties can potentially access. Every campaign produces this output daily; an incomplete visualization is this example from the Chilean online clinics: Clinics-CL

July 2017

The last month has been used for advocacy, networking and personal time.

In July, I was in Cartagena (Colombia), then on the Río Magdalena meeting indigenous communities during their pacification process, then in Lima (Perú) meeting some local digital rights groups, then at the SHA2017 hacker camp near Amsterdam.

The most exciting developments are the conferences and the meetings I had with CodingRights in the Latin American countries.

I have explained tracking implications over the last years, with mixed results. I am pleased to see how an analysis including only a contextual group of sites enables compelling narratives.

The story line used with CodingRights in Cartagena followed this logic:

  1. As a citizen you could have a health issue, and health insurance is necessary for it.
  2. In the so-called quantified society, personalized services are a means permitting more customer exploitation. Policy and common knowledge seem not yet ready to face these offers.
  3. If an online clinic includes third-party trackers, your activity on its website is only one link of distance from your physical person.
  4. Your navigation of an online clinic website could leak some patterns: a particular exam you are looking for, symptoms, prescriptions.
  5. This information can be used by the data processor, sold through data brokers, and end up at an insurance company, then be used to increase profit.
  6. The business model of hospitals, public or private, is not ads-based; third-party trackers do not have the same justification used, for example, in the news media debate.

This simple sequence worked in explaining the necessity of ad-blockers, the responsibility of websites, and the power dynamics of the data processors.

The visualizations elaborated follow the same pattern for every country, and using Tableau it has been quite simple to do some prototyping.

The outreach I was looking for is meeting partners who try to figure out their concrete problems, and how data brokers could exploit the context they live in.

If I have to make a list of the topics raised in the discussions, the ones I recall most often are:

I am exploring a theory: every connected human belongs to many social contexts.

Moreover, we as humans are not vulnerable in all of these environments: you could even belong to a thug gang, and nobody in your town would harm you, but you risk not finding any insurance coverage because a multinational denies your health care.

This approach tries to simplify the creation of campaigns aimed at the life aspect in which a person is vulnerable. The content produced is designed to speak to a group of people who feel themselves at risk.

Imagine two characters: "a political opponent in Iran" and "a poker addict looking for a new job." They face different risks, your empathy for their situations is probably different, and so is the assistance they deserve. InviSibleLink is a framework; it can be used by two separate groups, one speaking Persian and the other talking to addicts, because massive web profiling can harm both of them.

The "nothing to hide" narrative want to be addressed making many websites. The hopes are anybody will find the one who speaks to the part in which is vulnerable. Few persons feel completely safe, and they are not the target.

This approach has been confirmed and would define the cultural inheritance left after the fellowship. Speaking of which, I have to run a little bit now to catch up with the planned deliverables.

June 2017

The month has been used for research and advocacy more than development.

IACAP conference and presentation

I gave my first presentation about third-party tracker analysis; the content collected and the experience gained will be reused in future presentations. A blog post explaining the context will be published as soon as I make new visualizations with Tableau. I'm doing data investigation with that tool because it is much more efficient than developing my own visualizations before having understood the complexity of the data. I've written a blog post to test simplified communication on algorithms, profiling and political impact: profiling, algorithm surveillance and religious freedom.

General software improvement for the campaign checking

A simple approach to monitoring trends in how websites are doing has been implemented (a random example); using the last-activities interface, I and the other participants check the trends.


I got in touch and obtained a key for urlscan.io, a service which monitors website inclusions from its own infrastructure. It can be useful as a comparison. A driver to use the service has still to be implemented.

WebTAP and their publications

WebTAP is the Web Transparency and Accountability Project of Princeton University. I subscribed to get access to their data as a researcher. I haven't yet had a chance to try it. The research team has released some inspiring papers, too, which I had the opportunity to read this month:

PhantomJS will be unmaintained

This is not a big deal, considering the probe diversification I have in mind, but considering the capability of collecting OpenGraph data, I probably have to replace the PhantomJS support with nightmarejs; in general, I look forward to integrating Thug once and for all.

Experimental visualizations

I'm experimenting with Tableau Public, on this account: Claudio at tracking exposed, visualizations. I will use these for the conference in Cartagena.

May 2017

In May 2017 the first two campaigns were released; mostly I worked as a facilitator: editing text, revising visualizations, double-checking the results, and using this achievement to present a vision to partners.

Publication of results to a broad audience

The month of May hasn't led to any particular improvement, rather a stabilization of the interfaces and the workflow. The month saw the two expected releases and progress with CodingRights on our presentation in July.

Specifically, the episode of the TV show using the analysis website got 20k unique IP accesses and this spike in web traffic:

Deflect.ca offers the CDN and the technological interface to query the users.

At the moment, the experience gained has been useful to stabilize a tangible result for a broader audience. Also, minor initiatives are running, and currently I'm in Istanbul to make progress on an analysis in Turkey.

April 2017

Most of the time in April 2017 has been dedicated to separating the analysis content from the campaign content. Campaigns have to be delegated as much as possible to the campaigners (the local communities aware of the social and digital issues), and this has required some polishing on my side. An example campaign, with 100% HTML and zero code, is implemented here.

Organizing documentation

With the first adoption of the technology, I've started to organize documentation and define how the project might be integrated into other projects doing the same analysis. The README on the campaigns is the reference for them and is currently kept up to date by whoever is organizing those.

Academia and outreach

Three important events involving my research in the fellowship are going to happen between May and July.

  1. I'm working with an investigative journalism team to explain third-party trackers and their privacy and security implications to a (very) large and non-technical audience. It might be done in the second half of May.
  2. Big Data for the South, in Cartagena, has accepted the application by me and Joana of CodingRights, about third-party analysis in Latin America compared with other Western countries.
  3. I got accepted at the annual meeting of the International Association for Computing and Philosophy, at Stanford University. A talk of mine about third-party profiling and the impact on racial and religious discrimination has been accepted.

For the first and third points above, the potential outcome is quite large visibility for the code repository, in the hope that some open source developer with free time takes interest in the project.

Experiment with OpenWPM

Princeton University, after webXray, improved their technology with OpenWPM, a nicely developed tool that might represent a valid integration and extension of my analysis. It uses a different format, supports much more interaction with a non-headless browser, and is less orchestrated.

March 2017

RightsCon and the research of local supporters

During March (and the first days of April) I attended RightsCon and the International Journalism Festival. My presence there was justified by some talks I gave about algorithm accountability, and I had some meetings with teams from different countries and contexts. Discussions are proceeding, in order to begin an analysis campaign.

The countries of interest are Iran and Turkey. Finding local supporters is getting more vital, and I'm expanding the side of the project intended to communicate the results. The goal is to split my technical analysis and graphs from the advocacy material. Having a clear separation of duties would, in theory, let me and the local supporters work toward the same goal without blocking each other. A clear separation between the technical analysis and the local declination is intended.

Human rights researchers and Internet policy analysts have been my targets to get in touch with.

Side life

I started a trip to and around Europe to meet a number of potential collaborators; I'm traveling in these months and my update schedule is getting delayed. Third-party tracker analysis keeps raising political (for example, the Sleeping Giant) and technical interest, as highlighted below:

Updates from academia and NGOs

Interesting updates are happening in the academic environment. Below are the most meaningful ones, providing additional arguments usable in this project's outreach: Security and harm caused by third-party inclusions, Amnesty International on data brokers (and their impact on religious profiling), Cross browser hardware fingerprinting.

Research plan with CodingRights

The CodingRights team and I applied for a conference; in the next months we'll do a comparative analysis of sensitive (and less sensitive) websites among Latin American and Western countries (as a comparison). It is expected to be one of the core results of this fellowship, or at least a lasting example of the analysis method. In the meantime, analytics and deeper script analysis with Thug will be supported. Results are scheduled to be available in June 2017.

February 2017

Managing a campaign-based website

I'm realizing three campaigns with a Western audience in mind. These campaigns do not strictly fit with my fellowship goals, but are useful steps for outreach and early feedback.

I started the development of the tool named social-pressure.

At RightsCon, at the end of March 2017, I'll meet partners from Turkey, Iran and other countries to discuss how to begin some analysis campaigns.

As a technical improvement, I've extended the campaign manager to import CSV: this enables collaborators to work with a spreadsheet and GitHub without dealing with more technically complex formats (I use JSON natively). Also, it might be edited directly on GitHub, lowering the entrance barrier.
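The conversion step can be sketched roughly like this; the column names are illustrative, and quoted fields with embedded commas are not handled by this naive split.

```javascript
// Sketch of the CSV → JSON import: collaborators maintain a plain spreadsheet
// and the campaign manager converts rows into the native JSON subject list.
// Column names are illustrative; this naive split does not handle quoted
// fields containing commas.
function csvToSubjects(csv) {
  const lines = csv.trim().split('\n').map(l => l.split(',').map(c => c.trim()));
  const [header, ...rows] = lines;
  return rows.map(cols => {
    const subject = {};
    header.forEach((key, i) => { subject[key] = cols[i]; });
    if (subject.rank !== undefined) subject.rank = parseInt(subject.rank, 10);
    return subject;
  });
}
```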

Campaigns in progress and testing of the workflow

In the three campaigns I'm testing, the d3 plugin sankey is helping in the generation of a graphically appealing, scalable visualization, like:

Joint application with Coding Rights

A research paper comparing 10 South American countries and 5 Western countries is a work in progress; we'll apply to an academic conference.

Stable monitoring of the analysis

The monitoring pipeline has proved to be stable; the statistics are available here, showing the last two days, and there, the last 20 days (it might take a while to load the graphs).

Below you can see a strange pattern that happens only on the machine located in Washington.

Details: I'm using three boxes: Washington, Amsterdam and Hong Kong. They share the same software and execute the same commands at the same time, to reduce the differences across tests. For a specific website under investigation, different code is sent to the Washington box. This has the side effect of freezing phantomjs and keeping it running. From the load average graph I spotted this first anomaly:

In the next months, with the integration of Thug, it will be easier to perform JavaScript inspection and investigate the reasons.

January 2017

An update on the vision

Inspired by the title Hacking the attention economy, it has become clear that my product should not just be a website full of results, because a website has these limits:

This is a pipeline: a series of tools that constantly process input to produce output. My output has to be:

Therefore, in this post-prototype phase, the campaign pipeline has to be the priority. This permits experiencing from the beginning how to reach out to different social circles, and will force the project to keep an operating workflow despite the technical challenges that will be faced later.

Flexibility in target specification

2075 ۞  ~/Dev/invi.sible.link DEBUG=* filter='{"iso3166":"BR"}' bin/directionTool.js 
  directionTool Unspecified 'needsfile' ENV, using default config/dailyNeeds.json +0ms
  directionTool Using config/dailyNeeds.json as needs generator +2ms
  directionTool content {"needName":"basic","lastFor":{"amount":24,"unit":"h"},"startFrom":"midnight"} +31ms
  directionTool Processing timeframe: startFrom {"amount":24,"unit":"h"} (options: midnight|now), lastFor "midnight" +0ms
  lib:mongo read in subjects by {"iso3166":"BR"} sort by {} got 1 results +13ms
  directionTool Remind, everything with rank < 100 has been stripped off +1ms
  directionTool Generated 80 needs +5ms
  directionTool The first is {"subjectId":"65bdefee473b2aa910ff52efdcb0425f3d4201d6","href":"http://google.com.br","rank":1,"needName":"basic","start":"2017-01-11T02:00:39.251Z","end":"2017-01-12T02:00:39.251Z","id":"d8dcdbc594dbe873a3b5d4378420ea5eddc1ce9c"} +0ms
  lib:mongo writeMany in promises of 80 objects +0ms

This approach is being experimented with in February, for the first targeted campaign. The goal of such a campaign is getting visibility and constructive criticism, and seeing the overall reaction to this kind of monitoring approach.

CodingRights campaign progress

We had a meeting at CodingRights planning the Chupadados campaign, and currently I'm running a prototype experiment outside the fellowship scope, in order to test the infrastructure and the content production pipeline.

Long term monitoring is working properly

The stats page has been working smoothly for a while; I'm using it to keep in check the multiple operations performed. New graphs might be added and applied to specific campaigns. Studying which kinds of graph to use is still a work in progress, but I've already done a successful experiment in integrating rawgraphs.io.

December 2016

Web crawling and orchestration works for me

The structure running is simple and easy to distribute. It involves few components.

Vigile (central authority)

At 5AM GMT, a command is executed; it creates a list of tasks that have to be completed. This list is derived from the list of Subjects under analysis, and can be reached publicly via API:

http http://invi.sible.link:7200/api/v1/getTasks/Aname/20
    [   {
            "AMS": true,
            "HK": true,
            "end": "2017-01-03T00:00:03.467Z",
            "href": "http://baidu.com",
            "id": "c0ff474789434497abdf3b335f1bdb1def18993a",
            "Aname": false,
            "needName": "basic",
            "rank": 1,
            "start": "2017-01-02T00:00:03.467Z",
            "subjectId": "b4ef98150c7eeb7c03afb40437ab4c34ec0620ad"
        }, {
            "AMS": true,
            "HK": true,
            "end": "2017-01-03T00:00:03.467Z",
            "href": "http://qq.com",
            "id": "54a5d8d9ce08d5899a782efb5adc73e29f62dd3d",
            "Aname": false,
            "needName": "basic",
            "rank": 2,
            "start": "2017-01-02T00:00:03.467Z",
            "subjectId": "35eabd32318c6082cb645fbacfe5bca8f2baeb50"

This model has technical properties that help me with the orchestration:

  1. the field needName specifies the need. At the moment, the only need is named basic and means: crawl with phantomjs. This permits specialization in distribution: if the vantage point doesn't support that test, it can just skip to the next need.
  2. the fields HK, AMS and Aname have boolean values, indicating whether the vantage point (specified in the request) has absolved the task or not. The value false means the VP has only received the task; if the value is true, it has solved the task and confirmed the execution.
  3. start and end describe the window of time in which the task can be absolved.


A quick and dirty approach: every two minutes crontab calls for tasks to be done, asking for 30, to be executed 10 at a time; maximum time 30 seconds, and after 35 the process is killed. I'll measure performance and failure ratio later on.

It saves the results and imports them into mongodb.

Above you can see the level of detail experimented with now. Having many descriptive fields will help in finding correlations, trends, patterns.


Make the results available to whoever needs them, referenced by the promiseId. It displays some basic graphs of the stored data. It was working with only 1 day of results; with more than one, it requires an optimization of the analytics, because the data is too big.

As part of this improvement, the component machete will be completed.

Visualisation with Raw and c3

Work in progress is integrating the RAW framework and c3, to begin with a decent visualisation of technical results.

Logical workflow of the pipeline

A decent distribution and resiliency is coming along with the designed pipeline; here are the scheduled tasks. The next component will fetch from the results and complete the pipeline.

November 2016

Components design

Defining the components that should work together in order to accomplish the pipeline. The goal is pretty ambitious, because the system has to operate on many vantage points and be centrally coordinated, enable the analyst to get results easily, and enable CodingRights and me to set up declined campaigns without effort. The current schema is composed of 6 components, each with a small dedicated task. I have not used the prototype named Trackography-2, because the risk was a complexity increment. The reason for the component splitting is to keep the design as "simple and stupid" as possible.

  1. Component "storyteller": is the on running in the public website https://invi.sible.link, will contain information for technical audience and all the research tool developed this year. Will serve the results as open data, enabling third parties like CodingRights to integrate the data in their advocacy.
  2. Component "machete": aggregate the results from the vantage point and perform analysis, correlation, high level function to produce results. For example: rank the most invasive trackers, find correlations among the last day result and the last month. Will be the tool operating over the database and producing data-driven-insights.
  3. Component "vigile": will orchestrate the test on the vantage point, the analysis of machete, and keep track of the infrastructure performances
  4. Component "chopstick": inheritance from the Littlefork pipeline in which I worked following Christo's of TacticalTech directions. Is the component wrapping the execution of phantomjs and Thug, being a specialized micro-service on the vantage point.
  5. Component "exposer": The technical service needed to export the results from the vantage point to machete
  6. Component "social pressure": as the name evoke, is one of the key experiment of this project. A components containing the libraries, API keys to be a simple social media bots feed by machete.

Setup boxes infrastructure

Thanks to the OTF cloud, I easily set up four boxes to run the components: a situation in which, one box less or one box more, the system can continue to operate and be easily migrated, if other organizations show interest in maintaining the project after the fellowship, or just want to run their own set of tests.

I recovered the lists I was using for the previous experiments; they are nicely visualized with DataTables here:


At the end of November CodingRights launched a campaign website targeting Latin American communities, named Chupadados; it is a campaign exploring different narratives to raise awareness of data surveillance, government and corporate, for Spanish and Portuguese audiences. The first declination of invi.sible.link will be on a selected list of Brazilian websites, all related to sexual health services. This will be an experiment in advocating for a target community outside our common audience.

Test webXray on OTF cloud

The tool webXray has many things in common with this project; I started to assess whether its code base can be re-used. First, I tested webXray on the three vantage points on the OTF cloud; it worked smoothly with low effort. It is an interactive tool, therefore some of the assumptions behind the architecture might differ from my needs; still, looking at the internals:

In my current design these blocking operations are a small engineering problem. In the examples below you'll see the effect of not managing such blocks. Without manual intervention the pipeline remains blocked forever, and spotting all the possible conditions is a complex problem.

When you see the 7 days and 7 minutes, it is because I killed the process manually. webXray solved this problem with a hardcoded time limit; probably I'll use the same, if a smarter solution keeps failing.
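The hardcoded-limit workaround can be sketched as racing any potentially blocking step against a timer, so a page that never finishes cannot freeze the pipeline. The helper name and limit value are illustrative; this is a sketch, not webXray's actual implementation.

```javascript
// Race a potentially blocking operation against a hard time limit, so a
// never-completing step rejects instead of blocking the pipeline forever.
function withTimeout(promise, ms) {
  let timer;
  const limit = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error('hard time limit reached')), ms);
  });
  // whichever settles first wins; always clear the timer afterwards
  return Promise.race([promise, limit]).finally(() => clearTimeout(timer));
}
```

Each probe in the pipeline would be wrapped this way, turning a stalled fetch into an ordinary failure that the orchestrator can count and retry.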