Abstract

This analysis looks at governmental websites in Brasil, Chile, Colombia, and Paraguay, and reports the footprints of JavaScript functions found on them. A group of partners, Tedic, and I worked on this project. The list of websites analyzed can be updated at any time by Tedic, the project coordinator. The invi.sible.link system examines the sites every day, but the analysis reported here is based on one day of observation.

Observatório Latino Americano de Paginas Estatales - OLPE

Fundación Karisma (Colombia), Tedic (Paraguay), Coding Rights (Brasil) and Datos Protegidos (Chile) are working together to develop the Observatório Latino Americano de Paginas Estatales. It is an advocacy project, to be launched soon, that aims to measure the security of governmental websites. Tedic is developing the backend and, on a suggestion from Coding Rights, identified invi.sible.link as an important tool to integrate into the platform's tests. Tedic currently runs its own instance for the partners, which will be integrated with other security tests conceived by the group.

User fingerprinting via JavaScript execution

There are well-known technologies that make it possible to track a user even without cookies. Cookies are an old technology, and one of the many that policy makers and institutions have regulated, but browser fingerprinting gives third-party trackers the same benefit while escaping user, policy, and browser control.

This analysis is inspired by the 2014 paper "Technical analysis of client identification mechanisms", and by Panopticlick. JavaScript functions can be used to extract information from the browser. This information yields a certain number of fingerprinting bits, which can be used to track users even without cookies or across multiple browsers.

invi.sible.link looks at the following JavaScript calls (I'm reporting the list here because the visualizations below reference the calls through the [letters]):

navigator.userAgent ⟶ [uA]
navigator.language ⟶ [l]
window.devicePixelRatio ⟶ [pix]
navigator.languages ⟶ [L]
screen.colorDepth ⟶ [colorD]
navigator.hardwareConcurrency ⟶ [hw]
navigator.cpuClass ⟶ [cpu]
navigator.platform ⟶ [platform]
navigator.doNotTrack ⟶ [dNT]
navigator.maxTouchPoints ⟶ [maxTouch]
screen.width ⟶ [width]
screen.availWidth ⟶ [availW]
Date.prototype.getTimezoneOffset ⟶ [getTZ]
window.sessionStorage ⟶ [sessionStorage]
window.localStorage ⟶ [localStorage]
window.indexedDB ⟶ [indexDB]
window.openDatabase ⟶ [openDB]
navigator.plugins ⟶ [plugins]
window.CanvasRenderingContext2D.prototype.rect ⟶ [canvas2D]
window.WebGLRenderingContext.prototype.createBuffer ⟶ [webGL]
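
To make the idea of counting these calls concrete, here is a minimal sketch that runs under Node.js. It is only an illustration and does not describe how invi.sible.link instruments the browser: the `fakeNavigator` object, its values, the `countReads` helper, and the `thirdPartyScript` function are all invented for the example. It wraps an object in a `Proxy` so that every property read is tallied, then lets a made-up script build a crude identifier from a few of the properties listed above.

```javascript
// Hypothetical stand-in for the browser's `navigator` object (values invented for the demo).
const fakeNavigator = {
  userAgent: "ExampleBrowser/1.0",
  language: "es-PY",
  platform: "Linux x86_64",
  hardwareConcurrency: 4,
};

// Wrap an object so that every property read is tallied under "<label>_<property>".
function countReads(target, label, counters) {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      if (typeof prop === "string") {
        const key = `${label}_${prop}`;
        counters[key] = (counters[key] || 0) + 1;
      }
      return Reflect.get(obj, prop, receiver);
    },
  });
}

const counters = {};
const nav = countReads(fakeNavigator, "navigator", counters);

// A made-up third-party script that combines browser properties into one string.
function thirdPartyScript(navigatorLike) {
  const parts = [navigatorLike.userAgent, navigatorLike.language, navigatorLike.platform];
  if (navigatorLike.userAgent.includes("Example")) parts.push("known");
  return parts.join("|");
}

const fingerprint = thirdPartyScript(nav);
console.log(fingerprint);
console.log(counters);
```

The counters end up keyed as `navigator_userAgent`, `navigator_language`, and so on, with `userAgent` counted twice because the script reads it twice.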

Every website tested includes a certain number of JavaScript files. I was researching how these scripts behave and whether we can measure their "invasiveness". The goal of the analysis was to permit a more granular judgment of scripts, to understand if and how ad-blockers can improve, and to understand the responsibilities of third-party platforms.

In the visualization below, Viz#3, you can see all the [letters] for the functions the tool looks for. The first is always [cpu], and it indicates whether navigator.cpuClass (see the list above) has been accessed. A plain [cpu ] means the function has been executed 0 times; a red number represents the number of times the domain name on the left has executed the function. The 'href' column shows the tested website. The number of inclusions is far lower than what we observe in Viz#1 and Viz#2 because scripts from the same domain share the same context in the browser and should therefore be aggregated. This analysis does not take into account the JavaScript inclusions that don't execute any of these functions.
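
The aggregation by domain described above can be sketched as follows. This is an assumption about the shape of the data, not the tool's actual code: the `observations` array, its example domains, and the `aggregateByDomain` helper are all hypothetical. The sketch sums the call counts of every script served from the same hostname.

```javascript
// Hypothetical per-script observations: script URL and how often it used each function.
const observations = [
  { script: "https://cdn.example.net/a.js", calls: { navigator_userAgent: 2 } },
  { script: "https://cdn.example.net/b.js", calls: { navigator_userAgent: 1, screen_width: 1 } },
  { script: "https://ads.example.org/t.js", calls: { window_localStorage: 6 } },
];

// Sum the counters of all scripts that share a hostname.
function aggregateByDomain(items) {
  const byDomain = {};
  for (const { script, calls } of items) {
    const domain = new URL(script).hostname;
    const totals = (byDomain[domain] = byDomain[domain] || {});
    for (const [fn, n] of Object.entries(calls)) {
      totals[fn] = (totals[fn] || 0) + n;
    }
  }
  return byDomain;
}

const perDomain = aggregateByDomain(observations);
console.log(perDomain);
```

Three script inclusions collapse into two rows here, which is why a per-domain view shows fewer entries than a per-script one.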

Paraguay

A few technical notes and insights inspired by the Paraguayan analysis:

Brasil

The Brasil Viz#1 and Viz#2 can be consulted through a dedicated UX; it wasn't possible to report them here because of the number of tested sites. Certain visualizations don't scale, which is why researchers and advocates should refer to the data available for all four countries via the API, documented in the previous chapter.

You can see the complete visualization; the image above does not include all the sites present in the Brasil CSV list.

Chile

On the Chilean institutions list, perhaps because of the small number of sites analyzed, I didn't spot any particular finding.

Colombia

Colombia is not reported here because the number of websites analyzed leads to a visualization that doesn't scale. This is a tip to keep in mind when running comparative analyses among clusters:

The Colombian visualizations can be consulted through a dedicated UX.

The Colombian list is long: the test is composed of 148 websites. Follow this link to the interactive visualization to see the whole Colombian visualization, or look at the Gob-CO-* tabs for the Colombian Viz#1 and Viz#2.

Colombian findings show some of the same patterns explored in this report, such as:

Conclusions

In this chapter, we saw how JavaScript files can fingerprint a user's device, and how invi.sible.link can spot such scripts. The presence of scripts of this kind even on institutional websites confirms two points. First, advocacy should step up in raising awareness. Second, websites intended for protected citizen-institution communication should reduce their use of third-party web services to a minimum, because a privacy leak is intrinsic to the HTTP protocol.

The script fingerprinting analysis also has a secondary intention: to make ad-blocking a less polarizing topic. At the moment we can either permit scripts or block them, which causes a problem for a large sector that depends on ads. I feel the invasiveness of a script should be taken into account when judging how fair it is to block a third party, and this judgment can lead, hopefully, to a safer web ecosystem.

How to run a dedicated test with invi.sible.link

Requirement: a Debian GNU/Linux computer on which the invi.sible.link repository has been cloned and installed, as explained on the GitHub page.

You should have:

The system is configured to perform analyses whenever they are in the queue; my configuration made the system repeat the same collection every 24 hours.

To retrieve the data from the server, an HTTP API is available. For this specific campaign, you can check this dedicated page, as the example shows:

user@user:~/invi.sible.link$ http https://invi.sible.link/api/v1/details/gob.brasil
[
   {
        "acquired": "2018-03-23T03:22:54.741Z",
        "campaign": "gob.brasil",
        "href": "https://nfg.sefaz.rs.gov.br/Cadastro/CadastroNfg_2.aspx",
        "id": "eb43cb853cf3fd98084981a8d89aff7e3c1efbd8",
        "inclusion": "www.gstatic.com",
        "navigator_userAgent": 1,
        "subjectId": "1faf654c7f8277e94d71e65a5f31820b6f99672a",
        "version": 1,
        "when": "2018-03-23T00:00:00.000Z"
   },
   {
        "acquired": "2018-03-23T03:22:54.741Z",
        "campaign": "gob.brasil",
        "href": "https://nfg.sefaz.rs.gov.br/Cadastro/CadastroNfg_2.aspx",
        "id": "eb43cb853cf3fd98084981a8d89aff7e3c1efbd8",
        "inclusion": "tag.navdmp.com",
        "subjectId": "1faf654c7f8277e94d71e65a5f31820b6f99672a",
        "version": 1,
        "when": "2018-03-23T00:00:00.000Z",
        "window_localStorage": 6
    }
…
]
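
As a sketch of how this output could be consumed, the snippet below groups the records by tested page and lists the third-party domains reported for each. The two records are trimmed copies of the ones in the example above; the `inclusionsByPage` helper is a made-up name for illustration, and only the `href` and `inclusion` fields are used.

```javascript
// Two records trimmed from the example response above.
const details = [
  {
    campaign: "gob.brasil",
    href: "https://nfg.sefaz.rs.gov.br/Cadastro/CadastroNfg_2.aspx",
    inclusion: "www.gstatic.com",
    navigator_userAgent: 1,
  },
  {
    campaign: "gob.brasil",
    href: "https://nfg.sefaz.rs.gov.br/Cadastro/CadastroNfg_2.aspx",
    inclusion: "tag.navdmp.com",
    window_localStorage: 6,
  },
];

// Hypothetical helper: map each tested page to the third-party domains seen on it.
function inclusionsByPage(records) {
  const pages = new Map();
  for (const { href, inclusion } of records) {
    if (!pages.has(href)) pages.set(href, []);
    pages.get(href).push(inclusion);
  }
  return pages;
}

const pages = inclusionsByPage(details);
for (const [href, domains] of pages) {
  console.log(`${href}: ${domains.join(", ")}`);
}
```

With a full response, the same grouping gives one entry per tested page, which is a convenient starting point for comparing sites.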

Format specification of the API details

This API returns a number of JSON objects equal to the number of third-party inclusions that executed JavaScript functions. The objects come from the last cycle of analysis; in my case, every day is a different cycle because every 24 hours the queueCampaign command pushes the list of sites into the queue again. If you want to compare days, you should save the results separately. The JSON objects have some fixed fields:

Each of the JavaScript functions, with the . replaced by _ in the key name and, as value, the number of times the potentially profiling JavaScript function has been invoked.
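
The key convention can be sketched as follows. The function names come from the list earlier in this chapter (shortened here), the sample record is trimmed from the example response, and `toKey` and `extractCounters` are hypothetical helper names. Counters are recognized by name rather than by type, on the assumption that other numeric fields such as `version` are not counters.

```javascript
// A few of the tracked calls from the list earlier in this chapter (not the full list).
const trackedFunctions = [
  "navigator.userAgent",
  "window.localStorage",
  "Date.prototype.getTimezoneOffset",
];

// "navigator.userAgent" -> "navigator_userAgent"
const toKey = (fn) => fn.replace(/\./g, "_");

// Reverse lookup built from the known list, so underscores never have to be guessed.
const keyToFunction = new Map(trackedFunctions.map((fn) => [toKey(fn), fn]));

// Hypothetical helper: keep only the per-function counters of one API record.
function extractCounters(record) {
  const counters = {};
  for (const [key, value] of Object.entries(record)) {
    if (keyToFunction.has(key)) counters[keyToFunction.get(key)] = value;
  }
  return counters;
}

const record = {
  campaign: "gob.brasil",
  inclusion: "tag.navdmp.com",
  version: 1,
  window_localStorage: 6,
};

const result = extractCounters(record);
console.log(result);
```

Building the reverse map from the known list avoids relying on a blind replacement of `_` with `.`, which would break if a function name ever contained an underscore.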