All of us probably have some understanding that, on the internet, you’re never alone and nothing that you do is entirely secret. Website administrators, for example, are known to use a number of scripts in order to learn more about how visitors navigate and interact with their sites. Such data collection is designed to improve a visitor’s journey through the site and, in the case of online stores, ultimately to boost sales. However, this tracking can go far beyond the scope of mundane website analytics.
Princeton University researchers recently found that an advanced tool, called a “session-replay script”, can keep a truly blow-by-blow account of a user’s interaction with a website. Likened to someone “looking over your shoulder”, these embedded snippets of code were found to watch and record a visitor’s every move on a website, in real time. Put differently, these “screen recordings” make it possible to capture your every keystroke, mouse movement and page scrolling action, packing it together with the site’s contents and sending off the package to third-party servers for analysis.
Having examined session-replay scripts by seven of the tool’s most popular providers, the researchers found that 482 of the world's 50,000 most-visited sites used one of these scripts. This is clearly just a tiny subset of the entire internet, so the actual number of the sites employing the scripts is very likely to be much, much higher.
What are the risks?
While the scripts are intended for legitimate purposes, such as for learning how users engage with websites or identifying confusing parts of a webpage, their use can have profound privacy and security implications. Most importantly, the tools fail to discriminate between the kinds of data that they scoop up. In the absence of adequate safeguards, they can expose a user’s sensitive data, including login credentials, credit card numbers, and health information. Additionally, they record the information even if the input is deleted before it is submitted – merely typing the text is sufficient to have those keystrokes recorded.
Also, rather than aggregating general statistics, the session-replay scripts are intended to record and play back individual browsing sessions, thus removing any anonymity from the equation. According to the aforementioned paper, some providers even allow linking a session’s recording to a user’s real name.
Admittedly, the services do offer both manual and automatic redaction tools with the aim of excluding sensitive details from the sessions. However, such “checks and balances” were not always effective, or even practicable, in real-life conditions, not to mention the fact that manual redaction is fundamentally insecure.
In addition, it was found that the data may leak to the outside world. Some vendors in the study were found to deliver playbacks within an HTTP page, even if the session originally took place over a secure and encrypted HTTPS page.
Another worry is that the information is collected on the sly, without explicit consent from the user. This may, in turn, indicate the provider’s concerns that users might not agree to the practice – if they were made aware of it to begin with.
What are (not) the mitigations?
One way to help thwart such scripts is to use trusted script- or ad-blocking tools that address the scripts. However, two commonly-used ad-blocking lists were found, at the time of the research, to be imperfect at blocking the scripts. With that in mind, be sure to check if the scripts are indeed filtered out by your script- or ad-blocking solution of choice. Meanwhile, turning on the “Do Not Track” setting that is part of many browsers’ in-built tracking protection often has no effect.
Ideally, the scripts should be opt-out by default so that a user first needs to consent to their use. For EU citizens, the looming General Data Protection Regulation (GDPR) is poised to help, as it introduces strict requirements on data handling and protection procedures. This includes obtaining consent from website users with the processing and storing of their personally identifiable information.
To the best of our knowledge, there is no known case of a compromise of user data due to session-replay scripts. However, their use does create privacy and security concerns, doubly so in an era in which the right to privacy has taken on a whole new vigor, and especially in light of recent revelations about data mining and matching on social media sites. At the end of the day, the best safeguards involve being aware of tracking and data-slurping tools and being cautious with what information we input anywhere on the internet.