Web Privacy Measurement (WPM) is the observation of websites to detect, characterize and quantify privacy-impacting behaviors. Applications of WPM include the detection of price discrimination, targeted news articles and new forms of browser fingerprinting. Although originally focused solely on privacy violations, WPM now encompasses measuring security violations on the web as well.
For these studies to be truly large-scale and repeatable, creating an automated measurement platform is necessary. At least within the academic literature, measurement infrastructures in the field of WPM have been largely one-off and do not comprehensively address the engineering challenges within this realm.
OpenWPM, a flexible, stable, scalable and general web measurement platform, is our solution to this infrastructure vacuum. OpenWPM is a web privacy measurement framework which makes it easy to collect data for privacy studies on a scale of thousands to millions of sites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection.
OpenWPM has been developed and tested on Ubuntu 14.04/16.04. An installation script, install.sh, is included to install both the system and Python dependencies automatically. A few of the Python dependencies require specific versions, so you should install the dependencies in a virtual environment if you’re installing on a shared machine. If you plan to develop OpenWPM’s instrumentation extension or run tests you will also need to install the development dependencies included in
It is likely that OpenWPM will work on platforms other than Ubuntu.
Our primary technical contributions thus far are as follows:
- Parallel browser automation with synchronization
- Browser crash recovery with full profile support
- Ability to set per-browser properties, e.g. screen size, extensions, user-agent string
- Per-browser HTTP request/response logging
- Scanning of Flash Storage and HTTP Cookie Storage after each page visit (extending to other storage locations is possible)
- Loading and saving of browser profiles for multi-crawl studies
- Full command logging
- Aggregation of measurement data centrally from all browsers
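Several of the items above (parallel browsers with a synchronization point, recovery from browser crashes, and central aggregation of results) can be illustrated with a minimal sketch. This is not OpenWPM's API: the names `visit`, `browser_worker`, `crawl`, `BrowserCrash` and the `site_visits` table are hypothetical, threads stand in for real browser processes, and a crash is simulated for any URL containing "flaky".

```python
import queue
import sqlite3
import threading


class BrowserCrash(Exception):
    """Hypothetical stand-in for a browser process dying mid-visit."""


def visit(browser_id, url, attempt):
    """Pretend to load a page; URLs containing 'flaky' crash on the first attempt."""
    if "flaky" in url and attempt == 0:
        raise BrowserCrash(url)
    return {"browser_id": browser_id, "url": url, "attempt": attempt}


def browser_worker(browser_id, commands, results, max_retries=2):
    """Pull URLs from the shared command queue until a None sentinel arrives."""
    while True:
        url = commands.get()
        if url is None:
            break
        for attempt in range(max_retries + 1):
            try:
                results.put(visit(browser_id, url, attempt))
                break
            except BrowserCrash:
                # A real platform would restart the browser and restore
                # its profile here before retrying the command.
                continue


def crawl(urls, num_browsers=3):
    """Visit all URLs in parallel, then aggregate every result in one database."""
    commands, results = queue.Queue(), queue.Queue()
    for url in urls:
        commands.put(url)
    for _ in range(num_browsers):
        commands.put(None)  # one stop sentinel per worker

    workers = [
        threading.Thread(target=browser_worker, args=(i, commands, results))
        for i in range(num_browsers)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()  # synchronization point: wait for every browser to finish

    # Central aggregation: a single writer stores the data from all browsers.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE site_visits (browser_id INTEGER, url TEXT, attempt INTEGER)"
    )
    while not results.empty():
        row = results.get()
        db.execute(
            "INSERT INTO site_visits VALUES (?, ?, ?)",
            (row["browser_id"], row["url"], row["attempt"]),
        )
    db.commit()
    rows = db.execute(
        "SELECT url, attempt FROM site_visits ORDER BY url"
    ).fetchall()
    db.close()
    return rows


if __name__ == "__main__":
    for url, attempt in crawl(["http://a.example", "http://flaky.example"]):
        print(url, "succeeded on attempt", attempt)
```

Routing all writes through one place keeps the per-browser workers simple and avoids concurrent database access; a real deployment would also need to persist results while the crawl is still running rather than only at the end.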
Note that OpenWPM is under active development, and should be considered experimental software.