Use Data.
Preserve Privacy.

A differential privacy toolkit for analytics and machine learning

This toolkit uses state-of-the-art differential privacy (DP) techniques to inject noise into data, preventing disclosure of sensitive information and managing exposure risk.

Based on the latest innovations in differential privacy research and best practices from real-world applications

Designed for both statistical analysis and machine learning applications

Open source framework to create and test new algorithms and techniques.

Why this toolkit?

Based on cutting-edge Differential Privacy algorithms

Data scientists, analysts, scientific researchers and policy makers often need to analyze data that contains sensitive personal information that must remain private.

Commonly used privacy techniques are limiting and can result in leaks of sensitive information.

Differential Privacy is a technique that offers strong privacy assurances, preventing data leaks and re-identification of individuals in a dataset.

Flexible native runtime

A native runtime library for generating and validating differentially private results, usable from C, C++, Python, R, and other languages.

Built-in Connectivity to Data Sources

Access data from data lakes, SQL Server, Postgres, Apache Spark, Presto, and CSV files.

Granular Privacy Risk Controls

Track privacy risk by managing multiple requests on the data. Use privacy budgets to control the number of queries permitted for different users.
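For illustration, per-user budget tracking can be as simple as a ledger that records how much epsilon each user has spent; the class below is a hypothetical sketch, not the toolkit's API:

```python
# Hypothetical per-user privacy budget ledger (illustrative, not the toolkit's API).
class PrivacyBudgetLedger:
    def __init__(self, total_epsilon: float):
        self.total_epsilon = total_epsilon
        self.spent = {}  # user id -> epsilon consumed so far

    def authorize(self, user: str, query_epsilon: float) -> bool:
        """Allow the query only if the user's remaining budget covers its cost."""
        remaining = self.total_epsilon - self.spent.get(user, 0.0)
        if query_epsilon > remaining:
            return False  # budget exhausted: reject further queries
        self.spent[user] = self.spent.get(user, 0.0) + query_epsilon
        return True

ledger = PrivacyBudgetLedger(total_epsilon=1.0)
print(ledger.authorize("analyst_1", 0.4))  # True, 0.6 remaining
print(ledger.authorize("analyst_1", 0.7))  # False, would exceed the budget
```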

Privacy Loss Tester

An evaluator that automatically stress-tests existing differential privacy algorithms.
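One common way to stress-test a mechanism, sketched below in generic NumPy rather than the evaluator's actual interface, is to run it many times on two neighboring datasets and check that the differential privacy inequality holds empirically:

```python
import numpy as np

def laplace_count(data, epsilon):
    """Counting query with Laplace noise calibrated to sensitivity 1."""
    return len(data) + np.random.laplace(scale=1.0 / epsilon)

def estimate_privacy_ratio(mechanism, d1, d2, epsilon, trials=50_000):
    """Empirically probe P[M(d1) in S] <= exp(eps) * P[M(d2) in S] for threshold sets S."""
    out1 = np.array([mechanism(d1, epsilon) for _ in range(trials)])
    out2 = np.array([mechanism(d2, epsilon) for _ in range(trials)])
    worst = 0.0
    for t in np.linspace(out1.min(), out1.max(), 50):  # S = {output >= t}
        p1 = (out1 >= t).mean()
        p2 = (out2 >= t).mean()
        if p1 > 0 and p2 > 0:
            worst = max(worst, p1 / p2, p2 / p1)
    return worst  # should stay close to or below exp(epsilon), up to sampling error

d1 = list(range(100))
d2 = d1[:-1]  # neighboring dataset: one record removed
print(estimate_privacy_ratio(laplace_count, d1, d2, epsilon=0.5), np.exp(0.5))
```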

How It Works

This toolkit is designed to act as a layer between queries and the underlying data systems, protecting sensitive data.

When a user queries the data or trains a model, the system:

  • Adds statistical noise to the results
  • Calculates the privacy risk metric, or information budget, used by the query
  • Subtracts that cost from the remaining budget to limit further queries (see the sketch below)
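A minimal sketch of this flow for a single counting query, assuming the Laplace mechanism and a plain epsilon budget (the function name and parameters are illustrative, not the toolkit's API):

```python
import numpy as np

def run_private_query(data, epsilon, remaining_budget):
    """Answer a counting query following the steps described above."""
    if epsilon > remaining_budget:
        raise RuntimeError("Privacy budget exhausted; query refused.")
    # Add statistical noise to the result, calibrated to the query's
    # sensitivity (1 for a count) and the chosen epsilon.
    noisy_count = len(data) + np.random.laplace(scale=1.0 / epsilon)
    # The privacy risk metric (information budget) used by this query is epsilon;
    # subtract it from the remaining budget to limit further queries.
    remaining_budget -= epsilon
    return noisy_count, remaining_budget

ages = [34, 51, 29, 62, 47]
result, budget_left = run_private_query(ages, epsilon=0.1, remaining_budget=1.0)
print(result)       # noisy count near 5
print(budget_left)  # 0.9 of the epsilon budget remains
```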

Applications

Differential Privacy for Statistical Analysis

  • Built-in support for commonly used mathematical and statistical operators (see the sketch below)
  • Use cases span healthcare, sensitive socio-economic data, and more
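As an example of such an operator, a differentially private mean can be computed by clipping values to a known range and adding Laplace noise scaled to the query's sensitivity; this is a generic sketch rather than the toolkit's operator API:

```python
import numpy as np

def private_mean(values, lower, upper, epsilon):
    """Differentially private mean, assuming the dataset size is public.

    Values are clipped to [lower, upper] so that replacing one record
    changes the mean by at most (upper - lower) / n.
    """
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(values)
    noise = np.random.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

ages = np.array([34, 51, 29, 62, 47, 55, 41])
print(private_mean(ages, lower=0, upper=100, epsilon=0.5))
```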
Differential Privacy in Machine Learning

  • Built-in support for training simple machine learning models like linear and logistic regression (see the sketch below)
  • Compatible with open-source training libraries such as TensorFlow Privacy
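The core idea behind differentially private training of such models is DP-SGD: clip each example's gradient and add noise before updating the weights. The NumPy sketch below illustrates this for logistic regression; it is a generic illustration, not the toolkit's or TensorFlow Privacy's API:

```python
import numpy as np

def dp_sgd_logistic_regression(X, y, epochs=20, lr=0.1,
                               clip_norm=1.0, noise_multiplier=1.1):
    """Train logistic regression with per-example gradient clipping and Gaussian noise."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid predictions
        per_example_grads = (preds - y)[:, None] * X  # log-loss gradient per example
        # Clip each example's gradient to bound its influence (sensitivity).
        norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
        clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
        # Sum, add Gaussian noise scaled to the clipping norm, then average.
        noisy_grad = (clipped.sum(axis=0) +
                      rng.normal(0, noise_multiplier * clip_norm, size=d)) / n
        w -= lr * noisy_grad
    return w

X = np.random.default_rng(1).normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
print(dp_sgd_logistic_regression(X, y))
```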
Getting Started

Build and deploy easily with the Differential Privacy toolkit!

Install the toolkit

Contribute

As a community project, we encourage you to join the effort and contribute feedback, algorithms, ideas, and more, so we can evolve the toolkit together!


Key Contributors

With support from