This toolkit uses state-of-the-art differential privacy (DP) techniques to inject statistical noise into data, preventing disclosure of sensitive information and managing exposure risk.
Based on the latest innovations in differential privacy research and best practices from real-world applications.
Designed for both statistical analysis and machine learning applications.
An open-source framework for creating and testing new algorithms and techniques.
Data scientists, analysts, scientific researchers, and policy makers often need to analyze data containing sensitive personal information that must remain private.
Commonly used privacy techniques are limited and can result in leaks of sensitive information.
Differential Privacy is a technique that offers strong privacy assurances, preventing data leaks and re-identification of individuals in a dataset.
A native runtime library to generate and validate differentially private results, usable from C, C++, Python, R, and other languages.
Access data from Data Lakes, SQL Server, Postgres, Apache Spark, Apache Presto, and CSV files.
Track privacy risk by managing multiple requests on the data. Use privacy budgets to control the number of queries permitted for different users (see the sketch after this list).
An evaluator to automatically stress test existing differential privacy algorithms.
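To make the budget idea concrete, here is a minimal sketch of per-user budget accounting, assuming a simple epsilon-based budget; the `PrivacyBudget` class and its methods are hypothetical illustrations, not the toolkit's actual API.

```python
# Minimal sketch of per-user privacy budget accounting (illustrative only;
# class and method names are hypothetical, not part of the toolkit's API).

class PrivacyBudget:
    """Tracks the total epsilon a user may spend across queries."""

    def __init__(self, total_epsilon: float):
        self.total_epsilon = total_epsilon
        self.spent_epsilon = 0.0

    def can_spend(self, epsilon: float) -> bool:
        # A query is allowed only if it fits within the remaining budget.
        return self.spent_epsilon + epsilon <= self.total_epsilon

    def spend(self, epsilon: float) -> None:
        if not self.can_spend(epsilon):
            raise RuntimeError("Privacy budget exhausted; query denied.")
        self.spent_epsilon += epsilon


# Usage: each user gets a budget; queries are rejected once it runs out.
budgets = {"analyst_a": PrivacyBudget(total_epsilon=1.0)}
budgets["analyst_a"].spend(0.25)           # a query costing epsilon = 0.25
print(budgets["analyst_a"].spent_epsilon)  # 0.25 of 1.0 spent
```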
The toolkit is designed to act as a layer between queries and the underlying data systems, protecting sensitive data.
When a user queries the data or trains a model, the system:
Adds statistical noise to the results,
Calculates the privacy risk (information budget) consumed by the query,
Subtracts that cost from the remaining budget to limit further queries.
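A minimal sketch of this flow, assuming the Laplace mechanism for a counting query and a simple epsilon-based budget; `private_count` and the other names are hypothetical, not the toolkit's actual API, and NumPy is used only for noise sampling.

```python
# Minimal sketch of the query flow described above (illustrative only;
# function and variable names are hypothetical, not the toolkit's API).
import numpy as np

def private_count(values, epsilon, remaining_budget):
    """Answer a counting query with Laplace noise and charge the privacy budget."""
    if epsilon > remaining_budget:
        raise RuntimeError("Privacy budget exhausted; query denied.")
    # 1. Add statistical noise calibrated to the query's sensitivity (1 for a count).
    noisy_result = len(values) + np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    # 2. The query's privacy cost is epsilon; 3. subtract it from the remaining budget.
    return noisy_result, remaining_budget - epsilon

# Example: two queries drawn from a total budget of epsilon = 1.0.
result, budget = private_count(list(range(1000)), epsilon=0.1, remaining_budget=1.0)
result, budget = private_count(list(range(1000)), epsilon=0.1, remaining_budget=budget)
```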
As a community project, we encourage you to join the effort and contribute feedback, algorithms, ideas, and more, so we can evolve the toolkit together!