| # Attack Model for the Differential Privacy Libraries |
| |
| ## TL;DR |
| |
| This doc summarizes our assumptions and requirements for using the DP Building |
| Block Libraries in a safe way. We assume that an attacker does not have direct |
| access to the raw user data, has limited abilities to inject data into the |
| dataset, and has limited visibility into the resources consumed by the DP |
| Libraries. |
| |
| ## Differentially Private Output |
| |
The DP Building Block Libraries provide differentially private output. In
layperson's terms, this means that an attacker can learn only a very limited
amount of additional information about any individual from the output of the DP
libraries. An upper bound on this knowledge gain is configurable via the DP
parameters epsilon and delta.
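
In the standard formalization of (epsilon, delta)-differential privacy, this
guarantee reads as follows: for the randomized mechanism M computed by the
library, any two datasets D and D′ differing in the contributions of a single
user, and any set S of possible outputs,

```
\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S] + \delta
```

Smaller values of epsilon and delta bound the attacker's knowledge gain more
tightly, at the cost of adding more noise to the output.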
| |
| ## Use Cases |
| |
There are two intended use cases for the library:
| |
| * **Single data releases:** This scenario is a one-off aggregation of the raw |
| user data. The DP Libraries are used to aggregate the data once, and the |
| result is then shared with a wider audience. |
* **Periodic data releases:** Similar to the above, but the library is used
  to aggregate the data periodically, and each result is then shared with a
  wider audience. The client is responsible for calculating an appropriate
  privacy budget (epsilon, delta) so that user data used in multiple
  aggregations is accounted for correctly (see the sketch after this list).
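
As a rough illustration of the budget accounting for periodic releases, here is
a minimal sketch (assuming basic sequential composition, where the per-release
budgets of a user simply add up; this is not part of the libraries' API):

```python
def per_release_budget(total_epsilon: float, total_delta: float,
                       num_releases: int) -> tuple[float, float]:
    """Splits an overall privacy budget evenly across periodic releases.

    Assumes basic sequential composition: if a user's data can appear in
    every one of the `num_releases` aggregations, the per-release budgets
    add up to at most (total_epsilon, total_delta) overall. Tighter
    advanced-composition bounds exist but are omitted here.
    """
    return total_epsilon / num_releases, total_delta / num_releases


# Example: weekly releases over one year with an overall budget of
# epsilon = 1.0 and delta = 1e-6 leave roughly epsilon ≈ 0.019 per week.
eps_week, delta_week = per_release_budget(1.0, 1e-6, 52)
```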
| |
| ## Non-Goals of the DP Libraries |
| |
* The DP library is designed to be used as a component in a higher-level
  framework that mitigates privacy attacks; it is not designed to be used
  directly. For instance, the DP library has no notion of a user and hence
  cannot bound the number of contributions per user by filtering the dataset
  (see the sketch after this list).
| * The DP Library is not designed for an interactive setting, e.g., allowing an |
| untrusted analyst to perform arbitrary queries. |
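
For example, a higher-level framework could bound the number of contributions
per user before handing data to the library. The following is a hypothetical
sketch; `bound_contributions` and the record format are illustrative, not part
of the libraries:

```python
from collections import defaultdict

def bound_contributions(records, max_contributions=5):
    """Keeps at most `max_contributions` records per user.

    The DP building blocks have no notion of a user, so a wrapping
    framework has to enforce a bound like this before aggregation;
    otherwise a single user could contribute arbitrarily many records
    and the configured sensitivity would not hold.
    """
    per_user = defaultdict(list)
    for user_id, value in records:
        if len(per_user[user_id]) < max_contributions:
            per_user[user_id].append(value)
    return [(user_id, value)
            for user_id, values in per_user.items()
            for value in values]
```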
| |
| ## Attack Model |
| |
We assume that the DP Library is executed on trusted compute nodes. This means
that clients must trust the hardware and every process running on the same
node. If an attacker can control any process on these nodes, the attack surface
extends far beyond the DP libraries, since the node has access to the raw user
data (whether via the network or on a hard disk).
| |
We assume that the DP Library is executed in batch mode: after every run, the
output is eventually published to a wider audience and thus becomes accessible
to the attacker.
| |
| ### Attacker's prior knowledge |
| |
The attacker does not have access to the raw user data; otherwise, there would
be nothing left to protect.
| |
| The attacker might have knowledge about a subset of the raw user data: |
| |
| 1. The attacker could have contributed to the dataset themselves. |
| 2. The attacker could have prior knowledge of the raw values of a large number |
| of contributions (including the case where the attacker knows all raw values |
| except for a single user's contribution). |
| 1. The attacker could even use some outside mechanism to learn about all |
| the contributions. However, from the output of the DP Libraries, the |
| attacker should not be able to infer whether they learned about all the |
| contributions or whether they only obtained a strict subset of the raw |
| data source. In particular, the attacker should not be able to infer |
| the values that a user contributed to the dataset. |
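
To make the last point concrete, here is a minimal simulation, assuming a plain
Laplace mechanism over a sum rather than the libraries' actual implementation:
even an attacker who knows every contribution except one cannot reliably read
off the remaining value from a noisy release.

```python
import numpy as np

def noisy_sum(values, epsilon, sensitivity=1.0):
    """Laplace mechanism: adds noise with scale sensitivity / epsilon."""
    return sum(values) + np.random.laplace(scale=sensitivity / epsilon)

# The attacker knows 999 of 1000 contributions and wants the last one.
known = [1.0] * 999
world_a = known + [0.0]  # the target contributed 0
world_b = known + [1.0]  # the target contributed 1
# With epsilon = 0.1 the noise has standard deviation ~14, so the two
# worlds produce heavily overlapping outputs and a single release
# reveals almost nothing about the target's value.
print(noisy_sum(world_a, epsilon=0.1), noisy_sum(world_b, epsilon=0.1))
```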
| |
| ### Injecting malicious data |
| |
The attacker can forge a very large number of, or even all, contributions to
the raw input dataset. However, from the output of the DP Libraries, it should
be impossible for the attacker to learn whether the dataset contained any other
entries (i.e., whether all contributions were forged).
| |
| Note: Depending on the application logic, there can be some mitigations against |
| malicious data, e.g., applying rounding and/or enforcing a typical number of |
| contributions per privacy unit. |
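
A hypothetical illustration of such application-level mitigations follows; the
range, grid, and function name are assumptions about the application logic, not
features of the DP libraries:

```python
def sanitize_contribution(value, lower=0.0, upper=100.0, step=0.5):
    """Clamps a contribution into the expected range and rounds it onto a
    fixed grid, limiting how much a single forged record can distort the
    aggregate."""
    clamped = min(max(value, lower), upper)
    return round(clamped / step) * step
```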
| |
| ### Order of events |
| |
The attacker can control the order in which user data is passed to the DP
Library, including the case where the data itself is malicious and
attacker-controlled. The DP Library protects against attacks that exploit the
non-associativity of floating-point arithmetic and similar attacks based on the
order of events.
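
Floating-point addition is not associative, so without such protection an
attacker who controls the order of contributions could probe the low-order bits
of an aggregate. For example:

```python
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6
```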
| |
| ### Side channels |
| |
| The execution is hidden from the attacker. In particular, we assume that the |
| attacker does not have additional information about: |
| |
| 1. How the raw user data is retrieved from the data storage. |
2. How processes are executed, including their memory consumption, CPU
   utilization, network usage, and timing information.
| 3. The state of the random number generator and the amount of entropy |
| available. |
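
The last two assumptions matter because noise generation itself can leak
through side channels. As a hypothetical sketch (not the libraries' actual
sampler), consider a rejection-style geometric sampler whose running time is
proportional to the noise it draws; an attacker who could time it might
partially denoise the published output:

```python
import random

def naive_geometric(p=0.5):
    # The number of loop iterations equals the sampled value, so the
    # execution time of this call is proportional to the noise drawn.
    # An attacker who could time it would learn about the noise magnitude.
    k = 0
    while random.random() > p:
        k += 1
    return k
```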