Algorithm

Algorithm is the base class of all differentially private algorithms. Each algorithm can be constructed using the builder pattern, which sets its parameters. Algorithms are stateful: first you add data (possibly multiple times), and then you get a result. Algorithms are templated on the input type.

Construction

absl::StatusOr<std::unique_ptr<Algorithm>> algorithm =
 AlgorithmBuilder.SetEpsilon(double epsilon)
                 .SetMaxPartitionsContributedTo(int max_partitions)
                 .SetMaxContributionsPerPartition(int max_contributions)
                 .SetLaplaceMechanism(std::unique_ptr<NumericalMechanism::Builder> mechanism_builder)
                 .Build();
  • double epsilon: The epsilon differential privacy parameter. A smaller number means more privacy but less accuracy.
  • int max_partitions: The number of aggregations, or ‘partitions,’ that each user is allowed to contribute to. Defaults to 1 if unset. The caller must guarantee that this limit is enforced on the input. The library cannot enforce it because it cannot distinguish between users or aggregations. Note that Algorithms that will be merged together are considered part of the same partition.
  • int max_contributions: The number of pieces of input to this aggregation that can belong to a single user. Defaults to 1 if unset. The caller must guarantee that this limit is enforced on the input. The library cannot enforce it because it does not know which inputs belong to which users. If summaries from multiple Algorithms are merged together, the total number of inputs from a single user across all merged Algorithms must not exceed this limit.
  • std::unique_ptr<NumericalMechanism::Builder> mechanism_builder: Used to specify the type of numerical mechanism the algorithm will use to add noise (e.g. Laplace, Gaussian). In most cases this should not be set (and a default LaplaceMechanism will be used), but it can be used to remove or mock noise during testing.

Partitions

Several of the parameters refer to the concept of a partition. We define a partition as a portion of the data for which a single statistic will be released. This is best explained through examples: if you're counting the number of people in each of a set of age buckets, partitions would correspond to age buckets. Or if you want to count the number of cars broken down by color, each color would be a partition.

We imagine that you will use one or more Algorithms for the data from each partition. A single Algorithm should not be used for data from more than one partition. If multiple Algorithms are used for a single partition, we imagine that serialization (described below) will be used to combine their data into a single Algorithm and produce a single output.

Use

Adding data

These functions add data to the Algorithm‘s internal pool. For most algorithms, this doesn’t consume additional space; the space consumed is typically constant.

void AddEntry(const T& t);

Adds a single element t to the Algorithm's pool. The type of t should be the Algorithm's templated type T.

template <typename Iterator>
void AddEntries(Iterator begin, Iterator end);

Adds multiple inputs to the algorithm.

  • Iterator: any iterator type. All algorithms support any iterator category, including input iterators.
  • Iterator begin and Iterator end: The begin and end iterators for your data. The Algorithm will behave like any STL iterator-based algorithm.
void Reset();

Clears the algorithm's input pool and allows a new result to be generated.

Serialization

Since Algorithms hold an internal state to represent all entires that have been added, we can serialize the algorithm to a Summary proto. A Summary proto holds all the information needed to reconstruct an Algorithm and its internal state. We can merge a Summary into another Algorithm of the same type that was constructed with identical parameters. Merging Algorithms of different types or with different parameters doesn't make sense, and will return an error.

Summary Serialize();
absl::Status Merge(const Summary& summary);

Serialization and merging can allow these algorithms to be used in a distributed manner. This could be useful for very large input sets, for example.

Getting Results

Output PartialResult();

Get a result based on the current state of the Algorithm.

template <typename Iterator>
Output Result(Iterator begin, Iterator end)

Add the entries from begin to end, and then get the result.

Note that whichever method of getting a result you use, you can only get a single result before your epsilon and delta are exhausted.

Values are returned from Result in an Output proto. For most algorithms, this is a single int64 or double value. Some algorithms contain additional data about accuracy and algorithm mechanisms. You can use GetValue<Type> to get values out of Outputs easily.