Privacy encoding data

The file ‘privacy_encoding_params’ in this directory is used to calculate the PrivacyEncoder parameters for each Cobalt 1.1 report with added privacy, other than StringCounts reports (see below).

In particular, we use this data to set the prob_bit_flip and num_index_points fields of the ReportDefinition for each report.

The file contains a lookup table mapping tuples (epsilon, population, sparsity) to precomputed values of prob_bit_flip and num_index_points.

The column format is: {epsilon, population, sparsity, prob_bit_flip, num_index_points}

where:

  • epsilon is the target epsilon value (in the shuffled model).
  • population is the estimated size of the fleet.
  • sparsity is the maximum number of buckets with nonzero counts in the histogram contributed by an individual client.
  • See the ReportDefinition proto for the meanings of the prob_bit_flip and num_index_points fields.
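
As a rough illustration of the lookup described above, the following Python sketch loads the table into a dictionary keyed by (epsilon, population, sparsity). It rests on assumptions that this README does not confirm: that the file stores one comma-separated row per line with no header, and that lookups are by exact match on the key. The names EncodingParams and load_privacy_encoding_params are hypothetical, and the sample rows are made up.

```python
import csv
import io
from typing import Dict, NamedTuple, Tuple


class EncodingParams(NamedTuple):
    """Hypothetical container for the two precomputed values in each row."""
    prob_bit_flip: float
    num_index_points: int


Key = Tuple[float, int, int]  # (epsilon, population, sparsity)


def load_privacy_encoding_params(text: str) -> Dict[Key, EncodingParams]:
    """Parse rows of: epsilon, population, sparsity, prob_bit_flip, num_index_points.

    Assumes comma-separated rows with no header line; the real on-disk
    format may differ.
    """
    table: Dict[Key, EncodingParams] = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        epsilon, population, sparsity, prob_bit_flip, num_index_points = (
            field.strip() for field in row
        )
        key = (float(epsilon), int(population), int(sparsity))
        table[key] = EncodingParams(float(prob_bit_flip), int(num_index_points))
    return table


# Made-up rows for illustration only; these are not values from the real file.
SAMPLE = """\
1.0,10000,1,0.25,8
1.0,10000,2,0.3,16
"""

table = load_privacy_encoding_params(SAMPLE)
params = table[(1.0, 10000, 1)]  # exact-match lookup (an assumption)
print(params.prob_bit_flip, params.num_index_points)
```

An exact-match dictionary keeps the sketch short. How a report's configured values are matched to table rows in practice is not described here.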

The file ‘string_sketch_params’ is used to calculate the PrivacyEncoder parameters for StringCounts reports with privacy. We use this data to set the prob_bit_flip, num_index_points, and string_sketch_params fields of the ReportDefinition.

The column format is: {epsilon, population, string_buffer_max, max_count, prob_bit_flip, num_hashes, num_cells_per_hash, num_index_points}

where:

  • epsilon is the target epsilon value (in the shuffled model).
  • population is the estimated size of the fleet.
  • string_buffer_max is the maximum number of strings with nonzero counts in the histogram contributed by an individual client.
  • max_count is the maximum count for each string contributed by a client.
  • num_hashes is the number of hash functions used to create a CountMin sketch for the report.
  • num_cells_per_hash is the range size of each hash function used in a CountMin sketch.
  • See the ReportDefinition proto for the meanings of the prob_bit_flip and num_index_points fields.
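
The eight-column format above can be modelled in the same way. This Python sketch is illustrative only and makes several assumptions: rows are comma-separated with no header, the first four columns act as the lookup key, and matching is exact. StringSketchRow, parse_rows, find_row, and sketch_cells are hypothetical names, and the sample row is made up. The cell total follows from the column definitions: a CountMin sketch with num_hashes hash functions, each with a range of num_cells_per_hash, has num_hashes × num_cells_per_hash cells.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class StringSketchRow(NamedTuple):
    """Hypothetical record mirroring the eight columns, in order."""
    epsilon: float
    population: int
    string_buffer_max: int
    max_count: int
    prob_bit_flip: float
    num_hashes: int
    num_cells_per_hash: int
    num_index_points: int


def parse_rows(text: str) -> List[StringSketchRow]:
    """Parse comma-separated rows (assumed format) into typed records."""
    rows: List[StringSketchRow] = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:  # skip blank lines
            continue
        eps, pop, buf_max, max_count, flip, hashes, cells, points = (
            f.strip() for f in fields
        )
        rows.append(StringSketchRow(
            float(eps), int(pop), int(buf_max), int(max_count),
            float(flip), int(hashes), int(cells), int(points),
        ))
    return rows


def find_row(rows: List[StringSketchRow], epsilon: float, population: int,
             string_buffer_max: int, max_count: int) -> Optional[StringSketchRow]:
    """Return the first row whose first four columns match exactly, else None."""
    wanted = (epsilon, population, string_buffer_max, max_count)
    for row in rows:
        if row[:4] == wanted:
            return row
    return None


def sketch_cells(row: StringSketchRow) -> int:
    """Total cells in the CountMin sketch: one range of cells per hash function."""
    return row.num_hashes * row.num_cells_per_hash


# Made-up row for illustration only; not a value from the real file.
SAMPLE = "1.0,10000,10,5,0.2,3,64,8\n"

rows = parse_rows(SAMPLE)
row = find_row(rows, 1.0, 10000, 10, 5)
if row is not None:
    print(row.num_hashes, row.num_cells_per_hash, sketch_cells(row))
```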

Both tables were generated using a script by pasin@. The table contents are preliminary and may change.