An extensible, privacy-preserving, user-data analysis pipeline. go/cobalt-for-privacy
The script cobaltb.py orchestrates building and testing Cobalt.
cobaltb.py lint to run the linters
cobaltb.py -h for general help
It is possible to run various subsets of the tests. Run
cobaltb.py test -h to see the possible subsets.
./cobaltb.py test --tests=gtests This runs all of the gunit tests. For those tests that need a DataStore, the in-memory datastore is used. Compare this against the other possibilities for a DataStore listed below.
./cobaltb.py test --tests=btemulator This starts the Bigtable emulator running on the local machine on the default port of 9000. Then it runs various gunit tests that need a DataStore and that expect to find the emulator running. These tests use the BigtableStore as the DataStore and BigtableStore connects to the Bigtable emulator.
This is a set of gunit tests that use BigtableStore as the DataStore and connect to the real Google Cloud Bigtable. These tests are not run automatically at this time: they are not run on the build machine and they are not run if you type
./cobaltb.py test --tests=all. Instead you must explicitly invoke them. See below for the command line.
One-time setup: You must install a credential on your computer in order for the Cobalt code running on your computer to be able to access Cobalt's testing Google Cloud Project.
Select Service Account Key as the type of key.
Select New Service Account and assign your service account any name.
Select JSON as the key type.
Name the downloaded file service_account_credentials.json and put the file in the Cobalt source root directory (next to this README file).
Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path to that file. This is necessary for the gRPC code linked with Cobalt to find the credential at run-time during the tests.
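As a concrete sketch, the variable can be set like this (the checkout path below is hypothetical; substitute the actual location of your Cobalt source root):

```shell
# Point gRPC at the downloaded service-account key.
# COBALT_ROOT is an example path; adjust it to your own checkout.
COBALT_ROOT="$HOME/cobalt"
export GOOGLE_APPLICATION_CREDENTIALS="$COBALT_ROOT/service_account_credentials.json"
echo "$GOOGLE_APPLICATION_CREDENTIALS"
```

You may want to add the export line to your shell profile so it is set in every session in which you run the tests.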
Side-note: We are abusing service accounts with this procedure. The more appropriate solution would be to use OAuth tokens to authenticate your computer to Google Cloud. However, at this time there appears to be a bug preventing this from working. The symptom is the following error message:
assertion failed: deadline.clock_type == g_clock_type
If you see this error message it means that the OAuth flow is being attempted and has hit this bug. This happens if the gRPC code is not able to use the service account credential located at the path in GOOGLE_APPLICATION_CREDENTIALS.
If you want, you may use a different Google Cloud project instead of Cobalt's testing project. Follow the same steps above using any project. Then you will have to pass the command-line flag
--bigtable_project_name with the name of your project.
Every-time setup: You will also need to create your own personal instance of Cloud Bigtable within the Google Cloud project for which you created credentials above. Because it costs our cost center money to have an instance of Google Cloud Bigtable running, we recommend not leaving your personal instance running all the time: create an instance when you want to test with it and delete it when you are done.
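The create/test/delete cycle can be sketched with the gcloud CLI. The instance name and zone below are hypothetical, and gcloud flag names vary across SDK versions, so treat this as a template rather than exact commands; the RUN=echo guard makes it a dry run that only prints what it would do (set RUN="" to actually execute):

```shell
# Dry-run sketch of creating a personal Bigtable instance, testing
# against it, and deleting it. All names are examples.
RUN=echo
INSTANCE="my-cobalt-bt-instance"

$RUN gcloud bigtable instances create "$INSTANCE" \
  --display-name="$INSTANCE" \
  --cluster="$INSTANCE-c1" --cluster-zone="us-central1-c"

# ... run the cloud_bt tests against $INSTANCE here ...

$RUN gcloud bigtable instances delete "$INSTANCE"
```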
./cobaltb.py test --tests=cloud_bt --bigtable_instance_name=<your-personal-instance>
Also pass --bigtable_project_name=<your-project-name> if you want to use a different project.
You can run the Analyzer Service locally using an in-memory data store as follows:
Let <root> be the root of your Cobalt installation.
./out/analyzer/analyzer -for_testing_only_use_memstore -port=8080 -cobalt_config_dir="config/registered" -logtostderr
You can run the analyzer locally using the Bigtable emulator as follows. Start the Bigtable emulator:
In a separate console run:
./out/analyzer/analyzer -for_testing_only_use_bigtable_emulator -port=8080 -cobalt_config_dir="config/registered" -logtostderr
You can run the shuffler locally as follows:
./out/shuffler/shuffler -logtostderr -config_file out/config/shuffler_default.conf -batch_size 100 -vmodule=receiver=2,dispatcher=1,store=2
The following example sends a single RPC containing an ObservationBatch with 200 Observations to an Analyzer running locally on port 8080:
./out/tools/cgen -analyzer_uri="localhost:8080" -num_observations=200
The following example sends the RPC instead to the shuffler running locally on port 50051:
./out/tools/cgen -analyzer_uri="localhost:8080" -shuffler_uri="localhost:50051" -num_observations=200
The cobaltb.py tool is also a helper to interact with GCE. The following commands are supported:
gce_build - Build the docker images for use on GCE.
gce_push - Publish the built docker images to the GCE repository.
gce_start - Start the cobalt components. To see the external IP of services run, for example: kubectl get service analyzer
gce_stop - Stop the cobalt components.
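A typical end-to-end deploy cycle strings these commands together. The sketch below is a dry run (RUN=echo only prints the commands; set RUN="" to execute) and assumes docker, gcloud, and kubectl are installed and configured as described in this README:

```shell
# Build, publish, start, inspect, and stop the Cobalt components on GCE.
RUN=echo
$RUN ./cobaltb.py gce_build
$RUN ./cobaltb.py gce_push
$RUN ./cobaltb.py gce_start
$RUN kubectl get service analyzer   # shows the external IP once the service is up
$RUN ./cobaltb.py gce_stop
```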
sudo apt-get install docker-engine
sudo usermod -aG docker $USER
(Log out and back in for the group membership to take effect.)
Install gcloud: https://cloud.google.com/sdk/
gcloud components install kubectl
Tasks like gce_start, gce_stop, or running the Analyzer outside of GCE require setting the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to the JSON file containing credentials. Follow the instructions in step 1 of "How the Application Default Credentials work" from:
All of Cobalt's dependencies (both compile and run-time) are installed in the sysroot directory using the
setup.sh script. To avoid having to compile the dependencies every time, a packaged binary version of sysroot is stored on Google storage.
cobaltb.py setup will download the pre-built sysroot from Google storage.
To upload a new version of sysroot to Google Storage, do the following:
rm -fr sysroot