# Jitterentropy: tuning the configuration

The jitterentropy library, written by Stephan Mueller, is available at
<https://github.com/smuellerDD/jitterentropy-library> and documented at
<http://www.chronox.de/jent.html>. In Zircon, it's used as a simple entropy
source to seed the system CPRNG.

[The companion document about basic configuration options to jitterentropy](config-basic.md)
describes two options that fundamentally affect how jitterentropy runs. This document instead
describes the numeric parameters that control how fast jitterentropy runs and how much entropy it
collects, without fundamentally altering its principles of operation. It also describes how to
test various parameters and what to look for in the output (e.g. when adding support for a new
device, or when doing a more thorough job of optimizing the parameters).

[TOC]

## A rundown of jitterentropy's parameters

The following tunable parameters control how fast jitterentropy runs and how fast it collects
entropy:

### [`kernel.jitterentropy.ll`](/docs/reference/kernel/kernel_cmdline.md#kernel-jitterentropy-ll-num)

"`ll`" stands for "LFSR loops". Jitterentropy uses a (deliberately inefficient implementation of
an) LFSR to exercise the CPU as part of its noise generation. The inner loop shifts the LFSR 64
times; the outer loop repeats `kernel.jitterentropy.ll`-many times.
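
For orientation, here is a minimal sketch of that loop structure. It is not the upstream
implementation: the function name, the tap positions, and the way the timing value is folded in are
all illustrative; only the nesting (64 shifts per outer iteration, `ll` outer iterations) mirrors
the description above.

```c
#include <stdint.h>

// Illustrative only: a Fibonacci-style LFSR with arbitrary taps, shifted 64
// times per outer iteration, with one bit of a measured time delta folded
// into the feedback on every shift.
static uint64_t lfsr_sketch(uint64_t state, uint64_t time_delta, uint64_t ll) {
    for (uint64_t outer = 0; outer < ll; outer++) {   // kernel.jitterentropy.ll
        for (unsigned bit = 0; bit < 64; bit++) {     // 64 shifts per outer loop
            uint64_t fb = (time_delta >> bit) & 1;    // fold in one timing bit
            fb ^= (state >> 63) & 1;                  // illustrative tap
            fb ^= (state >> 60) & 1;                  // illustrative tap
            fb ^= (state >> 55) & 1;                  // illustrative tap
            state = (state << 1) ^ fb;
        }
    }
    return state;
}
```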

In my experience, the LFSR code significantly slows down jitterentropy, but doesn't generate very
much entropy. I tested this on RPi3 and qemu-arm64 with qualitatively similar results, but it hasn't
been tested on x86 yet. This is something to consider when tuning: using fewer LFSR loops tends to
lead to better overall performance.

Note that setting `kernel.jitterentropy.ll=0` causes jitterentropy to choose the number of LFSR
loops in a "random-ish" way. As described in [the basic config doc](config-basic.md), I discourage
the use of `kernel.jitterentropy.ll=0`.


### [`kernel.jitterentropy.ml`](/docs/reference/kernel/kernel_cmdline.md#kernel-jitterentropy-ml-num)

"`ml`" stands for "memory access loops". Jitterentropy walks through a moderately large chunk of
RAM, reading and writing each byte. The size of the chunk and access pattern are controlled by the
two parameters below. The memory access loop is repeated `kernel.jitterentropy.ml`-many times.
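
As a rough sketch (not the upstream code; the function name and the read-modify-write of one byte
per step are illustrative), the structure looks something like this, with the walk order determined
by the block size described next:

```c
#include <stddef.h>
#include <stdint.h>

// Illustrative only: touch every byte of a bs * bc byte region with a
// read-modify-write, advancing by (bs - 1) bytes per access, and repeat the
// whole walk ml times (kernel.jitterentropy.ml).
static void memaccess_sketch(volatile uint8_t* mem, size_t bs, size_t bc, uint64_t ml) {
    const size_t total = bs * bc;
    size_t loc = 0;
    for (uint64_t loop = 0; loop < ml; loop++) {
        for (size_t i = 0; i < total; i++) {
            mem[loc] = (uint8_t)(mem[loc] + 1);  // read from and write to RAM
            loc = (loc + bs - 1) % total;        // stride described under `bs` below
        }
    }
}
```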

In my experience, the memory access loops are a good source of raw entropy. Again, I've only tested
this on RPi3 and qemu-arm64 so far.

Much like `kernel.jitterentropy.ll`, if you set `kernel.jitterentropy.ml=0`, then jitterentropy will
choose a "random-ish" value for the memory access loop count. I also discourage this.

### [`kernel.jitterentropy.bs`](/docs/reference/kernel/kernel_cmdline.md#kernel-jitterentropy-bs-num)

"`bs`" stands for "block size". Jitterentropy divides its chunk of RAM into blocks of this size. The
memory access loop starts with byte 0 of block 0, then "byte -1" of block 1 (which is actually
the last byte of block 0), then "byte -2" of block 2 (i.e. the second-to-last byte of block 1), and
so on. This pattern ensures that every byte gets hit, and most accesses go into different blocks.
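
To make the pattern concrete, here is a tiny standalone program (not part of jitterentropy; the
parameter values are hypothetical and chosen small enough to print) that reproduces the walk order
described above:

```c
#include <stdio.h>

// Walk a 16-byte region (bs = 4, bc = 4) the way described above: each access
// advances by bs - 1 = 3 bytes, wrapping at the end of the region.
int main(void) {
    const unsigned bs = 4, bc = 4, total = bs * bc;
    unsigned loc = 0;
    for (unsigned i = 0; i < total; i++) {
        printf("access %2u: block %u, byte %u\n", i, loc / bs, loc % bs);
        loc = (loc + bs - 1) % total;
    }
    // Prints block 0/byte 0, block 0/byte 3, block 1/byte 2, block 2/byte 1, ...
    // and visits all 16 bytes exactly once before repeating.
    return 0;
}
```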

I have usually tested jitterentropy with `kernel.jitterentropy.bs=64`, based on the size of a cache
line. I haven't yet tested whether there's a better option on some or all platforms.

### [`kernel.jitterentropy.bc`](/docs/reference/kernel/kernel_cmdline.md#kernel-jitterentropy-bc-num)

"`bc`" stands for "block count". Jitterentropy uses this many blocks of RAM, each of size
`kernel.jitterentropy.bs`, in its memory access loops.

Since I choose `kernel.jitterentropy.bs=64`, I usually choose `kernel.jitterentropy.bc=1024`.
This means using 64KB of RAM, which is enough to overflow the L1 cache.

The comment before `jent_memaccess` in the
[jitterentropy source code](/zircon/third_party/lib/jitterentropy/jitterentropy-base.c#234)
suggests choosing the block size and count so that the RAM used is bigger than the L1 cache.
Confusingly, the default values in upstream jitterentropy (block size = 32, block count = 64)
aren't big enough to overflow L1.
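
For concreteness, the arithmetic works out as follows (the 32KB figure is only a typical L1 data
cache size, not a measurement of any particular target):

```c
// Footprint of the memory access region is bs * bc bytes:
enum {
    kZirconFootprint   = 64 * 1024,  // bs=64, bc=1024 -> 65536 bytes (64KB), larger than a 32KB L1
    kUpstreamFootprint = 32 * 64,    // bs=32, bc=64   ->  2048 bytes  (2KB), fits easily inside L1
};
```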

## Tuning process

The basic idea is simple: on a particular target device, try different values for the parameters.
Collect a large amount of data for each parameter set (ideally around 1MB), then
[run the NIST test suite to analyze the data](/docs/concepts/testing/entropy_quality_tests.md#running-the-nist-test-suite).
Determine which parameters give the best entropy per unit time. The time taken to draw the entropy
samples is logged on the system under test.
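
A hypothetical helper (the name and the example numbers below are made up, not drawn from any real
measurement) shows how those two figures combine into a single score for comparing parameter sets:

```c
// Combine the NIST min-entropy estimate with the logged collection time into
// "bits of min-entropy per millisecond"; higher is better.
static double entropy_rate(double bytes_collected,
                           double min_entropy_bits_per_byte,
                           double collection_time_ms) {
    return (bytes_collected * min_entropy_bits_per_byte) / collection_time_ms;
}

// Example: 1,000,000 bytes at an estimated 0.5 bits/byte, collected in
// 20,000 ms, scores 1e6 * 0.5 / 20000 = 25 bits of min-entropy per ms.
```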

One complication is the startup testing built into jitterentropy. This essentially draws and
discards 400 samples, after performing some basic analysis (mostly making sure that the clock is
monotonic and has a high enough resolution and variability). A more accurate test would reboot twice
for each set of parameters: once to collect around 1MB of data for analysis, and a second time to
boot with the "right" amount of entropy (as computed from the entropy estimate in the first
phase, with appropriate safety margins; see
["Determining the entropy\_per\_1000\_bytes statistic"](#determining-the-entropy_per_1000_bytes-statistic),
below). This second phase of testing simulates a real boot, including the startup tests. After
completing the second phase, choose the parameter set that boots fastest. Of course, each phase of
testing should be repeated a few times to reduce random variations.

## Determining the entropy\_per\_1000\_bytes statistic

The `crypto::entropy::Collector` interface in
[kernel/lib/crypto/include/lib/crypto/entropy/collector.h](/zircon/kernel/lib/crypto/include/lib/crypto/entropy/collector.h)
requires a parameter `entropy_per_1000_bytes` from its instantiations. The value relevant to
jitterentropy is currently hard-coded in
[kernel/lib/crypto/entropy/jitterentropy\_collector.cc](/zircon/kernel/lib/crypto/entropy/jitterentropy_collector.cc).
This value is meant to measure how much min-entropy is contained in each byte of data produced by
jitterentropy (since the bytes aren't independent and uniformly distributed, this will be less than
8 bits). The "per 1000 bytes" part simply makes it possible to specify fractional amounts of
entropy, like "0.123 bits / byte", without requiring fractional arithmetic (since `float` is
disallowed in kernel code, and fixed-point arithmetic is confusing).

The value should be determined by using the NIST test suite to analyze random data samples, as
described in
[the entropy quality tests document](/docs/concepts/testing/entropy_quality_tests.md#running-the-nist-test-suite).
The test suite produces an estimate of the min-entropy; repeated tests of the same RNG have (in my
experience) varied by a few tenths of a bit (which is pretty significant when entropy values can be
around 0.5 bits per byte of data!). After getting good, consistent results from the test suites,
apply a safety factor (e.g. divide the entropy estimate by 2), and update the value of
`entropy_per_1000_bytes` (don't forget to multiply by 1000).
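
As a hypothetical worked example (the 0.58 figure below is made up, not a measurement from any
device):

```c
#include <stdint.h>

// Suppose the NIST suite consistently reports about 0.58 bits of min-entropy
// per byte for a given parameter set.
static uint64_t compute_entropy_per_1000_bytes(void) {
    double nist_estimate = 0.58;                // bits of min-entropy per byte
    double derated = nist_estimate / 2.0;       // safety factor of 2 -> 0.29 bits/byte
    return (uint64_t)(derated * 1000.0 + 0.5);  // scale by 1000 and round -> 290
}
```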

Note that eventually `entropy_per_1000_bytes` should probably be configured somewhere instead of
hard-coded in jitterentropy\_collector.cc. Kernel cmdlines or even a preprocessor symbol could work.

## Notes about the testing script

The `scripts/entropy-test/jitterentropy/test-tunable` script automates looping through a large test
matrix. The downside is that tests run in sequence on a single machine, so (1) an error will stall
the test pipeline, meaning supervision *is* required, and (2) the machine is being constantly
rebooted rather than cold-booted (plus it's a netboot-reboot), which could conceivably confound the
tests. Still, it beats hitting power-off/power-on a thousand times by hand!

Some happy notes:

1. When netbooting, the script leaves bootserver on while waiting for netcp to successfully export
   the data file. If the system hangs, you can power it off and back on, and the existing bootserver
   process will restart the failed test.

2. If the test is going to run (say) 16 combinations of parameters 10 times each, it will go like
   this:

       test # 0: ml = 1   ll = 1  bc = 1  bs = 1
       test # 1: ml = 1   ll = 1  bc = 1  bs = 64
       test # 2: ml = 1   ll = 1  bc = 32 bs = 1
       test # 3: ml = 1   ll = 1  bc = 32 bs = 64
       ...
       test #15: ml = 128 ll = 16 bc = 32 bs = 64
       test #16: ml = 1   ll = 1  bc = 1  bs = 1
       test #17: ml = 1   ll = 1  bc = 1  bs = 64
       ...

   (The output files are numbered starting with 0, so I started with 0 above.)

   So, if test #17 fails, you can delete tests #16 and #17, and re-run 9 more iterations of each
   test. You can at least keep the complete results from the first iteration. In theory, the tests
   could be smarter and also keep the existing result from test #16, but the current shell scripts
   aren't that sophisticated.

The scripts don't do a two-phase process like I suggested in the ["Tuning process"](#tuning-process)
section above. It's certainly possible, but again the existing scripts aren't that sophisticated.

## Open questions

### How much do we trust the low-entropy extreme?

It's *a priori* possible that we maximize entropy per unit time by choosing small parameter values.
The most extreme case is of course `ll=1, ml=1, bs=1, bc=1`, but even something like `ll=1, ml=1,
bs=64, bc=32` is the sort of thing I have in mind. Part of the concern is the variability in the
test suite: if hypothetically the tests are only accurate to within 0.2 bits of entropy per byte,
and if they're reporting 0.15 bits of entropy per byte, what do we make of it? Hopefully running the
same test a few hundred times in a row will reveal a clear modal value, but it's still a little
risky to rely on such a low estimate being accurate.

The NIST publication states (line 1302, page 35, second draft) that the estimators "work well when
the entropy-per-sample is greater than 0.1". This is fairly low, so hopefully it isn't an issue in
practice. Still, the fact that there is a lower bound means we should probably leave a fairly
conservative envelope around it.

### How device-dependent is the optimal choice of parameters?

There's evidently a significant difference in the actual "bits of entropy per byte" metric on
different architectures or different hardware. Is it possible that most systems are optimal at
similar parameter values (so that we can just hard-code these values into
`kernel/lib/crypto/entropy/jitterentropy_collector.cc`)? Or do we need to put the parameters into
MDI or into a preprocessor macro, so that we can use different defaults on a per-platform basis (or
whatever level of granularity is appropriate)?

### Can we even record optimal parameters with enough granularity?

I mentioned it above, but one of our targets is "x86", which is what runs on any x86
PC. Naturally, x86 PCs can vary quite a bit. Even if we did something like add preprocessor symbols
like `JITTERENTROPY_LL_VALUE` etc. to the build, customized in `kernel/project/target/pc-x86.mk`,
could we pick a good value for *all PCs*?

If not, what are our options?

1. We could store a lookup table based on values accessible at runtime (like the exact CPU model,
   the core memory size, cache line size, etc.). This seems rather unwieldy. Maybe if we could find
   one or two simple properties to key off of, say "CPU core frequency" and "L1 cache size", we
   could make this relatively non-terrible.

2. We could try an adaptive approach: monitor the quality of the entropy stream, and adjust the
   parameters accordingly on the fly. This would take a lot of testing and justification if we want
   to trust it.

3. We could settle for "good enough" parameters on most devices, with the option to tune via kernel
   cmdlines or a similar mechanism. This seems like the most likely outcome to me. I expect that
   "good enough" parameters will be easy to find, and that the shortfall from optimal won't be
   disruptive enough to justify extreme solutions.