[compare.py] Add confidence interval (#377)
This patch adds a `--diff-confidence-interval=relative|absolute` option to `compare.py` to report 95% (or 1-alpha) confidence intervals for the relative difference between `lhs` and `rhs` runs.
The current p-value and significance markers only tell the user whether a difference is statistically significant against the null hypothesis; they do not show the range in which the true difference plausibly lies.
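To illustrate what such an interval conveys, here is a minimal sketch. It is not the patch's implementation. It assumes a pooled-variance t interval on the difference of means, divided by the `lhs` mean (treated as fixed, which is a simplification). The helper names `t_cdf`, `t_quantile`, `relative_diff_ci` and the sample values are hypothetical, and `compare.py` may compute its interval differently, so the numbers need not match its output.

```python
import math
from statistics import mean, variance


def t_cdf(x, df, steps=2000):
    """Student-t CDF for x >= 0, integrating the density with Simpson's rule."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Invert t_cdf by bisection; intended for 0.5 < p < 1."""
    lo, hi = 0.0, 1000.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def relative_diff_ci(lhs, rhs, alpha=0.05):
    """(1 - alpha) interval for (mean(rhs) - mean(lhs)) / mean(lhs).

    Assumption: pooled-variance t interval; the uncertainty of the
    denominator mean(lhs) is ignored.
    """
    n1, n2 = len(lhs), len(rhs)
    m1, m2 = mean(lhs), mean(rhs)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(lhs) + (n2 - 1) * variance(rhs)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    margin = t_quantile(1 - alpha / 2, df) * se
    diff = m2 - m1
    return (diff - margin) / m1, (diff + margin) / m1


# Hypothetical samples: three runs per side.
lhs_runs = [0.95, 1.00, 1.05]
rhs_runs = [1.10, 1.15, 1.20]
low, high = relative_diff_ci(lhs_runs, rhs_runs)
print(f"[{low:.1%}, {high:.1%}]")
```

An interval that excludes 0% agrees with a significant two-sided test at the same alpha, and its width shows how precisely the difference is known, which the p-value alone does not.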
Example output from `compare.py ... --statistics --diff-confidence-interval`:
```
Program  exec_time
         lhs   rhs   diff   std_lhs  std_rhs  t-value  p-value  significant  diff_ci_rel
C        2.95  3.40  15.3%  0.076    0.100    -6.653   0.0027   Y            [ 9.3%, 22.7%]
A        1.00  1.15  15.0%  0.050    0.050    -3.674   0.0213   Y            [ 3.5%, 25.1%]
B        1.95  2.20  12.8%  0.076    0.050    -4.427   0.0114   Y            [ 4.3%, 18.8%]
Geomean difference   14.4%
```