Update README to mention UseRealTime for wallclock time measurements.

Also add a use case to the UseRealTime comment in the API header.
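
For context, a minimal standard-library sketch (not part of this patch, and not
the benchmark library's own timing code) of why the two clocks can disagree
when the measured code blocks instead of computing. The names `Timings`,
`MeasureBoth` and `BlockingWork` are hypothetical, and `std::clock` is assumed
to report processor time, as it does on POSIX systems:

```c++
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical helper type: both measurements for one call, in milliseconds.
struct Timings {
  double wall_ms;  // elapsed real ("wallclock") time
  double cpu_ms;   // processor time as reported by std::clock
};

// Calls `work` once and measures it with both clocks.
template <typename Fn>
Timings MeasureBoth(Fn work) {
  const auto wall_start = std::chrono::steady_clock::now();
  const std::clock_t cpu_start = std::clock();

  work();

  const std::clock_t cpu_end = std::clock();
  const auto wall_end = std::chrono::steady_clock::now();

  Timings t;
  t.wall_ms =
      std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
  t.cpu_ms =
      1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
  return t;
}

// Hypothetical workload that waits without using the CPU, standing in for
// I/O or for waiting on worker threads.
inline void BlockingWork() {
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
```

Calling `MeasureBoth(BlockingWork)` should give a `wall_ms` of at least 50,
while `cpu_ms` is typically much smaller on systems where `std::clock` counts
processor time. That gap is the case `UseRealTime` is meant for.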

Fixes #170
diff --git a/README.md b/README.md
index 8c656ea..1fa7186 100644
--- a/README.md
+++ b/README.md
@@ -145,9 +145,10 @@
 #define BENCHMARK_TEMPLATE2(func, arg1, arg2)
 ```
 
-In a multithreaded test, it is guaranteed that none of the threads will start
-until all have called KeepRunning, and all will have finished before KeepRunning
-returns false. As such, any global setup or teardown you want to do can be
+In a multithreaded test (a benchmark invoked by multiple threads
+simultaneously), it is guaranteed that none of the threads will start until all
+have called KeepRunning, and all will have finished before KeepRunning returns
+false. As such, any global setup or teardown you want to do can be
 wrapped in a check against the thread index:
 
 ```c++
@@ -165,6 +166,16 @@
 BENCHMARK(BM_MultiThreaded)->Threads(2);
 ```
 
+If the benchmarked code itself uses threads and you want to compare it to
+single-threaded code, you may want to use real-time ("wallclock") measurements
+for latency comparisons:
+
+```c++
+BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
+```
+
+Without `UseRealTime`, CPU time is used by default.
+
 To prevent a value or expression from being optimized away by the compiler
 the `benchmark::DoNotOptimize(...)` function can be used.
 
diff --git a/include/benchmark/benchmark_api.h b/include/benchmark/benchmark_api.h
index 4dec01f..5523587 100644
--- a/include/benchmark/benchmark_api.h
+++ b/include/benchmark/benchmark_api.h
@@ -417,11 +417,11 @@
   // option overrides the `benchmark_min_time` flag.
   Benchmark* MinTime(double t);
 
-  // If a particular benchmark is I/O bound, or if for some reason CPU
-  // timings are not representative, call this method. If called, the elapsed
-  // time will be used to control how many iterations are run, and in the
-  // printing of items/second or MB/seconds values.  If not called, the cpu
-  // time used by the benchmark will be used.
+  // If a particular benchmark is I/O bound, runs multiple threads internally,
+  // or if for some reason CPU timings are not representative, call this method.
+  // If called, the elapsed time will be used to control how many iterations are
+  // run, and in the printing of items/second or MB/second values.  If not
+  // called, the CPU time used by the benchmark will be used.
   Benchmark* UseRealTime();
 
   // Support for running multiple copies of the same benchmark concurrently