Optimize ninja -d stats

-d stats enables instrumented profiling of key functions in ninja.
However, some of those functions are invoked more than 6 million times
in a no-op build of Chromium, and the cost of measuring them dwarfs the
cost of the functions themselves. Here is typical -d stats output for a
Chromium build:

metric                  count   avg (us)        total (ms)
.ninja parse            6580    4197.5          27619.5
canonicalize str        6240450 0.0             47.3
canonicalize path       6251390 0.0             33.5
lookup node             6339402 0.0             37.2
.ninja_log load         1       176226.0        176.2
.ninja_deps load        1       465407.0        465.4
node stat               168997  8.8             1482.9
depfile load            327     352.7           115.3
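
As a quick check on the table above, the following C++ sketch copies
its counts and totals and computes two numbers for the three hot
metrics (canonicalize str, canonicalize path, lookup node): their share
of all recorded calls and their combined reported time. `MetricRow`,
`kBeforeRows`, `IsHotMetric`, `HotCallShare`, and `HotTotalMs` are
illustrative names and are not part of ninja.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One row of the "before" -d stats table: metric name, call count,
// and total time in milliseconds.
struct MetricRow {
  std::string name;
  std::uint64_t count;
  double total_ms;
};

// Values copied from the -d stats output above.
const std::vector<MetricRow> kBeforeRows = {
    {".ninja parse", 6580, 27619.5},
    {"canonicalize str", 6240450, 47.3},
    {"canonicalize path", 6251390, 33.5},
    {"lookup node", 6339402, 37.2},
    {".ninja_log load", 1, 176.2},
    {".ninja_deps load", 1, 465.4},
    {"node stat", 168997, 1482.9},
    {"depfile load", 327, 115.3},
};

// True for the three high-frequency metrics.
bool IsHotMetric(const std::string& name) {
  return name == "canonicalize str" || name == "canonicalize path" ||
         name == "lookup node";
}

// Fraction of all recorded calls that belong to the hot metrics.
double HotCallShare(const std::vector<MetricRow>& rows) {
  std::uint64_t hot = 0;
  std::uint64_t all = 0;
  for (const MetricRow& row : rows) {
    all += row.count;
    if (IsHotMetric(row.name)) hot += row.count;
  }
  assert(all > 0 && "table must contain at least one call");
  return static_cast<double>(hot) / static_cast<double>(all);
}

// Combined time (ms) that the table attributes to the hot metrics.
double HotTotalMs(const std::vector<MetricRow>& rows) {
  double total = 0.0;
  for (const MetricRow& row : rows) {
    if (IsHotMetric(row.name)) total += row.total_ms;
  }
  return total;
}
```

With `kBeforeRows`, `HotCallShare` returns about 0.9907 (18,831,242 of
19,007,148 calls) and `HotTotalMs` returns about 118 ms.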

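99% of the recorded calls come from three functions, measured as
canonicalize str, canonicalize path, and lookup node. The total
measurement overhead, according to ETW sampled profiling, is 700-1200
ms, many times greater than the 118 ms (47.3 + 33.5 + 37.2) that
-d stats attributes to the functions themselves.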
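
To illustrate why, here is a minimal C++ sketch of scope-based timing
in the spirit of METRIC_RECORD. It is not ninja's metrics.h code:
`CallStats`, `ScopedTimer`, `g_stats_enabled`, `CheapFunction`, and
`RunCalls` are hypothetical names, and the sketch assumes
std::chrono::steady_clock as the timer and a global switch that skips
timing when stats are off. When stats are on, every call pays for two
clock reads plus bookkeeping, no matter how cheap the timed function is.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical per-metric accumulator: call count and total elapsed time.
struct CallStats {
  std::uint64_t count = 0;
  std::chrono::nanoseconds total{0};
};

// Hypothetical switch standing in for "-d stats was passed".
bool g_stats_enabled = false;

// RAII timer: reads the clock on entry and on exit, but only when stats
// are enabled. Those two reads happen on every call of the timed scope.
class ScopedTimer {
 public:
  explicit ScopedTimer(CallStats* stats)
      : stats_(g_stats_enabled ? stats : nullptr) {
    if (stats_) start_ = std::chrono::steady_clock::now();
  }
  ~ScopedTimer() {
    if (!stats_) return;
    stats_->total += std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start_);
    ++stats_->count;
  }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  CallStats* stats_;
  std::chrono::steady_clock::time_point start_;
};

CallStats g_cheap_stats;

// Hypothetical stand-in for a hot, cheap function such as a node lookup.
int CheapFunction(int x) {
  ScopedTimer timer(&g_cheap_stats);
  return x * 2 + 1;
}

// Calls CheapFunction n times and returns the sum of its results.
std::uint64_t RunCalls(int n) {
  assert(n >= 0);
  std::uint64_t sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += static_cast<std::uint64_t>(CheapFunction(i));
  }
  return sum;
}
```

Spread over the roughly 18.8 million calls to the three hot functions,
the 700-1200 ms overhead reported above works out to about 37-64 ns per
measured call, compared with roughly 5-8 ns per call that the table
reports for the functions themselves (for example, 47.3 ms / 6,240,450
calls is about 7.6 ns).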

With this change, which removes those three metrics, typical output
looks like this:

metric                  count   avg (us)        total (ms)
.ninja parse            6580    3526.3          23203.2
.ninja_log load         1       227305.0        227.3
.ninja_deps load        1       485693.0        485.7
node stat               168997  9.6             1615.0
depfile load            327     383.1           125.3

This resolves issue #1998.
diff --git a/src/state.cc b/src/state.cc
index a33d5a8..fc37c8a 100644
--- a/src/state.cc
+++ b/src/state.cc
@@ -19,7 +19,6 @@
 
 #include "edit_distance.h"
 #include "graph.h"
-#include "metrics.h"
 #include "util.h"
 
 using namespace std;
@@ -104,7 +103,6 @@
 }
 
 Node* State::LookupNode(StringPiece path) const {
-  METRIC_RECORD("lookup node");
   Paths::const_iterator i = paths_.find(path);
   if (i != paths_.end())
     return i->second;
diff --git a/src/util.cc b/src/util.cc
index 3dfa8dd..080883e 100644
--- a/src/util.cc
+++ b/src/util.cc
@@ -56,7 +56,6 @@
 #endif
 
 #include "edit_distance.h"
-#include "metrics.h"
 
 using namespace std;
 
@@ -118,7 +117,6 @@
 }
 
 void CanonicalizePath(string* path, uint64_t* slash_bits) {
-  METRIC_RECORD("canonicalize str");
   size_t len = path->size();
   char* str = 0;
   if (len > 0)
@@ -138,7 +136,6 @@
 void CanonicalizePath(char* path, size_t* len, uint64_t* slash_bits) {
   // WARNING: this function is performance-critical; please benchmark
   // any changes you make to it.
-  METRIC_RECORD("canonicalize path");
   if (*len == 0) {
     return;
   }