Make NFA threads shared and effectively immutable.

Instead of copying threads all the time, the NFA execution engine now
increments reference counters in AddToThreadq() and decrements them in
Step(). It now copies threads only when recording captures.
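
As a rough illustration of the scheme described above (not the actual
code in this change), the sketch below shares a thread by bumping a
counter and copies it only when a capture is recorded. The names Thread,
Incref(), Decref() and WithCapture() are hypothetical and chosen for this
example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an NFA thread: a reference count plus the
// capture pointers (two per submatch in a real engine).
struct Thread {
  int ref;
  std::vector<const char*> capture;
};

// Share a thread: bump the counter instead of copying its captures.
Thread* Incref(Thread* t) {
  ++t->ref;
  return t;
}

// Drop one reference; free the thread when the last owner lets go.
void Decref(Thread* t) {
  assert(t->ref > 0);
  if (--t->ref == 0) delete t;
}

// Copy-on-write: recording a capture produces a fresh thread with its
// own counter and leaves the shared original untouched.
Thread* WithCapture(const Thread* t, std::size_t slot, const char* p) {
  Thread* n = new Thread{1, t->capture};
  n->capture[slot] = p;
  return n;
}
```

Because a shared thread is never modified in place, any number of queue
entries can point at it safely; only the capture path pays for a copy.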

Twiddling reference counters is cheaper than copying a thread's capture
pointers because there are at least two pointers for the zeroth submatch
and, of course, there may be arbitrarily many submatches.

This probably will not help much with memory footprint except when fanout
is high, but it seems likely to be friendlier to the cache.

Change-Id: I90e9f6c0164cb4d06554ec16a89bc8ce76f500a3
Reviewed-on: https://code-review.googlesource.com/4670
Reviewed-by: Paul Wankadia <junyer@google.com>