blob: e00f8c27030e997ae714fce9b4041dbe48994ed1 [file] [log] [blame]
Effect of Improvements
Changing from calculating the correct answer for each new program to
calculating them in advance and storing in a table, reduced the
execution time by about 2.7%.
An 8% improvement resulted from adding the "commutative" bit to the five
commutative operations (add, mul, and, or, xor). Perhaps more
importantly, it reduced the printout of essentially duplicate solutions.
An improvement by a factor of 2.58 (25.3/9.8) resulted from ensuring
that the last register operand of the last instruction, when this
instruction is created, refers to the result of the immediately
preceding instruction.
Continued the above idea for other register operands, i.e., ensured that
SOME operand of the last instruction always refers to the result of the
immediately preceding instruction. Got an improvement by a factor of
1.04 (9.8/9.4).
3/16/02: Got a factor of 1.85 by having it simulate the program only
from the last changed instruction to the end, which means that usually
only the last instruction is simulated. Also, changed the trial
value(s) so they "stick" at the last failed one(s). When a trial value
is changed, which happens after a success, the whole program must be
3/16/02: Got a factor of 1.010 by moving the assignment to
computed_result inside the loop just ahead of where it was. (The loop
is usually executed only once.)
3/16/02: Got a factor of 1.020 by computing corr_res only when sticky_i
and/or sticky_j change.
3/17/02: Tried making "numi" a constant defined with #define. Got a 5%
improvement. Decided not to do this.
3/19/02: Got a factor of 1.166 by inlining "increment."
3/23/02: Took 1614 secs (26.9 min) to search with numi = 4.
9/19/02: Got a factor of 1.131 by requiring that immediate values be in
the order 0, -1, 1, ... and using isa.opndstart[3] to avoid certain
silly cases like ADD of 0, ADD of -1 (we do a subtract of 1), AND of 0
or -1, etc. This was made kind of necessary because the compare ops
should have an immediate value of 0 as a possibility, whereas for most
other ops, immediate 0 would never be used.
9/22/02: Changed shift immediate amounts to be given in an array
(shimmed), so that fewer than 31 values can be specified. This gave no
change to execution time if all 31 are specified (1.222 secs for
absolute value problem on a basic RISC, running on my 1.8 mHz Thinkpad).
If only 4 values are specified, e.g. 1, 2, 30, and 31, the execution
dropped to 0.450 secs, a factor of 2.71 improvement.
I don't quite understand this, because the number of evaluations of
the third instruction reduced from 14.2 million to 2.74 million, a
factor of 5.18.
It's partly explained by the program load time. If you run aha with
an argument of 1, it takes 0.140 secs. Thus the time to start and end
the program is about that amount. So the ration of actual execution
time is (1.222 - 0.140)/(0.450 - 0.140) = 3.49. Closer to 5.18, but not
very close.
9/24/02: Put ALL immediates (both ordinary and shift amounts) in the
registers. This did not affect the execution time (if compiled -O2),
but it allowed deleting the operand type info in the isa table, and
simplified the code a little (by 77 lines).
Execution time for the standard run is now 0.591 secs on my 667 mHz
machine (compiled -O2, which I guess I'll use from now on).
9/25/02: Made Aha! measure and print its own execution time, using
clock(). I believe this is user + system time for the Aha! process,
rather than wall clock time. Found that -O2 and -O3 make no difference;
the assembly language files search.s and check.s are identical. Am
using -O2. The standard job runs in from 0.520 to 0.540 seconds process
time on my 667 mHz office machine.
The number of instruction evaluations is 62248 + 82618 + 2743328 (for
the first, second, and third instruction resp.), or 2888194 total. This
corresponds to 122 cycles per evaluation.
9/30/02: Before today, the program consisted of three .cc files: aha,
search, and check. Made it all one file (, mainly because of
problems with C in defining a preset array of values and not requiring
the user to also set a variable equal to the number of values in the
array. No change to execution time. Build (mainly compilation) time
dropped from about 2.0 secs to 1.2 secs. This change also permits
inlining fix_operands, but trying that did not change execution time
measurably (so it is not inlined now).
10/14/02: Before today, the incrementing of instructions was done with
the rightmost operand varying most rapidly. Today it was changed so
that the leftmost operand varies most rapidly. This simplifies the
handling of commutative ops, and permits a few other minor
This gave a factor of 1.05 improvement in execution time. Quite
minor, but the program is a little simpler and I think it will simplify
more complicated optimizations that may be done, such as (somehow)
avoiding programs that have an instruction whose result is unused.
A preliminary investigation of this shows that for a typical RISC
instruction set, 39% of three-instruction programs have an unused
result, and 70% of four-instruction programs have an unused result.
This is compared to the present program, which ensures only that the
second from last computed result does not go unused. Thus there is hay
to be made here.
An attempt to skip ALL these silly programs resulted in a net
increase in execution time, because it was implemented inefficiently.
It seems to be hard to devise an efficient way to do this. Some
compromise might be practical, such as ensuring only that the second and
third from last results are not both unused.
10/15/02: Changed the program as just mentioned, i.e., to ensure that
instruction n (the last) uses the result of instruction n-1 and, if
instruction n-1 does not use the result of instruction n-2, then the
last instruction does. This improved execution time by a factor of 1.4
for three-instruction programs, and a factor of 1.8 for four-instruction
4/22/03: Ran Aha! on a two-input problem with n = 5 and 17 instructions
enabled. Was searching for 5-instruction programs to compute the average
of two signed integers (without overflowing). Shut it off after 144
hours (6 days). I should make it display a "progress report" for such
long jobs, such as printing out the first instruction in the list each
time a new opcode is selected for it. Otherwise, you don't know if it
somehow got into an infinite loop and you have no idea how long the run
will take.