blob: e00f8c27030e997ae714fce9b4041dbe48994ed1 [file] [log] [blame]
 Effect of Improvements Changing from calculating the correct answer for each new program to calculating them in advance and storing in a table, reduced the execution time by about 2.7%. An 8% improvement resulted from adding the "commutative" bit to the five commutative operations (add, mul, and, or, xor). Perhaps more importantly, it reduced the printout of essentially duplicate solutions. An improvement by a factor of 2.58 (25.3/9.8) resulted from ensuring that the last register operand of the last instruction, when this instruction is created, refers to the result of the immediately preceding instruction. Continued the above idea for other register operands, i.e., ensured that SOME operand of the last instruction always refers to the result of the immediately preceding instruction. Got an improvement by a factor of 1.04 (9.8/9.4). 3/16/02: Got a factor of 1.85 by having it simulate the program only from the last changed instruction to the end, which means that usually only the last instruction is simulated. Also, changed the trial value(s) so they "stick" at the last failed one(s). When a trial value is changed, which happens after a success, the whole program must be simulated. 3/16/02: Got a factor of 1.010 by moving the assignment to computed_result inside the loop just ahead of where it was. (The loop is usually executed only once.) 3/16/02: Got a factor of 1.020 by computing corr_res only when sticky_i and/or sticky_j change. 3/17/02: Tried making "numi" a constant defined with #define. Got a 5% improvement. Decided not to do this. 3/19/02: Got a factor of 1.166 by inlining "increment." 3/23/02: Took 1614 secs (26.9 min) to search with numi = 4. 9/19/02: Got a factor of 1.131 by requiring that immediate values be in the order 0, -1, 1, ... and using isa.opndstart[3] to avoid certain silly cases like ADD of 0, ADD of -1 (we do a subtract of 1), AND of 0 or -1, etc. This was made kind of necessary because the compare ops should have an immediate value of 0 as a possibility, whereas for most other ops, immediate 0 would never be used. 9/22/02: Changed shift immediate amounts to be given in an array (shimmed), so that fewer than 31 values can be specified. This gave no change to execution time if all 31 are specified (1.222 secs for absolute value problem on a basic RISC, running on my 1.8 mHz Thinkpad). If only 4 values are specified, e.g. 1, 2, 30, and 31, the execution dropped to 0.450 secs, a factor of 2.71 improvement. I don't quite understand this, because the number of evaluations of the third instruction reduced from 14.2 million to 2.74 million, a factor of 5.18. It's partly explained by the program load time. If you run aha with an argument of 1, it takes 0.140 secs. Thus the time to start and end the program is about that amount. So the ration of actual execution time is (1.222 - 0.140)/(0.450 - 0.140) = 3.49. Closer to 5.18, but not very close. 9/24/02: Put ALL immediates (both ordinary and shift amounts) in the registers. This did not affect the execution time (if compiled -O2), but it allowed deleting the operand type info in the isa table, and simplified the code a little (by 77 lines). Execution time for the standard run is now 0.591 secs on my 667 mHz machine (compiled -O2, which I guess I'll use from now on). 9/25/02: Made Aha! measure and print its own execution time, using clock(). I believe this is user + system time for the Aha! process, rather than wall clock time. Found that -O2 and -O3 make no difference; the assembly language files search.s and check.s are identical. Am using -O2. The standard job runs in from 0.520 to 0.540 seconds process time on my 667 mHz office machine. The number of instruction evaluations is 62248 + 82618 + 2743328 (for the first, second, and third instruction resp.), or 2888194 total. This corresponds to 122 cycles per evaluation. 9/30/02: Before today, the program consisted of three .cc files: aha, search, and check. Made it all one file (aha.cc), mainly because of problems with C in defining a preset array of values and not requiring the user to also set a variable equal to the number of values in the array. No change to execution time. Build (mainly compilation) time dropped from about 2.0 secs to 1.2 secs. This change also permits inlining fix_operands, but trying that did not change execution time measurably (so it is not inlined now). 10/14/02: Before today, the incrementing of instructions was done with the rightmost operand varying most rapidly. Today it was changed so that the leftmost operand varies most rapidly. This simplifies the handling of commutative ops, and permits a few other minor simplifications. This gave a factor of 1.05 improvement in execution time. Quite minor, but the program is a little simpler and I think it will simplify more complicated optimizations that may be done, such as (somehow) avoiding programs that have an instruction whose result is unused. A preliminary investigation of this shows that for a typical RISC instruction set, 39% of three-instruction programs have an unused result, and 70% of four-instruction programs have an unused result. This is compared to the present program, which ensures only that the second from last computed result does not go unused. Thus there is hay to be made here. An attempt to skip ALL these silly programs resulted in a net increase in execution time, because it was implemented inefficiently. It seems to be hard to devise an efficient way to do this. Some compromise might be practical, such as ensuring only that the second and third from last results are not both unused. 10/15/02: Changed the program as just mentioned, i.e., to ensure that instruction n (the last) uses the result of instruction n-1 and, if instruction n-1 does not use the result of instruction n-2, then the last instruction does. This improved execution time by a factor of 1.4 for three-instruction programs, and a factor of 1.8 for four-instruction programs. 4/22/03: Ran Aha! on a two-input problem with n = 5 and 17 instructions enabled. Was searching for 5-instruction programs to compute the average of two signed integers (without overflowing). Shut it off after 144 hours (6 days). I should make it display a "progress report" for such long jobs, such as printing out the first instruction in the list each time a new opcode is selected for it. Otherwise, you don't know if it somehow got into an infinite loop and you have no idea how long the run will take.