As I’m continuously improving and growing the software base, I thought I’d add an update section and start posting… well… updates. Hate calling it a blog.
Yesterday I sat down for eight hours straight and focused on optimizations & cycles. Stuff like
- a:=a+#1 now translates to inc a (as opposed to lda a; clc; adc #1; sta a)
- a[c] := a[c] -#1 now translates to ldx c ; dec a,x instead of the whole tirade of loading the index and transferring it to x (which is still performed for more complex operations like a[c+sine[y]] := b+a[i];)
- Tons of smaller optimizations (like “lda index ; tax ; sta stuff,x” -> “ldx index ; sta stuff,x”, or numbers etc.)
- zeropage optimizations
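To make the first bullet concrete, here is a minimal Python sketch of that kind of special-casing in a code generator. The function name `emit_add_assign` and its arguments are made up for illustration; the compiler's actual code is not shown in this post, so treat this as one possible shape rather than the real thing.

```python
def emit_add_assign(target, source, constant):
    """Emit 6502 assembly for `target := source + constant` on byte variables.

    Hypothetical helper: names and structure are assumptions, not the
    compiler's real internals.
    """
    # Special case: incrementing a variable in place by one fits in a
    # single read-modify-write instruction.
    if target == source and constant == 1:
        return [f"inc {target}"]
    # General case: go through the accumulator.
    return [f"lda {source}", "clc", f"adc #{constant}", f"sta {target}"]


print(emit_add_assign("a", "a", 1))
print(emit_add_assign("a", "b", 3))
```

Note that inc leaves the accumulator and the carry flag untouched, so the short form is only a drop-in replacement when nothing afterwards relies on a holding the new value.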
I discovered that the output asm sometimes had a substantial amount of redundancy in terms of lda/ldx, like
ldx index
sta blah,x
; do more stuff
ldx index
sta blah2,x
; more stuff
ldx index
...
I added a post-optimizer that removes all redundant ldx / ldy / lda, and stops removing as soon as the state of the register in question has been altered (or a jmp, rts etc. is hit). In general, this removed about 3% of the total source.
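A pass like this can be sketched in a few lines of Python. This is a simplified illustration, not the actual post-optimizer: all names are hypothetical, and it assumes that indexed stores into arrays never overwrite the scalar variables being tracked, and that no tracked address is a hardware register whose value changes on its own. It is also deliberately conservative: a load is only dropped when the N/Z flags it would have set are overwritten before anything can read them.

```python
LOADS = {"lda": "a", "ldx": "x", "ldy": "y"}
STORES = {"sta": "a", "stx": "x", "sty": "y"}
# Instructions that overwrite the N and Z flags without reading them.
NZ_WRITERS = set(LOADS) | {
    "adc", "sbc", "and", "ora", "eor", "cmp", "cpx", "cpy", "bit",
    "inc", "dec", "inx", "iny", "dex", "dey", "asl", "lsr", "rol", "ror",
    "tax", "tay", "txa", "tya", "tsx", "pla",
}


def parse(line):
    """Return (mnemonic, operand); comment-only and blank lines give (None, None)."""
    code = line.split(";", 1)[0].strip()
    if not code:
        return None, None
    parts = code.split(None, 1)
    operand = parts[1].replace(" ", "") if len(parts) > 1 else ""
    return parts[0].lower(), operand


def is_simple(operand):
    """Only immediates and plain addresses are tracked, not indexed/indirect ones."""
    return operand != "" and "," not in operand and "(" not in operand


def flags_are_dead(lines, start):
    """True if, skipping stores, the next instruction overwrites N/Z."""
    for line in lines[start:]:
        mnemonic, _ = parse(line)
        if mnemonic is None or mnemonic in STORES:
            continue
        return mnemonic in NZ_WRITERS
    return False


def remove_redundant_loads(lines):
    known = {"a": None, "x": None, "y": None}  # operand each register mirrors
    out = []
    for i, line in enumerate(lines):
        mnemonic, operand = parse(line)
        if mnemonic in LOADS:
            reg = LOADS[mnemonic]
            same = is_simple(operand) and known[reg] == operand
            if same and flags_are_dead(lines, i + 1):
                continue  # register already holds this value: drop the load
            known[reg] = operand if is_simple(operand) else None
        elif mnemonic in STORES:
            if is_simple(operand):
                # Memory changed: forget registers mirroring the old contents,
                # then remember that the stored register now mirrors it.
                for reg in known:
                    if known[reg] == operand:
                        known[reg] = None
                known[STORES[mnemonic]] = operand
            # ASSUMPTION: indexed stores do not alias tracked variables.
        elif mnemonic is not None:
            # Anything else (labels, jmp, rts, inx, ...) resets everything.
            known = dict.fromkeys(known)
        out.append(line)
    return out


asm = ["ldx index", "lda #1", "sta blah,x",
       "ldx index", "lda #2", "sta blah2,x", "rts"]
print("\n".join(remove_redundant_loads(asm)))
```

Run on the snippet at the bottom, the second ldx index disappears while everything else is kept. Resetting all state on any unknown instruction throws away some opportunities, but it keeps the sketch easy to reason about.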
In addition, I realized that when declaring temp variables (for use in more advanced expressions), I actually declared them *on the fly* and performed a jump over them to the next code statement. Which is stupid. So all temp vars are now declared together with the other variables, and all the excessive stupid jumps are removed. Later, I’ll reduce the number of temp vars used by reusing them in an, eh, sort of medieval garbage collector. At compile time, I mean.
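For what it's worth, compile-time reuse of temp vars could look something like the Python sketch below. Everything in it is hypothetical: the `TempPool` class, the `temp0`, `temp1` naming and the `.byte 0` declaration syntax are assumptions for illustration, not what the compiler emits.

```python
class TempPool:
    """Hands out temp variable names and recycles released ones."""

    def __init__(self, prefix="temp"):
        self.prefix = prefix
        self.free = []   # names that are currently unused
        self.count = 0   # how many distinct temps were ever created

    def acquire(self):
        # Reuse a released temp if one is available, otherwise make a new one.
        if self.free:
            return self.free.pop()
        name = f"{self.prefix}{self.count}"
        self.count += 1
        return name

    def release(self, name):
        # Called by the code generator once an expression no longer needs it.
        self.free.append(name)

    def declarations(self):
        # Emitted once, next to the other variables (assumed assembler syntax).
        return [f"{self.prefix}{i} .byte 0" for i in range(self.count)]


pool = TempPool()
t0 = pool.acquire()
t1 = pool.acquire()
pool.release(t0)
t2 = pool.acquire()  # reuses the slot that t0 had
print(t2, pool.declarations())
```

The point is that three requests for a temp end up costing only two bytes of storage, and all declarations live in one place instead of being jumped over in the middle of the code.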
At the end of the day, I managed to get a 4000-line assembler file down to about 3600 lines, a reduction of about 10% in size – and an increase in speed.
However, I still need to optimize the conditionals. They can be quite brutal where they should be simple.
In addition, I decided to finally implement cycle counters. I entered all the cycle information (like lda #2 has 2 cycles, lda i has 4, lda (zeropage),y has 5 etc), and started counting. The counter is displayed per block / statement / conditional, and is shown on the right-hand side of this image:
Unfortunately, the cycle counter doesn’t take into account all post-processing optimizations mentioned above (the ldx/ldy removal), but internal ones are included. I found the cycle counter to be quite handy, and managed to optimize a bunch of routines in the tutorial game.
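The counting itself boils down to a table lookup per instruction. Here is a small Python sketch of the idea, using standard 6502 base cycle counts for a handful of instructions. The function names, the way addressing modes are guessed from the operand text, and the `zeropage` parameter are all assumptions for illustration. Page-crossing and branch-taken penalties are ignored, and zeropage,x is not distinguished from absolute,x.

```python
# Base cycle counts for a small subset of (mnemonic, addressing mode) pairs.
CYCLES = {
    ("lda", "imm"): 2, ("lda", "zp"): 3, ("lda", "abs"): 4,
    ("lda", "abs,x"): 4, ("lda", "(zp),y"): 5,
    ("ldx", "imm"): 2, ("ldx", "zp"): 3, ("ldx", "abs"): 4,
    ("sta", "zp"): 3, ("sta", "abs"): 4, ("sta", "abs,x"): 5,
    ("adc", "imm"): 2, ("adc", "zp"): 3, ("adc", "abs"): 4,
    ("inc", "zp"): 5, ("inc", "abs"): 6,
    ("dec", "abs,x"): 7,
    ("clc", "impl"): 2, ("tax", "impl"): 2,
}


def addressing_mode(operand, zeropage):
    """Guess the addressing mode from the operand text (simplified)."""
    if operand == "":
        return "impl"
    if operand.startswith("#"):
        return "imm"
    if operand.startswith("(") and operand.endswith("),y"):
        return "(zp),y"
    if operand.endswith(",x"):
        return "abs,x"
    # ASSUMPTION: a variable is absolute unless listed as living in zeropage.
    return "zp" if operand in zeropage else "abs"


def count_cycles(lines, zeropage=frozenset()):
    """Sum base cycles; raises KeyError for instructions missing from the table."""
    total = 0
    for line in lines:
        code = line.split(";", 1)[0].strip().lower()
        if not code:
            continue  # comment or blank line
        parts = code.split(None, 1)
        operand = parts[1].replace(" ", "") if len(parts) > 1 else ""
        total += CYCLES[(parts[0], addressing_mode(operand, zeropage))]
    return total


long_form = ["lda a", "clc", "adc #1", "sta a"]
print(count_cycles(long_form), count_cycles(["inc a"]))
print(count_cycles(long_form, zeropage={"a"}), count_cycles(["inc a"], zeropage={"a"}))
```

With a in regular memory, the long form of a:=a+#1 costs 12 cycles against 6 for inc a; with a in zeropage it is 10 against 5. That is the kind of number that makes it obvious where the optimizations pay off.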
I also fixed a severe bug, which finally resulted in correct line numbers & automatic error location. However, it still doesn’t work when you use include files. Must fix later.
During the next couple of days, I’ll shift my focus to making the editor more user friendly, with stuff like
- Multiple documents
- Replace function
- Code completion?
- Specialized character set editor (not just using a regular multicolor painting)
- Remembering the line number in each file
- Remembering open files
- Thinking about type structures (bytes for now), but they need to be fast-fast-fast…