Jump to content




Optimization and micro-optimization


  • You cannot reply to this topic
43 replies to this topic

#1 HDeffo

  • Members
  • 214 posts

Posted 05 September 2015 - 08:19 PM

Introduction

Before beginning optimization first ask yourself does my program even need optimized. Yes we all want to have an insanely blazing fast program but not all programs are considered time critical. A frame buffer should always be as fast as possible because it is relied on for its speed. A door password on the other hand as long as it has a reasonable speed doesnt need to be blazing fast. After asking yourself if a program needs optimized you also need to ask can I make something faster even before applying any optimization tricks. You will gain a much larger speed difference many times simply by rewriting a program in a more resource friendly way then you will by changing simply how you write and use those resource using optimization hacks. After you have passed those two questions and still decided your program is taking too long to do what you want please follow this guide to push that extra bit out of your code.


This guide will be divided into four parts.
The good practice section is stuff that increase readabilit and have a major effect on speed or have such a massive effect on speed it is enough to completely toss out the impact of readability. These tips should always be followed whenever you are coding anything that they apply to.
The optimization section features tips that can get a decent boost on speed while sacrificing some readability in return. These should only be followed if you feel speed is crucial to your program.
The micro-optimization section contains tips that only offer a minor impact on speed however they will usually hurt readability and functionality a lot of times. They are left here more for academic purposes in case anyone ever does have a reason to micro-optimize. Remember the first law of micro optimization: Don't do it.
The misinformation section contains tips that either were at one time helpful for optimization and now have no impact or a negative impact as well as tips that tend to be misinformed or just bad practices people tend to pick up when trying to improve their codes. This is essentially a section of things you SHOULD NOT DO


It should also be noted this guide ONLY applies to Lua in Computercraft in conjecture with the latest version. Old outdated versions, other lua platforms, and computercraft emulators may not benefit in the same way from these tips. If you want to apply these to a platform outside of minecraft computercraft please test them before assuming it still applies.


Good Practice

Optimization

Micro-optimization

Misinformation/Bad practices

Edited by HDeffo, 07 September 2015 - 05:10 PM.


#2 MKlegoman357

  • Members
  • 1,170 posts
  • LocationKaunas, Lithuania

Posted 05 September 2015 - 08:32 PM

Well, this is going to be helpful. I already know where I'll be putting the table-string concat method.

EDIT: never assume something is self-explanatory. I've seen many people assuming that, while the "self-explanatory" wasn't that clear.

Edited by MKlegoman357, 05 September 2015 - 08:34 PM.


#3 Lupus590

  • Members
  • 2,028 posts
  • LocationUK

Posted 05 September 2015 - 08:38 PM

You may want to include warnings for errors people may think they find. The one I'm thinking of is declaring a local in a loop and then trying to use it outside of the loop.

#4 HDeffo

  • Members
  • 214 posts

Posted 05 September 2015 - 08:44 PM

View PostLupus590, on 05 September 2015 - 08:38 PM, said:

You may want to include warnings for errors people may think they find. The one I'm thinking of is declaring a local in a loop and then trying to use it outside of the loop.

I actually advised against that and to instead put the local outside of the loop as it is faster if the variable instead changed every iteration.

that being said when I get back and finish writing this I will add warnings on what negatives each tip could have. the only ones with real negatives are the ones with WEAK in the spoiler title.

Edited by HDeffo, 05 September 2015 - 08:45 PM.


#5 HDeffo

  • Members
  • 214 posts

Posted 05 September 2015 - 10:17 PM

View PostMKlegoman357, on 05 September 2015 - 08:32 PM, said:

EDIT: never assume something is self-explanatory. I've seen many people assuming that, while the "self-explanatory" wasn't that clear.

You are very right xD later today I will add more detail on those and explain them when I am back on a computer

#6 Bomb Bloke

    Hobbyist Coder

  • Moderators
  • 7,099 posts
  • LocationTasmania (AU)

Posted 05 September 2015 - 11:53 PM

Some other things to check, if you're still in the mood. I've often pondered them but never bothered to sit down and test:

Spreading assignments over multiple lines. (I imagine that executes slower than clumping them all into one, but it may be sorted out at compile time for all I know.)

Term updates; if you need to update just the characters at either end of a line, is it faster to move the cursor once and re-blit the whole line, or is it faster to move the cursor twice in order to just do the ends? Does line width / use of monitors make a difference?

Bit shifts vs multiplying/dividing by two. The function call is a penalty, but when dealing with larger numbers, I suspect shifting may still win out.

Checking for odd numbers via bit BANDing instead of a modulus operation. Same as above, I suspect BAND tests are faster for larger numbers.

View PostHDeffo, on 05 September 2015 - 08:19 PM, said:

lua tables are built off of arrays behind the scenes. a single lua table is built off of an array and a hash. both of these require a preset size and whenever a table increases in size both of these need to be rebuilt behind the scenes according to this new size. When you add a new numerical index in sequential order 1,2,3,4... this increases the size of the array. When you add a non sequential non numerical index e.g. ["foobar"] or [6] it increases the size of the hash. When setting a value in a table to nil the sizes wont decrease until after you set a new value into the table which is why the last example of presetting the removing a preset value before every added value still works for that speed increase.

Just to elaborate for those who aren't familiar with Java's arrays;

Think of them as being like a table which only accepts numeric indexes. But on top of that, you have to specify exactly how many indexes you want available within them before you can start using them.

If you later find that you've run out of indexes and need more, then to "expand" an array, you typically 1) define a new one of a larger size, and then 2) manually copy every index from the old array to the new.

LuaJ stores all the numeric indexes of a Lua table in a Java array (it uses a separate hash map to store all the other keys). Every time it finds itself needing to perform the above process, the new array will be sized according to whatever power of two will suffice. Eg, if you have a table with 128 indexes and want to add one more, LuaJ will initialise a new 256 index array and copy over the existing 128 indexes before adding number 129.

So if you DO need to use a loop or something to fill a large one, theoretically doing so in reverse order should end up being faster (as you'll be targeting the highest index first, thereby resizing the array only once - directly to the maximum size needed).

View PostHDeffo, on 05 September 2015 - 08:19 PM, said:

use 1,2,3,4,5... in your table indexes instead of ["foo"] or ["bar"] when possible

... and if you do use strings as key names, use the format myTable.keyName to reference them where possible, as opposed to myTable["keyName"]. This saves LuaJ a step in figuring out whether you're providing a numeric index or not.

View PostHDeffo, on 05 September 2015 - 08:19 PM, said:

lua has to look up the value for a variable every time it does a math operation to that variable then sets the value and continues to the next operation in the series. setting the variable to the last item handled or as far back as it can be means fewer lookups and sets.

Er, but why does the order matter if the variable is still only mentioned once? Wouldn't the same amount of operations get performed on it either way?

Edited by Bomb Bloke, 05 September 2015 - 11:58 PM.


#7 HDeffo

  • Members
  • 214 posts

Posted 06 September 2015 - 12:10 AM

@bomb bloke (quotes don't work well on my phone so I'm doing this)


Thanks I'll look through your suggestions and test the speeds of each I hadn't thought of those things.

Reverse indexing would actually be worse. If indexes are saved in reverse it doesn't initially read them as being sequential. Lua saves non sequential indexes in it's hash and not it's array. So you'll be increasing the hash size each time you add a new index and then at the last one increases the size of the array and drops the size of the hash moving all indexes over to the array.

Compile time may change using foo.bar instead of foo["bar"] but run speeds are exactly the same. Both save using 4 operation codes both of which are exactly the same.

I will be honest here I actually don't know. I only know the official lua website says to order operations putting variable as the last operation. And my profile tests have shown this does offer a slight speed increase but as to why I don't know the op codes behind it's math so I really don't know. I assume for some reason it must get the value of the variable after each operation but that doesn't really make sense...so I'm sorry that one I don't have the best explanation on why

Edited by HDeffo, 06 September 2015 - 12:16 AM.


#8 Bomb Bloke

    Hobbyist Coder

  • Moderators
  • 7,099 posts
  • LocationTasmania (AU)

Posted 06 September 2015 - 02:30 AM

View PostHDeffo, on 06 September 2015 - 12:10 AM, said:

Reverse indexing would actually be worse. If indexes are saved in reverse it doesn't initially read them as being sequential. Lua saves non sequential indexes in it's hash and not it's array. So you'll be increasing the hash size each time you add a new index and then at the last one increases the size of the array and drops the size of the hash moving all indexes over to the array.

Ah... I'd missed that! Interesting on two levels; I'd long been wondering what shoving a random value into some super-high index did to RAM usage. Not much, as it turns out.

I suppose this means that if you do have to build a table out of sequence, it may be worth unpacking it into a new table definition afterwards in order to "smooth it out", so to speak.

View PostHDeffo, on 06 September 2015 - 12:10 AM, said:

Compile time may change using foo.bar instead of foo["bar"] but run speeds are exactly the same. Both save using 4 operation codes both of which are exactly the same.

You're right, at run-time the check shouldn't "need" to be performed in the same manner, so I guess the compiler would filter it out. And compilation times seldom matter.

It still needs to be done at run-time if the key is being provided via a variable, but since you can't use dot notation if you're needing to do that, that's a moot point.

#9 HDeffo

  • Members
  • 214 posts

Posted 06 September 2015 - 03:51 AM

View PostBomb Bloke, on 06 September 2015 - 02:30 AM, said:

-snip-

sorry I was wrong it is 3 op codes here they are exactly. although even more technical the lookup itself is only 2 regardless of method so even better :P



b=a.x
GETGLOBAL 0 ; a
GETDOTTED 2 ; x
SETGLOBAL 3 ; b
b=a["x"]
GETGLOBAL 0 ; a
GETDOTTED 2 ; x
SETGLOBAL 3 ; b

Edited by HDeffo, 06 September 2015 - 03:52 AM.


#10 Bomb Bloke

    Hobbyist Coder

  • Moderators
  • 7,099 posts
  • LocationTasmania (AU)

Posted 06 September 2015 - 04:57 AM

Out of interest, what does this resolve to?

b = a[x]


#11 HDeffo

  • Members
  • 214 posts

Posted 06 September 2015 - 05:08 AM

View PostBomb Bloke, on 06 September 2015 - 04:57 AM, said:

Out of interest, what does this resolve to?
b = a[x]

offhand from what i know of lua op codes it would be

GETGLOBAL 0 --#a
GETGLOBAL 1 --#x
GETDOTTED 2 --#value of x
SETGLOBAL 3 --#b

I could be wrong on this I only just recently got into looking through the lua codes

#12 Bomb Bloke

    Hobbyist Coder

  • Moderators
  • 7,099 posts
  • LocationTasmania (AU)

Posted 06 September 2015 - 05:19 AM

Silly question, but I assume you're also considering how LuaJ actually handles these codes?

#13 HDeffo

  • Members
  • 214 posts

Posted 06 September 2015 - 05:31 AM

generally more codes=slower run time. That being said I haven't managed to look through LuaJ in its entirety so many of these are purely off of what my profiler is telling me along with what other people have written in on lua-users. The only one I have directly looked into on how LuaJ handles the codes were for tables because I wanted to make sure it handled tables the same backend as it did when wrapped over C++ before I made the suggestion of pre-indexing tables.

Just tested and confirmed by looking through the java end of the code. Line width when using term.blit does not matter.

#14 Bomb Bloke

    Hobbyist Coder

  • Moderators
  • 7,099 posts
  • LocationTasmania (AU)

Posted 06 September 2015 - 09:48 AM

So if you want to alter multiple parts of a line, doing it in one call is always the better option? Thanks, I'd suspected as much. :)

#15 SquidDev

    Frickin' laser beams | Resident Necromancer

  • Members
  • 1,427 posts
  • LocationDoes anyone put something serious here?

Posted 06 September 2015 - 12:07 PM

Whilst I did know most of these, it is interesting seeing these all put together in one place. I will stress though, readable/maintainable code is far more important than micro-optimised code. These could probably be split into three sections:
  • efficient programming (such as table.concat, using hash lookups instead of a linear search, etc...)
  • necessary evils (caching global/table lookups)
  • "micro-optimisations" (avoid ipairs/pairs/min/max/unpack, for not while).
For those interested in finding more about the LuaJ VM, the relevant code can be found here. Just a couple of questions/bit of scepticism though:

put all non variable math first

In the same line of things:
don't put constants in loops

localize arguments


Just another couple of optimisations/enhancements on above you can do:

Upvalues are quicker than tables

Closures creation is slow

Edited by SquidDev, 06 September 2015 - 12:08 PM.


#16 Bomb Bloke

    Hobbyist Coder

  • Moderators
  • 7,099 posts
  • LocationTasmania (AU)

Posted 06 September 2015 - 12:25 PM

How does the time it takes to define a new local compare to the time it takes to perform a table lookup?

That is to say, in this example:

foo.y.x = foo.y.x + 1 --#slower

local y = foo.y
y.x = y.x+1 --#faster

... you're trading one of four table lookups, for one extra assignment and variable initialisation. What's the average difference there?

Edited by Bomb Bloke, 06 September 2015 - 12:29 PM.


#17 SquidDev

    Frickin' laser beams | Resident Necromancer

  • Members
  • 1,427 posts
  • LocationDoes anyone put something serious here?

Posted 06 September 2015 - 01:01 PM

View PostBomb Bloke, on 06 September 2015 - 12:25 PM, said:

foo.y.x = foo.y.x + 1 --#slower

local y = foo.y
y.x = y.x+1 --#faster

... you're trading one of four table lookups, for one extra assignment and variable initialisation. What's the average difference there?

One thing to remember about Lua is that it is not entirely stack based but more a 'slot/register based' bytecode (the only time when the stack really comes into play is with variable number of arguments/return values). IIRC foo.y.x = foo.y.x + 1 is converted to the equivalent of:

local temp = foo
temp = temp.y

local temp2 = foo
temp2 = temp2.y

temp = temp.x
temp = temp + 1
temp2.x = temp

So there is no overhead to 'declaring' a new local variable as every expression is stored to a local variable anyway.

I recommend downloading ChunkSpy. You'll need to patch the IO library slightly (or just run it in normal Lua). If you run ChunkSpy.lua --auto --interact then you can type in expressions and get a dump of the Lua bytecode.

Edited by SquidDev, 06 September 2015 - 01:03 PM.


#18 HDeffo

  • Members
  • 214 posts

Posted 07 September 2015 - 12:50 AM

when i posted this below the formatting went to complete hell.. be warned because its a little too much for me to clean up afterwards
Spoiler

Edited by HDeffo, 07 September 2015 - 12:52 AM.


#19 Bomb Bloke

    Hobbyist Coder

  • Moderators
  • 7,099 posts
  • LocationTasmania (AU)

Posted 07 September 2015 - 01:02 PM

The trick there is to mash the rich text editor button until it bugs out in reverse and fixes its prior mistakes.

Spoiler


#20 Exerro

  • Members
  • 801 posts

Posted 07 September 2015 - 04:32 PM

Avoiding defining constant locals and globals in functions is a valid point. Not quite sure if you're right with the pointer theory, but any code that is run in a function is likely to be run more than once, so if it doesn't need to be, better to have it out of the repeated code. The same goes for loops... why define it over and over in the loop when you can define it once outside of that loop? While defining a local really doesn't take much time, it does take time, and with thousands of iterations that can really add up.

Let's say it takes 0.001s to define a local (I'd suggest it takes a tiny proportion of this time really). You're also iterating through every pixel on a 51x19 screen like this.
for x = 1, 51 do
	for y = 1, 19 do
		local something = 5
	end
end
That local definition is being run 969 times, making the total time taken to define it 0.969 seconds (nearly a whole second).

With concatenation, it completely depends on how you're using it. If you're concatenating 5 things, using '..' is definitely quicker.
local s = "a" .. "b" .. "c" .. "d" .. "e"
-- will execute faster than
local s = table.concat { "a", "b", "c", "d", "e" }

However, you're generally concatenating things in a loop, something like this:
local s = ""
for i = 1, n do
	 s = s .. v
end
Don't do this! Lua (C-Lua at least) takes time to make sure you're not duplicating strings. When you create a string, it will check every other string in existence, and if it is equal to that string, use that instead of the new one. Absolutely no idea why it does this, but it does. This is a really slow operation, so creating strings repeatedly (like in the above example) is slow. In that case, using table.concat() is a lot better:
local t = {}
for i = 1, n do
	t[i] = v
end
local s = table.concat( t )
You'll notice ridiculous speed increases by doing this.

There is, indeed, no GETDOTTED instruction. There is GETTABLE:

Quote

GETTABLE A B C R(A) := R( B)[RK©] Copies the value from a table element into register R(A). The table is referenced by register R( B), while the index to the table is given by RK©, which may be the value of register R© or a constant number.

Basically. The only difference between a.x and a[x] where x is some string is that the latter means that the index needs to be loaded into a register first (1 instruction). There is no difference between a.x and a["x"], Lua is clever, it can tell "x" is a constant string. a["x" .. ""], however, is much longer - 3 instructions longer than a.x (3 times the instructions).

I'd also like to suggest mentioning that... an example is the only way I can explain this.

Let's say you want to determine the circumference of a circle given its radius. You could use this:
	return 2 * math.pi * radius
See how there's a constant there '2*math.pi'? That isn't constant folded, so every time you call this function, you'll be multiplying 2 numbers that are constant. I would suggest that it's much quicker to do this:
local pi2 = 2 * math.pi
...
	return pi2 * radius
Taking constant values like that out of functions to the code above is generally a good idea. Some good mathematical knowledge will allow you to factorise your code then calculate the constants above, so you're doing 2 operations instead of 5.

Edited by awsumben13, 07 September 2015 - 04:32 PM.






1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users