My first impressions, some minor potential performance improvements.

Discussion in 'Source Code' started by PoeticInjustice, Sep 23, 2016.

Thread Status:
The last post in this thread was made more than 31 days ago.
  1. PoeticInjustice

    PoeticInjustice Trainee Engineer

    Messages:
    10
    This is kind of a data-dump. I'm hoping some of the info may be useful.

    I got curious, and finally downloaded the source code and got it running last week. I don't have a ton of time to devote to any fixes, but I did find some things while I was poking around with a performance profiler, and some other random observations.
    Disclaimer: some of this data was captured with the exe compiled in DEBUG mode, so can't be completely trusted.

    A ton of Debug.Fail and Debug.Assert calls are being triggered. Do the devs not run in DEBUG? Maybe this is a result of differences in environments.

    App Startup
    5% of startup time is in constructor for MyScriptWhitelist. I *suspect* that's all file IO, loading many of those assemblies for the first time, but that's a guess.

    A ton of the time (~18%) is in System.Reflection.MethodBase.Invoke(object,object[]), all from one line in MyEventTable.RegisterEvents. It does appear to be done on a background thread at least.
    This is all from processing a hardcoded array of Assembly objects. It could be done in parallel, but synchronized access to shared data could be an issue here.
    Reflection sucks for performance.
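
    A minimal sketch of the parallel idea, not the actual MyEventTable code: each assembly is scanned independently and the results land in a ConcurrentDictionary, so the only shared state is a thread-safe collection. ParallelScanSketch, Scan, and the "collect public static method names" work are hypothetical stand-ins for whatever RegisterEvents really does per assembly.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Reflection;
using System.Threading.Tasks;

public static class ParallelScanSketch
{
    // Hypothetical per-assembly work: collect the names of public static methods.
    // It touches only its own local list, so it is safe to run concurrently.
    static List<string> Scan(Assembly assembly)
    {
        var found = new List<string>();
        foreach (Type type in assembly.GetExportedTypes())
        {
            MethodInfo[] methods = type.GetMethods(
                BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly);
            foreach (MethodInfo method in methods)
                found.Add(type.FullName + "." + method.Name);
        }
        return found;
    }

    public static Dictionary<string, List<string>> ScanAll(IEnumerable<Assembly> assemblies)
    {
        // The concurrent dictionary is the only state shared between workers.
        var perAssembly = new ConcurrentDictionary<string, List<string>>();
        Parallel.ForEach(assemblies, assembly =>
        {
            perAssembly[assembly.FullName] = Scan(assembly);
        });
        // Hand back a plain dictionary once all workers have finished.
        return new Dictionary<string, List<string>>(perAssembly);
    }
}
```

    Whether this helps depends on how much of the per-assembly work can really avoid shared data; if every invoked method writes into one global table, the lock contention may eat the gain.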

    Save load:
    Unsurprisingly, a decent chunk of save load time is in XML deserialization. Probably no easy way around that without big changes. Having pregenerated assemblies for this goes a long way, and that's already in place (I think).

    About 4% of time inside Stopwatch.GetTimestamp... oh, it is from calls in a busy loop. There are better ways to do this. (this may only show up because I wasn't able to kill profiling right after the load finished.)
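
    What the busy loop is waiting for is not stated here, so this is only a sketch under assumptions. If it waits for work on another thread, a blocking wait avoids polling entirely; if it has to poll a condition, SpinWait.SpinUntil at least yields the core between checks. WaitSketch and both method names are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitSketch
{
    // Blocks the caller until the task finishes or the timeout elapses.
    // No timestamp polling; the thread sleeps until it is signalled.
    public static bool WaitBlocking(Task task, TimeSpan timeout)
    {
        return task.Wait(timeout);
    }

    // Polls a condition, but spins briefly and then yields or sleeps
    // instead of hammering Stopwatch.GetTimestamp() in a tight loop.
    public static bool WaitYielding(Func<bool> isDone, TimeSpan timeout)
    {
        return SpinWait.SpinUntil(isDone, timeout);
    }
}
```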

    3% in "LoadChannelsFromImage", which is a triply nested unsafe loop. It appears to be trivially parallelizable. Like, Parallel.For on the outermost loop trivial.
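
    A sketch of what "Parallel.For on the outermost loop" could look like. This is not the real LoadChannelsFromImage (which is unsafe code); it is a safe-array stand-in that assumes the loop splits an interleaved image into per-channel planes. ChannelSplitSketch and its parameters are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class ChannelSplitSketch
{
    // Splits an interleaved image (for example RGBA bytes) into one plane per channel.
    public static byte[][] Split(byte[] interleaved, int width, int height, int channels)
    {
        var planes = new byte[channels][];
        for (int c = 0; c < channels; c++)
            planes[c] = new byte[width * height];

        // Rows are independent: iteration y writes only to pixel indices
        // [y * width, (y + 1) * width) of each plane, so no locking is needed.
        Parallel.For(0, height, y =>
        {
            for (int x = 0; x < width; x++)
            {
                int pixel = y * width + x;
                for (int c = 0; c < channels; c++)
                    planes[c][pixel] = interleaved[pixel * channels + c];
            }
        });
        return planes;
    }
}
```

    The same shape should carry over to pointer-based code as long as each outer iteration writes to a disjoint region.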

    Reading in planet texture and processing is about 7.5% (including the 3% directly above)

    About 1.6%, also called from reading planet texture, is in MyHeightmapFace.CreatePruningTree. Specifically, the logic finding min and max values. That method looks like there are potential improvements available, probably enough to completely drop it off the radar. In particular, keep the data as a ushort as long as possible, don't bother with the multiplication and conversion to float. Instead of converting all values to float and comparing floats, just compare the ushorts, then convert the min and max.
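
    A sketch of the ushort-first min/max, assuming the float value is just the raw ushort times a positive constant (so ordering is preserved and the comparison can happen before the conversion). HeightRangeSketch and Scale are hypothetical; the real scale factor in MyHeightmapFace may differ.

```csharp
public static class HeightRangeSketch
{
    // Hypothetical scale: maps the full ushort range onto [0, 1].
    const float Scale = 1f / ushort.MaxValue;

    public static void MinMax(ushort[] heights, int start, int count,
                              out float min, out float max)
    {
        ushort lo = ushort.MaxValue;
        ushort hi = ushort.MinValue;

        // Integer compares only; no per-element multiply or float conversion.
        for (int i = start; i < start + count; i++)
        {
            ushort h = heights[i];
            if (h < lo) lo = h;
            if (h > hi) hi = h;
        }

        // Convert exactly two values at the end.
        min = lo * Scale;
        max = hi * Scale;
    }
}
```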


    Game loaded, dropping a bunch of warheads onto a planet:
    Side note for DEBUG only, as all of this is from a method that's [Conditional("DEBUG")], so not called in RELEASE. Vector3I.ToString() calling string.Format is actually fairly slow, ~2% of samples.

    9% MyFrustrumCuller.ProcessCullQueryResults and child calls, biggest one of which is MyRenderableComponent.OnFrameUpdate

    3% BoundingBoxD.Intersects(PlaneD plane)
    All calculations on variable "vector3_2" can be delayed until after the first test on all the vector3_1 values.
    Also, I'm guessing this has a TON of calls, and the two checks performed are the same code but operating on different variables. There may be a code size / cache line benefit to factoring that out, but I haven't tried it to verify.
    Last, I'm betting there are mathematically simpler checks that could be made, but I'm not going there.
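
    One candidate for a "mathematically simpler" check is the standard centre/extents test for an axis-aligned box against a plane: project the box's half-size onto the plane normal and compare it with the centre's signed distance. This is a generic textbook sketch, not the existing BoundingBoxD.Intersects code, and it returns a plain bool rather than whatever result type the real method uses. BoxPlaneSketch and its parameter layout are hypothetical.

```csharp
using System;

public static class BoxPlaneSketch
{
    // Plane: n.p + d = 0. Box: axis-aligned, given by its min and max corners.
    public static bool Intersects(
        double minX, double minY, double minZ,
        double maxX, double maxY, double maxZ,
        double nX, double nY, double nZ, double d)
    {
        // Signed distance of the box centre from the plane (scaled by |n|).
        double dist = nX * (minX + maxX) * 0.5
                    + nY * (minY + maxY) * 0.5
                    + nZ * (minZ + maxZ) * 0.5
                    + d;

        // Half-extent of the box projected onto the normal (same scale).
        double radius = Math.Abs(nX) * (maxX - minX) * 0.5
                      + Math.Abs(nY) * (maxY - minY) * 0.5
                      + Math.Abs(nZ) * (maxZ - minZ) * 0.5;

        // The plane cuts (or touches) the box when the centre is within the radius.
        return Math.Abs(dist) <= radius;
    }
}
```

    The sign of dist also tells which side the box is on when there is no intersection, so a front/back/intersecting result could be derived from the same two numbers.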

    Non-trivial time (less than 1% of an oops scenario I accidentally did) in MyRenderableComponent.UpdateKeyColor.
    I think UpdateKeyColor can be deleted.
    The assignment of MyObjectDataCommon.KeyColor appears to be write only. Nothing that I can see reads it, but there could be something in native code?
    Interestingly, the check MIGHT actually be sped up by adding a public static readonly Vector3 DummyThatRepresentsNoColorKeyAndNeedsABetterNameThanThis = new Vector3(0,-1,-1), and using Vector3.Equals(Vector3 other, double epsilon) to compare instead of the separate checks in this method.
    Looking at it further, it looks like the other value set in this method, MaterialFlags, is also write only.
    If those two fields truly are write only, and nothing is looking at them in native code, then they should be removed from the struct, and UpdateKeyColor can be deleted.

    AddBillboardParticle taking ~8%
    MyAnimatedProperty.GetInterpolatedValue is about 0.9%. Not sure if that makes sense or not; I don't know how complicated the interpolation is.

    Matrix stuff
    Matrix.op_Multiply(Matrix, Matrix) took 1.2% of that oops scenario. The commented-out native call to do the matrix multiply in D3D9 code might be a decent idea. Not sure why it is commented out. Maybe because of the move to DX11? Maybe it was actually slower?

    Matrix.CreateAxisAngle(Vector3, float) about 1.1%.
    I see a lot of opportunity for common subexpression elimination. I would like to think that the jitter is already handling that, but I did not look at the disassembly to confirm.
    The assignments of 0.0f are meaningless, the struct is initialized to that to begin with.

    Side note on Matrix struct. Any place that takes a matrix as a parameter should probably take it by reference because it is huge, and copy by value is an issue. Probably means ditching op_Multiply and explicitly calling a Multiply method. This same logic applies to any other larger structs.
    Same likely applies to MatrixD.
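
    A sketch of the by-reference alternative to op_Multiply. Mat4 is a hypothetical stand-in for the 64-byte Matrix struct, not the VRageMath type; the point is only that the ref/out overload avoids copying both operands onto the stack for every call.

```csharp
public struct Mat4
{
    public float M11, M12, M13, M14;
    public float M21, M22, M23, M24;
    public float M31, M32, M33, M34;
    public float M41, M42, M43, M44;

    // By-reference multiply: the two inputs are not copied.
    public static void Multiply(ref Mat4 a, ref Mat4 b, out Mat4 result)
    {
        // Build into a local so the call stays correct if result aliases a or b.
        Mat4 m = new Mat4();
        m.M11 = a.M11 * b.M11 + a.M12 * b.M21 + a.M13 * b.M31 + a.M14 * b.M41;
        m.M12 = a.M11 * b.M12 + a.M12 * b.M22 + a.M13 * b.M32 + a.M14 * b.M42;
        m.M13 = a.M11 * b.M13 + a.M12 * b.M23 + a.M13 * b.M33 + a.M14 * b.M43;
        m.M14 = a.M11 * b.M14 + a.M12 * b.M24 + a.M13 * b.M34 + a.M14 * b.M44;
        m.M21 = a.M21 * b.M11 + a.M22 * b.M21 + a.M23 * b.M31 + a.M24 * b.M41;
        m.M22 = a.M21 * b.M12 + a.M22 * b.M22 + a.M23 * b.M32 + a.M24 * b.M42;
        m.M23 = a.M21 * b.M13 + a.M22 * b.M23 + a.M23 * b.M33 + a.M24 * b.M43;
        m.M24 = a.M21 * b.M14 + a.M22 * b.M24 + a.M23 * b.M34 + a.M24 * b.M44;
        m.M31 = a.M31 * b.M11 + a.M32 * b.M21 + a.M33 * b.M31 + a.M34 * b.M41;
        m.M32 = a.M31 * b.M12 + a.M32 * b.M22 + a.M33 * b.M32 + a.M34 * b.M42;
        m.M33 = a.M31 * b.M13 + a.M32 * b.M23 + a.M33 * b.M33 + a.M34 * b.M43;
        m.M34 = a.M31 * b.M14 + a.M32 * b.M24 + a.M33 * b.M34 + a.M34 * b.M44;
        m.M41 = a.M41 * b.M11 + a.M42 * b.M21 + a.M43 * b.M31 + a.M44 * b.M41;
        m.M42 = a.M41 * b.M12 + a.M42 * b.M22 + a.M43 * b.M32 + a.M44 * b.M42;
        m.M43 = a.M41 * b.M13 + a.M42 * b.M23 + a.M43 * b.M33 + a.M44 * b.M43;
        m.M44 = a.M41 * b.M14 + a.M42 * b.M24 + a.M43 * b.M34 + a.M44 * b.M44;
        result = m;
    }

    // Operator form kept for comparison: both operands are passed by value.
    public static Mat4 operator *(Mat4 a, Mat4 b)
    {
        Mat4 r;
        Multiply(ref a, ref b, out r);
        return r;
    }
}
```

    Call sites would change from `c = a * b` to `Mat4.Multiply(ref a, ref b, out c)`, which is uglier but makes the cost visible.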

    Another side note on Matrix / MatrixD, might be a good idea to remove the implicit Matrix <--> MatrixD conversion operators. Matrix to MatrixD because that could be masking a performance issue, and MatrixD -> Matrix because that should be an explicit operator because there's data loss there going from double to float. If nothing else, just make the operators explicit instead of implicit. Call sites can then cast. The data loss here could honestly be the source of who knows what kind of odd behavior. Uses of the MatrixD->Matrix conversion include Quaternion creation code used by pistons, rotors, landing gear, etc. Sounds frightening. The loss of precision could be a continual offering to the all-mighty Clang.
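
    A small sketch of what explicit-only conversions look like and why the narrowing direction matters. Vec3D and Vec3F are hypothetical three-component stand-ins used to keep the example short; the same operator declarations would apply to MatrixD and Matrix.

```csharp
public struct Vec3D
{
    public double X, Y, Z;
    public Vec3D(double x, double y, double z) { X = x; Y = y; Z = z; }
}

public struct Vec3F
{
    public float X, Y, Z;
    public Vec3F(float x, float y, float z) { X = x; Y = y; Z = z; }

    // Narrowing: double -> float drops precision, so the cast must be
    // written out at the call site.
    public static explicit operator Vec3F(Vec3D v)
    {
        return new Vec3F((float)v.X, (float)v.Y, (float)v.Z);
    }

    // Widening is lossless, but keeping it explicit makes the conversion
    // (and its cost) visible instead of silently inserted by the compiler.
    public static explicit operator Vec3D(Vec3F v)
    {
        return new Vec3D(v.X, v.Y, v.Z);
    }
}
```

    With these declarations `Vec3F local = world;` no longer compiles; it has to be `Vec3F local = (Vec3F)world;`. At a coordinate of 100000.001 the float spacing is already about 0.0078, so the millimetre part is simply gone after the cast.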

    Other
    In MyParticleEffect, if m_sortedGenerations means anything, then the IMyParticleGeneration implementations should both include fast and correct IComparable<T> implementations. It looks like they're both IComparable but not IComparable<T> so that implies boxing and an allocation every time CompareTo is called. One of the implementations has hardcoded return 0 for CompareTo, so that implies that m_sortedGenerations may not matter. (other implementation returns 0 for null other, that's probably not correct, but it is likely also not relevant and is only an artifact of IComparable vs IComparable<T>.). I have doubts as to whether the sort is doing what the dev thinks it is.
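
    A sketch of a generic comparison, assuming the generations are meant to be ordered by some key. GenerationSketch and SortLayer are hypothetical; the real sort key is unknown. One caveat on the boxing point: boxing only happens if the implementing types are structs. For classes, the non-generic IComparable path costs a type check and cast rather than an allocation, though the generic interface is still cleaner and type-safe.

```csharp
using System;
using System.Collections.Generic;

public sealed class GenerationSketch : IComparable<GenerationSketch>, IComparable
{
    public readonly string Name;
    public readonly int SortLayer; // hypothetical sort key

    public GenerationSketch(string name, int sortLayer)
    {
        Name = name;
        SortLayer = sortLayer;
    }

    // Strongly typed comparison: used by List<T>.Sort() and Comparer<T>.Default.
    public int CompareTo(GenerationSketch other)
    {
        // By .NET convention any instance sorts after null.
        if (other == null) return 1;
        return SortLayer.CompareTo(other.SortLayer);
    }

    // Non-generic fallback for legacy callers; forwards to the typed version.
    int IComparable.CompareTo(object obj)
    {
        if (obj == null) return 1;
        var other = obj as GenerationSketch;
        if (other == null)
            throw new ArgumentException("Object is not a GenerationSketch", "obj");
        return CompareTo(other);
    }
}
```

    If CompareTo always returns 0, every ordering is "sorted" as far as the collection is concerned, so the container ends up in an arbitrary order rather than a meaningful one.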
     
  2. Rabir

    Rabir Trainee Engineer

    Messages:
    72
    *Replace the hardcoded vectors and matrices with the System.Numerics.Vectors types, which are SIMD-accelerated...
    I ran a test: the hardcoded SE vectors needed 40 ms for 5,000,000 vector additions, while the SIMD vectors did it in 6 ms.
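
    The test code was not posted, so this is only a sketch of the kind of loop such a comparison would exercise, using System.Numerics.Vector3. SimdSumSketch is a hypothetical name. Whether the JIT actually emits SIMD instructions depends on the runtime and CPU, which Vector.IsHardwareAccelerated reports.

```csharp
using System.Numerics;

public static class SimdSumSketch
{
    // True when the JIT maps System.Numerics operations to SIMD instructions.
    public static bool Accelerated
    {
        get { return Vector.IsHardwareAccelerated; }
    }

    // Adds up all vectors; each += is a single Vector3 operation that the
    // JIT can compile to a packed add when acceleration is available.
    public static Vector3 Sum(Vector3[] values)
    {
        Vector3 total = Vector3.Zero;
        for (int i = 0; i < values.Length; i++)
            total += values[i];
        return total;
    }
}
```

    Any timing comparison should be run in a Release build without a debugger attached, since the SIMD code paths are only generated by the optimizing JIT.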
     
  3. Malware

    Malware Master Engineer

    Messages:
    9,867
    @PoeticInjustice Very happy to see a constructive post pointing out potential problem points without becoming judgmental.

    MyScriptWhitelist is mine (the entire VRage.Scripting assembly, really). It does indeed load up a whole lot of assemblies there, because it needs to define a lot of types and members available for scripting in the various environments. It's also one of three locations where the whitelist is configured.
     
  4. Rabir

    Rabir Trainee Engineer

    Messages:
    72
    Wouldn't creating a basic scripting language and a parser be much simpler? For example:
    Set [BlockName] [Field] [Value]

    That would solve mod backward compatibility, so old mods won't get broken by each update.
    With some IF and GOTO it is easy to create... BTW you did a good job creating an internal compiler...
     
  5. Phoera

    Phoera Senior Engineer

    Messages:
    1,713
    simple for who?
    for devs? no.
    for reflection use? no.
    for ppl? also no, cuz learning an existing lang is simpler than a new one, which will not have guides.

    my mod has been live for a couple of months already and has never been broken.
    compatibility is mainly lost when some functions are removed or heavily changed, and this will be the same for a custom script implementation.
     
  6. Malware

    Malware Master Engineer

    Messages:
    9,867
    When I got involved the language was already established, so it was quite simply not an option. I would have liked having a specialized language - still based on an existing one, to simplify learning, but one that could "gloss over" the fact that it runs only once per tick etc. Solvable in C#, sure, but the awareness needs to be there. However that would have been a much, much bigger job even if the script simply was translated into C#. Not something I'd do in my spare time, not any more at least. I've made enough script languages over the years. It's fun but time consuming. Other than that @phoenixcorp13 has it right. Custom language won't help backwards compatibility... because even that must expose an api.
     
  7. Rabir

    Rabir Trainee Engineer

    Messages:
    72
    Yeah, but IMO this can be considered "hardcoded" stuff... With a custom script language, KSH or you could alter the engine any way you want: rename classes, add new ones, and more, without breaking any of the PB blocks or mods...
    Also, you don't have to create a new language; there is Lua. Many games use it for modding.
    For example, Firefall's entire UI system is written in Lua and can be modified by anyone at any time...

    At least, if I ever add scripting to my game, I'm sure it won't use any internal compiling, but something like Lua or JavaScript.

    My strategy when I make a game is to keep every module separated from the others. So when I heavily modify a module (like my voxel engine, which I designed shittily the first time and now have to rewrite entirely), the others can still operate with the new one AND the old one.
    Simply put, I like to keep the same API for my modules over time for backward compatibility...
     
  8. Phoera

    Phoera Senior Engineer

    Messages:
    1,713
    Lua is also slower, so it would be bad here for PB block-mode runs.

    actually that is possible now.
    but who will write this compatibility layer?
     
  9. Malware

    Malware Master Engineer

    Messages:
    9,867
    Strange how someone who seems as opinionated about performance would make such a choice. A compiled language will always be faster than anything interpreted. At any rate, all of that is irrelevant, because there's no chance Keen will replace C# now.

    So does Keen. However there are plenty of situations where that's quite simply not an option.
     