
Some type of support for crude multithreading with more than one kOS part. #2889

Open
Dunbaratu opened this issue Mar 9, 2021 · 34 comments
Labels: enhancement (Something new (not a bug fix) is being requested)

Comments

@Dunbaratu
Member

Dunbaratu commented Mar 9, 2021

There have been enough people trying to use multiple kOS parts in conjunction as a sort of multi-threaded approach that I guess it's time to try to support at least some sort of crude model that will allow that. The key things that cannot be done by user-level code alone and need kOS support are:

  1. Some way to share memory between cores on the same vessel. This would be a way for one core to set a variable that another core can read. One possible way is to use the current ship message queue, but it would be nice if they could just share variables more directly with, say, one universal variable dictionary that all kOS parts can see (a sort of "even more global than global" namespace.) One possible way to do this with minimal effort would be to just provide a single lexicon everyone can see, like if one core does set ship:sharedlex["foo"] to 5. then another core can do print ship:sharedlex["foo"]. (Intuitively that might seem like a slow way to access memory, but in kOS even the "direct" variables are still dictionary lookups in the C# under the hood so that doesn't really matter too much. A program could just first do set sharedFoo to ship:sharedlex["foo"]. up front, then refer to sharedFoo from then on, if the author is concerned about speed of accessing the variable.)
  2. Some way for a script to have "atomic access" to a semaphore or mutex, so that when the script's algorithm requires it, it can declare atomic sections where the other cores requesting the same "baton" have to wait their turn. Please note, this would NOT be an "atomic section" from the point of view of KSP itself. This kind of "atomic section" could still run out of IPU and have to continue next physics update. But what this would do is make it so that if a core that "has the baton" runs past its IPU and has to continue next physics update, the other kOS cores that requested access to the same "baton" won't wake up and run their opcodes until that "baton" is freed by this one. I envision something like this: Currently each physics update, a core wakes up and runs IPU opcodes. The change would be that a core wakes up, checks to see if it's requesting a baton that's in use, and only if that baton is not in use will it proceed to execute the next IPU worth of opcodes, otherwise it will go back to sleep and try again next update. Obviously, there will need to be failsafes to make sure "the baton" gets freed when a program dies or the core part explodes. (A rough sketch of how both ideas might look in a script follows this list.)
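A rough sketch of how the two ideas might combine in a script. Everything here (ship:sharedlex, takebaton, releasebaton) is a placeholder name for the proposal, not an existing kOS feature:

// Core A: write a shared value inside an atomic section.
takebaton("nav").                          // sleeps here if another core holds "nav"
set ship:sharedlex["impactLat"] to 12.34.
releasebaton("nav").

// Core B: read it back under the same baton so the write can't interleave.
takebaton("nav").
local lat is ship:sharedlex["impactLat"].
releasebaton("nav").
print lat.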

Documentation: This could be the messiest part. I wish to keep kOS well documented, and like its sort-of friendly "not too complex for the newbie" approach, but this topic is not one to just "gloss over" quickly. It may be impossible to document how one would use these features without it being well above the newbie level. It may be that the documentation will just have to point to some Wikipedia articles, point out "this is a very advanced topic beyond the scope of this documentation", and then just describe how the features can be used by the people who already understand it.

@Dunbaratu added the enhancement label on Mar 9, 2021
thexa4 added a commit to thexa4/KOS that referenced this issue Mar 9, 2021
Can work as a solution to KSP-KOS#2889, KSP-KOS#145
@Dunbaratu
Member Author

A thing to contemplate about the "atomic section" mutex: Should it exclude the core from running any code while another core holds the baton, or just from running user mainline code? Essentially I am referring here to triggers still being able to interrupt a core that's waiting for the baton. The obvious answer might be "no, don't do that", but remember that cooked steering depends on these triggers to make things like LOCK STEERING to HEADING(90,0) keep recalculating using the newly rotated world axes every update to decide what XYZ vector corresponds to HEADING(90,0) at the moment. As you move, it has to keep changing it.

@nuggreat

Another thing to consider is whether there would be some way to exclude a core from the multi-threading group, or whether a given core must actively join a multi-threading group.

The advantage of requiring an active exclude/join from a core is that it lets us release a core from the group when its script ends, and it would then be possible to use the mutex to exclude all code on the other cores. A core independent of the group could then be the one handling steering and throttle while the multi-threaded cores do their multi-threaded work.

@mgalyean

In the interest of keeping things simple, that is one reason I suggested that only the sharer of a "meta" global would have write access to it, at least initially. This would make coding it in the mod simpler, as well as making it far simpler to document for the casual user. Personally I love the baton/mutex approach with multi core/script write access, but could live within the constraint that only one core/script have write access also. Honestly, it would probably lead to saner and more readable code to have that restriction. It hurts to write that, but it is true.

@mgalyean

mgalyean commented Mar 17, 2021

I should clarify that when I originally imagined these "metaglobal" variables, I imagined them being distinct, in that core A could create and publish variable X and core B could create and publish variable Y. All processors could read X and Y, but only A could update X and only B could update Y. This would allow sharing of information uniquely generated by a core, along with the ability to read information not generated by a particular core. And much like one-way streets in a city leading to overall better traffic flow, it would make for better programming and easier documentation. No batons or mutexes required, so no worries about debugging blocking issues both in user code and in the mod itself. More time for fun. K.I.S.S.

@nuggreat

If you want something like that then why not just write out a JSON file to the core with your structure in said file. With a published file you then simply need to do LOCAL otherCoreData IS READJSON(otherCore:VOLUME:OPEN("publicData.JSON")). and you get the published data. Admittedly it's not as clean or as fast as what you might see with a suffix call on otherCore, but it would work well enough. The real limiter would be how much space you have on the core's local volume, unlike what you would see with a suffix.
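For example, a minimal sketch of that file-based approach using the existing WRITEJSON/READJSON built-ins (the volume name "coreB" and the file name are made up for illustration, and this assumes the publisher's volume has been given a name the reader can reference in a path):

// On the publishing core: write the shared state to its own local volume (1:).
LOCAL publicData IS LEXICON("impactLat", 10.2, "impactLng", -74.5, "stageNum", 2).
WRITEJSON(publicData, "1:/publicdata.json").

// On a reading core: pull a fresh copy of the published data whenever it's needed.
LOCAL otherCoreData IS READJSON("coreB:/publicdata.json").
PRINT otherCoreData["impactLat"].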

@mgalyean

mgalyean commented Mar 17, 2021

Um, no. That is heinously more overhead than needed, for the same reason real computers don't implement shared memory using hard disks and JSON. When a core reads another core's published metavariable it should pretty much be within the same physics tick, ideally, right? Why would one want to read in the entire variable state of another core just to read a single variable value every time they reference it? With the implied write of the entire variable state on the publisher's end? Maybe I'm not seeing something here.

@mgalyean

Also, I don't see accessing the published vars as OTHERCORE:variable. I see it as accessing them by a variable name, the same as other variables, with it being up to the coder to prevent name collisions.

@mgalyean

mgalyean commented Mar 17, 2021

I definitely agree that cores should not be required to participate in shared data by any mechanism. The cores either publish or access shared data or they don't. But if a core uses a shared variable name then it can read it without any separate "join" operation. It is just there if it wants to access it; though it can't write to it unless it published it. Publishing could be as simple as using a particular declaration keyword, like PUBLISH instead of GLOBAL? Cores could check for the existence of published data by way of DEFINED() or a subflavor of DEFINED(), like IF PUBLISHED(somevar) (see the sketch below). And of course somevar could be a huge lex if one wanted, though I'd think a bigger number of req'd instructions would track bigger data shares.
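As a purely illustrative sketch of that idea (PUBLISH and PUBLISHED() are invented names for this proposal, not existing kOS syntax):

// Core A creates and publishes a variable that only it may write:
PUBLISH impactData IS LEXICON("lat", 10.2, "lng", -74.5).

// Any other core can read it, but not write it:
IF PUBLISHED(impactData) {
  PRINT impactData["lat"].
}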

@nuggreat

Whatever this ends up being is going to need a lot of processing basically no matter what, as things like lists, which normally get passed by reference in kOS, would instead need to be passed by value, or else the remote core could mutate a list that another core is iterating over. That's to say nothing of the issue of not being able to pass function delegates. And that level of processing is basically going to be equivalent to writing out a JSON file.

@mgalyean

I've suggested before that serialization be allowed between vars without a requirement to write to disk, so I'm open to the underlying mechanism involving serialization, but object to having to write it to disk. In Python, for example, one can write JSON to a var and deserialize it from a string back into instantiated vars, all within variable space. But I think for simulation purposes the number of instructions should reflect the speed and overhead of real shared memory and not the slower underlying mechanism of serialization/deserialization that any underlying JSON ops would be performing. So yes, the underlying mechanism might be variable-based JSON that avoids disk writes/reads, but the simulation shouldn't have it cost that much.

@nuggreat

Writing to the volume of a core only goes to disk when KSP saves the game state; otherwise it is data in RAM. Unless someone is using the archive, as that actually is your disk directly.

@mgalyean

mgalyean commented Mar 17, 2021

Ok, but then it comes down to bloated source code. Accessing a shared variable in the real world doesn't involve a long-ish series of serialization and deserialization with paths and filenames. We aren't emulating JSON data over the internet here. We are emulating shared data among a cluster of cores with shared access to RAM devices at best, or shared access to a very fast bus to a shared storage device at worst. If I want core A to check up on the impact position calculated and published by core B every loop iteration, it seems silly to be coding '...READJSON..' when real clustering or multithreading shared variables don't work that way. So again, it might be implemented that way in the mod, but kerboscript should have the syntactic sugar to make it look much cleaner. For me the model is somewhere between multithreading and clustering with an assumed darn fast communication between the cores; as if they were on the same bus, or in the worst case, something like SATA speed-ish.

@Dunbaratu
Member Author

To truncate this discussion: I was never planning to have the shared values use any sort of serializing or comms so most of that discussion is moot. It was just going to be a way to directly access something that exists on the vessel rather than on the CPU itself, so all CPUs on the vessel can see that thing. For example, a suffix called ship:vesselmem which is a LEXICON, so one core can do set ship:vesselmem["foo"] to 1. and another core can do print ship:vesselmem["foo"]. and it's the same variable.

As for mutexes, I was thinking of having the code explicitly declare that it's entering a section where it must be atomic with another core by executing a command, maybe something like beginmutex("foo") and endmutex("foo"). (Or perhaps make it a language syntax feature where the begin/end is enforced by squiggle braces). The idea is that people who didn't write scripts that explicitly declare they are entering such a section are not participating in the mutex logic at all. We could even make access to the ship:vesselmem or whatever it's called check to ensure you're in a mutex section or you're not allowed to use it (to help prevent newbie mistakes where people forget or don't know about threadsafe issues.) That may be a bit heavy handed though because as long as everyone is only reading, not writing, such a mutex wouldn't technically be needed and it would be a pain to require being in one.
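A minimal sketch of the command-style version, with beginmutex/endmutex and ship:vesselmem being proposed names only:

// Increment a shared counter without another core interleaving between the read and the write:
beginmutex("counter_guard").
set ship:vesselmem["counter"] to ship:vesselmem["counter"] + 1.
endmutex("counter_guard").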

@mgalyean

mgalyean commented Mar 19, 2021

I like vesselmem. Works. The mutex for shared mem reads seems overkill to me, if that is part of it, but other than that (if that) you have a good simple design. A mutex for writes would be absolutely necessary, unless only one core were allowed to write to it. If the mutex covers the entire vesselmem structure and not individual values then I probably won't make much use of vesselmem, as too much of my code would be waiting in line given what I'd imagined doing with shared vars. For the same reason I don't use databases with table-level locking instead of row-level locking. Another consideration for me when suggesting variable-level sharing and a single writer is that, having worked with every database out there, it is abundantly clear that anything that arbitrates access will end up with stale locks and other issues related to that arbitration. So, more complex code to maintain. Is there any way the shared mem could be separated from the multithreading issue?

@Dunbaratu
Member Author

Is there any way the shared mem could be separated from the multithreading issue?

Vesselmem and mutexes were intended to be separate things. The reason they're in the same issue is that the existence of shared variables is what causes it to be important for kOS to provide an optional mutex system for scripts to use if they choose to.

The mutex system will just be for people who've decided their algorithm should use it. I'm definitely not going to take on the task of trying to force use of vesselmem to be threadsafe as that comes with an awful overhead. But I am planning to provide a mutex tool so at least script makers can mark sections if they want to. Right now they cannot and have no say over when kOS chooses to let another core jump in and interrupt.

I imagine something maybe like this:

// script 1, running on CPU A, has this which removes a value from a shared list:
function remove_index {
  parameter idx.
  mutex hands_off_mylist // name can be anything that would be a valid identifier
  {
    ship:vesselmem["mylist"]:remove(idx).
  }
}
// script 2, running on CPU B, iterates over vesselmem:
mutex hands_off_mylist
{
  // Can't delete from a collection while in a loop iterating over it,
  // so it's a good idea to force this to be exclusive with the deletion
  // on CPU A:
  for item in ship:vesselmem["mylist"] {
    print item.
  }
}

If you tried running that without the mutexes, kOS would happily compile it and let you, but then you'd get an exception thrown if the deletion in CPU A happens to interrupt the iteration in CPU B's for loop.

That's the sort of thing I have in mind. And mutexes wouldn't necessarily pause all CPU's - just the ones that hit the start of a mutex section for a named mutex that is in use elsewhere.

When documenting this I would be very careful to warn newbies that use of the shared memory means they should be aware of threadsafe issues and if these words are all alien to them it would be a good idea to not use the feature until they learn more about that kind of programming.

@Dunbaratu
Member Author

Dunbaratu commented Mar 19, 2021

And the reason I'm thinking of a mutex section being begin/ended with curly braces is that this would mean it essentially is marked on the call stack and releasing the mutex could be something automatically enforced when the stack is popped by the ending curly brace. Thus if you try to bypass the end mark of the section with, say, a return or a break, the system would still release the mutex as it pops the nesting level of the stack back down to the level the break or return goes to.

Any time you try to have "begin and end" handled by having the programmer issue one command for begin and another command for end, inevitably there will be a case where the programmer forgot that a break or return caused the end command to get bypassed and now they're stuck nested inside the thing forever. (A similar problem happens in Unity's IMGUI where you have to manually call BEGIN and END methods to lay out the GUI subsections. When they don't nest right, Unity gets stuck drawing the window and the window is just blank.)
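A sketch of that situation using the block syntax proposed above (still hypothetical), where leaving early would not strand the mutex:

function take_first {
  mutex hands_off_mylist
  {
    if ship:vesselmem["mylist"]:length > 0 {
      // Leaving through this return pops the scope, which also releases the mutex.
      return ship:vesselmem["mylist"][0].
    }
  }
  // Only reached when the list was empty; the mutex was already released at the closing brace.
  return 0.
}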

@mgalyean

Ok, thanks for the reassurance, I do appreciate that. I figured all was well in hand and just wanted to make sure I was on the same page I guess. As you describe it, I think I will probably dabble in multithreading. Previously I imagined very independent tasks on each core sharing some common data but otherwise completely asynchronous. Or at least any coordination between them would be on a coarse level handled by status vars with the most critical timing being on the order of 1 to 3 seconds typically. I could always have them wait for another core's status var to change to some value. Now you have me thinkin' of slicker solutions

@mgalyean

mgalyean commented Mar 19, 2021

On a related note, I had profound respect for AmigaDOS back in the day for taking the bold step of implementing multitasking on hardware that was barely up to the task. They made the hardware up to the task by leaving all the protections between processes up to programmers following rules rather than the OS enforcing rules. The buggy programs got jeered out of existence or into shape fairly quickly. That multitasking environment was a gauntlet that only the "real" programmers could negotiate as there was no real safety net to speak of in many regards. So, the point being that making the solution rock solid if rules are followed, but dangerous if not, can be a great learning tool

@nuggreat

The reason why I was talking about working with serialization instead of the normal pass by reference for vars that we do in kOS was due to one of the edge cases I could see.

Specifically, what happens during undock/decouple? I could easily see a way to get local instances of the same reference on 2 different cores, and thus after the undock/decouple they would still point to the same thing in memory while being cores on different craft.

Some other cases that need to be addressed are docking, undocking, and decoupling. If we leave things passed by reference in these cases, we would also likely end up with cores on different craft able to access the same list in memory, or with some strange and unexpected changes in vesselmem.

What happens on docking? Which of the two ship:vesselmems takes priority during the docking itself? After all, one craft goes away, hidden within the other until the undock occurs. Does there also need to be an element:vesselmem so you can access the vesselmem that gets hidden on docking and is possibly exposed again after the undock?

What about a decouple? Do we copy the ship:vesselmem to the newly created craft when it splits off, or do we leave it blank, as it would be if said craft had been launched from the VAB?

@Dunbaratu
Member Author

If we do nothing and just let stuff naturally work how it "wants to", then when splitting a vessel in two (undock or decouple, they're really the same thing in the KSP data structure) vesselmem would remain on the "same" vessel, and a new blank one would be made on the "branch" vessel. Stale references to the vesselmem you are no longer attached to are an issue, which is a good point and it means that being allowed to access vesselmem should come with an exception to be thrown when the vessel isn't the current one. (There's a utility check already implemented that can be reused here, as that check exists in many places in kOS - like the error that's thrown when you try to set vessel:control:fore to 1. when vessel isn't your current vessel.)

Vessel merging, on the other hand, is a more thorny issue to deal with. With two different vesselmems - do they melt together into one (with one side "winning" and clobbering the other when two values clash), or does the mem get thrown away for the vessel that disappears? The simplest way is for the vesselmem for the disappearing vessel to just go away, as that's what would naturally happen anyway, but people may not like that.

If this is too much of an issue, then instead of vesselmem perhaps we just make the CPU parts themselves contain a shared mem namespace instead of the vessel containing it. Then you'd decide which CPU you want to "own" the memory that all the other CPU's on the vessel access. Like CPU FOO says "give me a reference to CPU BAR's sharedlex". Then the rule would be that any attempt to access any of the values in the sharedlex comes with a check to ensure the CPU is still attached to the same vessel as the one the sharedlex is in.
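A sketch of what that per-CPU flavor might look like from a script. PROCESSOR() with a name tag already exists in kOS, but the sharedlex suffix and the "bar" tag are just names for this proposal:

// CPU FOO grabs a handle on CPU BAR's shared memory by core tag:
local otherLex is processor("bar"):sharedlex.

// Every later get or set would re-check that "bar" is still part of this vessel:
print otherLex["foo"].
set otherLex["status"] to "seen by foo".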

@nuggreat

For the undock/decouple I was thinking more that you store local instances as opposed to always accessing vesselmem every time; likely bad practice for this type of thing, but people will do it. I see it being something like this:

core 0

LOCAL someList IS LIST(1,2,3).
SHIP:VESSELMEM:ADD("someList",someList).
//wait until undock
someList:ADD(4).
PRINT someList.// what gets printed?

core 1

//some wait until someList gets added to vesselmem
LOCAL myReff IS SHIP:VESSELMEM["someList"].
doUndock().
myReff:ADD(5).
print myReff.// what gets printed?

Thus there wouldn't be stale calls to VESSELMEM, as neither core is trying to access VESSELMEM at that point.

Also, the reason I was making a distinction between undock and decouple is that a decoupling craft gets a new name, whereas an undocking craft will often get its old name back, which at least to me implies there is some vessel data that survives the dock/undock. I also know this simply because of how elements work, as they also expose some pieces of the normally inaccessible vessel data.

@mgalyean

I lean toward each CPU hosting the data it is sharing, and other CPUs referring to that data by shared var name, with a way to check if that data exists as a share (like the PUBLISHED() check or similar). But it would be referred to by shared variable name solely, for simplicity, not via core tag. When a script asks for the value of shared variable 'foo', and more than one instance exists for that name across cores, the first instance encountered would be returned in the normal case. In 99% of cases users will name their shared vars and use their shared vars such that no collisions will occur once debugging is done.

For debugging, another access method with a bit more syntax overhead could return a LIST() of all shared vars colliding on that name, and all shared var elements in that list would have suffixes allowing visibility on the core sharing it (tag etc). There is no reason someone couldn't use the "debugging" method regularly, as a coding practice, to go through a list to find the one they want if they really want to collide on names as a normal practice, but I don't think most good code would converge on that practice. This would be similar to how a lot of code uses SHIP:MODULESNAMED... or SHIP:PARTSDUBBEDPATTERN... and then picks through to what it wants. There is no reason the same approach couldn't be used for shared vars: return a list of shared vars with that name. Each item in the list would have a suffix to id the core from which it came, via a PART or CORE suffix maybe.

But there should be syntactic sugar to simply return what would be the first entry in that list directly for the case where there are no collisions in well-designed and debugged scripts. As in this case there would be no collisions and the list would only be used during debugging or code-handled exceptions. The list method could have some query syntax associated with it to filter it much like the SHIP:PARTS....PATTERN suffixes do for those who would collide on shared var name as a normal way of programming

tl;dr The list method could be used as an error check in typical code to check for name collision, or as a normal programming way to find the reference one wants among intentional collision, while the direct reference would simply return the first encountered instance of a shared var with that name where collision would be undesired and the list method would just be a debugging tool and runtime check for duplication
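A purely hypothetical sketch of those two access styles (SHAREDVAR and SHAREDVARSNAMED are invented names for illustration, as are the :CORE and :VALUE suffixes on the list items):

// Normal case: no collisions expected, just take the first match across cores.
PRINT SHAREDVAR("impactPos").

// Debugging or intentional-collision case: get every core's instance of that name.
LOCAL matches IS SHAREDVARSNAMED("impactPos").
FOR m IN matches {
  PRINT m:CORE:TAG + ": " + m:VALUE.
}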

@Dunbaratu
Member Author

Dunbaratu commented Mar 20, 2021

I'm trying to think over the "failing to orphan" problem here that comes with a script still holding a reference to a "foreign" CPU's variable when that CPU is no longer legal to access (blew up, or just got separated and is no longer on the same vessel because of an undocking). There's no real way to enforce "you have to use the full path every time" to ensure it will check if the variable is still accessible (nor would people want that). Consider:

local otherFoo is otherCore:sharedmem["foo"].

// .. time passes here ...
// .. otherCore becomes undocked from this vessel now ...

print otherFoo. // This *should* be illegal now.  But enforcing that will require some work.

There are two reasons that print otherFoo. should be illegal there:

  1. The "theme" is broken by the fact that this is shared memory but that core is not on the same ship now.
  2. C# won't clear that variable because it's not orphaned properly.

To enforce this there'd have to be some way for kOS to make sure the variable is still on a core of the same vessel every time you get or set its value. It wouldn't be sufficient just to check when you use the call that gets the other vessel's mem. It would have to check each time because you can store a ref to it as shown above.

That would mean the knowledge that a Variable isn't really a "here" but is "somewhere else" would have to live inside kOS.Safe.Execution.Variable. It would have to know the difference between a variable on the same core versus a variable on a foreign core, so that when kOS.Safe.Execution.CPU.GetValue() and kOS.Safe.Execution.CPU.SetValue() try to use it, they can redirect over to the correct core that owns it (and that redirection could check if that other core is a legal one to access).

I'm thinking it may have to be a Weak Reference when it's on a foreign core, so when the other core dies or blows up, it really does truly orphan and wipe the value despite the fact that this core still has a reference to it.

@nuggreat

That orphaning is why I was talking about running all the shared data between cores as something that gets passed by value as opposed to by reference, and using our already existing serialization stuff seemed like a goodish way to accomplish this task. On the plus side, orphaning isn't really an issue when there are no references between cores. On the other hand, the downsides are that there would be data duplication, which would increase the kOS memory footprint, and any changes to the shared data would need to be actively pushed as opposed to just passively happening as they would when stuff passes by reference.

Whereas with pass by reference, more new code would be needed. The upsides are that any updates instantly happen between cores without user intervention, and we don't duplicate things in memory, so the overall footprint is smaller. On the other hand, this new code has the potential to leak if orphaning doesn't happen correctly. Also consider what would happen when someone tries to pass a function delegate by way of this shared system and how that would play out. I hope we would see the same error that we see if we make a global var a function delegate in a script and then try to call that var once the script ends, and it would crash as normal. But if that doesn't happen one could get 2 different instruction pointers running in the same stack, which would be "FUN".

@mgalyean

The orphan issue is yet another reason kerboscript really needs a NULL or NIL value. If the shared var is no longer available, return null. Coders are used to doing null checks on volatile data from elsewhere.

@mgalyean

A true null would also settle what to return for so many other suffixes and functions throughout kOS when a valid value can't be returned, not just for the orphaned shared var conundrum.

@mgalyean

mgalyean commented Mar 20, 2021

Definitely pass by value. No one needs the headaches of doing it elsewise. Please consider the benefits of single writer/publisher for similar reasons. It would offer a lot of bang for the buck, and the limitations would actually help organize things quite a bit design-wise.

@nuggreat

kOS actively avoids having a null, as the KSP API doesn't protect against getting passed one, and if there is some place where kOS fails to protect correctly, it could corrupt people's current game state and possibly even corrupt saves.

@Dunbaratu
Member Author

Dunbaratu commented Mar 21, 2021

@nuggreat :

Whereas with pass by reference, more new code would be needed.

C# does everything by reference by default except for primitives like int and float. When you don't spend effort on implementing deep copy methods for everything, pass by reference is the default way C# behaves. I don't see how fighting this is less effort than not fighting it. (i.e. fighting it implies we have deep-copy constructors for literally everything we would allow to get shared, which we currently don't.) It's also quite easy to "miss a spot" and get one piece of data inside the copy that's really still a reference.

@mgalyean:

single writer

As a model a user can choose to use it's fine. But for kOS to actually enforce the model and make it impossible for a reader CPU to alter anything at all in the writer CPU's area - that sounds like a lot of work because of how most of the subtypes of Structure in kOS are currently mutable. For example, say the writer puts a value into a shared area, calls it FOO, and makes it have the value LIST(1,2,3). A reader CPU that says something like local bar is GetValue(otherCPU, "foo") is going to get a bar that is a copy of the same reference to the same LIST that the owner's foo references. And the reader will be able to do bar:remove(1). on it because the list itself has no idea who it's owned by. It doesn't currently have a field like "myOwner" or anything like that. So this isn't a small change to be able to enforce this, at least not if you want that enforcement to recurse through containers nested in containers.
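As a concrete illustration of that concern (the sharedlex suffix and the "writercpu" tag are stand-in names from this discussion, not real kOS features; PROCESSOR() and CORE are real):

// The writer CPU publishes a mutable list into its shared area:
set core:sharedlex["foo"] to list(1, 2, 3).

// A reader CPU on the same vessel gets what is really the same LIST object:
local bar is processor("writercpu"):sharedlex["foo"].
bar:remove(1).     // the writer's list is now LIST(1, 3), with nothing stopping it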

I suspect the vessel message queue is already able to be abused by scripts trying to use it as shared memory if they think this through. If all the CPU's Peek instead of Pop the topmost item on the queue, and the topmost item on the queue is just a LEXICON, then everyone has access to that one shared LEXICON.

@nuggreat

nuggreat commented Mar 21, 2021

The message system does do a deep copy and pass by value only. Heck, because it is passed by way of a serialized string, if you pull the :CONTENT several times from a message, each time you will get a new instance of the thing with the original values from the message. Hence why I was saying that using this already existing code for moving things by value between cores would avoid the reference issues.

This was the test case for a single core, but by commenting out parts of it and changing the values you can also use it across cores, and there was no reference bleed that I could make happen.

SHIP:MESSAGES:CLEAR.
LOCAL foo IS LIST(1).

SHIP:CONNECTION:SENDMESSAGE(foo).

WAIT UNTIL NOT SHIP:MESSAGES:EMPTY.
LOCAL foo2 IS SHIP:MESSAGES:PEEK():CONTENT.
LOCAL foo3 IS SHIP:MESSAGES:PEEK():CONTENT.

foo2:ADD(2).
foo3:ADD(3).

PRINT foo2. // prints: list(1,2)
PRINT foo3. // prints: list(1,3)
PRINT foo2 = foo3. // prints: False

@Dunbaratu
Member Author

Dunbaratu commented Mar 21, 2021

@nuggreat - Ahh I see it now. Again, you know the system better than I do even though I look directly at the code. I can see now the MessageQueue borrows the serializer routines when pushing, peeking, and popping. (Non-serializable objects cannot be put on the queue for this reason).

If we're willing to accept the limitation that only serializable things can be shared, then a single writer/multiple reader model could be done. It does seem like a lot of overhead to pass everything through the serializer, but it would force all references to be broken off and the data copied instead (and people could be told to keep their parallel algorithms rather coarse-grain - in the sense that it's going to make your frame rate chunky if you try to share lots of little things back and forth frequently.)

I still don't like being forced to use the message queue for this, as that does force a certain order to the data (can't read the data out of order). So a type much like MessageQueue that does the same thing (serialize and deserialize when accessing elements) but is a wrapper around LEXICON rather than a wrapper around QUEUE might work fine for random-access things out of order.

(Then again a shared LEX may encourage people to NOT store their own copies, which is kind of a bad thing really given how much overhead there would be under the hood if every access of one of the items of that LEX does a new copy-by-value of that whole entire item, through the serializer, every time. i.e. if the "foo" item in the shared memory was a list and another CPU did for x in range(0, otherCPU:sharedlex["foo"]:length) { print otherCPU:sharedlex["foo"][x]. }, it would be generating a new copy of the entire list via serialization on every iteration of that loop then just throwing it away, and that would not be something a user would intuitively expect would be happening here. Having it in a queue would force people to have to make their own copies as they consume through the queue since they can't go back and revisit the items a second time.)
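For instance, under that proposal the cheaper pattern would be to pull the item across once and iterate the local copy (otherCPU:sharedlex is still only a proposed suffix here):

// One serialize/deserialize copy up front, instead of one per loop iteration:
local myCopy is otherCPU:sharedlex["foo"].
for x in myCopy {
  print x.
}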

@mgalyean

As long as the overhead is "under the hood" and doesn't unrealistically affect the number of opcodes used (that is, to the user it appears to be implemented using something faster than serialize/deserialize), I don't think that overhead would be an issue. But I'm not planning on sharing 10,000 point aero Cd lookup tables, so others' mileage may vary. If I were to use such a large data structure, it would be local on a single core. Unfortunately we can't know how people will use this, but limitations lead to creativity and smarter programming passes less data around, so.... I know what I plan to do, but that could be completely different from others. I just want to divvy up some background processes, like managing panel deployments, managing thruster balance as the craft changes (not just fuel, but deploying sats etc), handling sequences, like a docking, etc. At some point you have to decide whether to sell lengths of rope long enough to hang oneself with. The folks who need a long rope and have no intention of hanging themselves are usually the ones to have the more coherent arguments.

@nuggreat

At the end of the day, Dun, you are going to be the person implementing and maintaining this system, so go with what you think would be the better system.

Why I was looking at running the shared data through the serialization system was that it would serve as a method to break references and block delegates as well as it being code we have on hand that has already been proven.

As for the sharing method, IF we are going with serialization then a core suffix makes sense. Having the shared structure be a lexicon also makes sense to me. One possible way to mitigate against someone doing for x in range(0, otherCPU:sharedlex["foo"]:length) { print otherCPU:sharedlex["foo"][x]. } would be to impose a WAIT 0 on each call to :sharedlex, similar to what we do with SHIP:BOUNDS and STAGE. Also, more or less no matter what, to do kOS multi-threading you need to give the cores fairly coarse-grain work simply due to how they work.

@Dunbaratu
Member Author

The imposed wait 0 is a great idea to force people to only use this in a coarse-grain way.

Why I was looking at running the shared data through the serialization system was that it would serve as a method to break references and block delegates as well as it being code we have on hand that has already been proven.

It makes perfect sense why you were talking about the message queue now that I see it causes a deep copy indirectly by using serialization. I didn't realize it did that originally, and didn't think we had any such feature in the code, which is why I thought trying to enforce pass by value was going to be a pain. I thought I was going to have to run through and make copy constructors for everything and test all that.
