defstruct macro #12

Open
rutenkolk opened this issue Sep 2, 2024 · 6 comments

@rutenkolk
Contributor

rutenkolk commented Sep 2, 2024

As per #9 (comment), i found myself effectively working on a defstruct macro.
It's probably best to not resurrect the old pull request, so i wanted to move the discussion here.

I want to gather some feedback before actually making a new pull request.

Features / Properties I would expect of defstruct:

  • generate a class that native struct data can be marshalled into and out of quickly
  • unroll primitive reads/writes
  • support vector and hashmap serialization
  • generated classes should support most clojure functions / be easily transformed into vectors or maps.
  • be as fast as reasonably possible
  • not rely on defalias

open questions on my end:

  • how would such a class compare to e.g. defrecord?
  • what's the most performant way to have the members of the class serialize and deserialize? (possibly simply have it byte array backed?)
  • supporting the generated classes, vectors and maps comes with an overhead. i would propose to simply expose the monomorphized serialization functions as well. That way we can have both convenience and speed.

current state of my prototype:

  • unrolled writes but no reads. writes of nested structs are delegated to their respective protocols.
    • for vector / map serialization i see no way around that
    • for monomorphized serialization, i think it would be fair to assume all structs are vectors / maps / the respective generated classes
  • still relies on defalias
    • there is a problem with getting the size of a struct since mem/sizeof only works correctly with defalias-ed types.
  • the serialization protocols simply ignore the session assuming the segments won't become invalid when writing the struct

example of currently generated code:

(coffi.mem/defalias
 :raylib/Font
 [:coffi.mem/struct
  [[:baseSize :coffi.mem/int]
   [:glyphCount :coffi.mem/int]
   [:glyphPadding :coffi.mem/int]
   [:texture :raylib/Texture]
   [:recs :coffi.mem/pointer]
   [:glyphs :coffi.mem/pointer]]])
(clojure.core/defprotocol
 proto-serialize-Font
 (serialize-Font [obj segment]))
(clojure.core/extend-protocol
 proto-serialize-Font
 clojure.lang.IPersistentVector
 (serialize-Font
  [obj segment]
  (do
   (coffi.mem/write-int segment 0 (vector-nth obj (clojure.core/int 0)))
   (coffi.mem/write-int segment 4 (vector-nth obj (clojure.core/int 1)))
   (coffi.mem/write-int segment 8 (vector-nth obj (clojure.core/int 2)))
   (serialize-Texture
    (vector-nth obj (clojure.core/int 3))
    (coffi.mem/slice segment 12 20))
   (coffi.mem/write-address segment 32 (vector-nth obj (clojure.core/int 4)))
   (coffi.mem/write-address
    segment
    40
    (vector-nth obj (clojure.core/int 5)))))
 clojure.lang.IPersistentMap
 (serialize-Font
  [obj segment]
  (do
   (coffi.mem/write-int segment 0 (:baseSize obj))
   (coffi.mem/write-int segment 4 (:glyphCount obj))
   (coffi.mem/write-int segment 8 (:glyphPadding obj))
   (serialize-Texture (:texture obj) (coffi.mem/slice segment 12 20))
   (coffi.mem/write-address segment 32 (:recs obj))
   (coffi.mem/write-address segment 40 (:glyphs obj)))))
(clojure.core/defmethod
 coffi.mem/serialize-into
 :raylib/Font
 [obj _struct segment _session]
 (serialize-Font obj segment))

There is definitely more work i want to put into this, but i'm interested in feedback on the trajectory of this.

@IGJoshua
Owner

IGJoshua commented Sep 2, 2024

Hey, thanks for working on this, I'm super glad that you're working towards contributing more!

I think your list of features expected is good, although I would add the caveat that I would prefer that only map-interfaced objects be used in the serialization and deserialization of structs.
I feel like the associative structure is just more clear in this way, and if we want positional structures I'd want to just call that a tuple, which will layout-wise be the same as a struct, except it will use a vector as the clojure type.

Doing it this way will remove the need to have dynamic dispatch to determine which function to use, instead a single serialize and deserialize function can be defined, and used as the implementations of the multimethods.

This would also allow for higher performance usage because it leaves an option up to the user to define a record type which gives fast access to the fields which can reduce the cost of serialization. For deserialization I think it would be a good idea to think some about what the interface could look like to allow deserialization both as a map and as a record based on arguments they pass to the macro.

I think that a dedicated class is unlikely to provide a large performance benefit over a record as long as field access is being used rather than the ILookup methods, but using ILookup would allow both maps and record types to be serialized, which I think is likely worth it.
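
A rough sketch of the ILookup point, under loud assumptions: java.nio.ByteBuffer stands in for a native memory segment, and FontHeader, write-font-header!, read-font-header and new-buffer are hypothetical names made up for illustration, not coffi API. It only covers the three leading int fields of the :raylib/Font example.

```clojure
(import '(java.nio ByteBuffer ByteOrder))

;; Hypothetical record mirroring the first three int fields of :raylib/Font.
(defrecord FontHeader [baseSize glyphCount glyphPadding])

(defn new-buffer
  "Allocates a 12-byte buffer in native byte order (stand-in for a segment)."
  []
  (.order (ByteBuffer/allocate 12) (ByteOrder/nativeOrder)))

(defn write-font-header!
  "Writes the three int fields of `obj` at fixed offsets into `buf`.
  Only keyword lookup is used, so both maps and records are accepted."
  [^ByteBuffer buf obj]
  (.putInt buf 0 (int (:baseSize obj)))
  (.putInt buf 4 (int (:glyphCount obj)))
  (.putInt buf 8 (int (:glyphPadding obj)))
  buf)

(defn read-font-header
  "Reads the fields back into a map and passes it to `ctor`, which decides
  whether the caller gets a plain map or a record."
  [^ByteBuffer buf ctor]
  (ctor {:baseSize     (.getInt buf 0)
         :glyphCount   (.getInt buf 4)
         :glyphPadding (.getInt buf 8)}))
```

Passing identity as ctor yields a map, and passing map->FontHeader yields a record, which is one possible shape for the "map or record based on macro arguments" interface.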

To serialize/deserialize in as performant a way as possible I think the baseline is to start by calling the read/write primitive functions with offsets. I think there's some room to explore how to deal with struct and other types. Using the deserialize-from and serialize-into functions is an acceptable fallback. I suspect that the function could also be created with an inline definition as well. It might be a good idea to try using a registry of vars that do composite object serdes, mapping from the type name keyword to the vars with the serde functions in them/the symbols that name them.

If we go the registry route, which atm I think is a good direction to explore, that would open up a way to make sure that other macros for composite types have easy access to functions which do performant serdes, without needing the different macros to directly coordinate with each other.

I'm not currently in favor of trying to have native-backed memory with clojure interfaces just because it's hard to provide an immutable interface, and I think that if you are working in a situation where the serde costs dominate the time spent in clojure-land between native calls it's likely advisable to just work directly with the read/write primitive functions. I think it's valuable to make that usecase easier to work with (e.g. by adding an offset-of function that allows slicing to subfields of a segment-backed struct value), but I don't know if the defstruct macro is the best place to try to cater to that particular need.
That said, if you experiment that direction and are able to come up with something that presents an interface that feels native to clojure and is more performant than the alternative, I'll be happy to include it.
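
To make the offset-of idea a bit more concrete, here is a minimal sketch in plain Clojure. Everything in it is an assumption for illustration: align-up, field-offsets, offset-of and the [name size alignment] triples are hypothetical and not part of coffi, and the sizes assume a typical 64-bit C ABI (the 20-byte size of :texture is taken from the slice in the generated code above).

```clojure
(defn align-up
  "Rounds `offset` up to the next multiple of `alignment`."
  [offset alignment]
  (* alignment (quot (+ offset (dec alignment)) alignment)))

(defn field-offsets
  "Takes [name size alignment] triples in declaration order and returns a
  map from field name to byte offset, padding each field to its alignment."
  [fields]
  (first
   (reduce (fn [[offsets offset] [field size alignment]]
             (let [start (align-up offset alignment)]
               [(assoc offsets field start) (+ start size)]))
           [{} 0]
           fields)))

(defn offset-of
  "Byte offset of `field` within the struct described by `fields`."
  [fields field]
  (get (field-offsets fields) field))

;; Assumed sizes and alignments for the :raylib/Font example.
(def font-fields
  [[:baseSize 4 4] [:glyphCount 4 4] [:glyphPadding 4 4]
   [:texture 20 4] [:recs 8 8] [:glyphs 8 8]])
```

With these assumptions (offset-of font-fields :recs) comes out as 32, which agrees with the offset used by write-address in the generated serializer.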

I'm also curious what you mean about size-of only working on defalias'd types? The multimethod that powers size-of is c-layout, which has implementations provided for all primitive-backed types and is intended to be implemented by the user if they are making a composite type.

Anyway, thanks so much for working on this! I look forward to talking through this some more and hearing back!

@rutenkolk
Contributor Author

rutenkolk commented Sep 3, 2024

Thanks for the feedback!

I didn't expect the vector serialization to be a point of contention, so I want to provide some rationale on why i still think it might be valuable to have positional types like vectors supported for defstruct:

  • A lot of native libraries define exactly such positional-in-nature datatypes as structs, since the distinction doesn't matter as much in languages like C, where you can simply treat a pointer to the struct as if it was an array anyway (with the implicit assumption of no padding). Carrying that over sounds like good ergonomics to me.

  • Sometimes the distinction between positional and element-based type isn't clear and one might want to treat it as both

I've been using raylib as a case study, since it's a one header library. But here too one can find things like vector types or matrices defined as structs. And a canonical translation of those types to coffi would result in something like this:

(coffi.mem/defalias
 :raylib/Matrix
 [:coffi.mem/struct
  [[:m0 :coffi.mem/float]
   [:m4 :coffi.mem/float]
   [:m8 :coffi.mem/float]
   [:m12 :coffi.mem/float]
   [:m1 :coffi.mem/float]
   [:m5 :coffi.mem/float]
   [:m9 :coffi.mem/float]
   [:m13 :coffi.mem/float]
   [:m2 :coffi.mem/float]
   [:m6 :coffi.mem/float]
   [:m10 :coffi.mem/float]
   [:m14 :coffi.mem/float]
   [:m3 :coffi.mem/float]
   [:m7 :coffi.mem/float]
   [:m11 :coffi.mem/float]
   [:m15 :coffi.mem/float]]])

which i really wouldn't want to treat like a map on the clojure side. You could make the argument this maybe shouldn't be a "struct" in the first place, but even here, the order of member names communicates row-major vs column-major layout, so I wouldn't want to lose that information either. So a "tuple" type would be unsatisfactory in this case too. i don't think making that strict distinction would then cover all use cases.

As another example, for types like Vector4 it's even less clear-cut, since sometimes you would want to treat it as a map with access to the single members via their name and sometimes treat it as.. well, a vector.

I'm not hard set on this, but i wanted to point out why i think there are cases where this flexibility might be desirable (and a pain to implement with ok performance over and over and then also maintain it.)

I agree that defrecord is probably a good place to start and then see if any improvements can be made.
In that spirit, I'll get that up and running and look to having the map-like interface working.

I don't think I fully follow with the registry idea. In particular how would creating those vars differ from a protocol or multimethod?

Regarding defalias: As you said i could have implemented c-layout, but generating defalias was much quicker and less error prone for the start and I also didn't have to generate everything else at once. But yeah, i'm not planning on keeping it around.

Edit:
I agree that the native-backed memory option is probably silly, but i just wanted to throw it out there, since i'm not exactly sure what the jvm does or doesn't like yet with respect to (de)serialization. i was hesitant to just use defrecord because it generates extra fields that increase the size of the type, but i agree on it probably being a good starting point.

@IGJoshua
Owner

IGJoshua commented Sep 3, 2024

Those are all good points! My response is gonna be big so I'll use some headings to keep it organized.

Matrix/Vector Types

So I agree that types like raylib's Matrix type are not going to be particularly pleasant to work with as either a vector or a map, and that something like a Vector4 would also be a nice thing to be able to use in both ways.

I think the only real difference in opinion here is where "in the stack" those niceties belong.

Personally I'm of the mind that Matrix and Vector are both instances of what I'd consider a special case, where data is neither strictly positional nor strictly map-like. I think there's something to be gained from trying to figure out what is common between those types and others which also have this sort of structure and to try to make something reusable that will be appropriate for types like these.

What I'm mostly concerned about here is that I think that defstruct has a pretty good case for working on the most common struct layouts and those are well-represented with maps. Since the goal of coffi is to make native code feel like calling pure Clojure though, I think an important step before expanding defstruct to support these types with strange layouts is to determine what would be an appropriate clojure interface for them.

Personally, for the Matrix type I would expect an interface using get-in with numeric indices to represent the two dimensions, but I can't immediately think of how I would make the macro understand that without the wrapper library needing to make some code to postprocess what coffi spits out. Likewise with the Vector4 case I'd think you would want a record type with a custom ILookup implementation which accepts both numeric indices and also :x, :y, :z, :w, and perhaps rgba as well, and again I'm unsure how you would do this without postprocessing what comes out. I'm open to ideas though.

Maybe all this means though is that there needs to be a place in the macro to provide the user with some code to postprocess the data after it's deserialized into a map/record? Then with the matrix you could put it into nested vectors, and with the vector you could construct a new value which implements ILookup appropriately? I'm not sure if that's putting too much on the library users though.
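
As a sketch of what such postprocessing could look like on the wrapper side (all names here are hypothetical, nothing is generated by coffi): a Vector4 type whose ILookup accepts numeric indices as well as :x :y :z :w and :r :g :b :a, and a function that reshapes a deserialized matrix map into nested vectors for get-in. The matrix part assumes the declaration order shown earlier lists the first row first, so that element (row, col) lives in field m(row + 4*col).

```clojure
;; Hypothetical value type: one set of fields, three ways to address them.
(deftype Vector4 [x y z w]
  clojure.lang.ILookup
  (valAt [this k] (.valAt this k nil))
  (valAt [_ k not-found]
    (condp contains? k
      #{0 :x :r} x
      #{1 :y :g} y
      #{2 :z :b} z
      #{3 :w :a} w
      not-found)))

(defn matrix->rows
  "Reshapes a map with keys :m0 .. :m15 into a vector of four row vectors,
  so that (get-in rows [row col]) addresses a single element."
  [m]
  (vec (for [row (range 4)]
         (vec (for [col (range 4)]
                (get m (keyword (str "m" (+ row (* 4 col))))))))))
```

The condp over sets is chosen for readability rather than speed; a real implementation would likely want something cheaper on the lookup path.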

Existing Struct Serdes

The other thing we'd be getting into at this stage if we opened the defstruct macro up to alternate representations like vectors etc. (and part of why I originally wanted to stick to maps) is that it would also require the serdes defined for the ::mem/struct type to be expanded to allow the same things, because I had wanted defstruct with a struct definition to have the same behavior as defalias with a struct definition, just with better performance.

Right now ::mem/struct only supports going to and from objects with a map interface.

EDIT: I want to clarify, I'm OK with adding more ways to serde objects, what I was getting at with the past two sections was that I want to think carefully about those new ways, and think about whether they should be under the same name or a new name. I want other creative ways to serde things, it's just a question of where to put it.

Registry

The difference between the registry idea and a multimethod/protocol is just when dispatch happens. If you have a registry available at the top level that maps from type names to the symbols you use to serde that type, then it means you can generate code which includes that symbol in the resulting code, and so the polymorphic dispatch is happening at macroexpansion time. This is a more generic version of what I've been doing with the primitive types in the insn codegen stuff, where I'm looking up the primitive typename and then using that to find the specific asm instructions needed to be called to coerce to the appropriate object type.

When doing a multimethod or protocol though, it's doing runtime dispatch between all the available methods, with protocols ofc being much faster than multimethods.

This optimization probably won't make a huge difference to most code, but in hot loops it could make a big difference.
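
A minimal sketch of the macroexpansion-time dispatch described above. The names serde-registry, register-serde!, serialize-as, serialize-point and :demo/point are all hypothetical, invented for illustration, and the "serialization" just returns a vector instead of writing to native memory.

```clojure
(def serde-registry
  "Maps a type keyword to the fully qualified symbol of its serialize fn."
  (atom {}))

(defn register-serde!
  "Records `serialize-sym` as the serializer for `type-kw`."
  [type-kw serialize-sym]
  (swap! serde-registry assoc type-kw serialize-sym))

;; Stand-in serializer; a real one would write into a memory segment.
(defn serialize-point [obj]
  [(:x obj) (:y obj)])

(register-serde! :demo/point `serialize-point)

(defmacro serialize-as
  "Expands to a direct call of the function registered for `type-kw`.
  The registry lookup runs while the macro expands, so the emitted code
  contains the serializer's symbol and no dispatch remains at run time."
  [type-kw obj]
  (if-let [sym (get @serde-registry type-kw)]
    `(~sym ~obj)
    (throw (ex-info "No serde registered" {:type type-kw}))))
```

Here (serialize-as :demo/point p) expands to a plain call of serialize-point, with the caveat that the registration has to be evaluated before any form using the macro is compiled.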

defalias/c-layout

Yeah, I have no issue with how you were implementing it before, I was just puzzled at the statement you made, since the canonical way to define new types without defalias is to either define primitive-type, serialize*, and deserialize*, or to define c-layout, serialize-into, and deserialize-from, and if you do all of either of those groups of multimethods you should get size-of to work just fine.

I see what you meant now though, you were playing around with defalias to avoid having to define all three multimethods at once.

@IGJoshua
Owner

IGJoshua commented Oct 2, 2024

Hey @rutenkolk, I just wanted to check in to see how things have been going and whether you've been able to work on this at all? I'm getting some more time that I might be able to use to work on coffi and I was wondering if you had made what you have so far public anywhere.

@rutenkolk
Contributor Author

oops, yeah, i'll get to creating a proper fork. i've been hacking on it on and off in a private repo, i'll need a few more days but i'll get to it

@rutenkolk
Contributor Author

if someone wants to see what i'm doing, i've made progress today. i'll open the PR probably tomorrow.

i just realized i coded myself into a corner and arrays are dysfunctional, so i have to rip that open again, even before a draft PR
