
Define a programming model/API for Event Sourcing #343

Closed
sergeybykov opened this issue Apr 17, 2015 · 143 comments

@sergeybykov
Contributor

It would help to have a general programming model for using Event Sourcing as an approach to managing grain state. Likely that would be an alternative or an extension to the declarative persistence model. The goal is to have a unified model that would allow different implementations (and against different data stores), so that an app can pick and choose which ones to use.

@sebastianburckhardt
Contributor

My current understanding is that Event Sourcing can be easily supported on top of the DataGrain API we have been discussing, which can in turn be done on top of various backplane services. Do you have links to particular samples that use the event sourcing pattern?

@richorama
Contributor

As a (simpleton) developer I'd like to implement an event sourced grain like this:

[EventStoreProvider("TableStorage")]
public class MyEventGrain : EventGrain, IMyGrain
{
  int balance;
  public async Task Add(int value)
  {
    this.balance += value;
    await this.Commit();
  }
}

When Commit is called (it resides on the EventGrain base class), instead of persisting the current state (as a storage provider does) we record the method call, i.e. we record that Add was called and the values of its arguments (or, if you prefer to think of this in an actor context, we record the incoming message). If a method does not call Commit, the event is never written (providing the ability to support 'read-only' methods).

When the grain is activated, the events recorded for that grain are then replayed by calling the methods again in sequence (although the Commit method would be disabled). This results in the grain's internal state being restored to where it was.
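A minimal, framework-independent sketch of this record-and-replay idea (Python for illustration; `EventGrain`, `invoke`, `commit`, and `activate` are all hypothetical names, not the Orleans API):

```python
class EventGrain:
    """Toy base class: journals method calls on commit, replays them on activation."""
    def __init__(self):
        self.journal = []          # recorded (method_name, args) entries
        self._replaying = False
        self._current_call = None  # the call in flight, journalled only if commit() runs

    def invoke(self, method, *args):
        # Entry point for incoming calls: remember the call so commit() can
        # journal it, then dispatch to the actual method.
        self._current_call = (method, args)
        getattr(self, method)(*args)
        self._current_call = None

    def commit(self):
        # During replay, commit is disabled; otherwise record the current call.
        if not self._replaying:
            self.journal.append(self._current_call)

    def activate(self, journal):
        # Replay recorded calls in sequence to restore the in-memory state.
        self._replaying = True
        for method, args in journal:
            getattr(self, method)(*args)
        self._replaying = False
        self.journal = list(journal)

class MyEventGrain(EventGrain):
    def __init__(self):
        super().__init__()
        self.balance = 0

    def add(self, value):
        self.balance += value
        self.commit()              # read-only methods simply never call commit()

g = MyEventGrain()
g.invoke("add", 5)
g.invoke("add", 3)

g2 = MyEventGrain()
g2.activate(g.journal)             # state restored by replaying the journal
assert g2.balance == 8
```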

Note that the grain interface is not concerned with how this grain is implemented (whether it uses event sourcing or not).

I'm not sure what to do if the grain makes calls to other grains (this sounds like a bad idea).

I'm not sure if you would want to support a 'roll-up' or 'snapshot' of the state, to cater for grains with a large number of events.

Another consideration is whether you would want to allow a grain (or a system) to restore to a previous point in history (i.e. play back all the events up to a certain point).
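Both the snapshot and the point-in-time ideas fall out of treating state as a fold over events. A rough sketch under that assumption (nothing here is a real API):

```python
def rebuild(apply, initial_state, events, snapshot=None, up_to=None):
    """Rebuild state from an ordered event list.

    apply(state, event) -> state folds one event into the state.
    snapshot, if given, is a (version, state) pair: only events after that
    version are replayed (the 'roll-up' case). up_to replays history only
    to that point (the 'restore to a previous point' case).
    """
    if snapshot is not None:
        version, state = snapshot
    else:
        version, state = 0, initial_state
    for event in events[version:up_to]:
        state = apply(state, event)
    return state

# Toy usage: balance deltas as events, snapshot taken after the first two.
apply = lambda balance, delta: balance + delta
events = [10, -3, 7, 5]
assert rebuild(apply, 0, events) == 19                    # full replay
assert rebuild(apply, 0, events, snapshot=(2, 7)) == 19   # 10 - 3 already folded
assert rebuild(apply, 0, events, up_to=3) == 14           # history up to event 3
```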

@sergeybykov
Contributor Author

@sebastianburckhardt We already have at least four different attempts to bring ES to Orleans: by @jkonecki, @yevhen, @ReubenBond and you. The purpose of this issue is to try to come up with a programming model/API that would work for everyone while allowing for pluggable implementations.

I'm not sure what to do if the grain makes calls to other grains (this sounds like a bad idea).

@richorama I am offended you said calling other grains is a bad idea! :-)
Seriously though, I read it as you are concerned about side effects. Rightfully so, of course.

This is where it's not clear to me if we should

  1. view ES in the narrower scope of state updates, e.g. by providing a different persistence API where instead of the property bag and WriteStateAsync() would be a bunch of semantic update operations plus Commit(). This is roughly the direction @sebastianburckhardt and @jkonecki took. With this approach we are not concerned about side effects because we don't replay grain calls, and only reconstitute the persistent state object.

or

  2. use it as a 'grain calls log', which is my understanding of @ReubenBond's and @yevhen's approach and your example above.

It is also very possible that I'm too confused by all these similar but different stories.

@sebastianburckhardt
Contributor

I think we can agree that at some point, the runtime has to be able to serialize/deserialize (1) the grain state, and (2) an update descriptor. Requirement (1) was always there for persistent grains, but requirement (2) is new.

The question is if this should be automatic (done by compiler or at runtime using reflection), or under explicit programmer control. Today, the persistence mechanism is not automatic. The programmer explicitly provides a serializable class that defines the grain state. There are advantages and disadvantages to that.

If we are doing the same thing to serialize updates, i.e. go with explicit programmer control, the user would write serializable classes to define all the update operations. I show an example below. There is a bit more redundancy in this code than I like, unfortunately. The cool part is that since updates are now first-class, we can support very useful and powerful APIs involving updates and update sequences (e.g. we can query which updates are currently still pending and not yet confirmed, we can get an IEnumerable of all updates, and we can subscribe to a stream of updates).

[Serializable]
class MyGrainState
{
   public int Balance {get; set;}
}

[Serializable]
class MyAddOperation : IUpdates<MyGrainState>
{
   public int Amount { get; set; }
   public void Update(MyGrainState state)
   {
      state.Balance += Amount;
   }
}

class MyGrain : DataGrain<MyGrainState>
{ 
    public async Task Add(int value)
    {
         await QueueUpdate(new MyAddOperation() { Amount = value });
    }

}
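With updates as first-class objects, the confirmed state is just a fold of `Update()` over the serialized update log. A rough Python analogue of the C# sketch above (names mirror it; nothing here is a real Orleans API):

```python
from dataclasses import dataclass

@dataclass
class MyGrainState:
    balance: int = 0

@dataclass
class MyAddOperation:
    """Mirrors the C# MyAddOperation : IUpdates<MyGrainState> above."""
    amount: int

    def update(self, state):
        state.balance += self.amount

def confirmed_state(update_log):
    # The current state is derived by folding each update into a fresh state.
    state = MyGrainState()
    for u in update_log:
        u.update(state)
    return state

log = [MyAddOperation(5), MyAddOperation(7)]
assert confirmed_state(log).balance == 12
```

Because the log is an ordinary sequence of objects, the "powerful APIs" mentioned above (enumerating updates, inspecting pending ones, streaming them) come nearly for free.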

@ReubenBond
Member

Some people were confused by my Event Sourcing implementation. I had a chat with @gregoryyoung, which hopefully cleared up most of that confusion and also helped to refine the implementation in my mind.

Framework support for Event Sourcing is a great idea! I am very confident that we can come up with an API which is sound, easy to use, and (hopefully) makes it obvious when it's being abused.

Here is one idea: we have command methods which do not directly emit events, event methods which do directly emit events, and query methods which are read-only.

Command methods take their arguments, perform required computation on them, collect whatever ambient information is required, and then call Event methods. Command methods are not replayable.
Here is an example of a very basic Command method:

public async Task PlaceOrder(Order order, User user)
{
  // Gets the taxation rate at the time the order is placed.
  var tax = await taxActor.GetApplicableTax(order, user.Address);
  await this.OrderPlaced(order, user, tax);
}

Event methods perform validation of their arguments, emit the event, and then apply the event.
Event methods are replayable. Validation is only performed once: before the event is persisted. During replay, the event is not re-validated.
Here is an example of an Event method:

public Task AddedUser(User user)
{
  return this.Commit(
    validate: () =>
    {
      if (user == null)
      {
        throw new ArgumentNullException("user");
      }
    }, 
    apply: () => this.users.Add(user.Id));
}

Of course, if no validation is required, that method becomes very terse:

public Task AddedUser(User user)
{
  return this.Commit(() => this.users.Add(user.Id));
}

Query methods are exactly as they are today. Here is a Query method:

public Task<int> GetUserCount()
{
  return Task.FromResult(this.users.Count());
}

Alright, so those are the kinds of methods which we can have, but what if you need to emit multiple events atomically? @gregoryyoung gives an example of a stock trading platform where:

  1. An order is placed for 1000 shares
  2. The order is partially filled (200 shares)
  3. Another order is placed for the remaining 800 shares

Events 2 & 3 should be committed atomically. I propose we do this within the scope of a using block within a command method. This could even be nested (but let's not get into that).
Example of committing multiple events (pseudo code):

using (var tx = this.CreateEventTransaction())
{
  await this.TradeOccurred(buyer, seller, symbol, volume, price);
  await this.PlaceOrder(buyer, symbol, remainingVolume, price);
  await tx.Commit();
}

This is surprisingly easy to implement: each call to CreateEventTransaction pushes an in-memory IEventJournal to the grain's stack of event journals. tx.Commit() commits the events in tx's journal to the previous journal. The in-memory journal's Dispose() method pops it from the stack.
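The stack-of-journals mechanics can be sketched in a few lines (Python for illustration; the class and method names are invented, not the proposed API):

```python
class EventTransaction:
    """Context manager standing in for CreateEventTransaction()/Dispose()."""
    def __init__(self, grain):
        self.grain = grain
        self.journal = []                    # in-memory journal for this transaction

    def __enter__(self):
        self.grain.journals.append(self.journal)  # push onto the journal stack
        return self

    def commit(self):
        # Move this transaction's events into the enclosing journal as one unit.
        parent = self.grain.journals[-2]
        parent.extend(self.journal)
        self.journal.clear()

    def __exit__(self, exc_type, exc, tb):
        # Dispose pops the journal; anything left uncommitted is discarded.
        self.grain.journals.pop()

class Grain:
    def __init__(self):
        self.persistent_log = []             # bottom of the stack: the durable log
        self.journals = [self.persistent_log]

    def emit(self, event):
        self.journals[-1].append(event)      # events go to the topmost journal

    def create_event_transaction(self):
        return EventTransaction(self)

g = Grain()
with g.create_event_transaction() as tx:
    g.emit("TradeOccurred")
    g.emit("PlaceOrder")
    tx.commit()                              # both reach the durable log together
assert g.persistent_log == ["TradeOccurred", "PlaceOrder"]

g2 = Grain()
with g2.create_event_transaction():
    g2.emit("TradeOccurred")                 # no commit: rolled back on dispose
assert g2.persistent_log == []
```

Nesting works the same way: an inner transaction's `commit` merely promotes its events to the journal one level down.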

Let me know what you think. We will discuss versioning of event handlers as well as event method arguments.
We can also discuss views/projections, which can very nicely plug into Orleans' streaming system 😄!

Oh, also, we can consider creating Roslyn Code Analyzers to guide users onto the right track (e.g., event methods should have exactly one statement: this.Commit(...)). This can also help us warn users when they're performing side effects in their apply call.

Commands and events are identical if the command performs no computation and requires no validation (I think this is what made @yevhen decide I had implemented command sourcing).

Message sourcing can be implemented also: you Commit() query methods. The apply parameter in Commit(...) can return a value, but that's not necessary.

Very excited!

@gabikliot
Contributor

I think I like it a lot.
Basically, the way I understood what you wrote, one can have full flexibility: you decide what should be sourced (journalled) and what not. You may have all methods journalled, or only some. You can emit multiple events in a method, as a transaction or not, or none.
The only restriction is that whatever you journal must not have side effects - it only performs transformations on the in-memory state (or zero transformations and returns a value), plus optional validation.
If I understood it correctly, then I like it! I wonder, however, what @yevhen has to say. :-)

Did not understand the connection between views/projections and Orleans' streaming, but that is orthogonal.

I am also super interested in: "consider creating Roslyn Code Analyzers to guide users onto the right track (eg, event methods should have exactly one statement: this.Commit(...))."
If it is possible to create compile-time validation of certain complex, context-aware patterns, this would be super useful in a lot of places unrelated to ES! One example is checking that grain code never calls Task.Wait or Task.Result. We have more examples like this.

@ReubenBond
Member

Your understanding is correct.

I'm trying to keep from discussing everything at once, but basically the event journal could be subscribed to. Subscribers would implement the same interface as the grain (or at least the journalled part of it). That way, the subscriber can handle each of the events in a natural, .NET-native manner. Does that make sense? Does it sound like a good approach?

Regarding Roslyn Code Analyzers, I'm referring to this: https://channel9.msdn.com/Events/dotnetConf/2015/NET-Compiler-Platform-Roslyn-Analyzers-and-the-Rise-of-Code-Aware-Libraries
I'm not certain how feasible Code Analyzers are, yet - it might be a pipe-dream.

Edit: more detail on analyzers: https://msdn.microsoft.com/en-us/magazine/dn879356.aspx

@gregoryyoung

Read the views thread on akka.persistence.



@gabikliot
Contributor

Got it about looking at the journal as a stream of events. Nice! Thanks @gregoryyoung, will do. This one: http://doc.akka.io/docs/akka/snapshot/scala/persistence.html?

@ReubenBond
Member

If that's the thread, thank you - we could integrate the notion of stream hooks into our model. We would provide a handler which is executed on each journalled message.

That gives users a chance to emit the event to a stream of their choosing - that allows for heterogeneous partitioning, rather than forcing the partition to be per-actor. So I could fork the stream for all user actors, and have security-related events emitted to a special audit log. Or if we are message sourcing, we could copy all queries to a stream for analytics.
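A toy sketch of such a per-journalled-message hook forking security events to an audit stream (every name here is invented for illustration):

```python
def make_audit_hook(audit_stream, security_event_types):
    """Returns a hook invoked once per journalled event; security-related
    events are additionally forked to a separate audit stream."""
    def hook(event):
        if event["type"] in security_event_types:
            audit_stream.append(event)
    return hook

journal, audit = [], []
hook = make_audit_hook(audit, {"PasswordChanged", "RoleGranted"})

def journalled(event):
    # Stand-in for the runtime: append to the journal, then run the hook.
    journal.append(event)
    hook(event)

journalled({"type": "OrderPlaced"})
journalled({"type": "PasswordChanged"})
assert len(journal) == 2
assert audit == [{"type": "PasswordChanged"}]
```

The same shape covers the message-sourcing case: a hook that copies query events to an analytics stream.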

@gregoryyoung

Akka Persistence on the Query Side


@ReubenBond
Member

Alright, got it: https://groups.google.com/forum/#!topic/akka-user/MNDc9cVG1To

I'm wary of making users pay for something they might not be using, and generalized query support can be very expensive. A very simple hooking point solves the problem efficiently, but we should carefully consider how we support querying and stream discovery. I think the question of query/discovery is broader than just ES, though: people often ask how to search for grains by some property.

@witterlee
Contributor

Very happy to see a topic on Event Sourcing.
I like this point: the event can be published to an event stream, and any interested actor can subscribe to it.

...but basically the event journal could be subscribed to..

@yevhen
Contributor

yevhen commented Apr 18, 2015

I'm trying to keep from discussing everything at once, but basically the event journal could be subscribed to. Subscribers would implement the same interface as the grain (or at least he journalled part of it). That way, the subscriber can handle each of the events in a natural, .NET-native manner. Does that make sense? Does it sound like a good approach?

No, it doesn't. That will lead to code bloat in cases where subscribers are only interested in a selection of events, not the whole set.

Also, your idea doesn't mesh well with the newly introduced Streams API, which is based on message-passing. Basically, with your idea we're back to the Observer API.

@jkonecki
Contributor

I would like to start by stating that I would prefer to implement event sourcing rather than command sourcing, as described by @richorama. Command sourcing can be tricky when it comes to side effects, like communication with external services, which shouldn't happen during replay. Also, command sourcing fails to capture whether the command succeeded or not, which makes it difficult to subscribe to the stream externally.

Event sourcing has been defined many times by @gregoryyoung as the derivation of the current state from the stream of past events. As I mentioned during the first meetup, the separation of the grain state and the grain itself is very handy here. In my opinion it is the state that is event sourced, not the grain. I see the grain more as a repository of logic. I fully agree with @richorama that the usage of the event-sourced state should be as similar as possible to the current way a grain interacts with its state.

Here is @richorama's example rewritten the way I see it:

[EventStoreProvider("TableStorage")]
public class MyEventGrain : EventGrain, IMyGrain
{
  public async Task Add(int value)
  {
    // instead of this.State.Balance += value;
    this.State.RaiseEvent(new BalanceIncreased(value));

   // no change to current api
    await this.State.WriteStateAsync();
  }
}

I believe that the events should be explicitly defined and not auto-generated from a grain's method arguments. Events are a fundamental part of domain modelling and may contain more information than is passed to the method that raises them. For example, a ConfirmationEmailSent event may contain an email unique identifier returned from the EmailSending service invoked from the grain method, and not passed as an argument to the method.

One benefit of event sourcing is the ability to rehydrate the aggregate as of a certain version, for example for debugging purposes. In order to achieve this in Orleans we would need a way to obtain a reference to the grain by passing a certain version. Right now I don't want to go into the detail of whether the version is a number, string or etag. I'm just pointing out that there should be a way to obtain multiple instances of the same grain with different states. I'm guessing this will probably result in the grain reference being extended in order to differentiate between multiple instances of the same grain.

Another feature which I believe is required is the ability to publish raised events. A plugin architecture here would be perfect, with providers for EventHub, Table Queues and ServiceBus being the most obvious.
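The state-centric pattern above (RaiseEvent applies an event immediately and queues it; WriteStateAsync appends the queued events rather than writing a snapshot) can be sketched framework-free like this (Python for illustration; every name is hypothetical):

```python
class EventSourcedState:
    """Sketch: the *state* is event sourced, the grain stays a repository of logic."""
    def __init__(self):
        self.balance = 0
        self._uncommitted = []     # events raised but not yet written

    def raise_event(self, event):
        # Apply to in-memory state now; persist later on write_state().
        self._apply(event)
        self._uncommitted.append(event)

    def _apply(self, event):
        kind, payload = event
        if kind == "BalanceIncreased":
            self.balance += payload

    def write_state(self, store):
        store.extend(self._uncommitted)   # append-only event store
        self._uncommitted.clear()

    def load(self, store):
        # Rehydrate by replaying the persisted events.
        for event in store:
            self._apply(event)

store = []
s = EventSourcedState()
s.raise_event(("BalanceIncreased", 5))
s.write_state(store)

s2 = EventSourcedState()
s2.load(store)
assert s2.balance == 5
```

Note how the explicitly defined event (`BalanceIncreased`) could carry more data than the method argument that triggered it, per the point above.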

@yevhen
Contributor


+1 to @jkonecki

I'm not going to support the idea of having implicit generic events auto-generated from method signatures. It's too magical, non-conformist and non-idiomatic. It creates completely unnecessary (accidental) complexity.

Other ways exist to deal with the verbosity of declaring events, such as using a custom DSL, or simply the DSL provided by a serialization library (.proto, .bond).

@yevhen
Contributor


One benefit of event sourcing is the ability to rehydrate the aggregate as of a certain version, for example for debugging purposes. In order to achieve this in Orleans we would need a way to obtain a reference to the grain by passing a certain version. Right now I don't want to go into the detail of if the version is a number, string or etag. I'm just pointing out that there should be a way to obtain multiple instances of the same grain with different states. I'm guessing this with probably result with the grain reference being extended in order to differentiate between multiple instances of the same grain.

I think for that you don't need to go through the framework (runtime). Just create an instance of your (presumably) POCO aggregate-actor and replay its stream on it. The use case you presented should not require the runtime.

@yevhen
Contributor


@sergeybykov

  1. use it as a 'grain calls log', which is my understanding of @ReubenBond's and @yevhen's approach and your example above.

Quite the opposite. I view event sourcing as a persistence mechanism, so I'm more on the side of @jkonecki and the rest of the 100500 folks from the ES community :)

@yevhen
Contributor


@sergeybykov

See the example here: https://github.com/yevhen/Orleankka/blob/master/Source/Example.EventSourcing/Domain.cs. I don't record (log) calls to the grain; I record state changes (transitions) as events, which capture the business intent of a change.

See https://github.com/yevhen/Orleankka/blob/master/Source/Example.EventSourcing/Domain.cs#L46 as an example of something very casual - capturing more data than just an input.

@yevhen
Contributor


Guys, from most of the comments here, I can deduce that many of you do not understand what Event Sourcing actually is.

I see no point discussing it any further until we're on the same page. I don't like playing Chinese whispers.

Please take your time and watch at least the first quarter of https://www.youtube.com/watch?v=whCk1Q87_ZI; then I believe we can have a more productive discussion.

@ReubenBond
Member

@yevhen I get the feeling you haven't read the thread or the JabbR conversation. You told Greg Young to come and tell me that this is a bad idea and that this is not event sourcing and it's wrong. He and I had a chat, he indicated that yes, this library supports event sourcing. We can also do command and message sourcing.

We are not blindly recording calls to grains. We validate the arguments, we have the option to collect environmental information at the time of the event, and to perform computations on whatever data before persisting and applying the event. That is all clearly demonstrated above.

This proposal is much more in tune with the rest of Orleans, in my opinion. Your system is easily implemented using this proposal: you would have a single event handler which takes arguments deriving from some kind of base type with an Apply() method. You can consume this library and drop the static types as you prefer - just like how you consume Orleans and drop static types, making it more like Akka. It's fine to take your approach. We can support both elegantly.

It's much easier to throw away statically checked types than to keep them. Once you throw them away, they're gone.

I've watched Greg's videos, but thanks for the link. This one is good, too.

@veikkoeeva
Contributor

I'm uncomfortable using event sourcing terms and I'm somewhat lost with them, but I'll note that, for what I know, this looks like either a form of CEP or a way to persist events (in a general fashion, i.e. without business-specific structure?). Taking a cue from the linked Google thread and a question by Prakhyat Mallikarjun: whatever we do, I'd like to have an easy route to business-specific queries and storage.

I think I see the value as a pattern on how to deal with events as such, especially when CEP like functionality isn't needed.

The problems I'm usually dealing with in the streaming domain are CEP-like queries: how to make them fault-tolerant, how to quickly get aggregates and initialize state, and this when there might be multiple streams to merge. The rest, like sequence identifiers, have always been business specific. For instance, sensor readings are timestamped or have a sequence ID defined by the source. Then another timestamp is applied when the event is received on the server, and at this point their relative order, from a business perspective, doesn't matter, as only the timestamps are looked at (and perhaps corrected for known anomalies). Considering events flowing from a metering field: they come through multiple socket ports on multiple servers (applying a consistent sequence number doesn't look sensible), get routed to the same event stream, and are persisted. The ack for receiving an event is given when it's on a durable store (just after receiving it, or when persisted to a db).

What I'd like to do, many times, is to calculate aggregates like sums or cumulative sums on the fly, persist these alongside the data storage, and use them to initialize the streams quickly upon recovering from interruptions. There might be some heavier analytics and/or data enrichment done at the data storage, which would be nice to join back into the event stream.
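That aggregate-checkpoint idea can be sketched generically (this is only an illustration of the mechanism, not anything Orleans provides):

```python
def fold_with_checkpoints(readings, checkpoint_every):
    """Running cumulative sum over a stream, persisting periodic checkpoints
    so that recovery does not have to re-fold the whole history."""
    total, checkpoints = 0, []
    for i, r in enumerate(readings, start=1):
        total += r
        if i % checkpoint_every == 0:
            checkpoints.append((i, total))   # (position, aggregate) persisted alongside data
    return total, checkpoints

def resume(readings, checkpoints):
    # Initialize from the newest checkpoint and replay only the tail.
    i, total = checkpoints[-1] if checkpoints else (0, 0)
    for r in readings[i:]:
        total += r
    return total

readings = [3, 1, 4, 1, 5, 9, 2]
total, cps = fold_with_checkpoints(readings, checkpoint_every=3)
assert total == 25
assert cps == [(3, 8), (6, 23)]
assert resume(readings, cps) == 25           # fast init: replays one reading, not seven
```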

What I currently do for my own little demo is that upon ingesting data I immediately save it on the edge, then pass it along for further domain-specific processing, save that data to a domain-specific store, and then put it onto transport streams for further consumption. If further processing crashes before some other save point, I'll replay the data either from the initial save location or the domain-specific location. This is a kind of stream, the source of which can be seen in either of the two save locations. I can imagine this getting difficult with more downstream consumers and aggregation logic. For instance, if aggregates are being calculated further downstream, I might like to save them to the domain-specific store along with the events that define the aggregate.

For more "application logic", there's a good example of what I've thought about at work for some years: the Getting Sharper blog, functional event-sourcing – compose (http://gettingsharper.de/2015/02/13/functional-event-sourcing-compose/).

In any event, how I see what I'd like to have. Carry on. :)

Edit: I might add that currently I save all data explicitly, without using the Grain state mechanism.

@gregoryyoung


Perhaps metering devices are not the only use case.

I have found many threads just like this one. People tend to think of the one use case they care about and not of the many thousands that exist. You seem to suggest linearization is a dumb idea that never works; there are thousands of production systems that disagree with you. At times ordering is important. There are many ways of achieving it, and they have trade-offs.

@ReubenBond
Member

We want an API which supports {Message,Command,Event} Sourcing. We can call it our Journaling API.

Here are a few desirable properties of the API:

  1. Easy to consume: Orleans is a framework for simplifying the development of large scale, distributed systems. The API should be in-tune with this. This means enabling journaling via constructs & techniques which are familiar to what your typical .NET developer is comfortable with. If Event Sourcing is a hassle, then guess what? Developers will avoid it. We don't want that. It is our job to do the heavy lifting to make their lives easier. Every line of code we save them is a point for us. Every time we save them from a programming error, that's another point for us.
  2. Extensible: No one's arguing here. Let's take this as an opportunity to begin support for dependency injection.
  3. Support for projections: By supporting multiple views of the event log, we allow consumers to more easily build search, analytics, auditing, billing, & many other features. We should have the ability to consume the raw log as well as support statically typed event subscription.

Supporting statically typed events is easiest if an implementation of the event-sourced interface is passed to the subscribe method. That works cleanly, and you can argue for it, but @yevhen rightly points out that many of the interface methods would often go unimplemented. One option to remedy this is to let consumers decompose their interface so that they only need to implement a subset of it - allowing the processing of events from different actor types. Alternatively, they can filter the raw event stream. Alternatively, we can see what possibilities Roslyn gives us for checking type assertions at development and compile time.
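The "implement only a subset, filter the rest" option boils down to dispatching journal entries only to handlers the subscriber actually defines. A toy sketch (names invented for illustration):

```python
class UserAudit:
    """Partial subscriber: defines handlers only for the events it cares about;
    everything else in the raw journal is simply skipped."""
    def __init__(self):
        self.seen = []

    def on_AddedUser(self, user_id):
        self.seen.append(user_id)

def dispatch(subscriber, journal):
    # Route each journalled event to a matching handler, if one exists.
    for name, args in journal:
        handler = getattr(subscriber, "on_" + name, None)
        if handler is not None:          # filter: unimplemented events are ignored
            handler(*args)

journal = [("AddedUser", ("alice",)),
           ("OrderPlaced", ("order-1",)),
           ("AddedUser", ("bob",))]
audit = UserAudit()
dispatch(audit, journal)
assert audit.seen == ["alice", "bob"]
```

This avoids the code bloat @yevhen describes while keeping the handlers themselves statically typed in the C# equivalent.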

@yevhen's approach more cleanly supports projections, but it is verbose in the normal case, where users are logging and applying events. So it's a trade-off. This is basically the object / lexical closure equivalency debate: Orleans as a framework clearly leans towards objects, whereas Akka leans towards closures. Both options have merits, both have downsides, but let's do our best to be consistent.

@gregoryyoung

Since we are discussing method = message = schema: it's true they are all ways of representing schema, though they have varying sets of trade-offs. You have found one of those trade-offs so far, in dealing with projection code that needs to read events. Another that would be worth thinking about is heterogeneous models. There is not a right answer without context.

Why not just make this a choice? The same underlying "framework" code would work regardless of how you choose to represent schema; having code capable of reading schema via a type or via a method definition should be relatively trivial to support.

On a side note, "types being a hassle" as a form of defining schema is a common argument I have heard with this style (messaging in general), often from people who are new to such systems. Over time the argument tends to go away. Whether I define things through method-based schema, type-based schema, or an actual schema like .proto files makes such a tiny difference.


@yevhen
Contributor

yevhen commented Apr 19, 2015

what your typical .NET developer is comfortable with

Typical Joe doesn't build distributed systems which use event sourcing. He builds CRUD apps using his favorite RDB/ORM combo.

@richorama

As a (simpleton) developer I'd like to first gain an understanding of what event sourcing is, whether I really need it, what are the trade-offs and issues associated with it, before I even try to convince my manager.

As a (simpleton) developer, I first need to understand what distributed systems are, and whether I need to build one and take all the risks associated with it. That means I first need to understand CAP, storage options and associated trade-offs, read some or all papers listed here and learn about distributed scalable architectures alternative to my simpleton RDB/ORM combo, such as REST, EDA, CQRS, etc.

As a (simpleton) developer, before diving into a brave new world of distributed systems programming, it definitely won't hurt if I first learn crucial design & modeling principles and patterns, like DDD, NoSQL, EIP, etc. By doing that, I will avoid constantly begging authors of super-powerful distributed actor frameworks to support cross-actor (cross-partition) transactions, because I won't be modeling my actors after each row in my RDB, since at that moment I'll already have a firm understanding of what an Aggregate is.

As a (simpleton) developer, by doing my homework and not being stupid, I'll get a chance to avoid shooting myself in the foot by thinking that all of my (scalability, performance, etc.) problems could be somehow auto-magically solved by technology/framework/unicorn alone, without requiring me to make any investment in learning about the hard STUFF at all.

As a (simpleton) developer, at that moment, without any doubt, I'll be understanding that:

  • Events happen in the past. For example, "the speaker was booked," "the seat was reserved," "the cash was dispensed." Notice how we describe these events using the past tense.
  • Events are immutable. Because events happen in the past, they cannot be changed or undone. However, subsequent events may alter or negate the effects of earlier events. For example, "the reservation was cancelled" is an event that changes the result of an earlier reservation event.
  • Events are one-way messages. Events have a single source (publisher) that publishes the event. One or more recipients (subscribers) may receive events.
  • Typically, events include parameters that provide additional information about the event. For example, "Seat E23 was booked by Alice."
  • In the context of event sourcing, events should describe business intent. For example, "Seat E23 was booked by Alice" describes in business terms what has happened and is more descriptive than, "In the bookings table, the row with key E23 had the name field updated with the value Alice."
  • Event sourcing is a way of persisting your application's state by storing the history that determines the current state of your application. For example, a conference management system needs to track the number of completed bookings for a conference so it can check whether there are still seats available when someone tries to make a new booking.

Source: CQRS Journey. Introducing Event Sourcing
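The bullet points above translate directly into code. Here is a minimal, self-contained sketch (the `SeatReserved`/`ReservationCancelled` types and `Rebuild` helper are invented for illustration): events are immutable, past-tense facts, and the current state is derived by replaying the history.

```csharp
using System.Collections.Generic;

// Events are immutable facts named in the past tense.
public sealed record SeatReserved(string Seat, string Attendee);
public sealed record ReservationCancelled(string Seat);

public static class Bookings
{
    // Current state (seat -> attendee) is obtained by replaying the event log.
    public static Dictionary<string, string> Rebuild(IEnumerable<object> history)
    {
        var seats = new Dictionary<string, string>();
        foreach (var e in history)
        {
            switch (e)
            {
                case SeatReserved r:         seats[r.Seat] = r.Attendee; break;
                case ReservationCancelled c: seats.Remove(c.Seat);       break;
            }
        }
        return seats;
    }
}
```

Replaying `SeatReserved("E23", "Alice")` followed by `ReservationCancelled("E23")` leaves no booking for E23: the later event negates the effect of the earlier one, while the earlier event itself remains unchanged in the log.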

Dear Joe (typical senior assistant of junior .NET developer),
I care because you do ...

@jkonecki
Contributor

jkonecki commented Dec 8, 2015

@sebastianburckhardt I haven't looked at your repo yet so cannot comment on the QueuedGrain API. I would like to try to implement event sourcing in such a way that there is no need to derive from a JournaledGrain base class. That would allow the freedom of deriving the grain from Grain or QueuedGrain regardless of the way the state is persisted. For me event sourcing should be limited to the State and StateProvider only.

If that is achievable then all that would be needed for the QueuedGrain API is to understand that the state is event sourced and use existing events instead of update messages to synchronise the state.

Event-sourced state requires the definition of the Apply methods that transition the state - those can simply be called by the QueuedGrain API during synchronisation.
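A sketch of what such a state class might look like (hypothetical event and state types, not the actual proposal): the `Apply` overloads are the only way the state transitions, so a synchronisation layer such as the QueuedGrain API could invoke them when replaying or merging events.

```csharp
// Hypothetical events; for illustration only.
public sealed class Deposited { public int Amount; }
public sealed class Withdrawn { public int Amount; }

// Event-sourced state: all transitions go through Apply overloads.
public class AccountState
{
    public int Balance { get; private set; }

    public void Apply(Deposited e) => Balance += e.Amount;
    public void Apply(Withdrawn e) => Balance -= e.Amount;
}
```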

@yevhen
Contributor

yevhen commented Dec 8, 2015

Event-sourced state require the definition of the Apply methods that transition the state - that can be simply called by QueuedGrain API during synchronisation.

How does QueuedGrain not require the definition of similar handlers for update messages, if update messages are also diffs? Magic?

@jkonecki
Contributor

jkonecki commented Dec 8, 2015

It does - that's what I'm trying to say: with event-sourced state QueuedGrains can reuse existing state transitions; for other states those methods need to be written, as is the case right now.


@sebastianburckhardt
Contributor

I think I now understand what @jkonecki is proposing... this has the potential to be super easy to use.

Basically, you write an event sourced grain exactly as previously proposed (using JournaledGrain<JournaledGrainState<T>>, or even just Grain<JournaledGrainState<T>>) and then add a replication provider attribute.

Internally I can translate all of the JournaledGrain state interface (ReadStateAsync/WriteStateAsync/ClearStateAsync/RaiseStateEvent) into QueuedGrain API, without the user needing to know anything about the latter.
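The translation could look roughly like this. Everything here (the `IUpdateQueue` stand-in for the replication provider, the class and method names) is a sketch of the idea, not the real API:

```csharp
// Stand-in for the underlying queued/replicated log.
public interface IUpdateQueue<TEvent>
{
    void Enqueue(TEvent e); // durably appends and replicates the event
}

// Sketch: the journaled-grain surface forwards to the queue, so the
// user never has to know about the QueuedGrain API underneath.
public abstract class JournaledGrainSketch<TState, TEvent> where TState : new()
{
    private readonly IUpdateQueue<TEvent> _queue;
    protected TState State { get; } = new TState();

    protected JournaledGrainSketch(IUpdateQueue<TEvent> queue) => _queue = queue;

    protected void RaiseStateEvent(TEvent e)
    {
        _queue.Enqueue(e);   // persistence/replication happens underneath
        Apply(State, e);     // apply to the local view immediately
    }

    protected abstract void Apply(TState state, TEvent e);
}
```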

@yevhen
Contributor

yevhen commented Dec 8, 2015

nice, that hint had a proper effect ))

@sebastianburckhardt

I think I now understand what @jkonecki is proposing...

click

@jkonecki

It does - that's what I'm trying to say

you owe me a beer ;)

@jkonecki
Contributor

jkonecki commented Dec 9, 2015

I created a gist with my design for event-sourced grains - please comment on it:

https://gist.github.com/jkonecki/26b5bec619757e199e2d

@yevhen
Contributor

yevhen commented Dec 9, 2015

@jkonecki I like it! 👍

@yevhen
Contributor

yevhen commented Dec 9, 2015

Looks very functional! Immutable state, transition as left fold - very nice!
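The left-fold reading can be made concrete in a few lines (types invented for illustration): each transition is a pure function `(state, event) -> new state`, and the current state is the fold of that function over the journal.

```csharp
using System.Linq;

public sealed record Account(int Balance);
public sealed record Deposited(int Amount);
public sealed record Withdrawn(int Amount);

public static class Transitions
{
    // A pure transition: no mutation, a new state is returned each time.
    public static Account Apply(Account state, object e) => e switch
    {
        Deposited d => state with { Balance = state.Balance + d.Amount },
        Withdrawn w => state with { Balance = state.Balance - w.Amount },
        _           => state
    };
}

// Current state is the left fold of the transition over the journal:
// var current = journal.Aggregate(new Account(0), Transitions.Apply);
```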

@jkonecki
Contributor

jkonecki commented Dec 9, 2015

You can still cheat by updating the passed state and returning the same instance if you want - it's up to the developer.


@yevhen
Contributor

yevhen commented Dec 9, 2015

@jkonecki I suspect for states having a multitude of fields it might be the better choice. That makes a transition function returning a state not very convenient. It's just +1 line for return state;

@yevhen
Contributor

yevhen commented Dec 9, 2015

@jkonecki honestly, as a long-time OO aficionado, I don't like a design with a separate explicit state class. Looks anti-OO to me. I do prefer to have mutable fields directly in my actors. Having immutable state in a single-threaded actor doesn't give much benefit. Also, I don't like prepending State. everywhere - just noise.

What if instead of passing State we will be passing the Grain inside? Then there will be a choice: either to modify grain.State = new State(...), or grain.State.total = 5, or grain.total = 5. For the latter to work, total will need to be made public or the Transition class defined as nested.

@jkonecki
Contributor

jkonecki commented Dec 9, 2015

@yevhen

as long-time OO aficionado, I don't like design with separate explicit state class.

Orleans separates state from grains themselves. Do you want to have them merged into a single class? So your state properties will be declared directly inside the grain? That wasn't a part of my design as it would have a huge impact on the whole framework.

In my design for non-ES grains you just mutate the existing state (.State property) inside the grain methods. You follow by calling the SaveState extension method. The syntax is identical to the current Orleans code, with the only difference that SaveState is an extension rather than an instance method.

I actually like the separation of logic and state in Orleans...

@yevhen
Contributor

yevhen commented Dec 9, 2015

I actually like the separation of logic and state in Orleans

Sure. Just matter of taste 😄

@sebastianburckhardt
Contributor

Thanks for posting the example! Overall, looks pretty much like I expected. There is certainly no problem with supporting this exact JournaledGrain interface on top of QueuedGrain.

There are some points I think are worth discussing though. Mostly they are about reducing the amount of code users have to write for the expected common case. I would be interested in hearing what your thoughts are.

Subclasses vs. Marker interfaces: I am not sure that as a user, I would prefer to write

public class MyJournaledGrain : Grain<MyState>, IGrainJournaledState<MyJournaledState, MyJournaledStateTransition>

as opposed to just

public class MyJournaledGrain : JournaledGrain<MyJournaledState,MyJournaledStateTransition>

A journaled grain is just a specialized version of a grain... that is what subclasses are for. Why make it more complicated than necessary? Subclasses are also better for Intellisense... I had trouble occasionally with finding the extension methods because of missing using clauses.

Separate object for state transitions: Using a separate object for defining the state transitions may be o.k. for cases where you have a strong desire to emphasize the conceptual separation between event and state, but it requires users to write more code than if they put the apply function directly into the events (that is how the QueuedGrain API does it currently). It is also easy to support both mechanisms (users can choose how they want to do it).
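The "apply function directly on the event" style mentioned here can be sketched as follows. The thread states QueuedGrain currently works this way, but the exact types and signatures below are invented for illustration:

```csharp
// Each event knows how to apply itself to the state, so no separate
// transition object is needed.
public sealed class MyState { public int Total; }

public abstract class StateEvent
{
    public abstract void ApplyTo(MyState state);
}

public sealed class TotalIncremented : StateEvent
{
    public int By;
    public override void ApplyTo(MyState state) => state.Total += By;
}
```

The trade-off discussed above: this saves the user a separate transition class, at the cost of coupling each event type to the state it mutates.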

Mutable vs. Immutable State: I think it will be more common to use mutable state than immutable state (as @yevhen mentions, because actors already take care of concurrency), thus I would also vote to save that one line of code (note that users can still use immutable data structures inside the grain state, so nothing is really lost). Perhaps we can also support two separate Transition classes, one for each style.

None of these are very important points that we need to spend much discussion effort on, except maybe the first point. I think we may want to design some clean extension mechanism for creating specialized grains. Chances are good that we will want to experiment with various grain versions for a while to come. I am currently creating QueuedGrains as a special case, and I think it would be better to have this be done with a general mechanism that also supports JournaledGrains.

@jkonecki
Contributor

jkonecki commented Dec 9, 2015

I had trouble occasionally with finding the extension methods because of missing using clauses

That won't be a problem since the extension methods will reside in the Orleans namespace - no additional using required.
I'm happy to derive JournaledGrain<T> from Grain<T> - we just need to make sure that any methods in the base class make sense for ES grains. In fact I would create (forgive the name) StatefulGrain<T> and have Grain<T> and JournaledGrain<T> derive from it. We could then place ReadStateAsync, etc. methods in Grain<T> and the RaiseEvent method in JournaledGrain<T>.
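The proposed hierarchy, sketched with stub bodies (the real classes would carry the actual persistence plumbing; names as proposed in this thread, everything else illustrative):

```csharp
using System.Threading.Tasks;

// Common state plumbing; could remain internal to the runtime.
public abstract class StatefulGrain<T> where T : new()
{
    public T State { get; protected set; } = new T();
}

// Classic persisted grain: reads/writes whole-state snapshots.
public class Grain<T> : StatefulGrain<T> where T : new()
{
    public virtual Task ReadStateAsync()  => Task.CompletedTask; // stub
    public virtual Task WriteStateAsync() => Task.CompletedTask; // stub
}

// Event-sourced grain: state changes only via raised events.
public class JournaledGrain<T> : StatefulGrain<T> where T : new()
{
    public virtual Task RaiseEvent(object @event) => Task.CompletedTask; // stub
}
```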

If we introduce JournaledGrain<T> how would you integrate QueuedGrains API? You cannot derive from QueuedGrain<T> at the same time...

Separate object for state transitions

My existing ES branch actually has Apply methods in state (mutable state) - I created a more functional approach now after @gregoryyoung's comments. I don't see many problems with state transitions not being a pure left fold.

@sebastianburckhardt
Contributor

In fact I would create (forgive the name) StatefulGrain and have Grain and JournaledGrain derive from it. We could then place ReadStateAsync, etc. methods in Grain and the RaiseEvent method in JournaledGrain.

yes, I think that is right. StatefulGrain can remain internal to the runtime.

If we introduce JournaledGrain how would you integrate the QueuedGrains API? You cannot derive from QueuedGrain at the same time...

No need. QueuedGrain is just a thin adaptor for the underlying replication provider (which has the exact same API), there is no real code inside. JournaledGrain can just call the replication provider directly.

@jkonecki
Contributor

jkonecki commented Dec 9, 2015

So are we OK to introduce an internal StatefulGrain with common state logic and derive Grain<T> and JournaledGrain<T> from it? We can also lose the extension methods and place them in the derived grain classes directly...

I updated my gist - https://gist.github.com/jkonecki/26b5bec619757e199e2d
Added StatefulGrain and removed StateTransition (state is mutable).

@cmello

cmello commented Jan 21, 2016

As a newcomer I find it difficult to catch up with this volume of information. Still working on it. :-) But I would like to mention something I was looking at before meeting actors: FT-CORBA. It has a mix of journal and snapshots: http://www.omg.org/spec/FT/1.0/

@sergeybykov
Contributor Author

#1854 will add a robust solution for this.

@sergeybykov
Contributor Author

Resolved via #1854.

@ghost ghost locked as resolved and limited conversation to collaborators Sep 30, 2021