IStorageProvider methods parameter grainType from string to Type #1075

Closed
wants to merge 1 commit into from

@inadler inadler commented Nov 25, 2015

In my IStorageProvider implementation, I need the grain type in order to have the complete metadata of the given grain. I'm sure others will find this change useful.
This is part of @shayhatsor's suggestion #1035 (comment)

dnfclas commented Nov 25, 2015

Hi @inadler, I'm your friendly neighborhood .NET Foundation Pull Request Bot (You can call me DNFBOT). Thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. I promise there's no faxing. https://cla2.dotnetfoundation.org.

TTYL, DNFBOT;

@sergeybykov sergeybykov added this to the vNext milestone Nov 25, 2015
dnfclas commented Nov 26, 2015

@inadler, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.

Thanks, DNFBOT;

@shayhatsor
Member

@sergeybykov, could this PR milestone be changed to an earlier one? I'm not sure when vNext is due but this change, among other things, provides a solution to #1068 that isn't a breaking change. By inspecting the grain type we can infer the original key.

@sergeybykov sergeybykov modified the milestones: 1.1.0, vNext Nov 30, 2015
@sergeybykov
Contributor

@shayhatsor This change breaks every single persistence provider. In general, we do not promise backward compatibility, but I'm a bit hesitant with this one. I'd like to understand if such a break is really necessary.

We are trying to finalize v1.1.0, and breaking all storage providers doesn't sound exciting, especially so late in the v1.1.0 cycle.

@sergeybykov sergeybykov modified the milestones: vNext, 1.1.0 Nov 30, 2015
@inadler
Author

inadler commented Dec 1, 2015

@sergeybykov, grainType in its current state is the grain's FullName (instead of its AssemblyQualifiedName), which is not enough to resolve the actual grain type.

@shayhatsor
Member

@inadler 👍

@sergeybykov sergeybykov self-assigned this Dec 4, 2015
@sergeybykov
Contributor

Stepping back a bit. Why do you need to pass grain type to storage provider exactly? Can't you pass any metadata you need via the state object?

The original intent for passing grainTypeName to the methods was to provide an additional classifier, with the main use case in mind being data organization, e.g. as a table name. It was probably a mistake to pass it here as opposed to making it an attribute on the grain class or passing it through config, but that's a different question. There wasn't an intent to use it as a source of grain type metadata information. That's why it is passed as a string, and not as a Type object. So I'm puzzled why you need that.

@shayhatsor
Member

@sergeybykov, don't you think that an IStorageProvider would benefit from having the true grain type instead of a string? We are using it for the exact purpose you've stated:

a way to provide an additional classifier with the main use case in mind that it would be used for data organization

Using the real type provides many benefits:

  • Provides a finer grained map to the underlying storage. (By assembly, namespace etc.)
  • Lets us check whether this is a system grain, in order to map it into a more available storage.
  • Allows us to solve "Grain key type isn't saved with the key" (#1068), at least temporarily until there is a better fix.
  • Isn't a string.

@sergeybykov
Contributor

@shayhatsor The concern here is that it would create an unhealthy tight coupling with Type. It appears to me like a quick and dirty shortcut for a more explicit and richer classification model (via attributes or something else) that will be needed anyway for things that cannot be inferred from Type. Besides, Type isn't serializable or CoreCLR compliant. TypeInfo would be better in that sense, but the larger issue stays the same.

To be clear, the string grainTypeName argument doesn't solve this problem today, and will likely go away as we revamp the persistence model. But I think replacing it with a Type argument would be a step in the wrong direction, even if it enables scenarios that aren't possible today with the string.

@shayhatsor
Member

@sergeybykov, I cannot apply:

a more explicit and richer classification model (via attributes or something else)

on system grains or system grain states.

@sergeybykov
Contributor

@shayhatsor

@sergeybykov, I cannot apply:
a more explicit and richer classification model (via attributes or something else)
on system grains or system grain states.

Why? So long as each grain type has a unique key (string grainTypeName today), can't you define any needed metadata somewhere else, e.g. in the storage provider config?

Maybe I'm missing something here. It seems to me that for a robust persistence solution one would have to define quite a bit of metadata, such as DB and table names, ORM maps, sharding info, etc. Most of this data cannot be inferred from the grain class's Type or TypeInfo objects. So why take the dead-end path?

@veikkoeeva
Contributor

@sergeybykov, I cannot apply:
a more explicit and richer classification model (via attributes or something else)
on system grains or system grain states.

Why? So long as each grain type has a unique key (string grainTypeName today), can't you define any needed metadata somewhere else, e.g. in the storage provider config?

Maybe I'm missing something here. It seems to me that for a robust persistence solution one would have to define quite a bit of metadata, such as DB and table names, ORM maps, sharding info, etc. Most of this data cannot be inferred from the grain class's Type or TypeInfo objects. So why take the dead-end path?

I'll chime in, as I already wrote in Gitter, from the perspective of a "robust persistence solution". If this looks like being a larger issue, should we open an issue about it, list the expectations from the various storage viewpoints, and chop it into features so that we get work underway?

In the following I assume a mainly relational perspective. To clarify, it looks like @sergeybykov looks at this more from the perspective of the state class only, and @shayhatsor from both the containing grain and the state class itself.

To make the picture fuller, I'll list a few points I could see the whole solution taking:

  • In the general case, use a table with three columns: GrainId, GrainType and Payload. This is for when no other information is given. GrainType is not strictly needed, but one could "partition" the table with it. In addition, if Orleans continues to load scripts from the database as it does currently, it should be possible to update the query on the fly and add a switch-case that diverts some of the INSERTs and SELECTs to custom tables (another option could be to choose the SQL clause via, say, an interceptor). It might make sense to add a ModifiedOn DATETIME(3) column. It might make sense to think of this as version information, such that one always INSERTs and takes the latest row to be the newest version. This might be a connection to an Event Sourcing solution.
  • It might make sense to add some kind of an interceptor (a lambda function?) that can, for instance, update the data layout on the fly or select a serializer on the fly. From a relational perspective, in general perhaps saving as a binary array makes sense from a performance point of view, but also XML or JSON (in NVARCHAR) for easier programmability (sometimes it makes sense to manipulate data in the tables).
  • One could provide an extra column DataVersion, which would be explicit metadata about the class version. This way data could be updated "offload", on the storage, too, instead of when Orleans reads it. The other option is to write it as part of the payload data.
  • In an enterprise, on-premises setting, it might be the case that one wants to write some state to "cheaper disk", say a MySQL cluster, and some data to a RAID SSD array with high security guarantees (say, SQL Server). So it makes sense to be able to define multiple connection strings.
  • One can let the database handle data replication or use application-specific sharding that could use something baked into Orleans -- and Orleans could allow this to be pluggable.

It looks like part of the argument of @shayhatsor here is that it might make sense to write some data to storage with better guarantees than other data and for that he would like to use also the grain type to infer this information. If I understand correctly, there is some state class MaybeImportantData that could be used both by ImportantGrain and NotSoImportantGrain. Say, system grain information or some other configuration data that is needed to actually run a system or just do something not-so-important with this.

Kind of higher kinds, maybe.

@jason-bragg
Contributor

I am of the view that grain storage should focus on storing grain state, not the grain. From this perspective, the primary information provided to storage providers should be the state of the grain and the grain’s identity. Currently this is, respectively, GrainState and GrainReference.

Storage provider implementations may need additional metadata for various reasons, including data organization, versioning, and optimization. This extends the required information to be state (GrainState), grain identity (GrainReference), and metadata (?).

Currently the metadata provided to the storage provider is the grainType as a string. This is very limited and not extensible. In my opinion, this is at the heart of the solution proposed in this thread. However, the proposed solution arbitrarily increases the amount of information provided to the storage provider, while not providing an extensible solution.

As an alternative, I propose the following:

Phase 1
Introduce new storage provider interface that roughly conforms to the pattern described in #1060. Something like:

public interface IGrainStateStorage
{
    Task ClearStateAsync();
    Task WriteStateAsync();
    Task ReadStateAsync();
}

public interface IGrainStateStorageProvider : IProvider
{
    IGrainStateStorage GetStorage(GrainReference grainReference, GrainState grainState, IDictionary<string,object> metadata);
}

This allows us to update the storage pattern without breaking existing providers, but does not (yet) solve the metadata problem.

Phase 2
Introduce a new storage attribute something like:

[AttributeUsage(AttributeTargets.Class)]
public class GrainStateStorageProviderAttribute : Attribute
{
    public string ProviderName { get; set; }
    public IDictionary<string,object> Metadata { get; set; }
}

This attribute allows grain classes to pass storage-provider-specific metadata associated with a grain type to the provider.

@inadler, is this sufficient to allow you to resolve your storage provider issues?
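One practical caveat with the Phase 2 attribute as sketched: C# only allows constants, typeof expressions, and one-dimensional arrays of those as attribute arguments, so an IDictionary<string,object> property could not be populated where the attribute is applied. A minimal sketch of one possible variation follows, assuming metadata is passed as "key=value" strings. StorageMetadataAttribute, StorageMetadataReader, and UserGrainStandIn are hypothetical names for illustration and are not part of Orleans.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute, not part of Orleans: carries provider metadata as
// "key=value" strings, because attribute arguments must be compile-time constants.
[AttributeUsage(AttributeTargets.Class)]
public sealed class StorageMetadataAttribute : Attribute
{
    public string ProviderName { get; set; }
    public string[] Entries { get; }

    public StorageMetadataAttribute(params string[] entries)
    {
        Entries = entries ?? new string[0];
    }
}

public static class StorageMetadataReader
{
    // Reads the attribute from a class and expands it into a dictionary.
    // Returns an empty dictionary when the class carries no attribute.
    public static IDictionary<string, string> Read(Type grainClass)
    {
        var result = new Dictionary<string, string>();
        var attribute = grainClass.GetCustomAttribute<StorageMetadataAttribute>();
        if (attribute == null) return result;

        foreach (var entry in attribute.Entries)
        {
            var separator = entry.IndexOf('=');
            if (separator <= 0) continue; // skip malformed entries
            result[entry.Substring(0, separator)] = entry.Substring(separator + 1);
        }
        return result;
    }
}

// Stand-in for a grain class; a real grain would derive from an Orleans base class.
[StorageMetadata("table=Users", "shard=eu", ProviderName = "SqlStore")]
public class UserGrainStandIn { }
```

With this shape, the runtime (rather than each provider) could call something like StorageMetadataReader.Read(typeof(UserGrainStandIn)) once and hand the resulting dictionary to the provider, which is in the spirit of the proposal above.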

@shayhatsor
Member

@jason-bragg, but if we have the true grain type, we can read any custom attributes. So why do we need to limit the custom attributes to an Orleans specific type?

@jason-bragg
Contributor

So why do we need to limit the custom attributes to an Orleans specific type?

That is not a goal, just a consequence of the solution I've proposed. I'm very open to other solutions, but passing in the grain type is a leaky abstraction, because the grain type would only be passed in to get at attributes that have metadata; the metadata is what the component needs, not the grain type. Instead of solving the problem of how to get the metadata, we'd be passing the responsibility to get the metadata to the provider implementer by passing in the grain type and assuming that is sufficient for them to get what they need.

In more general terms, what should grain state storage functionality depend on?

As I posted, in my view, it needs to know the identity of the grain it is being associated with, the state being stored, and some provider specific metadata. Do you disagree? If so, what am I missing? If not, then we should explore methods of specifying metadata (like the proposed attribute) that does not require a leaky abstraction nor pass the responsibility of assembling the metadata to the provider implementer.

@shayhatsor
Member

@jason-bragg, I obviously don't agree that passing the grain type is a "leaky abstraction" or like @sergeybykov said "quick and dirty". IMHO, it's the most simple, clean, straightforward and robust solution to this issue. I believe storage providers should get all the metadata available.

@inadler
Author

inadler commented Dec 8, 2015

I was trying to be generic with my pull request but I now think a specific problem would have been more helpful.

In my IStorageProvider, I wish to distinguish between user and system grains and be able to use the actual Grain key (same one that I use in the GrainFactory.GetGrain method) on user grain's grain states in my persistent storage.

This is actually a deal breaker for me because my GrainState is my real data, which I want to be able to search, test, and manipulate using its real Id in my persistent storage (instead of the Orleans-specific GrainReference key).

Having the type of the grain, allows me:

  • To know whether it is a system grain or not by using one of the ways below (I cannot decide which is less horrible):
    • Enumerate the FullName of every internal stateful grain type in an external configuration ==> version specific
    • Check grainTypeName.StartsWith("Orleans.")
    • Check Type.IsVisible (because Orleans classes are mostly internal)
    • Check the assembly for Orleans DLLs
  • Get the grain key by checking which IGrainWithX is implemented, so I can figure out its actual key (unfortunately, this is the only way to do that)

If I try to get the original type by using Type.GetType(grainTypeName), it won't work, because Type.GetType requires an AssemblyQualifiedName, which grainType currently is not.
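To make the Type.GetType limitation concrete: when given a name without an assembly part, Type.GetType only searches the calling assembly and the core library, so a grain's FullName alone generally will not resolve from inside a storage provider assembly. The sketch below is illustrative only; TypeNameLookup is a hypothetical helper, and the fallback that scans loaded assemblies is one possible workaround, not something Orleans provides.

```csharp
using System;
using System.Linq;

public static class TypeNameLookup
{
    // Tries to resolve a type from a name, returning null instead of throwing.
    // Without an assembly part in the name, only the calling assembly and
    // the core library are searched.
    public static Type Resolve(string typeName)
    {
        return Type.GetType(typeName, throwOnError: false);
    }

    // Fallback sketch: scans the loaded assemblies for a matching FullName.
    // This is slower, and ambiguous if two assemblies define the same name.
    public static Type ResolveByFullName(string fullName)
    {
        return AppDomain.CurrentDomain.GetAssemblies()
            .Select(assembly => assembly.GetType(fullName, throwOnError: false))
            .FirstOrDefault(type => type != null);
    }
}
```

For a type that lives outside the core library, such as System.Linq.Enumerable, Resolve is expected to succeed with typeof(Enumerable).AssemblyQualifiedName and to come back empty with typeof(Enumerable).FullName, while ResolveByFullName can still find it as long as its assembly is already loaded.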

@gabikliot
Contributor

So system grains are just Rendezvous Pub Sub grains? You can easily solve that problem for that case, by either using a different storage provider for them and your app grains or same storage provider type with different name and differentiating by name.
Or we can add a method on the Storage Provider: IsOrleansInternalGrain(GrainReference)

I am not suggesting that instead of passing Type, just saying how this could be solved alternatively.

@shayhatsor
Member

So system grains are just Rendezvous Pub Sub grains?

AFAIK currently yes, but we wish to be able to distinguish current and future internal grains.

You can easily solve that problem for that case, by either using a different storage provider for them and your app grains or same storage provider type with different name and differentiating by name

We thought about it, but then we'd have to do that for every future internal type.

Or we can add a method on the Storage Provider: IsOrleansInternalGrain(GrainReference)

A method like that can come in handy, but where would you put it? On the GrainReference class?

I am not suggesting that instead of passing Type, just saying how this could be solved alternatively.

That's exactly the thing. When you pass the type, I agree that it might be too much information for most scenarios, but when you need it, it's there.

@shayhatsor
Member

@inadler 👍

@jason-bragg
Contributor

On grain activation, the infrastructure calls ReadStateAsync before I can manually populate the GrainState with the Id.

Good point. I didn't consider that. Since the only identifier available at load time is the GrainReference, data organization is limited to what it provides. I can definitely see how this is a significantly limiting constraint, especially when integrating with pre-existing storage patterns. The metadata solution I proposed would technically afford you the same workaround as passing in the type, but it would be equally kludgy in regards to addressing this issue.

How are you using the grain type to get the grain id?

@shayhatsor
Member

@jason-bragg, for example: we're checking if the grain is IGrainWithIntegerKey and then call grainReference.GetPrimaryKeyLong()
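The check described here can be sketched roughly as follows, assuming the provider has the grain's Type in hand. The marker interfaces are declared locally as stand-ins so the sketch compiles on its own (real code would reference the Orleans interfaces of the same names), and InferredKey and KeyInference are hypothetical names.

```csharp
using System;

// Stand-ins for the Orleans marker interfaces, declared here only so the
// sketch compiles on its own; real code would reference the Orleans ones.
public interface IGrainWithIntegerKey { }
public interface IGrainWithGuidKey { }
public interface IGrainWithStringKey { }

// Hypothetical classification, not an Orleans type.
public enum InferredKey { Unknown, Integer, Guid, String }

public static class KeyInference
{
    // Inspects which marker interface the grain class implements.
    // The first match wins if a class implements more than one.
    public static InferredKey FromGrainType(Type grainType)
    {
        if (grainType == null) throw new ArgumentNullException(nameof(grainType));
        if (typeof(IGrainWithIntegerKey).IsAssignableFrom(grainType)) return InferredKey.Integer;
        if (typeof(IGrainWithGuidKey).IsAssignableFrom(grainType)) return InferredKey.Guid;
        if (typeof(IGrainWithStringKey).IsAssignableFrom(grainType)) return InferredKey.String;
        return InferredKey.Unknown;
    }
}

// Example grain interface and class, again stand-ins for illustration.
public interface ICounter : IGrainWithIntegerKey { }
public class CounterGrainStandIn : ICounter { }
```

Given InferredKey.Integer for CounterGrainStandIn, the provider would then call grainReference.GetPrimaryKeyLong(). The sketch only works when the real Type is available, which is exactly what the string grainType does not reliably give.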

@jason-bragg
Contributor

we're checking if the grain is IGrainWithIntegerKey and then call grainReference.GetPrimaryKeyLong()

Understood. So the grain type is needed to compensate for #1043/#1068 . Is that correct?

@veikkoeeva
Contributor

I'll cross-reference #343. In the current SqlUtils persistence implementation there is sharding used, so there are connection points. There might be some other things worth considering, as mentioned earlier in this thread (here and here).

@sergeybykov
Contributor

@shayhatsor

@jason-bragg, for example: we're checking if the grain is IGrainWithIntegerKey and then call grainReference.GetPrimaryKeyLong()

If we added a couple of extension methods to GrainExtensions, e.g. IsGrainWithIntegerKey, IsGrainWithCompoundKey, would that solve the ID part of the problem?

@gabikliot
Contributor

I think we should extend GrainReference (maybe via GrainExtensions) with IsGrainWithIntegerKey to allow the primary key to be deduced reliably. But that is unrelated to the question of grain identity available to the storage providers, which I think should not be GrainReference. I laid out my suggestion about it in #1123.

@shayhatsor, @inadler - for now, just to unblock you, I think you can use whatever hack you find to solve that issue: for example "we're checking if the grain is IGrainWithIntegerKey and then call grainReference.GetPrimaryKeyLong()", or try to get the Long Key from the Grain Reference and, if that throws, try to get the Guid. All hacks, but they will unblock you while the bigger discussion about Grain Identities and what should be exposed where is taking place.

The discussion about Grain Identities is not a light one, since it has a broad impact on the programming model, so I expect it to take some time until the decision is made.

@inadler
Author

inadler commented Dec 9, 2015

@gabikliot , unfortunately, we are still blocked.
All the above workarounds won't do for us because we cannot identify the actual Grain Id during Grain activation.

@gabikliot
Contributor

Ok, so just to unblock you, as I understood there are 2 issues:

  1. system grains vs. app grains. You can solve that by using different storage providers (or same provider with different name).
  2. Getting the primary key for the app grain from GrainReference. You can solve that by writing a wrapper utility that uses GrainExtensions - https://github.com/dotnet/orleans/blob/master/src/Orleans/Core/GrainExtensions.cs#L142.
    Call public static long GetPrimaryKeyLong(this IAddressable grain, out string keyExt). If it does not throw, the primary key is a long; if it does throw, call public static Guid GetPrimaryKey(this IAddressable grain, out string keyExt).

Does that unblock you?

@sergeybykov
Contributor

@inadler Can you elaborate on the following problem?

All the above workarounds won't do for us because we cannot identify the actual Grain Id during Grain activation.

@sergeybykov
Contributor

@gabikliot

  1. Getting the primary key for the app grain from GrainReference. You can solve that by writing a wrapper utility that uses GrainExtensions -

I think we can do cleaner than that - we can add an extension method IsLongKey() which would expose UniqueKey.IsLongKey.

@inadler
Author

inadler commented Dec 10, 2015

@gabikliot , I don't think your solution is acceptable because:

  • It doesn't meet all our needs - what about string or compounded keys?
  • In terms of performance, I don't think this is acceptable (I know we can cache the results for later use, but in principle it doesn't feel like the right approach)

I don't think the code below is something we would like to use:

try { return grain.GetPrimaryKeyLong(); } catch { }
try { return grain.GetPrimaryKey(); } catch { }

@sergeybykov
Contributor

@inadler If we add IsLongKey, IsStringKey and IsCompoundKey, will that be sufficient?

@jdom
Member

jdom commented Dec 10, 2015

or even better, an enum, such as enum KeyKind : byte { Long, String, Compound }

@sergeybykov
Contributor

We have GuidCompound and IntegerCompound, unfortunately.

@jdom
Member

jdom commented Dec 10, 2015

well, all I meant is the approach, if more values are needed in that enum, we can add them, can't we?
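As a rough illustration of the enum approach with the two compound variants added, a provider could switch on the key kind instead of probing with try/catch. Everything in this sketch is hypothetical: the KeyKind values, the GrainKeyStandIn class (a stand-in for whatever identity object a provider would receive), and StorageKeyFormatter do not exist in Orleans.

```csharp
using System;
using System.Globalization;

// Hypothetical enum following the suggestion above, widened with the two
// compound variants; none of these names exist in Orleans.
public enum KeyKind : byte { Long, Guid, String, LongCompound, GuidCompound }

// Stand-in for whatever identity object a provider would receive.
public sealed class GrainKeyStandIn
{
    public KeyKind Kind;
    public long LongKey;
    public Guid GuidKey;
    public string StringKey;
    public string KeyExtension;
}

public static class StorageKeyFormatter
{
    // Builds a storage key by switching on the kind, with no try/catch probing.
    public static string Format(GrainKeyStandIn key)
    {
        if (key == null) throw new ArgumentNullException(nameof(key));
        switch (key.Kind)
        {
            case KeyKind.Long:
                return key.LongKey.ToString(CultureInfo.InvariantCulture);
            case KeyKind.Guid:
                return key.GuidKey.ToString("D");
            case KeyKind.String:
                return key.StringKey;
            case KeyKind.LongCompound:
                return key.LongKey.ToString(CultureInfo.InvariantCulture) + "|" + key.KeyExtension;
            case KeyKind.GuidCompound:
                return key.GuidKey.ToString("D") + "|" + key.KeyExtension;
            default:
                throw new ArgumentOutOfRangeException(nameof(key));
        }
    }
}
```

The "|" separator is an arbitrary choice for the sketch; a real provider would need a separator that cannot appear in the key extension, or a proper escaping scheme.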

@gabikliot
Contributor

@inadler , I suggested that hack as a workaround, to unblock you, until a proper solution is implemented. Of course that is unacceptable as a permanent solution. I think I stressed that maybe 5 times, in various places in my comments.

If you have other ways to unblock the issue temporarily, even better.

@inadler
Author

inadler commented Dec 13, 2015

Let me start by thanking you all for your help with this issue.

@sergeybykov, extension methods for identifying the type of the grain can surely help, but with one limitation: knowing that a key is compound doesn't help us get it, because there is no method that returns a compound key.

@jdom , @gabikliot
For now, I've implemented a workaround that might be helpful to others (with a compound key limitation) - using a derived generic GrainState which holds both the concrete object and the RawKey to be used in the DB.

I basically ignore the initial (before grain activation) ReadStateAsync (by checking for the existence of RawKey when grainState is IGenericState) and issue my own in OnActivateAsync; by that second call RawKey is set.

Below is the code, for anyone who may be interested:

    public interface IGenericState 
    {
        object RawKey { get; set; }
        object GetValue();        
        void SetValue(object value);
    }

    /// <summary> GrainState wrapper with generic type </summary>
    [Serializable]
    public class GenericState<T> : GrainState, IGenericState 
    {
        /// <summary> The actual state </summary>
        public T Value { get; set; }        
        /// <summary> The actual Grain key </summary>
        public object RawKey { get; set; }
        public object GetValue() { return Value; }
        public void SetValue(object value) { Value = (T) value; }
        public override void SetAll(IDictionary<string, object> values) { Value = default(T); }       
    }


    /// <summary> Defines a grain that uses <see cref="GenericState{T}"/> instead of <see cref="GrainState"/>. That way there is no need to derive from GrainState and specify a generic type. Also, when setting the state, it sets its value by reference instead of propagating all its properties using reflection. </summary>
    public class StatefulGrain<T> : Grain<GenericState<T>> where T : class, new()
    {
        /// <summary> This method is sealed to work around an Orleans limitation.
        /// Override <see cref="OnActivateStatefulGrainAsync"/> instead. </summary>
        public sealed override async Task OnActivateAsync()
        {            
            base.State.RawKey = RawKey;
            await ReadStateAsync();
            await OnActivateStatefulGrainAsync();
        }

        /// <summary> This method replaces Orleans's OnActivateAsync for derived grains; it is called by the sealed OnActivateAsync above. </summary>
        public virtual Task OnActivateStatefulGrainAsync() { return TaskDone.Done; }

        protected StatefulGrain() : base() 
        { }

        protected StatefulGrain(IGrainIdentity identity, IGrainRuntime runtime, T state, IStorage storage) 
            : base(identity, runtime, new GenericState<T> {Value = state}, storage)
        { }

        /// <summary> The grain's state being read from the persistent storage </summary>
        public new T State 
        {
            get { return base.State.Value; }
            set { base.State.Value = value; }
        }

        public string Etag
        {
            get { return base.State.Etag; }
            set { base.State.Etag = value; }
        }

        public object RawKey
        {
            get
            {
                if (this is IGrainWithIntegerKey)
                    return ((IGrainWithIntegerKey)this).GetPrimaryKeyLong();

                if (this is IGrainWithStringKey)
                    return ((IGrainWithStringKey)this).GetPrimaryKeyString();

                if (this is IGrainWithGuidKey)
                    return ((IGrainWithGuidKey)this).GetPrimaryKey();

                string keyExt = null;
                object primaryKey = null;
                if (this is IGrainWithGuidCompoundKey)
                    primaryKey = ((IGrainWithGuidCompoundKey)this).GetPrimaryKey(out keyExt);
                else if (this is IGrainWithIntegerCompoundKey)
                    primaryKey = ((IGrainWithIntegerCompoundKey)this).GetPrimaryKeyLong(out keyExt);

                return new CompoundKey(primaryKey, keyExt);
            }
        }
    }

    public struct CompoundKey
    {
        public readonly object PrimaryKey;
        public readonly string KeyExtension;

        // The constructor name must match the struct name to compile.
        public CompoundKey(object primaryKey, string keyExt)
        {
            PrimaryKey = primaryKey;
            KeyExtension = keyExt;
        }

        public override string ToString()
        {
            return string.Join("|", PrimaryKey, KeyExtension);
        }
    }

@inadler inadler closed this Dec 13, 2015
@sergeybykov
Contributor

@inadler I'm glad you are unblocked. This is still a weak spot in our API that we need to make better.

One thing that's not clear to me in your code is why you couldn't use GrainExtensions.GetPrimaryKey/GetPrimaryKeyLong(out string keyExt) in case of the compound keys.

I had another idea for how to solve this but haven't had time to explore it yet. We are passing an untyped GrainReference to IStorageProvider methods instead of a strongly typed one. This is bad. If we passed FooReference instead of GrainReference, it would be easy to do is IGrainWithXKey checks against the reference itself, just like you ended up doing in RawKey.

@gabikliot
Contributor

@sergeybykov , that would not work, since a grain may implement more than one interface.
That was one of the reasons I believe untyped GrainReference should be removed from the public API or at least its usage minimized and instead have a private identity/metadata, like I suggested in #1123, in addition to strongly typed FooReference.

BTW, in case you are still trying to get into "one identity to rule them all", we already have 2 - IGrainIdentity and GrainReference, and part of my point in #1123 is that IGrainIdentity does not have all the required info. The other part of #1123 is that this full (internal) identity should be passed to providers.

@gabikliot
Contributor

@inadler , your solution/trick totally makes sense!
You basically ignore the initial ReadStateAsync and issue your own in OnActivateAsync; by that second call RawKey is set. The downside is that you now have 2 calls to ReadStateAsync, the first being redundant.
I think this is OK as a temporary solution, but I suggest that a more permanent one be implemented, maybe along the lines of #1123.

@inadler
Author

inadler commented Dec 14, 2015

@sergeybykov , I didn't realize GetPrimaryKeyLong(out string keyExt) is for compound keys - I updated the above code.

@gabikliot , of course it's a hack.
In my IStorageProvider implementation, when grainState is IGenericState I check that RawKey exists - otherwise, I exit the method.

public async Task ReadStateAsync(string grainType, GrainReference grainReference, GrainState grainState)
{
    var documentKey = grainReference.ToKeyString();
    if (grainState is IGenericState)
    {
        var genericState = (IGenericState)grainState;
        // The runtime's initial read happens before RawKey is set; skip it.
        if (genericState.RawKey == null) return;

        documentKey = genericState.RawKey.ToString();
    }
    ...
}

@gabikliot
Contributor

That's exactly what I thought you were doing.

@inadler
Author

inadler commented Dec 15, 2015

@gabikliot 👍

9 participants