profile stitch #444

Merged

Conversation

Contributor

@harleensahni harleensahni commented Jul 24, 2018

This PR addresses issue #468.

Add a concept of profile stitch / merge rules to experience event. I had to introduce a concept of profile stitch identity similar to what we have for segment identity, but I'd prefer if we just took a much simpler approach with segment and profile ids as simple strings.

I also add segment membership to experience event and modify identity to support multiple ids for the same namespace (which is a hard requirement for merging).

@harleensahni harleensahni mentioned this pull request Jul 24, 2018
Contributor

@cdegroot-adobe cdegroot-adobe left a comment

Still processing this PR, just made a few small syntactic change requests.

@@ -20,6 +20,14 @@
"type": "string",
"description": "Identity of the consumer in the related namespace."
},
"xdm:additionalIds": {
Contributor

additionalIDs

"items": {
"type": "string"
},
"description": "If consumer has more than one of the given id, additional ids are listed here."
Contributor

If endUser has more than one identity in the namespace, additional identity ids are listed here.

"xdm:version": "1.0",
"xdm:endUserIds": {
"https://ns.adobe.com/experience/mcid": {
"@id": "https://data.adobe.io/entities/identity/92312748749128",
Contributor

this has now changed to "xdm:id" as of 0.9.3

"xdm:endUserIds": {
"https://ns.adobe.com/experience/mcid": {
"@id": "https://data.adobe.io/entities/identity/92312748749128",
"additionalIds": [
Contributor

additionalIDs

"definitions": {
"profileStitchIdentity": {
"properties": {
"@id": {
Contributor

@id->"xdm:id"

Contributor Author

I think we want this as @id, similar to what we've done with segment identity. We will have profile stitches defined as full objects in XDM, and they should be referenced as their own entities.

}
],
"meta:status": "experimental"
}
Contributor

add newline please.

Contributor Author

Done

@cdegroot-adobe
Contributor

@harleensahni - I really like the additionalIDs. To simplify things, I recommend that you put that in a separate PR, as it is quite a distinct thing and we should be able to get closure on it quickly.

I do not quite understand the core concept of this addition to ExperienceEvent. I am assuming that the merge rules and profile stitching are applied during data collection, so this is really a log of that. Can you explain the interaction flow a bit, please?

@@ -126,6 +126,31 @@
"description": "The media activity information related to the experience event"
}
},
"xdm:profileStitching" : {
"title": "Profile Stitching",
Collaborator

This seems like a very Adobe-centric thing, yes? If so, it belongs in an extension and not in the primary schema

Contributor Author

I don't think so. It's about which set of profiles segmentation is running against. Adobe has some functionality around this, but I think others in the industry do too.

Collaborator

well, I would want to understand how it fits into the larger industry context before accepting this in a common schema.

Contributor Author

Understood, should we talk offline? Or do you have something in particular you're looking for?

Collaborator

Certainly happy to chat offline, but what I am looking for is either an existing standard for this that we can point to from this common schema...or we move it to an Adobe extension.

Contributor

@lrosenthol - the concept of merging multiple profile records together to make a golden record is quite common in the industry. The terms stitch and merge are synonymous. The industry term for doing this in old-school databases is creating the Golden Record, and it is a function of Master Data Management. We do it in real time and use similar concepts. Harleen's use of the term "profile stitch" in this proposal is understandable by non-Adobe practitioners without explanation.

Collaborator

@cdegroot-adobe if I do a web search on "profile stitching", only one of the hits is about this (and it actually calls it visitor stitching).
Is there any existing industry reference document that we can point to that defines this?

"description": "Details about the ids that were joined by profile stitching.",
"$ref": "https://ns.adobe.com/xdm/context/profileStitch"
},
"xdm:segmentMemberships": {
Collaborator

What is a "segment", where is it defined, and is it specific to an Adobe workflow or generic?

Contributor Author

It's not defined yet in its own XDM. This pull request didn't introduce the concept of segment membership; it already exists on profile, and the PR for segment membership is merged. I'm just adding it to experience event. It's meant to be generic, but this will definitely be used by Audience Manager and, I think, Target.

Collaborator

Got it. OK - no issue there.

Contributor

Issue Closed

"you may not use this file except in compliance with the License. You may obtain a copy",
"of the License at https://creativecommons.org/licenses/by/4.0/"
],
"$id": "https://ns.adobe.com/xdm/context/profileStitch",
Collaborator

Also needs @id, as per new plans

Contributor

Agreed, we need to get that approved first and then we will do the needful. I would like Kevin to return for the @id decision and exactly how we roll it out. In the XC we cannot automatically add them, so we need to do something to limit the explosion of fields in our operational store.

"$schema": "http://json-schema.org/draft-06/schema#",
"title": "Profile Stitch",
"type": "object",
"description": "Details about the ids that were joined by profile stitching.",
Collaborator

This seems like a very Adobe-centric thing, yes? If so, it belongs in an extension and not in the primary schema

Contributor

Please see earlier comments on this. I am confident this is not an Adobe-centric thing at all.

"of the License at https://creativecommons.org/licenses/by/4.0/"
],
"$id": "https://ns.adobe.com/xdm/context/profileStitchIdentity",
"$schema": "http://json-schema.org/draft-06/schema#",
Collaborator

also needs @id

Contributor

see earlier comment on it, we will pick up all the @ids in a dedicated PR.

{
"@id": "https://data.adobe.io/entities/profileStitchIdentity/1",
"xdm:namespace": {
"xdm:code": "AAM"
Collaborator

Don't you also need an xdm:xid here? What use is the namespace without the thing you are namespacing?

Contributor Author

Wouldn't this, as is, namespace the @id?

Collaborator

@id is the ID of that particular instance of that schema - if that is what you are doing, then yes, I would agree. I thought you were referring to some other object...which would be via xdm:id

@@ -0,0 +1,40 @@
{
Contributor

Lower case the name of this schema, and the example too please.

@harleensahni
Contributor Author

@cdegroot-adobe Is it necessary to split this into two different PRs, one for additionalIDs and one for profile stitching? I ask because the profile stitching PR is useless without the additionalIDs, and I want to call out as part of the PR for profile stitching that we'll be using additionalIDs to have multiple ids for the same namespace.

As for the question about what this PR does with profile stitch / merge rules and the flow:

  1. Merge rules / profile stitches are defined elsewhere (not really addressed in this PR, apart from declaring the ID format for those). This is similar to the punting approach we've taken with segment memberships, where we record membership without defining segment yet.
  2. What is recorded in the Experience Event here is the result of a specific invocation of a profile stitch / merge rule that was done to connect this event and the ids in it to other profiles. Basically, what's recorded is that profile stitching was done in this way: (profileStitchId, [linkedid_1, linkedid_2, linkedid_n]). Segment membership is then added to experience event with an additional field that refers back to the profile stitch id, so you can find which other profiles that segment realization applies to (besides the one that came in on the event).
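
The flow in the two points above can be sketched roughly as follows. This is only an illustration under stated assumptions: the property names `incomingId`, `linkedIds`, and `segmentID`, the segment names, and the helper `profiles_for_membership` are hypothetical and are not taken from the schemas in this PR. Only the stitch identity URL is copied from the PR's example.

```python
# Stitch identity URL as it appears in the PR's example.
STITCH_ID = "https://data.adobe.io/entities/profileStitchIdentity/1"

# Hypothetical, simplified ExperienceEvent fragment (names are illustrative).
event = {
    "incomingId": "ECID_2",  # the id that arrived on this event
    "profileStitching": [
        # Result of one stitch invocation: (profileStitchId, [linked ids])
        {"profileStitchID": STITCH_ID, "linkedIds": ["ECID_1", "ECID_2"]},
    ],
    "segmentMemberships": [
        # Refers back to the stitch that was used to qualify the segment
        {"segmentID": "segment-A", "profileStitchID": STITCH_ID},
        # Realized without any profile stitching
        {"segmentID": "segment-B"},
    ],
}


def profiles_for_membership(event, membership):
    """Return the ids a segment realization applies to."""
    stitch_id = membership.get("profileStitchID")
    if stitch_id is None:
        # No stitch involved: only the id that came in on the event
        return [event["incomingId"]]
    for stitch in event["profileStitching"]:
        if stitch["profileStitchID"] == stitch_id:
            return list(stitch["linkedIds"])
    raise KeyError(f"unknown profile stitch id: {stitch_id}")


for membership in event["segmentMemberships"]:
    print(membership["segmentID"], profiles_for_membership(event, membership))
```

The point of the back-reference is that the event is self-describing: a consumer can resolve each membership to its profiles without consulting the stitch or segment definitions.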

@lrosenthol
Collaborator

@harleensahni what do you mean by "multiple IDs for the same namespace"? That just seems wrong.

As long as the "merge rules" you describe could be implemented by anyone else using the same schemas (but perhaps with their own ID format) - that sounds reasonable. If this is Adobe-centric, it needs to move to an extension.

@harleensahni
Contributor Author

@lrosenthol

What do you mean by "multiple IDs for the same namespace"? That just seems wrong.

I think it'll be helpful if I give an example. Here the namespace will be ECID, and the id will be the cookie associated with the ECID; we'll use two values, ECID_1 (for device 1) and ECID_2 (for device 2). There'll be another identity namespace, CRMID (possibly based on a hash of the customer's email). For this example, we'll have one CRM ID, CRMID_1.

Let's say that we've received one experience event in the past, EE_1. On EE_1, we saw that the event came from ECID_1 and the user was logged in with CRMID_1. Now we have another experience event, EE_2: the event came from ECID_2 and the logged-in user was the same user, with CRMID_1. If a profile stitching rule is operating that can join ECIDs based on the logged-in CRMID, you will have two ECIDs, ECID_1 and ECID_2, in the same namespace. This is recording that.

As for what joining those two ECIDs is used for, one thing is segmentation that operates against the joint profiles of ECID_1 and ECID_2. It's important to know which profiles were joined and to have that recorded so that you can record the segmentation back on both ECID_1 and ECID_2.
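
A minimal sketch of this example, assuming a stitching rule that simply groups ECIDs by the logged-in CRMID. The event shape and the function `stitch_by_crmid` are hypothetical and only mirror the EE_1 / EE_2 scenario described here; a real merge rule would be defined elsewhere.

```python
from collections import defaultdict

# Hypothetical, simplified events: each carries one device cookie (ECID)
# and the CRM ID of the logged-in user.
events = [
    {"event": "EE_1", "ecid": "ECID_1", "crmid": "CRMID_1"},
    {"event": "EE_2", "ecid": "ECID_2", "crmid": "CRMID_1"},
]


def stitch_by_crmid(events):
    """Group the ECIDs seen for each CRM ID, keeping first-seen order."""
    linked = defaultdict(list)
    for e in events:
        if e["ecid"] not in linked[e["crmid"]]:
            linked[e["crmid"]].append(e["ecid"])
    return dict(linked)


stitched = stitch_by_crmid(events)
print(stitched)  # {'CRMID_1': ['ECID_1', 'ECID_2']}
```

After EE_2 arrives, the ECID namespace holds two ids for the same person, which is the "multiple IDs for the same namespace" case that the event needs to be able to record.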

@lrosenthol
Collaborator

ECID isn't a real namespace - at least as far as XDM is concerned. It's something specific to how you define the string that represents your IDs, yes? Since IDs are just arbitrary strings - that seems out of scope for XDM proper since it doesn't care what you stick in there...

@harleensahni
Contributor Author

It's defined in XDM (under its old name of MCID): https://github.com/adobe/xdm/blob/master/schemas/context/enduserids.schema.json#L20
When I'm saying namespace, I mean the identity space for end user ids.

@harleensahni
Contributor Author

The alternative would be to not represent this as the EndUserId schema and instead just have an array of Identity. I'm fine with either approach, but we'll need to add multiple-id support for end user id either way, because that is what Experience Event uses to record the IDs that came in on the call, and it does sometimes happen that you get multiple ids for the same thing.

@lrosenthol
Collaborator

Gotcha - thanks for that link! (just filed an issue against that schema, #449 , but that's separate)

OK - so if you need multiple items, use an array. What am I missing??

@cdegroot-adobe
Contributor

@lrosenthol the design of the endUserIDs is very carefully thought out. Having specific properties in known locations is very important for efficiency when there are very large numbers of records, as is the case with ExperienceEvent. That is why we have this design, and it is predicated on multiple identities being an edge case. Changing how endUserIDs implements this is not an option. The additionalIDs property is a way of catering for the edge case, and it is fine because those ids are not used in the same way as the "xdm:id" for the identity. additionalIDs is an array, so it covers this case. As I say, this specific area is quite carefully curated, and what Harleen is doing here is the approach that multiple teams have discussed and come to agreement on.

@harleensahni
Contributor Author

harleensahni commented Jul 26, 2018

The reason I chose to model this using EndUserId here is that it is what ExperienceEvent uses for the primary ids that come in with the event here: https://github.com/adobe/xdm/blob/master/schemas/context/experienceevent.schema.json#L43. So I want this to be consistent with that. I think EndUserId was modeled the way it is to support very quick lookups based on a namespace, since each namespace will be its own column on disk. It would be weird if I modeled this as an array of EndUserIds, since you could have multiple ids under the same namespace in separate instances of EndUserIds (and you can have multiple different namespaces in EndUserIds).

By adding the array of additionalIDs to EndUserIDs, you may not be able to quickly find if a particular id is in a namespace since it might be in the array, but you can at least still tell very quickly if an event has ANY ID in a particular namespace, which is valuable.
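
The trade-off described here can be sketched as two lookups over a hypothetical endUserIDs fragment. The keys are shortened for readability (the real schema uses namespace URIs and `xdm:`-prefixed properties), the id values are invented, and `has_any_id` / `has_id` are illustrative helper names.

```python
# Hypothetical endUserIDs fragment using the additionalIDs shape.
end_user_ids = {
    "mcid": {"id": "ECID_1", "additionalIDs": ["ECID_2"]},
}


def has_any_id(end_user_ids, namespace):
    """Cheap check: the namespace key is present, no array scan needed."""
    return namespace in end_user_ids


def has_id(end_user_ids, namespace, wanted):
    """Full check: compare the primary id, then scan additionalIDs."""
    entry = end_user_ids.get(namespace)
    if entry is None:
        return False
    return entry["id"] == wanted or wanted in entry.get("additionalIDs", [])


print(has_any_id(end_user_ids, "mcid"))        # True
print(has_id(end_user_ids, "mcid", "ECID_2"))  # True
```

`has_any_id` stays a fixed-position lookup regardless of how many additional ids exist, while `has_id` may have to walk the array, which is the cost being accepted for the edge case.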

@lrosenthol
Collaborator

Well, personally, I am not thrilled with this design but I will accept that we can't change the model at this time.

@@ -20,6 +20,14 @@
"type": "string",
"description": "Identity of the consumer in the related namespace."
},
"xdm:additionalIDs": {

Additional Ids are also needed to support multiple ECID --> CRMID mappings that are provided by co-op and private graph customers. The support for additional Identities should be done outside of the Identity object, where each id specified is of type xdm:Identity, either in the EndUserIds structure or in a separate Identity extension for ExperienceEvent.

Contributor

@harleensahni please review with Rahul, I agree on your proposal, this is likely just something you need to talk through.

Contributor Author

@biswas-adobe has concerns with Identity itself being modified. It's beginning to look like a better approach than using EndUserID here (as well as in Experience Event) is to go with an array of Identities.

Modeling with additional IDs is not compatible with #459 since it does not allow for different authentication states for each id in a namespace.
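
To make the incompatibility concrete, here is a rough comparison of the two shapes. The property name `authenticatedState` and its values are assumptions used for illustration (the details of #459 are not shown in this thread), and both helper functions are hypothetical.

```python
# Shape 1: additionalIDs are bare strings, so only the primary id can
# carry per-identity attributes such as an authentication state.
with_additional_ids = {
    "id": "ECID_1",
    "authenticatedState": "authenticated",  # assumed property name
    "additionalIDs": ["ECID_2"],
}

# Shape 2: an array of identity objects, each with its own attributes.
as_identity_array = [
    {"namespace": "ECID", "id": "ECID_1", "authenticatedState": "authenticated"},
    {"namespace": "ECID", "id": "ECID_2", "authenticatedState": "ambiguous"},
]


def auth_state_additional_ids(entry, wanted):
    """Only the primary id has a state; additional ids have nowhere to put one."""
    if entry["id"] == wanted:
        return entry.get("authenticatedState")
    return None


def auth_state_identity_array(identities, wanted):
    """Every identity object can carry its own state."""
    for identity in identities:
        if identity["id"] == wanted:
            return identity.get("authenticatedState")
    return None


print(auth_state_additional_ids(with_additional_ids, "ECID_2"))  # None
print(auth_state_identity_array(as_identity_array, "ECID_2"))    # ambiguous
```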

Collaborator

@harleensahni I also feel like it would be better to use an array of Identity. This is how the same problem was handled in Profile.

Contributor Author

@kstreeter, I can change to array of Identity. The problem is that we will face this same issue on EndUserIDs itself in other use cases though, and I think we're just punting the issue. There are cases where a customer can send multiple ids for the same Identity namespace on the same call. These will need to be recorded on the ExperienceEvent and that uses EndUserID.

It's also weird that we would model Identities in two different ways for anyone who is consuming these events.

Contributor Author

@kstreeter @biswas-adobe I've changed this to be an array of Identity

}
},
"xdm:profileStitchID": {
"xdm:id": "https://data.adobe.io/entities/profileStitchIdentity/1",
Contributor Author

need to change this back to @id

@@ -114,5 +114,48 @@
},
"xdm:marketing": {
"xdm:trackingCode": "marketingcampaign111"
}
},
"xdm:profileStitching": [
Contributor

A couple of questions:

  • What is the use case for adding profileStitching to ExperienceEvent?
  • Is the user expected to populate this via ETL jobs?

Contributor Author

  • To support profile stitching in realtime for use with realtime segmentation.
  • Someone could populate this with an ETL job if they wanted, but that wasn't the use case I was considering. I think it's open enough to support that.

"xdm:code": "AAM"
}
},
"xdm:profileStitchID": {
Contributor

Is there a reason to add this to each segment realization and each ExperienceEvent? This information can be looked up on demand from the segment metadata.

Contributor Author

Yes, because without it, this event isn't complete. You'd have to reference the segment definition to connect the profile stitching that occurred in this event (in ExperienceEvent.profileStitching) to the profiles that were used to qualify this segment. That doesn't seem ideal. The alternative to avoid that would be to just embed endUserIds here instead of profileStitchID. That would be much more verbose if multiple segments were realized on this call, since the same set of endUserIDs would be repeated in multiple memberships. Also, we may be in a position where we want to record information about profile stitching in ExperienceEvent.profileStitching for other use cases down the road. If that happens, then the same set of information would be in ExperienceEvent.profileStitching.endUserIds and in each segment membership's endUserIds.
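
The verbosity argument can be illustrated with a small sketch that compares embedding the id set in every membership against referencing it through a stitch id. The stitch id `stitch-1`, the segment names, and the property layout are hypothetical simplifications, not the PR's schema.

```python
import json

linked_ids = ["ECID_1", "ECID_2", "CRMID_1"]
segments = ["segment-A", "segment-B", "segment-C"]

# Alternative: embed the full id set in each segment membership.
embedded = {
    "segmentMemberships": [
        {"segmentID": s, "endUserIDs": linked_ids} for s in segments
    ]
}

# Approach in this PR: record the id set once and refer to it by stitch id.
referenced = {
    "profileStitching": [
        {"profileStitchID": "stitch-1", "endUserIDs": linked_ids}
    ],
    "segmentMemberships": [
        {"segmentID": s, "profileStitchID": "stitch-1"} for s in segments
    ],
}


def count_id(doc, value):
    """Count how often an id string appears in the serialized document."""
    return json.dumps(doc).count(f'"{value}"')


print(count_id(embedded, "ECID_1"), count_id(referenced, "ECID_1"))  # 3 1
```

With embedding, each id is repeated once per realized segment; with the reference, it is stored once no matter how many segments are realized on the call.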

Contributor Author

@mohrahit, another alternative existed also. That was to have segmentmembership be a property under each profile stitch here in ExperienceEvent. That felt too nested and more importantly, assumed that all segmentation happened with profile stitching.

Contributor Author

@mohrahit, are you okay with this approach?

…stead od of with EndUserIDs. Remove concept of additionalIDs
@kstreeter kstreeter added this to the Stabilizing 1.0 milestone Aug 14, 2018

6 participants