profile stitch #444
Still processing this PR, just made a few small syntactic change requests.
schemas/context/identity.schema.json
Outdated
@@ -20,6 +20,14 @@
"type": "string",
"description": "Identity of the consumer in the related namespace."
},
"xdm:additionalIds": {
additionalIDs
schemas/context/identity.schema.json
Outdated
"items": {
"type": "string"
},
"description": "If consumer has more than one of the given id, additional ids are listed here."
If endUser has more than one identity in the namespace, additional identity ids are listed here.
"xdm:version": "1.0",
"xdm:endUserIds": {
"https://ns.adobe.com/experience/mcid": {
"@id": "https://data.adobe.io/entities/identity/92312748749128",
this has now changed to "xdm:id" as of 0.9.3
"xdm:endUserIds": {
"https://ns.adobe.com/experience/mcid": {
"@id": "https://data.adobe.io/entities/identity/92312748749128",
"additionalIds": [
additionalIDs
"definitions": {
"profileStitchIdentity": {
"properties": {
"@id": {
@id -> "xdm:id"
I think we want this as @id, similar to what we did with segment identity. Profile stitches will be defined as full objects in XDM and should be referenced as their own entities.
}
],
"meta:status": "experimental"
}
add newline please.
Done
@harleensahni - I really like the additionalIDs. To simplify things I recommend that you put that in a separate PR, as that is quite a distinct thing and we should be able to get closure on it quickly. I do not quite understand the core concept of this addition to ExperienceEvent. I am assuming that the merge rules and profile stitching are added during data collection, so this is really a log of that. Can you explain the interaction flow a bit please?
@@ -126,6 +126,31 @@
"description": "The media activity information related to the experience event"
}
},
"xdm:profileStitching" : {
"title": "Profile Stitching",
This seems like a very Adobe-centric thing, yes? If so, it belongs in an extension and not in the primary schema
I don't think so. It's about which set of profiles segmentation is running against. Adobe has some functionality around this, but I think others in the industry do too.
well, I would want to understand how it fits into the larger industry context before accepting this in a common schema.
Understood, should we talk offline? Or do you have something in particular you're looking for?
Certainly happy to chat offline, but what I am looking for is either an existing standard for this that we can point to from this common schema...or we move it to an Adobe extension.
@lrosenthol - the concept of merging multiple profile records together to make a golden record is quite common in the industry. The terms stitch and merge are synonymous. The industry term for doing this in old-school databases is creating the Golden Record, and it is a function of Master Data Management. We do it in real time and use similar concepts. Harleen's use of the term "profile stitch" in this proposal is understandable by non-Adobe practitioners without explanation.
@cdegroot-adobe if I do a web search on "profile stitching", only one of the hits is about this (and it actually calls it visitor stitching).
Is there any existing industry reference document that we can point to that defines this?
"description": "Details about the ids that were joined by profile stitching.",
"$ref": "https://ns.adobe.com/xdm/context/profileStitch"
},
"xdm:segmentMemberships": {
What is a "segment", where is it defined, and is it specific to an Adobe workflow or generic?
It's not defined yet in its own XDM. This pull request didn't introduce the concept of segment membership; it already exists on profile, and the PR for the segment membership is merged. I'm just adding it to Experience Event. It's meant to be generic, but this will definitely be used by Audience Manager and, I think, Target.
Got it. OK - no issue there.
Issue Closed
"you may not use this file except in compliance with the License. You may obtain a copy",
"of the License at https://creativecommons.org/licenses/by/4.0/"
],
"$id": "https://ns.adobe.com/xdm/context/profileStitch",
Also needs @id, as per new plans
Agreed, we need to get that approved first and then we will do the needful. I would like Kevin to return for the @id decision and exactly how we roll it out. In the XC we cannot automatically add them, so we need to do something to limit the explosion of fields in our operational store.
"$schema": "http://json-schema.org/draft-06/schema#",
"title": "Profile Stitch",
"type": "object",
"description": "Details about the ids that were joined by profile stitching.",
This seems like a very Adobe-centric thing, yes? If so, it belongs in an extension and not in the primary schema
Please see earlier comments on this. I am confident this is not an Adobe-centric thing at all.
"of the License at https://creativecommons.org/licenses/by/4.0/"
],
"$id": "https://ns.adobe.com/xdm/context/profileStitchIdentity",
"$schema": "http://json-schema.org/draft-06/schema#",
also needs @id
See the earlier comment on this; we will pick up all the @ids in a dedicated PR.
{
"@id": "https://data.adobe.io/entities/profileStitchIdentity/1",
"xdm:namespace": {
"xdm:code": "AAM"
Don't you also need an xdm:xid here? What use is the namespace without the thing you are namespacing?
Wouldn't this, as-is, namespace @id?
@id is the ID of that particular instance of that schema - if that is what you are doing, then yes, I would agree. I thought you were referring to some other object...which would be via xdm:id
@@ -0,0 +1,40 @@
{
Lower case the name of this schema, and the example too please.
@cdegroot-adobe Is it necessary to split this into two different PRs, one for additionalIDs and one for profile stitching? I ask because the profile stitching PR is useless without the additionalIDs, and I want to call out as part of the PR for profile stitching that we'll be using additionalIDs to have multiple ids for the same namespace. As far as the question about what this PR does with profile stitch / merge rule and the flow:
@harleensahni what do you mean by "multiple IDs for the same namespace"? That just seems wrong. As long as the "merge rules" you describe could be implemented by anyone else using the same schemas (but perhaps with their own ID format) - that sounds reasonable. If this is Adobe-centric, it needs to move to an extension.
I think it'll be helpful if I give an example. Here the namespace will be ECID, and the id will be the cookie associated with the ECID, and we'll use two values, Let's say that we've received one experience events in the past, As far as what joining those two ECIDs, one thing is segmentation that operates against the joint profiles of
ECID isn't a real namespace - at least as far as XDM is concerned. It's something specific to how you define the string that represents your IDs, yes? Since IDs are just arbitrary strings - that seems out of scope for XDM proper since it doesn't care what you stick in there...
It's defined in XDM (under its old name of MCID): https://github.com/adobe/xdm/blob/master/schemas/context/enduserids.schema.json#L20
The alternative would be to not represent this as EndUserId schema and instead just have an array of Identity. I'm fine with either approach, but we'll need to add multiple id support for end user id either way because that is what Experience Event uses to record the IDs that came in on the call, and it does sometimes happen that you get multiple ids for the same thing.
Gotcha - thanks for that link! (just filed an issue against that schema, #449, but that's separate) OK - so if you need multiple items, use an array. What am I missing??
@lrosenthol the design of the endUserIDs is very carefully thought out. Having specific properties in known locations is very important for efficiency when there are very large numbers of records, as is the case with ExperienceEvent. That is why we have this design, and it is predicated on multiple identities being an edge case. Changing how endUserIDs implements this is not an option. The additionalIDs is a way of catering for the edge case and is fine because they are not used in the same way as the "xdm:id" for the identity. additionalIDs is an array, so it covers this case. As I say, this specific area is quite carefully curated, and what Harleen is doing here is the approach that multiple teams have discussed and come to agreement on.
The reason I chose to model this using EndUserId here is that is what ExperienceEvent uses for the primary ids that come in with the event here: https://github.com/adobe/xdm/blob/master/schemas/context/experienceevent.schema.json#L43. So I want this to be consistent with that. I think EndUserId was modeled the way it is to support being able to do lookups based on a namespace very quickly since it'll be its own column on disk. It would be weird if I modeled this as an array of EndUserIds since you could have multiple ids under the same namespace in separate instances of EndUserIds (and you can have multiple different namespaces in EndUserIds). By adding the array of additionalIDs to EndUserIDs, you may not be able to quickly find if a particular id is in a namespace since it might be in the array, but you can at least still tell very quickly if an event has ANY ID in a particular namespace, which is valuable.
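The lookup trade-off described in this comment can be sketched roughly as follows. This is only an illustration, not the XDM implementation: the namespace key is the one from the example above, while the id values and the helper names `has_any_id` and `has_id` are made up, and the `xdm:additionalIDs` field follows the naming proposed in this review.

```python
# Hypothetical event fragment: one entry per namespace, with the proposed
# additionalIDs array holding any extra ids seen for that namespace.
NAMESPACE = "https://ns.adobe.com/experience/mcid"

end_user_ids = {
    NAMESPACE: {
        "xdm:id": "id-a",               # made-up primary id
        "xdm:additionalIDs": ["id-b"],  # made-up extra id
    }
}


def has_any_id(ids, namespace):
    """Cheap check: does the event carry ANY id in this namespace?"""
    return namespace in ids


def has_id(ids, namespace, wanted):
    """Finding one specific id may require scanning the additionalIDs array."""
    entry = ids.get(namespace)
    if entry is None:
        return False
    return entry.get("xdm:id") == wanted or wanted in entry.get("xdm:additionalIDs", [])
```

Under these assumptions, the "any id in this namespace" question stays a single key lookup, while the "is this exact id present" question falls back to a scan of the array.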
Well, personally, I am not thrilled with this design but I will accept that we can't change the model at this time.
schemas/context/identity.schema.json
Outdated
@@ -20,6 +20,14 @@
"type": "string",
"description": "Identity of the consumer in the related namespace."
},
"xdm:additionalIDs": {
Additional Ids are also needed to support multiple ECID --> CRMID mappings that are provided by co-op and private graph customers. The support for additional Identities should be done outside of the Identity object, where each id specified is of type xdm:Identity, either in the EndUserIds structure or in a separate Identity extension for ExperienceEvent.
@harleensahni please review with Rahul. I agree with your proposal; this is likely just something you need to talk through.
@biswas-adobe has concerns with Identity itself being modified. It's beginning to look like a better approach than using EndUserID here (as well as in Experience Event) is to go with an array of Identities.
Modeling with additional IDs is not compatible with #459 since it does not allow for different authentication states for each id in a namespace.
@harleensahni I also feel like it would be better to use an array of Identity. This is how the same problem was handled in Profile.
@kstreeter, I can change to array of Identity. The problem is that we will face this same issue on EndUserIDs itself in other use cases though, and I think we're just punting the issue. There are cases where a customer can send multiple ids for the same Identity namespace on the same call. These will need to be recorded on the ExperienceEvent and that uses EndUserID.
It's also weird that we would model Identities in two different ways for anyone who is consuming these events.
@kstreeter @biswas-adobe I've changed this to be an array of Identity
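As a rough sketch of why an array of Identity fits the concern raised about #459, each element can carry its own state even when two ids share a namespace. The `xdm:namespace`, `xdm:code`, and `xdm:id` names come from the examples in this PR; the `authState` field, its values, the id values, and the helper `ids_by_namespace` are assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical array of Identity: two ids in the same namespace, each with
# its own (assumed) authentication state field.
identities = [
    {"xdm:namespace": {"xdm:code": "ECID"}, "xdm:id": "id-a", "authState": "authenticated"},
    {"xdm:namespace": {"xdm:code": "ECID"}, "xdm:id": "id-b", "authState": "ambiguous"},
]


def ids_by_namespace(items):
    """Group (id, authState) pairs by namespace code, preserving input order."""
    grouped = defaultdict(list)
    for item in items:
        code = item["xdm:namespace"]["xdm:code"]
        grouped[code].append((item["xdm:id"], item["authState"]))
    return dict(grouped)
```

With a single identity plus a bare list of additional id strings, there would be no obvious place to record the second state, which is the incompatibility noted above.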
}
},
"xdm:profileStitchID": {
"xdm:id": "https://data.adobe.io/entities/profileStitchIdentity/1",
need to change this back to @id
@@ -114,5 +114,48 @@
},
"xdm:marketing": {
"xdm:trackingCode": "marketingcampaign111"
}
},
"xdm:profileStitching": [
A couple of questions:
- What is the use case for adding profileStitching to ExperienceEvent?
- Is the user expected to populate this via ETL jobs?
To support profile stitching in realtime for use with realtime segmentation
Someone could populate this with an ETL job if they wanted, but that wasn't the use case I was considering. I think it's open enough to support that.
"xdm:code": "AAM"
}
},
"xdm:profileStitchID": {
Is there a reason to add this to each segment realization and each ExperienceEvent? This information can be looked up on demand from the segment metadata.
Yes, because without it, this event isn't complete. You'd have to reference the segment definition to relate the profile stitching that occurred in this event (in ExperienceEvent.profileStitching) to the profiles that were used to qualify this segment. That doesn't seem ideal. The alternative to avoid that would be to just embed endUserIds here instead of profileStitchID. That would be much more verbose if multiple segments were realized on this call, since the same set of endUserIDs would be repeated in multiple memberships. Also, we may be in a position where we want to record information about profile stitching in ExperienceEvent.profileStitching for other use cases down the road. If that happens, then the same set of information would be in ExperienceEvent.profileStitching.endUserIds and in each segment membership's endUserIds.
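The reference-versus-embed argument can be sketched like this. The `xdm:profileStitching`, `xdm:segmentMemberships`, and `xdm:profileStitchID` names and the stitch `@id` URL come from the examples in this PR; the exact shape of each entry, the `segment` key, the `endUserIds` list of plain strings, and the helper `stitched_ids_for` are simplifying assumptions.

```python
STITCH_ID = "https://data.adobe.io/entities/profileStitchIdentity/1"

# Hypothetical event: the stitched ids are recorded once, and each segment
# membership only points at the stitch by its @id.
event = {
    "xdm:profileStitching": [
        {"xdm:profileStitchID": {"@id": STITCH_ID}, "endUserIds": ["id-a", "id-b"]},
    ],
    "xdm:segmentMemberships": [
        {"segment": "seg-1", "xdm:profileStitchID": {"@id": STITCH_ID}},
        {"segment": "seg-2", "xdm:profileStitchID": {"@id": STITCH_ID}},
    ],
}


def stitched_ids_for(evt, membership):
    """Resolve a membership's stitch reference inside the same event."""
    wanted = membership["xdm:profileStitchID"]["@id"]
    for stitch in evt["xdm:profileStitching"]:
        if stitch["xdm:profileStitchID"]["@id"] == wanted:
            return stitch["endUserIds"]
    return None  # the reference does not match any stitch in this event
```

In this sketch both memberships resolve to the same id list without leaving the event, and the list itself is stored only once rather than repeated per membership.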
@mohrahit, another alternative also existed: having segment membership be a property under each profile stitch here in ExperienceEvent. That felt too nested and, more importantly, assumed that all segmentation happened with profile stitching.
@mohrahit, are you okay with this approach?
…stead of with EndUserIDs. Remove concept of additionalIDs
This PR addresses issue #468.
Add a concept of profile stitch / merge rules to experience event. I had to introduce a concept of profile stitch identity similar to what we have for segment identity, but I'd prefer if we just took a much simpler approach with segment and profile ids as simple strings.
I also add segment membership to experience event and modify identity to support multiple ids for the same namespace (which is a hard requirement for merging).