Confidence in mappings is a tricky issue. While SSSOM has a nice confidence field, it is not very clear from the specification alone what it pertains to. There are at least two possible interpretations:
confidence of the mapping: the likelihood that the mapping is correct. This is most likely the prevalent interpretation, but not the one we intended.
confidence of the mapping justification: the degree of trust that the justification lends to the truthfulness of the mapping. This is what we originally intended, but never communicated very well.
In practice, both are quite similar (especially in the frequent case of only having a single justification), but the reality is that a mapping can have multiple justifications, all of which provide different levels of confidence in the truthfulness of the mapping. We can have a low confidence value provided by a lexical match justification, and a high confidence value by a human-curated match, and neither, by itself, says anything about the "likelihood that the mapping is correct".
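The situation above can be sketched as data. This is a hypothetical record layout (the dict structure, identifiers, and confidence values are illustrative, not prescribed by SSSOM), using the real SSSOM slot names `mapping_justification` and `confidence`:

```python
# One mapping carrying several justifications, each with its own confidence.
# Identifiers and values are placeholders for illustration only.
mapping = {
    "subject_id": "EX:0001",
    "predicate_id": "skos:exactMatch",
    "object_id": "OTHER:0042",
    "justifications": [
        # low trust from an automated lexical match
        {"mapping_justification": "semapv:LexicalMatching", "confidence": 0.55},
        # high trust from a human curator looking at the same mapping
        {"mapping_justification": "semapv:ManualMappingCuration", "confidence": 0.95},
    ],
}

# Neither number alone is the "likelihood that the mapping is correct";
# each one scores a different justification for the same mapping.
confidences = [j["confidence"] for j in mapping["justifications"]]
print(confidences)  # → [0.55, 0.95]
```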
The fact of the matter is that they mean different things. And to make things worse, we have the following to consider:
the phrase "likelihood that the mapping is correct" is basically meaningless, as mappings cannot really be true in the philosophical sense of the idea of truth. Mappings can serve a purpose.
There are at least two more sources of confidence that the SSSOM standard considers, but has not yet documented well:
registry confidence: the confidence of a mapping registry in the quality of a specific mapping set, which is basically a measure of the registry's trust in the mapping provider
user ratings (semapv:MappingReview): basically thumbs-up/down votes or confirmations that a particular mapping is correct (similar to semapv:ManualMappingCuration, but not quite the same, as it does not include searching for alternative, possibly better, mappings)
Now, given all this complexity, it makes sense to think about a recommended way for tools to determine the overall confidence in a mapping. For example, consider an instance of OxO loading a mapping set with:
a low registry_confidence (not too trustworthy, e.g. ad-hoc lexical matching)
multiple justifications per mapping (all with different confidence levels)
We also want to support a user-rating feature in the app (thumbs up/down).
The two concrete things we need to determine are these:
How should the tool compute overall mapping confidence? ("Give me all the high confidence, >90%, mappings")
How should the tool capture that confidence value? By creating an additional semapv:CompositeMatching justification with mapping_tool=OxO and a confidence value compounded of all the others? By adding a non-standard mapping_confidence value to the internal data model and using that to drive search?
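For the first of those two options, the captured record might look something like this. `semapv:CompositeMatching` and `mapping_tool` are real semapv/SSSOM terms, but the dict layout and the example values are assumptions for illustration:

```python
# Hypothetical composite justification record, added alongside the
# original justifications rather than replacing them.
composite_justification = {
    "mapping_justification": "semapv:CompositeMatching",
    "mapping_tool": "OxO",
    "confidence": 0.36,  # compounded from the individual confidence values
}

print(composite_justification["mapping_justification"])  # → semapv:CompositeMatching
```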
I don't think anything should be done here in a normative way, but I think it is valuable to discuss this or at least have a ticket to capture some of our thoughts on the matter.
For me personally, right now, I tend to think something like this is a good start for computing the mapping confidence:
mapping_confidence = (m * AVG(confidence)) * (n * registry_confidence) * (o * (thumbs_up / ratings))
with m, n, o initially set to 1, but independently adjustable by the mapping browser developer.
and recommending that a new semapv:CompositeMatching justification be added to the mapping database to capture this.
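A minimal sketch of that combination formula, assuming the weights m, n, o default to 1 and that a mapping with no ratings yet gets a neutral rating score (that fallback is my assumption, not part of the proposal):

```python
def mapping_confidence(justification_confidences, registry_confidence,
                       thumbs_up, ratings, m=1.0, n=1.0, o=1.0):
    """Compute (m*AVG(confidence)) * (n*registry_confidence) * (o*(thumbs_up/ratings)).

    m, n, o default to 1 and can be tuned independently by the
    mapping browser developer.
    """
    avg = sum(justification_confidences) / len(justification_confidences)
    # Assumption: an unrated mapping is treated as neutral (score 1.0)
    # so the other factors are not zeroed out.
    rating_score = thumbs_up / ratings if ratings else 1.0
    return (m * avg) * (n * registry_confidence) * (o * rating_score)

# Example: two justifications, a weak registry, mostly positive ratings.
score = mapping_confidence([0.55, 0.95], registry_confidence=0.6,
                           thumbs_up=8, ratings=10)
print(round(score, 3))  # → 0.36
```

Whether multiplication is the right combinator is itself debatable (a single low factor drags the whole score down), which is exactly why the weights should stay adjustable rather than normative.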