Hi Max - great question! I'll preface this by saying I'm not an expert in self-supervised ML approaches outside those used by Splink, so take what I say with a grain of salt!
With that said, I have always wondered whether there may be some mileage in applying alternative ML techniques to the results of Splink. A while back I tried running a basic logistic regression using the Splink comparison vector (the comparison level values) as features and the Splink prediction as the 'label'. The idea was that this might be a way around the 'conditional independence' assumption (specifically, that the comparison columns are conditionally independent given match status), which is usually violated in practice, since multiple regression techniques allow for correlations between features. I didn't try that hard, but my memory is that logistic regression didn't help much.
However, I like the idea of using an ML model (e.g. decision trees) iteratively - it had never occurred to me to try, and it's a good idea. It certainly sounds worth experimenting with, e.g. retraining on only the p > 0.9 results. I guess whatever works best!
Finally, the biggest challenge we've faced with this sort of thing is producing robust evidence that it works, due to the absence of high quality labelled data to measure accuracy. We do have a couple of synthetic datasets relating to persons, derived from Wikidata, that I could potentially share if they may be of use.
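For what it's worth, here is a minimal sketch of the kind of logistic regression experiment described above. It is not the code that was actually run: the comparison vectors and match probabilities are made-up toy values, the names `comparison_vectors`, `splink_probability`, `fit_logistic` and `predict` are hypothetical, and treating the prediction as a soft target is an assumption. It uses only the standard library; in practice a library such as scikit-learn would likely be the more natural choice.

```python
import math

# Hypothetical toy data: each row is a comparison vector (one comparison
# level per column, e.g. first_name, surname, dob). The probabilities are
# made-up values standing in for the Splink prediction used as the 'label'.
comparison_vectors = [
    [2, 2, 1], [2, 1, 1], [2, 2, 0], [1, 2, 1],
    [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
]
splink_probability = [0.99, 0.95, 0.90, 0.92, 0.01, 0.05, 0.08, 0.10]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, row):
    """Predicted match probability for one comparison vector."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on cross-entropy.

    Assumption: y holds soft targets in [0, 1] rather than hard 0/1 labels.
    """
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, target in zip(X, y):
            error = predict(weights, bias, row) - target
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


weights, bias = fit_logistic(comparison_vectors, splink_probability)
rescored = [predict(weights, bias, row) for row in comparison_vectors]
```

This version only has main effects. To let the regression pick up dependence between comparison columns, one option would be to append interaction terms (e.g. the product of two comparison levels) to each row before fitting.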
-
Hey there.
I was curious, with regard to the naive Bayes EM approach, whether this is just a self-supervised learning approach that could also be implemented iteratively with any other algorithm, such as decision trees (M: train the tree on the current labels, E: reassign the labels based on its predictions). Have I overlooked something?
If so, there are also refinements of this approach where the retraining is done only on fairly "certain" predictions (e.g. probability > 0.9). What do you think about this?
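To make the idea concrete, here is a rough sketch of the loop I have in mind. Everything in it is an assumption for illustration: the toy comparison vectors and starting probabilities are made up, the names `fit_stump`, `predict_stump` and `self_train` are hypothetical, and a depth-1 tree (a "stump") written with the standard library stands in for a real decision tree implementation.

```python
# Hypothetical toy data: comparison vectors plus rough initial match
# probabilities (e.g. from an earlier model). All values are made up.
vectors = [
    [2, 2, 1], [2, 1, 1], [2, 2, 0], [1, 1, 1],
    [0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
]
initial_probability = [0.97, 0.95, 0.93, 0.60, 0.02, 0.04, 0.06, 0.40]
CONFIDENT = 0.9  # retrain only on records with p > 0.9 or p < 0.1


def fit_stump(X, y):
    """Fit a depth-1 decision tree: the single split with the fewest errors."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            left = [l for row, l in zip(X, y) if row[feature] < threshold]
            right = [l for row, l in zip(X, y) if row[feature] >= threshold]
            if not left or not right:
                continue
            # Leaf probability is the share of positive labels in the leaf.
            p_left, p_right = sum(left) / len(left), sum(right) / len(right)
            errors = (len(left) * min(p_left, 1 - p_left)
                      + len(right) * min(p_right, 1 - p_right))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, p_left, p_right)
    if best is None:
        raise ValueError("no valid split found")
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, p_left, p_right = stump
    return p_left if row[feature] < threshold else p_right


def self_train(X, probability, max_rounds=10):
    for _ in range(max_rounds):
        # Keep only the "certain" records and harden their labels to 0/1.
        confident = [(row, int(p > 0.5)) for row, p in zip(X, probability)
                     if p > CONFIDENT or p < 1 - CONFIDENT]
        # Train on the certain records, then rescore every record.
        stump = fit_stump([r for r, _ in confident], [l for _, l in confident])
        updated = [predict_stump(stump, row) for row in X]
        if updated == probability:  # nothing changed, so stop
            break
        probability = updated
    return stump, probability


stump, final_probability = self_train(vectors, initial_probability)
```

One thing this toy run shows: an unsmoothed tree returns leaf proportions, so here every record ends up with probability 0 or 1 after the first round, and the "certain" filter no longer excludes anything. Smoothed leaf estimates or an ensemble would probably be needed for the threshold to keep doing useful work.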
Best,
Max