Entity Resolution
Entity resolution allows you to find pairs of nodes (and egos) across different sessions that represent the same person, place or object. You can then export a single network that includes these merged nodes and their resolved properties. To do this, Server sends a list of nodes to a script (typically Python), which returns a list of pairs, each with a score for the probability that the two nodes match.
You will need two things in order to use entity resolution: a dataset and an entity resolver.
Entity resolution connects nodes between sessions, so you should start with a dataset of at least 2 sessions.
For this example you should start with the protocol found here:
https://github.com/complexdatacollective/entity-resolution-sample/blob/master/examples/protocols/Simple%20Entity%20Resolution%20Protocol.netcanvas
An entity resolver receives a list of nodes from the Server app and returns a list of pairs with an associated probability score.
Because a resolver interprets the dataset of a network, it is specific to a `.netcanvas` protocol.
The resolver will receive nodes over stdin, and should return results over stdout. This can happen synchronously: sending results after all nodes are received and processed; or asynchronously: immediately returning results as soon as the first nodes are received, and continuing to process them as they are received.
Server assumes the resolver will be an interpreted script written in python.
For this example you should start with the resolver found here:
https://github.com/complexdatacollective/entity-resolution-sample/blob/master/EntityResolution.py
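To illustrate the stdin/stdout contract described above, here is a minimal sketch of an asynchronous resolver. It is not the sample resolver and does not claim to match Server's actual wire format: the JSON Lines input, the `id` and `name` fields, and the `a`/`b`/`score` output keys are all assumptions made for illustration. Check `EntityResolution.py` for the format Server really uses.

```python
import json
import sys
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score in [0, 1] from a case-insensitive comparison of two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(lines):
    """Yield one scored pair for each incoming node against every node seen so far.

    ASSUMPTION: each input line is a JSON object with hypothetical
    "id" and "name" fields. The real format is defined by Server.
    """
    seen = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        node = json.loads(line)
        for other in seen:
            yield {
                "a": other["id"],
                "b": node["id"],
                "score": similarity(other["name"], node["name"]),
            }
        seen.append(node)


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read nodes from stdin and write scored pairs to stdout as they are found."""
    for pair in resolve(stdin):
        stdout.write(json.dumps(pair) + "\n")
        stdout.flush()  # send each result immediately instead of buffering
```

To run this as a script, call `main()` at the bottom of the file. Because `resolve` is a generator, results are written as soon as the second node arrives. A synchronous resolver would instead collect every node first and write all pairs at the end.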
Prerequisites:
- python3 installed (for the sample scripts; python2 should work in principle with other scripts)
- Latest Server `6.1.0` installed (available on Slack)
- Latest Interviewer `6.0.3` installed, and paired with Server (available on GitHub)
- Download the example resolver from https://github.com/complexdatacollective/entity-resolution-sample/
- Install the protocol found in `/examples/protocols/Simple Entity Resolution Protocol.netcanvas` in Server and Interviewer.
- Create at least 2 example sessions.
- Export those sessions to Server.
- In Server, go to the Simple protocol workspace and click the "Resolve data" tab
- Go to the "Resolve Sessions" section
- Select "Person" as the ego node cast type (this will convert egos into `person` nodes so that they can be included in the comparison)
- Interpreter should be set to the location of the python3 installation on your system. If it's included in your $PATH, you can leave this as just `python3`.
- Set the Resolver Script Path to `EntityResolution.py` in the sample files
- Click Begin Entity Resolution
- For each pair you may select a combination of attributes, or 'Not a match'
- After you confirm the last match you will be presented with a summary screen. Click "Save and Export" (an export will be generated with the default export settings, but with all sessions merged).
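Before starting a resolution in Server, it can help to check that the Interpreter and Resolver Script Path values work together on your machine. The sketch below runs an interpreter with a script, sends text on stdin and returns whatever the script writes to stdout. The `run_resolver` helper is hypothetical, and this is only an approximation of the contract described on this page: Server's actual invocation may differ.

```python
import subprocess


def run_resolver(interpreter, script_path, payload):
    """Run `interpreter script_path`, send payload on stdin, return stdout text.

    ASSUMPTION: this mirrors the stdin/stdout contract only; it is not
    how Server itself launches the resolver.
    """
    completed = subprocess.run(
        [interpreter, script_path],
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return completed.stdout
```

For example, `run_resolver("python3", "EntityResolution.py", sample_input)` would exercise the same interpreter and script path entered in the "Resolve Sessions" form, where `sample_input` is text in whatever format the resolver expects.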
- In Server, go to the Simple protocol workspace and click the "Resolve data" tab
- Go to the "Existing resolutions" section
- Click "Export" on the resolution you would like to export
If you add new sessions, you may wish to resolve them as well. Resolutions are cumulative: this feature will attempt to resolve the later sessions against any previous resolutions. These steps assume you have already created previous resolutions by following the steps here:
https://github.com/complexdatacollective/Server/wiki/_new#resolving-sessions
- Create at least one extra session in Interviewer, and export it to Server.
- In Server, go to the Simple protocol workspace and click the "Resolve data" tab
- Go to the "Resolve Sessions" section
- You will not be able to set the ego cast type (this could cause conflicts with previous resolutions)
- Interpreter will be set to the same location as the latest resolution, but can be changed.
- The Resolver Script Path will also be set to the same path as the last resolution, and can be changed.
- Click Begin Entity Resolution
  The script will receive:
  - Previously resolved nodes from earlier sessions, in their resolved state
  - Nodes from new sessions, unchanged
- For each pair you may select a combination of attributes, or 'Not a match'
- After you confirm the last match you will be presented with a summary screen. Click "Save and Export" (an export will be generated with the default export settings, but with all sessions merged).
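The cumulative behaviour in the steps above can be pictured with a small sketch. It is a conceptual model only, not Server's implementation: the `merge_pair` and `next_round_input` helpers, the `sources` field and the dictionary shapes are all hypothetical. It assumes each node appears in at most one confirmed match and that the reviewer chose, per attribute, which side of the pair to keep.

```python
def merge_pair(node_a, node_b, choices):
    """Merge two matched nodes; `choices` maps attribute name -> "a" or "b"."""
    source = {"a": node_a, "b": node_b}
    return {attr: source[side][attr] for attr, side in choices.items()}


def next_round_input(earlier_nodes, matches, new_nodes):
    """Assemble what a later resolution could pass to the resolver.

    earlier_nodes: dict of id -> attributes from already-resolved sessions
    matches: list of (id_a, id_b, choices) confirmed in the earlier resolution
    new_nodes: dict of id -> attributes from sessions added since then
    (All three shapes are illustrative assumptions.)
    """
    remaining = dict(earlier_nodes)
    payload = []
    for id_a, id_b, choices in matches:
        # Matched nodes are sent in their resolved (merged) state.
        merged = merge_pair(remaining.pop(id_a), remaining.pop(id_b), choices)
        payload.append({"sources": [id_a, id_b], **merged})
    # Earlier nodes that were never matched are passed on as they are.
    payload += [{"sources": [i], **attrs} for i, attrs in remaining.items()]
    # Nodes from new sessions are passed on unchanged.
    payload += [{"sources": [i], **attrs} for i, attrs in new_nodes.items()]
    return payload
```

In this model, a new session's node is compared against the merged node rather than against each of the two originals, which is why changing the ego cast type between resolutions could conflict with earlier results.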