European Union Participant Identifier Code (PIC) #97

Open
ghost opened this issue Aug 25, 2021 · 11 comments
Labels
curation (Managing registry data and registry updates), data model/schema (Changes to ROR data model/schema)

Comments

@ghost

ghost commented Aug 25, 2021

Do you have any plans to include the EU's Participant Identifier Codes (PICs)? They are widely used around the world for applications and partnerships in European research and educational exchange programmes.

@mariagould
Contributor

Hi @JTJD, thanks for your question. We have no current plans to add a mapping to EU PICs. Please feel free to share more information about your use case for having these identifiers available in ROR so that we can evaluate this for future development work.

@paulmillar

I would also be interested if ROR were to provide an organisation's PIC value.

I can describe my own particular use-case.

Through "Horizon 2020", the EU spent just shy of €80 billion over the past 7 years (2014 to 2020) to fund research and innovation. The next generation funding instrument will be "Horizon Europe" and will have a budget of over €95 billion.

CORDIS is the system that underpins the process of applying for funding and, if successful, for reporting on the project's activities. The EU has made all the data underpinning CORDIS available as open data, including details on all previously funded and ongoing projects. The license is very broad, allowing almost any reuse of this data.

There is a problem, though.

The EU identifies organisations with a Participant Identifier Code (PIC). This is a unique numerical ID that the EU assigns to each organisation. This number, along with some metadata about the organisation (e.g., name, abbreviation, address, etc.), is available through CORDIS's open data. However (crucially) this metadata contains no link to any other identifier for the organisation. This makes it impossible to link the CORDIS corpus automatically with other corpora, unless the other data sources also use PIC values to identify organisations.

My own particular use-case involves linking CORDIS information with databases of scientific instruments, research groups (that benefit from EU funding) and people working within those research groups.

I would like to use ROR as common, unique identifiers for organisations and use the ROR metadata to further enhance information about the organisations.

I could do this manually, by creating the mapping from PIC to ROR ID for those organisations that matter for my use-case. However, I would imagine ROR supporting PIC would benefit others (including @JTJD, seemingly).

NB. There may be other corpora (from the EU or elsewhere) that use PIC values to identify organisations, and might also benefit from ROR's support of PIC. My use-case is just an example of how this might be beneficial.

@ghost
Author

ghost commented Dec 1, 2021 via email

@paulmillar

Here are some further observations.

Currently, CORDIS has just shy of 40,000 organisations, compared to over 100,000 in ROR.

With some digging, I found an automated way of linking CORDIS PIC IDs to ROR IDs. The CORDIS corpus includes the EU VAT number as metadata for ~85% of the organisations it describes. Wikidata has (for some organisations) both the ROR ID and the EU VAT number. Therefore, using Wikidata, it's possible to map some CORDIS organisations' PIC IDs to their corresponding ROR IDs.

As a proof-of-principle, I selected 42 organisations' EU VAT numbers from CORDIS and built a SPARQL query that tries to extract those organisations' ROR IDs from Wikidata. That query yielded 16 ROR IDs: a little over one third. While that's far from perfect, it's better than starting from scratch (assuming this small test is representative).
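For readers who want to try something similar, here is a minimal sketch of such a lookup in Python. It is not the query used above. It assumes that Wikidata stores the EU VAT number under property P3608 and the ROR ID under P6782, that the VAT strings in CORDIS are formatted the same way as in Wikidata, and that the public endpoint at https://query.wikidata.org/sparql is used. The function names and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_query(vat_numbers):
    """Build a SPARQL query that looks up ROR IDs for a batch of VAT numbers.

    Assumes P3608 is "EU VAT number" and P6782 is "ROR ID" on Wikidata.
    """
    # Strip double quotes so a value cannot break out of its string literal.
    values = " ".join('"{}"'.format(v.replace('"', "")) for v in vat_numbers)
    return (
        "SELECT ?vat ?ror WHERE {\n"
        "  VALUES ?vat { " + values + " }\n"
        "  ?org wdt:P3608 ?vat ;\n"
        "       wdt:P6782 ?ror .\n"
        "}"
    )


def parse_bindings(result):
    """Turn a SPARQL JSON result document into a {vat: ror_id} dict."""
    return {
        row["vat"]["value"]: row["ror"]["value"]
        for row in result["results"]["bindings"]
    }


def fetch_ror_ids(vat_numbers):
    """Send the query to Wikidata and return {vat: ror_id} for the hits."""
    params = urllib.parse.urlencode(
        {"query": build_query(vat_numbers), "format": "json"}
    )
    request = urllib.request.Request(
        WIKIDATA_SPARQL + "?" + params,
        headers={
            # Hypothetical client name; Wikidata asks for a descriptive one.
            "User-Agent": "pic-to-ror-sketch/0.1 (example)",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

VAT numbers without a matching Wikidata item simply do not appear in the result, so the ratio of returned entries to submitted VAT numbers gives the hit rate for a batch.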

For comparison, matching names (case-insensitive, but otherwise exact) and requiring exactly one match yielded little more than 1,500 links (~3%). A more flexible matching approach might yield more, but would increase the risk of false matches.
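The name-matching baseline can be sketched roughly as follows. This is an illustration rather than the code that produced the numbers above: it reads "exactly one match" as "exactly one ROR record carries that name", and the input shapes (pairs of identifier and name) and the function name are assumptions.

```python
from collections import defaultdict


def link_by_exact_name(cordis_orgs, ror_orgs):
    """Link PICs to ROR IDs by case-insensitive, otherwise exact, name match.

    cordis_orgs: iterable of (pic, name) pairs (assumed shape)
    ror_orgs:    iterable of (ror_id, name) pairs (assumed shape)
    Returns {pic: ror_id} for names that match exactly one ROR record.
    """
    # Index ROR records by their case-folded name.
    ror_by_name = defaultdict(set)
    for ror_id, name in ror_orgs:
        ror_by_name[name.casefold()].add(ror_id)

    links = {}
    for pic, name in cordis_orgs:
        candidates = ror_by_name.get(name.casefold(), set())
        # Ambiguous names (two or more ROR records) are skipped on purpose.
        if len(candidates) == 1:
            links[pic] = next(iter(candidates))
    return links
```

Skipping ambiguous names keeps the false-match rate low at the cost of coverage, which is the trade-off described above.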

In addition, both CORDIS and ROR include geographical coordinates for organisations. Any auto-generated PIC-to-ROR link could be validated using these coordinates; for example, by calculating the (great circle arc) distance between the two coordinates and rejecting the link if that distance is over 1 km (say).
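A possible shape for that check, as a sketch: the haversine formula gives the great-circle distance on a spherical Earth, which is accurate enough for a 1 km threshold. The function names, the (latitude, longitude) tuple layout and the choice of mean Earth radius are assumptions.

```python
import math

# Mean Earth radius in kilometres (spherical approximation).
EARTH_RADIUS_KM = 6371.0088


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to guard against rounding pushing the value just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_plausible_link(cordis_coord, ror_coord, max_km=1.0):
    """Accept a PIC-to-ROR link only if the two locations are close.

    Both arguments are (latitude, longitude) tuples in degrees.
    """
    return great_circle_km(*cordis_coord, *ror_coord) <= max_km
```

The 1 km default mirrors the threshold suggested above; organisations with several campuses, or records geocoded only to city level, may need a looser value.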

@paulmillar

Hi @mariagould

You mentioned "evaluate this for future development work".

May I ask about the process through which this request would be evaluated?

In particular, I was wondering on what timescale something would be likely to happen.

Cheers,
Paul.

@mariagould
Contributor

Hi @paulmillar thanks for your question. There are a number of considerations involved in changing the current data model. In terms of the mappings to other IDs there are technical considerations as well as policy ones (e.g., what criteria might be used to select the other ID types that ROR should map to, how should the mappings be prioritized, etc.). This is an area where additional consultation with users and community members will be useful, via existing channels such as our bimonthly community calls and asynchronous discussion forums. In terms of timescales, the priority for ROR development work right now is implementing the core infrastructure that is needed to support registry additions and updates. This needs to be up and running before we look at any changes to the data model. I would not expect any changes in the near term.

@paulmillar

Thanks @mariagould for the explanation. That certainly makes sense. I look forward to the result of your consultation process.

In the meantime, I've created a proof-of-principle project (PIC-to-ROR) to generate a mapping from an organisation's PIC to the corresponding ROR identifier.

Currently, it uses the CORDIS data dump to discover a list of organisations and Wikidata to convert those with an EU VAT number to the corresponding ROR identifier. This approach is a "low-hanging fruit". I imagine adding other approaches in the future.

This is a humble beginning: of the 40,096 organisations in CORDIS, only 2,347 are mapped to their ROR identifier, a mere 5%; however, it's a starting point. I hope to improve this over time.

I've uploaded the command's output, so it's available for everyone who is interested in mapping EU PIC to ROR identifiers without having to run the code themselves. I will try to keep this file reasonably up-to-date, as time permits.

@mariagould added the data model/schema and curation labels Mar 1, 2022
@lizkrznarich lizkrznarich transferred this issue from ror-community/ror-api Oct 26, 2022
@amandafrench
Contributor

amandafrench commented Aug 12, 2024

@paulmillar Wanted to make sure you saw that we now have a proposal that's open for comment on adding new external IDs to ROR, and PIC is a top contender for an early add. Take a look: https://ror.org/blog/2024-07-18-id-ideas/ -- comments open through August 16, 2024

@adambuttrick
Contributor

Additionally requested by the Czech Science Foundation - https://ror.org/01pv73b02.

@paulmillar

Hi @amandafrench,

Thanks for the "heads up". The proposal looks quite reasonable to me.

I've added a few comments to the document (even though this is strictly past the deadline). None of them are (in any sense) blocking, just some "friendly amendments".

@amandafrench
Contributor

Terrific! Thanks so much, @paulmillar!

Projects
Status: Schema changes
4 participants