Schema definition #4
Max Vasiliev commented on Slack:
The project is about mapping research papers to code repositories; if I got that wrong, please correct me. Hence, a code repository seems to be a requirement when submitting a new entry to the database.
What this issue seems to be addressing is the "Papers" node, and what the necessary attributes and relationships are for that node. This is fantastic. I'm going to provide a bit more context on the larger project, then address minimum attributes and relationships for papers. I am using the term "minimum" here because there are no "mandatory" attributes or relationships. Some papers might have all attributes (an author, title, code, etc.), while others might only have one or two. Both can exist in the graph.
Context
MOSS is about mapping an ecosystem, not only papers. Papers are one aspect of the ecosystem. Four core questions related to this thread are:
With this in mind, we might use these questions to guide us:
The minimums we have identified so far are laid out here: https://docs.google.com/document/d/1NEWtI7hqQA74jk9Geg8bwKVS3qTzV9hWAfEMQg_Y1gM/edit?tab=t.0
At a high level, the core nodes are:
The core attributes and relationships can be viewed in the doc. So the question becomes: Is this model a good starting point? What are we missing? For example, in the "projects" node, I don't think we yet have a "depends on" relationship for mapping dependencies.
Paper Attributes
Here are the current attributes and relationships for papers:
Attributes
Relationships
We capture authors through a relationship stemming from the "people" node. So, the question becomes: Are these good starting attributes and relationships for papers? What is missing?
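To make the "minimum, not mandatory" idea concrete, here is a minimal sketch of a Paper node in which every attribute is optional, with authorship stored as a separate relationship from a People node rather than as a Paper attribute. The attribute names (`title`, `doi`, `year`, `arxiv_id`), the relationship type `AUTHORED`, and the node ids are hypothetical placeholders, not the fields defined in the linked doc.

```python
from dataclasses import dataclass, asdict
from typing import NamedTuple, Optional


@dataclass
class PaperNode:
    """Hypothetical Paper node: no attribute is mandatory."""
    title: Optional[str] = None
    doi: Optional[str] = None
    year: Optional[int] = None
    arxiv_id: Optional[str] = None

    def known_attributes(self) -> dict:
        # Keep only the attributes that were actually filled in.
        return {k: v for k, v in asdict(self).items() if v is not None}


class Relationship(NamedTuple):
    """A directed edge such as (:Person)-[:AUTHORED]->(:Paper)."""
    src: str  # node id, e.g. "person:alice" (hypothetical id scheme)
    rel: str  # relationship type (hypothetical names)
    dst: str  # node id, e.g. "paper:2405.21060"


def authors_of(rels: list[Relationship], paper_id: str) -> list[str]:
    # Authors are not a Paper attribute; they are found by following
    # AUTHORED edges from People nodes that point at the paper.
    return sorted(r.src for r in rels if r.rel == "AUTHORED" and r.dst == paper_id)


# A sparse paper and a fuller one can live in the same graph.
# The DOI below is a placeholder, not a real identifier.
sparse = PaperNode(arxiv_id="2405.21060")
fuller = PaperNode(title="Some title", doi="10.0000/example", year=2024)
rels = [
    Relationship("person:alice", "AUTHORED", "paper:2405.21060"),
    Relationship("person:bob", "AUTHORED", "paper:2405.21060"),
]
```

Keeping authorship as an edge means one Person node can be shared across many papers and projects without duplicating author names on each paper.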
One concern I see is that mapping outward from paper space reaches only a subset of all projects without some work / interpretation / confidence score, etc. It's great if (:Paper)-[:OFFICIAL_CODE]->(:Project) exists (the paper links to the authors' repo). @jring-o, it's not just missing from the schema, it's a key piece of the work we'd need to do. In the other direction, what exactly would (:Paper)-[:CITES]->(:Project) show? For example:
** OpenAlex has only the preprint of https://arxiv.org/abs/2405.21060, with 1 citation. But this work is already in active use and being further built upon.
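One way to make the "confidence score" idea concrete is to attach evidence to each candidate Paper→Project link and combine it. The sketch below assumes hypothetical evidence types and weights (none of these values have been agreed on) and uses a noisy-OR combination, which treats each piece of evidence as independent. A direct link from the paper to the authors' repo is treated as definitive, while weaker signals such as matching class names or forum mentions only add partial confidence. The `LIKELY_IMPLEMENTS` label is likewise a hypothetical placeholder, not part of the current schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical evidence types and weights; placeholders, not agreed values.
EVIDENCE_WEIGHT = {
    "paper_links_repo": 1.0,   # the paper points at the authors' repo (OFFICIAL_CODE)
    "repo_cites_paper": 0.8,   # the repo's README or citation file names the paper
    "class_name_match": 0.4,   # e.g. a class in the code is named after the model
    "forum_mention": 0.2,      # the pair is discussed together in a forum thread
}


@dataclass
class CandidateLink:
    """A possible (:Paper)-[...]->(:Project) link plus the evidence for it."""
    paper: str
    project: str
    evidence: list[str] = field(default_factory=list)

    def confidence(self) -> float:
        # Noisy-OR: probability that at least one piece of evidence is right,
        # assuming the pieces are independent. Unknown evidence counts as 0.
        p_all_wrong = 1.0
        for ev in self.evidence:
            p_all_wrong *= 1.0 - EVIDENCE_WEIGHT.get(ev, 0.0)
        return 1.0 - p_all_wrong

    def relationship(self, threshold: float = 0.5) -> Optional[str]:
        # Only a direct paper->repo link is labelled OFFICIAL_CODE; other
        # links above the threshold get a hypothetical weaker label.
        if "paper_links_repo" in self.evidence:
            return "OFFICIAL_CODE"
        if self.confidence() >= threshold:
            return "LIKELY_IMPLEMENTS"
        return None
```

For example, a pair supported only by a class-name match and a forum mention scores about 0.52 under these weights, so it would clear a 0.5 threshold but stay distinguishable from an official link.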
I think this shows integration, but how much are those models actually being used? Can we estimate that based on code class names? Forum discussions? I also imagine that definitive connections between papers and projects are relatively scarce compared to the total number of papers and projects. That is, most papers won't have official code. Are we focusing on those that do? Maybe building off something like inclusion in HF Transformers/TensorFlow/Keras? How do we feel about cycles? 😅
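On the cycles question: a graph database can store cycles without trouble, but anything that walks relationships as if they formed a DAG (for example, rolling up transitive dependencies over a "depends on" relationship between projects, or following mixed CITES edges between papers and projects) needs to detect them. A minimal sketch using depth-first search with three-colour marking, assuming the graph is available as a plain adjacency dict of node ids (a hypothetical representation, not an export format the project has defined):

```python
from typing import Optional


def find_cycle(adj: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one directed cycle as [v, ..., v], or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on current DFS path / finished
    color: dict[str, int] = {}
    parent: dict[str, str] = {}

    def dfs(u: str) -> Optional[list[str]]:
        color[u] = GRAY
        for v in adj.get(u, []):
            state = color.get(v, WHITE)
            if state == GRAY:
                # Back edge u -> v: walk parent pointers from u back to v.
                cycle = [v]
                x = u
                while x != v:
                    cycle.append(x)
                    x = parent[x]
                cycle.append(v)
                return cycle[::-1]
            if state == WHITE:
                parent[v] = u
                found = dfs(v)
                if found:
                    return found
        color[u] = BLACK
        return None

    for node in list(adj):
        if color.get(node, WHITE) == WHITE:
            found = dfs(node)
            if found:
                return found
    return None


# Hypothetical project ids forming a dependency loop a -> b -> c -> a.
deps = {"project:a": ["project:b"], "project:b": ["project:c"], "project:c": ["project:a"]}
```

Here `find_cycle(deps)` returns `["project:a", "project:b", "project:c", "project:a"]`. This recursive version is fine for small subgraphs; for an ecosystem-sized graph an iterative DFS would avoid Python's recursion limit.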
To jumpstart the conversation, what other mandatory fields do we need besides the following: