Ingestion: NEJLT 2023 #3720

Merged
merged 5 commits into from
Oct 31, 2024

anthology-assist
Collaborator

@anthology-assist anthology-assist commented Aug 7, 2024

  1. In the GitHub sidebar, add the PR to work items and the current milestone
  2. In the GitHub sidebar, under "Development", make sure to link to the corresponding issue
  3. Make sure the branch is merged with the latest master branch
  4. Ensure that there are editors listed in the <meta> block
  5. For workshops, add a <venue>ws</venue> tag to its meta block
  6. For workshops, add a backlink from the main event's <event> block
  7. Add events to their relevant SIGs
  8. Look at the venue listing for prior years, and ensure that the new volume titles are consistent. You can do this by clicking on the venue name from a paper page, which will take you to the venue listing.
  9. Navigate to the event page preview (e.g., https://preview.aclanthology.org/icnlsp-ingestion/events/icnlsp-2021/), and page through, to see if there are any glaring mistakes
  10. Skim through the complete listing, looking for mis-parsed author names.
  11. Download the frontmatter and verify that the table of contents matches at least three randomly-selected papers
  12. Download 3–5 PDFs (including the first and last one) and make sure they are correct (title, authors, page numbers).
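
Steps 4 and 5 of the checklist above could be partly automated. The following is a minimal sketch, not the Anthology's own tooling: it assumes a volume is a `<volume>` element with a `<meta>` child that holds `<editor>` and `<venue>` elements, and the function name `check_meta` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_meta(volume_xml, is_workshop=False):
    """Return a list of problems found in a volume's <meta> block.

    Assumes (hypothetically) that <meta> is a direct child of the root
    element and that editors and venues are direct children of <meta>.
    """
    root = ET.fromstring(volume_xml)
    meta = root.find("meta")
    if meta is None:
        return ["no <meta> block"]
    problems = []
    # Step 4: at least one editor must be listed.
    if not meta.findall("editor"):
        problems.append("no <editor> listed in <meta>")
    # Step 5: workshops need a <venue>ws</venue> tag.
    venues = [v.text for v in meta.findall("venue")]
    if is_workshop and "ws" not in venues:
        problems.append("workshop volume lacks <venue>ws</venue>")
    return problems


# Made-up sample volume with neither editors nor a venue tag.
SAMPLE = """<volume id="1">
  <meta>
    <booktitle>Example Volume</booktitle>
    <publisher>Example Press</publisher>
  </meta>
</volume>"""

for problem in check_meta(SAMPLE, is_workshop=True):
    print(problem)
```

The remaining steps (title consistency, author names, PDF contents) still need a human to look at the preview.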

github-actions bot commented Aug 7, 2024

Build successful. Some useful links:

This preview will be removed when the branch is merged.

@anthology-assist anthology-assist added this to the 2024Q3 milestone Aug 7, 2024
@anthology-assist anthology-assist linked an issue Aug 7, 2024 that may be closed by this pull request
2 tasks
@mbollmann
Member

LGTM, except that the full volume PDF doesn’t seem to be included? (The data/nejlt-09/proceedings/cdrom/nejlt-2023.1.pdf)

@mjpost
Member

mjpost commented Aug 23, 2024

@anthology-assist can you comment here? What happened to the full volume PDF?

Comment on lines 7 to 8
<publisher>Northern European Association of Language Technology</publisher>
<address>Copenhagen, Denmark</address>
Member

I realized that this is a mistake that also needs to be corrected in previous issues; it should be:

<publisher>Linköping University Electronic Press</publisher>
<address>Linköping, Sweden</address>

(I could open a PR for previous issues.)
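
If the same correction has to be applied to several earlier volumes, it could be scripted. This is only a sketch under assumptions: it treats each volume as XML with `<publisher>` and `<address>` inside a `<meta>` block (as in the lines commented on above), the helper name `fix_publisher` is hypothetical, and `xml.etree.ElementTree` will not necessarily preserve the repository's exact formatting, so the result would still need a diff review.

```python
import xml.etree.ElementTree as ET


def fix_publisher(volume_xml, publisher, address):
    """Overwrite <publisher> and <address> in every <meta> block.

    Elements that are missing are left alone rather than created.
    """
    root = ET.fromstring(volume_xml)
    for meta in root.iter("meta"):
        for tag, value in (("publisher", publisher), ("address", address)):
            node = meta.find(tag)
            if node is not None:
                node.text = value
    # encoding="unicode" returns a str and keeps characters such as "ö".
    return ET.tostring(root, encoding="unicode")


# Sample built from the two lines under review; the wrapper is made up.
SAMPLE = """<volume id="1">
  <meta>
    <publisher>Northern European Association of Language Technology</publisher>
    <address>Copenhagen, Denmark</address>
  </meta>
</volume>"""

fixed = fix_publisher(
    SAMPLE,
    "Linköping University Electronic Press",
    "Linköping, Sweden",
)
print(fixed)
```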

Member

Do you want to just fix these and add them to this PR?

@mbollmann
Member

I tried to run the ingestion myself locally, but I feel I’m missing something or don’t know which script to use, since I’m not getting the same XML file as here with bin/ingest.py. :)

It would be nice if the full-volume PDF could be added as well. I’m happy to take pointers if something was wrong in the ingestion material in that regard.

@mjpost
Member

mjpost commented Oct 30, 2024

Ack, I'm really sorry ingestion here came to a complete standstill.

@mjpost
Member

mjpost commented Oct 30, 2024

@mbollmann can you post the full-volume PDF somewhere (to this post, if possible), and I will ingest this?

@mjpost
Member

mjpost commented Oct 30, 2024

Here is the code that looks for book names:

potential_names = [

It seems we need to add a new pattern. I just did that. We can fix this manually this time if I get the PDF.
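
For readers unfamiliar with this kind of lookup, here is a rough sketch of the general idea: try a list of candidate file names and take the first one that exists. It is not the actual code from `bin/ingest.py`. The function name `find_volume_pdf` and the candidate names are hypothetical, apart from `nejlt-2023.1.pdf`, which is the file name mentioned earlier in this thread.

```python
import tempfile
from pathlib import Path


def find_volume_pdf(cdrom_dir, candidates):
    """Return the first candidate file that exists in cdrom_dir, else None."""
    for name in candidates:
        path = Path(cdrom_dir) / name
        if path.is_file():
            return path
    return None


# Demo in a throwaway directory; the first candidate name is made up.
with tempfile.TemporaryDirectory() as tmp:
    cdrom = Path(tmp)
    (cdrom / "nejlt-2023.1.pdf").touch()
    found = find_volume_pdf(cdrom, ["nejlt-2023.pdf", "nejlt-2023.1.pdf"])
    found_name = found.name if found else None

print(found_name)
```

With a scheme like this, a full-volume PDF whose name matches none of the candidates is silently skipped, which would explain a missing volume PDF and why adding a pattern fixes it.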

@mbollmann
Member

Full issue PDF: https://nejlt.ep.liu.se/issue/view/407/337

I can also update the name in the script that produces the materials of course, I have just been re-using what Leon used in previous years to produce these materials. :)

@mjpost mjpost changed the title from "Ingestion: NEJLT" to "Ingestion: NEJLT 2023" Oct 30, 2024
@mbollmann
Member

LGTM now, thank you!

@mbollmann mbollmann merged commit ad4e2a8 into master Oct 31, 2024
2 checks passed
@mbollmann mbollmann deleted the nejlt-24-ingestion branch October 31, 2024 10:58
Development

Successfully merging this pull request may close these issues.

Ingestion Request: NEJLT Volume 9 (2023)
3 participants