Figure out Preprocessing #4

ablodge · 2020-07-12T17:46:38Z

Preprocessing in the parser is based on Zhang et al. 2019 and only works on AMRs. We need to figure out whether/how we want to handle preprocessing of UCCA, EDS, DRG, and PTG.

I think to get the preprocessing working on the new data, you only need to modify AMRIO to look more like AMRGraph.

One possible consequence of working without preprocessing:
AMRGraph.py apparently expects attributes to be in a particular format or else it ignores them (line 63). While working on the parser without preprocessing, this basically results in all attributes being ignored.
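
To make the failure mode concrete, here is a minimal sketch of how a strict attribute-format check can silently drop everything from a non-AMR framework. This is not the code in AMRGraph.py: the pattern, the function name `filter_attributes`, and the sample attributes are all assumptions chosen for illustration.

```python
import re

# Assumed AMR-style value format: quoted string, number, or the polarity marker "-".
# The real condition in AMRGraph.py may differ.
AMR_ATTRIBUTE_VALUE = re.compile(r'^(".*"|[+-]?\d+(\.\d+)?|-)$')


def filter_attributes(attributes, strict=True):
    """Split (name, value) pairs into kept and dropped lists.

    With strict=True, values that do not match the assumed AMR format are
    dropped; with strict=False, every attribute is kept.
    """
    kept, dropped = [], []
    for name, value in attributes:
        if strict and not AMR_ATTRIBUTE_VALUE.match(str(value)):
            dropped.append((name, value))
        else:
            kept.append((name, value))
    return kept, dropped


# Hypothetical mix of AMR-style and non-AMR-style attributes.
attrs = [("op1", '"Obama"'), ("quant", "3"), ("pos", "NNP"), ("tense", "past")]

strict_kept, strict_dropped = filter_attributes(attrs, strict=True)
lenient_kept, lenient_dropped = filter_attributes(attrs, strict=False)
```

Returning the dropped attributes instead of discarding them silently would make it easy to log how much each framework loses to the check.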

ablodge · 2020-07-12T17:48:25Z

@jakpra Do you want to work on this one?

jakpra · 2020-07-12T18:40:45Z

Sure. I'd like to go about this by looking for general (linguistic/structural?) patterns within each framework and across frameworks.
Like I said before, the Zhang+19 preprocessing looks very AMR-specific and not very principled. It has many special cases that handle just an individual word or construction. I'm all for handling long-tail phenomena, but I can't imagine that this style of preprocessing is worth spending a lot of time on.

I'll look into the attribute formatting; I guess a simple workaround could just be to comment out those lines that would ignore the "bad" ones... But most importantly, we should check what makes them "bad" and what the shared task has to say about that.

jakpra · 2020-07-13T02:12:31Z

Disabled a bunch of AMR-specific well-formedness checks in AMRGraph.py for now so we don't lose anything from the other frameworks.
Have to check which of the checks should be re-enabled.
Ran stanza to add features.
Extracted vocabs.
Check what other (linguistically or otherwise) principled preprocessing steps we can do.
Implement additional preprocessing.
Run preprocessing.
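
One way to re-enable checks selectively, instead of commenting them out globally, is a per-framework switch. The sketch below is only an illustration under assumptions: the check names, the `ENABLED_CHECKS` table, and the toy graph dictionary are hypothetical and do not reflect the actual structure of AMRGraph.py.

```python
# Hypothetical check functions; each returns True when the graph passes.
CHECK_FUNCTIONS = {
    "has_nodes": lambda graph: len(graph["nodes"]) > 0,
    "quoted_attributes": lambda graph: all(
        str(value).startswith('"') for _, value in graph["attributes"]
    ),
}

# Assumed configuration: all checks on for AMR, all off for the other
# frameworks until we decide which ones should come back.
ENABLED_CHECKS = {
    "amr": {"has_nodes", "quoted_attributes"},
    "ucca": set(),
    "eds": set(),
    "drg": set(),
    "ptg": set(),
}


def failed_checks(graph, framework):
    """Return the sorted names of enabled checks that the graph fails."""
    enabled = ENABLED_CHECKS.get(framework.lower(), set())
    return [name for name in sorted(enabled) if not CHECK_FUNCTIONS[name](graph)]


# Toy graph with one unquoted attribute value.
toy_graph = {"nodes": ["n0", "n1"], "attributes": [("pos", "NNP")]}

amr_failures = failed_checks(toy_graph, "amr")
ucca_failures = failed_checks(toy_graph, "ucca")
```

With a table like this, re-enabling a check for one framework is a one-line change, and the AMR behavior stays untouched.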

ablodge assigned ablodge and jakpra and unassigned ablodge and jakpra Jul 12, 2020
