Update the TweetParser code for v2? #123
-
This is interesting to us at twarc too: DocNow/twarc#379. We had an idea to build v1.1 -> v2 and v2 -> v1.1 converters to make old scripts compatible. Separately, a lot of people are stuck thinking in CSVs, so we support that too: https://github.com/DocNow/twarc-csv. Often, a CSV imported into a dataframe in R or Pandas is the starting point for analysis. The "JSON Helper" idea is definitely worth thinking about.
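As a rough illustration of the CSV path (not twarc-csv's actual implementation), here is a minimal sketch that flattens one v2 search response page into one row per Tweet, resolving each Tweet's `author_id` against the expanded `includes.users` objects. It assumes the request asked for `tweet.fields=created_at,author_id,public_metrics` and `expansions=author_id`; the function name `v2_page_to_csv` and the column layout are hypothetical.

```python
import csv
import io

# Hypothetical column layout for a flat, one-row-per-Tweet CSV.
COLUMNS = ["id", "created_at", "author_id", "author_username", "text", "retweet_count"]


def v2_page_to_csv(page: dict, out) -> int:
    """Write one v2 response page ("data" + "includes") to `out` as CSV.

    Returns the number of Tweet rows written (the header is not counted).
    """
    # Index expanded user objects by id so each author_id can be resolved.
    users = {u["id"]: u for u in page.get("includes", {}).get("users", [])}
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    rows = 0
    for tweet in page.get("data", []):
        author = users.get(tweet.get("author_id"), {})
        metrics = tweet.get("public_metrics", {})
        writer.writerow({
            "id": tweet["id"],
            "created_at": tweet.get("created_at", ""),
            "author_id": tweet.get("author_id", ""),
            "author_username": author.get("username", ""),
            "text": tweet.get("text", ""),
            "retweet_count": metrics.get("retweet_count", ""),
        })
        rows += 1
    return rows


if __name__ == "__main__":
    # Made-up sample page shaped like a v2 response with an author_id expansion.
    sample_page = {
        "data": [{"id": "1", "text": "hello, world", "author_id": "42",
                  "created_at": "2021-01-01T00:00:00.000Z",
                  "public_metrics": {"retweet_count": 3}}],
        "includes": {"users": [{"id": "42", "username": "example_user"}]},
    }
    buf = io.StringIO()
    v2_page_to_csv(sample_page, buf)
    print(buf.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` module expects; the result can then be loaded with `pandas.read_csv` or R's `read.csv`.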
-
I need to revisit the TweetParser soon. I have 30-day search code that loads relational tables, so my plan is to make a parser that understands both formats and writes to a common schema.
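A parser that "understands both formats and writes to a common schema" could look roughly like the sketch below. It assumes the v1.1 input is the native format returned by premium 30-day search (string ids in `id_str`, author under `user`, long text in `extended_tweet.full_text`) and that the v2 request included `tweet.fields=created_at,author_id`. The `TweetRow` schema and `to_row` function are hypothetical names; detecting the format by the presence of `id_str` is also an assumption, since v2 Tweet objects do not carry that key.

```python
from dataclasses import dataclass
from datetime import datetime

V11_TIME = "%a %b %d %H:%M:%S %z %Y"  # e.g. "Wed Oct 10 20:19:24 +0000 2018"
V2_TIME = "%Y-%m-%dT%H:%M:%S.%f%z"    # e.g. "2018-10-10T20:19:24.000Z"


@dataclass
class TweetRow:
    """Hypothetical common schema: one row of a relational 'tweets' table."""
    tweet_id: str
    created_at: datetime  # timezone-aware (UTC)
    author_id: str
    text: str


def to_row(tweet: dict) -> TweetRow:
    """Normalize a v1.1 (native) or v2 Tweet object into a TweetRow."""
    if "id_str" in tweet:
        # v1.1: string id in id_str, author nested under "user";
        # long Tweets keep their untruncated text in extended_tweet.full_text.
        text = (tweet.get("extended_tweet", {}).get("full_text")
                or tweet.get("full_text")
                or tweet["text"])
        return TweetRow(
            tweet_id=tweet["id_str"],
            created_at=datetime.strptime(tweet["created_at"], V11_TIME),
            author_id=tweet["user"]["id_str"],
            text=text,
        )
    # v2: flat object; created_at and author_id are only present
    # when they were requested via tweet.fields.
    return TweetRow(
        tweet_id=tweet["id"],
        created_at=datetime.strptime(tweet["created_at"], V2_TIME),
        author_id=tweet["author_id"],
        text=tweet["text"],
    )
```

Keeping the format-specific knowledge inside `to_row` means the table-loading code only ever sees `TweetRow`, so adding another JSON shape later touches one function rather than the loader.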
-
The v2 version of the Python search client does not reference the "TweetParser" class at all. This is by design, since the v2 update focuses, not surprisingly, only on v2. One of v2's wins is a unified and updated response JSON.
Older search endpoints returned (at least) two different JSON structures:
However, the v2 JSON is likely to evolve as we release future v2 updates. Versioning is now built into our endpoint URLs, allowing developers to decide when they move from one version to the next, e.g.:
/2/tweets/search/recent --> /2.1/tweets/search/recent
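Because the version lives in the URL path, a client can pin it in one place so that moving to a newer version becomes an explicit, reviewable change. This is a minimal sketch; `API_VERSION` and `endpoint` are hypothetical names, and the `/2.1/` path in the example is illustrative rather than a released version.

```python
# Hypothetical client config: the API version is pinned in one place.
API_BASE = "https://api.twitter.com"
API_VERSION = "2"


def endpoint(path: str, version: str = API_VERSION) -> str:
    """Build a versioned endpoint URL, e.g. endpoint("tweets/search/recent")."""
    return f"{API_BASE}/{version}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(endpoint("tweets/search/recent"))
    # Opting in to a future version is a deliberate, local change:
    print(endpoint("tweets/search/recent", version="2.1"))
```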
So I can see a day when we revisit the TweetParser and update it to handle upcoming JSON structures flexibly and elegantly.
I'm also pondering how the TweetParser code could be refactored into a "JSON helper" tool that illustrates how to migrate from v1.1 to v2 JSON. Imagine a piece of code that parses (and stores) v1.1 JSON. You would import a new TweetParser update that provides a translation wrapper from v2 JSON to the v1.1-based storage keys/names. This tool could also surface cases where a missing expansion or field specification leaves a v1.1 attribute without a v2 equivalent.
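One way such a translation wrapper might look: map a v2 Tweet (plus its `includes`) onto a subset of v1.1 keys, and report each v1.1 attribute it could not fill together with the v2 request parameter that would probably supply it. This is a hypothetical sketch, not an existing TweetParser API; `v2_to_v11` and the message strings are made up, and only a handful of attributes are mapped (v1.1 `favorite_count` corresponds to v2 `public_metrics.like_count`, and `user.screen_name` to the expanded user's `username`).

```python
from datetime import datetime
from typing import Dict, List, Tuple

V2_TIME = "%Y-%m-%dT%H:%M:%S.%f%z"    # e.g. "2018-10-10T20:19:24.000Z"
V11_TIME = "%a %b %d %H:%M:%S %z %Y"  # e.g. "Wed Oct 10 20:19:24 +0000 2018"


def v2_to_v11(tweet: Dict, includes: Dict) -> Tuple[Dict, List[str]]:
    """Translate one v2 Tweet into a v1.1-shaped dict (hypothetical helper).

    Returns (v11_dict, missing), where `missing` lists v1.1 attributes that
    could not be filled and the v2 parameter likely needed to fill them.
    """
    out = {"id_str": tweet["id"], "text": tweet["text"]}
    missing = []

    if "created_at" in tweet:
        # Re-render the v2 ISO 8601 timestamp in v1.1's format.
        dt = datetime.strptime(tweet["created_at"], V2_TIME)
        out["created_at"] = dt.strftime(V11_TIME)
    else:
        missing.append("created_at (tweet.fields=created_at)")

    if "lang" in tweet:
        out["lang"] = tweet["lang"]
    else:
        missing.append("lang (tweet.fields=lang)")

    metrics = tweet.get("public_metrics")
    if metrics is not None:
        out["retweet_count"] = metrics["retweet_count"]
        out["favorite_count"] = metrics["like_count"]
    else:
        missing.append("retweet_count/favorite_count (tweet.fields=public_metrics)")

    # v1.1 embeds the author; v2 needs the author_id expansion to resolve it.
    users = {u["id"]: u for u in includes.get("users", [])}
    author = users.get(tweet.get("author_id"))
    if author is not None:
        out["user"] = {"id_str": author["id"],
                       "screen_name": author["username"],
                       "name": author["name"]}
    else:
        missing.append("user (expansions=author_id)")

    return out, missing
```

Existing v1.1 storage code could keep reading `out` by its old key names, while `missing` tells the developer which `tweet.fields` or `expansions` to add to the v2 request.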