Update the TweetParser code for v2? #123

jimmoffitt · 2021-04-14T18:40:27Z

jimmoffitt
Apr 14, 2021
Maintainer

The v2 version of the Python search client does not reference the "TweetParser" class at all. This is by design, since the v2 update focuses, not surprisingly, only on v2. One of the wins of v2 is a unified and updated response JSON.

Older search endpoints returned (at least) two different JSON structures:

- The "original" or "native" JSON provided by the legacy v1.1 endpoints, including the standard search version.
- The "Activity Stream (AS)" format, first implemented by Gnip before it was acquired by Twitter.
Enterprise search served both formats.

However, the v2 JSON is likely to evolve as we release future v2 updates. Versioning is now built into our endpoint URLs, allowing developers to dictate when they move from one version to the next, e.g.:
/2/tweets/search/recent --> /2.1/tweets/search/recent
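
As a small illustration of pinning a version in the URL, here is a minimal sketch. The helper name `search_recent_url` is hypothetical, and "2.1" is only the illustrative future version from the example above, not a released one.

```python
API_BASE = "https://api.twitter.com"


def search_recent_url(version: str = "2") -> str:
    """Build the recent-search endpoint URL for a pinned API version."""
    return f"{API_BASE}/{version}/tweets/search/recent"


# Client code keeps calling the version it was written against...
print(search_recent_url())
# ...and opts in to a newer one only when it is ready ("2.1" is hypothetical).
print(search_recent_url("2.1"))
```

Keeping the version in a single argument means a migration is a one-line change once the parsing code has been updated for the new JSON.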

So I can see a day when we revisit the TweetParser and update it to be flexible and elegantly handle upcoming JSON structures.

I'm also pondering how the TweetParser code could be refactored to become a "JSON helper" tool to illustrate how to migrate from v1.1 to v2 JSON. Imagine a piece of code that parses (and stores) v1.1 JSON. Import a new TweetParser update that provides a translation wrapper from the v2 JSON to v1.1-based storage keys/names. This tool could also surface cases where a missing expansion or field specification is preventing a v1.1 attribute from having a v2 equivalent.
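
To make the "JSON helper" idea concrete, here is a rough sketch of what such a translation wrapper might look like. Nothing here is existing TweetParser code: the function name `v2_to_v1_keys` and the hint wording are hypothetical, and it assumes a single-Tweet v2 payload (a `data` object plus optional `includes.users`) and covers only a handful of attributes.

```python
from typing import Any, Dict, List, Tuple


def v2_to_v1_keys(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Map a single-Tweet v2 payload onto v1.1-style keys.

    Returns the translated dict plus hints naming the v2 field or expansion
    that would be needed to fill each v1.1 attribute left empty.
    """
    tweet = payload.get("data", {})
    users = {u["id"]: u for u in payload.get("includes", {}).get("users", [])}
    hints: List[str] = []

    # id and text are returned by v2 without requesting extra fields.
    legacy: Dict[str, Any] = {"id_str": tweet.get("id"), "text": tweet.get("text")}

    if "created_at" in tweet:
        # Copied as-is: v2 timestamps are ISO 8601, unlike the v1.1 date format.
        legacy["created_at"] = tweet["created_at"]
    else:
        hints.append("created_at: request tweet.fields=created_at")

    metrics = tweet.get("public_metrics")
    if metrics is not None:
        legacy["retweet_count"] = metrics.get("retweet_count")
        legacy["favorite_count"] = metrics.get("like_count")
    else:
        hints.append("retweet_count/favorite_count: request tweet.fields=public_metrics")

    author = users.get(tweet.get("author_id"))
    if author is not None:
        legacy["user"] = {
            "id_str": author["id"],
            "name": author.get("name"),
            "screen_name": author.get("username"),
        }
    else:
        hints.append("user: request expansions=author_id")

    return legacy, hints


sample = {
    "data": {"id": "1", "text": "hello", "author_id": "42"},
    "includes": {"users": [{"id": "42", "name": "Jim", "username": "jim"}]},
}
legacy, hints = v2_to_v1_keys(sample)
print(legacy)
print(hints)
```

Existing code that reads `id_str`, `text`, and `user.screen_name` could consume the translated dict unchanged, while the hints list points at the request parameters to add.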

igorbrigadir · 2021-04-14T20:03:17Z

igorbrigadir
Apr 14, 2021

This is interesting to us at twarc too (DocNow/twarc#379): we had an idea to have v1.1 -> v2 and v2 -> v1.1 converters to make old scripts compatible.

Separately, a lot of people are stuck thinking in CSVs, so we support that too: https://github.com/DocNow/twarc-csv. Often, a CSV imported into a dataframe in R or Pandas is the starting point for people's analysis.

The "JSON Helper" idea is definitely worth thinking about.


jimmoffitt · 2021-04-15T22:29:04Z

jimmoffitt
Apr 15, 2021
Maintainer Author

I need to revisit the TweetParser soon. I have 30-day search code that loads relational tables, so the idea is to make a parser that understands both formats and writes to a common schema.
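
A rough sketch of that idea, assuming "both formats" means v1.1 native Tweet objects and v2 Tweet objects. The `to_row` helper, the `tweets` table, and its three columns are hypothetical and chosen only for illustration, and SQLite stands in for whatever relational store is actually used.

```python
import sqlite3
from typing import Any, Dict, Optional, Tuple

Row = Tuple[str, Optional[str], str]


def to_row(tweet: Dict[str, Any]) -> Row:
    """Normalize one Tweet object to (tweet_id, author_id, text)."""
    if "id_str" in tweet:
        # v1.1 native objects carry string IDs in id_str and nest the author.
        return tweet["id_str"], tweet["user"]["id_str"], tweet.get("text", "")
    # Otherwise assume a v2 Tweet object; author_id is only present when
    # it was requested, so it may be stored as NULL.
    return tweet["id"], tweet.get("author_id"), tweet.get("text", "")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (tweet_id TEXT PRIMARY KEY, author_id TEXT, text TEXT)"
)

samples = [
    {"id_str": "100", "text": "from v1.1", "user": {"id_str": "7"}},
    {"id": "200", "text": "from v2", "author_id": "8"},
]
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)", [to_row(t) for t in samples])
conn.commit()

for row in conn.execute("SELECT tweet_id, author_id, text FROM tweets ORDER BY tweet_id"):
    print(row)
```

Detecting the format per object keeps the loader itself format-agnostic: only `to_row` needs to change when a new JSON structure appears.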

