Rewrote existing parser with a new logic to detect tweets #229

Merged
merged 2 commits into from
Mar 8, 2023


Bhargav-Dave
Contributor

The earlier parser detected tweets using RegEx queries and by matching tweet text elements against a fixed set of XPaths.

However, I discovered that every DIV in the Twitter DOM that contains tweet text has a 'data-testid' attribute whose value is always 'tweetText'.

Hence, the new method uses the DOM's 'querySelectorAll()' method (ref: https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelectorAll) to select all DIVs whose data-testid is set to tweetText. This returns a NodeList (an array-like collection) of all DIVs in the current DOM that contain the spans holding the tweet text.
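
A minimal sketch of the selection step, assuming a browser context where `document` is available. The helper name `findTweetTextDivs` and the constant `TWEET_TEXT_SELECTOR` are hypothetical and are not taken from parser-v2.js:

```javascript
// Attribute selector for the DIVs described above.
const TWEET_TEXT_SELECTOR = 'div[data-testid="tweetText"]';

// Hypothetical helper: returns every tweet-text DIV under `root`.
function findTweetTextDivs(root) {
  // querySelectorAll returns a static NodeList; Array.from converts it
  // into a real array so callers can use map/filter on the result.
  return Array.from(root.querySelectorAll(TWEET_TEXT_SELECTOR));
}

// In a content script this would be called as: findTweetTextDivs(document)
```

Passing the root in as a parameter, rather than reading `document` directly, keeps the helper usable on a sub-tree such as a single timeline column.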

This list of DIVs is built in parser-v2.js and then passed to transform-v2.js, where a for loop runs through all the DIVs and performs the processing previously done in transform.js.
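
The loop over the DIVs could look roughly like the sketch below. The actual processing in transform.js is not shown in this PR description, so this example only reads and trims the text of each DIV; `extractTweetTexts` is a hypothetical name:

```javascript
// Hypothetical helper: turns a list of tweet-text DIVs into plain strings.
function extractTweetTexts(tweetDivs) {
  const texts = [];
  for (const div of tweetDivs) {
    // textContent concatenates the text of all nested spans in the DIV.
    const text = (div.textContent || '').trim();
    // Skip DIVs that hold no text (for example, media-only tweets).
    if (text.length > 0) {
      texts.push(text);
    }
  }
  return texts;
}
```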

@Bhargav-Dave
Contributor Author

Is related to: #179

@dennyabrain
Contributor

Hey @Bhargav-Dave, there are two concerns with this approach:

  1. data-testid is added to make testing easy for the Twitter developers. It's possible they will take it out someday.
  2. This works for the tweet text, but can you extract the tweet URL, timestamp and author handle using this method?
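
One possible way to explore the second concern is to walk up from each tweet-text DIV to its enclosing tweet container and read the metadata from there. The sketch below is not part of this PR and rests on assumptions that need checking against the live DOM: that each tweet sits inside an `<article>` element, that the article contains a `<time datetime="...">` element wrapped by the permalink `<a>`, and that permalinks have the form `https://twitter.com/<handle>/status/<id>`. The names `parseStatusUrl` and `extractTweetMeta` are hypothetical:

```javascript
// Assumption: permalinks look like https://twitter.com/<handle>/status/<id>.
function parseStatusUrl(href) {
  const match = /^https?:\/\/[^/]+\/([^/]+)\/status\/(\d+)/.exec(href);
  return match ? { handle: match[1], tweetId: match[2] } : null;
}

// Assumption: the tweet-text DIV lives inside an <article>, and that article
// holds a <time datetime="..."> element wrapped by the permalink <a>.
function extractTweetMeta(tweetTextDiv) {
  const article = tweetTextDiv.closest('article');
  if (!article) return null;

  const time = article.querySelector('time');
  if (!time) return null;

  // In a browser, `link.href` is the resolved absolute URL.
  const link = time.closest('a');
  const url = link ? link.href : null;
  const parsed = url ? parseStatusUrl(url) : null;

  return {
    url,
    timestamp: time.getAttribute('datetime'),
    handle: parsed ? parsed.handle : null,
  };
}
```

This shares the weakness raised in the first concern: it depends on markup that Twitter can change at any time, so each lookup returns `null` rather than throwing when an expected element is missing.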

@dennyabrain merged commit 0a49c89 into tattle-made:main on Mar 8, 2023