Compress and Store the Twitter Public Sample

Intro

This project allows you to run a set-it-and-forget-it system for streaming the Twitter public sample stream to a local set of gzipped files. Each file will contain the JSON for all the tweets from the 1% stream.

Building

This project was written with Maven, so you should be able to do mvn package and use the resulting jar file TwitterStreamFilter-1.0-SNAPSHOT-jar-with-dependencies.jar in the target directory.

Running

You have a few options with this code. If you run it without arguments, it should download the unfiltered 1% stream. You can run code for that as follows:

java -Xmx1536m -jar TwitterStreamFilter-1.0-SNAPSHOT-jar-with-dependencies.jar

You can also provide a file with keywords to track (separated by a newline) using the --keywords/-k flag, a GeoJSON file to get tweets within a geographic area using the --bounds/-b flag, or the path to a file with a list of user IDs to track with the --users/-u flag.

While running, this code will produce several files: warnings.log.YYYY-MM-DD-HH, statuses.log.YYYY-MM-DD-HH, and many .gz files. At the end of every hour, the current warnings.log. and statuses.log.* will be gzipped automatically.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
src/main		src/main
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml
twitter4j.properties		twitter4j.properties
usa.geojson		usa.geojson

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Compress and Store the Twitter Public Sample

Intro

Building

Running

About

Releases

Packages

Languages

License

cbuntain/TwitterStreamFilter

Folders and files

Latest commit

History

Repository files navigation

Compress and Store the Twitter Public Sample

Intro

Building

Running

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages