Install

To solve dependencies: ./setup.py develop

Scripts

utpedits2graph.py

Count edits on User Talk Pages and create a graph from it. Save the graph as a pickled iGraph object.

The graph is directed and weighted. For example, two edits made by User A on User B's Talk Page is represented as an edge from A to B with weight = 2.

This script should be used on complete dumps and on stub.

signature2graph.py

Like utpedits2graph.py, but counting signature on User Talk Pages.

This script can be used on current dumps.

enrich.py

Giving a pickled iGraph object, this script downloads useful information about the users (like if the user is a bot, a sysop, ..) from the wikipedia API and creates a new pickled iGraph object.

usercontributions.py

Given a stub dump, this script counts contributions for every user on the whole wikipedia.

Results are stored in a database. Saved informations are:

Field	Type	Description
username	String
lang	String	Data on this user are related to the "lang" wikipedia
normal_edits	Integer	Edits on the article namespace
namespace_edits	String	This is an array of integers. Each integer represents the number of edits made by this user on pages in a namespace. Namespaces are numbered starting from 0 in the order found at the beginning of the XML dump file
first_edit	DateTime	Time of the first (oldest) edit
last_edit	DateTime	Time of the last (most recent) edit
comments_count	Integer	Number of comments left by this user
comments_avg	Float	Comment average length
minor	Integer	Number of minor edits
welcome	Integer	Number of edits with a comment containing the word "welcome"
npov	Integer	Number of edits with a comment containing the word "npov" (neutral point of view)
please	Integer	Number of edits with a comment containing the word "please"
thanks	Integer	Number of edits with a comment containing the word "thanks"
revert	Integer	Number of edits with a comment containing the word "revert"

usercontributions_export.py

Export data collected by usercontributions.py in a CSV file.

events_anniversary.py

This script collects revision times for all the article and talk pages and for a set of desired pages. The purpose of this analysis is to find if pages related to events are changed in a neighbourhood of the anniversary.

Data are stored in a database.

events_analysis.py

The script accepts in input a list of desired pages and the wikipedia language to be analyzed. It retrieves data from db about all the revisions of the specified language and processes revisions' statistics for each found page, such as number of edits, number of unique editors, edits made in a range of days around event's anniversary, etc... Data are outputted in a csv file, bz2 compressed

word_frequency.py

Given a list of words, find the frequency of these words in a random set of pages and in a list of desired pages (and the related talk pages).

Data are stored in a database.

countwords_groups.py

Given a current dump, count words found on every UTP and return the results by group (the group which the user belongs).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Install

Scripts

utpedits2graph.py

signature2graph.py

enrich.py

usercontributions.py

usercontributions_export.py

events_anniversary.py

events_analysis.py

word_frequency.py

countwords_groups.py

Files

README.md

Latest commit

History

README.md

File metadata and controls

Install

Scripts

utpedits2graph.py

signature2graph.py

enrich.py

usercontributions.py

usercontributions_export.py

events_anniversary.py

events_analysis.py

word_frequency.py

countwords_groups.py