Data Pipeline Description

This set of scripts enriches citation data with links to DBpedia and Wikidata, then transforms the enriched data into a dynamic GML graph.

Data Source

The citation data for these scripts was downloaded from https://sites.google.com/site/vispubdata/home

It contains information on IEEE Visualization (IEEE VIS) publications from 1990-2020.

Data Citation: Petra Isenberg, Florian Heimerl, Steffen Koch, Tobias Isenberg, Panpan Xu, Chad Stolper, Michael Sedlmair, Jian Chen, Torsten Möller, and John Stasko. vispubdata.org: A Metadata Collection about IEEE Visualization (VIS) Publications. IEEE Transactions on Visualization and Computer Graphics, 23(9):2199–2206, September 2017. (doi: 10.1109/TVCG.2016.2615308)

Data Pipeline Summary

The data pipeline is as follows:

Download from https://sites.google.com/site/vispubdata/home

publications.csv --> Transform to JSON using CSVtoJSON.py

publications.json --> Enrich with DBpedia and Wikidata links using get-concepts.py

enriched-publications.json --> Transform JSON to XML using JSONtoXML.py

↓ enriched-publications.xml (Intermediate file generated by JSONtoXML.py script) ↓

enriched-publications-eprints-model.xml --> Transform to a dynamic co-concept graph in GML format using Pig Latin script eprints-items-publications-date-merged-edges.pig

OUTPUT/merged-file-co_node-dynamic-gml-with_edge_labels-withheader.gml --> Open directly with Gephi, apply layout and visual mappings, save and export renders
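The two file-format conversions in the steps above (CSV to JSON, then JSON to XML) can be sketched with the Python standard library. This is a minimal illustration, not the repository's CSVtoJSON.py or JSONtoXML.py: the column names (`Title`, `Year`), the sample row, and the XML element names (`publications`, `publication`) are assumptions, and the sketch does not reproduce the eprints model that the Pig script consumes.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts, one per publication row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def records_to_xml(records, root_tag="publications", item_tag="publication"):
    """Wrap each record in an element; each field becomes a child element."""
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for key, value in record.items():
            # Hypothetical mapping: the lowercased column name becomes the tag.
            child = ET.SubElement(item, key.lower())
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample row; the real publications.csv has more columns.
SAMPLE_CSV = "Title,Year\nExample Paper,1995\n"

records = csv_to_records(SAMPLE_CSV)
json_text = json.dumps(records, indent=2)  # stands in for publications.json
xml_text = records_to_xml(records)         # stands in for the XML output
```

Column names that contain spaces or other characters not allowed in XML tag names would need to be sanitized before being used as element names; the sketch leaves that out.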

Interactive visualization

The sigma export produces interactive visualizations that allow online interaction with the graph (search, community display, zoom/pan, etc.). The results are here:

Renders Folder

A folder with some exported PNG files of visualizations of the GML graph.
https://github.com/photomedia/citationDataEnrichTransform/tree/main/renders

  • Giant Component, Node Size mapped to Betweenness Centrality (BC) on a spline.
  • Giant Component, Node Size mapped to Betweenness Centrality (BC) on a spline, Filter Nodes with BC greater than 0.01
  • Giant Component, Node Size mapped to Betweenness Centrality (BC) on a spline, Filter Nodes with Degree greater than 10
  • Giant Component, Node Size mapped to Betweenness Centrality (BC) on a spline, Filter only Concepts that are related by publications from more than 1 conference
  • Temporal Filters
    • Filter leaving only concept relations that span 25 years or longer
    • Filter by time 1990-2000, 2000-2010, 2010-2020
    • Filter by time 1990-2000, 2000-2010, 2010-2020 and Duration of concept relations LESS than 10 years
    • Filter by time (2015-2020) and Duration of concept relations LESS than 5 years
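
The temporal filters listed above are applied in Gephi, but the selection they describe can be sketched in Python. The sketch assumes each concept relation (edge) carries a first and a last year. The `Edge` type, its field names, the sample edges, and the choice to measure duration as `end - start` and to treat a time filter as an interval overlap are all assumptions, not details taken from the GML file or from Gephi.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    target: str
    start: int  # first year the two concepts co-occur (assumed attribute)
    end: int    # last year the two concepts co-occur (assumed attribute)


def duration(edge):
    """Number of years spanned by a concept relation (assumed: end - start)."""
    return edge.end - edge.start


def long_lived(edges, min_years=25):
    """Keep relations that span min_years or longer."""
    return [e for e in edges if duration(e) >= min_years]


def in_window(edges, first, last, max_years=None):
    """Keep relations overlapping [first, last], optionally shorter than max_years."""
    kept = [e for e in edges if e.start <= last and e.end >= first]
    if max_years is not None:
        kept = [e for e in kept if duration(e) < max_years]
    return kept


# Hypothetical sample edges, for illustration only.
edges = [
    Edge("A", "B", 1991, 2018),
    Edge("A", "C", 2016, 2019),
    Edge("B", "C", 1995, 2001),
]

span_25 = long_lived(edges)                               # 25 years or longer
recent_short = in_window(edges, 2015, 2020, max_years=5)  # 2015-2020, under 5 years
```

With these sample edges, `span_25` keeps only the A-B relation and `recent_short` keeps only the A-C relation.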