This repository contains several attributed graph datasets with ground-truth classes (the ground-truth cluster information is contained in the file "Class_info"). These datasets were processed, cleaned, and used in our previous works listed below. Please cite these papers if you use these datasets in your research.
- T. He and K.C.C. Chan, "Discovering Fuzzy Structural Patterns for Graph Analytics," IEEE Transactions on Fuzzy Systems, vol. 26, no. 5, pp. 2785-2796, 2018.
- T. He and K.C.C. Chan, "MISAGA: An Algorithm for Mining Interesting Subgraphs in Attributed Graphs," IEEE Transactions on Cybernetics, vol. 48, no. 5, pp. 1369-1382, 2018.
- T. He, Y. Liu, T.H. Ko, K.C.C. Chan, and Y.S. Ong, "Contextual Correlation Preserving Multiview Featured Graph Clustering," IEEE Transactions on Cybernetics, vol. 50, no. 10, pp. 4318-4331, 2020.
- T. He, L. Bai, and Y.S. Ong, "Vicinal Vertex Allocation for Matrix Factorization in Networks," IEEE Transactions on Cybernetics, 2021.

Detailed information on these datasets:
- Twitter:
The version in this repository was processed by Tiantian He. The dataset contains 2511 vertices, 37154 edges, and 9073 node attributes, representing Twitter users, friendships between these users, and user profiles and words in tweets, respectively.
- Facebook:
The version in this repository was processed by Tiantian He. The dataset contains 4039 vertices, 88234 edges, and 1283 node attributes, representing Facebook users, friendships between these users, and the profiles of these users, respectively.
- Googleplus:
The version in this repository was processed by Tiantian He. The dataset contains 7856 vertices, 321268 edges, and 2024 node attributes, representing Google+ users, friendships between these users, and the profiles of these users, respectively.
Note: The raw (uncleaned, unprocessed) data can be downloaded from http://snap.stanford.edu/data/.
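
As a quick sanity check after loading a dataset, the vertex and edge counts listed above can be turned into an average degree and an edge density. The sketch below is illustrative only: it assumes each graph is undirected with no self-loops or duplicate edges, which this README does not state, and the names `DATASETS` and `summarize` are hypothetical and not part of the repository.

```python
# Vertex and edge counts as stated in this README.
DATASETS = {
    "Twitter": (2511, 37154),
    "Facebook": (4039, 88234),
    "Googleplus": (7856, 321268),
}


def summarize(num_vertices, num_edges):
    """Return (average degree, density).

    ASSUMPTION: the graph is undirected and simple (no self-loops,
    no duplicate edges). This is not confirmed by the README.
    """
    avg_degree = 2 * num_edges / num_vertices
    density = 2 * num_edges / (num_vertices * (num_vertices - 1))
    return avg_degree, density


for name, (n, m) in DATASETS.items():
    avg_degree, density = summarize(n, m)
    print(f"{name}: average degree {avg_degree:.2f}, density {density:.4f}")
```

If the counts computed from the loaded files differ from the figures above, the edge list may be stored in directed form or contain duplicate entries.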