[Dataset] Chameleon #5477

Merged
merged 10 commits into from
Mar 23, 2023

Conversation

mufeili
Member

@mufeili mufeili commented Mar 21, 2023

Description

This PR adds the variant of Chameleon as introduced in Geom-GCN: Geometric Graph Convolutional Networks.

A few thoughts and suggested practice:

  • Handling changes of the data source. In general, this should be very rare. If the data source does change, the URL will need to change as well. This PR appends a hash value derived from the URL to the data directory name, so a changed URL maps to a directory that does not exist yet, and the directory-existence check triggers a fresh download and processing.
  • DGLGraph construction
    • If the graph is relatively small, constructing it from the raw data files is very efficient, as in this PR. In this case, I suggest not caching any DGL-specific data such as the DGLGraph. This avoids a lot of trouble, such as having to check whether a cached graph was constructed in the desired way.
    • If the graph is very large, we should have a file that tracks the associated configuration, as I suggested in Add versioning to all DGLDatasets to force reloading when codes are changed #4293.
  • Test and regression
    • For small graphs as in this PR, unit tests are cheap enough.
    • For larger graphs, we can have very preliminary unit tests, or even no unit tests at all, and rely on regression tests instead.
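
The URL-hash idea from the first bullet can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper names (`url_hash`, `hashed_data_dir`, `needs_download`), the choice of SHA-1, and the 8-character truncation are all assumptions made for the example.

```python
import hashlib
import os


def url_hash(url: str, length: int = 8) -> str:
    """Return a short, stable hex digest derived from the URL (hypothetical helper)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:length]


def hashed_data_dir(raw_dir: str, name: str, url: str) -> str:
    """Build a data directory path whose name depends on the source URL."""
    return os.path.join(raw_dir, f"{name}_{url_hash(url)}")


def needs_download(raw_dir: str, name: str, url: str) -> bool:
    """A changed URL maps to a new directory, so a missing directory means
    the data has to be downloaded and processed again."""
    return not os.path.isdir(hashed_data_dir(raw_dir, name, url))
```

Because the directory name is a pure function of the URL, no extra version file is needed for small datasets: pointing the dataset at a new URL is enough to invalidate the old copy.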

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • I've leveraged the tools to beautify the Python and C++ code.
  • The PR is complete and small; read the Google eng practice (CL equals PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change to be small (examples, tests, and documentation can be exempted).
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

@dgl-bot
Collaborator

dgl-bot commented Mar 21, 2023

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@mufeili mufeili mentioned this pull request Mar 21, 2023
6 tasks
@dgl-bot
Collaborator

dgl-bot commented Mar 21, 2023

Commit ID: c3a7059c15282d726cd444c023c77bec83fed07f

Build ID: 1

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

        )
        self._g.ndata["test_mask"] = F.astype(
            F.stack(test_masks, dim=1), F.data_type_dict["bool"]
        )
Member

The general question is whether the logic of data processing / graph construction should belong to DGL core codebase or not.

Member Author

Following an offline discussion, the general principle is to include the data processing logic in the core DGL codebase.
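
For context on the hunk under discussion: it stacks the per-split test masks along a second dimension and casts the result to boolean, so each column of `test_mask` corresponds to one dataset split. A rough equivalent is sketched below using NumPy in place of DGL's backend module `F`; the helper name `stack_split_masks` and the toy masks are assumptions for illustration only.

```python
import numpy as np


def stack_split_masks(masks):
    """Stack 1-D per-split masks into one (num_nodes, num_splits) boolean array."""
    return np.stack(masks, axis=1).astype(bool)


# Two hypothetical splits over four nodes (toy data, not from the dataset).
test_masks = [np.array([0, 1, 0, 1]), np.array([1, 0, 0, 1])]
test_mask = stack_split_masks(test_masks)

# Column i selects the test nodes of split i.
split0_nodes = np.nonzero(test_mask[:, 0])[0]
```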

@dgl-bot
Collaborator

dgl-bot commented Mar 21, 2023

Commit ID: 877deeee1d22dbf34b97865e1b717d065c044a66

Build ID: 2

Status: ❌ CI test failed in Stage [Tensorflow CPU Unit test].

Report path: link

Full logs path: link

class ChameleonDataset(WikiNetworkDataset):
    """Wikipedia page-page network on chameleons from `Multi-scale Attributed
    Node Embedding <https://arxiv.org/abs/1909.13021>`__ and later processed by
    `Geom-GCN: Geometric Graph Convolutional Networks
Collaborator

Why is Geom-GCN related? Is this dataset designed for Geom-GCN?

Member Author

@mufeili mufeili Mar 23, 2023

Geom-GCN introduced this variant of the dataset, including turning the task from node regression into node classification, modifying node features, and introducing these dataset splits.

Collaborator

Maybe reword it. The current wording sounds like this dataset is used only by Geom-GCN.
E.g., "later processed by" --> "introduced by".

Wikipedia page-page network on chameleons from `Multi-scale Attributed
    Node Embedding <https://arxiv.org/abs/1909.13021>`__, introduced by
    `Geom-GCN: Geometric Graph Convolutional Networks

Member Author

@mufeili mufeili Mar 23, 2023

did "later processed by" -> "later modified by"

@dgl-bot
Collaborator

dgl-bot commented Mar 23, 2023

Commit ID: 486fa23b33f5bbd5e4b3d6974a6e458be9f4f33c

Build ID: 3

Status: ❌ CI test failed in Stage [Torch CPU Unit test].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Mar 23, 2023

Commit ID: 84f64ba

Build ID: 4

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Mar 23, 2023

Commit ID: 0d4ab4f

Build ID: 5

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Mar 23, 2023

Commit ID: 117f833

Build ID: 6

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Mar 23, 2023

Commit ID: 66c0291

Build ID: 7

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Mar 23, 2023

Commit ID: 25a2aa8

Build ID: 8

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@mufeili mufeili merged commit 9e532e7 into dmlc:master Mar 23, 2023
@mufeili mufeili deleted the homophily branch March 23, 2023 14:34
chang-l pushed a commit to chang-l/dgl that referenced this pull request Mar 29, 2023
* update

* update

* update

* lint

* update

* CI

* lint

* update doc

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-36-188.ap-northeast-1.compute.internal>
@mufeili mufeili mentioned this pull request Jun 12, 2023
8 tasks
DominikaJedynak pushed a commit to DominikaJedynak/dgl that referenced this pull request Mar 12, 2024
* update

* update

* update

* lint

* update

* CI

* lint

* update doc

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-36-188.ap-northeast-1.compute.internal>

4 participants