This tutorial illustrates how the Microsoft Academic Graph (MAG) is used to generate the data and graphs in the Microsoft Academic Graph blog.
- Set up provisioning of Microsoft Academic Graph to an Azure blob storage account
- Set up Azure Data Lake Analytics for Microsoft Academic Graph
Before you begin, you should have these items of information:
✔️ The name of your Azure Storage (AS) account containing the MAG dataset, from Get Microsoft Academic Graph on Azure storage.
✔️ The name of your Azure Data Lake Analytics (ADLA) service from Set up Azure Data Lake Analytics.
✔️ The name of your Azure Data Lake Storage (ADLS) from Set up Azure Data Lake Analytics.
✔️ The name of the container in your Azure Storage (AS) account containing the MAG dataset.
- Upload the list of the 105 most impactful CS conferences to an Azure Storage account. It can be the same account that contains the MAG dataset.
Download TopCSConferences.txt to your local drive.
From Azure portal, go to the Azure Storage account > Containers > Upload > Select TopCSConferences.txt from your local drive > Upload
- The authors' affiliation locations are used as the paper locations. For your convenience, we included the affiliation-region mapping in AffiliationRegions.txt. This file has all the affiliations involved in this analysis. Upload this file to the same location as TopCSConferences.txt.
If you need locations for more affiliations, MAG Affiliation.txt contains the latitude and longitude of each affiliation. You can use the Bing Maps API to get the region or country from the coordinates.
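The coordinate-to-region lookup can be sketched as follows. This is a minimal sketch, not the method used to produce AffiliationRegions.txt: it assumes the Bing Maps REST Locations endpoint for reverse geocoding and a response whose `resourceSets[].resources[].address.countryRegion` field carries the country name. The function names and the `api_key` parameter are hypothetical, and the network request itself is left to you.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Bing Maps REST Locations API (reverse geocoding by point).
BING_LOCATIONS_ENDPOINT = "https://dev.virtualearth.net/REST/v1/Locations"


def build_reverse_geocode_url(latitude, longitude, api_key):
    """Build a reverse-geocoding URL for one affiliation coordinate (hypothetical helper)."""
    query = urlencode({"includeEntityTypes": "CountryRegion", "key": api_key})
    return f"{BING_LOCATIONS_ENDPOINT}/{latitude},{longitude}?{query}"


def extract_country_region(response_text):
    """Return the first countryRegion found in a JSON response, or None.

    Assumes the response layout described above; adjust if the service returns
    a different shape.
    """
    payload = json.loads(response_text)
    for resource_set in payload.get("resourceSets", []):
        for resource in resource_set.get("resources", []):
            region = resource.get("address", {}).get("countryRegion")
            if region:
                return region
    return None
```

Fetch each URL with the HTTP client of your choice, pass the response body to `extract_country_region`, and append the result to your copy of the affiliation-region mapping.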
In the prerequisite Set up Azure Data Lake Analytics, you added the Azure Storage (AS) account created for MAG provisioning as a data source for the Azure Data Lake Analytics (ADLA) service. In this section, you submit an ADLA job to create functions that extract MAG and conference data from Azure Storage (AS).
1. Download samples/CreateFunctions.usql to your local drive.
   From the Azure portal, go to the Azure Storage account > Containers > [mag-yyyy-mm-dd] > samples > CreateFunctions.usql > Download.
2. Download TopCSConf_CreateFunctions.usql to your local drive.
3. Go to the Azure Data Lake Analytics (ADLA) service that you created, and select Overview > New job > Open file. Select CreateFunctions.usql from your local drive. Select Submit.
4. The job should finish successfully.
5. Repeat steps 3 and 4 with TopCSConf_CreateFunctions.usql.
In this section, you submit an ADLA job to count publications for each conference and region.
1. Download TopCSConferencesByRegion.usql to your local drive.
2. Replace the placeholder values in the script using the table below.

   | Value | Description |
   |---|---|
   | <MagContainer> | The container name in the Azure Storage (AS) account containing the MAG dataset, usually in the form of mag-yyyy-mm-dd. |
   | <AzureStorageAccount> | The name of your Azure Storage (AS) account containing the MAG dataset. |
   | <SourceFileContainer> | The container name in the Azure Storage (AS) account containing TopCSConferences.txt and AffiliationRegions.txt. |
   | <SourceFileStorageAccount> | The name of your Azure Storage (AS) account containing TopCSConferences.txt and AffiliationRegions.txt. |

3. In the Azure portal, go to the Azure Data Lake Analytics (ADLA) service that you created, and select Overview > New Job > Open file. Select TopCSConferencesByRegion.usql from your local drive.
4. Change AUs to 10, and select Submit.
5. The job should finish successfully in about 8 minutes.
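If you prefer to fill in the placeholders with a script rather than by hand, the substitution can be sketched as below. This is a minimal sketch: the function name `fill_placeholders` is hypothetical, and it assumes each placeholder appears in the script literally as `<Name>`, as listed in the table.

```python
# Placeholder names taken from the table in this tutorial.
PLACEHOLDER_NAMES = (
    "MagContainer",
    "AzureStorageAccount",
    "SourceFileContainer",
    "SourceFileStorageAccount",
)


def fill_placeholders(script_text, values):
    """Replace each <Name> placeholder in a U-SQL script with its value.

    Only the four known placeholders are touched, so other angle brackets in
    the script (for example comparison operators) are left alone.
    """
    missing = [name for name in PLACEHOLDER_NAMES if name not in values]
    if missing:
        raise KeyError(f"No value supplied for: {', '.join(missing)}")
    for name in PLACEHOLDER_NAMES:
        script_text = script_text.replace(f"<{name}>", values[name])
    return script_text
```

Read TopCSConferencesByRegion.usql into a string, pass it through `fill_placeholders` with a dictionary of your four values, and write the result back to disk before opening it in the portal.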
The output of the ADLA job in the previous section goes to "/Output/TopCSConferencePaperRegions.tsv" in the Azure Data Lake Storage (ADLS). In this section, you use the Azure portal to view the output.
1. In the Azure portal, go to the Azure Data Lake Storage (ADLS) service that you created, and select Data Explorer > Output > TopCSConferencePaperRegions.tsv.
2. The columns of TopCSConferencePaperRegions.tsv are defined as follows.

   | Column # | Description | Example |
   |---|---|---|
   | 0 | Conference Series Short Name | AAAI |
   | 1 | Conference Series Long Name | National Conference on Artificial Intelligence |
   | 2 | Conference Domain | Artificial Intelligence and Related |
   | 3 | Conference Instance Name | AAAI 2000 |
   | 4 | Conference Location | Austin, TX, USA |
   | 5 | Conference Start Date | 2000-07-30T00:00:00.000 |
   | 6 | Conference End Date | 2000-08-03T00:00:00.000 |
   | 7 | Year | 2000 |
   | 8 | First Author Affiliation Region | Canada |
   | 9 | All Authors Count | 24 |
   | 10 | First Author Count | 12 |

3. You can download this data and use Excel or any other tool you prefer for further analysis.
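As one example of further analysis, the downloaded file can be aggregated with a short script. This is a minimal sketch, assuming the file is tab-separated with the 11 columns in the order listed above and no quoting. The column keys and the function name are hypothetical labels chosen for illustration, and rows with the wrong number of fields or a non-numeric count (such as a header row, if present) are skipped.

```python
import csv
from collections import defaultdict

# Hypothetical short keys for columns 0-10, in the order given in the column table.
COLUMNS = [
    "series_short_name", "series_long_name", "domain", "instance_name",
    "location", "start_date", "end_date", "year",
    "first_author_region", "all_authors_count", "first_author_count",
]


def first_author_counts_by_region(tsv_file):
    """Sum the First Author Count column per (year, region) pair."""
    totals = defaultdict(int)
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        record = dict(zip(COLUMNS, row))
        try:
            count = int(record["first_author_count"])
        except ValueError:
            continue  # skip a header row or a non-numeric count
        totals[(record["year"], record["first_author_region"])] += count
    return dict(totals)
```

For example, open the downloaded file with `open(path, newline="", encoding="utf-8")` and pass the file object to `first_author_counts_by_region` to get a dictionary keyed by year and region.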
If you're interested in Academic analytics and visualization, we have created U-SQL samples that use some of the same functions referenced in this tutorial.
[!div class="nextstepaction"] Analytics and visualization samples