Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DiGraph instead of Graph #229

Closed
Chtchou opened this issue May 11, 2023 · 2 comments
Closed

DiGraph instead of Graph #229

Chtchou opened this issue May 11, 2023 · 2 comments

Comments

@Chtchou
Copy link

Chtchou commented May 11, 2023

Is there a specific reason why pytextrank uses a Directed Graph instead of an undirected Graph?
On your explanatory notebook you use an undirected graph (like the original implementation) but in the source code it has been changed to directed graph instead?

This can creates totally different results

@Ankush-Chander
Copy link
Contributor

Thanks @Chtchou for the heads up.

I did some experiments using DiGraph and Graph(undirected) and founds results to be different.

input text:

Compatibility of systems of linear constraints over the set of natural numbers.
Criteria of compatibility of a system of linear Diophantine equations, strict inequations, and nonstrict inequations are considered.
Upper bounds for components of a minimal set of solutions and algorithms of construction of minimal generating sets of solutions for all types of systems are given.
These criteria and the corresponding algorithms for constructing a minimal supporting set of solutions can be used in solving all the considered types systems and systems of mixed types.

DiGraph with edges from token to neighbour (current setup):

self.ranks: [('system', 0.127), ('type', 0.108), ('consider', 0.066), ('set', 0.063), ('inequation', 0.063), ('solution', 0.046), ('mixed', 0.044), ('give', 0.039), ('minimal', 0.038), ('number', 0.037), ('nonstrict', 0.035), ('linear', 0.029), ('equation', 0.026), ('strict', 0.025), ('algorithm', 0.023), ('solve', 0.023), ('natural', 0.022), ('constraint', 0.021), ('Diophantine', 0.021), ('use', 0.021), ('construction', 0.02), ('generating', 0.02), ('support', 0.018), ('construct', 0.014), ('component', 0.012), ('bound', 0.009), ('compatibility', 0.008), ('corresponding', 0.008), ('criterion', 0.007), ('upper', 0.007)]

Using DiGraph with direction of edges reversed:

self.ranks: [('criterion', 0.104), ('system', 0.068), ('set', 0.067), ('minimal', 0.061), ('solution', 0.056), ('upper', 0.052), ('algorithm', 0.05), ('compatibility', 0.046), ('linear', 0.043), ('bound', 0.031), ('corresponding', 0.031), ('type', 0.03), ('construct', 0.027), ('construction', 0.026), ('Diophantine', 0.025), ('component', 0.025), ('support', 0.025), ('equation', 0.023), ('inequation', 0.023), ('consider', 0.022), ('generating', 0.022), ('strict', 0.021), ('use', 0.021), ('constraint', 0.02), ('solve', 0.02), ('nonstrict', 0.016), ('natural', 0.012), ('mixed', 0.012), ('number', 0.009), ('give', 0.009)]

Using Graph

self.ranks: [('set', 0.087), ('system', 0.079), ('solution', 0.062), ('minimal', 0.057), ('linear', 0.048), ('type', 0.043), ('algorithm', 0.04), ('consider', 0.039), ('inequation', 0.037), ('compatibility', 0.032), ('constraint', 0.029), ('equation', 0.029), ('strict', 0.029), ('criterion', 0.028), ('Diophantine', 0.028), ('construct', 0.028), ('generating', 0.027), ('support', 0.027), ('use', 0.027), ('solve', 0.027), ('component', 0.025), ('construction', 0.023), ('natural', 0.021), ('nonstrict', 0.021), ('bound', 0.021), ('corresponding', 0.02), ('upper', 0.018), ('number', 0.017), ('give', 0.016), ('mixed', 0.016)]

Observations:

  1. current implementation tends to favour the words at the end of the sentences. Notice how consider gets higher rank by virtue of being at the later half of sentences and criterion gets punished for being at the beginning. These rankings get somewhat reversed when direction of edges change.
  2. As per networkx pagerank docs

The PageRank algorithm was designed for directed graphs but this algorithm does not check if the input graph is directed and will execute on undirected graphs by converting each oriented edge in the directed graph to two edges.

Since co-occurence is symetric and we want influence to flow equally in both the directions, I intend to change it to Graph along with test cases changes if required.

@ceteri
Copy link
Collaborator

ceteri commented Feb 21, 2024

Looks great, many thanks @Ankush-Chander and @Chtchou

@ceteri ceteri closed this as completed Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants