Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

graph.Graph build_* constructors #542

Merged
merged 11 commits into from
Aug 7, 2023

Conversation

martinfleis
Copy link
Member

No description provided.

@codecov
Copy link

codecov bot commented Aug 6, 2023

Codecov Report

Merging #542 (26605c9) into geographs (b5ae4f5) will increase coverage by 1.4%.
The diff coverage is 98.6%.

Impacted file tree graph

@@             Coverage Diff             @@
##           geographs    #542     +/-   ##
===========================================
+ Coverage       79.5%   80.9%   +1.4%     
===========================================
  Files            124     125      +1     
  Lines          14235   14353    +118     
===========================================
+ Hits           11315   11606    +291     
+ Misses          2920    2747    -173     
Files Changed Coverage Δ
libpysal/graph/tests/test_triangulation.py 100.0% <ø> (ø)
libpysal/graph/_kernel.py 62.7% <75.0%> (+43.4%) ⬆️
libpysal/graph/_contiguity.py 96.8% <100.0%> (ø)
libpysal/graph/_triangulation.py 99.3% <100.0%> (+80.8%) ⬆️
libpysal/graph/_utils.py 42.7% <100.0%> (+1.3%) ⬆️
libpysal/graph/base.py 97.9% <100.0%> (+7.4%) ⬆️
libpysal/graph/tests/test_base.py 100.0% <100.0%> (ø)
libpysal/graph/tests/test_builders.py 100.0% <100.0%> (ø)

... and 2 files with indirect coverage changes

@martinfleis
Copy link
Member Author

FYI, the result of _kernel is currently misaligned to the original data. e.g. in the KNN case, the adjacency is equal but the _adjacency table is incorrectly sorted leading to sparse being wrong.

@martinfleis martinfleis marked this pull request as ready for review August 6, 2023 15:46
libpysal/graph/tests/test_builders.py Show resolved Hide resolved
libpysal/graph/tests/test_base.py Show resolved Hide resolved
libpysal/graph/base.py Show resolved Hide resolved
libpysal/graph/base.py Show resolved Hide resolved
libpysal/graph/_kernel.py Show resolved Hide resolved
libpysal/graph/_triangulation.py Show resolved Hide resolved
libpysal/graph/_triangulation.py Outdated Show resolved Hide resolved
libpysal/graph/base.py Outdated Show resolved Hide resolved
@jGaboardi
Copy link
Member

FYI, the result of _kernel is currently misaligned to the original data. e.g. in the KNN case, the adjacency is equal but the _adjacency table is incorrectly sorted leading to sparse being wrong.

Should we open a fresh issue for this?

@martinfleis
Copy link
Member Author

I fixed that in the meantime. Need to write a test and push.

@martinfleis
Copy link
Member Author

From further testing, it is clear that _kernel.py will need more care to ensure efficient queries, filtering of 0s and island handling but not within this PR. Plus that management of collocated points @ljwolf was cooking.

I'll add missing docstrings and this should be ready to go.

@martinfleis martinfleis merged commit 1428639 into pysal:geographs Aug 7, 2023
@martinfleis martinfleis deleted the constructors branch August 7, 2023 17:20
@ljwolf
Copy link
Member

ljwolf commented Aug 7, 2023

colocated points

Still cooking, but this would only affect the KNN "branch" (if k is not None)

efficient queries

It would help to understand what this means? This implementation was adopted because it was quite a bit faster than our existing Kernel()... but it's still the case that a kernel with no tapering is a full N x N matrix.

On isolates, I notice the .eliminate_zeros() now at the end of the class, which is definitely the right thing to do. I think that the from_sparse() method, then, needs to handle filling the empty rows' diagonals with a structural zero. This also reminds me that from_arrays() needs to have some way of recording/recognizing that the last m row/columns of the adjacency matrix are isolates... which might be done through a shape argument?

@martinfleis
Copy link
Member Author

efficient queries

It would help to understand what this means?

Nothing in the end. I thought we'd expose distance band as a specific case of kernel as discussed, in which case we would need a different query but I have it now implemented on a local branch separately.

On isolates, I notice the .eliminate_zeros() now at the end of the class, which is definitely the right thing to do. I think that the from_sparse() method, then, needs to handle filling the empty rows' diagonals with a structural zero.

yup, it is likely that isolates are not yet properly handled in all cases.

This also reminds me that from_arrays() needs to have some way of recording/recognizing that the last m row/columns of the adjacency matrix are isolates... which might be done through a shape argument?

What do you mean "the last m row/columns"? Isolates are not necessarily last in any representation as far as I understand.

@knaaptime
Copy link
Member

On isolates, I notice the .eliminate_zeros() now at the end of the class, which is definitely the right thing to do. I think that the from_sparse() method, then, needs to handle filling the empty rows' diagonals with a structural zero. This also reminds me that from_arrays() needs to have some way of recording/recognizing that the last m row/columns of the adjacency matrix are isolates... which might be done through a shape argument?

this is exacty what i was struggling with in the shoehorn-implementation i originally drafted. There, the isolates always get appended to the end of the adjlist, so thats why i ended up needing the sort method

@martinfleis
Copy link
Member Author

We could potentially use something like I've implemented here instead of eliminate_zeros(). That would ensure that isolates are kept in place.

@knaaptime
Copy link
Member

(we talked about why we dont want to expose sorting, but if we could, it would make this dead simple, because we could always re-align the graph with the df after subsetting the graph)

@ljwolf
Copy link
Member

ljwolf commented Aug 7, 2023

nothing in the end

So, distance band should still allow for kernels underneath the banding, right? Basically, we have three possible implementations:

  • kernel with no tapering (full kernel)
  • kernel with KNN tapering (KNN weights, possibly with a kernel function of distance)
  • kernel with radius tapering (DistanceBand weights, possibly with a kernel function of distance)

It'd be good to figure out a way to keep this API consistent... I had thought that distance banding could be achieved using a kernel weight with a boxcar kernel?

what do you mean?

For the graph A <-> B <-> C with an isolate D, the adjacency matrix is:

import numpy

A = numpy.array([
    [0,1,0,0]
    [1,0,1,0]
    [0,1,0,0]
    [0,0,0,0]
    ])

If we pass arrays:

i,j = A.nonzero()

Then we only know about A,B,C, not D. We would need to provide either ids or, if there are no ids, n? Hence, the scipy.sparse matrix constructors take a shape=(n,m) argument to ensure that if i.max() < n or j.max() < m, the empty end rows/columns are constructed.

@knaaptime
Copy link
Member

Hence, the scipy.sparse matrix constructors take a shape=(n,m) argument to ensure that if i.max() < n or j.max() < m, the empty end rows/columns are constructed.

and this is how i ended up doing it. Instantiate a matrix that includes rows of zeros, then tell the W what those mean.

so it feels like if we're going to expose id2i, we might as well include sort... because any time we need sparse, the ordering matters and sometimes its useful to control that?

@martinfleis
Copy link
Member Author

I had thought that distance banding could be achieved using a kernel weight with a boxcar kernel?

it could but you don't know K to pass to query so you'd need to get all distances and use boxcar to filter that. This is what I meant by more efficient queries above. We should use KDTree.sparse_distance_matrix for distance band instead.

For the graph A <-> B <-> C with an isolate D

Yeah, that is not a solved case at the moment. All these edge cases will need special attention but I guess it is easier to figure out the solution on top of basic implementation.

so it feels like if we're going to expose id2i, we might as well include sort... because any time we need sparse, the ordering matters and sometimes its useful to control that?

id2i should probably be private imho but I just needed to use it in other methods. I am still not sure about sort.

@ljwolf
Copy link
Member

ljwolf commented Aug 7, 2023

What about treating knn and distance band as ways to build a "precomputed" distance matrix for kernel to apply a kernel function to?

This would let knn code to live in the knn function, and likewise for radius/distanceband, leaving kernel simpler for all the rest of the methods that pass into it?

@martinfleis
Copy link
Member Author

That sounds nice. Let me give it a go since I am touching this as we speak. Will open a PR to discuss on top of code.

@martinfleis
Copy link
Member Author

A draft of treating knn and distance band as precomputed matrices is in #544.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants