graph.Graph build_* constructors #542

martinfleis · 2023-08-06T12:35:33Z

No description provided.

codecov · 2023-08-06T12:46:39Z

Codecov Report

Merging #542 (26605c9) into geographs (b5ae4f5) will increase coverage by 1.4%.
The diff coverage is 98.6%.

@@             Coverage Diff             @@
##           geographs    #542     +/-   ##
===========================================
+ Coverage       79.5%   80.9%   +1.4%     
===========================================
  Files            124     125      +1     
  Lines          14235   14353    +118     
===========================================
+ Hits           11315   11606    +291     
+ Misses          2920    2747    -173

Files Changed	Coverage Δ
libpysal/graph/tests/test_triangulation.py	`100.0% <ø> (ø)`
libpysal/graph/_kernel.py	`62.7% <75.0%> (+43.4%)`	⬆️
libpysal/graph/_contiguity.py	`96.8% <100.0%> (ø)`
libpysal/graph/_triangulation.py	`99.3% <100.0%> (+80.8%)`	⬆️
libpysal/graph/_utils.py	`42.7% <100.0%> (+1.3%)`	⬆️
libpysal/graph/base.py	`97.9% <100.0%> (+7.4%)`	⬆️
libpysal/graph/tests/test_base.py	`100.0% <100.0%> (ø)`
libpysal/graph/tests/test_builders.py	`100.0% <100.0%> (ø)`

... and 2 files with indirect coverage changes

martinfleis · 2023-08-06T15:45:34Z

FYI, the result of _kernel is currently misaligned to the original data. e.g. in the KNN case, the adjacency is equal but the _adjacency table is incorrectly sorted leading to sparse being wrong.

libpysal/graph/tests/test_builders.py

libpysal/graph/tests/test_base.py

libpysal/graph/base.py

libpysal/graph/_kernel.py

libpysal/graph/_triangulation.py

libpysal/graph/base.py

jGaboardi · 2023-08-06T19:01:55Z

FYI, the result of _kernel is currently misaligned to the original data. e.g. in the KNN case, the adjacency is equal but the _adjacency table is incorrectly sorted leading to sparse being wrong.

Should we open a fresh issue for this?

martinfleis · 2023-08-06T19:22:17Z

I fixed that in the meantime. Need to write a test and push.

martinfleis · 2023-08-06T21:31:28Z

From further testing, it is clear that _kernel.py will need more care to ensure efficient queries, filtering of 0s and island handling but not within this PR. Plus that management of collocated points @ljwolf was cooking.

I'll add missing docstrings and this should be ready to go.

ljwolf · 2023-08-07T18:14:20Z

colocated points

Still cooking, but this would only affect the KNN "branch" (if k is not None)

efficient queries

It would help to understand what this means? This implementation was adopted because it was quite a bit faster than our existing Kernel()... but it's still the case that a kernel with no tapering is a full N x N matrix.

On isolates, I notice the .eliminate_zeros() now at the end of the class, which is definitely the right thing to do. I think that the from_sparse() method, then, needs to handle filling the empty rows' diagonals with a structural zero. This also reminds me that from_arrays() needs to have some way of recording/recognizing that the last m row/columns of the adjacency matrix are isolates... which might be done through a shape argument?

martinfleis · 2023-08-07T19:11:07Z

efficient queries

It would help to understand what this means?

Nothing in the end. I thought we'd expose distance band as a specific case of kernel as discussed, in which case we would need a different query but I have it now implemented on a local branch separately.

On isolates, I notice the .eliminate_zeros() now at the end of the class, which is definitely the right thing to do. I think that the from_sparse() method, then, needs to handle filling the empty rows' diagonals with a structural zero.

yup, it is likely that isolates are not yet properly handled in all cases.

This also reminds me that from_arrays() needs to have some way of recording/recognizing that the last m row/columns of the adjacency matrix are isolates... which might be done through a shape argument?

What do you mean "the last m row/columns"? Isolates are not necessarily last in any representation as far as I understand.

knaaptime · 2023-08-07T19:14:49Z

On isolates, I notice the .eliminate_zeros() now at the end of the class, which is definitely the right thing to do. I think that the from_sparse() method, then, needs to handle filling the empty rows' diagonals with a structural zero. This also reminds me that from_arrays() needs to have some way of recording/recognizing that the last m row/columns of the adjacency matrix are isolates... which might be done through a shape argument?

this is exacty what i was struggling with in the shoehorn-implementation i originally drafted. There, the isolates always get appended to the end of the adjlist, so thats why i ended up needing the sort method

martinfleis · 2023-08-07T19:20:44Z

We could potentially use something like I've implemented here instead of eliminate_zeros(). That would ensure that isolates are kept in place.

knaaptime · 2023-08-07T19:21:00Z

(we talked about why we dont want to expose sorting, but if we could, it would make this dead simple, because we could always re-align the graph with the df after subsetting the graph)

ljwolf · 2023-08-07T19:23:00Z

nothing in the end

So, distance band should still allow for kernels underneath the banding, right? Basically, we have three possible implementations:

kernel with no tapering (full kernel)
kernel with KNN tapering (KNN weights, possibly with a kernel function of distance)
kernel with radius tapering (DistanceBand weights, possibly with a kernel function of distance)

It'd be good to figure out a way to keep this API consistent... I had thought that distance banding could be achieved using a kernel weight with a boxcar kernel?

what do you mean?

For the graph A <-> B <-> C with an isolate D, the adjacency matrix is:

import numpy

A = numpy.array([
    [0,1,0,0]
    [1,0,1,0]
    [0,1,0,0]
    [0,0,0,0]
    ])

If we pass arrays:

i,j = A.nonzero()

Then we only know about A,B,C, not D. We would need to provide either ids or, if there are no ids, n? Hence, the scipy.sparse matrix constructors take a shape=(n,m) argument to ensure that if i.max() < n or j.max() < m, the empty end rows/columns are constructed.

knaaptime · 2023-08-07T19:26:28Z

Hence, the scipy.sparse matrix constructors take a shape=(n,m) argument to ensure that if i.max() < n or j.max() < m, the empty end rows/columns are constructed.

and this is how i ended up doing it. Instantiate a matrix that includes rows of zeros, then tell the W what those mean.

so it feels like if we're going to expose id2i, we might as well include sort... because any time we need sparse, the ordering matters and sometimes its useful to control that?

martinfleis · 2023-08-07T19:37:55Z

I had thought that distance banding could be achieved using a kernel weight with a boxcar kernel?

it could but you don't know K to pass to query so you'd need to get all distances and use boxcar to filter that. This is what I meant by more efficient queries above. We should use KDTree.sparse_distance_matrix for distance band instead.

For the graph A <-> B <-> C with an isolate D

Yeah, that is not a solved case at the moment. All these edge cases will need special attention but I guess it is easier to figure out the solution on top of basic implementation.

so it feels like if we're going to expose id2i, we might as well include sort... because any time we need sparse, the ordering matters and sometimes its useful to control that?

id2i should probably be private imho but I just needed to use it in other methods. I am still not sure about sort.

ljwolf · 2023-08-07T19:51:35Z

What about treating knn and distance band as ways to build a "precomputed" distance matrix for kernel to apply a kernel function to?

This would let knn code to live in the knn function, and likewise for radius/distanceband, leaving kernel simpler for all the rest of the methods that pass into it?

martinfleis · 2023-08-07T19:54:02Z

That sounds nice. Let me give it a go since I am touching this as we speak. Will open a PR to discuss on top of code.

martinfleis · 2023-08-07T21:31:06Z

A draft of treating knn and distance band as precomputed matrices is in #544.

martinfleis added 3 commits August 6, 2023 12:01

expose triangulation in build_triangulation

11cef1d

fix scipy imports

9346348

support string ids

0da163a

martinfleis added 3 commits August 6, 2023 16:17

fix sparse

3c080e5

fix unique_ids to preserve order of sparse

eb4c7cb

expose kernel and knn

c227b5f

martinfleis marked this pull request as ready for review August 6, 2023 15:46

martinfleis requested review from sjsrey, ljwolf, knaaptime and jGaboardi August 6, 2023 15:46

jGaboardi reviewed Aug 6, 2023

View reviewed changes

jGaboardi assigned martinfleis Aug 6, 2023

jGaboardi added enhancement graph labels Aug 6, 2023

martinfleis added 4 commits August 6, 2023 22:56

fix from_sparse to ensure proper sorting

d94c3f0

incorporate review comments

3741a4d

add geodatasets to test reqs

ed9335f

_VALID_GEOMETRY_TYPES as lists

ad8e155

add docstrings

26605c9

jGaboardi approved these changes Aug 6, 2023

View reviewed changes

martinfleis merged commit 1428639 into pysal:geographs Aug 7, 2023

martinfleis deleted the constructors branch August 7, 2023 17:20

This was referenced Aug 11, 2023

expand graph.Graph constructors and other functionality #544

Merged

2 considerations for id2i in the new Graph class #549

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

graph.Graph build_* constructors #542

graph.Graph build_* constructors #542

martinfleis commented Aug 6, 2023

codecov bot commented Aug 6, 2023 •

edited

Loading

martinfleis commented Aug 6, 2023

jGaboardi commented Aug 6, 2023

martinfleis commented Aug 6, 2023

martinfleis commented Aug 6, 2023

ljwolf commented Aug 7, 2023

martinfleis commented Aug 7, 2023

knaaptime commented Aug 7, 2023

martinfleis commented Aug 7, 2023

knaaptime commented Aug 7, 2023

ljwolf commented Aug 7, 2023 •

edited

Loading

knaaptime commented Aug 7, 2023

martinfleis commented Aug 7, 2023

ljwolf commented Aug 7, 2023

martinfleis commented Aug 7, 2023

martinfleis commented Aug 7, 2023

graph.Graph build_* constructors #542

graph.Graph build_* constructors #542

Conversation

martinfleis commented Aug 6, 2023

codecov bot commented Aug 6, 2023 • edited Loading

Codecov Report

martinfleis commented Aug 6, 2023

jGaboardi commented Aug 6, 2023

martinfleis commented Aug 6, 2023

martinfleis commented Aug 6, 2023

ljwolf commented Aug 7, 2023

martinfleis commented Aug 7, 2023

knaaptime commented Aug 7, 2023

martinfleis commented Aug 7, 2023

knaaptime commented Aug 7, 2023

ljwolf commented Aug 7, 2023 • edited Loading

knaaptime commented Aug 7, 2023

martinfleis commented Aug 7, 2023

ljwolf commented Aug 7, 2023

martinfleis commented Aug 7, 2023

martinfleis commented Aug 7, 2023

codecov bot commented Aug 6, 2023 •

edited

Loading

ljwolf commented Aug 7, 2023 •

edited

Loading