Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grab development-level statistics for NYCHA buildings live from nycdb #447

Merged
merged 15 commits into from
Apr 26, 2021

Conversation

sraby
Copy link
Member

@sraby sraby commented Apr 22, 2021

This PR makes use of our new nycha_bbls_18 table in nycdb, allowing us to detect whether a building address is part of NYCHA, and then subsequently find data about the development (like # of evictions to date) directly via a live connection to the nycdb database.

While there are some minor changes to the front end, the meat of this PR is in the evolution of the API endpoint at api/address/buildinginfo, which when given a bbl (like api/address/buildinginfo?bbl=2037250001), will run an updated SQL query to grab statistics on the bbl's NYCHA development if it exists.

There is a rare edge case (less than 10 bbls city-wide) where one particular BBL is shared between two different NYCHA developments. For example, bbl 1018550001 is part of both DOUGLASS I and DOUGLASS II. We handle these cases by aggregating eviction/unit counts across both developments, and showing both development names listed like "DOUGLASS I / DOUGLASS II" on the top of the page.

@sraby
Copy link
Member Author

sraby commented Apr 22, 2021

Fixes #337

Copy link
Contributor

@samaratrilling samaratrilling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some questions for my own edification, nothing blocking!

client/src/components/APIDataTypes.ts Outdated Show resolved Hide resolved
SELECT
DISTINCT DEVELOPMENT
FROM NYCHA_BBLS_18
WHERE DEVELOPMENT !~ 'POLICE SERVICE AREA'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question
!~ means does not match regular expression, case sensitive, right? Just wondering if the data ever would be expected to have a lowercase or mixed case version of this, in which case we might want !~*
https://www.postgresql.org/docs/8.1/functions-matching.html#FUNCTIONS-POSIX-TABLE

Also this is probably premature optimization but I think if the list of developments ever gets huge, using LIKE instead of a regex might be faster.

Copy link
Member Author

@sraby sraby Apr 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch @samaratrilling! So, the data as it is now includes only uppercase strings in the development field, but I think that's simply bc of how the original source data is formatted. That being said, I think regardless we should use !~* as you pointed out just to make clear to anyone looking at the code that the match doesn't need to be case sensitive.

And hmmm interesting about the LIKE optimization. I'm gonna leave the regex checking as is since this is such a small table but that's definitely something we should look into for our other large queries. I honestly don't know much about the differences between using LIKE and something like !~* so would love to hear more.

SELECT BBL, COUNT(*) EVICTIONS FROM MARSHAL_EVICTIONS_ALL
WHERE RESIDENTIALCOMMERCIALIND = 'RESIDENTIAL'
GROUP BY BBL
) E USING(BBL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this E mean?

Copy link
Member Author

@sraby sraby Apr 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So E is just the name (short for "Evictions") that I'm assigning to the subquery inside the preceding parentheses. I tried to keep it consistent with the alias names I was assigning the other tables in the query (N and P):

FROM NYCHA_BBLS_18 N
LEFT JOIN PLUTO_19V2 P...

Copy link
Contributor

@toolness toolness left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this looks great! A few comments below but no blockers.

client/src/components/APIDataTypes.ts Outdated Show resolved Hide resolved
Comment on lines 172 to 174
const buildingInfoResults = await APIClient.getBuildingInfo(apiResults.geosearch.bbl);
const nychaData = getNychaData(apiResults.geosearch.bbl);
const buildingInfo = buildingInfoResults.result[0];
const nychaStatsResults = await APIClient.getNychaStats(apiResults.geosearch.bbl);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, does this effectively mean that we're making two separate API calls to the server for every building a user looks up? I'm wondering if we actually want to just add NYCHA info to the getBuildingInfo() API endpoint instead, but that's potentially a premature optimization... Might be worth filing an issue to consider combining the two endpoints in the future, though?

Also note that because this code consists of two sequential await calls, this effectively means that we're making one request, waiting for the result, and then making the other request, which could be slower than e.g. using Promise.all() to run them in parallel, which could be faster (although it depends on how our server processes requests too). Again, probably not a big deal in the grand scheme of things, but it might be worth doing a bit of measurement up-front using e.g. the Chrome network panel to see if/how performance is affected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toolness I was inspired by this comment and decided to include the nycha stats SQL query as part of the building info query so we only need to make one network request. Do you mind reviewing my work once more?

@sraby sraby requested a review from toolness April 26, 2021 17:59
Copy link
Contributor

@toolness toolness left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good! Just one comment, but it's not a blocker.

Comment on lines 123 to 124
/** The name of the NYCHA development (e.g. "SOTOMAYOR HOUSES"). NULL if building is not part of NYCHA. */
development: string | null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, since this is now part of general building info, do you think these NYCHA-specific fields should have slightly different names? For instance, what if they were called nycha_development, nycha_dev_evictions, and nycha_dev_unitsres, just so clients (including e.g. Jade, who might be using this endpoint) aren't confused?

I guess that begs the question of why we don't put this information under a nycha object, which I would be an additional hassle for us right now, but might make things easier down the road... for example, we'd just have to test if nycha is null rather than having to assert that all three NYCHA-specific fields are non-null. I'll leave those decisions up to you though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about appending nycha_ to each nycha-specific field. For clarification about your second point, we actually don't check if all three NYCHA-specific fields are non-null, we only check developments as the other two can conceivably be NULL simply if there are no evictions or no res. units listed on PLUTO. So, I'm gonna file this as an issue now but since we only check to see if developments is populated it doesn't seem to add much complexity from my end IMO. Let me know if I'm missiong something

Comment on lines +125 to +131
nycha_development: string | null;
/** Total executed residential evictions (since 2017) accross the building's entire NYCHA development.
* NULL values either mean 0 evictions took place across the development or building is not part of NYCHA. */
nycha_dev_evictions: number | null;
/** Total residential units accross the building's entire NYCHA development.
* NULL values either mean PLUTO listed 0 residential units across the development or building is not part of NYCHA. */
nycha_dev_unitsres: number | null;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toolness I updated these comments to be more descriptive of what NULL values mean in each field here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh nice that's great! Cool, you can disregard my other comment about having a nycha object then, thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ok!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants