-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Grab development-level statistics for NYCHA buildings live from nycdb #447
Conversation
Fixes #337 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some questions for my own edification, nothing blocking!
wow/sql/address_nychastats.sql
Outdated
SELECT | ||
DISTINCT DEVELOPMENT | ||
FROM NYCHA_BBLS_18 | ||
WHERE DEVELOPMENT !~ 'POLICE SERVICE AREA' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A question
!~
means does not match regular expression, case sensitive
, right? Just wondering if the data ever would be expected to have a lowercase or mixed case version of this, in which case we might want !~*
https://www.postgresql.org/docs/8.1/functions-matching.html#FUNCTIONS-POSIX-TABLE
Also this is probably premature optimization but I think if the list of developments ever gets huge, using LIKE
instead of a regex might be faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch @samaratrilling! So, the data as it is now includes only uppercase strings in the development
field, but I think that's simply bc of how the original source data is formatted. That being said, I think regardless we should use !~*
as you pointed out just to make clear to anyone looking at the code that the match doesn't need to be case sensitive.
And hmmm interesting about the LIKE
optimization. I'm gonna leave the regex checking as is since this is such a small table but that's definitely something we should look into for our other large queries. I honestly don't know much about the differences between using LIKE
and something like !~*
so would love to hear more.
wow/sql/address_nychastats.sql
Outdated
SELECT BBL, COUNT(*) EVICTIONS FROM MARSHAL_EVICTIONS_ALL | ||
WHERE RESIDENTIALCOMMERCIALIND = 'RESIDENTIAL' | ||
GROUP BY BBL | ||
) E USING(BBL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this E mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So E
is just the name (short for "Evictions") that I'm assigning to the subquery inside the preceding parentheses. I tried to keep it consistent with the alias names I was assigning the other tables in the query (N
and P
):
FROM NYCHA_BBLS_18 N
LEFT JOIN PLUTO_19V2 P...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this looks great! A few comments below but no blockers.
client/src/state-machine.ts
Outdated
const buildingInfoResults = await APIClient.getBuildingInfo(apiResults.geosearch.bbl); | ||
const nychaData = getNychaData(apiResults.geosearch.bbl); | ||
const buildingInfo = buildingInfoResults.result[0]; | ||
const nychaStatsResults = await APIClient.getNychaStats(apiResults.geosearch.bbl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, does this effectively mean that we're making two separate API calls to the server for every building a user looks up? I'm wondering if we actually want to just add NYCHA info to the getBuildingInfo()
API endpoint instead, but that's potentially a premature optimization... Might be worth filing an issue to consider combining the two endpoints in the future, though?
Also note that because this code consists of two sequential await
calls, this effectively means that we're making one request, waiting for the result, and then making the other request, which could be slower than e.g. using Promise.all()
to run them in parallel, which could be faster (although it depends on how our server processes requests too). Again, probably not a big deal in the grand scheme of things, but it might be worth doing a bit of measurement up-front using e.g. the Chrome network panel to see if/how performance is affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toolness I was inspired by this comment and decided to include the nycha stats SQL query as part of the building info query so we only need to make one network request. Do you mind reviewing my work once more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks good! Just one comment, but it's not a blocker.
/** The name of the NYCHA development (e.g. "SOTOMAYOR HOUSES"). NULL if building is not part of NYCHA. */ | ||
development: string | null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, since this is now part of general building info, do you think these NYCHA-specific fields should have slightly different names? For instance, what if they were called nycha_development
, nycha_dev_evictions
, and nycha_dev_unitsres
, just so clients (including e.g. Jade, who might be using this endpoint) aren't confused?
I guess that begs the question of why we don't put this information under a nycha
object, which I would be an additional hassle for us right now, but might make things easier down the road... for example, we'd just have to test if nycha
is null
rather than having to assert that all three NYCHA-specific fields are non-null. I'll leave those decisions up to you though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about appending nycha_
to each nycha-specific field. For clarification about your second point, we actually don't check if all three NYCHA-specific fields are non-null, we only check developments
as the other two can conceivably be NULL simply if there are no evictions or no res. units listed on PLUTO. So, I'm gonna file this as an issue now but since we only check to see if developments
is populated it doesn't seem to add much complexity from my end IMO. Let me know if I'm missiong something
nycha_development: string | null; | ||
/** Total executed residential evictions (since 2017) accross the building's entire NYCHA development. | ||
* NULL values either mean 0 evictions took place across the development or building is not part of NYCHA. */ | ||
nycha_dev_evictions: number | null; | ||
/** Total residential units accross the building's entire NYCHA development. | ||
* NULL values either mean PLUTO listed 0 residential units across the development or building is not part of NYCHA. */ | ||
nycha_dev_unitsres: number | null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toolness I updated these comments to be more descriptive of what NULL values mean in each field here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh nice that's great! Cool, you can disregard my other comment about having a nycha
object then, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh ok!
This PR makes use of our new
nycha_bbls_18
table in nycdb, allowing us to detect whether a building address is part of NYCHA, and then subsequently find data about the development (like # of evictions to date) directly via a live connection to the nycdb database.While there are some minor changes to the front end, the meat of this PR is in the evolution of the API endpoint at
api/address/buildinginfo
, which when given a bbl (likeapi/address/buildinginfo?bbl=2037250001
), will run an updated SQL query to grab statistics on the bbl's NYCHA development if it exists.There is a rare edge case (less than 10 bbls city-wide) where one particular BBL is shared between two different NYCHA developments. For example, bbl
1018550001
is part of bothDOUGLASS I
andDOUGLASS II
. We handle these cases by aggregating eviction/unit counts across both developments, and showing both development names listed like "DOUGLASS I / DOUGLASS II" on the top of the page.