-
-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial support for ACMS docket reports. #3193
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good to me, but for a couple things:
- Lots of repetition with other functions. Do we want to merge these?
- We need migration files per https://github.com/freelawproject/courtlistener/wiki/Database-migrations
@johnhawkinson, I hear you that you don't have a CL env. set up, so @albertisfu once you've got your elastic PR in, could you adopt this one and see what it takes to get it landed?
Sure, I'm starting to look at this, I'm catching up on the conversation on Slack and I'll post any questions I have in order to get this landed. |
d0c76c9
to
15fdea0
Compare
@albertisfu perhaps unwisely, I removed your merge commit and rebased off the Anyhow, I just added a relaxation of the |
I think the next step is that I'll add some tests for this new upload but before some comments about after analyzing this new docket report: From the ACMS docket report, we can extract both the docket data and all the docket entries from the JSON. We could then map this data to the DB model: Docket data"caseId":"e15ebc78-9507-4639-8a61-4bc42e613a66" - > "caseNumber":"23-6364" -> "name":"United States of America v. Surmik" -> "caseOpened":"2023-04-19" -> "aNumber":null - > Not in DB "receivedDate":"2023-04-19T13:57:24Z" - > Not in DB "partyAttorneyList":"" "court":{ "caseType":"Criminal, is this case_type_information? "caseSubType":"Post-Conviction", do we have this in DB? "caseSubSubType":null do we have this in DB? "districtCourtName":null Do we have this in DB? "feeStatus":"IFP Granted" Do we have this in DB? Docket EntriesWe can get entries with these fields from the JSON: "endDate":"2023-04-14" → DocketEntry "endDateFormatted":"04/14/2023" → DocketEntry "entryNumber":1, → DocketEntry/RECAPDocument "docketEntryText":"NOTICE OF CRIMINAL APPEAL, with the district court docket, on behalf of Appellant William Surmik, FILED. [Entered: 04/19/2023 10:26 AM]", → DocketEntry "docketEntryId":"bde556a7-bdde-ed11-a7c6-001dd806a1fd" → RECAPDocument "createdOn":"2023-04-19T14:22:56Z", Not in DB "documentCount":2, → (Number of documents, 0 a minute entry, 1 only main document, 2 or greater main document + attachments, we might not need this.) "pageCount":12, → RECAPdocument "fileSize":410, RECAPdocument "restrictedPartyFilingDocketEntry":false, Not in DB "restrictedDocsAvailable":false, Not in DB Discussing this with @ERosendo we think we could use the However, we should consider that Then we should be able to match the document uploads/document availability to the right DocketEntries/RECAPDocuments. However, the situation is more complicated when there are attachments. Until you click one of them and the download page is opened, you’ll get the So a workaround that @ERosendo suggested for this issue is that we can use the We think this can work and we can avoid adding an additional field for storing We just need to confirm this approach with more examples, could you provide more examples for this and other courts where this report is available? About Juriscraper, If I understand correctly, we would obtain the JSON from the extension. We could then store this JSON in CL as a file for backup, similar to how we store HTML files. Juriscraper would then parse the PACER JSON and convert it into a format compatible with CL JSON, correct? |
ba71be8
to
b1c5ced
Compare
I think most of this is properly in juriscraper, and freelawproject/juriscraper#730 doesn't really attempt to do so, since the parsing code hasn't been written.
Yes.
…
Err, don't we store times now? I thought we did, for items received from RSS. ACMS may not expose a date_filed/date_entered distinction that I am aware of (I'm not sure—I think Appellate CM/ECF only exposes it in NDA emails, and I haven't seen any from ACMS). But it does appear that the
I don't think this is what that is. It's certainly not the size in bytes. (I guess it could be in kilobytes?).
Hilariously, we could shoehorn it in by removing the hyphens from the GUID before storing it.
This is a problem that we have in the district court, too. For instance, I guess the problem is a little different here, but related. But maybe that's not what you're proposing.
Saving the docket number doesn't preclude this. I guess we would need a new CL lookup query if we went this route?
What kind of examples do you want? https://ca9-showdoc.azurewebsites.us/23-2487 is an example of a case docketed today in the 9th circuit.
Correct. |
Yeah, now we can store
Yeah I agree, it's better to increase the length in
Got it, so, in that case, we could store the
Yeah, the docket number doesn't help here. So the idea was to use the As an alternative, also @ERosendo found that it's possible to get the
Thanks, yeah these examples are fine. So if we'll only receive ACMS dockets from courts that use docket entry numbers (like ca2 and ca9), that seems fine. Because if we expect to receive an ACMS where their docket entries don't have numbers. That will be an additional issue we need to handle since in those cases we use the Let me know your thoughts about these comments so if you agree I can start working on mocking some data for an ACMS docket and adding tests for ingesting the docket/docket entries. |
Best is the enemy of good enough. This went live on Sunday and we're missing new appellate cases, so I'm trying to move with alacrity.
That is…not exactly what I said:
I don't think we should be using
We should ignore almost all of them and use the date/time parsed out of the Entered text field, unfortunately.
As you are probably aware, I could write a treatise on naming.
Anyhow, I would call it
We do not know the future with perfect clarity, but I think it is unlikely any other courts will adopt ACMS in the current calendar year, if they ever do (not a foregone conclusion!).
It's also not clear that ACMS does or will in the future support the concept of unnumbered docket entries.
Hrmm. As long as we are changing schema (lengthening entry_number = models.BigIntegerField(
help_text=(
"# on the PACER docket page. For appellate cases, this may "
"be the internal PACER ID for the document, when an entry "
"ID is otherwise unavailable."
), I guess we can bridge that gap when we come to it? |
d0c76c9
to
b1c5ced
Compare
Oops, so, I think my force-push/rebase confused Github's tracking of @mlissner's review comments. I tried force-pushing back to d0c76c9 but that did not make them show up again, so I re-force-pushed back to where we were. Hopefully this confuses no humans, only computers.
I addressed these in Slack on Sept. 28 but should have done so here:
In a perfect world, yes. This maintains the same level of duplication found between appellate dockets and district court dockets, and seems like it is too much, but also it would be work to break it up and abstract it better. Of course the case for that work increases when you have things in triplicate rather than duplicate, but I did not see an easy way for me to tackle that task without a lot more uncertainty than I already have about this code that I'm not in a good position to test.
OK, that one's on Alberto (well, they both are, really). |
Yeah, we can tweak Well for now I'll work on increasing the length of |
b1c5ced
to
b068912
Compare
@albertisfu not sure that tagging you in the commit text for 9a874cb had the intended result, but there's a help_text typo to fix in the same migration as long as you're doing one, if it's not too much trouble. |
Thank you guys for all the discussion here. I've been mostly waiting for you guys to lead me, but I do have a couple thoughts (none about code, apparently!):
Yes, but good enough is the enemy of great. If we've got our eye on this now and have the energy to do so, I'm very much in favor of doing everything the right way so we don't have to come back around and fix things. The sad reality is we're pretty bad at coming around later to fix things, particularly things that are good enough.
This freaks me out and I'd like to emphasize that I don't want to do this. One issue is that any data we get has to be re-processed in the order it's received, including across object types. So if we get a docket, then an attachment page, then a docket, we can't just re-process the dockets alone, we'll have to do all three, in the correct order. This is partly because the data can change under us, and partly because it's an assumption baked into the merging code — I don't know what would happen if we ignored this assumption. There is also a small amount of manual work that gets done in the DB, to, for example, deal with sealed content and other issues like that. When we re-process data, we tend to wipe out those manual changes. It's also just work I don't want to do (my time is at an extreme premium and I tend to be the one that has to do it). We should make every effort to get it right, even at a time-delay, so we can avoid having to re-process data.
Yeah, Github doesn't do well with this. The worst part is, it breaks a feature that lets reviewers see changes since their last review. This is getting annoying enough that I'm getting close to blocking force pushes on all branches. That's it for me for now. Thank you guys again. I appreciate this moving forward at such a clip. |
Is it? In any case, I'm not here to define "good enough," just to say that at least from my perspective, I wouldn't hold up work mocking things based on the level of open questions we have here.
I guess this is about issue tracking and assignment and stuff.
I hear you. I could have been a bit more clear — we don't really understand what many of these fields are and whether they have any value (remember they are not displayed to the user in the UI), and we would need database model changes to store them. I think we don't need them, but if we do, there is always a way to come back and get them.
Yeah. I thought I was safe because (I thought), your comments were not actually tied to code. I guess I'll try clicking the magic circle arrow red button and see if that resolves the review state (in)consistency.
You did not speak to the T.S. Eliot question:
Har. I have no point, I just want to say: JSON Tub Time Machine…countless screaming Argonauts. |
Thanks for the comments. For the field, I think between you and Alberto, I don't have much to add. The name seems fine and reasonable to me. If you guys agree we need it and agree that the name is good, I'm +1 to whatever you decide. |
4242e38
to
f5dde3a
Compare
Yeah, I added the following changes:
0023_search_models_update.py with regular tables, event tables, and triggers. |
Oops, yes, I think you're right to raise this concern. The (The 'A' in ACMS stands for 'Appellate'.) |
Yeah, I second that. We don't need the new field for claims docs. |
Correct, I've moved |
Oops, clearly that failing test was my fault. I just tried a blind fix, but if that doesn't do it, someone with a dev environment should take it on. I am not clear whether Alberto was planning to do more, or we were waiting for Mike's re-review, or something else. Clarity on who has the next step would be good. Thanks. |
Alberto has this on his plate. |
I've added a test ( Once the Juriscraper parser is complete, please let me know, and I can update this test to replace the mock data with the actual data returned by this parser. |
Yes, I definitely shall. Is there anything blocking this from landing? |
Clearly explain the naive/aware datetime handling.
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
Removes unnecessary level of hierarchy to compute URLs and adds a comment about the issue with the ACMS download page.
5c2971b
to
83457df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the branch with main. This seems ready to me!
@johnhawkinson do you want to give this another look or should we just go for it? |
Merged! I agree with Eduardo that this is cooked enough. If we need more changes once this hits prod, we'll do that, but let's get this big thing under our belt and move forward. @blancoramiro, this will need some clever deployment. Let's see if we can do that today or tomorrow, since all other deploys will be blocked until we get it in (but no urgent code is expected). |
Thank you all for your contributions here. This one was a real team effort. |
Initial support for ACMS.
This probably needs the juriscraper stub to be committed: freelawproject/juriscraper#730
I do not have a courtlistener test environment available, and I'm focusing on the juriscraper and recap-chrome parts of this work. Can someone else deal with getting this up and running? I hope it is simple and mechanical.