[IO-1445][external] Changes to LocalDataset() & get_annotations() to account for local releases pulled with folders #678

Merged
merged 10 commits into from
Oct 24, 2023

Conversation

JBWilkie
Collaborator

@JBWilkie JBWilkie commented Oct 5, 2023


Problem

When local releases are pulled with folders, constructing LocalDataset objects and calling get_annotations() both fail because annotation file names do not match item names. A more in-depth breakdown is available in the IO-1445 Linear ticket.

Solution

Instead of comparing JSON file names with item names, parse each JSON file so that every item is checked against its correct local path. Fully parsing large JSON files makes runtimes significantly longer, so LocalDataset & get_annotations() now use a streaming library to avoid that cost.
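To illustrate the idea, here is a minimal sketch of resolving an image path from the item recorded inside each annotation file rather than from the file name. It is not the code in this PR: the `{"item": {"name", "path"}}` layout, the `image_path_from_annotation` helper, and the file names in `demo` are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def image_path_from_annotation(annotation_file: Path, images_dir: Path) -> Path:
    """Build the local image path from the item recorded inside the JSON file."""
    with annotation_file.open() as f:
        item = json.load(f)["item"]  # assumed layout: {"item": {"name", "path"}}
    # The remote folder ("/cats") becomes a sub-directory of the images directory.
    return images_dir / item["path"].strip("/") / item["name"]


def demo() -> list:
    """Two items share the name 1.jpg but live in different folders."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        annotations = root / "annotations"
        annotations.mkdir()
        # Hypothetical file names: the second one cannot be matched by stem alone.
        (annotations / "1.json").write_text(
            json.dumps({"item": {"name": "1.jpg", "path": "/cats"}})
        )
        (annotations / "1_1.json").write_text(
            json.dumps({"item": {"name": "1.jpg", "path": "/dogs"}})
        )
        return [
            image_path_from_annotation(p, root / "images").relative_to(root).as_posix()
            for p in sorted(annotations.glob("*.json"))
        ]


print(demo())
```

Matching on the file stem would look for `1_1.jpg` for the second file, whereas reading the item from the JSON yields a distinct path for each identically named image.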

Changelog

  • Changes to LocalDataset construction & get_annotations() function to account for local releases pulled with folders, including releases with identically named files.
  • Addition of the json_stream library to darwin-py to improve runtimes when reading certain fields from JSON files.
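As a rough sketch of what reading a single field with json_stream can look like: `json_stream.load` returns a streaming object that is walked lazily, so the whole document does not have to be built in memory. The `read_item_path` helper and the `item`/`path` field names are assumptions for illustration, not the PR's actual code, and the fallback to the standard `json` module is only there so the example runs without the third-party package.

```python
import json
from pathlib import Path

try:
    import json_stream  # third-party package: pip install json-stream
except ImportError:  # fallback so the sketch still runs without the dependency
    json_stream = None


def read_item_path(annotation_file: Path) -> str:
    """Return the item's folder path, streaming the file when json_stream is available."""
    with annotation_file.open() as f:
        if json_stream is None:
            # Standard library: parses the entire document before the lookup.
            return json.load(f)["item"]["path"]
        # Streaming: keys are located while reading, without keeping the rest.
        return json_stream.load(f)["item"]["path"]
```

In json_stream's default (transient) mode, values must be accessed in the order they appear in the file, so a helper like this suits one-off lookups of a known field.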


linear bot commented Oct 5, 2023

IO-1445 BUG: Classification model failed

BUG submission from: Patryk Gronostajski
Summary (describe an issue):
Classification model failed

A potential customer (trial) contacted us saying that their classification model failed. I tried running one in their team and it failed as well. They want to use the "Formal" and "Informal" tags, and all items in the dataset.
Share Loom/Screenshots with Console/Network opened:
N/A
Darwin affected version
V2
Environment (production/staging; browser and OS version)
2.0 prod
Impact
Single (only 1 customer is impacted)
Priority
🔴 SWAT Urgent - an urgent problem that blocks users
Steps to reproduce

  1. Attempt to train a model using the "Formal" and "Informal" tags, and all items in the dataset.

Expected Behaviour
The model trains successfully.
Team & Dataset Link
Dataset: https://darwin.v7labs.com/datasets/658275/dataset-management
Team: https://darwin.v7labs.com/super_admin/teams/5630
Intercom ticket
https://app.intercom.com/a/inbox/pb9z3asr/inbox/admin/6578327/conversation/235699?view=List
Customer Name
Beder Acosta Borges
Renewal timeline

ARR

Tier

Assigned CSM

Collaborator Author

JBWilkie commented Oct 5, 2023

Failing tests, need to investigate

Contributor

@Nathanjp91 Nathanjp91 left a comment


LGTM. Revisions seem fine with the QA and testing done.

@JBWilkie JBWilkie merged commit 23d4d16 into master Oct 24, 2023
13 checks passed