feat: schema validation in langsmith sdk #922

Merged
jakerachleff merged 6 commits into main from 2024-08-13-input-validation
Aug 15, 2024

Conversation

jakerachleff
Contributor

No description provided.

    monkeypatch: pytest.MonkeyPatch, langchain_client: Client
) -> None:
    """Test persisting runs and adding feedback."""
    monkeypatch.setenv("LANGCHAIN_ENDPOINT", "https://dev.api.smith.langchain.com")
Contributor Author

This was a quirk where we were overriding a single test to use dev. If we wanna test against dev, we should just configure the suite to run against dev in addition
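One way to read this suggestion, as a sketch rather than the repository's actual setup: have the whole suite take its endpoint from the environment, so CI can run the same tests once per environment instead of one test overriding it. The helper name `resolve_endpoint` and the default URL are assumptions for illustration; only `LANGCHAIN_ENDPOINT` and the dev URL appear in the diff.

```python
import os
from typing import Mapping, Optional

# Assumed default for illustration; the diff only shows the dev URL.
DEFAULT_ENDPOINT = "https://api.smith.langchain.com"


def resolve_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API endpoint the test suite should target.

    Reads LANGCHAIN_ENDPOINT from the given mapping (os.environ when
    omitted) and falls back to the assumed default.
    """
    env = os.environ if env is None else env
    return env.get("LANGCHAIN_ENDPOINT", DEFAULT_ENDPOINT)
```

A CI job could then export `LANGCHAIN_ENDPOINT=https://dev.api.smith.langchain.com` for a second run of the suite, with no per-test `monkeypatch.setenv` call.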

@@ -2529,6 +2529,8 @@ def create_dataset(
        *,
        description: Optional[str] = None,
        data_type: ls_schemas.DataType = ls_schemas.DataType.kv,
        inputs_schema_definition: Optional[Dict[str, Any]] = None,
Collaborator

is "definition" necessary? My fingers hurt just looking at this

Contributor Author

I matched the API exactly. Do we mismatch ever otherwise?

Collaborator

We do mismatch - our apis are pretty confusing sometimes


class Config:
    """Configuration class for the schema."""

    allow_population_by_field_name = True
Contributor Author

@hinthornw I had to do some pydantic magic to make all this work. Do we do this in the SDK? I see this pattern in runtree, but I know pydantic stuff is frowned upon in other parts of the codebase.

Collaborator

Maybe we just don't use pydantic in the create_dataset method anymore?

Contributor Author

we'll need to remove it from all dataset related areas then, because we'll need to do conversion on any read/create/etc

    description=description,
    data_type=data_type,
)
dataset = {
Contributor Author

byebye pydantic
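For readers following the thread, the change being described is building the request body as a plain dict instead of a pydantic model. A minimal sketch of that idea, under stated assumptions: the helper name `build_create_dataset_payload`, the exact set of fields, and the shorter SDK-side name `inputs_schema` are illustrative and not taken from the merged code; only the API field name `inputs_schema_definition` comes from the diff.

```python
from typing import Any, Dict, Optional


def build_create_dataset_payload(
    name: str,
    *,
    description: Optional[str] = None,
    data_type: str = "kv",
    inputs_schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the create-dataset request body as a plain dict (hypothetical helper)."""
    payload: Dict[str, Any] = {"name": name, "data_type": data_type}
    if description is not None:
        payload["description"] = description
    if inputs_schema is not None:
        # Assumed mapping: short name in the SDK, API field name on the wire.
        payload["inputs_schema_definition"] = inputs_schema
    return payload
```

With a plain dict, the rename between SDK argument and API field is one explicit line, so no alias configuration is needed on the request side.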

@@ -163,6 +158,12 @@ def __init__(
        **kwargs: Any,
    ) -> None:
        """Initialize a Dataset object."""
        if "inputs_schema_definition" in kwargs:
Contributor Author

this is disgusting

Collaborator (@hinthornw, Aug 15, 2024)

Would this actually be applied? Can we just not support?

Contributor Author

wdym? yeah, there's a new integration test showing this works

Collaborator

So this is for the dataset copying case ya?

I think it's fine. I also wouldn't resist if we just marked it as "inputs_schema_definition" itself here
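The `__init__` check under discussion can be sketched roughly as follows. This is an illustration under assumptions, not the merged implementation: `DatasetSketch` is a stand-in for the SDK's `Dataset` model, and the remap target `inputs_schema` is inferred from the `read_dataset.inputs_schema` attribute used in the PR's integration test.

```python
from typing import Any, Dict, Optional


class DatasetSketch:
    """Stand-in for the SDK's Dataset model (not the real class)."""

    def __init__(self, **kwargs: Any) -> None:
        # The API returns "inputs_schema_definition"; expose it under the
        # shorter attribute name (assumed) so callers read .inputs_schema.
        if "inputs_schema_definition" in kwargs:
            kwargs["inputs_schema"] = kwargs.pop("inputs_schema_definition")
        self.name: str = kwargs["name"]
        self.inputs_schema: Optional[Dict[str, Any]] = kwargs.get("inputs_schema")


# Constructing from an API-shaped response and from SDK-shaped kwargs.
api_response = {"name": "demo", "inputs_schema_definition": {"type": "object"}}
dataset = DatasetSketch(**api_response)
copied = DatasetSketch(name=dataset.name, inputs_schema=dataset.inputs_schema)
```

Accepting both spellings is what makes the copy case work: an object built from an API response and one built from another object's attributes end up with the same `inputs_schema`.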


# assert read API includes the schema definition
read_dataset = langchain_client.read_dataset(dataset_id=dataset.id)
assert read_dataset.inputs_schema == InputSchema.model_json_schema()
Contributor Author

@hinthornw here's the integration test for reading the input schema back out

@jakerachleff merged commit e485d4a into main on Aug 15, 2024
8 checks passed
@jakerachleff deleted the 2024-08-13-input-validation branch on August 15, 2024 17:54
2 participants