feat: truncate parsed uploads to prevent database and frontend blocking caused by excessively large files #3914

Merged
merged 53 commits into from
Sep 27, 2024
1026100
📝 (constants.ts): increase maxSizeFilesInBytes constant value from 10…
Cristhianzl Sep 23, 2024
6067d73
🐛 (inputFileComponent): fix bug in setting the maximum file size aler…
Cristhianzl Sep 23, 2024
f5583a6
📝 (schemas.py): Add a new field_serializer method to serialize data i…
Cristhianzl Sep 23, 2024
476bb70
🐛 (schemas.py): fix truncation length of text fields to 10 characters…
Cristhianzl Sep 24, 2024
74d0dfa
🔧 (switchOutputView/index.tsx): Use useMemo to memoize resultMessage …
Cristhianzl Sep 24, 2024
7419421
🐛 (model.py): Fix typo in the path for 'base_retriever' data field
Cristhianzl Sep 24, 2024
81a6421
📝 (model.py): refactor truncate_text function to truncate_long_string…
Cristhianzl Sep 25, 2024
11815cb
📝 (schemas.py): refactor serialize_data method in VertexBuildResponse…
Cristhianzl Sep 25, 2024
9c22234
Merge branch 'main' into cz/limitCsvView
Cristhianzl Sep 25, 2024
f2494e1
🔧 (schemas.py): Move the `truncate_long_strings` function to a separa…
Cristhianzl Sep 25, 2024
e8343b4
📝 (util.py): add function truncate_long_strings to recursively trunca…
Cristhianzl Sep 25, 2024
02e06ba
📝 (constants.py): add constant MAX_TEXT_LENGTH with value 99999 for d…
Cristhianzl Sep 25, 2024
5a624cf
📝 (model.py): update import path for truncate_long_strings function t…
Cristhianzl Sep 25, 2024
108eba2
✨ (test_truncate_long_strings_on_objects.py): Add unit tests for the …
Cristhianzl Sep 25, 2024
fbc3273
Merge branch 'main' into cz/limitCsvView
Cristhianzl Sep 25, 2024
89576fd
[autofix.ci] apply automated fixes
autofix-ci[bot] Sep 25, 2024
57499d0
✨ (test_truncate_long_strings_on_objects.py): Update import path for …
Cristhianzl Sep 25, 2024
b8c42ad
📝 (schemas.py): Remove unused import and variable to clean up code
Cristhianzl Sep 25, 2024
673755c
♻️ (schemas.py): refactor import statement to use the updated module …
Cristhianzl Sep 25, 2024
52c35be
📝 (model.py): Update import path for util_strings module to fix modul…
Cristhianzl Sep 25, 2024
654877a
📝 (schemas.py): refactor serialize_data method to handle both BaseMod…
Cristhianzl Sep 25, 2024
5885692
Merge branch 'main' into cz/limitCsvView
Cristhianzl Sep 25, 2024
7e37170
📝 (util_strings.py): Update util_strings.py to improve string truncat…
Cristhianzl Sep 25, 2024
aafad0a
Merge branch 'cz/limitCsvView' of github.com:langflow-ai/langflow int…
Cristhianzl Sep 25, 2024
c6d3bae
Update src/backend/base/langflow/utils/util_strings.py
Cristhianzl Sep 27, 2024
8063c29
📝 (vite.config.mts): update environment variable MAX_FILE_SIZE to be …
Cristhianzl Sep 27, 2024
cbc733b
📝 (constants.ts): update maxSizeFilesInBytes constant to use process.…
Cristhianzl Sep 27, 2024
db3b89b
📝 (switchOutputView/index.tsx): import MAX_TEXT_LENGTH constant from …
Cristhianzl Sep 27, 2024
a3c384f
Merge branch 'cz/limitCsvView' of github.com:langflow-ai/langflow int…
Cristhianzl Sep 27, 2024
f1209c0
✨ (langflow/__main__.py): add support for defining maximum file size …
Cristhianzl Sep 27, 2024
b15184b
🐛 (files.py): add validation to check if uploaded file size exceeds t…
Cristhianzl Sep 27, 2024
8a18205
✨ (schemas.py): add max_file_size_upload field to ConfigResponse sche…
Cristhianzl Sep 27, 2024
bf508fd
🔧 (vite.config.mts): remove MAX_FILE_SIZE environment variable config…
Cristhianzl Sep 27, 2024
daaa23f
✨ (base.py): introduce max_file_size_upload setting to limit the file…
Cristhianzl Sep 27, 2024
add83ad
🐛 (util.py): add support for setting max_file_size_upload in update_s…
Cristhianzl Sep 27, 2024
2b8687e
📝 (inputFileComponent/index.tsx): add support for retrieving max file…
Cristhianzl Sep 27, 2024
4fbf20a
📝 (constants.ts): remove maxSizeFilesInBytes constant as it is no lon…
Cristhianzl Sep 27, 2024
422109c
✨ (use-get-config.ts): add functionality to set max file size upload …
Cristhianzl Sep 27, 2024
339a0de
✨ (utilityStore.ts): introduce maxFileSizeUpload property and setMaxF…
Cristhianzl Sep 27, 2024
11b0bc4
✨ (frontend): introduce maxFileSizeUpload property and setMaxFileSize…
Cristhianzl Sep 27, 2024
f7f4a30
Merge branch 'main' into cz/limitCsvView
Cristhianzl Sep 27, 2024
71815d3
♻️ (util_strings.py): refactor truncate_long_strings function to impr…
Cristhianzl Sep 27, 2024
a4f81e1
🐛 (files.py): fix formatting issue in the raise statement to improve …
Cristhianzl Sep 27, 2024
6a24382
Merge branch 'cz/limitCsvView' of https://github.com/langflow-ai/lang…
Cristhianzl Sep 27, 2024
52cc51d
🐛 (files.py): fix file size comparison to correctly check if file siz…
Cristhianzl Sep 27, 2024
b3d69cf
📝 (chatView/index.tsx): import useUtilityStore to access maxFileSizeU…
Cristhianzl Sep 27, 2024
87f5010
📝 (chatInput/index.tsx): import and use maxFileSizeUpload from utilit…
Cristhianzl Sep 27, 2024
f459b4b
📝 (FileInput/index.tsx): Add support for displaying alerts and handli…
Cristhianzl Sep 27, 2024
390ab78
✨ (limit-file-size-upload.spec.ts): Add test to ensure user cannot up…
Cristhianzl Sep 27, 2024
82355a7
Merge branch 'main' into cz/limitCsvView
Cristhianzl Sep 27, 2024
ce04093
📝 (limit-file-size-upload.spec.ts): update file path to fix file not …
Cristhianzl Sep 27, 2024
bc515c4
Merge branch 'cz/limitCsvView' of https://github.com/langflow-ai/lang…
Cristhianzl Sep 27, 2024
a85e96b
Merge branch 'main' into cz/limitCsvView
Cristhianzl Sep 27, 2024
6 changes: 6 additions & 0 deletions src/backend/base/langflow/__main__.py
@@ -138,6 +138,11 @@ def run(
help="Defines the number of retries for the health check.",
envvar="LANGFLOW_HEALTH_CHECK_MAX_RETRIES",
),
max_file_size_upload: int = typer.Option(
100,
help="Defines the maximum file size for the upload in MB.",
envvar="LANGFLOW_MAX_FILE_SIZE_UPLOAD",
),
):
"""
Run Langflow.
@@ -158,6 +163,7 @@ def run(
auto_saving=auto_saving,
auto_saving_interval=auto_saving_interval,
health_check_max_retries=health_check_max_retries,
max_file_size_upload=max_file_size_upload,
)
# create path object if path is provided
static_files_dir: Path | None = Path(path) if path else None
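The new option reads `LANGFLOW_MAX_FILE_SIZE_UPLOAD` and falls back to 100 (MB). As a rough illustration of that precedence (explicit CLI value, then environment variable, then default), here is a stdlib-only sketch. `resolve_max_file_size_mb` is a hypothetical helper written for this note, not part of Langflow or Typer, and treating an empty variable as unset is an assumption.

```python
import os

DEFAULT_MAX_FILE_SIZE_MB = 100  # default value taken from the diff above


def resolve_max_file_size_mb(cli_value=None, environ=None):
    """Hypothetical helper: explicit CLI value, then env var, then default."""
    env = os.environ if environ is None else environ
    if cli_value is not None:
        return int(cli_value)
    raw = env.get("LANGFLOW_MAX_FILE_SIZE_UPLOAD")
    # Assumption: an unset or empty variable means "use the default".
    if raw is None or raw.strip() == "":
        return DEFAULT_MAX_FILE_SIZE_MB
    return int(raw)
```

Passing `environ` explicitly keeps the sketch easy to exercise without touching the real process environment.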
6 changes: 6 additions & 0 deletions src/backend/base/langflow/api/v1/files.py
@@ -43,6 +43,12 @@ async def upload_file(
storage_service: StorageService = Depends(get_storage_service),
):
try:
max_file_size_upload = get_storage_service().settings_service.settings.max_file_size_upload
if file.size > max_file_size_upload * 1024 * 1024:
raise HTTPException(
status_code=413, detail=f"File size is larger than the maximum file size {max_file_size_upload}MB."
)

flow_id_str = str(flow_id)
file_content = await file.read()
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
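The check above compares a size in bytes against a setting expressed in MB. Starlette types `UploadFile.size` as optional, so the comparison could raise a `TypeError` if the size is ever `None`. A minimal sketch of the same comparison as a pure function follows; `exceeds_upload_limit` and `BYTES_PER_MB` are hypothetical names, and letting unknown sizes through is an assumption, not what the PR does.

```python
BYTES_PER_MB = 1024 * 1024


def exceeds_upload_limit(size_bytes, max_file_size_mb):
    """Return True when a known size is above the limit given in MB."""
    if size_bytes is None:
        # Assumption: an unreported size is not rejected here; a caller
        # could instead measure the content after reading it.
        return False
    return size_bytes > max_file_size_mb * BYTES_PER_MB
```

A file of exactly `max_file_size_mb` MB passes, matching the strict `>` used in the diff.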
8 changes: 8 additions & 0 deletions src/backend/base/langflow/api/v1/schemas.py
@@ -16,6 +16,7 @@
from langflow.services.database.models.flow import FlowCreate, FlowRead
from langflow.services.database.models.user import UserRead
from langflow.services.tracing.schema import Log
from langflow.utils.util_strings import truncate_long_strings


class BuildStatus(Enum):
@@ -281,6 +282,12 @@ class VertexBuildResponse(BaseModel):
timestamp: datetime | None = Field(default_factory=lambda: datetime.now(timezone.utc))
"""Timestamp of the build."""

@field_serializer("data")
def serialize_data(self, data: ResultDataResponse) -> dict:
data_dict = data.model_dump() if isinstance(data, BaseModel) else data
truncated_data = truncate_long_strings(data_dict)
return truncated_data


class VerticesBuiltResponse(BaseModel):
vertices: list[VertexBuildResponse]
@@ -341,3 +348,4 @@ class ConfigResponse(BaseModel):
auto_saving: bool
auto_saving_interval: int
health_check_max_retries: int
max_file_size_upload: int
Original file line number Diff line number Diff line change
@@ -2,12 +2,14 @@
from typing import TYPE_CHECKING
from uuid import UUID, uuid4

-from pydantic import field_validator
+from pydantic import field_serializer, field_validator
from sqlmodel import JSON, Column, Field, Relationship, SQLModel

if TYPE_CHECKING:
from langflow.services.database.models.flow.model import Flow

from langflow.utils.util_strings import truncate_long_strings


class TransactionBase(SQLModel):
timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
@@ -32,6 +34,11 @@ def validate_flow_id(cls, value):
value = UUID(value)
return value

@field_serializer("outputs")
def serialize_outputs(self, data) -> dict:
truncated_data = truncate_long_strings(data)
return truncated_data


class TransactionTable(TransactionBase, table=True): # type: ignore
__tablename__ = "transaction"
Original file line number Diff line number Diff line change
@@ -8,6 +8,8 @@
if TYPE_CHECKING:
from langflow.services.database.models.flow.model import Flow

from langflow.utils.util_strings import truncate_long_strings


class VertexBuildBase(SQLModel):
timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
@@ -38,6 +40,16 @@ def serialize_timestamp(cls, value):
value = value.replace(tzinfo=timezone.utc)
return value

@field_serializer("data")
def serialize_data(self, data: dict) -> dict:
truncated_data = truncate_long_strings(data)
return truncated_data

@field_serializer("artifacts")
def serialize_artifacts(self, data) -> dict:
truncated_data = truncate_long_strings(data)
return truncated_data


class VertexBuildTable(VertexBuildBase, table=True): # type: ignore
__tablename__ = "vertex_build"
2 changes: 2 additions & 0 deletions src/backend/base/langflow/services/settings/base.py
@@ -153,6 +153,8 @@ class Settings(BaseSettings):
"""The interval in ms at which Langflow will auto save flows."""
health_check_max_retries: int = 5
"""The maximum number of retries for the health check."""
max_file_size_upload: int = 100
"""The maximum file size for the upload in MB."""

@field_validator("dev")
@classmethod
2 changes: 2 additions & 0 deletions src/backend/base/langflow/utils/constants.py
@@ -183,3 +183,5 @@ def python_function(text: str) -> str:
MESSAGE_SENDER_USER = "User"
MESSAGE_SENDER_NAME_AI = "AI"
MESSAGE_SENDER_NAME_USER = "User"

MAX_TEXT_LENGTH = 99999
4 changes: 4 additions & 0 deletions src/backend/base/langflow/utils/util.py
@@ -431,6 +431,7 @@ def update_settings(
auto_saving: bool = True,
auto_saving_interval: int = 1000,
health_check_max_retries: int = 5,
max_file_size_upload: int = 100,
):
"""Update the settings from a config file."""
from langflow.services.utils import initialize_settings_service
@@ -463,6 +464,9 @@ def update_settings(
if health_check_max_retries is not None:
logger.debug(f"Setting health_check_max_retries to {health_check_max_retries}")
settings_service.settings.update_settings(health_check_max_retries=health_check_max_retries)
if max_file_size_upload is not None:
logger.debug(f"Setting max_file_size_upload to {max_file_size_upload}")
settings_service.settings.update_settings(max_file_size_upload=max_file_size_upload)


def is_class_method(func, cls):
28 changes: 28 additions & 0 deletions src/backend/base/langflow/utils/util_strings.py
@@ -0,0 +1,28 @@
from langflow.utils import constants


def truncate_long_strings(data, max_length=None):
"""
Recursively traverse the dictionary or list and truncate strings longer than max_length.
"""

if max_length is None:
max_length = constants.MAX_TEXT_LENGTH

if max_length < 0 or not isinstance(data, dict | list):
return data

if isinstance(data, dict):
for key, value in data.items():
if isinstance(value, str) and len(value) > max_length:
data[key] = value[:max_length] + "..."
elif isinstance(value, (dict | list)):
truncate_long_strings(value, max_length)
elif isinstance(data, list):
for index, item in enumerate(data):
if isinstance(item, str) and len(item) > max_length:
data[index] = item[:max_length] + "..."
elif isinstance(item, (dict | list)):
truncate_long_strings(item, max_length)

return data
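Note that `truncate_long_strings` rewrites the dict or list it receives, so a serializer that passes a model's own `data` or `outputs` through it also changes the stored value. For comparison, a non-mutating variant could look like the sketch below. `truncated_copy` and `_copy_value` are hypothetical names, and the local `MAX_TEXT_LENGTH` only mirrors the constant added in this PR; this is an illustration, not the merged implementation.

```python
MAX_TEXT_LENGTH = 99999  # mirrors constants.MAX_TEXT_LENGTH from this PR


def _copy_value(value, max_length):
    """Rebuild nested dicts/lists, truncating any string over max_length."""
    if isinstance(value, str):
        return value[:max_length] + "..." if len(value) > max_length else value
    if isinstance(value, dict):
        return {key: _copy_value(item, max_length) for key, item in value.items()}
    if isinstance(value, list):
        return [_copy_value(item, max_length) for item in value]
    return value


def truncated_copy(data, max_length=MAX_TEXT_LENGTH):
    """Hypothetical variant that leaves the input untouched."""
    # Same early exit as the original: negative limits and non-containers
    # are returned unchanged.
    if max_length < 0 or not isinstance(data, (dict, list)):
        return data
    return _copy_value(data, max_length)
```

The trade-off is extra allocation for large payloads; the in-place version avoids that at the cost of side effects on the caller's data.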
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
from langflow.utils.util_strings import truncate_long_strings
from langflow.utils.constants import MAX_TEXT_LENGTH
import pytest


@pytest.mark.parametrize(
"input_data, max_length, expected",
[
# Test case 1: Simple string truncation
({"key": "a" * 100}, 10, {"key": "a" * 10 + "..."}),
# Test case 2: Nested dictionary
({"outer": {"inner": "b" * 100}}, 5, {"outer": {"inner": "b" * 5 + "..."}}),
# Test case 3: List of strings
(["short", "a" * 100, "also short"], 7, ["short", "a" * 7 + "...", "also sh" + "..."]),
# Test case 4: Mixed nested structure
(
{"key1": ["a" * 100, {"nested": "b" * 100}], "key2": "c" * 100},
8,
{"key1": ["a" * 8 + "...", {"nested": "b" * 8 + "..."}], "key2": "c" * 8 + "..."},
),
# Test case 5: Empty structures
({}, 10, {}),
([], 10, []),
# Test case 6: Strings at exact max_length
({"exact": "a" * 10}, 10, {"exact": "a" * 10}),
# Test case 7: Non-string values
({"num": 12345, "bool": True, "none": None}, 5, {"num": 12345, "bool": True, "none": None}),
# Test case 8: Unicode characters
({"unicode": "こんにちは世界"}, 3, {"unicode": "こんに..."}),
# Test case 9: Very large structure
(
{"key" + str(i): "value" * i for i in range(1000)},
10,
{"key" + str(i): ("value" * i)[:10] + "..." if len("value" * i) > 10 else "value" * i for i in range(1000)},
),
],
)
def test_truncate_long_strings(input_data, max_length, expected):
result = truncate_long_strings(input_data, max_length)
assert result == expected


def test_truncate_long_strings_default_max_length():
long_string = "a" * (MAX_TEXT_LENGTH + 1)
input_data = {"key": long_string}
result = truncate_long_strings(input_data)
assert len(result["key"]) == MAX_TEXT_LENGTH + 3 # +3 for the "..."


def test_truncate_long_strings_no_modification():
input_data = {"short": "short string", "nested": {"also_short": "another short string"}}
result = truncate_long_strings(input_data, 100)
assert result == input_data


# Test for type preservation
def test_truncate_long_strings_type_preservation():
input_data = {"str": "a" * 100, "list": ["b" * 100], "dict": {"nested": "c" * 100}}
result = truncate_long_strings(input_data, 10)
assert isinstance(result, dict)
assert isinstance(result["str"], str)
assert isinstance(result["list"], list)
assert isinstance(result["dict"], dict)


# Test for in-place modification
def test_truncate_long_strings_in_place_modification():
input_data = {"key": "a" * 100}
result = truncate_long_strings(input_data, 10)
assert result is input_data # Check if the same object is returned


# Test for invalid input
def test_truncate_long_strings_invalid_input():
input_string = "not a dict or list"
result = truncate_long_strings(input_string, 10)
assert result == input_string # The function should return the input unchanged


# Updated test for negative max_length
def test_truncate_long_strings_negative_max_length():
input_data = {"key": "value"}
result = truncate_long_strings(input_data, -1)
assert result == input_data # Assuming the function ignores negative max_length


# Additional test for zero max_length
def test_truncate_long_strings_zero_max_length():
input_data = {"key": "value"}
result = truncate_long_strings(input_data, 0)
assert result == {"key": "..."} # Assuming the function truncates to just "..."


# Test for very small positive max_length
def test_truncate_long_strings_small_max_length():
input_data = {"key": "value"}
result = truncate_long_strings(input_data, 1)
assert result == {"key": "v..."} # Assuming the function keeps at least one character
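As the tests above show, a truncated string ends up `max_length + 3` characters long because the `"..."` suffix is appended after the cut. If a hard cap on the final length were ever needed, one possible approach is to reserve room for the suffix. The sketch below uses a hypothetical helper, `truncate_within_budget`, and is an alternative for illustration only, not the behavior this PR implements.

```python
def truncate_within_budget(text, max_length, suffix="..."):
    """Hypothetical helper: the result never exceeds max_length characters."""
    if max_length < 0 or len(text) <= max_length:
        # Mirrors the PR's choice to leave data alone for negative limits.
        return text
    if max_length <= len(suffix):
        # No room for the suffix plus content: hard cut without a suffix.
        return text[:max_length]
    return text[: max_length - len(suffix)] + suffix
```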
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
import { MAX_TEXT_LENGTH } from "@/constants/constants";
import { LogsLogType, OutputLogType } from "@/types/api";
import { useMemo } from "react";
import DataOutputComponent from "../../../../../../components/dataOutputComponent";
import ForwardedIconComponent from "../../../../../../components/genericIconComponent";
import {
@@ -16,6 +18,7 @@ interface SwitchOutputViewProps {
outputName: string;
type: "Outputs" | "Logs";
}

const SwitchOutputView: React.FC<SwitchOutputViewProps> = ({
nodeId,
outputName,
@@ -35,30 +38,59 @@ const SwitchOutputView: React.FC<SwitchOutputViewProps> = ({
if (resultMessage?.raw) {
resultMessage = resultMessage.raw;
}

const resultMessageMemoized = useMemo(() => {
if (
typeof resultMessage === "string" &&
resultMessage.length > MAX_TEXT_LENGTH
) {
resultMessage = `${resultMessage.substring(0, MAX_TEXT_LENGTH)}...`;
}

if (Array.isArray(resultMessage)) {
resultMessage = resultMessage.map((item) => {
if (item && typeof item.data === "object") {
const truncatedData = Object.fromEntries(
Object.entries(item.data).map(([key, value]) => {
if (typeof value === "string" && value.length > MAX_TEXT_LENGTH) {
return [key, `${value.substring(0, MAX_TEXT_LENGTH)}...`];
}
return [key, value];
}),
);
return { ...item, data: truncatedData };
}
return item;
});
}

return resultMessage;
}, [resultMessage]);

return type === "Outputs" ? (
<>
<Case condition={!resultType || resultType === "unknown"}>
<div>NO OUTPUT</div>
</Case>
<Case condition={resultType === "error" || resultType === "ValueError"}>
<ErrorOutput
-value={`${resultMessage.errorMessage}\n\n${resultMessage.stackTrace}`}
+value={`${resultMessageMemoized.errorMessage}\n\n${resultMessageMemoized.stackTrace}`}
></ErrorOutput>
</Case>

<Case condition={resultType === "text"}>
-<TextOutputView left={false} value={resultMessage} />
+<TextOutputView left={false} value={resultMessageMemoized} />
</Case>

<Case condition={RECORD_TYPES.includes(resultType)}>
<DataOutputComponent
rows={
-Array.isArray(resultMessage)
-? (resultMessage as Array<any>).every((item) => item.data)
-? (resultMessage as Array<any>).map((item) => item.data)
-: resultMessage
-: Object.keys(resultMessage).length > 0
-? [resultMessage]
+Array.isArray(resultMessageMemoized)
+? (resultMessageMemoized as Array<any>).every((item) => item.data)
+? (resultMessageMemoized as Array<any>).map((item) => item.data)
+: resultMessageMemoized
+: Object.keys(resultMessageMemoized).length > 0
+? [resultMessageMemoized]
: []
}
pagination={true}
14 changes: 9 additions & 5 deletions src/frontend/src/components/inputFileComponent/index.tsx
@@ -1,6 +1,6 @@
-import { maxSizeFilesInBytes } from "@/constants/constants";
import { usePostUploadFile } from "@/controllers/API/queries/files/use-post-upload-file";
import { createFileUpload } from "@/helpers/create-file-upload";
import { useUtilityStore } from "@/stores/utilityStore";
import { useEffect } from "react";
import {
CONSOLE_ERROR_MSG,
@@ -23,7 +23,7 @@ export default function InputFileComponent({
}: FileComponentType): JSX.Element {
const currentFlowId = useFlowsManagerStore((state) => state.currentFlowId);
const setErrorData = useAlertStore((state) => state.setErrorData);

const maxFileSizeUpload = useUtilityStore((state) => state.maxFileSizeUpload);
// Clear component state
useEffect(() => {
if (disabled && value !== "") {
@@ -47,9 +47,9 @@ export default function InputFileComponent({
createFileUpload({ multiple: false, accept: fileTypes?.join(",") }).then(
(files) => {
const file = files[0];
-if (file.size > maxSizeFilesInBytes) {
+if (file.size > maxFileSizeUpload) {
setErrorData({
-title: INVALID_FILE_SIZE_ALERT(10),
+title: INVALID_FILE_SIZE_ALERT(maxFileSizeUpload / 1024 / 1024),
});
return;
}
@@ -68,8 +68,12 @@ export default function InputFileComponent({
// sets the value to the user
handleOnNewValue({ value: file.name, file_path });
},
-onError: () => {
+onError: (error) => {
console.error(CONSOLE_ERROR_MSG);
setErrorData({
title: "Error uploading file",
list: [error.response?.data?.detail],
});
},
},
);