
Feature/rationalise file name #1127

Merged: 21 commits merged Oct 29, 2024

gecBurton (Collaborator) commented Oct 24, 2024

Context

The file name is especially significant, and problematic, because it has to be both human-readable and globally unique.

Currently we handle this via:

  • adding a hash key to duplicate file names to keep the file names unique
  • an additional original_file_name field to store the name of the file as the user uploaded it
  • various helper methods, such as unique_name, name

I propose that we don't actually need a file name to be globally unique; we want it to be unique per user, so that:

  • a user can overwrite a file if they upload one with the same name
  • two users can upload a file with the same name and they will not conflict

To achieve this, we should change the file name to user.email/file_name. This will have the convenient side effect of presenting the files in directories named after the user within S3.
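A minimal sketch of the proposed key scheme, assuming the File model has a `user` relation with an `email` attribute. `build_s3_key` is the name used later in the diff, but its body here is an illustration, not the PR's code; `User` and `File` are hypothetical stand-ins so the sketch runs without Django.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical stand-in for the Django user model; only email is used."""
    email: str


@dataclass
class File:
    """Hypothetical stand-in for the File model; assumes a `user` relation."""
    user: User


def build_s3_key(instance: File, filename: str) -> str:
    # Django calls an upload_to callable as upload_to(instance, filename) and
    # hands the returned path to the storage backend, which may still adjust it
    # (e.g. to avoid clobbering an existing key) depending on its settings.
    return f"{instance.user.email}/{filename}"


key = build_s3_key(File(user=User(email="alice@example.com")), "stuff.pdf")
print(key)  # alice@example.com/stuff.pdf
```

Whether a second upload under the same key actually overwrites the first depends on the storage backend's overwrite setting, which this sketch doesn't cover.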

Changes proposed in this pull request

  • original_file_name: now unused, but kept for backwards compatibility
  • name renamed to file_name, which returns the original file name for the benefit of the user
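One way the file_name accessor could behave, sketched under assumptions: legacy rows fall back to the stored original_file_name, and new rows take the last segment of the `user.email/file_name` key. `StoredFile` is a hypothetical stand-in for Django's FieldFile (whose `.name` holds the storage key), and the legacy key shown is invented for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class StoredFile:
    """Hypothetical stand-in for Django's FieldFile; .name is the storage key."""
    name: str


@dataclass
class File:
    original_file: StoredFile
    original_file_name: Optional[str] = None  # legacy field, no longer populated

    @property
    def file_name(self) -> str:
        # Legacy rows: trust the name recorded at upload time.
        if self.original_file_name:
            return self.original_file_name
        # New rows: the key is "<email>/<file name>", so keep the final segment.
        return PurePosixPath(self.original_file.name).name


print(File(StoredFile("me@cabinetoffice/stuff.pdf")).file_name)  # stuff.pdf
# Hypothetical legacy key carrying a de-duplication suffix:
print(File(StoredFile("stuff_3f9a.pdf"), original_file_name="stuff.pdf").file_name)  # stuff.pdf
```

Using `PurePosixPath` keeps the split on `/` regardless of the server's OS, since S3 keys always use forward slashes.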

Guidance to review

Do we actually want this behaviour?

Relevant links

Things to check

  • I have added any new ENV vars in all deployed environments
  • I have tested any code added or changed
  • I have run integration tests

gecBurton marked this pull request as draft October 24, 2024 13:05
gecBurton force-pushed the feature/rationalise-file-name branch 2 times, most recently from aa666ee to efae1ac October 25, 2024 15:40
def update_status_from_core(self, status_label):

gecBurton (Collaborator, Author): unused

gecBurton marked this pull request as ready for review October 28, 2024 12:57
except ValueError as e:
    logger.exception("attempt to access non-existent file %s", self.pk, exc_info=e)
def file_name(self) -> str:
    if self.original_file_name:  # delete me?

gecBurton (Collaborator, Author): for backwards compatibility

- original_file = models.FileField(storage=settings.STORAGES["default"]["BACKEND"])
+ original_file = models.FileField(
+     storage=settings.STORAGES["default"]["BACKEND"],
+     upload_to=build_s3_key,

gecBurton (Collaborator, Author): this upload_to is the key change

@@ -64,29 +64,42 @@ def test_document_upload_status(client, alice, file_pdf_path: Path, s3_client):

@pytest.mark.django_db()
def test_upload_view_duplicate_files(alice, bob, client, file_pdf_path: Path, s3_client):
    # delete all alice's files

gecBurton (Collaborator, Author): this test has been changed significantly to capture the new intended behaviour
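The intended behaviour from the PR description (a user overwrites their own same-named file; two users' same-named files don't collide) can be modelled in plain Python. This is a hypothetical simplification, not the project's test: a dict stands in for the S3 bucket, `upload` is an invented helper, and database rows are not modelled.

```python
def upload(bucket: dict[str, bytes], email: str, file_name: str, data: bytes) -> str:
    # Key scheme from the proposal: user.email/file_name
    key = f"{email}/{file_name}"
    bucket[key] = data  # same key replaces the existing object, as in S3
    return key


bucket: dict[str, bytes] = {}
a1 = upload(bucket, "alice@example.com", "report.pdf", b"first draft")
a2 = upload(bucket, "alice@example.com", "report.pdf", b"second draft")
b1 = upload(bucket, "bob@example.com", "report.pdf", b"bob's report")

print(sorted(bucket))  # ['alice@example.com/report.pdf', 'bob@example.com/report.pdf']
print(bucket[a1])  # b'second draft'
```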

jamesrichards4 (Contributor) left a comment:

Just one query about displaying the file name, and whether we want to display the S3-path version in the frontend.

@@ -307,10 +307,10 @@ async def handle_citations(self, citations: list[AICitation]):
for s in c.sources:
    try:
        file = await File.objects.aget(original_file=s.source)
        payload = {"url": str(file.url), "original_file_name": file.original_file_name}

jamesrichards4 (Contributor): Don't we still want to use original_file_name here so we see 'stuff.pdf' in the frontend rather than 'me@cabinetoffice/stuff.pdf'?

gecBurton (Collaborator, Author), Oct 29, 2024:

file_name will strip out me@cabinetoffice/ and just return stuff.pdf.

TBH, I was hoping that we could just delete original_file_name altogether in 30 days (this PR means that it's no longer getting populated).

gecBurton merged commit 89e704f into main Oct 29, 2024 (6 checks passed)
gecBurton deleted the feature/rationalise-file-name branch October 29, 2024 15:47