Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[azure] Broken listdir #1431

Open
marcinfair opened this issue Jul 11, 2024 · 8 comments
Open

[azure] Broken listdir #1431

marcinfair opened this issue Jul 11, 2024 · 8 comments

Comments

@marcinfair
Copy link

Since #1403 the logic of listdir was changed and this is not compatible with standard lib os.listdir. Do you plane do leave it as is or fix this behavior?

Issue occurs since v.1.14.4

thanx

@LeaveMyYard
Copy link

LeaveMyYard commented Jul 15, 2024

Not sure if this is 100% caused by this issue, but looks like that.

I am doing

python manage.py collectstatic --noinput --verbosity 2

and after 1.14.4 it started failing. From logs I see that prior to 1.14.4 almost all files are like this:

Skipping 'js/login.js' (not modified)
Skipping 'js/main.js' (not modified)
Skipping 'js/features/sharing.js' (not modified)
Skipping 'js/features/poll_auth.js' (not modified)

But int 1.14.4 they are:

Copying '/app/bonuses/static/js/login.js'
Copying '/app/bonuses/static/js/main.js'
Copying '/app/bonuses/static/js/features/sharing.js'
Copying '/app/bonuses/static/js/features/poll_auth.js'

So I assume the issue is indeed with listing the direction. The main issue is that for me it results in

ValueError: The file 'admin/img/sorting-icons.svg' could not be found with <bonuses.storage.StaticStorage object at 0x7f5809462530>.

My storage is

class CustomS3Boto3Storage(S3Boto3Storage):
    display_location: str

    def url(self, name: str) -> str:
        return f"{self.display_location}/{name}"
 
 
class StaticStorage(ManifestFilesMixin, CustomS3Boto3Storage):
    bucket_name: str = settings.AWS_STORAGE_BUCKET_NAME
    location: str = f"{settings.AWS_LOCATION}/static"
    display_location: str = (
        f"{settings.STATIC_CDN_CUSTOM_DOMAIN}/{settings.DO_SPACES_BUCKET_NAME}/{settings.DO_SPACES_ENV_PREFIX}/static"
    )

Both in 1.13 and 1.14.3 everything works fine

@jschneier
Copy link
Owner

Hi, thanks for the reports.

The comment in reply to my question: #1403 (comment)

@tonybaloney do you have any thoughts here? @Frodothedwarf can you shed further light on why this change had to be made?

For Azure I rely largely on the community.

@Frodothedwarf
Copy link
Contributor

@LeaveMyYard you seem to be using AWS S3 storage is that correct?
@jschneier I haven't touched AWS S3, but only the Azure part, which shouldn't affect how S3 file storage is handled.

The justifications for why this changes were made are that Azure does not support crawling like Django is trying to do when handling files (the part where it goes into one folder, then subfolders and proceeds until no folders is left. Instead Azure returns all files available, resulting in one API call, which is much more efficient than trying to crawl. However I have seen that Azure supports the listing of files with prefix of the folder, like "folder1/". If this needs to be changed I can make a new pull request solving this.

I made the change due to Djangos "--clear" command was broken.

@marcinfair where do you make use of the os.listdir()?

@LeaveMyYard
Copy link

@Frodothedwarf Digital Ocean Spaces to be precise

@Frodothedwarf
Copy link
Contributor

@Frodothedwarf Digital Ocean Spaces to be precise

I haven't touched that part of django-storages, I only made changes to the Azure part.
@jschneier maybe it has something to do with:

S3
  Pull AWS_SESSION_TOKEN from the environment (https://github.com/jschneier/django-storages/pull/1399)
  Fix newline handling for text mode files (https://github.com/jschneier/django-storages/pull/1381)
  Do not sign URLs when querystring_auth=False e.g public buckets or static files (https://github.com/jschneier/django-storages/pull/1402)
  Cache CloudFront Signers (https://github.com/jschneier/django-storages/pull/1417)

@marcinfair
Copy link
Author

@LeaveMyYard you seem to be using AWS S3 storage is that correct? @jschneier I haven't touched AWS S3, but only the Azure part, which shouldn't affect how S3 file storage is handled.

The justifications for why this changes were made are that Azure does not support crawling like Django is trying to do when handling files (the part where it goes into one folder, then subfolders and proceeds until no folders is left. Instead Azure returns all files available, resulting in one API call, which is much more efficient than trying to crawl. However I have seen that Azure supports the listing of files with prefix of the folder, like "folder1/". If this needs to be changed I can make a new pull request solving this.

I made the change due to Djangos "--clear" command was broken.

@marcinfair where do you make use of the os.listdir()?

I just listing files in azure storage in some of my custom functions. Maybe I wasn't super clear. What I meant, was that previous implementation of AzureStorage.listdir() returned names of files in azure storage folder without prepended full path, which was the same behaviour as os.listdir().

I decided to upgrade to v.1.14.4 and make workaround by implementing old behaviour of listdir with custom method on AzureStorage class.

class CustomAzureStorage(AzureStorage):
    def listdir_custom(self, path=""):
        """
        brings back listdir from below v. 1.14.4
        """
        files = []
        dirs = set()
        for name in self.list_all(path):
            n = name[len(path) :]
            if "/" in n:
                dirs.add(n.split("/", 1)[0])
            else:
                files.append(n)
        return list(dirs), files

@Frodothedwarf
Copy link
Contributor

I guess this could be implmeented correctly, if I understand correctly you want the full path?

@marcinfair
Copy link
Author

Previous implementation of listdir() I guess was fine. Method had been returning tuple[list, list]. And relevant part of it was second element:files: list, which was list of all files in specific 'folder' without prepending path. Only filenames. This implementation was in line with mentioned previously os.listdir() which also returns files relative to current working directory (cwd), so api would be more intuitive and easy to consume by newcomers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants