Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent overwriting the output files #170

Merged
merged 2 commits into from
Feb 16, 2022
Merged

Conversation

aiotter
Copy link
Contributor

@aiotter aiotter commented Feb 14, 2022

Imagine you accidentally create two tags blog and Blog, and let Lume generate _site/tags/blog/index.html and _site/tags/Blog/index.html respectively (this is exactly what I did).
Case-insensitive filesystems regard those two files as the same, allowing Lume to overwrite one with the other.

This PR warns user that there already exist a generated file.

This often happens when you try to write out `tags/Blog` and `tags/blog`
on case-insensitive systems.
@aiotter
Copy link
Contributor Author

aiotter commented Feb 14, 2022

Note that I used the experimental api Deno.flock in order to make it sure that no other asynchronous functions are writing to the file while one function reads the file stat and then writes to the file.

@aiotter
Copy link
Contributor Author

aiotter commented Feb 14, 2022

OMG I didn't realise that Lume does overwrite the current file in _site when in serve mode.
How can we allow overwriting under that situation?

@oscarotero
Copy link
Member

Ovewriting output files is the intended behavior because on serve or watch mode it is needed to ovewrite the changed pages. I had thought about adding a warning on duplicated ouput files in the past but didn't implement it. The way to implement it could be something like this:

const urls = new Set();

for (const page of pages) {
    const path = page.dest.path + page.dest.ext;
    const id = path.toLowerCase();

    if (urls.has(id)) {
        console.log(`Duplicated page ${path}`);
    }
    urls.add(path);
}

@oscarotero
Copy link
Member

Note that Writer class has already a #hashes private map to prevent write unchanged pages in the filesystem https://github.com/lumeland/lume/blob/master/core/writer.ts#L23.
Maybe we could reuse this and add the duplicated page check.

@aiotter
Copy link
Contributor Author

aiotter commented Feb 14, 2022

Pathname of URL is case sensitive. If your file system allows _site/tags/Blog and _site/tags/blog coexist, therefore https://example.com/tags/blog and https://example.com/tags/Blog can be served at the same time.
In my understanding there is no API in Deno to get file system information, so I don't think it's possible to detect this type of duplication before actually saving them.

How about defining a property in Page object which represents the page is already built or not yet?

@aiotter
Copy link
Contributor Author

aiotter commented Feb 14, 2022

Hmm? I wrote something strange. It does not work. Please forget about the suggestion at the end of my last message🤔

@aiotter
Copy link
Contributor Author

aiotter commented Feb 14, 2022

We can pool the file inode number we actually wrote to. This easily enables us to detect the duplication. But this is only possible on Unix-like system.

@oscarotero
Copy link
Member

oscarotero commented Feb 14, 2022

In order to make the site working consistently on any platform, I'd consider duplication detection case insensitive, meaning that /tags/blog/ and /tags/Blogs/ are the same page (even in unix-like system). If you see the code of the example above, the page id is always lowercase.

I also think that this detection should only show a warning, but the output file should be overwritten, so it's up to the user to fix it or not.

@oscarotero
Copy link
Member

FYI, eleventy has the same feature: 11ty/eleventy#562

@aiotter
Copy link
Contributor Author

aiotter commented Feb 14, 2022

I meant it's difficult to support case sensitive output while prohibiting it in case sensitive file systems. It's not a problem if you warn duplication detected in case insensitively for all file systems.

I'll modify the commit. Part of this PR is better to be applied to fix the problem of the output file being mixture of duplicated two input content. This happens because writing to _site is done in race condition without locking the file.

to prevent mixing up the duplicated file contents
@aiotter
Copy link
Contributor Author

aiotter commented Feb 16, 2022

Here it is!
You will see _site/tags/Blog/index.html as a mixture of the html for blog and Blog. This is because Deno.writeTextFile or Deno.writeFile can process the same file at the same time.
This commit will lock the file before truncate to make sure that writing out won't happen concurrently.

@oscarotero oscarotero merged commit 65ae761 into lumeland:master Feb 16, 2022
@oscarotero
Copy link
Member

Nice, thank you!
I'll implement also a warning to show in these cases.

oscarotero added a commit that referenced this pull request Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants