[gatsby-source-filesystem] Video files created several times #4863

Closed
sebastienfi opened this issue Apr 6, 2018 · 17 comments

Comments

@sebastienfi
Contributor

sebastienfi commented Apr 6, 2018

Description

After running gatsby build, I ended up with a 2 GB public folder for a website that has only 100 MB of assets. Looking at the generated files, I found a few hundred MP4 and WEBM files that are duplicates with the same contents.
We have only 1 MP4 and 1 WEBM asset, yet we ended up with hundreds of duplicates.
Reproducing the error on OSX, we found only 2 duplicates, not the hundreds seen on Windows.

image

The duplicates do not all have the same size on disk, although some do.
Looking at the encoding details, I can see that a re-encoding happened, which is why my first idea is to look at Sharp or gatsby-source-filesystem for the cause of this bug. While on OSX all the videos can be played, on Windows only the duplicates with the size of the original video can be read; the other videos are corrupted.

While building the repro example, I noticed that the problem doesn't occur with small video files (500 KB). With a medium-sized file (9 MB), the problem already occurs.

Steps to reproduce

  1. Clone the following repo : https://github.com/sebastienfi/gatsby-using-wordpress-video-duplicates-repro
  2. Run yarn install && yarn run build
  3. See in ./public/static/ that the video file gets duplicated.

If you are using OSX or Linux, you may see fewer duplicates, but they still appear.
image
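To check the result of step 3 without comparing files by hand, a small Node.js script can group the files in ./public/static/ by content hash. This is only a sketch: the function name findDuplicateFiles is made up for illustration, and it only detects byte-identical copies, so truncated or corrupted copies will land in separate groups.

```javascript
const crypto = require('crypto')
const fs = require('fs')
const path = require('path')

// Group the files directly inside `dir` by the SHA-256 of their contents.
// Returns only the groups that contain more than one file name.
// (Hypothetical helper for this issue, not part of Gatsby.)
function findDuplicateFiles(dir) {
  const groups = new Map()
  for (const name of fs.readdirSync(dir)) {
    const filePath = path.join(dir, name)
    if (!fs.statSync(filePath).isFile()) continue
    // Reads each file fully into memory; acceptable for a one-off check.
    const hash = crypto
      .createHash('sha256')
      .update(fs.readFileSync(filePath))
      .digest('hex')
    if (!groups.has(hash)) groups.set(hash, [])
    groups.get(hash).push(name)
  }
  return Array.from(groups.values()).filter(names => names.length > 1)
}

// Example call after a build:
//   console.log(findDuplicateFiles('./public/static'))
```

Each inner array lists file names that share identical contents.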

Expected result

The video file should appear only once in the ./public/static/ folder.

Actual result

The video file is duplicated several times in the ./public/static/ folder.

Environment

  • Gatsby version (npm list gatsby): gatsby@1.9.246
  • gatsby-cli version (gatsby --version): 1.9.244
  • Node.js version: v8.4.0
  • Operating System: Windows 10 64 Bits & OSX Latest

File contents (if changed):

gatsby-config.js: added gatsby-source-filesystem to copy the downloaded media assets to the static folder.

module.exports = {
  siteMetadata: {
    title: `A sample site using gatsby-source-wordpress without ACF`,
    subtitle: `Data fetched from a site hosted on pantheonsite.io`,
  },
  plugins: [
    /*
     * Gatsby's data processing layer begins with “source”
     * plugins. Here the site sources its data from Wordpress.
     */
    {
      resolve: `gatsby-source-wordpress`,
      options: {
        /*
        * The base URL of the Wordpress site without the trailing slash and the protocol. This is required.
        * Example : 'gatsbyjswpexample.wordpress.com' or 'www.example-site.com'
        */
        baseUrl: `dev-repro-wp-sebastienfi.pantheonsite.io`,
        // The protocol. This can be http or https.
        protocol: `http`,
        // Indicates whether the site is hosted on wordpress.com.
        // If false, then the assumption is made that the site is self hosted.
        // If true, then the plugin will source its content on wordpress.com using the JSON REST API V2.
        // If your site is hosted on wordpress.org, then set this to false.
        hostingWPCOM: false,
        // If useACF is true, then the source plugin will try to import the Wordpress ACF Plugin contents.
        // This feature is untested for sites hosted on Wordpress.com
        useACF: true,
      },
    },
    {
      resolve: `gatsby-source-filesystem`,
      options: {
        name: `images`,
        path: `${__dirname}/.cache/gatsby-source-filesystem`
      }
    },
    `gatsby-transformer-sharp`,
    `gatsby-plugin-sharp`,
    `gatsby-plugin-glamor`,
    {
      resolve: `gatsby-plugin-typography`,
      options: {
        pathToConfigModule: `src/utils/typography.js`,
      },
    },
  ],
}

package.json: not changed from examples/using-wordpress

{
  "name": "gatsby-example-using-wordpress",
  "private": true,
  "description": "Gatsby example site using the Wordpress source plugin",
  "version": "1.0.0-beta.6",
  "author": "Sebastien Fichot <fichot.sebastien@gmail.com>",
  "dependencies": {
    "gatsby": "^1.9.45",
    "gatsby-image": "^1.0.4",
    "gatsby-link": "^1.6.21",
    "gatsby-plugin-glamor": "^1.6.7",
    "gatsby-plugin-react-helmet": "^2.0.1",
    "gatsby-plugin-sharp": "^1.6.7",
    "gatsby-plugin-styled-components": "^2.0.2",
    "gatsby-plugin-typography": "^1.7.9",
    "gatsby-source-wordpress": "^2.0.0",
    "gatsby-transformer-sharp": "^1.6.5",
    "lodash": "^4.16.4",
    "react-helmet": "^5.2.0",
    "react-icons": "^2.2.5",
    "typography-theme-wordpress-2013": "^0.15.10"
  },
  "keywords": [
    "gatsby"
  ],
  "license": "MIT",
  "main": "n/a",
  "scripts": {
    "dev": "gatsby develop",
    "lint": "./node_modules/.bin/eslint --ext .js,.jsx --ignore-pattern public .",
    "test": "echo \"Error: no test specified\" && exit 1",
    "develop": "gatsby develop",
    "build": "gatsby build",
    "start": "gatsby serve",
    "predeploy": "gatsby build --prefix-paths"
  },
  "devDependencies": {
    "eslint": "^4.1.1"
  }
}

gatsby-node.js: not changed from examples/using-wordpress

const _ = require(`lodash`)
const Promise = require(`bluebird`)
const path = require(`path`)
const slash = require(`slash`)

// Implement the Gatsby API “createPages”. This is
// called after the Gatsby bootstrap is finished so you have
// access to any information necessary to programmatically
// create pages.
// Will create pages for Wordpress pages (route : /{slug})
// Will create pages for Wordpress posts (route : /post/{slug})
exports.createPages = ({ graphql, boundActionCreators }) => {
  const { createPage } = boundActionCreators
  return new Promise((resolve, reject) => {
    // The “graphql” function allows us to run arbitrary
    // queries against the local Wordpress graphql schema. Think of
    // it like the site has a built-in database constructed
    // from the fetched data that you can run queries against.

    // ==== PAGES (WORDPRESS NATIVE) ====
    graphql(
      `
        {
          allWordpressPage {
            edges {
              node {
                id
                slug
                status
                template
              }
            }
          }
        }
      `
    )
      .then(result => {
        if (result.errors) {
          console.log(result.errors)
          reject(result.errors)
        }

        // Create Page pages.
        const pageTemplate = path.resolve(`./src/templates/page.js`)
        // We want to create a detailed page for each
        // page node. We'll just use the Wordpress Slug for the slug.
        // The Page ID is prefixed with 'PAGE_'
        _.each(result.data.allWordpressPage.edges, edge => {
          // Gatsby uses Redux to manage its internal state.
          // Plugins and sites can use functions like "createPage"
          // to interact with Gatsby.
          createPage({
            // Each page is required to have a `path` as well
            // as a template component. The `context` is
            // optional but is often necessary so the template
            // can query data specific to each page.
            path: `/${edge.node.slug}/`,
            component: slash(pageTemplate),
            context: {
              id: edge.node.id,
            },
          })
        })
      })
      // ==== END PAGES ====

      // ==== POSTS (WORDPRESS NATIVE AND ACF) ====
      .then(() => {
        graphql(
          `
            {
              allWordpressPost {
                edges {
                  node {
                    id
                    slug
                    status
                    template
                    format
                  }
                }
              }
            }
          `
        ).then(result => {
          if (result.errors) {
            console.log(result.errors)
            reject(result.errors)
          }
          const postTemplate = path.resolve(`./src/templates/post.js`)
          // We want to create a detailed page for each
          // post node. We'll just use the Wordpress Slug for the slug.
          // The Post ID is prefixed with 'POST_'
          _.each(result.data.allWordpressPost.edges, edge => {
            createPage({
              path: edge.node.slug,
              component: slash(postTemplate),
              context: {
                id: edge.node.id,
              },
            })
          })
          resolve()
        })
      })
    // ==== END POSTS ====
  })
}

gatsby-browser.js: not changed
gatsby-ssr.js: not changed

@KyleAMathews
Contributor

Weird bug... how are those files getting to the public directory? Nothing copies them there by default.

@sebastienfi
Contributor Author

sebastienfi commented Apr 6, 2018

@KyleAMathews gatsby-source-wordpress copies them to .cache/gatsby-source-filesystem/ (createRemoteFileNode), then gatsby-source-filesystem copies them from .cache/ to ./public/static/

Worth mentioning that the duplicated files are also present in .cache/gatsby-source-filesystem/

@KyleAMathews
Contributor

My point is that gatsby-source-filesystem doesn't copy anything unless you query the files. What's your query?

@sebastienfi
Contributor Author

sebastienfi commented Apr 6, 2018

query currentPageQuery($id: String!) {
    wordpressPage(id: { eq: $id }) {
      title
      content
      date(formatString: "MMMM DD, YYYY")
      acf {
        video {
          source_url
          localFile {
            publicURL
          }
        }
      }
    }  
  }
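
As a rough illustration of how a queried publicURL could yield several files in ./public/static/: assume the public file name embeds a digest of the file node, and that the digest depends on file metadata such as size and modification time. Both assumptions, and the names metadataDigest and publicFileName, are hypothetical and are not taken from gatsby-source-filesystem's code. Under those assumptions, every re-download or partial write of the same video gets a new name.

```javascript
const crypto = require('crypto')
const path = require('path')

// Hypothetical digest over file metadata (an assumption for illustration,
// NOT Gatsby's actual implementation).
function metadataDigest(file) {
  return crypto
    .createHash('md5')
    .update(
      JSON.stringify({
        absolutePath: file.absolutePath,
        size: file.size,
        mtimeMs: file.mtimeMs,
      })
    )
    .digest('hex')
}

// Hypothetical public file name of the form <name>-<digest><ext>.
function publicFileName(file) {
  const { name, ext } = path.parse(file.absolutePath)
  return `${name}-${metadataDigest(file)}${ext}`
}

// Same path, but the second record was observed mid-write (smaller size).
const complete = { absolutePath: '/cache/video.mp4', size: 9000000, mtimeMs: 1000 }
const partial = { absolutePath: '/cache/video.mp4', size: 4000000, mtimeMs: 900 }
console.log(publicFileName(complete) === publicFileName(partial)) // false
```

With a scheme like this, one source video observed in several states would be copied under several names.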

@KyleAMathews
Contributor

You want to search for the code handling this and debug what's happening?

@sebastienfi
Contributor Author

sebastienfi commented Apr 6, 2018

I have too many other things on my hands to do that just yet; it'll be days before I can get to it.
I just wanted to report the issue so that someone could get ahead of it.
I may have time over the weekend though. If you have an idea of where to debug more precisely, enlighten us 🗡

@KyleAMathews
Contributor

Here you go!

@pieh
Contributor

pieh commented Apr 6, 2018

I can reproduce it locally. It seems the server doesn't handle ETag/304, so we re-download the file every time. I would assume this changes the contentDigest of the file, which is used when creating the publicURL link. Not yet sure why files sometimes become corrupted after copying.
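
For context, the ETag/304 handling mentioned above can be sketched as two small functions: one builds the If-None-Match header from a cached ETag, the other decides what to do with the response status. The cacheEntry shape and both function names are assumptions for illustration, not the plugin's real code.

```javascript
// cacheEntry is a hypothetical record such as
// { etag: '"abc123"', filePath: '/cache/video.mp4' }, or undefined.
function buildRequestHeaders(cacheEntry) {
  // Only send a conditional request when a previous ETag is known.
  return cacheEntry && cacheEntry.etag
    ? { 'If-None-Match': cacheEntry.etag }
    : {}
}

// 304 Not Modified: keep the cached file. 200 OK: download the body again.
function decideAction(statusCode, cacheEntry) {
  if (statusCode === 304 && cacheEntry) return 'reuse'
  if (statusCode === 200) return 'download'
  throw new Error(`Unexpected status code: ${statusCode}`)
}

const cached = { etag: '"abc123"', filePath: '/cache/video.mp4' }
console.log(buildRequestHeaders(cached)) // { 'If-None-Match': '"abc123"' }
console.log(decideAction(304, cached)) // reuse
console.log(decideAction(200, cached)) // download
```

A server that ignores If-None-Match always answers 200, so a client following this logic downloads the file on every run, which matches the behaviour described here.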

@pieh
Contributor

pieh commented Apr 6, 2018

@sebastienfi if you have time, you can check #4872, which partially helps with this issue (creating multiple copies in public/static).

As for the corrupted files: it seems that await fs.move() (at least on Windows) doesn't really work. An immediate fs.stat after it shows a different size than one taken a few seconds later, so I would assume we start copying a file that isn't fully written and end up with corrupted videos... not sure how to handle that ;/

@sebastienfi
Contributor Author

sebastienfi commented Apr 6, 2018

@pieh same diagnosis here.
I tried updating to the latest chokidar, and on Windows it makes the problem less severe (same number of duplicates as on OSX). But the bug is still there after the update.
The fs update event is triggered before the file has finished being written, so gatsby-source-filesystem imports a file that happens to be partial.
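
One possible workaround, sketched here under assumptions and not taken from any Gatsby fix, is to poll the file size and only treat the file as complete once the size has stopped changing. The function name waitForStableSize and its option names are hypothetical.

```javascript
const fs = require('fs')
const { promisify } = require('util')

const stat = promisify(fs.stat)
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Resolve with the file size once it has matched the previous reading
// `stableReads` times in a row; reject after `maxPolls` polls.
// (Hypothetical helper, not part of gatsby-source-filesystem.)
async function waitForStableSize(
  filePath,
  { pollMs = 100, stableReads = 3, maxPolls = 100 } = {}
) {
  let lastSize = -1
  let unchanged = 0
  for (let i = 0; i < maxPolls; i++) {
    const { size } = await stat(filePath)
    unchanged = size === lastSize ? unchanged + 1 : 0
    lastSize = size
    if (unchanged >= stableReads) return size
    await sleep(pollMs)
  }
  throw new Error(`${filePath} did not reach a stable size`)
}

// Example call before copying a downloaded file:
//   const size = await waitForStableSize('/cache/video.mp4')
```

A stable size is only a heuristic, since a stalled write looks the same as a finished one. chokidar's awaitWriteFinish option applies a similar size-polling idea at the watcher level.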

@sebastienfi
Contributor Author

I think #4872 is included in gatsby-source-wordpress@2.0.74 just released by Kyle. Checking it now

@sebastienfi
Contributor Author

#4872 effectively impacts the bug in 2 ways:

  1. on the first build it somehow reduces the number of duplicates, but there are still duplicates.
  2. on the second build, the file is already downloaded so the "duplicates" bug doesn't happen.

@pieh
Contributor

pieh commented Apr 6, 2018

#4872 won't change anything for a clean/first build. I couldn't reproduce duplicate files on an initial build using the reproduction WordPress site and the code from your description. It also didn't address the corrupted files situation.

@pieh
Contributor

pieh commented Apr 6, 2018

@sebastienfi another PR, this time for corrupted files - #4877

@sebastienfi
Contributor Author

@pieh #4877 has no impact on this bug.

@pieh
Contributor

pieh commented Apr 7, 2018

Do you still get corrupted files in public with gatsby-source-filesystem@1.5.29? After this fix I couldn't reproduce corrupted files anymore, so maybe my reproduction site is just not enough to reproduce this anymore and I need something more complex.

Or do you still have duplicates? Again, I couldn't reproduce that anymore either with the updated gatsby-source-wordpress.

There's not much more I can think of right now if I can't reproduce the problem anymore :/

@sebastienfi
Contributor Author

Yup, solved.
