initialCanonicalUrl is not taking into account basePath from config #59971

Closed
1 task done
zlwaterfield opened this issue Dec 27, 2023 · 9 comments
Labels
bug (Issue was opened via the bug report template.), locked

@zlwaterfield
Contributor

Link to the code that reproduces this issue

https://github.com/zlwaterfield/initial-canonical-url-bug

To Reproduce

  1. Create a Next.js app that uses a basePath (for example '/g/') and the App Router (or look at the example repo).
  2. Create and load a page in the application (/path/testing-page in the example repo).
  3. Inspect the page and find the script containing self.__next_f.push(....
  4. Find the initialCanonicalUrl; it will be missing the base path (path).
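
Steps 3 and 4 can be automated against a saved copy of the page. The following is a minimal sketch, not part of the original report: the helper names are hypothetical, the sample markup is hand-written rather than copied from a real response, and it assumes the value appears as `"initialCanonicalUrl":"<path>"` with the quotes possibly backslash-escaped inside the script. The `/path` base path is inferred from the example URL in step 2.

```typescript
// Hypothetical helper: pull the initialCanonicalUrl value out of server-rendered HTML.
// Assumption: the key is followed directly by a string value, and the quotes
// may be backslash-escaped because the payload sits inside a JS string literal.
function findInitialCanonicalUrl(html: string): string | null {
  const match = html.match(/\\?"initialCanonicalUrl\\?":\\?"([^"\\]*)/);
  return match?.[1] ?? null;
}

// True when the path is the basePath itself or sits underneath it.
function hasBasePath(path: string, basePath: string): boolean {
  return path === basePath || path.startsWith(basePath + "/");
}

// Hand-written sample modelled on the report; not a real Next.js response.
const sampleHtml =
  '<script>self.__next_f.push([1,"{\\"initialCanonicalUrl\\":\\"/testing-page\\"}"])</script>';

const found = findInitialCanonicalUrl(sampleHtml);
// Assumed basePath for the sample: "/path".
console.log(found, found !== null && hasBasePath(found, "/path"));
```

If the second printed value is `false`, the page shows the behavior described in this issue.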

Also see #53274 for more information.

Current vs. Expected behavior

Current: the initialCanonicalUrl is rendered without the basePath.

Expected: the initialCanonicalUrl should have the basePath included in it.

Verify canary release

  • I verified that the issue exists in the latest Next.js canary release

Provide environment information

Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 23.1.0: Mon Oct  9 21:27:24 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6000
Binaries:
  Node: 18.17.0
  npm: 9.6.7
  Yarn: 1.22.19
  pnpm: 8.12.1
Relevant Packages:
  next: 14.0.4
  eslint-config-next: N/A
  react: 18.2.0
  react-dom: 18.2.0
  typescript: N/A
Next.js Config:
  output: N/A

Which area(s) are affected? (Select all that apply)

App Router, Metadata (metadata, generateMetadata, next/head), Script optimization (next/script)

Additional context

This is causing SEO issues because crawlers see the URL and treat it as valid. From what I gather, there is currently no way to set it properly. We are getting 404s from this in Google Search Console.

My only idea for fixing it right now is to rewrite the URL in our Cloudflare worker until a fix ships in Next.js.

@zlwaterfield added the bug label (Issue was opened via the bug report template.) Dec 27, 2023
@omerman

omerman commented Jun 17, 2024

Why is this not addressed? :/ I have tons of 404 URLs in my Search Console because of it 😓

@huozhi
Member

huozhi commented Jun 26, 2024

Google won't pick up the initialCanonicalUrl in the HTML response for SEO; that value is only for internal state. The canonical URL should be configured through the Metadata API via alternates.canonical (https://nextjs.org/docs/app/api-reference/functions/generate-metadata); then Google can pick it up properly.
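
For readers unfamiliar with that option, here is a minimal sketch of a metadata object with `alternates.canonical` set. The site origin, the base path, and the `canonicalFor` helper are hypothetical values chosen for illustration and do not come from this thread. In an App Router page or layout the object would be typed as `Metadata` from "next" and exported as `metadata` (or returned from `generateMetadata`); the type import is left out so the snippet runs on its own.

```typescript
// Hypothetical values for illustration; neither comes from the issue.
const SITE_ORIGIN = "https://example.com";
const BASE_PATH = "/g";

// Build an absolute canonical URL that explicitly includes the basePath.
function canonicalFor(pathname: string): string {
  const normalized = pathname.startsWith("/") ? pathname : `/${pathname}`;
  return `${SITE_ORIGIN}${BASE_PATH}${normalized}`;
}

// In a real page this would be `export const metadata: Metadata = { ... }`.
const metadata = {
  alternates: {
    canonical: canonicalFor("/testing-page"),
  },
};

console.log(metadata.alternates.canonical); // https://example.com/g/testing-page
```

Spelling out the full URL here is a deliberate choice for the sketch: it avoids relying on how relative canonical values are resolved, which this thread does not discuss.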

@huozhi closed this as completed Jun 26, 2024
@c0b41
Contributor

c0b41 commented Jun 26, 2024

@huozhi can you stop closing the issue? Everyone has the same problem: the crawler is picking up everything inside self.__next_f, and I have so many 404 URLs.

For reference: #53274 #40143 #41433

@huozhi
Member

huozhi commented Jun 26, 2024

@c0b41 If the assumption is that the Google crawler reads that content and parses it as the canonical URL, I'd expect a much wider impact. Or it could be Search Console having issues with a specific app. There are only screenshots in #40143, with nothing available to investigate.

@omerman

omerman commented Jun 26, 2024

I would be happy to set up a Google Meet and show you in my own Search Console how thousands of URLs are considered 404s by Google because of initialCanonicalUrl.
Moreover, in another project I had to go over the 40k static pages I have and add a script to modify this variable so that Google won't complain about it 🤷‍♂️

@arun-kambhammettu

This shouldn't be closed. 404s and 308s: Google is picking up initialCanonicalUrl.

@huozhi
Member

huozhi commented Jun 27, 2024

I wonder if it's related to this fix (#67135): when you have a static not-found page that is missing noindex, Google still indexes it even though it should be ignored.

@omerman

omerman commented Jun 28, 2024

@huozhi To be honest, I don't think so. My site is not statically generated, and I see a noindex tag in the 404 pages.
My site's version does not include the #67135 fix yet.
I think it's much simpler than that. I think Google simply inspects the content of the page (just like a plain view-source), recognizes values that match the pattern of links (e.g. they contain slashes), and treats those as "links" coming from the page. That's my theory. I think so because whenever I have a page in a folder like [...paths], I see that the paths variable, which is also embedded in the inline content of the page, is also being treated as links by Google. See #40143.
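
To compare that theory against the 404s listed in Search Console, one could list every quoted, path-like string found inside a page's inline scripts. The sketch below is only an illustration of the idea: the function name, the matching pattern, and the sample markup are assumptions, and nothing here is known to reflect how Google's crawler actually extracts links.

```typescript
// Hypothetical diagnostic: collect quoted strings that look like absolute paths
// from the bodies of inline <script> elements.
function pathLikeStringsInScripts(html: string): string[] {
  const found: string[] = [];
  const scriptRe = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  let script: RegExpExecArray | null;
  while ((script = scriptRe.exec(html)) !== null) {
    const body = script[1] ?? "";
    // A quote, then a path starting with "/", then a closing quote that may be
    // preceded by a backslash (payloads are often embedded in JS string literals).
    const pathRe = /"(\/[A-Za-z0-9_.\/-]*)\\?"/g;
    let hit: RegExpExecArray | null;
    while ((hit = pathRe.exec(body)) !== null) {
      const path = hit[1];
      if (path !== undefined && found.indexOf(path) === -1) found.push(path);
    }
  }
  return found;
}

// Hand-written sample; the real payload format may differ.
const samplePage =
  '<a href="/g/about">About</a>' +
  '<script>self.__next_f.push([1,"{\\"initialCanonicalUrl\\":\\"/testing-page\\"}"])</script>';

console.log(pathLikeStringsInScripts(samplePage));
```

Paths that show up in this list and also in the Search Console 404 report would be consistent with the theory, though they would not prove it.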

Contributor

This closed issue has been automatically locked because it had no new activity for 2 weeks. If you are running into a similar issue, please create a new issue with the steps to reproduce. Thank you.

@github-actions bot locked as resolved and limited conversation to collaborators Jul 13, 2024