feature(gatsby-source-drupal): Use list of UUIDs generated by Drupal to fetch content individually #32131

Closed
wants to merge 3 commits into from
186 changes: 56 additions & 130 deletions packages/gatsby-source-drupal/src/gatsby-node.js
@@ -22,18 +22,25 @@ const agent = {
// http2: new http2wrapper.Agent(),
}

let lastReport = 0
const REPORT_EVERY_N = 1000
async function worker([url, options]) {
return got(url, {
const result = await got(url, {
agent,
cache: false,
// request: http2wrapper.auto,
// http2: true,
...options,
})
const remainingRequests = requestQueue.length()
if (Math.abs(lastReport - remainingRequests) >= REPORT_EVERY_N) {
console.log(`Fetching: ${url} (${remainingRequests} requests remaining)`)
lastReport = remainingRequests
}
return result
}

const requestQueue = require(`fastq`).promise(worker, 20)

const asyncPool = require(`tiny-async-pool`)
const bodyParser = require(`body-parser`)

@@ -77,7 +84,7 @@ exports.sourceNodes = async (
apiBase = `jsonapi`,
basicAuth = {},
filters,
headers,
headers = {},
params = {},
concurrentFileRequests = 20,
concurrentAPIRequests = 20,
@@ -95,12 +102,17 @@ exports.sourceNodes = async (
enabledLanguages: [`und`],
translatableEntities: [],
},
useAuthOn = [],
} = pluginOptions
const { createNode, setPluginStatus, touchNode } = actions

// Update the concurrency limit from the plugin options
requestQueue.concurrency = concurrentAPIRequests

if (typeof basicAuth.username === 'string' && typeof basicAuth.password === 'string') {
Contributor Author:
This works around sindresorhus/got#1169.

headers['Authorization'] = `Basic ${Buffer.from(`${basicAuth.username}:${basicAuth.password}`).toString('base64')}`
}

if (webhookBody && Object.keys(webhookBody).length) {
const changesActivity = reporter.activityTimer(
`loading Drupal content changes`,
@@ -173,8 +185,6 @@ exports.sourceNodes = async (
const res = await requestQueue.push([
urlJoin(baseUrl, `gatsby-fastbuilds/sync/`, lastFetched.toString()),
{
username: basicAuth.username,
password: basicAuth.password,
headers,
searchParams: params,
responseType: `json`,
@@ -268,143 +278,59 @@ exports.sourceNodes = async (

drupalFetchActivity.start()

let allData
try {
const res = await requestQueue.push([
urlJoin(baseUrl, apiBase),
{
username: basicAuth.username,
password: basicAuth.password,
headers,
searchParams: params,
responseType: `json`,
},
])
allData = await Promise.all(
_.map(res.body.links, async (url, type) => {
const dataArray = []
if (disallowedLinkTypes.includes(type)) return
if (!url) return
if (!type) return

// Lookup this type in our list of language alterable entities.
const isTranslatable = languageConfig.translatableEntities.some(
entityType => entityType === type
)

const getNext = async url => {
if (typeof url === `object`) {
// url can be string or object containing href field
url = url.href

// Apply any filters configured in gatsby-config.js. Filters
// can be any valid JSON API filter query string.
// See https://www.drupal.org/docs/8/modules/jsonapi/filtering
if (typeof filters === `object`) {
if (filters.hasOwnProperty(type)) {
url = new URL(url)
const filterParams = new URLSearchParams(filters[type])
const filterKeys = Array.from(filterParams.keys())
filterKeys.forEach(filterKey => {
// Only add filter params to url if it has not already been
// added.
if (!url.searchParams.has(filterKey)) {
url.searchParams.set(filterKey, filterParams.get(filterKey))
}
})
url = url.toString()
}
}
}

let d
try {
d = await requestQueue.push([
url,
{
username: basicAuth.username,
password: basicAuth.password,
headers,
responseType: `json`,
},
])
} catch (error) {
if (error.response && error.response.statusCode == 405) {
// The endpoint doesn't support the GET method, so just skip it.
return
} else {
console.error(`Failed to fetch ${url}`, error.message)
console.log(error)
throw error
}
}
dataArray.push(...d.body.data)
// Add support for includes. Includes allow entity data to be expanded
// based on relationships. The expanded data is exposed as `included`
// in the JSON API response.
// See https://www.drupal.org/docs/8/modules/jsonapi/includes
if (d.body.included) {
Contributor Author:
We're losing "included"
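
One possible way to keep `included` with per-UUID requests, as a minimal sketch. It assumes each entity response is a JSON:API document whose `data` is a single resource object (or an array, for collections) and whose `included` is an optional array. `collectResources` is a hypothetical helper, not something that exists in the plugin.

```javascript
// Hypothetical helper: flatten one JSON:API document into a list of resources.
function collectResources(rawBody) {
  const body = JSON.parse(rawBody)
  const resources = []
  // `data` is a single object for an individual-resource request
  // and an array for a collection request; handle both.
  if (Array.isArray(body.data)) {
    resources.push(...body.data)
  } else if (body.data) {
    resources.push(body.data)
  }
  // Related entities requested via `?include=...` arrive under `included`.
  if (Array.isArray(body.included)) {
    resources.push(...body.included)
  }
  return resources
}
```

If something like this replaced `JSON.parse(response.body).data` in the `.then(...)` callbacks, `allData` would become an array of arrays and would need flattening (for example with `_.flatten`) before the first pass that creates nodes.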

dataArray.push(...d.body.included)
}
if (d.body.links && d.body.links.next) {
await getNext(d.body.links.next)
const listResponse = await requestQueue.push([
urlJoin(baseUrl, 'gatsby/content-list'),
{
headers,
}
])
const listResponseBody = JSON.parse(listResponse.body)
@Auspicus (Contributor Author), Jun 28, 2021:
For whatever reason, passing responseType: 'json' to got and letting it do the JSON.parse results in some very weird artifacts in some environments. It worked 100% of the time on my local builds but doing a build in CI failed.

Contributor:
Almost certainly your CI build isn't resolving the right version of got. You're probably running this as a local plugin, so it uses whatever version of got resolves in your site, and because Gatsby's dependency tree unfortunately contains older versions of got, that isn't predictable at the moment. For testing, it's probably best to publish the module as a temporary npm package so that it can resolve its own dependencies correctly.
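
While the dependency resolution is being sorted out, a defensive parse can make the calling code indifferent to whether got has already parsed the body. This is only a sketch: `parseJsonBody` is a hypothetical helper, and it does not fix the underlying version mismatch.

```javascript
// Hypothetical helper: accept a raw JSON string or Buffer, or a value that
// has already been parsed (as when `responseType: 'json'` is honoured).
function parseJsonBody(body) {
  if (typeof body === `string`) return JSON.parse(body)
  if (Buffer.isBuffer(body)) return JSON.parse(body.toString(`utf8`))
  return body
}
```

Usage would look like `const listResponseBody = parseJsonBody(listResponse.body)`.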


const requestPromises = []
for (let entityTypeAndBundle in listResponseBody) {
if (disallowedLinkTypes.indexOf(entityTypeAndBundle) !== -1) continue
const isTranslatable = languageConfig.translatableEntities.indexOf(entityTypeAndBundle) !== -1
const shouldUseAuth = useAuthOn.indexOf(entityTypeAndBundle) !== -1
const [entityType, entityBundle] = entityTypeAndBundle.split('--')

for (let entityUuid of listResponseBody[entityTypeAndBundle]) {
requestPromises.push(
requestQueue.push([
urlJoin(baseUrl, apiBase, `/${entityType}/${entityBundle}/${entityUuid}`),
{
headers: shouldUseAuth ? headers : undefined,
}
}

if (isTranslatable === false) {
await getNext(url)
} else {
for (let i = 0; i < languageConfig.enabledLanguages.length; i++) {
let currentLanguage = languageConfig.enabledLanguages[i]
const urlPath = url.href.split(`${apiBase}/`).pop()
const baseUrlWithoutTrailingSlash = baseUrl.replace(/\/$/, ``)
// The default language's JSON API is at the root.
if (
currentLanguage === getOptions().languageConfig.defaultLanguage ||
baseUrlWithoutTrailingSlash.slice(-currentLanguage.length) ==
currentLanguage
) {
currentLanguage = ``
}
]).then(response => JSON.parse(response.body).data).catch(() => {})
)

const joinedUrl = urlJoin(
baseUrlWithoutTrailingSlash,
currentLanguage,
apiBase,
urlPath
if (isTranslatable) {
for (let language of languageConfig.enabledLanguages) {
if (language !== languageConfig.defaultLanguage) {
requestPromises.push(
requestQueue.push([
urlJoin(baseUrl, language, apiBase, `/${entityType}/${entityBundle}/${entityUuid}`),
{
headers: shouldUseAuth ? headers : undefined
Contributor Author:
This drops all the other headers when auth isn't wanted on the entity; we need a better way to do this.
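
One option, sketched here under the assumption that only the `Authorization` header needs to be withheld: copy the configured headers and drop just that one. `headersFor` is a hypothetical helper name, not part of the plugin.

```javascript
// Hypothetical helper: return a copy of `headers`, dropping only the
// Authorization header when auth is not wanted for this entity type.
function headersFor(headers, shouldUseAuth) {
  if (shouldUseAuth) return { ...headers }
  const result = {}
  for (const [name, value] of Object.entries(headers)) {
    // HTTP header names are case-insensitive, so compare in lower case.
    if (name.toLowerCase() !== `authorization`) result[name] = value
  }
  return result
}
```

The request options would then use `headers: headersFor(headers, shouldUseAuth)` instead of `shouldUseAuth ? headers : undefined`.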

}
]).then(response => JSON.parse(response.body).data).catch(() => {})
)
const dataForLanguage = await getNext(joinedUrl)

dataArray.push(...dataForLanguage)
}
}

const result = {
type,
data: dataArray,
}

// eslint-disable-next-line consistent-return
return result
})
)
} catch (e) {
gracefullyRethrow(drupalFetchActivity, e)
return
}
}
}

const allData = await Promise.all(requestPromises)

drupalFetchActivity.end()

const nodes = new Map()

// first pass - create basic nodes
_.each(allData, contentType => {
if (!contentType) return
_.each(contentType.data, datum => {
if (!datum) return
const node = nodeFromData(datum, createNodeId, entityReferenceRevisions)
nodes.set(node.id, node)
})
_.each(allData, datum => {
if (!datum) return
const node = nodeFromData(datum, createNodeId, entityReferenceRevisions)
nodes.set(node.id, node)
})

// second pass - handle relationships and back references