Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

682 dataset task list fetch data from all active resources #717

Open
wants to merge 37 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
c3d1ad6
update tasklist to show only relevant issues
GeorgeGoodall Dec 4, 2024
2d19b15
Merge commit 'b91dc88aca0bb6e03d76aa0d67ca70ec3e8ce7e6' into 682-data…
GeorgeGoodall Dec 9, 2024
0159e31
Merge commit '600fc91c6065299ffb567c7ce0e90f4f28ca5f47' into 682-data…
GeorgeGoodall Dec 9, 2024
0a9d508
fix bug for invalid param
GeorgeGoodall Dec 9, 2024
af39dc9
fix tests
GeorgeGoodall Dec 9, 2024
8c89de9
remove comments
GeorgeGoodall Dec 9, 2024
381095a
remove unneeded middleware for getting entity issues
GeorgeGoodall Dec 11, 2024
e79497a
remove and fix tests
GeorgeGoodall Dec 11, 2024
7e7c2cf
add js doc
GeorgeGoodall Dec 12, 2024
0dba092
add tests
GeorgeGoodall Dec 12, 2024
595658d
update test name
GeorgeGoodall Dec 12, 2024
70f35d8
Merge remote-tracking branch 'origin/main' into 682-dataset-task-list…
GeorgeGoodall Dec 12, 2024
b3240e1
Merge branch 'main' into 682-dataset-task-list-fetch-data-from-all-ac…
GeorgeGoodall-GovUk Dec 12, 2024
1aba7d3
update queries to ensure we are getting the correct issues
GeorgeGoodall Dec 12, 2024
19190c7
get entity counts for resources
GeorgeGoodall Dec 12, 2024
b4d4674
fix available datasets
GeorgeGoodall Dec 12, 2024
0ffd784
get entity_count on resource
GeorgeGoodall Dec 12, 2024
82eb4bf
look for entity null too
GeorgeGoodall Dec 12, 2024
99235f0
dont group issues as doing it in the query
GeorgeGoodall Dec 12, 2024
099f009
bug fixes
GeorgeGoodall Dec 13, 2024
e789a72
fix tests
GeorgeGoodall Dec 13, 2024
d74b089
Update src/middleware/common.middleware.js
GeorgeGoodall-GovUk Dec 13, 2024
c017706
Merge branch '682-dataset-task-list-fetch-data-from-all-active-resour…
GeorgeGoodall Dec 13, 2024
33c0938
Update src/middleware/datasetTaskList.middleware.js
GeorgeGoodall-GovUk Dec 13, 2024
87d819a
Update src/middleware/middleware.builders.js
GeorgeGoodall-GovUk Dec 13, 2024
d65bb83
Update src/middleware/middleware.builders.js
GeorgeGoodall-GovUk Dec 13, 2024
16f21b9
improved fetchEntitiesIssueCount
GeorgeGoodall Dec 13, 2024
eacadf1
Update src/middleware/middleware.builders.js
GeorgeGoodall-GovUk Dec 13, 2024
f6ec462
improved error handling
GeorgeGoodall Dec 13, 2024
de308c9
Merge commit 'a0b54a89a0180af991df6123d5eb6aa59a1e22de' into 682-data…
GeorgeGoodall Dec 13, 2024
5b034c4
Update src/middleware/datasetTaskList.middleware.js
GeorgeGoodall-GovUk Dec 13, 2024
5ffa55c
Update src/middleware/datasetTaskList.middleware.js
GeorgeGoodall-GovUk Dec 13, 2024
8d2183b
fix special issue types
GeorgeGoodall Dec 13, 2024
061a38c
fix test
GeorgeGoodall Dec 13, 2024
7de56d1
use + instead of concat in sql query
GeorgeGoodall Dec 17, 2024
81db693
Merge commit 'b261a0b8d7096f5953b6fa045805047cce30068b' into 682-data…
GeorgeGoodall Dec 17, 2024
f84569f
improve prepareEntry param validation
GeorgeGoodall Dec 17, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
188 changes: 118 additions & 70 deletions src/middleware/common.middleware.js
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import logger from '../utils/logger.js'
import { types } from '../utils/logging.js'
import { entryIssueGroups } from '../utils/utils.js'
import performanceDbApi from '../services/performanceDbApi.js'
import { fetchOne, FetchOptions, FetchOneFallbackPolicy, fetchMany, renderTemplate } from './middleware.builders.js'
import * as v from 'valibot'
Expand Down Expand Up @@ -187,7 +188,7 @@ export const createPaginationTemplateParams = (req, res, next) => {

export const fetchResources = fetchMany({
query: ({ req }) => `
SELECT r.end_date, r.entry_date, r.mime_type, r.resource, r.start_date, rle.endpoint_url, rle.licence, rle.status, rle.latest_log_entry_date, rle.endpoint_entry_date from resource r
SELECT DISTINCT rd.dataset, r.end_date, r.entry_date, r.mime_type, r.resource, r.start_date, rle.endpoint_url, rle.licence, rle.status, rle.latest_log_entry_date, rle.endpoint_entry_date from resource r
LEFT JOIN resource_organisation ro ON ro.resource = r.resource
LEFT JOIN resource_dataset rd ON rd.resource = r.resource
LEFT JOIN reporting_latest_endpoints rle ON r.resource = rle.resource
Expand All @@ -198,6 +199,31 @@ export const fetchResources = fetchMany({
result: 'resources'
})

export const addEntityCountsToResources = async (req, res, next) => {
const { resources } = req

const promises = resources.map(resource => {
const query = `SELECT entry_count FROM dataset_resource WHERE resource = "${resource.resource}"`
return datasette.runQuery(query, resource.dataset)
})

try {
const datasetResources = await Promise.all(promises).catch(error => {
logger.error('Failed to fetch dataset resources', { error, type: types.App })
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. type: types.DataFetch
  2. including the exception instance in the info object doesn't really work - it's not a plain JS object and you need to extract the message and/or stack. grep for errorMessage and errorStack in the codebase for examples.

throw error
})

req.resources = resources.map((resource, i) => {
return { ...resource, entry_count: datasetResources[i]?.formattedData[0]?.entry_count }
})

next()
} catch (error) {
logger.error('Error in addEntityCountsToResources', { error, type: types.App })
next(error)
}
}
GeorgeGoodall-GovUk marked this conversation as resolved.
Show resolved Hide resolved

// Specification

export const fetchSpecification = fetchOne({
Expand Down Expand Up @@ -368,67 +394,25 @@ export const filterOutEntitiesWithoutIssues = (req, res, next) => {

const fetchEntityIssuesForFieldAndType = fetchMany({
query: ({ req, params }) => {
const issueTypeFilter = params.issue_type ? `AND issue_type = '${params.issue_type}'` : ''
const issueFieldFilter = params.issue_field ? `AND field = '${params.issue_field}'` : ''

const issueTypeClause = params.issue_type ? `AND i.issue_type = '${params.issue_type}'` : ''
const issueFieldClause = params.issue_field ? `AND field = '${params.issue_field}'` : ''
return `
SELECT e.entity, i.* FROM entity e
INNER JOIN issue i ON e.entity = i.entity
WHERE e.organisation_entity = ${req.orgInfo.entity}
${issueTypeFilter}
${issueFieldFilter}`
select *
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need everything (select * ...) from that table? Even if we do, it's better to have that stated explicitly as a form of documentation.

from issue i
LEFT JOIN issue_type it ON i.issue_type = it.issue_type
WHERE resource = '${req.resources[0].resource}'
${issueTypeClause}
AND it.responsibility = 'external'
AND it.severity = 'error'
${issueFieldClause}
AND i.dataset = '${req.params.dataset}'
AND entity != ''
`
// LIMIT ${req.dataRange.pageLength} OFFSET ${req.dataRange.offset}
},
dataset: FetchOptions.fromParams,
result: 'issues'
})

export const FilterOutIssuesToMostRecent = (req, res, next) => {
const { resources, issues } = req

const issuesWithResources = issues.filter(issue => {
if (!issue.resource || !resources.find(resource => resource.resource === issue.resource)) {
logger.warn(`Missing resource on issue: ${JSON.stringify(issue)}`)
return false
}
return true
})

const groupedIssues = issuesWithResources.reduce((acc, current) => {
current.start_date = new Date(resources.find(resource => resource.resource === current.resource)?.start_date)
const { entity, field } = current
if (!acc[entity]) {
acc[entity] = {}
}
if (!acc[entity][field]) {
acc[entity][field] = []
}
acc[entity][field].push(current)

return acc
}, {})

const recentIssues = Object.fromEntries(Object.entries(groupedIssues).map(([entityName, issuesByEntity]) =>
[
entityName,
Object.fromEntries(Object.entries(issuesByEntity).map(([field, issues]) => [
field,
issues.sort((a, b) => b.start_date.getTime() - a.start_date.getTime())[0]
]))
]
))

const issuesFlattened = []

Object.values(recentIssues).forEach(issueByEntry => {
Object.values(issueByEntry).forEach(issueByField => {
issuesFlattened.push(issueByField)
})
})

req.issues = issuesFlattened
next()
}

export const removeIssuesThatHaveBeenFixed = async (req, res, next) => {
const { issues, resources } = req

Expand Down Expand Up @@ -503,18 +487,80 @@ export const addFieldMappingsToIssue = (req, res, next) => {
next()
}

export const addReferencesToIssues = (req, res, next) => {
const { issues, entities } = req
// We can only get the issues without entity from the latest resource as we have no way of knowing if those in previous resources have been fixed?
export const fetchEntryIssues = fetchMany({
query: ({ req, params }) => {
const issueTypeClause = params.issue_type ? `AND i.issue_type = '${params.issue_type}'` : ''
const issueFieldClause = params.issue_field ? `AND field = '${params.issue_field}'` : ''
return `
select *
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned previously: select * ...

from issue i
LEFT JOIN issue_type it ON i.issue_type = it.issue_type
WHERE resource = '${req.resources[0].resource}'
${issueTypeClause}
AND it.responsibility = 'external'
AND it.severity = 'error'
AND i.dataset = '${req.params.dataset}'
${issueFieldClause}
AND (entity = '' OR entity is NULL OR i.issue_type in ('${entryIssueGroups.map(issue => issue.type).join("', '")}'))
LIMIT ${req.dataRange.pageLength} OFFSET ${req.dataRange.offset}
`
},
result: 'entryIssues'
})
GeorgeGoodall-GovUk marked this conversation as resolved.
Show resolved Hide resolved

req.issues = issues.map(issue => {
const reference = entities.find(entity => entity.entity === issue.entity)?.reference
export const fetchEntityIssueCounts = fetchMany({
query: ({ req }) => {
const datasetClause = req.params.dataset ? `AND i.dataset = '${req.params.dataset}'` : ''
return `
select dataset, field, i.issue_type, COUNT(resource+line_number) as count
from issue i
LEFT JOIN issue_type it ON i.issue_type = it.issue_type
WHERE resource in ('${req.resources.map(resource => resource.resource).join("', '")}')
AND (entity != '' AND entity IS NOT NULL)
AND it.responsibility = 'external'
AND it.severity = 'error'
${datasetClause}
GROUP BY field, i.issue_type, dataset
`
},
result: 'entityIssueCounts'
})

return { ...issue, reference }
export const getMostRecentResources = (resources) => {
const mostRecentResourcesMap = {}
resources.forEach(resource => {
const currentRecent = mostRecentResourcesMap[resource.dataset]
if (!currentRecent || new Date(currentRecent.start_date) < resource.start_date) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comparing (possibly invalid) date to a string here. JS will turn this into string comparison but one of the strings can be "Invalid date" etc. Use Date.getTime() for comparison

mostRecentResourcesMap[resource.dataset] = resource
}
})

next()
return Object.values(mostRecentResourcesMap)
}

export const fetchEntryIssueCounts = fetchMany({
query: ({ req }) => {
const datasetClause = req.params.dataset ? `AND i.dataset = '${req.params.dataset}'` : ''

const mostRecentResources = getMostRecentResources(req.resources)

const resourceIds = Object.values(mostRecentResources).map(resource => resource.resource)

return `
select dataset, field, i.issue_type, COUNT(resource + line_number) as count
from issue i
LEFT JOIN issue_type it ON i.issue_type = it.issue_type
WHERE resource in ('${resourceIds.join("', '")}')
AND (entity = '' OR entity is NULL)
AND it.responsibility = 'external'
AND it.severity = 'error'
${datasetClause}
GROUP BY field, i.issue_type, dataset
`
},
result: 'entryIssueCounts'
})

/**
* This middleware chain is responsible for retrieving all entities for the given organisation, their latest issues,
* filtering out issues that have been fixed, and constructing the table params.
Expand All @@ -529,11 +575,11 @@ export const addReferencesToIssues = (req, res, next) => {
*/
export const processRelevantIssuesMiddlewares = [
fetchEntityIssuesForFieldAndType,
FilterOutIssuesToMostRecent,
removeIssuesThatHaveBeenFixed,
// arguably removeIssuesThatHaveBeenFixed should be s step however we have only currently found one organisation,
// however this step is very time consuming, so in order to progress im commenting it out for now
// removeIssuesThatHaveBeenFixed,
fetchFieldMappings,
addFieldMappingsToIssue,
addReferencesToIssues
addFieldMappingsToIssue
]

// Other
Expand Down Expand Up @@ -593,14 +639,16 @@ export const getSetDataRange = (pageLength) => (req, res, next) => {

export function getErrorSummaryItems (req, res, next) {
const { issue_type: issueType, issue_field: issueField } = req.params
const { issues, issueCount, entities, resources } = req
const { entryIssues, issues: entityIssues, issueCount, entities, resources } = req

const issues = entityIssues || entryIssues
GeorgeGoodall-GovUk marked this conversation as resolved.
Show resolved Hide resolved

const totalRecordCount = entities ? entities.length : resources[0].entry_count
const totalIssues = issueCount?.count || issues.length

const errorHeading = ''
const issueItems = [{
html: performanceDbApi.getTaskMessage({ issue_type: issueType, num_issues: totalIssues, entityCount: totalRecordCount, field: issueField }, true)
html: performanceDbApi.getTaskMessage({ issue_type: issueType, num_issues: totalIssues, rowCount: totalRecordCount, field: issueField }, true)
}]

req.errorSummary = {
Expand Down
Loading
Loading