As a BHL developer I want a shorter version of a dump with filtered data #61

dimus · 2023-01-03T17:35:00Z

According to @mlichtenberg, the previous version of bhlindex applies filters to the dump, and the columns needed depend on whether the output is filtered.

If the output WILL be filtered, then the needed columns are:

names.csv

NameID
DetectedName
MatchedCanonical
MatchedFullName
RecordID
DataSourceID

occurrences.csv

NameID
PageID

If the output will NOT be filtered, then the needed columns are:

names.csv

NameID
DetectedName
MatchedCanonical
MatchedFullName
RecordID
DataSourceID
MatchSortOrder
MatchType
OddsLog10
Curation
Error

occurrences.csv

NameID
PageID
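
As a rough illustration of the filtered column set for names.csv listed above, the projection could be sketched in Python as follows. This assumes the dump is a comma-delimited CSV whose header row uses exactly the column names from this issue; `project_columns` and `FILTERED_NAME_COLUMNS` are hypothetical names, not part of bhlindex.

```python
import csv
import io

# Columns requested for names.csv when the output is filtered (taken from this issue).
FILTERED_NAME_COLUMNS = [
    "NameID",
    "DetectedName",
    "MatchedCanonical",
    "MatchedFullName",
    "RecordID",
    "DataSourceID",
]


def project_columns(src, dst, columns):
    """Copy CSV rows from src to dst, keeping only the given columns.

    Assumes src has a header row; columns not listed are dropped.
    """
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops fields that are not in `columns`.
    writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


if __name__ == "__main__":
    # Made-up sample row, for illustration only.
    sample = io.StringIO(
        "NameID,DetectedName,MatchedCanonical,MatchedFullName,"
        "RecordID,DataSourceID,MatchType,OddsLog10\n"
        "1,Aus bus,Aus bus,Aus bus L.,r1,1,ExactMatch,5.2\n"
    )
    out = io.StringIO()
    project_columns(sample, out, FILTERED_NAME_COLUMNS)
    print(out.getvalue())
```

The same helper would work for occurrences.csv by passing `["NameID", "PageID"]` as the column list.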

dimus · 2023-01-03T17:36:16Z

Filter:

COPY (
  SELECT n.name, n.matched_name, n.matched_canonical
  FROM name_strings n INNER JOIN name_statuses st ON n.name = st.name
  WHERE (n.match_type IN ('ExactMatch', 'ExactCanonicalMatch') AND n.curation <> 'Unknown')
     OR (n.match_type IN ('FuzzyCanonical', 'FuzzyPartial') AND (st.odds > 1000000 OR n.edit_distance IN (0,1) OR n.stem_edit_distance IN (0,1)))
     OR (n.match_type IN ('NoMatch', '') AND st.odds > 1000000)
     OR (n.match_type = 'ExactPartialMatch')
) TO STDOUT DELIMITER '|'
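
To make the WHERE clause above easier to read, here is a minimal Python sketch of the same predicate. The field names and thresholds come from the query; `NameRow` and `keep_name` are hypothetical names, and the sketch assumes no NULL values (SQL's three-valued logic is not reproduced).

```python
from dataclasses import dataclass

ODDS_THRESHOLD = 1_000_000  # same cutoff as `st.odds > 1000000` in the query


@dataclass
class NameRow:
    """One joined name_strings/name_statuses row (hypothetical container)."""
    match_type: str
    curation: str
    odds: float
    edit_distance: int
    stem_edit_distance: int


def keep_name(row: NameRow) -> bool:
    """Return True if the row passes the filter from the query above."""
    mt = row.match_type
    if mt in ("ExactMatch", "ExactCanonicalMatch"):
        # Exact matches are kept unless their curation is 'Unknown'.
        return row.curation != "Unknown"
    if mt in ("FuzzyCanonical", "FuzzyPartial"):
        # Fuzzy matches need high odds or a small edit distance.
        return (
            row.odds > ODDS_THRESHOLD
            or row.edit_distance in (0, 1)
            or row.stem_edit_distance in (0, 1)
        )
    if mt in ("NoMatch", ""):
        # Unmatched names are kept only when the odds are high.
        return row.odds > ODDS_THRESHOLD
    # Exact partial matches are always kept; any other match type is dropped.
    return mt == "ExactPartialMatch"
```

Because the match types in the four OR branches do not overlap, the chain of `if` statements is equivalent to the original disjunction.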

dimus closed this as completed in c13c270 Jan 10, 2023
