replace addressit with pelias native parser #1287

Merged
merged 55 commits into from
Oct 1, 2019
8af64f3
feat(parser): replace addressit with pelias native parser
missinglink May 2, 2019
62d23e2
feat(parser): improved postfix cursor position for text with no admin…
missinglink May 2, 2019
4632452
feat(parser): pelias/parser improvements
missinglink May 3, 2019
10c9889
feat(parser): bump pelias/parser version
missinglink May 3, 2019
559fdb0
feat(parser): bump pelias/parser version
missinglink May 13, 2019
30a6286
feat(parser): bump pelias/parser version
missinglink May 14, 2019
b24bf31
feat(parser): bump pelias/parser version
missinglink May 14, 2019
2d82d38
feat(parser): updates to tokenizer sanitizer
missinglink May 15, 2019
4de4983
typo
missinglink May 15, 2019
88e2390
feat(parser): stricter tokenization of exact matching admin queries
missinglink May 15, 2019
308df52
feat(parser): switch to using multi_match for admin subqueries
missinglink May 15, 2019
f5dcf3c
feat(admin_subqueries): test cross_fields query
missinglink May 15, 2019
58f8171
feat(admin_subqueries): test operator:and query
missinglink May 15, 2019
cd2f159
feat(admin_subqueries): set all boosts to 1
missinglink May 15, 2019
22d69aa
feat(admin_subqueries): add locality_a and country_a to multi_match
missinglink May 15, 2019
79c5c45
feat(admin_subqueries): revert to operator:or
missinglink May 15, 2019
f31d5ae
feat(admin_subqueries): remove cutoff_frequency
missinglink May 15, 2019
325c070
feat(admin_subqueries): move admin matching to MUST condition
missinglink May 15, 2019
5e82ec6
feat(tokenizer): consider query as complete if the final char is a nu…
missinglink May 16, 2019
74a337d
feat(autocomplete): test removing exact_matching subquery
missinglink May 16, 2019
e0b8a6b
feat(admin_subqueries): add cutoff_frequency
missinglink May 16, 2019
235fafe
feat(pelias_parser): admin queries - remove subject from admin subquery
missinglink May 16, 2019
f281688
feat(admin_subqueries): remove cutoff_frequency
missinglink May 16, 2019
cfbd5f7
feat(autocomplete): use phrase index for complete tokens
missinglink May 16, 2019
68a0776
feat(parser): remove parsed_text.name
missinglink May 16, 2019
9258f2e
feat(parser): so not consider address parses as safe to use with an n…
missinglink May 16, 2019
7a62b3d
feat(parser): bump pelias/parser version
missinglink May 16, 2019
6bcd91d
feat(tokenizer): consider query as complete if the $subject is not at…
missinglink May 16, 2019
5923a1a
feat(parser): bump pelias/parser version
missinglink May 16, 2019
12af8cc
feat(autocomplete): experiment adding name.default to admin multi_match
missinglink May 16, 2019
43d727b
feat(autocomplete): progess commit
missinglink Jun 3, 2019
61cceeb
feat(autocomplete): typo
missinglink Jun 4, 2019
dc69e0d
feat(autocomplete): improved matching at the cusp
missinglink Jun 4, 2019
0cdc5e8
feat(autocomplete): improved performance and reduced noise for admin …
missinglink Jun 4, 2019
4b79aa1
feat(deps): bump parser dep version
missinglink Jun 5, 2019
1045c84
feat(deps): bump parser dep version
missinglink Jun 6, 2019
2743575
feat(deps): bump parser dep version
missinglink Jun 6, 2019
f079baf
test: disable parserConsumedAllTokens for admin parses
missinglink Jun 6, 2019
f07cb90
feat(deps): bump parser dep version
missinglink Jun 6, 2019
40b62bc
feat(query): add should subquery for cross_street matching
missinglink Jun 7, 2019
9d69fe1
feat(logging): add summary logging for pelias parser
missinglink Jun 10, 2019
5974f7a
feat(deps): bump parser dep version
missinglink Jun 12, 2019
85693a8
feat(pelias_parser): additional parser tests
missinglink Jun 17, 2019
abeb48f
feat(deps): bump parser dep version
missinglink Jun 17, 2019
770e820
feat(pelias_parser): fix tests
missinglink Jul 10, 2019
2257ec7
feat(search_addressit): generate cross_street subquery where available
missinglink Aug 15, 2019
d7d5f7b
feat(pelias_parser): limit input text to 140 characters
missinglink Sep 16, 2019
670666c
feat(pelias_parser): replace peliasQueryPartialToken analyzer with pe…
missinglink Sep 25, 2019
30760b9
feat(pelias_parser): disable "ngrams_last_token_only_multi" view when…
missinglink Sep 25, 2019
866c479
Add context to pelias parser logs
orangejulius Sep 25, 2019
c0749a0
Pin to pelias-parser-1.38.0 for now
orangejulius Sep 25, 2019
97f6496
refactor(pelias_parser): add code comments relating to "add_name_to_m…
missinglink Oct 1, 2019
b2d3b16
refactor(pelias_parser): remove disused code/comments
missinglink Oct 1, 2019
1e1cf24
feat(pelias_parser): completely remove "addressit" and references to it
missinglink Oct 1, 2019
88656e0
refactor(pelias_parser): remove references to "original style queries"
missinglink Oct 1, 2019
14 changes: 0 additions & 14 deletions controller/predicates/is_addressit_parse.js

This file was deleted.

14 changes: 14 additions & 0 deletions controller/predicates/is_pelias_parse.js
@@ -0,0 +1,14 @@
const _ = require('lodash');
const Debug = require('../../helper/debug');
const debugLog = new Debug('controller:predicates:is_pelias_parse');
const stackTraceLine = require('../../helper/stackTraceLine');

// returns true IFF req.clean.parser is pelias
module.exports = (req, res) => {
const is_pelias_parse = _.get(req, 'clean.parser') === 'pelias';
debugLog.push(req, () => ({
reply: is_pelias_parse,
stack_trace: stackTraceLine()
}));
return is_pelias_parse;
};
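
The predicate above can be exercised in isolation. The following is a minimal sketch, assuming lodash's `_.get` and the debug helper are swapped for plain property checks; the name `isPeliasParse` is illustrative and does not appear in the PR.

```javascript
// Hypothetical stand-in for controller/predicates/is_pelias_parse.js.
// lodash and the debug helper are left out so the sketch runs on its own.
function isPeliasParse(req) {
  // guard against a missing request or a missing 'clean' object
  const clean = req && req.clean;
  // true only when the sanitizers recorded 'pelias' as the parser
  return Boolean(clean) && clean.parser === 'pelias';
}
```

Like the original, it returns `false` rather than throwing when `req.clean` is absent, so it is safe to call before sanitization has populated the request.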
4 changes: 2 additions & 2 deletions middleware/confidenceScore.js
@@ -26,10 +26,10 @@ function setup(peliasConfig) {
}

function computeScores(req, res, next) {
// do nothing if no result data set or if query is not of the original variety
// do nothing if no result data set or if query is not of the pelias_parser variety
if (check.undefined(req.clean) || check.undefined(res) ||
check.undefined(res.data) || check.undefined(res.meta) ||
res.meta.query_type !== 'search_addressit') {
res.meta.query_type !== 'search_pelias_parser') {
return next();
}
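
The guard in this hunk can be read as a single yes/no decision. Here is a minimal sketch, assuming `check.undefined` is replaced with direct comparisons; `shouldComputeScores` is a hypothetical helper name, not part of the middleware.

```javascript
// Hypothetical helper mirroring the early-return guard in computeScores().
// Returns true only when confidence scores should be computed.
function shouldComputeScores(req, res) {
  // skip when there is no sanitized request or no result data set
  if (req.clean === undefined || res === undefined ||
      res.data === undefined || res.meta === undefined) {
    return false;
  }
  // after this PR only 'search_pelias_parser' queries are scored here
  return res.meta.query_type === 'search_pelias_parser';
}
```

Under this reading, a response still tagged with the old `search_addressit` type would pass through the middleware unscored.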

4 changes: 2 additions & 2 deletions package.json
@@ -36,16 +36,15 @@
"node": ">=8.0.0"
},
"dependencies": {
"@hapi/joi": "^15.0.0",
"@mapbox/geojson-extent": "^0.3.1",
"addressit": "1.7.0",
"async": "^3.0.1",
"check-types": "^10.0.0",
"elasticsearch": "^16.0.0",
"express": "^4.8.8",
"geojson": "^0.5.0",
"geolib": "^3.0.0",
"iso-639-3": "^1.0.0",
"@hapi/joi": "^15.0.0",
"locale": "^0.1.0",
"lodash": "^4.17.4",
"markdown": "^0.5.0",
@@ -56,6 +55,7 @@
"pelias-logger": "^1.2.0",
"pelias-microservice-wrapper": "^1.7.0",
"pelias-model": "^7.0.0",
"pelias-parser": "1.38.0",
"pelias-query": "^9.14.0",
"pelias-sorting": "^1.2.0",
"predicates": "^2.0.0",
48 changes: 34 additions & 14 deletions query/autocomplete.js
@@ -1,49 +1,61 @@
const peliasQuery = require('pelias-query');
const defaults = require('./autocomplete_defaults');
const textParser = require('./text_parser_addressit');
const textParser = require('./text_parser_pelias');
const check = require('check-types');
const logger = require('pelias-logger').get('api');
const config = require('pelias-config').generate();
const placeTypes = require('../helper/placeTypes');

// additional views (these may be merged in to pelias/query at a later date)
var views = {
custom_boosts: require('./view/boost_sources_and_layers'),
ngrams_strict: require('./view/ngrams_strict'),
ngrams_last_token_only: require('./view/ngrams_last_token_only'),
ngrams_last_token_only_multi: require('./view/ngrams_last_token_only_multi'),
admin_multi_match_first: require('./view/admin_multi_match_first'),
admin_multi_match_last: require('./view/admin_multi_match_last'),
phrase_first_tokens_only: require('./view/phrase_first_tokens_only'),
pop_subquery: require('./view/pop_subquery'),
boost_exact_matches: require('./view/boost_exact_matches'),
max_character_count_layer_filter: require('./view/max_character_count_layer_filter'),
focus_point_filter: require('./view/focus_point_distance_filter')
};

// add abbreviations for the fields pelias/parser is able to detect.
var adminFields = placeTypes.concat(['locality_a', 'region_a', 'country_a']);

// add some name field(s) to the admin fields in order to improve venue matching
// note: this is a bit of a hacky way to add a 'name' field to the list
// of multimatch fields normally reserved for admin subquerying.
// in some cases we are not sure if certain tokens refer to admin components
// or are part of the place name (such as some venue names).
// the variable name 'add_name_to_multimatch' is arbitrary, it can be any value so
// long as there is a corresponding 'admin:*:field' variable set which defines
// the name of the field to use.
// this functionality is not enabled unless the 'input:add_name_to_multimatch'
// variable is set to a non-empty value at query-time.
adminFields = adminFields.concat(['add_name_to_multimatch']);

//------------------------------
// autocomplete query
//------------------------------
var query = new peliasQuery.layout.FilteredBooleanQuery();

// mandatory matches
query.score( views.phrase_first_tokens_only, 'must' );
query.score( views.ngrams_last_token_only, 'must' );
query.score( views.ngrams_last_token_only_multi( adminFields ), 'must' );

// admin components
query.score( views.admin_multi_match_first( adminFields ), 'must');
query.score( views.admin_multi_match_last( adminFields ), 'must');

// address components
query.score( peliasQuery.view.address('housenumber') );
query.score( peliasQuery.view.address('street') );
query.score( peliasQuery.view.address('cross_street') );
query.score( peliasQuery.view.address('postcode') );

// admin components
query.score( peliasQuery.view.admin('country') );
query.score( peliasQuery.view.admin('country_a') );
query.score( peliasQuery.view.admin('region') );
query.score( peliasQuery.view.admin('region_a') );
query.score( peliasQuery.view.admin('county') );
query.score( peliasQuery.view.admin('borough') );
query.score( peliasQuery.view.admin('localadmin') );
query.score( peliasQuery.view.admin('locality') );
query.score( peliasQuery.view.admin('neighbourhood') );

// scoring boost
query.score( views.boost_exact_matches );
query.score( peliasQuery.view.focus( views.ngrams_strict ) );
query.score( peliasQuery.view.popularity( views.pop_subquery ) );
query.score( peliasQuery.view.population( views.pop_subquery ) );
@@ -165,6 +177,14 @@ function generateQuery( clean ){
textParser( clean, vs );
}

// set the 'add_name_to_multimatch' variable only in the case where one
// or more of the admin variables are set.
// the value 'enabled' is not relevant, it just needs to be any non-empty
// value so that the associated field is added to the multimatch query.
// see code comments above for additional information.
let isAdminSet = adminFields.some(field => vs.isset('input:' + field));
if ( isAdminSet ){ vs.var('input:add_name_to_multimatch', 'enabled'); }

return {
type: 'autocomplete',
body: query.render(vs)
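
The `add_name_to_multimatch` switch is easier to follow with a toy variable store. This is a rough sketch only: `TinyVars`, `enableNameField` and `multiMatchFields` are hypothetical names, and the rule that a field is used only when both `input:<field>` and `admin:<field>:field` are set is an assumption about how the multi-match views pick their fields.

```javascript
// Hypothetical stand-in for the pelias-query variable store.
class TinyVars {
  constructor() { this.store = new Map(); }
  set(key, value) { this.store.set(key, value); }
  get(key) { return this.store.get(key); }
  // a variable counts as set only when it holds a non-empty value
  isset(key) { return this.store.has(key) && this.store.get(key) !== ''; }
}

// shortened field list for illustration; the PR builds it from placeTypes
const adminFields = ['locality', 'region', 'country', 'add_name_to_multimatch'];

// mirrors the block added to generateQuery(): flip the switch only
// when at least one admin input variable is present
function enableNameField(vs) {
  const isAdminSet = adminFields.some(field => vs.isset('input:' + field));
  if (isAdminSet) { vs.set('input:add_name_to_multimatch', 'enabled'); }
  return isAdminSet;
}

// assumed selection rule: keep fields with both an input and a field mapping
function multiMatchFields(vs) {
  return adminFields
    .filter(field => vs.isset('input:' + field) && vs.isset('admin:' + field + ':field'))
    .map(field => vs.get('admin:' + field + ':field'));
}
```

With `input:locality` set and `admin:add_name_to_multimatch:field` mapped to `name.default`, calling `enableNameField` first makes `multiMatchFields` include `name.default` alongside the locality field. With no admin input, the name field stays out of the query.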
55 changes: 42 additions & 13 deletions query/autocomplete_defaults.js
@@ -22,7 +22,7 @@ module.exports = _.merge({}, peliasQuery.defaults, {
'ngram:cutoff_frequency': 0.01,

'phrase:analyzer': 'peliasQuery',
'phrase:field': 'name.default',
'phrase:field': 'phrase.default',
'phrase:boost': 1,
'phrase:slop': 3,
'phrase:cutoff_frequency': 0.01,
@@ -46,59 +46,88 @@ module.exports = _.merge({}, peliasQuery.defaults, {
'address:street:boost': 5,
'address:street:cutoff_frequency': 0.01,

'address:cross_street:analyzer': 'peliasStreet',
'address:cross_street:field': 'address_parts.cross_street',
'address:cross_street:boost': 5,
'address:cross_street:cutoff_frequency': 0.01,

'address:postcode:analyzer': 'peliasZip',
'address:postcode:field': 'address_parts.zip',
'address:postcode:boost': 2000,
'address:postcode:cutoff_frequency': 0.01,

// generic multi_match cutoff_frequency
'multi_match:cutoff_frequency': 0.01,
// generic multi_match config
'multi_match:type': 'cross_fields',

// setting 'cutoff_frequency' will result in very common
// terms such as country not scoring at all
// 'multi_match:cutoff_frequency': 0.01,

'admin:country_a:analyzer': 'standard',
'admin:country_a:field': 'parent.country_a.ngram',
'admin:country_a:boost': 1000,
'admin:country_a:boost': 4,
'admin:country_a:cutoff_frequency': 0.01,

'admin:country:analyzer': 'peliasAdmin',
'admin:country:field': 'parent.country.ngram',
'admin:country:boost': 800,
'admin:country:boost': 1,
'admin:country:cutoff_frequency': 0.01,

'admin:dependency:analyzer': 'peliasAdmin',
'admin:dependency:field': 'parent.dependency.ngram',
'admin:dependency:boost': 1,
'admin:dependency:cutoff_frequency': 0.01,

'admin:region:analyzer': 'peliasAdmin',
'admin:region:field': 'parent.region.ngram',
'admin:region:boost': 600,
'admin:region:boost': 1,
'admin:region:cutoff_frequency': 0.01,

'admin:region_a:analyzer': 'peliasAdmin',
'admin:region_a:field': 'parent.region_a.ngram',
'admin:region_a:boost': 600,
'admin:region_a:boost': 4,
'admin:region_a:cutoff_frequency': 0.01,

'admin:macroregion:analyzer': 'peliasAdmin',
'admin:macroregion:field': 'parent.macroregion.ngram',
'admin:macroregion:boost': 1,
'admin:macroregion:cutoff_frequency': 0.01,

'admin:county:analyzer': 'peliasAdmin',
'admin:county:field': 'parent.county.ngram',
'admin:county:boost': 400,
'admin:county:boost': 1,
'admin:county:cutoff_frequency': 0.01,

'admin:localadmin:analyzer': 'peliasAdmin',
'admin:localadmin:field': 'parent.localadmin.ngram',
'admin:localadmin:boost': 200,
'admin:localadmin:boost': 1,
'admin:localadmin:cutoff_frequency': 0.01,

'admin:locality:analyzer': 'peliasAdmin',
'admin:locality:field': 'parent.locality.ngram',
'admin:locality:boost': 200,
'admin:locality:boost': 1,
'admin:locality:cutoff_frequency': 0.01,

'admin:locality_a:analyzer': 'peliasAdmin',
'admin:locality_a:field': 'parent.locality_a.ngram',
'admin:locality_a:boost': 1,
'admin:locality_a:cutoff_frequency': 0.01,

'admin:neighbourhood:analyzer': 'peliasAdmin',
'admin:neighbourhood:field': 'parent.neighbourhood.ngram',
'admin:neighbourhood:boost': 200,
'admin:neighbourhood:boost': 1,
'admin:neighbourhood:cutoff_frequency': 0.01,

'admin:borough:analyzer': 'peliasAdmin',
'admin:borough:field': 'parent.borough.ngram',
'admin:borough:boost': 600,
'admin:borough:boost': 1,
'admin:borough:cutoff_frequency': 0.01,

// an additional 'name' field to add to admin multi-match queries.
// this is used to improve venue matching in cases where we
// are unsure if the tokens represent admin or name components.
'admin:add_name_to_multimatch:field': 'name.default',

'popularity:field': 'popularity',
'popularity:modifier': 'log1p',
'popularity:max_boost': 20,
@@ -115,4 +144,4 @@ module.exports = _.merge({}, peliasQuery.defaults, {
'custom:boosting:max_boost': 50, // maximum boosting which can be applied (max_boost/boost = max_score)
'custom:boosting:score_mode': 'sum', // sum all function scores before multiplying the boost
'custom:boosting:boost_mode': 'multiply' // this mode is not relevant because there is no query section
});
});
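
To show how the flattened boosts and the `cross_fields` type could combine, here is a rough sketch of an Elasticsearch `multi_match` body built from a field-to-boost map. `buildAdminMultiMatch` is a hypothetical helper; the body actually rendered by the views in this PR may differ.

```javascript
// Hypothetical helper: builds a multi_match clause from { field: boost } pairs.
function buildAdminMultiMatch(text, fieldBoosts) {
  return {
    multi_match: {
      query: text,
      // cross_fields treats the listed fields as one combined field
      type: 'cross_fields',
      // Elasticsearch encodes per-field boosts with the 'field^boost' syntax
      fields: Object.entries(fieldBoosts).map(([field, boost]) => `${field}^${boost}`)
    }
  };
}
```

For example, passing `{ 'parent.country_a.ngram': 4, 'parent.country.ngram': 1 }` yields the fields `parent.country_a.ngram^4` and `parent.country.ngram^1`, which reflects the new defaults where only the abbreviation fields keep a boost above 1.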
2 changes: 1 addition & 1 deletion query/search.js
@@ -129,7 +129,7 @@ function getQuery(vs) {
};
}

// returning undefined is a signal to a later step that the addressit-parsed
// returning undefined is a signal to a later step that a fallback parser
// query should be queried for
return undefined;
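
The "undefined as a signal" convention in this comment can be illustrated with a small chooser. This is only a sketch of the convention: `firstDefinedQuery` is a hypothetical name, and how the API actually routes to the fallback parser query is not shown in this diff.

```javascript
// Hypothetical helper: try each query generator in order and return the
// first rendered query; undefined means "let the next generator try".
function firstDefinedQuery(generators, vs) {
  for (const generate of generators) {
    const rendered = generate(vs);
    if (rendered !== undefined) { return rendered; }
  }
  // no generator produced a query
  return undefined;
}
```

A caller would list the primary generator first and the fallback (for example one returning `{ type: 'search_pelias_parser', ... }`) second.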

17 changes: 6 additions & 11 deletions query/search_addressit.js → query/search_pelias_parser.js
@@ -1,17 +1,16 @@
const peliasQuery = require('pelias-query');
const defaults = require('./search_defaults');
const textParser = require('./text_parser_addressit');
const textParser = require('./text_parser_pelias');
const check = require('check-types');
const logger = require('pelias-logger').get('api');
const config = require('pelias-config').generate().api;

var placeTypes = require('../helper/placeTypes');
var views = { custom_boosts: require('./view/boost_sources_and_layers') };

// region_a is also an admin field. addressit tries to detect
// region_a, in which case we use a match query specifically for it.
// but address it doesn't know about all of them so it helps to search
// against this with the other admin parts as a fallback
// region_a is also an admin field which can be identified by
// the pelias_parser. this functionality was inherited from the
// previous parser we used prior to the creation of pelias_parser.
var adminFields = placeTypes.concat(['region_a']);

//------------------------------
@@ -31,14 +30,10 @@ query.score( peliasQuery.view.population( peliasQuery.view.phrase ) );
// address components
query.score( peliasQuery.view.address('housenumber') );
query.score( peliasQuery.view.address('street') );
query.score( peliasQuery.view.address('cross_street') );
query.score( peliasQuery.view.address('postcode') );

// admin components
// country_a and region_a are left as matches here because the text-analyzer
// can sometimes detect them, in which case a query more specific than a
// multi_match is appropriate.
query.score( peliasQuery.view.admin('country_a') );
query.score( peliasQuery.view.admin('region_a') );
query.score( peliasQuery.view.admin_multi_match(adminFields, 'peliasAdmin') );
query.score( views.custom_boosts( config.customBoosts ) );

@@ -142,7 +137,7 @@ function generateQuery( clean ){
}

return {
type: 'search_addressit',
type: 'search_pelias_parser',
body: query.render(vs)
};
}