support hooks in analysis pipeline #1887

Draft
wants to merge 4 commits into
base: master

@moshaad7 moshaad7 commented Oct 10, 2023

Description

The aim is to let an embedder register analyzers in bleve at run time.
These registered analyzers can then be specified in the index mapping as the analyzers for fields.

change log

  • new Analyzer interface

    • new Type() method
    • Analyze() method now returns an interface{} instead of a TokenStream
      • the caller can type-assert the returned value to the appropriate type based on analyzer.Type()
      • for example, some analyzers return only a TokenStream, while others return a TokenStream and an error.
  • updates to the Field interface

    • the Analyze() method of a field can now return an error.
    • error handling is done by scorch/upside_down
  • new Registry to store embedder-submitted analysis hooks

  • analyzer registry updated to also hold analyzers created using hooks
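To make the change log concrete, here is a minimal, self-contained sketch of how an embedder-supplied hook might be wrapped as an analyzer and stored in a run-time registry. Only the `Type()`/`Analyze()` shape and the type-name constants come from this PR; `HookResult`, `AnalysisHook`, `Registry`, `RegisterHook`, `Lookup`, and `whitespaceHook` are hypothetical names chosen for illustration and are not bleve's API.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Token and TokenStream are simplified stand-ins for bleve's analysis types.
type Token struct {
	Term     string
	Position int
}

type TokenStream []*Token

// Analyzer type names as they appear in this PR's diff.
const (
	TokensAnalyzerType     = "token"
	HookTokensAnalyzerType = "hook_token"
)

// Analyzer follows the shape described in the change log: Type() tells the
// caller how to interpret the untyped value returned by Analyze().
type Analyzer interface {
	Type() string
	Analyze(input []byte) interface{}
}

// HookResult is a hypothetical payload for hook analyzers, which may fail.
type HookResult struct {
	Tokens TokenStream
	Err    error
}

// AnalysisHook is a hypothetical signature for an embedder-supplied callback.
type AnalysisHook func(input []byte) (TokenStream, error)

// hookAnalyzer adapts an AnalysisHook to the Analyzer interface.
type hookAnalyzer struct{ hook AnalysisHook }

func (h *hookAnalyzer) Type() string { return HookTokensAnalyzerType }

func (h *hookAnalyzer) Analyze(input []byte) interface{} {
	tokens, err := h.hook(input)
	return HookResult{Tokens: tokens, Err: err}
}

// Registry is a hypothetical run-time store of analyzers keyed by name.
type Registry struct {
	mu        sync.RWMutex
	analyzers map[string]Analyzer
}

func NewRegistry() *Registry {
	return &Registry{analyzers: make(map[string]Analyzer)}
}

// RegisterHook wraps hook as an Analyzer and stores it under name.
// Registering the same name twice is reported as an error.
func (r *Registry) RegisterHook(name string, hook AnalysisHook) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.analyzers[name]; exists {
		return fmt.Errorf("analyzer %q is already registered", name)
	}
	r.analyzers[name] = &hookAnalyzer{hook: hook}
	return nil
}

// Lookup returns the analyzer registered under name, if any.
func (r *Registry) Lookup(name string) (Analyzer, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	a, ok := r.analyzers[name]
	return a, ok
}

// whitespaceHook is a sample hook that splits the input on whitespace.
func whitespaceHook(input []byte) (TokenStream, error) {
	var ts TokenStream
	for i, w := range strings.Fields(string(input)) {
		ts = append(ts, &Token{Term: w, Position: i + 1})
	}
	return ts, nil
}

func main() {
	reg := NewRegistry()
	if err := reg.RegisterHook("my_hook", whitespaceHook); err != nil {
		panic(err)
	}
	a, _ := reg.Lookup("my_hook")
	// The caller picks the assertion target based on a.Type().
	res := a.Analyze([]byte("hello bleve hooks")).(HookResult)
	fmt.Println(a.Type(), len(res.Tokens), res.Err)
}
```

In this sketch an index mapping would refer to the analyzer by the name passed to `RegisterHook`; how the PR actually resolves names from the mapping is not shown in the excerpt above.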

Related changes:

const (
TokensAnalyzerType = "token"
HookTokensAnalyzerType = "hook_token"
VectorAnalyzerType = "vector"

Move the vector-related code to a separate file guarded by the "vector" build tag.

Err error
}

func AnalyzeForTokens(analyzer Analyzer, input []byte) (TokenStream, error) {
@moshaad7 moshaad7 Oct 10, 2023

add comment:
// A utility function for analyzing an input to generate a TokenStream (and an error, if any).

Previously, the Analyze() method of an analyzer returned a TokenStream.
With the change in this PR, Analyze() now returns a value of type interface{}.
(It can be validated and used based on analyzer.Type().)

Thus, for users of the old Analyzer interface, this utility will come in handy when migrating to the new Analyzer interface.

analyzerType := analyzer.Type()
if analyzerType != TokensAnalyzerType &&
analyzerType != HookTokensAnalyzerType {
return nil, fmt.Errorf("cannot analyze text with analyzer of type: %s",

alternate error msg: "given analyzer is not compatible to be used as a token analyzer"

@moshaad7 moshaad7 self-assigned this Oct 10, 2023
- While analyzing a doc, analysis of a few fields can fail.
- We want to index the parts of the doc for which analysis succeeded.
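
The partial-indexing behaviour described in the two points above can be illustrated with a small sketch. Everything here is hypothetical: `field`, `analyzeField`, and `analyzeDocument` are illustrative names, and the "empty value" failure is only a stand-in for whatever error a real analyzer or hook might return. The idea is simply to record per-field errors and keep going rather than abort the whole document.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// field is a hypothetical, simplified document field.
type field struct {
	name  string
	value string
}

// analyzeField is a stand-in for per-field analysis that can fail.
// Here an empty value is treated as a failure purely for demonstration.
func analyzeField(f field) ([]string, error) {
	if strings.TrimSpace(f.value) == "" {
		return nil, errors.New("empty field value")
	}
	return strings.Fields(strings.ToLower(f.value)), nil
}

// analyzeDocument analyzes every field, collecting the terms of the fields
// that succeeded and the errors of the fields that failed.
func analyzeDocument(fields []field) (map[string][]string, map[string]error) {
	analyzed := make(map[string][]string)
	failed := make(map[string]error)
	for _, f := range fields {
		terms, err := analyzeField(f)
		if err != nil {
			failed[f.name] = err // remember the failure, keep going
			continue
		}
		analyzed[f.name] = terms
	}
	return analyzed, failed
}

func main() {
	doc := []field{
		{name: "title", value: "Analysis Hooks"},
		{name: "body", value: "   "},
	}
	analyzed, failed := analyzeDocument(doc)
	fmt.Println("indexable fields:", len(analyzed))
	for name, err := range failed {
		fmt.Printf("skipped %s: %v\n", name, err)
	}
}
```

Returning the failures alongside the successful fields lets the index layer (scorch or upside_down in this PR) decide how to report them while still indexing the rest of the document.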