Merge pull request #43 from ai-cfia/issue40-integrate-pagination
Issue40-integrate-pagination
k-allagbe authored Feb 13, 2024
2 parents 11205ce + 73df7a4 commit 730c319
Showing 14 changed files with 236 additions and 200 deletions.
50 changes: 22 additions & 28 deletions .env.template
@@ -1,21 +1,20 @@
# Endpoint URL of Azure Cognitive Search service. Format:
# https://[service-name].search.windows.net
FINESSE_BACKEND_AZURE_SEARCH_ENDPOINT=<Azure-Search-Service-Endpoint>
FINESSE_BACKEND_AZURE_SEARCH_ENDPOINT=

# API key for Azure Cognitive Search. Used for operations such as
# querying the search index.
FINESSE_BACKEND_AZURE_SEARCH_API_KEY=<Azure-Search-API-Key>
FINESSE_BACKEND_AZURE_SEARCH_API_KEY=

# Name of the search index in Azure Cognitive Search. Contains documents
# for search operations.
FINESSE_BACKEND_AZURE_SEARCH_INDEX_NAME=<Search-Index-Name>
FINESSE_BACKEND_AZURE_SEARCH_INDEX_NAME=

# Boolean flag to enable or disable debug mode for the application.
# Defaults to False when not set. Optional.
# FINESSE_BACKEND_DEBUG_MODE=<True/False>
# Boolean flag to enable or disable debug mode for the application. Optional.
# FINESSE_BACKEND_DEBUG_MODE=False

# URL for static search files.
FINESSE_BACKEND_STATIC_FILE_URL=https://api.github.com/repos/ai-cfia/finesse-data/contents
# FINESSE_BACKEND_STATIC_FILE_URL=https://api.github.com/repos/ai-cfia/finesse-data/contents

# Message for empty search query errors. Optional.
# FINESSE_BACKEND_ERROR_EMPTY_QUERY="Search query cannot be empty"
@@ -37,46 +36,41 @@ FINESSE_BACKEND_STATIC_FILE_URL=https://api.github.com/repos/ai-cfia/finesse-dat
# FINESSE_BACKEND_FUZZY_MATCH_THRESHOLD=90

# Regular expression pattern used for sanitizing input to prevent log injection. Optional.
# FINESSE_BACKEND_SANITIZE_PATTERN="[^\w \d\"#\$%&'\(\)\*\+,-\.\/:;?@\^_`{\|}~]+|\%\w+|;|/|\(|\)"
# FINESSE_BACKEND_SANITIZE_PATTERN="[^\w \d\"#\$%&'\(\)\*\+,-\.\/:;?@\^_`{\|}~]+|\%\w+|;|/|\(|\)]"

# API key for OpenAI, used for authentication when making requests.
# Obtain from: https://portal.azure.com/#home
OPENAI_API_KEY=<OpenAI-API-Key>
OPENAI_API_KEY=

# The version of the OpenAI API being used.
# Example: 2023-05-15
OPENAI_API_VERSION=<OpenAI-API-Version>
OPENAI_API_VERSION=

# Deployment name for GPT-based models in Azure OpenAI.
# Example: davinci
AZURE_OPENAI_GPT_DEPLOYMENT=<Azure-OpenAI-GPT-Deployment>
AZURE_OPENAI_GPT_DEPLOYMENT=

# Deployment name for ChatGPT models in Azure OpenAI.
# Example: chat
AZURE_OPENAI_CHATGPT_DEPLOYMENT=<Azure-OpenAI-ChatGPT-Deployment>
AZURE_OPENAI_CHATGPT_DEPLOYMENT=

# Data Source Name (DSN) for configuring a database connection in Louis's system.
# Format: postgresql://PGUSER:PGPASSWORD@DB_SERVER_CONTAINER_NAME/PGBASE
LOUIS_DSN=<Louis-DSN>
LOUIS_DSN=

# Schema within the Louis database system.
# Example: louis_0.0.5
LOUIS_SCHEMA=<Louis-Schema>
LOUIS_SCHEMA=

# Endpoint URL for making requests to the OpenAI API.
# Obtain along with the OpenAI API Key.
OPENAI_ENDPOINT=<OpenAI-Endpoint>
OPENAI_ENDPOINT=

# File containing the weights for the search.
# Example: finesse-weights.json
FINESSE_WEIGHTS=<Finesse-Weights>
FINESSE_WEIGHTS={"recency":1,"traffic":1,"current":0.5,"typicality":0.2,"similarity":1}

# Specific OpenAI API model engine to be used.
# Example: ada
OPENAI_API_ENGINE=<OpenAI-API-Engine>
OPENAI_API_ENGINE=

# Fields to highlight in Azure Cognitive Search responses. Optional.
# FINESSE_BACKEND_HIGHLIGHT_FIELDS=content
# JSON map for transforming Azure Search responses. Represented as a JSON string. Optional.
# Knowledge of the index search result structure is required.
FINESSE_BACKEND_AZURE_SEARCH_TRANSFORM_MAP={"id": "/id", "title": "/title", "score": "/@search.score", "url": "/url", "content": "/@search.highlights/content/0", "last_updated": "/last_updated"}

# HTML tag used for highlighting in Azure Cognitive Search responses. Optional.
# FINESSE_BACKEND_HIGHLIGHT_TAG=strong
# Parameters for Azure Cognitive Search queries. Represented as a JSON string. Optional.
# Consult https://learn.microsoft.com/en-us/python/api/azure-search-documents/azure.search.documents.searchclient?view=azure-python#azure-search-documents-searchclient-search
FINESSE_BACKEND_AZURE_SEARCH_PARAMS={"highlight_fields": "content", "highlight_pre_tag": "<strong>", "highlight_post_tag": "</strong>"}
2 changes: 2 additions & 0 deletions .gitignore
@@ -8,6 +8,8 @@ venv/
# Ignore IDE and editor-specific files (customize this based on your editor or IDE)
.idea/

.DS_Store

# Ignore environment-specific files
.env

5 changes: 2 additions & 3 deletions app/__init__.py
@@ -1,5 +1,4 @@
 from .app_creator import create_app
-from .config import Config
+from .config import create_config

-configuration = Config()
-app = create_app(configuration)
+app = create_app(create_config())
2 changes: 1 addition & 1 deletion app/app_creator.py
@@ -5,7 +5,7 @@
 def create_app(config):
     app = Flask(__name__)
     CORS(app)
-    app.config.from_object(config)
+    app.config.update(config)

     from .blueprints.monitor import monitor_blueprint
     from .blueprints.search import search_blueprint
8 changes: 7 additions & 1 deletion app/blueprints/search.py
@@ -56,8 +56,14 @@ def get_non_empty_query():

 @search_blueprint.route("/azure", methods=["POST"])
 def search_azure():
+    config = current_app.config
+    skip = request.args.get("skip", default=config["AZURE_SEARCH_SKIP"], type=int)
+    top = request.args.get("top", default=config["AZURE_SEARCH_TOP"], type=int)
     query = get_non_empty_query()
-    results = search(query, current_app.config["AZURE_CONFIG"])
+    search_params = {**config["AZURE_SEARCH_PARAMS"], "skip": skip, "top": top}
+    client = config["AZURE_SEARCH_CLIENT"]
+    transform_map = config["AZURE_SEARCH_TRANSFORM_MAP"]
+    results = search(query, client, search_params, transform_map)
     return jsonify(results)
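
As a rough illustration of what the new `skip` and `top` handling in `search_azure` does, the sketch below reproduces the merge outside Flask. `parse_int` and `build_search_params` are hypothetical helper names that do not appear in this commit; the fallback on bad input is meant to mirror `request.args.get(..., type=int)`, which returns the default when the conversion fails.

```python
def parse_int(value, default):
    """Return int(value), or default when value is missing or not an integer."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def build_search_params(base_params, query_args, default_skip=0, default_top=10):
    """Merge pagination values from the query string into the base search params.

    base_params stands in for config["AZURE_SEARCH_PARAMS"] and query_args for
    request.args; both are plain dicts here (an assumption made for the sketch).
    """
    skip = parse_int(query_args.get("skip"), default_skip)
    top = parse_int(query_args.get("top"), default_top)
    # Keys written last win, so skip/top always override anything in base_params.
    return {**base_params, "skip": skip, "top": top}
```

Because `skip` and `top` are written last in the merged dict, a `skip` or `top` entry placed in `FINESSE_BACKEND_AZURE_SEARCH_PARAMS` would be overridden by the request values or their defaults.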


126 changes: 69 additions & 57 deletions app/config.py
@@ -1,71 +1,83 @@
+import json
 import os
-from dataclasses import dataclass
+from typing import TypedDict

+import app.constants as constants
 from azure.core.credentials import AzureKeyCredential
 from azure.search.documents import SearchClient
 from dotenv import load_dotenv
-from index_search import AzureIndexSearchConfig

 load_dotenv()

-AZURE_SEARCH_ENDPOINT = ""
-AZURE_SEARCH_INDEX_NAME = ""
-AZURE_SEARCH_API_KEY = ""
-DEFAULT_DEBUG_MODE = "False"
-DEFAULT_ERROR_EMPTY_QUERY = "Search query cannot be empty"
-DEFAULT_ERROR_AZURE_FAILED = "Azure index search failed."
-DEFAULT_ERROR_FINESSE_DATA_FAILED = "finesse-data static search failed"
-DEFAULT_ERROR_UNEXPECTED = "Unexpected error."
-DEFAULT_FUZZY_MATCH_THRESHOLD = 90
-DEFAULT_ERROR_AILAB_FAILED = "Ailab-db search failed."
-DEFAULT_SANITIZE_PATTERN = (
-    "[^\w \d\"#\$%&'\(\)\*\+,-\.\/:;?@\^_`{\|}~]+|\%\w+|;|/|\(|\)"
-)
-DEFAULT_HIGHLIGHT_FIELDS = "content"
-DEFAULT_HIGHLIGHT_TAG = "strong"

+class Config(TypedDict):
+    AZURE_SEARCH_SKIP: int
+    AZURE_SEARCH_TOP: int
+    AZURE_SEARCH_CLIENT: SearchClient
+    AZURE_SEARCH_PARAMS: dict
+    AZURE_SEARCH_TRANSFORM_MAP: dict
+    FINESSE_DATA_URL: str
+    DEBUG: bool
+    ERROR_EMPTY_QUERY: str
+    ERROR_AZURE_FAILED: str
+    ERROR_FINESSE_DATA_FAILED: str
+    ERROR_AILAB_FAILED: str
+    ERROR_UNEXPECTED: str
+    FUZZY_MATCH_THRESHOLD: int
+    SANITIZE_PATTERN: str

-@dataclass
-class Config:
-    AZURE_CONFIG = AzureIndexSearchConfig(
-        client=SearchClient(
-            os.getenv("FINESSE_BACKEND_AZURE_SEARCH_ENDPOINT", AZURE_SEARCH_ENDPOINT),
-            os.getenv(
-                "FINESSE_BACKEND_AZURE_SEARCH_INDEX_NAME", AZURE_SEARCH_INDEX_NAME
-            ),
-            AzureKeyCredential(
-                os.getenv("FINESSE_BACKEND_AZURE_SEARCH_API_KEY", AZURE_SEARCH_API_KEY)
-            ),
-        ),
-        highlight_fields=os.getenv(
-            "FINESSE_BACKEND_HIGHLIGHT_FIELDS", DEFAULT_HIGHLIGHT_FIELDS
+
+def create_config() -> Config:
+    azure_search_client = SearchClient(
+        endpoint=os.getenv("FINESSE_BACKEND_AZURE_SEARCH_ENDPOINT", ""),
+        index_name=os.getenv("FINESSE_BACKEND_AZURE_SEARCH_INDEX_NAME", ""),
+        credential=AzureKeyCredential(
+            os.getenv("FINESSE_BACKEND_AZURE_SEARCH_API_KEY", "")
         ),
-        highlight_tag=os.getenv("FINESSE_BACKEND_HIGHLIGHT_TAG", DEFAULT_HIGHLIGHT_TAG),
     )
-    FINESSE_DATA_URL = os.getenv("FINESSE_BACKEND_STATIC_FILE_URL")
-    DEBUG = (
-        os.getenv("FINESSE_BACKEND_DEBUG_MODE", DEFAULT_DEBUG_MODE).lower() == "true"
-    )
-    ERROR_EMPTY_QUERY = os.getenv(
-        "FINESSE_BACKEND_ERROR_EMPTY_QUERY", DEFAULT_ERROR_EMPTY_QUERY
-    )
-    ERROR_AZURE_FAILED = os.getenv(
-        "FINESSE_BACKEND_ERROR_AZURE_FAILED", DEFAULT_ERROR_AZURE_FAILED
-    )
-    ERROR_FINESSE_DATA_FAILED = os.getenv(
-        "FINESSE_BACKEND_ERROR_FINESSE_DATA_FAILED", DEFAULT_ERROR_FINESSE_DATA_FAILED
-    )
-    ERROR_AILAB_FAILED = os.getenv(
-        "FINESSE_BACKEND_ERROR_AILAB_FAILED", DEFAULT_ERROR_AILAB_FAILED
-    )
-    ERROR_UNEXPECTED = os.getenv(
-        "FINESSE_BACKEND_ERROR_UNEXPECTED", DEFAULT_ERROR_UNEXPECTED
-    )
-    FUZZY_MATCH_THRESHOLD = int(
-        os.getenv(
-            "FINESSE_BACKEND_FUZZY_MATCH_THRESHOLD", DEFAULT_FUZZY_MATCH_THRESHOLD
-        )
+    azure_search_transform_map = (
+        json.loads(os.getenv("FINESSE_BACKEND_AZURE_SEARCH_TRANSFORM_MAP", "{}"))
+        or constants.DEFAULT_AZURE_SEARCH_TRANSFORM_MAP_JSON
     )
-    SANITIZE_PATTERN = os.getenv(
-        "FINESSE_BACKEND_SANITIZE_PATTERN", DEFAULT_SANITIZE_PATTERN
+    azure_search_params = (
+        json.loads(os.getenv("FINESSE_BACKEND_AZURE_SEARCH_PARAMS", "{}"))
+        or constants.DEFAULT_AZURE_SEARCH_PARAMS
     )
+
+    return {
+        "AZURE_SEARCH_SKIP": constants.DEFAULT_AZURE_SEARCH_SKIP,
+        "AZURE_SEARCH_TOP": constants.DEFAULT_AZURE_SEARCH_TOP,
+        "AZURE_SEARCH_CLIENT": azure_search_client,
+        "AZURE_SEARCH_PARAMS": azure_search_params,
+        "AZURE_SEARCH_TRANSFORM_MAP": azure_search_transform_map,
+        "FINESSE_DATA_URL": os.getenv("FINESSE_BACKEND_STATIC_FILE_URL"),
+        "DEBUG": os.getenv(
+            "FINESSE_BACKEND_DEBUG_MODE", constants.DEFAULT_DEBUG_MODE
+        ).lower()
+        == "true",
+        "ERROR_EMPTY_QUERY": os.getenv(
+            "FINESSE_BACKEND_ERROR_EMPTY_QUERY", constants.DEFAULT_ERROR_EMPTY_QUERY
+        ),
+        "ERROR_AZURE_FAILED": os.getenv(
+            "FINESSE_BACKEND_ERROR_AZURE_FAILED", constants.DEFAULT_ERROR_AZURE_FAILED
+        ),
+        "ERROR_FINESSE_DATA_FAILED": os.getenv(
+            "FINESSE_BACKEND_ERROR_FINESSE_DATA_FAILED",
+            constants.DEFAULT_ERROR_FINESSE_DATA_FAILED,
+        ),
+        "ERROR_AILAB_FAILED": os.getenv(
+            "FINESSE_BACKEND_ERROR_AILAB_FAILED", constants.DEFAULT_ERROR_AILAB_FAILED
+        ),
+        "ERROR_UNEXPECTED": os.getenv(
+            "FINESSE_BACKEND_ERROR_UNEXPECTED", constants.DEFAULT_ERROR_UNEXPECTED
+        ),
+        "FUZZY_MATCH_THRESHOLD": int(
+            os.getenv(
+                "FINESSE_BACKEND_FUZZY_MATCH_THRESHOLD",
+                str(constants.DEFAULT_FUZZY_MATCH_THRESHOLD),
+            )
+        ),
+        "SANITIZE_PATTERN": os.getenv(
+            "FINESSE_BACKEND_SANITIZE_PATTERN", constants.DEFAULT_SANITIZE_PATTERN
+        ),
+    }
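
One thing to watch with the `json.loads(os.getenv(..., "{}")) or default` pattern in `create_config`: a variable that is set but empty (for example `FINESSE_BACKEND_AZURE_SEARCH_PARAMS=` in `.env`) reaches `json.loads("")`, which raises `json.JSONDecodeError`. A minimal sketch of a more forgiving loader follows, assuming a hypothetical helper named `load_json_env` that is not part of this commit:

```python
import json
import os


def load_json_env(name, default):
    """Read a JSON object from the environment variable `name`.

    Falls back to `default` when the variable is unset, empty, or decodes to
    an empty value such as {}. Malformed JSON still raises json.JSONDecodeError
    so that configuration mistakes are not silently ignored.
    """
    raw = os.getenv(name, "").strip()
    if not raw:
        return default
    return json.loads(raw) or default
```

With such a helper, the two lookups could be written as `load_json_env("FINESSE_BACKEND_AZURE_SEARCH_PARAMS", constants.DEFAULT_AZURE_SEARCH_PARAMS)` and likewise for the transform map; whether that trade-off suits the project is a judgment call.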
48 changes: 48 additions & 0 deletions app/constants.py
@@ -0,0 +1,48 @@
+# Constants
+
+# Flag to determine if debug mode is active, default is False
+DEFAULT_DEBUG_MODE = "False"
+
+# Default error message for empty search queries
+DEFAULT_ERROR_EMPTY_QUERY = "Search query cannot be empty"
+
+# Default error message when Azure search service fails
+DEFAULT_ERROR_AZURE_FAILED = "Azure index search failed."
+
+# Default error message for failures in finesse-data static search
+DEFAULT_ERROR_FINESSE_DATA_FAILED = "finesse-data static search failed"
+
+# Default error message for any unexpected errors encountered
+DEFAULT_ERROR_UNEXPECTED = "Unexpected error."
+
+# Threshold for fuzzy match scoring, default set to 90%
+DEFAULT_FUZZY_MATCH_THRESHOLD = 90
+
+# Default error message when Ailab-db search fails
+DEFAULT_ERROR_AILAB_FAILED = "Ailab-db search failed."
+
+# Regular expression pattern for sanitizing search queries
+DEFAULT_SANITIZE_PATTERN = (
+    "[^\w \d\"#\$%&'\(\)\*\+,-\.\/:;?@\^_`{\|}~]+|\%\w+|;|/|\(|\)"
+)
+
+# Default number of search results to skip in Azure search, default is 0
+DEFAULT_AZURE_SEARCH_SKIP = 0
+
+# Default number of search results to return from Azure search, default is 10
+DEFAULT_AZURE_SEARCH_TOP = 10
+
+# Mapping of Azure search result fields to desired output structure.
+# Knowledge of the index search result structure is required.
+DEFAULT_AZURE_SEARCH_TRANSFORM_MAP_JSON = {
+    "id": "/id",
+    "title": "/title",
+    "score": "/@search.score",
+    "url": "/url",
+    "content": "/content",
+    "last_updated": "/last_updated",
+}
+
+# Default parameters for Azure search highlighting
+# Consult https://learn.microsoft.com/en-us/python/api/azure-search-documents/azure.search.documents.searchclient?view=azure-python#azure-search-documents-searchclient-search
+DEFAULT_AZURE_SEARCH_PARAMS = {}
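
The paths in `DEFAULT_AZURE_SEARCH_TRANSFORM_MAP_JSON` read like slash-separated pointers into each search result. The actual resolution happens in the `index_search` package, which this commit does not show, so the following is only a sketch of how such a map could be applied; `resolve_path` and `transform` are hypothetical names, and returning `None` for a missing path is an assumption.

```python
def resolve_path(document, path, default=None):
    """Follow a slash-separated path such as "/@search.highlights/content/0"."""
    current = document
    for token in path.strip("/").split("/"):
        if isinstance(current, list):
            # List steps are addressed by integer index.
            try:
                current = current[int(token)]
            except (ValueError, IndexError):
                return default
        elif isinstance(current, dict):
            if token not in current:
                return default
            current = current[token]
        else:
            # Reached a scalar before the path was exhausted.
            return default
    return current


def transform(document, transform_map):
    """Build an output dict by resolving every path in transform_map."""
    return {key: resolve_path(document, path) for key, path in transform_map.items()}
```

Under this reading, the `.env.template` value `"content": "/@search.highlights/content/0"` would select the first highlighted fragment, while the default `"/content"` would select the raw field.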
13 changes: 8 additions & 5 deletions docs/USAGE.md
@@ -37,25 +37,28 @@ docker run -p 5000:5000 -e PORT=$PORT --env-file .env finesse-backend

## Check if the API is working properly

Test the path: `/search/static`
### Test the path: `/search/static`

```bash
curl -X POST http://localhost:5000/search/static --data '{"query": "is e.coli a virus or bacteria?"}' -H "Content-Type: application/json"
```

Test the path: `/search/azure`
### Test the path: `/search/azure`

```bash
curl -X POST http://localhost:5000/search/azure --data '{"query": "is e.coli a virus or bacteria?"}' -H "Content-Type: application/json"
curl -X POST "http://localhost:5000/search/azure?top=10&skip=0" --data '{"query": "is e.coli a virus or bacteria?"}' -H "Content-Type: application/json"
```

Test the path: `/search/ailab`
- `top` (optional): Number of search results to return.
- `skip` (optional): Number of search results to skip from the start.
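
For callers that page through results programmatically, the same request can be built from Python. This is a minimal sketch using only the standard library; `build_azure_search_request` and `fetch_page` are hypothetical helper names, and the base URL assumes the local Docker setup described above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_azure_search_request(base_url, query, top=10, skip=0):
    """Build a POST request for /search/azure with top/skip in the query string."""
    url = f"{base_url}/search/azure?{urlencode({'top': top, 'skip': skip})}"
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_page(base_url, query, page, page_size=10):
    """Fetch one zero-based page of results; requires a running backend."""
    request = build_azure_search_request(
        base_url, query, top=page_size, skip=page * page_size
    )
    with urlopen(request) as response:
        return json.load(response)
```

For example, `fetch_page("http://localhost:5000", "is e.coli a virus or bacteria?", page=2)` would send `top=10&skip=20`.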

### Test the path: `/search/ailab`

```bash
curl -X POST http://localhost:5000/search/ailab --data '{"query": "is e.coli a virus or bacteria?"}' -H "Content-Type: application/json"
```

JSON structure explanation:
### JSON structure explanation

- id: The unique identifier for each document.
- url: The URL of the document, which should point to inspection.canada.ca.
2 changes: 1 addition & 1 deletion requirements-production.txt
@@ -4,6 +4,6 @@ flask-cors==4.0.0 # Released: 2023-06-26
gunicorn==21.2.0 # Released: 2023-07-19
python-dotenv==1.0.0 # Released: 2023-02-24
git+https://github.com/ai-cfia/azure-db.git@main#subdirectory=azure-ai-search
fuzzywuzzy==0.18.0
fuzzywuzzy==0.18.0
python-Levenshtein== 0.23.0
git+https://github.com/ai-cfia/ailab-db@main
29 changes: 0 additions & 29 deletions tests/check_connection.py

This file was deleted.
