Mahesh Maan edited this page Aug 5, 2021 · 3 revisions

Components

[This page is work-in-progress (Aug 05, 2021)]

PQAI components are versatile building blocks that are frequently needed when building applications in the patent AI space.

Each component does a well-defined job. For example, a Filter component filters out patents that don’t satisfy a given filter condition (e.g., published before a given date, or mentioning certain keywords).

List of PQAI components:

  1. Database
  2. Classifier
  3. Filter
  4. Ranker
  5. Indexer
  6. Index
  7. Encoder
  8. Consolidator

Patent Database

This is an instance of the abstract component Storage (described at the end of this page). Patent Database is a special component in the sense that all components can be configured to access it. A major benefit of this approach is that components can pass around references into the Patent Database (e.g., patent numbers) instead of the patent data itself. This keeps the component interfaces clean and lightweight.

Encoder

An encoder takes in an entity and returns its representation. The input entities and output representations can both take many forms, making this component very versatile. For instance, one kind of encoder is a Patent Vectorizer, which accepts a patent number (as described earlier, all components can retrieve patent data given the patent number) and returns a vector embedding of that patent in a high-dimensional space. A bag-of-words encoder is another example of this component.
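As a rough illustration of the bag-of-words case, the sketch below maps a piece of text to term counts. The class name `BagOfWordsEncoder`, its `encode` method, and the tokenization rule are assumptions made for this example, not PQAI's actual interface.

```python
import re
from collections import Counter


class BagOfWordsEncoder:
    """Hypothetical encoder: text in, term-count representation out."""

    def encode(self, text: str) -> Counter:
        # Lowercase the text and keep runs of letters/digits as tokens.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        return Counter(tokens)


encoder = BagOfWordsEncoder()
rep = encoder.encode("A solar cell converts light; the cell is thin.")
print(rep["cell"], rep["solar"])
```

A Patent Vectorizer would follow the same shape, except that its input would be a patent number and its output a dense vector.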

Index

An index is a data structure optimized for searching among entity representations. It differs from a Storage in that it may not necessarily be able to return the original representation. It accepts a compatible query and returns a set of entity pointers. A Patent Vector Index, for example, might accept a query vector and return a set of patent numbers as top matches for the query. Note that the Index accepts query representations and not raw queries. It therefore has to be plugged into a suitable Encoder that turns the raw query into a compatible representation.
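A minimal sketch of a vector index, assuming brute-force cosine similarity over in-memory vectors. The class `VectorIndex`, its `add` and `search` methods, and the patent numbers are made up for illustration. A production index would more likely use an approximate nearest-neighbour structure.

```python
import math


class VectorIndex:
    """Hypothetical index: stores vectors, returns entity pointers for a query vector."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add(self, entity_id, vector):
        self._ids.append(entity_id)
        self._vectors.append(list(vector))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query_vector, n=10):
        # Score every stored vector, then return the ids of the n best matches.
        scored = [
            (self._cosine(query_vector, vec), entity_id)
            for entity_id, vec in zip(self._ids, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [entity_id for _, entity_id in scored[:n]]


index = VectorIndex()
index.add("US0000001", [1.0, 0.0])  # placeholder patent numbers
index.add("US0000002", [0.0, 1.0])
index.add("US0000003", [0.7, 0.7])

# The query is already a vector; an Encoder would produce it from a raw query.
top = index.search([1.0, 0.1], n=2)
print(top)
```

Note that `search` returns only patent numbers, never the stored vectors, which matches the point that an index need not give back the original representation.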

Ranker

A ranker accepts a set of entities and returns a list of the same entities, ordered according to a ranking criterion. A Patent Ranker, for instance, would accept a set of patents and a user query as input and order those patents by decreasing relevance to the given query.
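The following sketch ranks patents by how many query terms appear in their text. The overlap score is a deliberately simple stand-in for a real relevance model, and `KeywordOverlapRanker`, `rank`, and the `db` dictionary (standing in for the Patent Database) are hypothetical names.

```python
import re


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KeywordOverlapRanker:
    """Hypothetical ranker: orders patents by number of shared query terms."""

    def __init__(self, database):
        self._db = database  # maps patent number -> text (stand-in for Patent Database)

    def rank(self, patent_numbers, query):
        query_terms = tokenize(query)

        def score(patent_number):
            return len(query_terms & tokenize(self._db[patent_number]))

        # Highest score first, i.e. decreasing relevance.
        return sorted(patent_numbers, key=score, reverse=True)


db = {
    "P1": "A bicycle frame made of carbon fiber",
    "P2": "A solar cell with a perovskite absorber layer",
    "P3": "A mounting bracket for a solar panel",
}
ranked = KeywordOverlapRanker(db).rank({"P1", "P2", "P3"}, "perovskite solar cell")
print(ranked)
```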

Classifier

A classifier assigns one of a finite set of predefined labels to a patent, where each label has a fixed meaning. A Patent Classifier, for instance, could take a set of patent numbers as input and assign a label to each one, indicating for example whether the patent is related to solar cell technology or not. Internally, classifiers can make use of configurable classifier models, which can be initialized with inputs such as (patent-number, label) pairs or a textual description.
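To make the textual-description case concrete, here is a toy classifier that is initialized with a description and labels a patent as related when enough description terms appear in it. The keyword-counting rule, the `min_hits` threshold, the label strings, and all names are assumptions for this sketch. A real classifier model would be trained rather than rule-based.

```python
import re


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KeywordClassifier:
    """Hypothetical classifier initialized from a textual description."""

    def __init__(self, database, description, min_hits=2):
        self._db = database  # maps patent number -> text (stand-in for Patent Database)
        self._keywords = tokenize(description)
        self._min_hits = min_hits

    def classify(self, patent_numbers):
        labels = {}
        for patent_number in patent_numbers:
            hits = len(self._keywords & tokenize(self._db[patent_number]))
            labels[patent_number] = "related" if hits >= self._min_hits else "unrelated"
        return labels


db = {
    "P1": "A bicycle frame made of carbon fiber",
    "P2": "A solar cell with a perovskite absorber layer",
    "P3": "A mounting bracket for a solar panel",
}
classifier = KeywordClassifier(db, "solar cell photovoltaic")
labels = classifier.classify(["P1", "P2", "P3"])
print(labels)
```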

Consolidator

A consolidator accepts a set of patents and assigns one of a finite set of arbitrary labels to each patent. Essentially, it creates clusters of patents such that the patents in each cluster share some characteristics. A Technology Consolidator, for instance, can accept a set of patent numbers and then sort them into, say, 3 groups, depending on the technologies they relate to.
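One simple way to picture this is a single-pass greedy clustering, sketched below: each patent joins the first cluster whose seed text is similar enough, or else starts a new cluster. The Jaccard similarity, the threshold value, and the names `GreedyConsolidator` and `consolidate` are illustrative assumptions, not PQAI's method.

```python
import re


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class GreedyConsolidator:
    """Hypothetical consolidator: assigns arbitrary integer cluster labels."""

    def __init__(self, database, threshold=0.3):
        self._db = database  # maps patent number -> text (stand-in for Patent Database)
        self._threshold = threshold

    def consolidate(self, patent_numbers):
        seeds = []   # token set of the first member of each cluster
        labels = {}
        for patent_number in patent_numbers:
            terms = tokenize(self._db[patent_number])
            for label, seed in enumerate(seeds):
                if jaccard(terms, seed) >= self._threshold:
                    labels[patent_number] = label
                    break
            else:
                # No cluster was similar enough, so start a new one.
                seeds.append(terms)
                labels[patent_number] = len(seeds) - 1
        return labels


db = {
    "P1": "solar cell with perovskite layer",
    "P2": "solar cell with silicon layer",
    "P3": "bicycle frame of carbon fiber",
    "P4": "carbon fiber bicycle wheel",
}
groups = GreedyConsolidator(db).consolidate(["P1", "P2", "P3", "P4"])
print(groups)
```

Unlike a classifier's labels, the integers here carry no meaning of their own. They only say which patents belong together.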

Filter

A filter accepts a set of entities and returns the subset that satisfies a filter criterion. A Patent Filter, for instance, would keep only the patents that satisfy a condition such as a publication date criterion. Filters can be cascaded to create a Filter Sequence. For instance, a date period filter can be created by cascading a before-date filter and an after-date filter.
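The date period example can be sketched as follows. The class names, the `apply` method, and the `db` dictionary of publication dates (standing in for the Patent Database) are hypothetical.

```python
from datetime import date


class PublishedBefore:
    """Hypothetical filter: keeps patents published before a cutoff date."""

    def __init__(self, database, cutoff):
        self._db = database  # maps patent number -> publication date
        self._cutoff = cutoff

    def apply(self, patent_numbers):
        return [pn for pn in patent_numbers if self._db[pn] < self._cutoff]


class PublishedAfter:
    """Hypothetical filter: keeps patents published after a cutoff date."""

    def __init__(self, database, cutoff):
        self._db = database
        self._cutoff = cutoff

    def apply(self, patent_numbers):
        return [pn for pn in patent_numbers if self._db[pn] > self._cutoff]


class FilterSequence:
    """Applies filters one after another; the output of each feeds the next."""

    def __init__(self, *filters):
        self._filters = filters

    def apply(self, patent_numbers):
        for current_filter in self._filters:
            patent_numbers = current_filter.apply(patent_numbers)
        return patent_numbers


db = {
    "P1": date(2009, 5, 1),
    "P2": date(2015, 3, 10),
    "P3": date(2021, 8, 5),
}
period = FilterSequence(
    PublishedAfter(db, date(2010, 1, 1)),
    PublishedBefore(db, date(2020, 1, 1)),
)
kept = period.apply(["P1", "P2", "P3"])
print(kept)
```

Because a `FilterSequence` exposes the same `apply` method as a single filter, it can itself be nested inside another sequence.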

Sorter

The input-output characteristics of a sorter are similar to those of a ranker, but in its output only the relative positions of the entities matter. A Patent Sorter can, for instance, accept a list of patent numbers and arrange them such that each patent in the list is followed by the patent most similar to it among those remaining. Such a sorter can be useful during a manual review of patents (all related patents come in sequence, so the reviewer can make use of insights still available in their short-term memory).
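The arrangement described above can be sketched as a greedy chain: start with the first patent, then repeatedly append whichever remaining patent is most similar to the last one placed. The cosine similarity over toy vectors and the names `SimilarityChainSorter` and `sort` are assumptions for this example.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SimilarityChainSorter:
    """Hypothetical sorter: each patent is followed by its most similar remaining one."""

    def __init__(self, vectors):
        self._vectors = vectors  # maps patent number -> vector (e.g. from an Encoder)

    def sort(self, patent_numbers):
        remaining = list(patent_numbers)
        if not remaining:
            return []
        ordered = [remaining.pop(0)]
        while remaining:
            last = self._vectors[ordered[-1]]
            best = max(remaining, key=lambda pn: cosine(last, self._vectors[pn]))
            remaining.remove(best)
            ordered.append(best)
        return ordered


vectors = {
    "A": [1.0, 0.0],
    "B": [0.0, 1.0],
    "C": [0.9, 0.1],
    "D": [0.1, 0.9],
}
chain = SimilarityChainSorter(vectors).sort(["A", "B", "C", "D"])
print(chain)
```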

Patent Number Parser

The parser accepts plain text as input, detects and extracts any patent numbers in it, and translates them into a standard format (e.g., by truncating or adding zeros). It then outputs a list of patent numbers that can be passed directly to the Patent Database component. This component, when used at the boundary of a patent data mining system, can eliminate issues that arise from mismatched patent number formats.
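A rough sketch of such a parser for US-style numbers is given below. The target format (the prefix `US`, seven zero-padded digits, and an optional kind code such as `B2`) is an assumption chosen for illustration, as are the regular expression and the function name `parse_patent_numbers`. The format that PQAI actually standardizes on is not stated here.

```python
import re

# "US", optional spaces/hyphens, digits (commas allowed), optional kind code.
_PATTERN = re.compile(r"\bUS[\s-]*(\d[\d,]*\d)[\s-]*([AB]\d)?\b")


def parse_patent_numbers(text):
    """Extract US-style patent numbers and normalize them (assumed format)."""
    results = []
    for match in _PATTERN.finditer(text):
        digits = match.group(1).replace(",", "")
        # Drop extra leading zeros, then pad back to seven digits.
        digits = digits.lstrip("0").zfill(7)
        kind_code = match.group(2) or ""
        results.append("US" + digits + kind_code)
    return results


numbers = parse_patent_numbers(
    "Prior art: US 7,654,321 B2, US123456 and US-08765432."
)
print(numbers)
```

The example input uses made-up numbers in three spellings. One has commas and a kind code, one needs a zero added, and one needs a zero removed.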

Storage

Storage is an abstract wrapper around a data source. It stores entities that are all of the same type, but beyond that it makes no assumption about how the data is stored (e.g., whether it is stored in a local database, in primary memory, or on a remote server). A Storage component performs two operations: it saves and retrieves entities. In the save operation, it accepts an entity and returns its entity identifier. The retrieve operation is the opposite: an entity is returned in response to a supplied identifier. A Storage can also be configured to be read-only.
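The save and retrieve operations can be sketched with an in-memory implementation. The class `InMemoryStorage`, its method names, the integer identifiers, and the choice of `PermissionError` for read-only violations are all assumptions for this example.

```python
import itertools


class InMemoryStorage:
    """Hypothetical Storage backed by a dictionary in primary memory."""

    def __init__(self, entities=(), read_only=False):
        self._items = {}
        self._next_id = itertools.count(1)
        self._read_only = False
        # Preload any initial entities before the read-only flag takes effect.
        for entity in entities:
            self.save(entity)
        self._read_only = read_only

    def save(self, entity):
        if self._read_only:
            raise PermissionError("storage is read-only")
        entity_id = next(self._next_id)
        self._items[entity_id] = entity
        return entity_id

    def retrieve(self, entity_id):
        return self._items[entity_id]


store = InMemoryStorage()
patent_id = store.save({"title": "Example patent"})
entity = store.retrieve(patent_id)

# A read-only Storage can still be preloaded and queried.
frozen = InMemoryStorage(entities=["archived record"], read_only=True)
print(entity, frozen.retrieve(1))
```

A Storage backed by a local database or a remote server could expose the same two methods, so callers would not need to know where the data lives.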
