This example leverages OpenAI for identifying PII (names, addresses, DOB) and PDFTron for text extraction and redaction.
Inside server/, create a new file called config.env and add your PDFTron demo key and your OpenAI API key:
PORT=9000
PDFTRONKEY=
OPENAI_API_KEY=
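All three values need to be set before the server can do useful work. The following is a minimal sketch of a startup check, assuming the values from config.env end up on process.env (for example through a loader such as dotenv, which is an assumption about the setup). The helper name missingConfigKeys is hypothetical and not part of the repository.

```javascript
// Hypothetical helper: returns the names of required settings that are
// missing or blank in the given environment object.
function missingConfigKeys(env, required = ['PORT', 'PDFTRONKEY', 'OPENAI_API_KEY']) {
  return required.filter((key) => String(env[key] ?? '').trim() === '');
}
```

Calling missingConfigKeys(process.env) at startup and logging the returned names is one way to catch an empty PDFTRONKEY or OPENAI_API_KEY early.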
Then, from the repository root, run the client and the server in separate terminals:
cd client
npm i
npm start
cd server
npm i
npm start
The Node.js server acts as file storage. The PDFTron Node.js SDK extracts text, searches it, and creates markup annotations. OpenAI detects names and addresses in the text provided by PDFTron.
getNamesAndAddressesFromOpenAI accepts text extracted from a document and builds a prompt containing a natural-language command to extract names and addresses. The prompt can be modified to search for other information. For testing purposes the function is commented out; uncomment it and adjust the prompt as needed.
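const getNamesAndAddressesFromOpenAI = async (text) => {
  return await openai.createCompletion('text-davinci-002', {
    // Natural-language instruction followed by the extracted document text
    prompt: `Extract names and address from this text: ${text}`,
    temperature: 0, // lowest randomness, suited to extraction
    max_tokens: 64, // upper bound on the completion length
    top_p: 1.0,
    frequency_penalty: 0.0,
    presence_penalty: 0.0,
  });
};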
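The completion comes back as free text, so it has to be turned into individual terms before PDFTron can search for them. The sketch below shows one way to do that. It assumes the client returns an axios-style response with the text at response.data.choices[0].text, and that the model lists one item per line, optionally with a bullet, a number, or a label such as "Names:". Neither assumption is guaranteed, and both helper names are hypothetical.

```javascript
// Hypothetical helper: pulls the completion text out of the response.
// Assumes an axios-style shape: response.data.choices[0].text.
function extractCompletionText(response) {
  return response?.data?.choices?.[0]?.text?.trim() ?? '';
}

// Hypothetical helper: turns raw completion text into unique search terms.
// Assumes one item per line; real model output may be formatted differently.
function parseEntities(completionText) {
  const seen = new Set();
  const entities = [];
  for (const rawLine of completionText.split(/\r?\n/)) {
    const line = rawLine
      .replace(/^\s*(?:[-*\u2022]|\d+[.)])\s*/, '') // strip bullets and numbering
      .replace(/^(?:names?|address(?:es)?)\s*:\s*/i, '') // strip leading labels
      .trim();
    if (line && !seen.has(line)) {
      seen.add(line);
      entities.push(line);
    }
  }
  return entities;
}
```

Each returned string can then be used as a search term in the document. Splitting on line breaks rather than commas keeps addresses such as "12 Main St, Springfield" in one piece.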
Contract summarization works much like the PII search: Tl;dr is appended to the end of the text to be summarized, which prompts the model to produce a summary. For testing purposes the function is commented out; uncomment it and adjust the prompt as needed.
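const summarizeTheContract = async (text) => {
  return await openai.createCompletion('text-davinci-002', {
    // The trailing "Tl;dr" cues the model to summarize the preceding text
    prompt: `${text} \n\nTl;dr`,
    temperature: 0.7, // some randomness for more natural wording
    max_tokens: 60, // upper bound on the summary length
    top_p: 1.0,
    frequency_penalty: 0.0,
    presence_penalty: 0.0,
  });
};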
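Long contracts may not fit into a single request, because the prompt and the completion share the model's limited context. As a rough sketch, the prompt could be built by a helper that truncates the input before appending Tl;dr. The helper name buildSummaryPrompt and the default maxChars value are assumptions for illustration; a character count is only a crude stand-in for a token count.

```javascript
// Hypothetical helper: truncates the text to maxChars characters and
// appends the "Tl;dr" cue. maxChars is an arbitrary illustrative default.
function buildSummaryPrompt(text, maxChars = 8000) {
  const body = text.length > maxChars ? text.slice(0, maxChars) : text;
  return `${body.trim()}\n\nTl;dr`;
}
```

Truncation discards everything past the cutoff, so a summary produced this way only reflects the start of the document.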
Here is a sample summary of the file in the repository:
This is a contract between a company and a bank for the sale of goods. The company agrees to sell the goods to the bank for a sum of money, and the bank agrees to purchase the goods from the company. The contract includes terms and conditions for the sale and purchase of the goods