PHOTON is an advanced personal desktop assistant inspired by JARVIS from Iron Man. Built on the Groq API and a Llama model, the assistant handles complex command structures and can invoke agents and tools to execute specific tasks. It operates as a conversational AI with the capability to transcribe, summarize, and respond intelligently based on context.
This project consists of several key files, each playing a crucial role in the assistant's functionality.
The core of the PHOTON assistant, `photon.py`, handles the main functionality, including:
- Speech Recognition: Converts voice input into text using a speech-to-text system.
- Model Response Generation: Sends the transcribed text to the Groq model, which processes the query and generates a response.
- Text-to-Speech Conversion: Converts the model's response back into speech for output.
- Agent Invocation: Based on the system prompt format, `photon.py` can call `agent.py` to handle specific tasks. If the model responds in a format that indicates an agent is needed, the message is routed to the agent, which then invokes the necessary tools to fulfill the request.
The system prompt in `photon.py` defines the assistant's personality, behavior, and response style, ensuring PHOTON operates with precision and efficiency.
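The README does not include the source itself, so the following is only a minimal sketch of that listen-respond-speak loop, assuming the `speech_recognition`, `pyttsx3`, and `groq` packages; the model name and system prompt below are placeholders, not the project's actual values:

```python
import speech_recognition as sr
import pyttsx3
from groq import Groq

client = Groq()          # reads GROQ_API_KEY from the environment
tts = pyttsx3.init()     # offline text-to-speech engine
recognizer = sr.Recognizer()

SYSTEM_PROMPT = "You are PHOTON, a precise and efficient desktop assistant."  # placeholder

def listen() -> str:
    """Capture microphone audio and return the transcribed text."""
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)

def respond(text: str) -> str:
    """Send the transcription to the Groq-hosted Llama model and return its reply."""
    completion = client.chat.completions.create(
        model="llama3-70b-8192",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    return completion.choices[0].message.content

def speak(text: str) -> None:
    """Read the model's reply aloud."""
    tts.say(text)
    tts.runAndWait()

if __name__ == "__main__":
    while True:
        speak(respond(listen()))
```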
`agent.py` is responsible for initializing and managing the agent. Key functions include:
- Embedding Model Initialization: Initializes an embedding model from HuggingFace for vector storage and retrieval.
- Groq Model Initialization: Sets up the Groq model to support agent operations.
- Vector Storage: Uses LlamaIndex's `SimpleDirectoryReader` to vectorize and store data for use in queries.
- Agent Initialization: The agent is set up to handle specific tasks and has access to various tools (imported from `tools.py`), which it can invoke based on user commands.
- Storage Folder Creation: When the agent processes and vectorizes data, it saves the index in a `storage/` folder for efficient retrieval in future interactions.
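The README does not reproduce this code, so the sketch below shows only one way the pieces could fit together, assuming recent `llama-index` packages (`llama-index-llms-groq`, `llama-index-embeddings-huggingface`), a placeholder model name, and a hypothetical `open_app` function exported by `tools.py`:

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.groq import Groq

from tools import open_app  # hypothetical tool function from tools.py

# Embedding model from HuggingFace, used to vectorize local documents.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Groq-hosted Llama model that powers the agent's reasoning.
Settings.llm = Groq(model="llama3-70b-8192")  # placeholder model name

# Vectorize everything under data/.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Expose the local knowledge base and the helper functions as tools.
knowledge_tool = QueryEngineTool.from_defaults(
    index.as_query_engine(),
    name="local_knowledge",
    description="Answers questions from the local data/ folder.",
)
app_tool = FunctionTool.from_defaults(
    fn=open_app, description="Open a desktop application by name."
)

# The agent decides which tool to invoke for each routed command.
agent = ReActAgent.from_tools([knowledge_tool, app_tool], verbose=True)

if __name__ == "__main__":
    print(agent.chat("Summarize the documents in my knowledge base."))
```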
`tools.py` houses various helper functions and tools required by the agent to perform specific actions. These tools are:
- Specific Functions: Each tool is a standalone function written to handle particular tasks requested by the agent.
- Importable by Agent: `tools.py` functions are imported into `agent.py`, making them accessible to the agent when specific operations are needed.
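The concrete tools are not listed in this README; a representative, hypothetical example of the pattern, wrapped as a LlamaIndex `FunctionTool` so the agent can invoke it, might be:

```python
import subprocess
from llama_index.core.tools import FunctionTool

def open_app(app_name: str) -> str:
    """Open a desktop application by name and report the result."""
    try:
        subprocess.Popen([app_name])
        return f"Opened {app_name}."
    except FileNotFoundError:
        return f"Could not find an application called {app_name}."

# Wrapping the function lets the agent call it by name with structured arguments.
open_app_tool = FunctionTool.from_defaults(fn=open_app)
```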
`transcriptions.py` operates independently from the main agent system, focusing solely on transcription tasks. Its functionalities include:
- Speech-to-Text Conversion: Similar to `photon.py`, this file uses a speech-to-text system to transcribe spoken input.
- LLM Integration for Refinement: Transcribed text is sent through the language model (LLM) for refinement and rephrasing, ensuring clear and coherent output.
- Automated Typing: Once refined, the output text is either displayed or typed out on the user's computer.
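As a rough illustration only, assuming the `speech_recognition`, `groq`, and `pyautogui` packages (none of which are confirmed by this README) and a placeholder model name, the pipeline could be sketched as:

```python
import speech_recognition as sr
import pyautogui
from groq import Groq

client = Groq()
recognizer = sr.Recognizer()

def transcribe() -> str:
    """Capture microphone audio and return the raw transcription."""
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)

def refine(text: str) -> str:
    """Ask the LLM to clean up and rephrase the raw transcription."""
    completion = client.chat.completions.create(
        model="llama3-70b-8192",  # placeholder model name
        messages=[
            {"role": "system", "content": "Rewrite the user's dictation as clear, well-punctuated text."},
            {"role": "user", "content": text},
        ],
    )
    return completion.choices[0].message.content

if __name__ == "__main__":
    refined = refine(transcribe())
    pyautogui.write(refined)  # type the refined text into the active window
```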
- `data/` Folder: Contains data files and any resources needed for your local knowledge base. It can include PDFs, TXT files, JSON files, and more.
- `storage/` Folder: Automatically created by `agent.py` for storing vectorized data used in queries. This folder houses index files for efficient information retrieval.
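The load-or-build pattern this implies, sketched with LlamaIndex's storage utilities (and assuming the embedding model is configured as in the earlier `agent.py` sketch), might look like:

```python
import os
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

PERSIST_DIR = "storage"  # created automatically on the first run

def get_index():
    """Load the saved index if storage/ exists, otherwise build it from data/ and persist it."""
    if os.path.isdir(PERSIST_DIR):
        storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
        return load_index_from_storage(storage_context)
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
    return index
```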
The project's dependencies are listed in `requirements.txt`.
To install dependencies, run:

```bash
pip install -r requirements.txt
```
- Run the Assistant: Start the assistant by running `python photon.py`.
- Voice Interaction: Speak to PHOTON to initiate commands. It will transcribe your speech, process it through the model, and respond with audio output.
- Agent Invocation: If the assistant's response requires specific actions, it will automatically call `agent.py` to handle the request, using the tools from `tools.py` as needed.
- Transcriptions Only: For transcription-focused tasks, use `transcriptions.py` independently to transcribe, refine, and output text without engaging the full conversational model.
- Speech to Text: User speech is transcribed into text, which is then fed to the Groq model for processing.
- Model Processing: The LLM in `photon.py` generates a response based on the prompt and context.
- Agent and Tools: If the model determines an agent is needed, `agent.py` is called, which can use tools in `tools.py` to perform specific tasks (a possible routing check is sketched after this list).
- Text to Speech: The assistant's response is converted to speech and delivered to the user.
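The exact response format is defined by the system prompt and is not documented in this README; purely as an illustration, one simple convention is a sentinel prefix that `photon.py` checks before routing to the agent:

```python
AGENT_PREFIX = "[AGENT]"  # hypothetical sentinel the system prompt asks the model to emit

def route(response: str) -> str:
    """Send agent-formatted responses to agent.py; return everything else directly."""
    if response.startswith(AGENT_PREFIX):
        from agent import agent                      # agent initialized as sketched above
        task = response[len(AGENT_PREFIX):].strip()  # strip the sentinel, keep the task
        return str(agent.chat(task))
    return response
```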
- Conversational AI: Engages users in natural language, handling tasks and queries seamlessly.
- Contextual Transcription: `transcriptions.py` provides transcription capabilities with context-aware rephrasing.
- Tool-Based Functionality: Tools in `tools.py` offer modular, expandable functionality for diverse operations.
- Vectorized Query Storage: Efficient data storage in `storage/` allows for quick retrieval in agent operations.
Potential areas for further development include:
- Expanding tool functions in `tools.py` for more complex tasks.
- Enhancing agent capabilities in `agent.py` for smarter decision-making.
- Adding more language support and refining transcription accuracy in `transcriptions.py`.