An automated document analyzer for Paperless-ngx that uses the OpenAI API or Ollama (Mistral, Llama, Phi-3, Gemma 2) to analyze and tag your documents.
- 🔍 Automatic document scanning in Paperless-ngx
- 🤖 AI-powered document analysis using the OpenAI API or Ollama (Mistral, Llama, Phi-3, Gemma 2)
- 🏷️ Automatic tag and correspondent assignment
- 🔨 (NEW) Manual mode for analyzing documents by hand with AI assistance
- 🚀 Easy setup through the web interface
- 📊 Document processing dashboard
- 🔄 Automatic restart and health monitoring
- 🛡️ Error handling and graceful shutdown
- 🐳 Docker support with health checks

- Docker and Docker Compose
- Access to a Paperless-ngx installation
- An OpenAI API key, or your own Ollama instance that is running, reachable, and serving your chosen model
- Basic understanding of cron syntax (for scan interval configuration)
docker pull clusterzx/paperless-ai
- Clone the repository:
git clone https://github.com/clusterzx/paperless-ai.git
cd paperless-ai
- Start the container:
docker-compose up -d
- Open your browser and navigate to:
http://localhost:3000
- Complete the setup by providing:
  - Paperless-ngx API URL
  - Paperless-ngx API token
  - Ollama API data, or
  - OpenAI API key
  - Scan interval (default: every 30 minutes)
- Document Discovery
  - Periodically scans Paperless-ngx for new documents
  - Tracks processed documents in a local SQLite database
- AI Analysis
  - Sends document content to the OpenAI API or Ollama for analysis
  - Extracts relevant tags and correspondent information
  - Uses GPT-4o-mini or your chosen Ollama model to interpret the document
- Automatic Organization
  - Creates new tags if they don't exist
  - Creates new correspondents if they don't exist
  - Updates documents with the analyzed information
  - Marks documents as processed to avoid duplicate analysis
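
The discovery and organization steps above can be sketched as two small pure functions. This is only an illustration under stated assumptions, not the project's actual code: the helper names `selectUnprocessed` and `planTags`, the `{ id, name }` tag shape, and the case-insensitive matching rule are all hypothetical.

```javascript
// Hypothetical helpers; none of these names come from the project's source.

// Return only the documents whose IDs are not yet recorded as processed.
function selectUnprocessed(documents, processedIds) {
  const seen = new Set(processedIds);
  return documents.filter((doc) => !seen.has(doc.id));
}

// Split AI-suggested tag names into tags that already exist (matched
// case-insensitively, an assumption) and names that would need creating.
function planTags(suggestedNames, existingTags) {
  const byName = new Map(existingTags.map((t) => [t.name.toLowerCase(), t.id]));
  const existingIds = new Set();
  const toCreate = [];
  for (const raw of suggestedNames) {
    const name = raw.trim();
    if (name === '') continue; // ignore empty suggestions
    const key = name.toLowerCase();
    if (byName.has(key)) {
      existingIds.add(byName.get(key));
    } else if (!toCreate.some((n) => n.toLowerCase() === key)) {
      toCreate.push(name); // first spelling wins, duplicates are dropped
    }
  }
  return { existingIds: [...existingIds], toCreate };
}

// Example data, invented for illustration.
const docs = [{ id: 1 }, { id: 2 }, { id: 3 }];
const pending = selectUnprocessed(docs, [2]);
const plan = planTags(['Invoice', 'insurance', 'Invoice'], [{ id: 7, name: 'invoice' }]);
console.log(pending.map((d) => d.id), plan);
```

Keeping these steps free of network and database calls makes them easy to test; in a real deployment the processed IDs would come from the SQLite database and the existing tags from Paperless-ngx.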
Manual mode lets you analyze documents by hand with AI assistance in the web interface. It is reachable at the /manual endpoint.
The application can be configured through environment variables:
Variable | Description | Default |
---|---|---|
PAPERLESS_API_URL | URL to your Paperless-ngx API | - |
PAPERLESS_API_TOKEN | API Token from Paperless-ngx | - |
AI_PROVIDER | AI provider to use (openai or ollama) | openai |
OPENAI_API_KEY | Your OpenAI API key (required if using openai) | - |
OLLAMA_API_URL | URL to your Ollama instance | http://localhost:11434 |
OLLAMA_MODEL | Ollama model to use (e.g. llama2, mistral) | llama2 |
SCAN_INTERVAL | Cron expression for scan interval | */30 * * * * |
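
As a rough illustration of how these variables and their defaults fit together, the sketch below reads them from an environment-like object and applies the defaults from the table. The function name `loadConfig`, the validation rules, and the example URL and token are assumptions made for this example; the application's own loading logic may differ.

```javascript
// Sketch only: loadConfig is a hypothetical helper, not the project's code.
function loadConfig(env) {
  const config = {
    paperlessApiUrl: env.PAPERLESS_API_URL,
    paperlessApiToken: env.PAPERLESS_API_TOKEN,
    aiProvider: env.AI_PROVIDER || 'openai',
    openaiApiKey: env.OPENAI_API_KEY,
    ollamaApiUrl: env.OLLAMA_API_URL || 'http://localhost:11434',
    ollamaModel: env.OLLAMA_MODEL || 'llama2',
    scanInterval: env.SCAN_INTERVAL || '*/30 * * * *', // every 30 minutes
  };

  if (!['openai', 'ollama'].includes(config.aiProvider)) {
    throw new Error(`AI_PROVIDER must be "openai" or "ollama", got "${config.aiProvider}"`);
  }

  // Collect every missing variable so the error names them all at once.
  const missing = [];
  if (!config.paperlessApiUrl) missing.push('PAPERLESS_API_URL');
  if (!config.paperlessApiToken) missing.push('PAPERLESS_API_TOKEN');
  if (config.aiProvider === 'openai' && !config.openaiApiKey) missing.push('OPENAI_API_KEY');
  if (missing.length > 0) {
    throw new Error(`Missing required variables: ${missing.join(', ')}`);
  }
  return config;
}

// Placeholder values, invented for illustration.
const example = loadConfig({
  PAPERLESS_API_URL: 'http://paperless.example:8000/api',
  PAPERLESS_API_TOKEN: 'example-token',
  AI_PROVIDER: 'ollama',
});
console.log(example.ollamaModel, example.scanInterval);
```

In a running Node.js process, the same function could be called as `loadConfig(process.env)`.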
The application comes with full Docker support:
- Automatic container restart on failure
- Health monitoring
- Volume persistence for database
- Resource management
- Graceful shutdown handling
# Start the container
docker-compose up -d
# View logs
docker-compose logs -f
# Restart container
docker-compose restart
# Stop container
docker-compose down
# Rebuild and start
docker-compose up -d --build
The application provides a health check endpoint at /health that returns:
# Healthy system
{
"status": "healthy"
}
# System not configured
{
"status": "not_configured",
"message": "Application setup not completed"
}
# Database error
{
"status": "database_error",
"message": "Database check failed"
}
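
A monitoring script could turn these responses into a simple pass/fail verdict. The sketch below is one possible approach, assuming only the three `status` values shown above; the helper name `interpretHealth` is hypothetical, and the HTTP status codes the endpoint uses are not covered here.

```javascript
// Hypothetical helper: maps a parsed /health body to a pass/fail verdict.
function interpretHealth(body) {
  switch (body && body.status) {
    case 'healthy':
      return { ok: true, detail: 'healthy' };
    case 'not_configured':
    case 'database_error':
      // Prefer the server's message, fall back to the raw status.
      return { ok: false, detail: body.message || body.status };
    default:
      // Anything else is treated as a failure rather than guessed at.
      return { ok: false, detail: 'unrecognized health response' };
  }
}

console.log(interpretHealth(JSON.parse('{"status": "healthy"}')));
console.log(interpretHealth({ status: 'database_error', message: 'Database check failed' }));
```

On Node.js 18 or later, the body could be obtained with the built-in `fetch`, for example `await (await fetch('http://localhost:3000/health')).json()`, assuming the default port from the setup steps.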
To run the application locally without Docker:
- Install dependencies:
npm install
- Start the development server:
npm run test
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Store API keys securely
- Restrict container access
- Monitor API usage
- Regularly update dependencies
- Back up your database
This project is licensed under the MIT License - see the LICENSE file for details.
- Paperless-ngx for the amazing document management system
- OpenAI API
- The Express.js and Node.js communities for their excellent tools
If you encounter any issues or have questions:
- Check the Issues section
- Create a new issue if yours isn't already listed
- Provide detailed information about your setup and the problem
- Support for custom AI models
- Support for multiple language analysis
- Advanced tag matching algorithms
- Custom rules for document processing
- Enhanced web interface with statistics