The pipeline consists of several stages, each performing a specific task to identify and qualify potential leads. The main stages are Data Retrieval and Enrichment, Scoring, Lead Qualification, and Result Processing.
- [CODE REDACTED DUE TO CONFIDENTIALITY AGREEMENT]
Task: get_organizations
- Fetches organization data from Apollo API
- Implements pagination to retrieve multiple pages of results
- Maximum pages retrieved: APOLLO_SEARCH_MAX_PAGES
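Since the actual code is redacted, the following is a minimal sketch of the pagination loop; the endpoint path, header name, and response field names are assumptions:

```python
import os
import requests

APOLLO_SEARCH_MAX_PAGES = 5  # assumed value for the config constant

def get_organizations(search_params: dict) -> list[dict]:
    """Fetch organizations from the Apollo search API, page by page."""
    url = "https://api.apollo.io/v1/mixed_companies/search"  # assumed endpoint
    headers = {"X-Api-Key": os.environ["APOLLO_API_KEY"]}    # assumed header
    organizations: list[dict] = []
    for page in range(1, APOLLO_SEARCH_MAX_PAGES + 1):
        resp = requests.post(url, headers=headers,
                             json={**search_params, "page": page})
        resp.raise_for_status()
        batch = resp.json().get("organizations", [])
        if not batch:  # stop early once a page comes back empty
            break
        organizations.extend(batch)
    return organizations
```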
Task: enrich_business_info
- Enriches organization data with additional business information from Apollo API
- Processes organizations in batches (size: APOLLO_BATCH_SIZE)
- Extracts keywords, industries, SEO description, and short description
- Implements batch processing to reduce API calls and improve efficiency
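A sketch of the batching pattern, assuming a hypothetical bulk-enrichment endpoint and payload shape (the real request format may differ):

```python
import os
import requests

APOLLO_BATCH_SIZE = 10  # assumed value for the config constant
FIELDS = ("keywords", "industries", "seo_description", "short_description")

def enrich_business_info(organizations: list[dict]) -> list[dict]:
    """Enrich organizations in batches so N organizations cost roughly
    N / APOLLO_BATCH_SIZE API calls instead of N."""
    url = "https://api.apollo.io/v1/organizations/bulk_enrich"  # assumed endpoint
    headers = {"X-Api-Key": os.environ["APOLLO_API_KEY"]}
    for start in range(0, len(organizations), APOLLO_BATCH_SIZE):
        batch = organizations[start:start + APOLLO_BATCH_SIZE]
        resp = requests.post(url, headers=headers, json={
            "domains": [org["primary_domain"] for org in batch]})
        resp.raise_for_status()
        # Assumes the response preserves request order.
        for org, enriched in zip(batch, resp.json().get("organizations", [])):
            org.update({field: enriched.get(field) for field in FIELDS})
    return organizations
```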
Task: enrich_traffic_info
- Queries Snowflake database for traffic data using SnowflakeQuery class
- Retrieves monthly visits for each organization
- Utilizes batch querying for improved performance
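A minimal sketch of the batched lookup (the SnowflakeQuery class itself is not shown; connection parameters, table, and column names are illustrative):

```python
import snowflake.connector

# Illustrative connection parameters.
CONN_PARAMS = {"account": "...", "user": "...", "password": "...",
               "warehouse": "...", "database": "...", "schema": "..."}

def enrich_traffic_info(organizations: list[dict]) -> list[dict]:
    """Fetch monthly visits for all domains in one round trip
    instead of issuing a query per organization."""
    domains = [org["primary_domain"] for org in organizations]
    if not domains:
        return organizations
    placeholders = ", ".join(["%s"] * len(domains))
    query = ("SELECT domain, monthly_visits FROM traffic_data "  # assumed table
             f"WHERE domain IN ({placeholders})")
    with snowflake.connector.connect(**CONN_PARAMS) as conn:
        with conn.cursor() as cur:
            cur.execute(query, domains)
            visits = dict(cur.fetchall())
    for org in organizations:
        org["monthly_visits"] = visits.get(org["primary_domain"])
    return organizations
```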
Task: enrich_tech_stack_info
- Enriches organization data with tech stack information from BuiltWith API
- Processes organizations in batches (size: BUILTWITH_BATCH_SIZE)
- Extracts technology names used by each organization
- Implements batch processing to reduce API calls and improve efficiency
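A sketch of the batched BuiltWith lookup; the response shape below follows the public Domain API but is an assumption with respect to the redacted code:

```python
import os
import requests

BUILTWITH_BATCH_SIZE = 16  # assumed value for the config constant

def enrich_tech_stack_info(organizations: list[dict]) -> list[dict]:
    """Query the BuiltWith Domain API for several domains per request."""
    for start in range(0, len(organizations), BUILTWITH_BATCH_SIZE):
        batch = organizations[start:start + BUILTWITH_BATCH_SIZE]
        resp = requests.get(
            "https://api.builtwith.com/v21/api.json",
            params={"KEY": os.environ["BUILTWITH_API_KEY"],
                    "LOOKUP": ",".join(o["primary_domain"] for o in batch)},
        )
        resp.raise_for_status()
        by_domain = {r.get("Lookup"): r for r in resp.json().get("Results", [])}
        for org in batch:
            result = by_domain.get(org["primary_domain"], {})
            # Flatten technology names out of the nested Paths/Technologies shape.
            org["tech_stack"] = sorted({
                tech["Name"]
                for path in result.get("Result", {}).get("Paths", [])
                for tech in path.get("Technologies", [])
            })
    return organizations
```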
Task: scoring_business_info
- Uses BusinessInfoAnalyzer class to score enriched business information
- Calculates similarity scores for keywords, industries, and descriptions using embeddings
- Embedding model: text-embedding-3-large
- Implements caching mechanism for embeddings to improve performance
- Utilizes vectorization techniques for runtime optimization
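A minimal sketch of the caching and vectorization pattern described above (the BusinessInfoAnalyzer internals are not shown; helper names are illustrative):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
_cache: dict[str, np.ndarray] = {}  # embedding cache keyed by input text

def embed(texts: list[str]) -> np.ndarray:
    """Embed texts with text-embedding-3-large, only calling the API
    for texts that are not already cached."""
    missing = [t for t in dict.fromkeys(texts) if t not in _cache]
    if missing:
        resp = client.embeddings.create(model="text-embedding-3-large",
                                        input=missing)
        for text, item in zip(missing, resp.data):
            _cache[text] = np.asarray(item.embedding)
    return np.stack([_cache[t] for t in texts])

def similarity_scores(candidates: list[str], references: list[str]) -> np.ndarray:
    """Max cosine similarity of each candidate against the reference set,
    computed as a single matrix product instead of a Python loop."""
    a, b = embed(candidates), embed(references)
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    return (a @ b.T).max(axis=1)
```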
Task: scoring_traffic_info
- Uses TrafficAnalyzer class to score traffic data
- Calculates percentile rank of organization's traffic compared to existing customers
- Optimized for faster processing of large datasets
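A sketch of the percentile-rank calculation, vectorized with numpy (function and argument names are illustrative):

```python
import numpy as np

def traffic_percentile(org_visits, customer_visits) -> np.ndarray:
    """Percentile rank of each organization's monthly visits relative to the
    existing-customer distribution, vectorized with searchsorted."""
    reference = np.sort(np.asarray(customer_visits))
    ranks = np.searchsorted(reference, np.asarray(org_visits), side="right")
    return 100.0 * ranks / reference.size
```

An organization whose traffic sits at the median of the customer distribution scores 50; one above every customer scores 100.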
Task: scoring_tech_stack_info
- Uses TechStackAnalyzer class to score tech stack information
- Calculates similarity score for tech stack using embeddings
- Utilizes vectorization techniques for runtime optimization
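One plausible scoring approach, reusing the similarity_scores helper sketched above; joining the technology names into a single string is an assumption, not confirmed TechStackAnalyzer behavior:

```python
def score_tech_stack(org_techs: list[str], customer_stacks: list[list[str]]) -> float:
    """Embed the joined technology list and take the best cosine similarity
    against existing customers' stacks (reuses similarity_scores above)."""
    candidate = ", ".join(sorted(org_techs))
    references = [", ".join(sorted(stack)) for stack in customer_stacks]
    return float(similarity_scores([candidate], references)[0])
```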
Tasks: openai_qualification_with_scores, gemini_qualification_with_scores, claude_qualification_with_scores
- Implements parallel lead qualification using three different LLMs: OpenAI GPT-4o, Google Gemini 1.5 Flash, and Anthropic Claude 3.5 Sonnet
- Each model analyzes scores and determines if the organization is a potential lead
- Models also recommend the most suitable type of salesperson for each lead
- Utilizes concurrent API requests for improved performance
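A sketch of the fan-out pattern using a thread pool; the prompt, the JSON output schema, and the exact Gemini/Claude model identifiers are assumptions:

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor

import anthropic
import google.generativeai as genai
from openai import OpenAI

# Assumed prompt; the real prompt and output schema are not shown.
PROMPT = ("Given these scores, decide whether the organization is a potential "
          "lead and recommend the most suitable salesperson type. Reply with "
          'JSON: {"is_lead": bool, "salesperson_type": str, "confidence": float}.')

def qualify_with_openai(scores: dict) -> dict:
    resp = OpenAI().chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{"role": "system", "content": PROMPT},
                  {"role": "user", "content": json.dumps(scores)}],
    )
    return json.loads(resp.choices[0].message.content)

def qualify_with_gemini(scores: dict) -> dict:
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(
        "gemini-1.5-flash",
        generation_config={"response_mime_type": "application/json"})
    return json.loads(model.generate_content(f"{PROMPT}\n\n{json.dumps(scores)}").text)

def qualify_with_claude(scores: dict) -> dict:
    resp = anthropic.Anthropic().messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=256,
        system=PROMPT,
        messages=[{"role": "user", "content": json.dumps(scores)}],
    )
    return json.loads(resp.content[0].text)  # assumes the model returns bare JSON

def qualify_lead(scores: dict) -> dict[str, dict]:
    """Fan the same scores out to all three models concurrently."""
    tasks = {"openai": qualify_with_openai,
             "gemini": qualify_with_gemini,
             "claude": qualify_with_claude}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn, scores) for name, fn in tasks.items()}
        return {name: fut.result() for name, fut in futures.items()}
```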
Task: qualified_leads_aggregation
- Aggregates results from all three LLM models
- Calculates an average score based on the decisions of all models
- Determines the final decision on whether an organization is a potential lead
- Aggregates salesperson recommendations
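A minimal sketch of the aggregation step; the majority-vote rule for the final decision is an assumption based on the description above:

```python
from collections import Counter

def aggregate_qualification(verdicts: dict[str, dict]) -> dict:
    """Combine the three per-model verdicts into one record."""
    confidences = [v["confidence"] for v in verdicts.values()]
    votes = [v["is_lead"] for v in verdicts.values()]
    recommendations = Counter(v["salesperson_type"] for v in verdicts.values())
    return {
        "avg_score": sum(confidences) / len(confidences),
        "is_lead": sum(votes) > len(votes) / 2,  # majority vote (assumed rule)
        "salesperson_recommendations": [t for t, _ in recommendations.most_common()],
    }
```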
Task: sales_routing
- Compiles all gathered information and qualification results
- Generates a pandas DataFrame with lead information, qualification decisions, and recommendations
- Uploads the DataFrame rows to Snowflake for further use by the sales team
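A sketch of the upload step using snowflake-connector-python's write_pandas helper; the table name is illustrative:

```python
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

def sales_routing(leads: list[dict], conn_params: dict) -> None:
    """Build the final lead table and upload it to Snowflake."""
    df = pd.DataFrame(leads)  # one row per organization
    with snowflake.connector.connect(**conn_params) as conn:
        # QUALIFIED_LEADS is an illustrative table name.
        write_pandas(conn, df, table_name="QUALIFIED_LEADS",
                     auto_create_table=True)
```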
Key design features:
- Embedding Analysis: Uses OpenAI's text-embedding-3-large model for semantic similarity calculations in business info and tech stack scoring.
- Traffic Analysis: Uses percentile ranking to evaluate an organization's traffic in relation to existing customers.
- Multi-Model Qualification: Employs three different LLMs to provide diverse perspectives on lead qualification.
- Batch Processing: Implements batch processing for API calls to Apollo and BuiltWith to optimize performance and respect API rate limits.
- Caching: Uses a caching mechanism for embeddings to reduce redundant API calls and improve overall performance.
- Concurrent Processing: Utilizes parallel processing for LLM API calls to reduce overall qualification time.