AgenTopic is a topic modeling project that extends neural topic modeling with an iterative feedback loop driven by a large language model (e.g., GPT-4) and a caching memory module. Across iterations it compares the parameter settings it has tried and the topics they produce, then selects the best-performing parameters for the final model.
This project is specifically designed to perform topic modeling on psoriasis literature, based on the dataset from psknlr.github.io. It facilitates precise literature retrieval by combining information on titles, journals, publication years, abstracts, and topics.
- Initial topic modeling with BERTopic
- Iterative refinement of topics using GPT-4 suggestions
- Fine-tuning language models based on new topics
- Evaluation of models using various metrics
- Selection of the optimal model with GPT-4 assistance
- Integration with psoriasis literature dataset from psknlr.github.io
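As a rough illustration of the first feature, the sketch below assembles a BERTopic model from explicit sub-models: sentence embeddings, UMAP for dimensionality reduction, HDBSCAN for clustering, and a count vectorizer feeding BERTopic's topic representation. It is not taken from the AgenTopic code base. The function names `build_topic_model` and `fit_topics`, the embedding model `all-MiniLM-L6-v2`, and every parameter value are assumptions chosen for illustration.

```python
def build_topic_model(min_cluster_size=10, n_neighbors=15, seed=42):
    """Assemble a BERTopic model from explicit sub-models (illustrative settings)."""
    # Imported inside the function so the heavy dependencies load only when needed.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    # Embedding model: an assumed general-purpose choice, not the project's setting.
    embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

    # Dimensionality reduction of the embeddings before clustering.
    umap_model = UMAP(
        n_neighbors=n_neighbors,
        n_components=5,
        min_dist=0.0,
        metric="cosine",
        random_state=seed,
    )

    # Density-based clustering; documents that fit no cluster get topic -1.
    hdbscan_model = HDBSCAN(
        min_cluster_size=min_cluster_size,
        metric="euclidean",
        cluster_selection_method="eom",
        prediction_data=True,
    )

    # Tokenization used when extracting the words that describe each topic.
    vectorizer_model = CountVectorizer(stop_words="english")

    return BERTopic(
        embedding_model=embedding_model,
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
        vectorizer_model=vectorizer_model,
    )


def fit_topics(docs, **params):
    """Fit the model on a list of document strings and return it with the topic ids."""
    topic_model = build_topic_model(**params)
    topics, _probabilities = topic_model.fit_transform(docs)
    return topic_model, topics
```

Calling `fit_topics(abstracts)` on a list of abstract strings returns the fitted model and one topic id per document; `topic_model.get_topic_info()` then lists the discovered topics. Exposing `min_cluster_size` and `n_neighbors` as arguments makes them easy to vary between tuning iterations.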
- Initial Topic Modeling
  - Document and Sentence Embedding
  - Dimensionality Reduction
  - Clustering
  - Topic Modeling
- Feedback Loop with Language Model
  - Generate Topic Summaries and Labels
  - Create Actions Based on Feedback
  - Assign New Weights to Word Embeddings
- Refinement of Topic Modeling
  - Fine-Tuning the Language Model
  - Recalculate Topic Vectors
- Caching Memory Module
  - Multiple Tuning Iterations
  - Selecting Optimal Parameters
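The caching memory module and the tuning iterations listed above can be sketched as a small loop that records every trial and picks the best one at the end. This is a minimal sketch under stated assumptions, not the project's implementation: `Trial`, `CacheMemory`, `tune`, `toy_evaluate`, and `toy_propose` are hypothetical names, the score is a made-up function, and the proposal step is a fixed rule standing in for a GPT-4 suggestion.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Params = Dict[str, int]


@dataclass
class Trial:
    """One tuning iteration: the parameters tried and the score they earned."""
    params: Params
    score: float


@dataclass
class CacheMemory:
    """Keeps every trial so later iterations can be compared with earlier ones."""
    trials: List[Trial] = field(default_factory=list)

    def add(self, params: Params, score: float) -> None:
        # Copy the dict so later mutation of `params` cannot alter the record.
        self.trials.append(Trial(dict(params), score))

    def best(self) -> Optional[Trial]:
        # Highest score wins; returns None when nothing has been recorded.
        return max(self.trials, key=lambda t: t.score, default=None)


def tune(
    initial_params: Params,
    evaluate: Callable[[Params], float],
    propose: Callable[[Params, List[Trial]], Params],
    iterations: int = 3,
) -> CacheMemory:
    """Evaluate, record, and ask for new parameters for a fixed number of rounds."""
    memory = CacheMemory()
    params = dict(initial_params)
    for step in range(iterations):
        memory.add(params, evaluate(params))
        if step < iterations - 1:
            # In AgenTopic this step would be informed by the language model;
            # here `propose` is any callable that sees the full trial history.
            params = propose(params, memory.trials)
    return memory


def toy_evaluate(params: Params) -> float:
    # Hypothetical score that peaks when min_cluster_size equals 20.
    return -abs(params["min_cluster_size"] - 20)


def toy_propose(params: Params, history: List[Trial]) -> Params:
    # Stand-in for an LLM suggestion: raise the cluster size by 5 each round.
    return {**params, "min_cluster_size": params["min_cluster_size"] + 5}


memory = tune({"min_cluster_size": 10}, toy_evaluate, toy_propose, iterations=4)
best = memory.best()
print(best.params, best.score)
```

Keeping the full history, instead of only the current best, lets the proposal step see which settings have already been tried and how they scored.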
pip install -r requirements.txt
python main.py
This project utilizes the psoriasis literature dataset from psknlr.github.io.
Note: To view and search the dataset, please visit psknlr.github.io directly.
We welcome contributions from the community! Please follow these guidelines to contribute to AgenTopic:
- Fork the repository.
- Create a new branch (`git checkout -b feature/YourFeature`).
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature/YourFeature`).
- Open a Pull Request.
Please ensure your contributions adhere to the project's coding standards and include appropriate tests.
For any questions or suggestions, please contact [Yanlan Kang](mailto:ylkang96engd@gmail.com).
If you find this work useful in your research, please cite our repository:
@misc{AgenTopic,
author = {FulPhil},
title = {AgenTopic: Topic Modeling with LLM-based Agent},
year = {2024},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/pariskang/AgenTopic}}
}