Search Google in 2023, and save to long term memory #507
Conversation
The changes in the PR include switching the scraper to Selenium, improving the prompt structure, and optimizing the code for faster processing using CodeGenie. The author has added tests and considered potential risks. The changes have resulted in more robust scraping and longer uninterrupted sessions. The addition of GPT-4 could further improve efficiency in deploying bots. Overall, the changes seem positive and well considered.
This is my first review on an open-source project, so I'm leaving it as a comment.
But in general, I think this brings in some welcome changes.
Thanks!
scripts/ai_config.py
Adding time, I think, is a great change!
Thanks for taking the time to review.
scripts/browse.py
Agree that a better web scraper should be implemented.
@@ -1,28 +1,18 @@
 import browse
 import json
-from memory import PineconeMemory
+import memory as mem
Question - if Pinecone isn't being used, how does the program know which mem to use?
Good question. Honestly, I couldn't figure out why that worked and not PineconeMemory.
There is no usage of `memory` here. The only command that needed `memory` was `memory.add`, which seems to have been removed from the prompt. There is very little need for a `memory.add` command in the first place, since we're storing to the embeddings vector database for every command.
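For what it's worth, the way `import memory as mem` can keep working without Pinecone is a small backend selector inside the memory module. A minimal sketch of that pattern, assuming an environment-variable check and a local fallback (the get_memory and LocalCache names and the import path are illustrative, not the PR's actual code):

import os


class LocalCache:
    """Simple in-process fallback memory (illustrative only)."""

    def __init__(self):
        self._items = []

    def add(self, text):
        self._items.append(text)

    def get_relevant(self, query, k=5):
        # A naive substring match stands in for a real embedding lookup.
        return [t for t in self._items if query.lower() in t.lower()][:k]


def get_memory():
    """Return a Pinecone-backed memory when configured, otherwise the local fallback."""
    if os.getenv("PINECONE_API_KEY"):
        from pinecone_memory import PineconeMemory  # hypothetical module path
        return PineconeMemory()
    return LocalCache()

Callers then only ever see the selected backend, e.g. mem = get_memory(); mem.add("..."), regardless of whether Pinecone is configured.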
@@ -118,62 +105,26 @@ def get_datetime():

def google_search(query, num_results=8):
    search_results = []
    for j in ddg(query, max_results=num_results):
Not using ddg (DuckDuckGo) anymore?
EDIT: NVM, I had to reinstall requirements.txt. This is no longer an issue.
I have this error:
(base) PS C:\Users\USERNAME\Source\Repos\Auto-GPT> python scripts/main.py --gpt3only
There is no module named googlesearch???
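For anyone hitting that import error: the replacement appears to pull in a googlesearch package via requirements.txt, which is why reinstalling the requirements fixes it. A rough sketch of what a google_search built on such a package can look like (the body below is an assumption, not the PR's exact code, and the search() signature differs between the similarly named googlesearch packages):

import json
from itertools import islice

# Whichever Google-search helper package requirements.txt pins
# (e.g. googlesearch-python); the exact search() signature may differ.
from googlesearch import search


def google_search(query, num_results=8):
    """Return the top result URLs for a query as a pretty-printed JSON string."""
    # search() yields result URLs lazily; take only the first num_results of them.
    results = list(islice(search(query), num_results))
    return json.dumps(results, ensure_ascii=False, indent=4)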
Background
This is basically a culmination of a handful of different PRs all rolled into one with some GPT4 and CodeGenie optimization.
I'll start with the biggest one, #121: changing the scraper from BeautifulSoup to Selenium. I also believe this to be the best option going forward, as it's much more robust.
But once we have access to the web, we need data from 2023 (#185), and a more robust prompt structure wouldn't hurt either.
Finally, when AutoGPT finds better data it needs to overwrite the memory, and this comment helped a lot. I also had CodeGenie optimize the code for faster processing.
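For readers who haven't looked at #121: the Selenium approach drives a real headless browser and extracts text from the rendered DOM, which is what makes it more robust against JavaScript-heavy pages than plain requests-plus-BeautifulSoup. A minimal sketch of that pattern, assuming headless Chrome (the function name and options here are illustrative, not the PR's exact code):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup


def scrape_text(url):
    """Load a page in headless Chrome and return its visible text."""
    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Parse the fully rendered DOM rather than the raw HTTP response.
        soup = BeautifulSoup(driver.page_source, "html.parser")
        for tag in soup(["script", "style"]):
            tag.extract()  # drop non-visible content
        lines = (line.strip() for line in soup.get_text().splitlines())
        return "\n".join(line for line in lines if line)
    finally:
        driver.quit()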
Changes
Added the persistent_memory = [] stub back to prevent errors
Added string_key_memory = {} to handle the memory_ovr error: no attribute 'string_key_memory' (see the short sketch after this list)
Removed a space typo in prompt.txt
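For context, those two stubs are just module-level defaults so code paths that still reference the old in-prompt memory structures don't raise attribute errors; roughly (placement and surrounding code are assumptions on my part):

# Module-level defaults so code that still references the old in-prompt memory
# structures doesn't fail with "no attribute 'string_key_memory'".
persistent_memory = []   # legacy list-style memory stub, restored to prevent errors
string_key_memory = {}   # string-keyed memory used by the memory_ovr path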
Test Plan
Google searches no longer time out, and when a search does return something, the data can now be saved to long-term memory. I've been able to successfully get it to run for 15–20 minute sessions regularly before hitting any errors.
⭐️ 26 min 🎥
https://youtu.be/yM_yxVn4y2I
Change Safety
I have not added extensive test coverage, but I did have an uninterrupted session of 35 minutes before hitting another error. Mind you, this was all on --gpt3only, as I don't have access to GPT-4 yet. AutoGPT knew it needed to check the price of a cryptocurrency multiple times, so it reasoned it was more efficient to deploy a bot to constantly watch that page for price changes; if I had had GPT-4, it would have started building Python bots for me.