[Feature/Version Update] Profile Class and Database Support 🐤🤳💻 #6

Merged
TDKorn merged 48 commits into 2.0.0 from db-and-profiles
Jul 20, 2022

@TDKorn TDKorn commented Jul 20, 2022

Basically, a feature branch that went too far and spiraled into a pretty much complete 100% remake of everything ✨😘

This PR has almost all of the actual code for v2.0.0.

I made this part of the stacked PR (#5) instead of merging it directly into master because the changes were so drastic that it really couldn't be merged before rewriting the README, creating accompanying documentation, and testing it more.

Overall, I am certain that these 3 weeks will haunt me. That being said, it was still nothing compared to the pain of my-magento

Feature Overview

The major change was splitting up the InstaTweet class, as it was far more focused on the settings than it was on InstaTweeting (definition), which doesn't make sense given the name of the whole thing

I've also been wanting to bring database support to the public version, as I'd written the code for it a while ago and have been using it to schedule InstaTweet and persist data remotely.

Both of these things seemed like a perfect excuse to refactor some methods into a Profile class that can easily be configured, saved, loaded, and templated in both local and remote contexts

The Profile Class

Although the Profile class is new, it's really just a polished version of the old InstaTweet.

It has the same minimum required settings

Old ⛔😲

self.profile_name = kwargs.get('profile', 'default')
if not self.is_default:
    self.load_profile(self.profile_name)
else:
    self.session_id = kwargs.get('session_id', '')
    self.twitter_keys = kwargs.get('twitter_keys', None)
    self.user_agent = kwargs.get('user_agent', get_agent())
    self.user_map = kwargs.get('user_map', {})
    print('Using default profile.')

New 🥵😩

self.local = local
self.name = name # Will raise Exception if name is already used
self.session_id = kwargs.get('session_id', '')
self.twitter_keys = kwargs.get('twitter_keys', TweetClient.DEFAULT_KEYS)
self.user_agent = kwargs.get('user_agent', utils.get_agent())
self.proxy_key = kwargs.get('proxy_key', None)
self.user_map = kwargs.get('user_map', {})

However, it now

  • Properly supports multiple save/export (and load) formats; default is pickle
  • Supports proxies and saving data remotely in any SQLAlchemy compatible database
  • Can switch between local/remote saves via the local property setter
  • Has methods to make it simple to modify and access its data
  • Has more documentation than code
  • Is easier on the eyes and brain

The DBConnection Class

The DBConnection uses SQLAlchemy to query, load, save, and delete remotely saved profiles in the Profiles database table. It was an excuse to try out context management. I won't lie.

  • To designate a Profile as remote, just set Profile.local=False

Warning
You MUST configure the DATABASE_URL environment variable before these features can be used

If this would be of interest to you, please see the full documentation of the db module
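
For a rough idea of what "context managed" means here, the sketch below is a hypothetical stand-in, not the actual DBConnection. It swaps SQLAlchemy for the standard library's sqlite3 so it runs anywhere, and the class name, table layout, and method names are all assumptions made for illustration. The only idea carried over from the description is that the connection opens on entering the `with` block, closes on leaving it, and that profiles are stored as pickle bytes.

```python
import pickle
import sqlite3


class SketchConnection:
    """Hypothetical stand-in for DBConnection (sqlite3 instead of SQLAlchemy)."""

    def __init__(self, url=":memory:"):
        self.url = url
        self.conn = None

    def __enter__(self):
        # Open the connection and make sure the table exists
        self.conn = sqlite3.connect(self.url)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS profiles (name TEXT PRIMARY KEY, config BLOB)"
        )
        return self

    def __exit__(self, exc_type, exc, tb):
        # Commit only if the block finished cleanly, then always close
        if exc_type is None:
            self.conn.commit()
        self.conn.close()
        self.conn = None
        return False  # never swallow exceptions

    def save_profile(self, name, settings):
        self.conn.execute(
            "INSERT OR REPLACE INTO profiles VALUES (?, ?)",
            (name, pickle.dumps(settings)),
        )

    def load_profile(self, name):
        row = self.conn.execute(
            "SELECT config FROM profiles WHERE name = ?", (name,)
        ).fetchone()
        return pickle.loads(row[0]) if row else None

    def delete_profile(self, name):
        self.conn.execute("DELETE FROM profiles WHERE name = ?", (name,))


# Everything inside the block shares one connection
with SketchConnection() as db:
    db.save_profile("myProfile", {"session_id": "abc"})
    loaded = db.load_profile("myProfile")
    db.delete_profile("myProfile")
    gone = db.load_profile("myProfile")
```

Note that the in-memory database used here disappears when the connection closes, so the save, load, and delete calls all happen inside a single `with` block.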


Local vs. Remote?

When you call save() on a Profile, the name is used as a unique identifier to create or update save data in the location specified by local

  • Local saves are made to the LOCAL_DIR, as pickle files
  • Remote saves are made to a database (via the db module) as pickle bytes

The Profile can be deserialized at a later time by passing the same name and value of local to either Profile.load() or InstaTweet.load()
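
The two save locations can be pictured with a minimal sketch like the one below. It is not the package's code: `SAVE_DIR` stands in for LOCAL_DIR, a plain dict stands in for the Profiles table, and the `save()`/`load()` functions and the `.pickle` extension are assumptions. It only shows the difference between writing a pickle file and storing pickle bytes under the same name.

```python
import os
import pickle
import tempfile

SAVE_DIR = tempfile.mkdtemp()  # stand-in for LOCAL_DIR
FAKE_DB = {}                   # stand-in for the Profiles database table


def save(name, settings, local=True):
    """Save settings under `name`, either as a pickle file or as pickle bytes."""
    if local:
        with open(os.path.join(SAVE_DIR, name + ".pickle"), "wb") as f:
            pickle.dump(settings, f)
    else:
        FAKE_DB[name] = pickle.dumps(settings)


def load(name, local=True):
    """Load settings by passing the same name and value of `local` used to save."""
    if local:
        with open(os.path.join(SAVE_DIR, name + ".pickle"), "rb") as f:
            return pickle.load(f)
    return pickle.loads(FAKE_DB[name])


settings = {"session_id": "abc", "user_map": {}}
save("myProfile", settings, local=True)
save("myProfile", settings, local=False)
```

The same name can exist once locally and once remotely, which is why the value of local has to be passed along with the name when loading.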

Since the name of a Profile can be

  • Passed as a parameter during initialization
  • Set directly as an object attribute after initialization
  • Changed during the call to save()

A property setter will always call profile_exists() to make sure that no profile with the same name already exists in the location specified by local

@name.setter
def name(self, profile_name):
    """Sets the profile name, if a profile with that name doesn't already exist locally/remotely"""
    if profile_name != 'default' and self.profile_exists(profile_name, local=self.local):
        if self.local:
            raise FileExistsError(
                'Local save file already exists for profile named "{}"\n'.format(profile_name) +
                'Please choose another name, load the profile, or delete the file.')
        else:
            raise ResourceWarning(
                'Database record already exists for profile named "{}"\n'.format(profile_name) +
                'Please choose another name or use InstaTweet.db to load/delete the profile'
            )
    self._name = profile_name

@staticmethod
def profile_exists(name: str, local: bool = True) -> bool:
    """Check if a profile with the given name and location (local/remote) already exists"""
    if local:
        return os.path.exists(Profile.get_local_path(name))
    else:
        with DBConnection() as db:
            return bool(db.query_profile(name).first())

Note
You must load() a saved Profile's settings, as initializing a new one with that name would be considered a duplicate


The New InstaTweet 🐥✨

With the settings out of the way, InstaTweet now focuses on the actual InstaTweeting - the detection and downloading of new Instagram posts, sending of tweets, and [when applicable] the autosaving of its loaded Profile. You would know this if you read the definition above.

Once you've configured a Profile, you can initialize an InstaTweet object in 2 ways:

  • If you've saved the Profile, you can load() its settings in by name
    • This same save file will be automatically updated whenever progress is made

@classmethod
def load(cls, profile_name: str, local: bool = True) -> "InstaTweet":
    """Loads a profile by name
    :param profile_name: profile name
    :param local: whether the profile is saved locally (True) or remotely on a SQLAlchemy-supported database
    """
    return cls(profile=Profile.load(name=profile_name, local=local))

  • Otherwise, you can use the Profile to directly initialize InstaTweet; since no save file exists, nothing will be saved during a run

def __init__(self, profile: Profile):
    """Initializes InstaTweet using a fully configured :class:`~.Profile`
    The :class:`~.Profile` will be used to initialize an :class:`~.InstaClient` and :class:`~.TweetClient`
    :Note:
        Profile settings will only be validated when calling :meth:`~.start`
    :param profile: the :class:`~.Profile` to use for InstaTweeting
    """
    self.profile = profile
    self.proxies = self.get_proxies()
    self.insta = self.get_insta_client()
    self.twitter = self.get_tweet_client()


Note
The profile just needs to be configured; it doesn't need to be saved. If you plan to run it more than once, you should really save it though


Assuming the settings are valid, you can then call start(), just like before

OKAY IM DONE, LIKE I WROTE DOCUMENTATION FOR A REASON ‼

TDKorn added 30 commits May 2, 2022 20:33
"InstaTweet" doesn't intuitively equate to "Profile", and since there will be two types of InstaTweet profiles now (hosted database vs local) it just makes sense to change it.

Almost everything in this commit is going to be overwritten, but I want to have it just in case

---

utils.py
* Change filetype argument to not need a "." before filetype

profile.py
* Added Profile class and helper functions
* I do not like how this is written; it is replaced by an abstract Profile class and two concrete subclasses for the two profile types
* Database functions which will also mostly be overwritten

* Create Profiles model/table for database
Changed Profile class to abstract class, with DBProfile and LocalProfile as concrete subclasses.
To prepare for new versions
Rough copy of profile "skeletons" (only saving/loading methods) now finished.
Overall a quality improvement (imo), although I HATE the names of the profile classes. I'll figure something better out eventually...

---

db.py
* Functions used by DBProfile for loading/saving
* Unsure about sqlalchemy sessionmaker/scoped_session in terms of what's automatically done or not (ex. I think it autocommits? but I'm not 100% sure)
Was just verifying that everything works (it does)
* Taken from my other repo: https://github.com/TDKorn/my-magento/blob/main/magento/utils.py#L339

* It's uglier but it removes the bs4 requirement, although I never added it anyways??
    -  Literally what was I doing lol, no way I was this bad in January...
* File to use for scheduling on a server
* Also move some things around 🤩
* add_user()/add_users() to add a single/multiple IG user to the user map

* Profile.exists property -- indicates if, for the current profile name, either:
    - A local save file exists (for a local profile); or
    - A database record exists (for non-local profile)

* If Profile.exists is true an exception will be raised, to avoid accidental overwriting of a save file
    - Either load the save file with Profile.load(name)  or delete the save file
    - Delete db profile records via InstaTweet.db.delete_profile()

* USER_MAPPING is the default map structure that's used to add new users to the user_map
literally what purpose did that folder serve
* Properties with setters:
  - `twitter_keys` -- Setter makes sure all keys are present and all keys have values
  - `session_id` -- Setter makes sure it's a string (it's a cookie)

* add_hashtags() - self explanatory

* config -- property that returns all useful settings

* view_config() -- method to print config and make it much more legible
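
The validating setters described above could look something like the sketch below. It is an illustration only: the class name `KeySettings` and the key names in `REQUIRED_KEYS` are assumptions (the real required keys live in the package), and the exception types are a guess at reasonable choices.

```python
# Hypothetical key names, used only for this sketch
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token", "token_secret")


class KeySettings:
    """Sketch of properties whose setters validate before storing."""

    def __init__(self, twitter_keys, session_id=""):
        self.twitter_keys = twitter_keys  # validated by the setter
        self.session_id = session_id      # validated by the setter

    @property
    def twitter_keys(self):
        return self._twitter_keys

    @twitter_keys.setter
    def twitter_keys(self, keys):
        # All keys must be present...
        missing = [k for k in REQUIRED_KEYS if k not in keys]
        if missing:
            raise KeyError(f"Missing Twitter keys: {missing}")
        # ...and all keys must have values
        empty = [k for k in REQUIRED_KEYS if not keys[k]]
        if empty:
            raise ValueError(f"Twitter keys without values: {empty}")
        self._twitter_keys = dict(keys)

    @property
    def session_id(self):
        return self._session_id

    @session_id.setter
    def session_id(self, value):
        if not isinstance(value, str):
            raise TypeError("session_id must be a string (it is a cookie value)")
        self._session_id = value


good = {k: "value" for k in REQUIRED_KEYS}
settings = KeySettings(good, session_id="abc123")

errors = []
for bad in ({"consumer_key": "x"}, dict(good, token_secret="")):
    try:
        settings.twitter_keys = bad
    except (KeyError, ValueError) as e:
        errors.append(type(e).__name__)
```

Because validation happens in the setter, a rejected assignment leaves the previously stored keys untouched.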
* It looks like I did a lot but I did literally nothing except change the class structure so that methods come before properties
* PROBLEM: The `exists` property prevents profile initialization if there's an existing profile with the specified name. But if a profile is already initialized, and then its name is changed, no checks are in place
    - This means nothing would stop you from changing the name to that of an existing profile, making it very easy to inadvertently override a save file

* `exists` currently uses both the `name` and `local` instance attributes to check for existing profiles
    - I want to check any time the `name` attribute is set, but can't call `exists` unless the  `name` is already set...

* Solution: create `profile_exists()` static method, change `exists` to use this new method, and make `name` into a property with a setter
    - Now, regardless of when the name is set (ie. during or after initialization), the setter will first call `profile_exists()` to ensure there are no conflicts
    - All methods that used `exists` are still able to, as it is now just an instance-specific wrapper for the static method

To Summarize:
* This means that you can't
    1.  Initialize a profile with an already used name
    2.  Change the name of an existing profile to an already used name

* Any saved Profile should be loaded using `Profile.load()`; you cannot load it by initializing a Profile with the same name (will raise exception)
* Saved Profiles can be deleted via os.remove() or InstaTweet.db.delete_profile() depending on if it is local/hosted

--

* Also added more docstrings, type-checking, etc. in other methods
* Methods really just don't flow well idk how to explain it but it feels yuck
* Added more detail to exception/print messages to help differentiate between local and db method/function calls
* Fix formatting for eventual Sphinx docs
* Sounds crazy but i think it might help to have the database, like, actually exist.... pure speculation though

* Also added a docstring but I feel like it'd be insulting to add them for the rest of the functions??
    - bc it's so obvious... like wow what does delete_profile() do?? 🙀🤯 pls explain 😩
Update profile.py
* okay wow...  skinny legend
* InstaThicc?? 🍑😩🥵 not anymore ⛔😲... this is InstaTWEET 🤩✨🐥✨🤩 don't get it confused
* Replace the ?__a=1 endpoint with one that isn't deprecated
    - This is definitely a temporary fix
    - I want to use someone else's package bc mine is too minimalistic but I literally can't get anything to work so...

* Also changed it to make use of the new Profile class
* Add docstrings to InstaClient

* Move `check_posts()`  method from InstaClient to InstaTweet and rename it to `get_new_posts()`
    - It uses the user map from a Profile to see which posts are new... which is highly specific to this repo
    - Makes more sense to keep InstaClient as a class that has general methods (?)
        - I'm trying to say I want it to be functional outside of the repo

---

* Note that I do NOT recommend using InstaClient outside of the repo; there are plenty of packages out there with better security and features. They were all too advanced for my needs here

---
Update profile.py
* Fix exception msgs
I can't wait to replace TweetClient with literally ANY package. But for now,
* Updated calls to `InstaPost.file_path` to use `filepath` instead
* Changed `video_path` to `media_path`
* Changed `user_map` to store only the Tweet link, timestamp, and text
    - Save files were getting THICC
    - I don't really think any of the tweet data is useful, but if any of the 0 people using this disagree, please let me know 💖
* I realized I'm gonna squash merge this branch so I'm gonna do a bunch of stuff I've been holding back on bc I didn't want to keep force pushing
* I will add back updated ones
* But I will not say when (it would be a lie)
TDKorn added 16 commits July 7, 2022 05:11
* It retrieves the database url from an environment variable
    - If you're not using a database, it's probably not gonna be set... so like why ruin your day like that
The Dehyphenization of InstaTweet
* it's the lack of hyphen for me 🤪😩✨
* Recreated in the next commit, just want to fully delete first so I don't get the weird unified diff that makes it hard to see what happened
`TweetClient`
* It's doing the exact same thing as before
* But without that stank raw-ass code I threw together (and then pretended didn't exist)
* I think it's much cleaner now. Plus, I don't need to maintain the twitter part, which is a relief bc I can barely maintain my will to live
    - For legal/employment purposes, that was a joke 😃😂😃 I love being alive haha 😂🤣 wasn't that silly 🤪😀🤣😂😂😂😂😂😂
* Using `tweepy`  might make it really easy to authorize users with my consumer key/token? I think that's a feature
    - If it is, then at some point I'll implement it and deprecate the twitter keys requirement

---

* Updated `InstaClient`, `InstaTweet`, `InstaPost` classes to reflect changes in `TweetClient`
And add property docstrings while I'm at it
Lots of small things, preparing to merge
* I don't want to keep making small commits there's already 40 😭😭😭

---

All Files
  * Docstrings and type hints woooooo yeaaa okaay

---

profile.py
  * Profile serialization methods -> to_dict, to_json, to_pickle
  * Profile deserialization class methods -> from_json, from_dict
      - Use load() to deserialize pickle save formats

  * Add methods for easy access to user_map entries
      - `get_user()` returns the specified IG user's dict entry in the user map
      - `get_scraped_from()` returns the `scraped` list for the specified user
      - `get_hashtags_for()` returns the `hashtags` list for the specified user
      - `get_tweets_for()` returns the `tweets` list for the specified user

  * Remove `add_user()` method -> just use `add_users()` to add a single/multiple user(s)
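
As a rough picture of the user map and its accessors, here is a minimal sketch. The `scraped`, `hashtags`, and `tweets` keys come from the method descriptions above; the default values in `USER_MAPPING`, the `UserMapSketch` class, and the method bodies are assumptions for illustration.

```python
import copy

# Assumed default structure for a newly added user
USER_MAPPING = {"hashtags": [], "scraped": [], "tweets": []}


class UserMapSketch:
    """Hypothetical holder for a user map with simple accessors."""

    def __init__(self):
        self.user_map = {}

    def add_users(self, users):
        """Add a single user (str) or multiple users (iterable of str)."""
        if isinstance(users, str):
            users = [users]
        for user in users:
            # deepcopy so users never share the same lists
            self.user_map.setdefault(user, copy.deepcopy(USER_MAPPING))

    def get_user(self, user):
        return self.user_map[user]

    def get_scraped_from(self, user):
        return self.get_user(user)["scraped"]

    def get_hashtags_for(self, user):
        return self.get_user(user)["hashtags"]

    def get_tweets_for(self, user):
        return self.get_user(user)["tweets"]


profile = UserMapSketch()
profile.add_users("user_a")
profile.add_users(["user_b", "user_c"])
profile.get_hashtags_for("user_a").append("cats")
```

Copying the default structure per user matters: without it, appending a hashtag for one user would show up for every user.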

---

instaclient.py
  * Fix imports

---

instatweet.py
  * Implement new Profile methods in `start()` and `get_new_posts()`

---

tweetclient.py
  * Fix imports
  * Fix `get_api()` to actually work
That makes sense, right?
* `build_tweet`  updated to use hashtags finally
    - I've been avoiding this so long omg like the old version was SO badly written it was actually overwhelming
    - No fr look at this https://github.com/TDKorn/insta-tweet/blob/5306035b21b01bb2f2f564f2f0bbb5c71c2f7a83/InstaTweet/core/tweetclient.py#L170-L185

* Add `pick_hashtags` method to randomly pick and format hashtags from the user map
    - It picks, at most, `MAX_HASHTAGS`
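
A minimal sketch of the "randomly pick at most `MAX_HASHTAGS`" idea, assuming a limit of 5 and a space-separated `#tag` output format (neither value nor format is confirmed by the commit message, and the function signature is hypothetical):

```python
import random

MAX_HASHTAGS = 5  # assumed value, for illustration only


def pick_hashtags(hashtags, limit=MAX_HASHTAGS):
    """Randomly pick up to `limit` hashtags and format them for a tweet."""
    if not hashtags:
        return ""
    # sample() never repeats an item; min() avoids asking for more than exist
    picked = random.sample(list(hashtags), min(limit, len(hashtags)))
    return " ".join("#" + tag.lstrip("#") for tag in picked)


tags = ["cats", "#dogs", "birds", "fish", "frogs", "bees", "ants"]
formatted = pick_hashtags(tags)
```

Using `random.sample` rather than repeated `random.choice` guarantees that no hashtag shows up twice in the same tweet.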
* Change access methods to be less dumb
    - Like why did i make it as inefficient as possible
* If there's no caption on a post, there's no media edge or node
    - Soo returning the 0th index edge wasn't ideal
* It's an excuse to try out context managers
* I was working on sphinx docs and was like, will it have trouble with the db stuff?
    - Did I try and see if it worked? Nope, I assumed it wouldn't and redid the entire thing. that's normal right
* I just took the functions I had and wrapped them, it looks like I did a lot more
* Also deleted `models.py` and moved the like, 6 lines from it to `db.py`
* Also implemented the new class in `Profile`
* Change enter and exit for DBConnection
    - Initially was thinking to have a single persistent session and the instance would be transient
    - Changed my mind to make the entire connection transient, but I was tired and... yea...
    - Illogical to check for an existing session if it's supposed to create a new one each time??? like
    - Even worse was that, since exit only called `session.close`, the check for a session would actually return True, and so it wouldn't ever make a new one omg
    - Now, it disposes of the engine and the session on exit. Is the engine part needed? I literally don't know. But better safe than sorry? 🤡
    - Overall it's kind of a mess, like why am I still using a class variable? It's there to prevent creating multiple sessions, but if I don't want to reuse sessions then...?

* Basically, I don't know what I really want this class to do yet
    - I'm sure it will become clearer when I learn more about databases
    - But for now it's gonna be ugly messy and questionable
        * In other words, no different than the rest of my work  (:

* Add docstrings
* Delete models.py since it was all moved into db.py
Main Changes
 * `InstaClient.DOWNLOAD_DIR` is where IG posts will be downloaded to now, rather than nowhere in particular
    - When `InstaClient` downloads a post, it still sets the `filepath` attribute on the `InstaPost`

 * `InstaPost` has new properties
    - `filetype`: returns the filetype of the post (video/photo --> 'mp4'/'jpg')
    - `filename`: returns the default basename for the file (`id` + `filetype`)
    - `is_downloaded`: returns if the post has been downloaded yet; checks `filepath` attribute to see if a file exists
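
The three properties can be sketched roughly as below. `PostSketch` is a hypothetical stand-in for `InstaPost`: the `is_video` field and the dot between `id` and `filetype` are assumptions, while the property names and the idea that `is_downloaded` checks the `filepath` attribute come from the notes above.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class PostSketch:
    """Hypothetical stand-in for InstaPost with the new properties."""
    id: str
    is_video: bool
    filepath: Optional[str] = None  # plain attribute, set after downloading

    @property
    def filetype(self) -> str:
        return "mp4" if self.is_video else "jpg"

    @property
    def filename(self) -> str:
        # Default basename: id + filetype (the "." separator is an assumption)
        return f"{self.id}.{self.filetype}"

    @property
    def is_downloaded(self) -> bool:
        # No filepath set, or no file at that path, means not downloaded
        return bool(self.filepath) and os.path.exists(self.filepath)


post = PostSketch(id="12345", is_video=True)
before = post.is_downloaded

# Simulate a download into a temporary folder standing in for DOWNLOAD_DIR
download_dir = tempfile.mkdtemp()
post.filepath = os.path.join(download_dir, post.filename)
with open(post.filepath, "wb") as f:
    f.write(b"fake video bytes")
after = post.is_downloaded
```

Keeping `filepath` as a plain attribute, as the commit does, means the client can point it anywhere, while `filename` supplies most of the default path.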

---

General Issue
* I'm struggling to decide if I want the InstaClient/Post to be functional as standalone classes or not

* Take this commit itself as an example:
    - Felt illogical to "calculate" the download path within the `InstaClient`, considering the filename is an intrinsic property of an `InstaPost`
    - Making the `filepath` a property of InstaPost, rather than an attribute, would require a property setter
        > The filepath is used to determine if the post was downloaded successfully, ie. the `is_downloaded` property
        > So if passing a custom filepath to `InstaClient.download_post()`, it needs to be overridden on the `InstaPost` too
    - Feels excessive to do all this for something that will NEVER happen WITHIN the package?
        > Like it'd only be a problem if the classes aren't used as intended (but do I really have an intention?? no... im not the InstaClient police or something... so...?)
    - Instead I kept it as an attribute, added `filename` to help construct almost all of the filepath, but still... it feels wrong to make it harder to customize

* Note that this commit replaced commit ea1918b
    - In that commit, there's an option to specify a download directory and override the InstaClient one
    - But what if you want a custom file name? How much customization ability am I supposed to provide? Like UGHHHHHHHHHHHHHHH 🤮🤮🤮🤮
    - god knew I'd be too powerful if I didn't have crippling ADHD paralysis 😞✊
and maybe a lil extra from here 😩😎

Squashed commit of the following:
commit 64aa0ed
Author: TDKorn <96394652+TDKorn@users.noreply.github.com>
Date: Sun Jul 17 21:47:56 2022 -0400

Update instatweet.py

commit 2e86bcf
Author: TDKorn <96394652+TDKorn@users.noreply.github.com>
Date: Fri Jul 15 14:21:19 2022 -0400

Update InstaPost/InstaUser

commit 2cb3f86
Author: TDKorn <96394652+TDKorn@users.noreply.github.com>
Date: Thu Jul 14 01:27:19 2022 -0400

Fix more docstrings

Update instaclient.py

Update profile.py

commit 91ce8af
Author: TDKorn <96394652+TDKorn@users.noreply.github.com>
Date: Wed Jul 13 00:28:14 2022 -0400

Docstring Fixes Jul 12 i've had enough
* Change how context management and `connect()` works
    - Not sure why I was recreating the engine and DB tables every connection?
        > Was making it crazy slow, understandably
    - Added `ENGINE` class variable
        > The engine and db tables are created/set only once now -- the first time a connection attempt is made after the package is imported
    - `SESSION` class variable -- is back to the only thing that's "context managed" (idk the right term and idc)
        > I remembered what I was trying to do initially lol (see whiny rant on commit e853045)
        > If chaining methods in the same context, will try to use an existing session... as expected (since it'd be within the same `with` statement)

* Fixed `save_profile()` -- was overwriting `profile` variable then trying to use the original value 👌😩👌

* Added docstrings for other methods, I know I said it'd be insulting in 3716eba but like, it's so bare
    - Plus I changed sphinx settings and now I actually need to write docstrings for everything
        > Please end my suffering

TDKorn commented Jul 20, 2022

ima do this part later lol

say less

@TDKorn TDKorn marked this pull request as ready for review July 20, 2022 06:52
@TDKorn TDKorn temporarily deployed to instatweet July 20, 2022 07:29 Inactive
* Update dependencies in `requirements.txt`
* Honestly I really liked the whole cropping thing in v1.0 but it was gross and messy so ima 🚮 that
    - I think it was pretty specific to my own IG page anyways? Bc I added black borders to videos
@TDKorn TDKorn force-pushed the db-and-profiles branch from 7fa1a17 to e317174 Compare July 20, 2022 08:29
@TDKorn TDKorn changed the title [Feature] Profile Class and Database Support [Feature/Version Update] Profile Class and Database Support 🐤🤳💻 Jul 20, 2022

@TDKorn TDKorn left a comment


that's hot

@TDKorn TDKorn merged commit 40c2eba into 2.0.0 Jul 20, 2022
TDKorn added a commit that referenced this pull request Sep 3, 2022
I never thought the day would come. Yet here we are.

Please see #5 for full details on the new and improved 🐤InstaTweet v2.0.0 🐣✨

--------------------------------------------------------------------------------------------------

Since this has been done as a stacked PR, there's... a lot to take in

Notably, this update includes:
* (#6)  [Feature/Version Update] ```Profile``` Class and Database Support 🐤🤳💻
* (#7) - Add Sphinx (😣) / ReadTheDocs Files 📖 + Custom Theme Integrated w GitHub 💋
* Pain and Suffering (above baseline)
@TDKorn TDKorn deleted the db-and-profiles branch April 27, 2023 09:12