
'period_length' not kept when only changing 'steps_per_period' in gif animation #11

Closed
SoundSpinning opened this issue Aug 24, 2020 · 15 comments
Labels: help wanted (Extra attention is needed)

@SoundSpinning (Contributor)

Hi Jack,

Firstly, many thanks for sharing this great tool; it worked out of the box for me!
So, I played with a small Python script (see the attached Jupyter notebook) solving a simple pendulum equation and plotting angle, angular velocity & angular acceleration. Then I used your tool and got nice GIF animations.

However, I noticed that for a period_length of 50ms (see the last cell in the attached notebook), it works fine with 1 & 2 steps_per_period, but when I try 3 or 4 the period seems much longer than 50ms, i.e. a slower animation. Would you know why? Is it because the period needs to be an integer and something happens depending on the number of frames?

Cheers,
Jesus
pendulum_sample.zip

@JackMcKew (Owner) commented Aug 24, 2020

Hey!

Thank you very much for the feedback

In the current implementation, period_length defines how long each 'row' of data is shown for, and steps_per_period essentially makes that 'row' show for the number of times you specify, hence why it seems 'slower' at higher numbers: it'll show each 'row' for 4 times as long. This is done by interpolating the dataframe by steps_per_period (https://github.com/JackMcKew/pandas_alive/blob/main/pandas_alive/_base_chart.py#L124)
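
A minimal sketch of that interpolation idea (illustrative only, not the library's exact code):

import pandas as pd

def expand_frames(df, steps_per_period):
    # Spread the original rows over a finer integer index, then fill the gaps
    # linearly, so each original 'row' becomes steps_per_period animation frames.
    df = df.reset_index(drop=True)
    df.index = df.index * steps_per_period
    return df.reindex(range(df.index[-1] + 1)).interpolate(method="linear")

df = pd.DataFrame({"angle": [0.0, 1.0, 0.5]})
print(len(expand_frames(df, 4)))  # 9 frames from the original 3 rows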

This behaviour is intended; for 'faster' animations, I'd suggest leaving steps_per_period at 1 and using a smaller period_length.

For ease of reproducibility, here is a colab version of the notebook attached: https://colab.research.google.com/drive/1UnGPh4azfIz36IqwnEdHd3-jvNhy0-9u?usp=sharing

@SoundSpinning (Contributor, Author)

Thanks for the quick reply.
In your code _base_chart.py, at L559 you have:
self.fps = 1000 / self.period_length * self.steps_per_period
which makes sense to me: for a fixed period applied to the user's input rows, as we increase steps_per_period the FPS goes up. This means we keep the length of the animation constant and just add detail within each period; e.g. this may help show the vertical transition of bars in an hbar race when one overtakes another, neat.
So, as we increase the stepping number the speed of the video should remain constant, not get slower. I then ran a few tests and installed 'ffmpeg' in Anaconda, so I could get .MP4 videos instead of .GIF ones. This works fine, as per your code: I was able to keep period=100ms, change steps from 1 to 10, and all the MP4s looked fine, at the same speed.
Then I went back to the default .GIF output I had before, and as soon as the FPS reached ~60 or higher the codec doesn't get it right; it's fine below that value. Funnily enough my screen refresh rate is 60Hz = 60FPS, who knows...
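
Putting numbers to that fps line makes the breaking point visible:

# fps as computed by the formula above, for period_length = 100 ms
for steps_per_period in (1, 2, 4, 6, 10):
    fps = 1000 / 100 * steps_per_period
    print(steps_per_period, fps)  # 10, 20, 40, 60, 100 -> GIF output misbehaves from ~60 fps up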

Conclusion: your code is tip-top on this; the default .GIF output sucks at high FPS, and 'ffmpeg' with .MP4 wins hands down.

@SoundSpinning (Contributor, Author)

Also, I've tested your tool further and uploaded the notebook to the colab area: pendulum_sample_2.ipynb.

I've now generated line and bar race chart animations, then combined them using animate_multiple_plots, and it works! (A sketch of the combining call follows the list below.) I have the following comments:

  • I hit the issue you mention when combining line plots with others, where the last and first points get joined. The odd thing is that this happens from the start of the animation; I thought it would only appear at the very end, when plotting the last point?
  • The period in the combined one seems to be missing one frame: T=1.48 when my dataFrame goes to 1.49.
  • I was able to get the axis labels from the figure ('figs') nicely, and the plot size looks correct in the image in cell 11. However, the output MP4 is much larger than the figure; see the video embedded at actual size in cell 12. Do you know why this may be?
  • Generating the combined animation by calling the individual ones via a variable seems to rerun them all (?). Is there a way to run each animation just once when producing the combined one?
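
For context, the combining step follows the pattern from the project README; treat the dataset, parameters and filename here as illustrative:

import pandas as pd
import pandas_alive

# illustrative stand-in for the pendulum DataFrame from the notebook
df = pd.DataFrame({"angle": [0.0, 0.5, 0.2], "ang_velocity": [1.0, 0.3, -0.4]},
                  index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]))

line_chart = df.plot_animated(kind='line', period_length=50)
race_chart = df.plot_animated(period_length=50)  # a bar chart race is the default kind
pandas_alive.animate_multiple_plots('pendulum-combined.mp4', [line_chart, race_chart])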

Thanks for your time on this.

@JackMcKew (Owner)

Thank you again for the kind words, it's very much appreciated!

Regarding the first comment:

In-memory writing to GIF wasn't supported at first, until a number of users complained about needing to install an external dependency, so it was never considered a first-class method in my eyes. I remember looking into the fps problem in memory; the current implementation uses Pillow for writing in memory, which from my understanding doesn't have an fps argument for the save function (https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#gif). This could be improved by introducing another dependency on imageio, and would make a great PR (https://imageio.readthedocs.io/en/stable/format_gif-pil.html).
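
To make that concrete, a sketch of the two writers (frame files illustrative; the imageio call follows its v2 API):

from PIL import Image
import imageio

frame_files = ("f0.png", "f1.png", "f2.png")  # hypothetical rendered frames

# Pillow has no fps argument; timing comes from 'duration', in ms per frame.
# The GIF format stores delays in hundredths of a second, so ~16 ms (60 fps)
# gets rounded, which may be part of why high-fps GIFs misbehave.
frames = [Image.open(f) for f in frame_files]
frames[0].save("out.gif", save_all=True, append_images=frames[1:],
               duration=1000 // 60, loop=0)

# imageio's GIF writer (v2 API) accepts fps directly and converts it internally.
imageio.mimsave("out2.gif", [imageio.imread(f) for f in frame_files], fps=60)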

Regarding the second comment:

  1. I still have no idea where that bug stems from, and I currently have neither the bandwidth nor the interest to solve it; however, I'm more than happy to answer questions if someone wants to tackle it
  2. This is odd; I'd say it would be this number needing to start at 0 rather than 1 (https://github.com/JackMcKew/pandas_alive/blob/main/pandas_alive/plotting.py#L586), but I would need to test it
  3. I believe mp4 files scale to a default resolution? Maybe; I'm not too sure why, but I've had a similar thing when outputting animate_multiple_plots in memory with Pillow, with the result appearing very small in the GIF
  4. animate_multiple_plots works by creating each chart instance in essentially a 'container' of sorts in the subplots, and then running the full process to generate the animation. Since each chart is almost rebuilt inside the subplots chart, from my understanding it wouldn't be possible to cram the already-processed output inside it, but I'd happily be wrong haha

@SoundSpinning (Contributor, Author)

Thanks for the answers. When I get a minute I may dive into your code a bit and see if we can resolve the mystery of the closed line plots! My problem is that I'm fairly new to GitHub; I only use it for my small projects and Pages for web dev. I've never done a PR, so I may dig into the docs, learn how to do it, and give you a hand if I can. I may fire some questions along the way if that's OK, before I screw anything up...

Have a good day, I'll be in touch. I'm going to test your tool further; I do think there are good applications for it, and I'm hoping to add some (non-covid) examples.

@JackMcKew (Owner)

Here's a good introductory article for submitting PRs: https://opensource.com/article/19/7/create-pull-request-github.

Feel free to reach out on any of my social medias linked on my profile for anything that I can help with.

@JackMcKew added the 'help wanted' (Extra attention is needed) label on Aug 29, 2020
@SoundSpinning (Contributor, Author)

Thanks for the link; I'm going to give it a go this week and see what rabbit holes I find...
I do have some questions about the Python setup for this:

  • Do you set up a new dev Python environment and install your requirements.txt into it? So far I've installed all new packages under the default conda or pip channels. What do you recommend?
  • Do you then work under the 'develop' branch and, once happy, merge onto the 'main' one? I see you also have a 'master' branch?
  • When developing, how do you import your dev package in a notebook? Is it just declaring the path locally instead of 'import pandas_alive'?

Thanks.

@JackMcKew (Owner)

Awesome!

Regarding your questions above:

  1. If you're developing on Windows, here's a write-up I previously did on managing virtual environments; pyenv is natively supported on other platforms if you aren't on Windows: https://jackmckew.dev/managing-virtual-environments-on-windows.html
  2. All development work should go under the develop branch until it's tested & approved by maintainers to be brought into the main branch, after which a release should be scheduled across GitHub, PyPI and Conda.
  3. If you've followed step 1 and set up a virtual environment, this should let you run pip install -e . inside the pandas_alive directory to install the local version, which you can then interact with like any other package (see the sketch below).
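
Putting steps 1–3 together, a minimal sketch of the setup (fork URL and venv tooling illustrative):

git clone https://github.com/<your-fork>/pandas_alive.git
cd pandas_alive
python -m venv .venv                # or pyenv/conda, per step 1
.venv\Scripts\activate              # Windows; use 'source .venv/bin/activate' elsewhere
pip install -r requirements.txt
pip install -e .                    # editable install: 'import pandas_alive' now uses your checkout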

These are all great things that I'll put on my to-do list to add to the developer notes in the README.

@SoundSpinning (Contributor, Author)

Excellent! I'll have a read and see where all that takes me.
One more question: say I succeed and create a PR, do I just leave my develop branch as it is, pushed to my forked repo, i.e. without merging it into my main branch until you approve it?

@JackMcKew (Owner)

Create a branch off of develop, naming it whatever you'd like, do any development work and continue to commit to that branch. Once you are happy with your work, submit a pull request to merge your branch into develop.
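
As git commands, that flow looks like this (branch name illustrative):

git checkout develop
git pull origin develop            # start from the latest develop
git checkout -b my-feature         # name it whatever you'd like
# ...commit your work on this branch...
git push -u origin my-feature      # then open a PR on GitHub targeting develop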

Following all of that, we then go through thorough testing and documentation updates before merging into main and publishing a new release.

@SoundSpinning (Contributor, Author) commented Sep 3, 2020

Hi Jack. I've done quite a bit of reading/learning on setups, etc. Since I'm so new to all these setups and methods, I'm afraid I still have some questions:

  1. I read your blog post about pyenv-win & virtualenv several times. Am I right in thinking that you ditched the conda approach because of the size of the environments when developing? Does conda also impact your package release size?

  2. I've been testing how to create slim envs in conda, with minimal packages, and it seems to work. Is there any reason you can see why I couldn't continue using a slim conda env with your requirements.txt installed on top of it? Would this then break anything on your end after I work on your code this way?

As you mention somewhere in your blog, imagine you're talking to a 5-year-old here... Thanks.

@JackMcKew (Owner)

Hey! Having questions is fantastic!

In regard to 1, I stopped using Anaconda as it seemed to bloat my hard drive like nothing else (e.g. if you install pandas in Anaconda, it comes with mkl, which from memory is ~150MB). I've never tried packaging from within Anaconda, and I personally prefer using Poetry (https://jackmckew.dev/packaging-python-packages-with-poetry.html).

Regarding number 2, I don't think this would be a problem at all, provided the package in the end is still built with Poetry. I'm not sure about the interaction between Poetry and Anaconda, but Poetry only depends on the source code of the package.

In conclusion, however you do your local development, whether with pip, Anaconda, etc., doesn't matter, as it'll still only result in changes to the source code, which we can then package for PyPI with Poetry.
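
For reference, a Poetry-managed package is described by a pyproject.toml along these lines (values illustrative, not the project's actual file):

[tool.poetry]
name = "pandas_alive"
version = "0.1.0"                      # illustrative
description = "Animated plotting extension for Pandas"
authors = ["Jack McKew"]

[tool.poetry.dependencies]
python = "^3.7"
pandas = "*"
matplotlib = "*"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"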

Hope this makes sense, please feel free to ask any questions ☺️

@SoundSpinning (Contributor, Author)

Makes a lot of sense, thanks again.

@SoundSpinning (Contributor, Author) commented Sep 13, 2020

Since I spent quite a bit of time getting my head around Python setups with conda for project collaboration, I thought I'd put some notes together here in case they're useful to other newbies like me in this wonderful world of open-source collaboration.

Python setup with conda for project collaboration

Note: these comments refer to an installation on a Windows 10 PC.

Preamble:
Anaconda is a great starter program for learning Python for data science and more. With a single install you get hundreds of packages, which will let you follow most tutorials while learning; most likely via the Jupyter Notebook app already included in the main install.
Anaconda installs a default Python environment called base, which may take up at least 3GB on your PC. The main package & environment manager is called conda. However, once you start collaborating on other Python projects, it is not recommended to continue with base (the default environment), because project authors are careful to pin specific package versions for development. To work with these projects you'll need to create a new conda environment for each one, and install those specific package versions in it.
Therefore, simply copying (cloning) your (large) base environment into a new one is not a good idea. Over time it'll eat disc space, and conda may become slow. Worst of all, you won't be guaranteed that your setup works with the package versions for that specific project.

So, what to do next? This is what worked for me; it keeps disc usage to a minimum and (so far) allows many conda environments to work fine with other projects.

1.- I uninstalled the whole Anaconda program from my PC, including manually deleting any anaconda entries left anywhere on my machine.

2.- I installed Miniconda3 under C:\miniconda3. Make sure you do not install it under a path with spaces in it, or things may break later on and cause hassle.
Miniconda installs only conda, python and pip in the base (default) environment, plus a very small number of dependencies.

3.- Set up conda & the conda-forge channel:
To use conda (as the general package & environment manager) from base in all other conda environments, without having to install it in each of them, all you need to do is add the main conda installation folders to the (Windows) PATH. It should look something like this: C:\miniconda3;C:\miniconda3\condabin;C:\miniconda3\Scripts;... (your other PATH entries).
Once this is done you can call (use) the conda command from any other environment while keeping a single install of conda in the base env only.

conda-forge is a community-driven conda package repository, recommended as your main channel for installs. pandas_alive was recently added to this channel, neat. You'll need to execute these commands once in an Anaconda terminal to set this up permanently:
conda config --add channels conda-forge
conda config --set channel_priority strict
After the above, each time you run conda install package-name in any environment it'll look first in the conda-forge channel.

4.- Set up a new conda env example:
conda create -n py38 python=3.8
conda activate py38
conda install pandas matplotlib notebook pandas-alive ffmpeg xlrd jupyter_contrib_nbextensions tqdm
This example installs the above main packages and their (many) dependencies. To keep track of what's installed in each env, conda can export YAML files that can later be used to create/update an env. You can create a *.yml for your new env like this:
conda env export --from-history > env-name.yml
Note the --from-history flag. It's very handy, as it lists only the main packages' versions without their dependencies. This allows for a safer cross-platform install, minimising breakage from dependencies. You can then keep hand-editing this file whenever you add a main package and its version, without having to list all the dependencies.
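
For illustration, a --from-history export looks roughly like this (package list hypothetical, not the attached file):

name: py38
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.8
  - pandas
  - matplotlib
  - pandas-alive
  - ffmpeg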

Attached is a (small) .yml file which worked for me for setting up/updating a conda environment to work with the pandas_alive project.
py38-pandas_alive.zip:
conda env create -f py38-pandas_alive.yml
or, after hand-editing the file to add new main packages:
conda env update -f py38-pandas_alive.yml

NOTE: I found a gotcha on Windows where conda writes the first .yml file encoded as UTF-16 (likely PowerShell's default redirection encoding). This file will not work later with conda env create/update. The trick is to open the .yml file in your text editor, change the encoding to UTF-8 and save it. You can then continue editing it in your editor if required, and it should work fine from then on.

@JackMcKew : does pip allow you to create a similar env file with only the main packages, i.e. without all the dependencies? This would reduce the size of your requirements.txt and make it easier to maintain. It also seems to be advised (from what I've read) when installing the project cross-platform (Windows, Linux or macOS).

5.- Some recommended settings in Windows:
If you've come this far, you've most likely played with git and have Git for Windows installed. From my experience I also recommend installing MINGW64 on your Windows setup. Combined and set up properly, they bring all the useful Linux tools you may need, not only in the git bash terminal but also across the Windows PowerShell command window.

The above is useful with Python via conda because you need to use the Anaconda command window. When you first search for it in Windows you'll most likely be confronted with two options: the usual CMD-style prompt and the PowerShell version. I suggest you use the latter and pin it to your taskbar for easy access in future. If everything is set up properly you'll have access to the history of commands from all windows, with the Linux commands available.

A very useful shortcut is Ctrl+r, which lets you search back through the command history as you type a string. Repeat the shortcut if the first hit is not the command you want, and it'll search further back. It saves a lot of time when trying to remember and/or return to long or unusual commands without having to google them again.

Another useful conda command, for when you need to clean up your overall Python setup, is:
conda clean -a

6.- IDE (code editor choice):
I used Sublime Text for years. However, I recently switched to VS Code and it seems brilliant: free, updated monthly, and with instant, excellent integration with git and Python environments, including conda. Well worth a try.

Please add to this thread any other tips and tricks from your experience with conda installs.

@JackMcKew (Owner)

My apologies for not replying to this sooner! 🙃

I've added this in 5cfa900, which will form part of the documentation.

The requirements.txt in the root of this repository is for the CI/CD processes on GitHub Actions; the pyproject.toml is what lists the dependencies for when a user pip installs pandas_alive, and it only lists the packages necessary for operation.
