Releases: CouncilDataProject/cdp-backend
Prep for Matter Text Extraction
Full Changelog: v4.1.1...v4.1.2
Reduced Complexity Pipeline
What's Changed
- feature/reduce-event-gather-complexity by @evamaxfield in #232
- Remove Unneccessary Re-Encode If Video Is Already H264 by @whargrove in #234
- Use requests stream and shutil.copyfileobj to constrain memory usage during resource copy by @whargrove in #236
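The memory-constrained copy named in #236 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the helper names `copy_stream` and `download_resource`, the 1 MiB chunk size, and the 60 second timeout are all hypothetical choices made for the example.

```python
import shutil
from pathlib import Path
from typing import BinaryIO


def copy_stream(src: BinaryIO, dst_path: str, chunk_size: int = 1024 * 1024) -> Path:
    """Copy a file-like object to disk in fixed-size chunks (hypothetical helper)."""
    dst = Path(dst_path)
    with open(dst, "wb") as out:
        # copyfileobj reads at most `chunk_size` bytes per iteration, so memory
        # use stays bounded no matter how large the resource is.
        shutil.copyfileobj(src, out, length=chunk_size)
    return dst


def download_resource(url: str, dst_path: str) -> Path:
    """Stream a remote resource to disk without loading it fully into memory."""
    import requests  # third-party; imported lazily so copy_stream works without it

    # stream=True defers downloading the body until it is read from response.raw.
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        # Ask urllib3 to undo any gzip/deflate transfer encoding while reading.
        response.raw.decode_content = True
        return copy_stream(response.raw, dst_path)
```

Keeping the chunked copy separate from the HTTP call means the same helper can be exercised with any file-like object, such as an open local file.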
Full Changelog: v4.0.9...v4.1.0
Use PyPI Faster Whisper Release, Better Word Level Timestamps, Other Minor Bugfixes
Pull in faster-whisper directly from PyPI. The new faster-whisper release also pulls in the base library's changes to allow word-level timestamps (we no longer have to linearly interpolate!). Finally, this release attempts to fix a JSON decode error during config reading.
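For context, linearly interpolating word timestamps means spreading a segment's duration evenly across its words. The sketch below illustrates that general idea only. The function name `interpolate_word_times` and its signature are hypothetical, and this is not the code cdp-backend previously used.

```python
from typing import List, Tuple


def interpolate_word_times(
    text: str, start: float, end: float
) -> List[Tuple[str, float, float]]:
    """Assign each word an equal share of the segment's duration."""
    words = text.split()
    if not words:
        return []
    # Every word gets the same duration, regardless of how long it takes to say.
    step = (end - start) / len(words)
    return [
        (word, start + i * step, start + (i + 1) * step)
        for i, word in enumerate(words)
    ]
```

Because every word receives the same duration, the estimates drift whenever speech is uneven, which is why timestamps produced directly by the model are preferable.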
What's Changed
Full Changelog: v4.0.8...v4.0.9
Try to handle missing www video uris
Full Changelog: v4.0.9.rc0...v4.0.9.rc1
Faster Whisper from PyPI, Better Word Timestamps, Fix JSON Load
Pull in faster-whisper directly from PyPI. The new faster-whisper release also pulls in the base library's changes to allow word-level timestamps (we no longer have to linearly interpolate!). Finally, this release attempts to fix a JSON decode error during config reading.
What's Changed
Full Changelog: v4.0.0...v4.0.9.rc0
Google Speech-to-Text Out, Whisper In
CouncilDataProject cdp-backend v4.0.0
just update-from-cookiecutter.
You should re-read the SETUP/README.md document, as some new minor configuration is required. Specifically, the new PERSONAL_ACCESS_TOKEN and the quota increase request should be the only things that need to be updated for existing instances.
You should also lower how often your CRON event gather runs prior to running just update-from-cookiecutter. All of the instances maintained by the CDP Core Team will be lowered to running only once per day.
Council Data Project is a backend, frontend, and cookiecutter deployment for creating a whole database, storage system, and website, for archiving, exploring, and tracking municipal council action.
This library, cdp-backend, maintains the pipelines, database models, infrastructure configuration, etc.
v4.0.0
There are two main changes for this release.
- We are swapping out Google Speech-to-Text for OpenAI's Whisper.
Specifically, we are using a forked version called faster-whisper. This new speech-to-text model performs much better (ranging from ~3.6% word-error-rate to ~9% word-error-rate on long audio files).
To use this new model efficiently, we need access to a GPU. Since GitHub Actions runners do not have GPUs available, we are using a system which spins up a Google Cloud Compute Engine instance, connects to it, runs our job, and then tears it down, all in the course of a single GitHub Action workflow. Based on multiple tests, this should reduce both cost and processing time; however, with this release we will do more testing to get a better estimate.
- We have switched from the MIT License to the MPLv2 License.
Unless you are trying to fork our code and take it private, this won't affect you.
Bugfix for Trimmed Videos During Parallel Processing
In v3.2.10, we introduced video trimming during processing for cases where users may want to process only part of a larger video. That functionality broke when processing events in parallel because all trimmed sections were stored under the same file name. This release fixes that behavior by using a random UUID in the temporary file name for the clipped portion.
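The idea behind the fix can be sketched as follows. This is a minimal illustration, not the code from the release: the function name `unique_clip_path`, the `clipped-` prefix, and the default directory are assumptions made for the example.

```python
import uuid
from pathlib import Path


def unique_clip_path(source_video: str, tmp_dir: str = ".") -> Path:
    """Build a collision-resistant temporary path for a trimmed clip."""
    # Keep the source extension so downstream tools still recognize the format.
    suffix = Path(source_video).suffix
    # uuid4 is random, so parallel workers will not pick the same file name.
    return Path(tmp_dir) / f"clipped-{uuid.uuid4()}{suffix}"
```

Two workers trimming different events each call `unique_clip_path("meeting.mp4")` and receive distinct paths, so neither overwrites the other's clip.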
What's Changed
- bugfix/unique-temp-filenames by @chrisjkhan in #225
Full Changelog: v3.2.10...v3.2.11
Trimming Video Prior to Processing
What's Changed
- Add transcription range fields to database and ingestion models, add … by @chrisjkhan in #221
New Contributors
- @chrisjkhan made their first contribution in #221
Additionally I would like to thank: @dphoria and @smai-f
Full Changelog: v3.2.8...v3.2.9
Event Index Chunk Upload Fix
After an initial report from @phildini a month or so ago that the Alameda instance event index was missing a lot of n-grams, and a second report from @conantp, @conantp investigated the issue and found that we had a bug in our index chunk upload code, which ultimately meant that parts of the index were simply never updated. This was a serious bug, and many thanks are due to @conantp for investigating, finding, fixing, and testing the changes needed.
@conantp has already run an index generation and upload for the Asheville instance: https://sunshine-request.github.io/cdp-asheville/#/events
What's Changed
New Contributors
Full Changelog: v3.2.6...v3.2.7
Further fix infra deployment due to bad import management
Further fixes the bad library import to protect infrastructure deployments.
Full Changelog: v3.2.5...v3.2.6