Releases: kubecost/cost-analyzer-helm-chart

V2.5.0-rc.0

08 Nov 00:36
56faa05
Pre-release
  • GPU Savings
  • Turbonomic Integration

v2.4.2

22 Oct 18:19
43fcc54

What's Changed

  • Reduced aggregator memory footprint during derivation
  • Add Oracle Cloud allocation support to Kubecost
  • Restore logout button for SSO
  • UI scrolling does not work after closing cloud cost modal
  • Improve performance on certain allocation API calls
  • Add missing container security context in Prometheus server statefulset #3691
  • Add cluster controller resource helm template #3692

Full Changelog: v2.4.1...v2.4.2

v2.4.2-rc.0

10 Oct 17:33
5c4d9c2
Pre-release

Fixes

  • Allow allocations to be created from Oracle Cloud OKE clusters.
  • Allow node group sizing to use same-family recommendations.

v2.4.1

26 Sep 21:38
4ebba2d

Fixes

  • #3685 Fix issue with invalid column name after database upgrade.
  • Fix an issue with idle weighted by cluster after new GPU Idle was added.
  • Fix an issue with the request-sizing page; ensure a compatible query window is used.
  • Fix an issue where an error was inadvertently shown on the cloud costs page.
  • Fix an issue where topline allocation costs changed depending on idle and shared cost settings.

Helm Fixes

  • #3674 When a federated store is provided, Kubecost should either be running as agentOnly or as a statefulset.
  • #3684 Add /diagnostic/nodeCount endpoint to helm chart.
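
As a rough illustration of the two configurations that the #3674 check permits, the values below sketch an agent-only secondary and a statefulset primary that share a federated store. The key names (federatedETL.agentOnly, kubecostAggregator.deployMethod, kubecostModel.federatedStorageConfigSecret) are given as recalled from the chart and should be confirmed against the values.yaml of your chart version. The secret name federated-store is a hypothetical placeholder.

```yaml
# Secondary cluster (agent): hypothetical values-agent.yaml
federatedETL:
  agentOnly: true                 # run Kubecost as an agent only
kubecostModel:
  federatedStorageConfigSecret: federated-store   # placeholder secret name
---
# Primary cluster: hypothetical values-primary.yaml
kubecostAggregator:
  deployMethod: statefulset       # run the aggregator as a statefulset
kubecostModel:
  federatedStorageConfigSecret: federated-store   # placeholder secret name
```

Each document would go in its own values file and be applied to the matching cluster.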

v2.4.1-rc.1

26 Sep 15:53
9cd1acc
Pre-release

Fixes

  • #3685 Fix issue with invalid column name after database upgrade.
  • Fix an issue where container sizing would fail when using a custom profile.
  • Fix an issue where allocations would fail when sharing both tenancy and namespaces.
  • Fix an issue with idle weighted by cluster after new GPU Idle was added.
  • Fix an issue with request-sizing page, ensure compatible query window is used.
  • Fix an issue showing an error on cloud costs page inadvertently.

Helm Fixes

  • #3674 When a federated store is provided, Kubecost should either be running as agentOnly or as a statefulset.
  • #3684 Add /diagnostic/nodeCount endpoint to helm chart.

v2.4.0

17 Sep 01:13
b98b245

Overview

Version 2.4 is an ‘edge’ release focused on GPU efficiency, many bug fixes, and several quality-of-life improvements.

Important Notices

  1. Upgrading to 2.4 will add new fields to the Kubecost ETL that support the GPU monitoring features. The new ETL files are not backward compatible with previous versions. Multi-cluster users MUST upgrade the primary before upgrading secondaries (agents).

The current 2.3.x release is considered stable and will continue to be maintained. Kubecost will release a new 2.3.x version that is compatible with the ETL changes in 2.4.x that will allow downgrading to that version from 2.4.x. All this said, the 2.4.0 release has been extensively tested and we recommend upgrading to take advantage of the new features and significant number of bug/CVE fixes.

  2. An agent upgrade to version 2.4+ is required to gather the additional metrics for NVIDIA GPU workloads. If NVIDIA GPUs are not used, the agent upgrade is not required.

Major Features

  • [Feature] Incorporate GPU Efficiency into efficiency metrics displayed around the application.
  • [Feature] Ability to rightsize node groups in cluster-sizing. Note that this requires the agents (secondaries) to be at or above 1.100, which added support for node labels.
  • [Feature] Add options to the Allocations page to see Idle costs broken down per-node and per-cluster.
  • [Feature] Add support for Collections Budgets.
  • [Feature] Add support for Idle Costs to Collections.

Minor Features

  • [Feature] Add support for a new Helm setting that enables the standard discount in the Kubecost primary cluster installation to be applied to data coming from secondary clusters.
  • [Feature] Add support for certificates when using a custom SMTP server with Kubecost.
  • [Feature] Add new FOCUS spec fields to Cloud Cost to support Account Name, Invoice Entity Name, Region ID, and Availability Zone.
  • [Feature] Add the ability to support BYO certificates for SMTP integration.
  • [Feature] Add a check in the Settings page which alerts users when their Helm Chart, UI image, and API image versions are not in sync.
  • [Feature] Add four new fields from the FOCUS spec to Cloud Costs.
  • [Feature] Add four new Fields from the FOCUS spec to Cloud Budgets.
  • [Feature] Add limited support for feature-flagging via the Helm chart.
  • [Feature] Agent diagnostics is now enabled by default
  • [Enhancement] Substantial application-wide improvements to WCAG 2.1 AA accessibility.
  • [Enhancement] Add a loading indicator when downloading request sizing CSVs to show that the download has, in fact, been initiated.
  • [Enhancement] Add a loading indicator when request sizing data is refreshing.
  • [Enhancement] Remove the “New” badges from pages that were introduced in 2.0.
  • [Enhancement] Default Request Sizing window to 3d instead of 48h. Using 48 data points was causing the page to hang or crash for some larger data sets.
  • [Enhancement] When an array of empty data is returned from the custom costs API, show an informative message rather than an empty graph/table.
  • [Enhancement] Show an informative message when the Request Sizing API returns a response with an empty set of Recommendations.
  • [Enhancement] Show an informative message when attempting to create a Budget fails.
  • [Enhancement] More information in bug reports.
  • [Enhancement] Show a more informative error response when cluster sizing recommendations cannot be generated due to not finding cloud provider information for a cluster.
  • [Enhancement] Show friendlier Cloud Account Names in Overview / Cloud Cost tables instead of Cloud Account IDs, when names are available.
  • [Enhancement] Add the ability to see aggregator PV usage in /diagnostics page.

Fixes

  • [Fix] Add a new script for copying alerts to the aggregator pod from cost-model as we moved this endpoint over. If you have alerts configured prior to 2.4, you’ll need to run this script upon upgrading.
  • [Fix] Fix an issue where overview cluster efficiency shows usage as 0.
  • [Fix] Fix an issue where resource hourly cost is incorrectly calculated on drill down.
  • [Fix] Fix an issue when changing from separate idle by node to another idle configuration.
  • [Fix] Fix an issue with GPU idle calculations in allocation.
  • [Fix] Fix assets that appear to be missing account ID.
  • [Fix] Fix an issue causing discrepancies in collections cost in the k8s domain for query windows that yield relative date boundaries.
  • [Fix] Fix CSV pricing for GPUs not being reflected correctly in Kubecost.
  • [Fix] Fix an issue with the Allocation API not matching Allocation Summary API on costs.
  • [Fix] Fix an http 500 error in cluster right-sizing.
  • [Fix] Fix an issue with Allocation API calculation on PV costs.
  • [Fix] Fix an issue with Allocation API and Allocation Summary API cost accuracy when cost metric is not set to cumulative cost.
  • [Fix] Fix an issue with AKS reconciliation of BRL currency costs.
  • [Fix] Fix an issue with Asset budgets using the ‘Project’ workload type.
  • [Fix] Fix several issues with the /clusters page, including inaccurate provider selection and inaccurate costs.
  • [Fix] Fix aws:eks:cluster-name tag not being picked up.
  • [Fix] Fix an issue causing inflated network costs for Azure clusters.
  • [Fix] Fix an issue where HA and DR icons are not working properly on /settings page.
  • [Fix] Fix an issue with Carbon Costs and Trends getting HTTP 500 in allocations.
  • [Fix] Fix issue in orphaned resources API causing a 500 error on a single resource lookup failure from provider.
  • [Fix] Fix issue in allocations presenting non zero shared costs when sharing is disabled.
  • [Fix] Fix the scalability of the clusters API for accuracy and speed.
  • [Fix] Better error handling in some cases where the app fails to start. Allow users to enter a license key or start/extend an Enterprise trial when blocked on license violations.
  • [Fix] Update math in the Overview’s efficiency graph card so as not to show negative allocation, which is impossible.
  • [Fix] Remove the Category filter from Asset Budget filter options, as it is unsupported.
  • [Fix] Prevent drilling into Pod items in the Efficiency page. Previously, this would set the aggregation to Namespace and remove all filters.
  • [Fix] Request Sizing had two separate UI elements for setting Filters. The one in the Customize menu has been removed.
  • [Fix] Remove an unnecessary check for the presence of the Network Cost daemonset on the primary cluster before rendering the Network Costs page. Secondary clusters may be reporting network costs that can be viewed from this page, regardless of the state of the daemonset on the primary.
  • [Fix] Prevent querying for data older than the 15 day retention period for Free tier in the Collections and Efficiency pages.
  • [Fix] Correctly generate links from the Allocations page to the Request Right Sizing page when filtering and/or aggregating by custom label.
  • [Fix] Correct an error that resulted from saving Cloud Cost reports with custom labels.
  • [Fix] Correct a broken link to the Efficiency Report documentation.
  • [Fix] Fix a bug in Assets where updating the Cost Metric field would remove any applied filters.
  • [Fix] Fix an issue where step size was not honored in Efficiency Reports.
  • [Fix] Fix a variety of issues in the Allocation Detail Modal (shown when clicking on a Pod row). This modal would issue an incorrect and expensive Assets query to try to derive the Pod’s Node. When it failed, it would show a cryptic message about credentials.
  • [Fix] Fix a bug that caused the Clusters list to filter incorrectly.
  • [Fix] Remove the unallocated item from the Overview’s Namespace Breakdown table.
  • [Fix] Fix an issue where sometimes applying a license would hide the current active Free Enterprise Trial status and vice-versa. The settings page now always shows both the active license and the state of an installation’s free trial.
  • [Fix] Fix an issue where custom SMTP tests/updates from the UI could fail.
  • [Fix] Fix Alerts only alerting on data from the Primary cluster. All alerts except Cluster/Application Health alerts will leverage data from secondary clusters.
  • [Fix] Don’t try to show all per-day cluster costs in the Overview page. Show top 10 like we do in other graphs.
  • [Fix] Fix an issue where UI-created Budgets that reset on Sunday did not create correctly.
  • [Fix] Fix an issue where the UI could send an incorrect parameter to the Cluster Sizing API.
  • [Fix] Fix an issue with Assets monthly totals not appropriately lining up.
  • [Fix] Fix category options in asset autocomplete.
  • [Fix] Fix an issue where namespace turndown always shows the next run as ‘coming soon’.
  • [Fix] Fix alerts to be multi-cluster aware.
  • [Fix] Fix missing claim names in persistent volume sizing.
  • [Fix] Fix the default experience for cluster right sizing when current daily data isn’t yet available.
  • [Fix] Fix an inaccuracy in pod costs on abandoned workloads savings page.
  • [Fix] Fix an issue where the cluster provider name could be incorrect on the clusters page.
  • [Fix] Fix an issue where total and page count on container right-sizing page had values when no recommendations were available.
  • [Fix] Fix an issue where database timestamps weren’t being correctly set for some data, defaulting to Jan 1st 1970.
  • [Fix] Fix an issue with PV discrepancy between allocation and allocation summary API.
  • [Fix] Fix an issue with saving SMTP configuration after edits.
  • [Fix] Fix an issue where the aggregator could run out of PV space with no warning shown in the frontend.
  • [Fix] Fix an issue where shared costs do not show correctly in the top level allocations view.
  • [Fix] Fix an issue where node counts don’t match across allocation, assets, and cluster inspect.
  • [Fix] Fix an issue where allocation API does not matc...

v2.3.5

29 Aug 01:33
52a30b2

Fixes

  • Reduced memory footprint.
  • Fix an issue blocking settings page because of core count limit.
  • Fix an issue with Cloud costs page not leveraging cloud account mappings.
  • Fix an issue with higher than normal numbers in gcp cloud costs.
  • Fix an issue with aggregating by label on right-sizing.
  • Fix an issue with request-sizing sort by field causing an api error.
  • Fix an issue with scheduled reports not sending.
  • Fix an issue with collections adding cloud cost with custom labels.
  • Fix an issue with external costs not getting ingested properly.
  • Fix an issue with asset budgets getting an api error using service workload type.
  • Fix an issue with budgets page resetting weekly send 1-7 from 0-6.
  • Fix an issue in budgets collection selector not showing selected value.
  • Fix an issue causing prometheus query error in local storage queries.
  • Fix an issue creating cloud cost reports via helm chart.
  • Fix an issue with trial status disappearing on upgrade of eks optimized.
  • Fix an issue with slow queries on cluster status api.
  • Fix an issue with invalid provider name in the cluster list and cluster detail.
  • Fix error visible in aggregator logs “append row Failure: acquiring max concurrency semaphore: context canceled” resulting in hung api responses.
  • Fix an issue with allocation top line matching allocation summary api.
  • Fix an issue with assets not matching allocation summary api.
  • Fix an issue with adjustments when no cloud integration or custom pricing is enabled.
  • Fix an issue with auth loops when OIDC values are set.
  • Fix an issue with azure costs being lower in kubecost than azure.

Helm Fixes

  • #3595 Fix okta redirect loop.
  • #3600 Reduce Memory usage.

Security Updates

Known issues:

  • prom/prometheus v2.53.1 in our helm chart has a known critical CVE-2024-41110 that has not been resolved upstream. Once this is resolved we will patch again.
  • redhat/ubi9 has a known high CVE-2024-6345 that has not been resolved upstream. We are working on alternative resolutions, as ubi9 has left this open for some time, and will patch once we have a verified and tested solution.

V2.3.4

30 Jul 17:48
d366e27

Fixes

  • Fix Ingestion progress bar issues where the green bar doesn’t complete.
  • Fix an issue with missing windows keeping ingestion from completing.
  • Fix an issue with efficiency numbers matching when you drill down from cluster to namespace.
  • Fix an issue with cluster totals matching the sum of the pages.
  • Fix an issue with abandoned workloads when a pod has 0 utilization.
  • Fix an issue with request sizing when a pod has 0 utilization.
  • Fix an issue with the cluster right-sizing page when some clusters are missing configuration.
  • Fix an issue with cloud cost scheduled reports not working appropriately.
  • Fix an issue with asset budget reports showing the wrong aggregation on form.

Known issues:

If Grafana links are disabled, enable them by adding the following helm value:

kubecostAggregator:
  extraEnv:
  - name: GRAFANA_ENABLED
    value: "true"

  • 2.3.5: Adding a custom label in collections for cloud costs fails
  • 2.3.5: External Costs is broken for Datadog
  • 2.3.5: Double counting discounts in certain situations
  • 2.4: In budgets adding an assets tag causes alerts to fail
  • 2.4: Allocation container detail modal (last level popup) is slow/fails in large environments
  • PV cost discrepancy in multi cluster environments (no cluster filter)
  • Request-sizing cannot sort by average or max usage

V2.3.3

18 Jul 01:10
badf0e0

Front-End

  • Fixes an issue where budgets were not validating properly
  • Fixes an issue where Actions configs do not delete properly
  • Fixes an issue where, when creating a request right-sizing action, the autocomplete menu offered options outside the primary cluster
  • Fixes an issue where Cloud Budget Alerts would not work in Alerts page
  • Fixes an issue where the refresh button on KC actions page would not respond
  • Fixes an issue where changing the cost metric on the assets page didn’t result in Total Cost being updated
  • Fixes an issue where the clusters page would show an incorrect total cost for a cluster that did not match the cluster inspect page
  • Fixes an issue where you could not delete an NS Turndown Action
  • Fixes an issue where the unclaimed volumes page listed volumes that are bound, causing a large discrepancy compared to v1.108.1
  • Fixes an issue where creating Request Sizing actions via Helm values results in incorrect UI display
  • Fixes an issue where sorting the Allocations table crashes the page
  • Fixes an issue where collections were missing the filter-by-container option
  • Fixes an issue where estimated savings were greater than Total Cost on Cluster Page
  • Fixes an issue where drilling down when aggregated by Deployment breaks the Efficiency Reports page
  • Adds an orchestrator to bug report

Backend

  • Fixes an issue where QueryAssetCTE would cause a panic
  • Significantly improves Aggregator Tuning and Testing
  • Fixes an ingestion bug which could lead to gaps in data ingested, even though the diagnostics said all files had been ingested. This is typically caused when the Aggregator is stopped/killed. Partial or failed ingestion attempts will now automatically retry
  • Fixes an issue where transactions were not handled properly
  • Fixes "NewBillingParseSchema: failed to find Date field" error
  • Fixes cloudAccountMapping
  • Fixes an issue where the OIDC or SAML redirect on certain configurations would 404 after login
  • Fixes an issue where a PDF attached to a scheduled report is corrupt
  • Fixes an issue where custom labels would not work as expected (department, owner, etc)
  • Fixes an issue where scheduled reports were not being sent
  • Fix an issue with EKS Optimized gating from backend
  • Fixes an issue where derivation steps would run out of order
  • Fixes an issue with filters in scheduled cloud cost reports not being parsed correctly

Helm Changes

  • #3532 Add API configuration endpoints to nginx config
  • #3536 Bump Cluster Controller to 0.16.5

V2.3.2

03 Jul 00:35
3809f94

Frontend

  • Fix an issue where the graph on-hover tooltips in the Cloud Costs page were showing windows in local time, instead of UTC. This sometimes caused the tooltip and x-axis to disagree on date.
  • Fix an issue where the Automatic Request Sizing UI in Savings Actions was passing an invalid filter when attempting to filter workloads by Deployment.
  • Added a tooltip to the Clusters List page explaining Workload Efficiency, and changed the column name from “Efficiency” to “Workload Efficiency”.
  • Fix an issue where the Automatic Request Sizing UI in Savings Actions would sometimes produce malformed timestamps in requests to schedule recurring sizing actions.
  • Fix a UI issue where any errors that occurred during the creation of a cluster turndown schedule were not relayed to the user.
  • Fix a UI issue where creating an Alert with a window other than 1d-7d would cause it to show up in the Alerts table as having a window of “7d”.
  • Note that the UI does not have a way to accurately represent completely arbitrary windows at this time. Setting windows of e.g. “1h” via the Helm chart will produce similarly broken displays in the UI, although the alert will function correctly.

Backend

  • Fix an issue where Allocation reports with a step size of “hour” could not be saved.
  • Add a DB info log, printed after DB instance creation.
  • Improve stability and memory consumption on high scale environments by switching off Jemalloc.
  • Fix an issue where idle compute resources would sometimes not account properly in reconciled allocation data.
  • Bump image to resolve CVE-2024-35255

Helm chart

  • Add namespace label to aggregator serviceMonitor.
  • Add option for aggregator env var for DB_TRIM_MEMORY_ON_CLOSE.
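
One way the DB_TRIM_MEMORY_ON_CLOSE variable might be passed to the aggregator is through the kubecostAggregator.extraEnv list, the same pattern used for GRAFANA_ENABLED elsewhere in these notes. This is only a sketch. The release note does not name the Helm option it added, so the chart may expose a dedicated value instead, and the "true" value is an assumption.

```yaml
# Hypothetical values override; check your chart version's values.yaml first.
kubecostAggregator:
  extraEnv:
    # Assumption: the variable accepts a boolean-like string.
    - name: DB_TRIM_MEMORY_ON_CLOSE
      value: "true"
```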

Cluster turndown

  • Send HTTP responses in header rather than body for better compliance with HTTP standards.

Cluster controller

  • Update Cluster turndown reference version to v0.16.3

The following CVEs have been resolved: