Want to refactor a KQL query and check that the new version is compatible?
No more tedious manual comparing! This tool will judge if queries results are equivalent
Table of Contents
KQL refactoring can be motivated by many use cases including;
- Performance improvements from rephrasing the original query
- Having to switch to a new table, that is built differently, and therefore requires to adjust the original query
However, after refactoring comes the tedious task of verifying that there is no regression. This tool purpose is to make this task and easy and hint what and where differences come from.
After successful demonstration in multiple use cases that prevented bugs and saved manual work, and interest from outside of the team, we release this tool so you can benefit from it too.
Tool creates based on original and refactored queries, a new query that determines if they return the same results and what differences there are if any.
First comparing that there are the same number of results, and if so comparing columns value-by-value, according to customizable comparators.
In case that there are differences, you have access to a the comparision table so it would be easier to find the RC.
Inputs are:
header.txt
: Used for common definitions, such as the time range on which both queries are compared on or for newly defined comparatorscolumns_comparators.txt
: The comparators to use when comparing the columns, see explanation belowgold.kql
: Original KQL Queryrefactored.kql
: Refactored KQL Querygold_columns.txt
: The names of the columns of the original queryrefactored_columns.txt
: The names of the columns of the refactored query
NOTE!
The first column in each of the files { gold_columns.txt
, refactored_columns.txt
} should be a unique identifier that is used to match row-to-row the values of the tables.
e.g.
Matching by ID number.
Output is:
compare.kql
: The KQL query that used to compare both queries
The tool generates a query that matches row-by-row the results of the gold and refactored queries results tables. The comparision of each of the columns is based on the the comparators names file supplied.
The given comparators are:
regulars
: Comparision with the equal operator,x == y
reals
: Comparision of float numbers,double
type, comparision is if difference between absolute value is less thanEPSILON=10e-5
series
: Comparision of an array of float numbers, by subtraction of the values and checking if both minimum and maximum values of the result are less thanEPSILON=10e-5
You can extend and implement your own comparator and adding it to the header file, such that it's name follows the following convention is_[comparators]_equals
e.g. Just as a trivial example, case insensitive comparision of strings
let are_strings_equal = (s1:string, s2:string)
{
s1 =~ s2
};
In particular this can be extended to create a similarity comparator:
let are_series_sim_equal = (vec1:dynamic, vec2:dynamic)
{
let SIMILARITY_THRESHOLD = 0.5;
let subtract = series_subtract(vec1, vec2);
// If there is a ground truth data array, choose it as the avg
// Else, when can't estimate which is more accurate choosing average of averages
let avg1 = todouble(series_stats_dynamic(vec1)['avg']);
let avg2 = todouble(series_stats_dynamic(vec2)['avg']);
let max_diff = abs(todouble(series_stats_dynamic(subtract)['max']));
let avg_avg = (avg1 + avg2) / 2.0;
let similarity = max_diff / avg_avg;
similarity < SIMILARITY_THRESHOLD
};
- In case the refactored query is of a new table, where the timestamps are not the same due to sporadic delays, but the data itself should be similar;
1.1. Define inheader.txt
the same time range on which the results are queried, make sure to exclude last bin of data (because it contains only partial results)
1.2. Utilizemake-series
operator to aggregate the data to bins. This allows to compare aggregated rows instead of single-row to single-row
1.3. The framework will match row-to-row by the unique-id and compare each of the aggregated columns based on the comparator, you can define similarity comparators
Header:
let start = ago(1d);
let end = ago(1h);
let dt = 1h;
Query:
Table
| where Timestamp between(start..end)
| make-series AvgData = avg(Data)
on Timestamp from start to end step dt by UniqueId
| project UniqueId, AvgData
Aviv Yaniv
Email
Site
Blog
StackOverflow
GitHub
Project Euler