Add group-by feature in APM rules (#155001)
## Summary
Adds a group-by dropdown to the following APM rules:
- APM Latency threshold (Preselected fields: `service.name`,
`service.environment`, `transaction.type`)
- APM Failed transaction rate (Preselected fields: `service.name`,
`service.environment`, `transaction.type`)
- APM Error count threshold (Preselected fields: `service.name`,
`service.environment`)

<img width="609" alt="Screenshot 2023-04-17 at 13 44 34"
src="https://user-images.githubusercontent.com/69037875/232475262-41786edf-d16b-4b1f-90a9-8fe242a36bcc.png">

The preselected fields cannot be removed by the user. The `transaction.name`
field can be selected by the user from the group-by dropdown.

- #154535
- #154536
- #154537
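
As a rough illustration of the behavior described above, the sketch below merges the fixed preselected fields with whatever the user picks in the dropdown. The names `PRESELECTED_LATENCY_FIELDS` and `resolveGroupByFields` are hypothetical and are not taken from this PR; the actual UI logic lives in the `APMRuleGroupBy` component.

```typescript
// Hypothetical sketch: these names are illustrative and not part of the PR.
const PRESELECTED_LATENCY_FIELDS: readonly string[] = [
  'service.name',
  'service.environment',
  'transaction.type',
];

function resolveGroupByFields(
  preselected: readonly string[],
  userSelected: readonly string[] = []
): string[] {
  // Preselected fields always stay in the result, in their original order.
  const resolved = [...preselected];
  for (const field of userSelected) {
    // Skip anything already present so no field is grouped on twice.
    if (resolved.indexOf(field) === -1) {
      resolved.push(field);
    }
  }
  return resolved;
}

// The user adds `transaction.name` on top of the fixed fields.
const latencyGroupBy = resolveGroupByFields(PRESELECTED_LATENCY_FIELDS, [
  'transaction.name',
]);
console.log(latencyGroupBy.join(', '));
```

Because the preselected fields are copied first and never filtered, a user selection can only add fields, which matches the rule that they cannot be removed.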

The reason message is updated to include the group key instead of only the
service name:
- #155011
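
A simplified, self-contained sketch of how the group key could be rendered into the reason message. It follows the shape of the `formatGroupByFields` helper in the diff for `apm_rule_types.ts`, but leaves out the environment label mapping and the i18n call; `FIELD_LABELS`, `formatGroup`, and the sample values are illustrative assumptions.

```typescript
// Simplified sketch: the helper in the diff also maps environment values to
// display labels and goes through i18n; both are omitted here.
const FIELD_LABELS: Record<string, string> = {
  'service.name': 'service',
  'service.environment': 'env',
  'transaction.type': 'type',
  'transaction.name': 'name',
};

function formatGroup(groupByFields: Record<string, string>): string {
  return Object.keys(groupByFields)
    .map((field) => {
      // Fall back to the raw field name when no short label is defined.
      const label = FIELD_LABELS[field] ?? field;
      return `${label}: ${groupByFields[field]}`;
    })
    .join(', ');
}

// Sample values are made up for illustration.
const group = formatGroup({
  'service.name': 'opbeans-java',
  'service.environment': 'development',
  'transaction.name': 'GET /flaky',
});
console.log(`Error count is 12 for ${group}. Alert when > 10.`);
```

The message now names every grouped field rather than just the service, so two alerts on the same service but different transactions are distinguishable in the reason text.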

The `transaction.name` is added to the alert document:
- #154543

The `transaction.name` action variable is added in UI:
- #154545

The `transaction.name` is added to the context of active alert
notifications:
- #154547

There are additional fields in the group-by dropdown for the Error count
threshold rule (#155633):
- `error.grouping_key`
- `error.grouping_name`

## Fixes
- #154818

### Update on alert ID
The alert ID is updated for all three rules. The new ID is generated from
the group key, which avoids issues like #154818, where alerts are
scheduled with the same ID. Examples of the new alert IDs:
`opbeans-java_development_request_GET /flaky`,
`opbeans-java_development_GET /fail`
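
The ID construction itself is not shown in this part of the commit. Judging only from the example IDs above, a minimal sketch might join the group key values with an underscore; `buildAlertId` and the `_` separator are assumptions inferred from those examples, not confirmed implementation details.

```typescript
// Assumption: group key values are joined with "_" in group-by order.
// This is inferred from the example IDs, not taken from the PR's code.
function buildAlertId(groupKeyValues: readonly string[]): string {
  return groupKeyValues.join('_');
}

// Two groups on the same service that differ only in transaction name.
const flakyId = buildAlertId([
  'opbeans-java',
  'development',
  'request',
  'GET /flaky',
]);
const failId = buildAlertId([
  'opbeans-java',
  'development',
  'request',
  'GET /fail',
]);

console.log(flakyId);
console.log(flakyId !== failId); // distinct groups yield distinct IDs
```

Deriving the ID from every grouped value means two groups only collide if all of their values match, which is the property needed to avoid scheduling several alerts under one ID.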

## Out of scope of this PR
- Updating the preview chart based on selected group by fields

## Checklist
- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

## Release note
Because the alert ID is updated for the APM Latency threshold, APM Failed
transaction rate, and APM Error count threshold rules, any existing
alerts will be recovered and new alerts will be fired in their place.

---------

Co-authored-by: Katerina Patticha <kate@kpatticha.com>
Co-authored-by: Søren Louv-Jansen <sorenlouv@gmail.com>
3 people authored Apr 25, 2023
1 parent 2fad86a commit ddd09ac
Showing 27 changed files with 1,943 additions and 176 deletions.

69 changes: 57 additions & 12 deletions x-pack/plugins/apm/common/rules/apm_rule_types.ts
@@ -15,6 +15,14 @@ import type {
 import type { ActionGroup } from '@kbn/alerting-plugin/common';
 import { formatDurationFromTimeUnitChar } from '@kbn/observability-plugin/common';
 import { ANOMALY_SEVERITY, ANOMALY_THRESHOLD } from '../ml_constants';
+import {
+  ERROR_GROUP_ID,
+  SERVICE_ENVIRONMENT,
+  SERVICE_NAME,
+  TRANSACTION_NAME,
+  TRANSACTION_TYPE,
+} from '../es_fields/apm';
+import { getEnvironmentLabel } from '../environment_filter_values';

 export const APM_SERVER_FEATURE_ID = 'apm';

@@ -40,95 +48,132 @@ const THRESHOLD_MET_GROUP: ActionGroup<ThresholdMetActionGroupId> = {
   }),
 };

+const getFieldNameLabel = (field: string): string => {
+  switch (field) {
+    case SERVICE_NAME:
+      return 'service';
+    case SERVICE_ENVIRONMENT:
+      return 'env';
+    case TRANSACTION_TYPE:
+      return 'type';
+    case TRANSACTION_NAME:
+      return 'name';
+    case ERROR_GROUP_ID:
+      return 'error key';
+    default:
+      return field;
+  }
+};
+
+export const getFieldValueLabel = (
+  field: string,
+  fieldValue: string
+): string => {
+  return field === SERVICE_ENVIRONMENT
+    ? getEnvironmentLabel(fieldValue)
+    : fieldValue;
+};
+
+const formatGroupByFields = (groupByFields: Record<string, string>): string => {
+  const groupByFieldLabels = Object.keys(groupByFields).map(
+    (field) =>
+      `${getFieldNameLabel(field)}: ${getFieldValueLabel(
+        field,
+        groupByFields[field]
+      )}`
+  );
+  return groupByFieldLabels.join(', ');
+};
+
 export function formatErrorCountReason({
   threshold,
   measured,
-  serviceName,
   windowSize,
   windowUnit,
+  groupByFields,
 }: {
   threshold: number;
   measured: number;
-  serviceName: string;
   windowSize: number;
   windowUnit: string;
+  groupByFields: Record<string, string>;
 }) {
   return i18n.translate('xpack.apm.alertTypes.errorCount.reason', {
-    defaultMessage: `Error count is {measured} in the last {interval} for {serviceName}. Alert when > {threshold}.`,
+    defaultMessage: `Error count is {measured} in the last {interval} for {group}. Alert when > {threshold}.`,
     values: {
       threshold,
       measured,
-      serviceName,
       interval: formatDurationFromTimeUnitChar(
         windowSize,
         windowUnit as TimeUnitChar
       ),
+      group: formatGroupByFields(groupByFields),
     },
   });
 }

 export function formatTransactionDurationReason({
   threshold,
   measured,
-  serviceName,
   asDuration,
   aggregationType,
   windowSize,
   windowUnit,
+  groupByFields,
 }: {
   threshold: number;
   measured: number;
-  serviceName: string;
   asDuration: AsDuration;
   aggregationType: string;
   windowSize: number;
   windowUnit: string;
+  groupByFields: Record<string, string>;
 }) {
   let aggregationTypeFormatted =
     aggregationType.charAt(0).toUpperCase() + aggregationType.slice(1);
   if (aggregationTypeFormatted === 'Avg')
     aggregationTypeFormatted = aggregationTypeFormatted + '.';

   return i18n.translate('xpack.apm.alertTypes.transactionDuration.reason', {
-    defaultMessage: `{aggregationType} latency is {measured} in the last {interval} for {serviceName}. Alert when > {threshold}.`,
+    defaultMessage: `{aggregationType} latency is {measured} in the last {interval} for {group}. Alert when > {threshold}.`,
     values: {
       threshold: asDuration(threshold),
       measured: asDuration(measured),
-      serviceName,
       aggregationType: aggregationTypeFormatted,
       interval: formatDurationFromTimeUnitChar(
         windowSize,
         windowUnit as TimeUnitChar
       ),
+      group: formatGroupByFields(groupByFields),
     },
   });
 }

 export function formatTransactionErrorRateReason({
   threshold,
   measured,
-  serviceName,
   asPercent,
   windowSize,
   windowUnit,
+  groupByFields,
 }: {
   threshold: number;
   measured: number;
-  serviceName: string;
   asPercent: AsPercent;
   windowSize: number;
   windowUnit: string;
+  groupByFields: Record<string, string>;
 }) {
   return i18n.translate('xpack.apm.alertTypes.transactionErrorRate.reason', {
-    defaultMessage: `Failed transactions is {measured} in the last {interval} for {serviceName}. Alert when > {threshold}.`,
+    defaultMessage: `Failed transactions is {measured} in the last {interval} for {group}. Alert when > {threshold}.`,
     values: {
       threshold: asPercent(threshold, 100),
       measured: asPercent(measured, 100),
-      serviceName,
       interval: formatDurationFromTimeUnitChar(
         windowSize,
         windowUnit as TimeUnitChar
       ),
+      group: formatGroupByFields(groupByFields),
     },
   });
 }
3 changes: 3 additions & 0 deletions x-pack/plugins/apm/common/rules/schema.ts
@@ -15,6 +15,7 @@ export const errorCountParamsSchema = schema.object({
   threshold: schema.number(),
   serviceName: schema.maybe(schema.string()),
   environment: schema.string(),
+  groupBy: schema.maybe(schema.arrayOf(schema.string())),
   errorGroupingKey: schema.maybe(schema.string()),
 });

@@ -31,6 +32,7 @@ export const transactionDurationParamsSchema = schema.object({
     schema.literal(AggregationType.P99),
   ]),
   environment: schema.string(),
+  groupBy: schema.maybe(schema.arrayOf(schema.string())),
 });

 export const anomalyParamsSchema = schema.object({
@@ -55,6 +57,7 @@ export const transactionErrorRateParamsSchema = schema.object({
   transactionName: schema.maybe(schema.string()),
   serviceName: schema.maybe(schema.string()),
   environment: schema.string(),
+  groupBy: schema.maybe(schema.arrayOf(schema.string())),
 });

 type ErrorCountParamsType = TypeOf<typeof errorCountParamsSchema>;
@@ -7,13 +7,15 @@

 import { i18n } from '@kbn/i18n';
 import { defaults, omit } from 'lodash';
-import React, { useEffect } from 'react';
+import React, { useCallback, useEffect } from 'react';
 import { CoreStart } from '@kbn/core/public';
 import { useKibana } from '@kbn/kibana-react-plugin/public';
 import {
   ForLastExpression,
   TIME_UNITS,
 } from '@kbn/triggers-actions-ui-plugin/public';
+import { EuiFormRow } from '@elastic/eui';
+import { EuiSpacer } from '@elastic/eui';
 import { ENVIRONMENT_ALL } from '../../../../../common/environment_filter_values';
 import { asInteger } from '../../../../../common/utils/formatters';
 import { useFetcher } from '../../../../hooks/use_fetcher';
@@ -27,13 +29,21 @@ import {
 } from '../../utils/fields';
 import { AlertMetadata, getIntervalAndTimeRange } from '../../utils/helper';
 import { ApmRuleParamsContainer } from '../../ui_components/apm_rule_params_container';
+import { APMRuleGroupBy } from '../../ui_components/apm_rule_group_by';
+import {
+  SERVICE_ENVIRONMENT,
+  SERVICE_NAME,
+  TRANSACTION_NAME,
+  ERROR_GROUP_ID,
+} from '../../../../../common/es_fields/apm';

 export interface RuleParams {
   windowSize?: number;
   windowUnit?: TIME_UNITS;
   threshold?: number;
   serviceName?: string;
   environment?: string;
+  groupBy?: string[] | undefined;
   errorGroupingKey?: string;
 }

@@ -95,6 +105,13 @@ export function ErrorCountRuleType(props: Props) {
     ]
   );

+  const onGroupByChange = useCallback(
+    (group: string[] | null) => {
+      setRuleParams('groupBy', group ?? []);
+    },
+    [setRuleParams]
+  );
+
   const fields = [
     <ServiceField
       currentValue={params.serviceName}
@@ -149,11 +166,42 @@ export function ErrorCountRuleType(props: Props) {
     />
   );

+  const groupAlertsBy = (
+    <>
+      <EuiFormRow
+        label={i18n.translate(
+          'xpack.apm.ruleFlyout.errorCount.createAlertPerText',
+          {
+            defaultMessage: 'Group alerts by',
+          }
+        )}
+        helpText={i18n.translate(
+          'xpack.apm.ruleFlyout.errorCount.createAlertPerHelpText',
+          {
+            defaultMessage:
+              'Create an alert for every unique value. For example: "transaction.name". By default, alert is created for every unique service.name and service.environment.',
+          }
+        )}
+        fullWidth
+        display="rowCompressed"
+      >
+        <APMRuleGroupBy
+          onChange={onGroupByChange}
+          options={{ groupBy: ruleParams.groupBy }}
+          fields={[TRANSACTION_NAME, ERROR_GROUP_ID]}
+          preSelectedOptions={[SERVICE_NAME, SERVICE_ENVIRONMENT]}
+        />
+      </EuiFormRow>
+      <EuiSpacer size="m" />
+    </>
+  );
+
   return (
     <ApmRuleParamsContainer
       minimumWindowSize={{ value: 5, unit: TIME_UNITS.MINUTE }}
       defaultParams={params}
       fields={fields}
+      groupAlertsBy={groupAlertsBy}
       setRuleParams={setRuleParams}
       setRuleProperty={setRuleProperty}
       chartPreview={chartPreview}