Skip to content

Latest commit

 

History

History
336 lines (224 loc) · 8.8 KB

KQL Guide.md

File metadata and controls

336 lines (224 loc) · 8.8 KB

Getting Started with KQL (Kusto Query Language)

This document is designed to help you get started with KQL, particularly in the context of Microsoft Advanced Hunting. KQL is a powerful query language used to extract and analyze data from large datasets. This guide will introduce you to the basics of KQL and provide tips on how to explore and query your data effectively.


Table of Contents

  1. Introduction to KQL
  2. Understanding Data Tables
  3. Basic Query Structure
  4. Common Operators and Commands
  5. Filtering Data with where
  6. Projecting and Extending Data
  7. Sorting and Ordering Data
  8. Aggregating Data
  9. Exploring Table Schemas
  10. Using Functions and Operators
  11. Joining Tables
  12. Best Practices
  13. Additional Resources

Introduction to KQL

Kusto Query Language (KQL) is a powerful language used to query large datasets stored in Azure Data Explorer, Microsoft Sentinel, Microsoft Defender for Endpoint, and other Microsoft services. It allows you to perform complex queries and data analysis with ease.

Why Use KQL?

  • Efficient Data Retrieval: Quickly extract relevant data from massive datasets.
  • Advanced Analytics: Perform aggregations, joins, and statistical analysis.
  • Flexible Syntax: Build queries that suit your specific needs.
  • Integration: Works seamlessly with Microsoft security tools for threat hunting.

Understanding Data Tables

In Microsoft Advanced Hunting, data is organized into tables, each containing records of specific event types. Some commonly used tables include:

  • DeviceProcessEvents: Information about process creation and termination.
  • DeviceNetworkEvents: Network connection events.
  • DeviceFileEvents: File creation, modification, and deletion events.
  • DeviceLogonEvents: User logon and authentication events.
  • AlertEvents: Security alerts generated by Microsoft Defender.

Understanding the structure and content of these tables is essential for effective querying.


Basic Query Structure

A KQL query typically starts with a table name, followed by a series of operators that manipulate the data.

Syntax:

TableName
| Operator1
| Operator2
| ...

Example:

DeviceProcessEvents
| where FileName == "malicious.exe"
| project Timestamp, DeviceName, FileName, ProcessCommandLine
| order by Timestamp desc

This query retrieves process events where the file name is malicious.exe, selects specific columns, and orders the results by the most recent events.


Common Operators and Commands

  • where: Filters records based on a condition.
  • project: Selects specific columns to include in the output.
  • extend: Creates new calculated columns.
  • summarize: Aggregates data using functions like count(), sum(), or avg().
  • order by: Sorts the results based on specified columns.
  • join: Combines data from two tables based on a common key.
  • parse_json(): Parses a JSON string into an object.

Filtering Data with where

The where operator filters records to include only those that meet specified conditions.

Example:

DeviceNetworkEvents
| where RemoteIP == "192.168.1.10"

Common Comparison Operators:

  • ==: Equal to
  • !=: Not equal to
  • contains: Contains a substring
  • startswith: Starts with a substring
  • in: In a list of values
  • between(): Within a range

Example with Multiple Conditions:

DeviceProcessEvents
| where FileName == "cmd.exe" and ProcessCommandLine contains "net user"

Projecting and Extending Data

The project Operator

Selects specific columns to include in the output, reducing the amount of data returned.

Example:

| project Timestamp, DeviceName, FileName, FolderPath

The extend Operator

Adds new columns or calculates new values based on existing data.

Example:

| extend FileExtension = split(FileName, ".")[1]

This creates a new column FileExtension by splitting FileName on the dot character.


Sorting and Ordering Data

The order by operator sorts the query results based on specified columns, either ascending (asc) or descending (desc).

Example:

| order by Timestamp desc

Aggregating Data

The summarize Operator

Aggregates data using functions and groups results by specified columns.

Common Aggregate Functions:

  • count(): Counts the number of records.
  • sum(ColumnName): Sums the values in a column.
  • avg(ColumnName): Calculates the average value.
  • max(ColumnName): Finds the maximum value.
  • min(ColumnName): Finds the minimum value.

Example:

DeviceFileEvents
| where ActionType == "FileDeleted"
| summarize DeletionCount = count() by DeviceName
| order by DeletionCount desc

This query counts the number of file deletions per device and orders the results by the highest counts.


Exploring Table Schemas

Understanding the structure of a table helps in crafting accurate queries. The | getschema command displays the columns and data types in a table.

How to Use getschema:

TableName
| getschema

Example:

DeviceProcessEvents
| getschema

Use this command to:

  • Identify available fields.
  • Understand data types (e.g., string, datetime, int).
  • Plan which columns to use in your queries.

Using Functions and Operators

KQL provides a variety of functions to manipulate and analyze data.

String Functions:

  • tolower(): Converts a string to lowercase.
  • toupper(): Converts a string to uppercase.
  • trim(): Removes leading and trailing spaces.

Date and Time Functions:

  • ago(): Calculates a datetime value relative to now.
  • datetime(): Converts a string to a datetime value.

Example:

DeviceLogonEvents
| where Timestamp > ago(1d)

This filters events that occurred within the last day.


Joining Tables

Combine data from two tables to enrich your results.

Syntax:

Table1
| join kind=JoinType (
    Table2
    | WhereClause
) on CommonField

Example:

DeviceProcessEvents
| where FileName == "powershell.exe"
| join kind=inner (
    DeviceNetworkEvents
    | where RemoteIPType == "Public"
) on DeviceId
| project Timestamp, DeviceName, FileName, RemoteIP, ProcessCommandLine

This query finds devices where powershell.exe ran and made network connections to public IPs.


Best Practices

  • Start Simple: Begin with basic queries and gradually add complexity.
  • Filter Early: Use where clauses at the beginning to limit data processing.
  • Verify Field Names: Use | getschema or sample data to confirm field names.
  • Comment Your Queries: Use // to add comments and explanations.
  • Limit Results for Testing: Use | take 10 or | limit 10 during development.
  • Be Mindful of Data Types: Ensure you use appropriate functions for the data type.

Additional Resources


Sample Query Walkthrough

Scenario: Identify Suspicious PowerShell Executions

Step 1: Filter for PowerShell Executions

DeviceProcessEvents
| where FileName == "powershell.exe"

Step 2: Filter for Suspicious Command Lines

| where ProcessCommandLine contains "-EncodedCommand"

Step 3: Project Relevant Fields

| project Timestamp, DeviceName, AccountName, ProcessCommandLine, InitiatingProcessFileName

Step 4: Order Results

| order by Timestamp desc

Full Query:

DeviceProcessEvents
| where FileName == "powershell.exe"
| where ProcessCommandLine contains "-EncodedCommand"
| project Timestamp, DeviceName, AccountName, ProcessCommandLine, InitiatingProcessFileName
| order by Timestamp desc

Explanation:

  • This query identifies instances where powershell.exe was run with the -EncodedCommand parameter, which is often used to obfuscate malicious scripts.

Conclusion

KQL is a versatile language that empowers you to explore and analyze data effectively. By mastering the basic operators and understanding how to manipulate data, you can uncover valuable insights and enhance your threat-hunting capabilities.


Happy querying!