YARA Forge specializes in delivering high-quality YARA rule packages for immediate integration into security platforms. This tool automates the sourcing, standardization, and optimization of YARA rules from a variety of public repositories shared by different organizations and individuals. By collating these community-contributed rules, YARA Forge ensures that each package meets rigorous quality standards, offering a diverse and comprehensive rule set.
The output is a set of reliable, performance-oriented YARA rules, curated from these public repositories and made ready for use by analysts and security teams. With YARA Forge, you get a straightforward solution to employing consistent and effective YARA rules from a broad community base without the hassle.
Choose the YARA rule set that meets your requirements:
- Core Set: Contains only rules with high accuracy and low false positive rates, optimized for performance. Ideal for critical environments where stability is key.
- Extended Set: Expands the Core Set with additional threat hunting rules for a wider coverage, accepting minimal increases in false positives and scan impact. Suitable for balanced security needs.
- Full Set: Incorporates all functional rules, prioritizing breadth of threat detection. Best for scenarios where extensive coverage outweighs the cost of higher false positives and resource use.
This section provides details on each rule set and its intended use in a technical environment.
The "Core" rule set excludes rules with a high propensity for false positives and those that significantly affect system performance. It filters out low "score" rules, often used for threat hunting, and low "quality" rules that may slow down scans. This set prioritizes stability and accuracy.
Use the "Core" rule set if you need to:
- Minimize false positives
- Ensure rules are performance-optimized
- Employ a concise set of highly accurate rules
The "Extended" rule set builds upon the "Core" by adding effective threat hunting rules that might slightly increase false positives and affect scan performance. It excludes rules that are experimental or have very low quality scores.
The "Extended" rule set is ideal for those who:
- Want a balance between detection capability and performance
- Prefer broader coverage with a controlled increase in false positives
The "Full" rule set includes all operational rules from the repositories, except for those that are non-functional or have a severe impact on performance. It includes threat hunting rules, so a higher volume of false positives is to be expected.
Choose the "Full" rule set if you:
- Aim for the broadest threat coverage available
- Have a system in place to manage false positives effectively
- Are less concerned about the impact on system resources and scanning speed
In the collection phase, YARA rules are retrieved by cloning their respective GitHub repositories. This approach is chosen over downloading ZIP files to obtain commit history, which provides additional data on rule creation and modification times if such timestamps are not present in the rule metadata. The repositories are cloned into the ./repos directory, with each repository stored in a separate subdirectory. License information for each rule is also extracted at this stage and recorded for subsequent use. The collected rules are parsed, and metadata is structured, capturing details such as retrieval time, latest commit hash, repository owner, and branch.
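The collection step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the metadata keys, and the injectable `run` parameter are assumptions made for the example. Only the `./repos` directory and the idea of a full clone (to keep commit history) come from the text.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path

REPOS_DIR = Path("./repos")  # target directory named in the text above


def clone_command(url, branch, dest):
    """Build a full (non-shallow) clone command so commit history is kept."""
    return ["git", "clone", "--branch", branch, url, str(dest)]


def head_commit_command(dest):
    """Command that prints the latest commit hash of the checked-out branch."""
    return ["git", "-C", str(dest), "rev-parse", "HEAD"]


def last_modified_command(dest, rule_file):
    """Command that prints the ISO 8601 date of the last commit touching a file.

    Useful as a fallback when a rule carries no date in its own metadata.
    """
    return ["git", "-C", str(dest), "log", "-1", "--format=%cI", "--", rule_file]


def collect_repo(name, owner, url, branch, run=subprocess.run):
    """Clone one repository (if not yet present) and return its metadata.

    `run` is injectable so the function can be exercised without network access.
    """
    dest = REPOS_DIR / name
    if not dest.exists():
        run(clone_command(url, branch, dest), check=True)
    result = run(head_commit_command(dest), check=True,
                 capture_output=True, text=True)
    return {
        "name": name,
        "owner": owner,
        "branch": branch,
        "commit_hash": result.stdout.strip(),
        "retrieval_date": datetime.now(timezone.utc).isoformat(),
    }
```

A call such as `collect_repo("Elastic", "elastic", "https://github.com/elastic/protections-artifacts/", "main")` would clone into `./repos/Elastic`; the branch name here is an assumption.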
During processing, rules are conformed to a standardized format defined in the project's YARA Style Guide. This includes ensuring that all required metadata fields are present and uniformly named across the dataset. For example, various terms like "information," "details," and "desc" are consolidated under a single "description" field. Additionally, every rule is assigned a unique identifier (UUID) based on its characteristics, ensuring distinct identification across the dataset. An automated tagging system is also employed to enrich the rules further. This system extracts specific characteristics such as CVE numbers and MITRE ATT&CK techniques from the rules and adds them as tags, enhancing their searchability and relevance for specific security scenarios. This normalization facilitates compatibility with different security products that consume these rules. Duplicate detection goes beyond name comparison, extending to the logical structure of the rules to ensure only unique rules are retained. Private rules are renamed and all corresponding references are updated accordingly to maintain consistency.
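The normalization steps above might look something like the following sketch. The alias names are taken from the text; everything else (the UUID namespace, the inputs to the UUID, the regular expressions, and the way the logic hash is built) is an assumption chosen for illustration and may differ from what YARA Forge actually does.

```python
import hashlib
import re
import uuid

# Field names that the text says are folded into "description".
DESCRIPTION_ALIASES = {"information", "details", "desc"}

# Hypothetical namespace; the real project may derive its UUIDs differently.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/yara-rules")

# Illustrative patterns for CVE numbers and ATT&CK technique IDs.
CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)
ATTACK_RE = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def normalize_meta(meta):
    """Lower-case keys and map description aliases onto one field."""
    out = {}
    for key, value in meta.items():
        key_l = key.lower()
        target = "description" if key_l in DESCRIPTION_ALIASES else key_l
        out.setdefault(target, value)  # first value wins if aliases collide
    return out


def rule_uuid(repo, rule_name):
    """Deterministic UUID: the same inputs always give the same identifier."""
    return str(uuid.uuid5(NAMESPACE, f"{repo}:{rule_name}"))


def extract_tags(text):
    """Collect CVE numbers and technique IDs found anywhere in the rule text."""
    cves = {m.upper() for m in CVE_RE.findall(text)}
    techniques = set(ATTACK_RE.findall(text))
    return sorted(cves | techniques)


def logic_hash(strings, condition):
    """Hash strings and condition while ignoring rule name, order and spacing.

    Simplification: whitespace inside string literals is collapsed as well.
    """
    canonical = "|".join(sorted(" ".join(s.split()) for s in strings))
    canonical += "||" + " ".join(condition.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two rules with different names but the same `logic_hash` would be treated as duplicates under this scheme, which is one way to compare rules by structure rather than by name.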
Quality assessment is conducted by assigning different scores to each rule based on several characteristics (quality, importance, severity), with the aim of quantifying rule relevance for the different output packages. The base quality score from the YARA Forge configuration file is adjusted according to any detected issues with a rule. Rule evaluation is performed using YARA for detecting syntax errors and performance warnings, and yaraQA for identifying less apparent logic and performance issues. yaraQA checks include evaluations of string duplication, atom length, module calculation costs, and regular expression performance. Issues identified result in deductions from the rule's quality score, as defined in the configuration file. Manual quality deductions are also applied for rules known to generate false positives in goodware databases.
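As a rough illustration of the deduction scheme described above, the sketch below subtracts a penalty per detected issue from a base score. The issue identifiers, penalty values, and base score are all made up for the example; the real values live in the YARA Forge configuration file.

```python
# Hypothetical issue IDs and penalties, for illustration only.
DEDUCTIONS = {
    "performance_warning": 20,
    "short_atom": 15,
    "duplicate_string": 5,
    "slow_regex": 25,
}


def quality_score(base, issues, manual_reduction=0):
    """Return the base quality minus one deduction per detected issue.

    `manual_reduction` models a hand-maintained penalty for rules known to
    match goodware. Unknown issue IDs cost nothing in this sketch.
    """
    penalty = sum(DEDUCTIONS.get(issue, 0) for issue in issues)
    return base - penalty - manual_reduction


# Example with an assumed base score of 75.
score = quality_score(75, ["short_atom", "slow_regex"])  # 75 - 15 - 25
```

A rule whose resulting score falls below a package's minimum quality would then be left out of that package.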
The output phase involves the creation of rule packages. Three distinct rule sets are defined: Core, Extended, and Full, each with its own filtering criteria as specified in the configuration file. Packages are assembled with a header section that includes metadata and license information. These are then saved in the ./packages directory within their own subdirectories. A final verification step is performed using YARA to confirm there are no errors in the compiled rule sets.
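The per-package filtering could be sketched as below. The threshold names mirror the fields shown in the package header of a generated rule set (minimum quality, minimum score, minimum age, force include and force exclude importance levels), and the numbers are the ones that header shows for the Core set. The class names and the exact precedence between the checks are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    quality: int
    score: int
    age_days: int
    importance: Optional[int] = None  # None: no importance assigned


@dataclass
class PackageConfig:
    name: str
    min_quality: int
    min_score: int
    min_age_days: int
    force_include_importance: int
    force_exclude_importance: int


# Thresholds as printed in the example Core package header.
CORE = PackageConfig("core", min_quality=70, min_score=65, min_age_days=1,
                     force_include_importance=80, force_exclude_importance=50)


def skip_reason(rule, cfg):
    """Return why a rule is skipped, or None if it is kept.

    Assumed precedence: importance overrides come first, then age,
    quality and score.
    """
    if rule.importance is not None:
        if rule.importance >= cfg.force_include_importance:
            return None
        if rule.importance < cfg.force_exclude_importance:
            return "importance"
    if rule.age_days < cfg.min_age_days:
        return "age"
    if rule.quality < cfg.min_quality:
        return "quality"
    if rule.score < cfg.min_score:
        return "score"
    return None


def build_package(rules, cfg):
    """Split rules into kept rules and per-reason skip counters."""
    kept = []
    skipped = {"age": 0, "quality": 0, "score": 0, "importance": 0}
    for rule in rules:
        reason = skip_reason(rule, cfg)
        if reason is None:
            kept.append(rule)
        else:
            skipped[reason] += 1
    return kept, skipped
```

The `skipped` counters correspond to the "Skipped: … (age), … (quality), … (score), … (importance)" line that appears in each package header.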
The --debug flag enables verbose logging, providing detailed information on the rule retrieval process, quality checks, and filter application. This facilitates debugging and offers transparency into the rule selection and packaging process.
The log file of the latest build can be viewed here
The discovered quality / performance / resource usage issues can be found here
The file yara-forge-rule-issues.yml contains all identified issues with the collected rules. It gets uploaded as an attachment to each rule set release.
/*
* YARA-Forge YARA Rule Package
* https://github.com/YARAHQ/yara-forge
*
* Rule Package Information
* Name: core
* Description: Default YARA Rule Package - Core
* YARA-Forge Version: 0.6.0
* YARA-QA Commit: 6d0cfc3b5356c3a58f79d98077ad505e4493785c
* Minimum Quality: 70
* Force Include Importance Level: 80
* Force Exclude Importance Level: 50
* Minimum Age (in days): 1
* Minimum Score: 65
* Creation Date: 2023-12-04
* Number of Rules: 7164
* Skipped: 0 (age), 583 (quality), 663 (score), 1195 (importance)
*/
/*
* YARA Rule Set
* Repository Name: Elastic
* Repository: https://github.com/elastic/protections-artifacts/
* Retrieval Date: 2023-12-04
* Git Commit: cb45629514acefc68a9d08111b3a76bc90e52238
* Number of Rules: 1331
* Skipped: 0 (age), 69 (quality), 0 (score), 0 (importance)
*
*
* LICENSE
*
* Elastic License 2.0
URL: https://www.elastic.co/licensing/elastic-license
## Acceptance
...
You can find the YARA Forge program code here.
The YARA rule packages are released as GitHub releases in the YARA Forge repository.
- Add more repositories: I'd like to add more repositories to the collection. If you know of any good repositories, please let me know. (I'm in contact with Jakub from Avast and I'm going to add the Avast rules to the collection as soon as they've decided on a license)
- Automatic transformations: I'd like to automatically transform rules to improve them, e.g. rewrite a $mz = { 4d 5a } as uint16(0) == 0x5a4d
- Keep some of the formatting in the conditions or apply the one described in the best practice guide
- Improved and added performance measurements: I'd like to measure the performance of other string types (not just regular expressions), complex condition evaluations and imported module functions (e.g. pe module functions).
- Automated rule testing: I'd like to automatically test the rules against a set of goodware and malware samples to identify false positives and false negatives. Currently I still test the rules manually and add negative scores in the custom-score-reductions.yml file for rules that have been shown to produce false positives on our internal goodware set. In order to test them live in the GitHub workflows, the script would need access to an Mquery or Klara instance from within the workflows to evaluate the number of false positive matches while it's running. I still don't know how to approach that challenge. Please contact me if you have an idea how to do this.
- Better tracking of changes in rules and repositories: I'd like to track changes in rules better. Currently I only track the latest commit hash of a repository and the latest modification time of a rule. Better tracking would allow me to identify rules that have been changed and need to be re-evaluated (e.g. false positives have been fixed, performance issues have been resolved, etc.).
- Automated tagging of rules based on keywords / regular expressions to automatically add MITRE ATT&CK techniques and other tags
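The MZ header rewrite mentioned in the list above depends on YARA's uint16() reading a little-endian value, which is why the byte sequence 4d 5a corresponds to 0x5a4d. The short check below illustrates that equivalence in Python; the helper name is hypothetical and the snippet is only meant to show the byte order, not how YARA Forge would perform the transformation.

```python
import struct


def uint16_le(data, offset):
    """Read an unsigned 16-bit little-endian integer, like YARA's uint16()."""
    return struct.unpack_from("<H", data, offset)[0]


# First bytes of a typical PE file: 'M' (0x4d) followed by 'Z' (0x5a).
header = bytes.fromhex("4d5a9000")

# Matching the bytes 4d 5a at offset 0 is the same test as this comparison.
is_mz = uint16_le(header, 0) == 0x5A4D
```

The integer comparison avoids a string search over the whole file, which is the performance motivation behind this kind of rewrite.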
I want to express my sincere appreciation to all the repository owners and rule authors who have indirectly contributed to YARA Forge. Your commitment to cybersecurity and the high quality of your work have allowed me to offer a solution that effectively reformats, filters, and repackages your rules into more functional and accessible packages.
Below is the list of repositories included in the initial release of YARA Forge:
Status as of 2021-12-12
- ReversingLabs
- Elastic
- R3c0nst
- CAPE
- BinaryAlert
- DeadBits
- DelivrTo
- ESET
- FireEye-RT
- GCTI
- Malpedia
- McAfee ATR
- Arkbird SOLG
- Telekom Security
- Volexity
- JPCERTCC
- Signature Base
- SecuInfra
- RussianPanda
- Others via Michael Worth's repository
For a full list of all currently integrated YARA rule repositories, review the config file here.
Author of YARA Forge: Florian Roth