Part 3 - YARA Rule Engineering and Key Modules.
Approach
To understand the concept of rule engineering, we can break it down into three key phases.
First, we start by identifying the type of target file we’re dealing with. Next, we look for interesting artifacts that can be used to craft the detection rules. Finally, we convert these identified artifacts into patterns.
Building blocks for Rules
In a well-written YARA rule, it’s beneficial to include several key sections, although you can skip some of them if necessary. The only section you must include is the “condition” section. However, I strongly recommend also including the “meta” and “strings” sections.
Believe me, if someone reviews your rule later, they will appreciate having the complete context and rationale behind it.
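As a rough illustration of these sections, here is a minimal skeleton. The rule name, metadata values, and marker string are all placeholders invented for this sketch, not taken from a real detection.

```yara
rule Example_Rule_Skeleton
{
    meta:
        // Free-form key/value pairs; they document the rule and never affect matching.
        description = "Illustrative skeleton only"
        author = "your name here"
        reference = "internal ticket or report goes here"

    strings:
        // Patterns are declared as variables that start with $.
        $marker = "hypothetical marker string"

    condition:
        // The only mandatory section: the rule matches when this is true.
        $marker
}
```

The meta values are where a later reviewer finds the context and rationale, so it is worth filling them in even though the engine ignores them.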
Here are some key concepts I’ll cover in this section. However, it’s important to note that covering every nuance of rule engineering is not feasible. For a more comprehensive understanding, I recommend the official YARA documentation.
Define Patterns
Patterns are the core of any rule, and the effectiveness of your rules largely depends on your expertise in file analysis. There are three primary ways to define patterns within a rule:
- ASCII Strings: Define patterns based on plain text strings found within the file.
- Hexadecimal Byte Sequences: Define patterns based on specific sequences of bytes (this is one of the most common approaches).
- Regular Expressions: Define complex patterns using regex.
Hex patterns are written inside { } and regex patterns are written inside / /.
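The three pattern types can sit side by side in one rule. The sketch below uses the word "malware" purely as a stand-in value; the rule and variable names are made up for illustration.

```yara
rule Example_Pattern_Types
{
    strings:
        $text  = "malware"                   // text string, in double quotes
        $hex   = { 6D 61 6C 77 61 72 65 }    // the same bytes written as a hex pattern
        $regex = /mal[a-z]{2,6}/             // regular expression, between slashes

    condition:
        any of them
}
```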
It’s good to know how hex patterns are written in the rule. I have covered a few commonly used patterns that may be useful when you get started with defining the rules.
Wildcards | { 6D 61 6C ?? ?? 72 65 } |
Alternatives | { (6D | 7D) 61 6C 77 61 72 65 } |
Jumps | { 6D 61 6C [1-3] 65 } |
Wildcards are used when you don’t know the exact value. You can use the placeholder ?? to represent an unknown byte.
Alternatives can be used when there is uncertainty. For example, you might specify a pattern to match either “AA” or “BB.”
Jumps are very useful when writing rules. There are different types of jumps you can define when creating rules, as shown below.
[1] -> Jump exactly one byte, then look for the pattern defined after the jump.
[1-3] -> Jump between 1 and 3 bytes, then look for the pattern defined after the jump.
[40-80] -> Jump between 40 and 80 bytes, then look for the pattern defined after the jump.
[40-] -> Jump 40 or more bytes (no upper bound), then look for the pattern defined after the jump.
[-] -> Jump any number of bytes, from 0 to infinite, then look for the pattern defined after the jump.
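To see wildcards, alternatives, and jumps in context, here is a small sketch built around the bytes that spell "malware" (6D 61 6C 77 61 72 65). The rule and variable names are invented for the example, and the alternative byte 4D (an uppercase "M") is an assumption chosen only to show the syntax.

```yara
rule Example_Hex_Features
{
    strings:
        // Wildcards: two unknown bytes between "mal" and "re".
        $wild = { 6D 61 6C ?? ?? 72 65 }

        // Alternatives: first byte is either 6D ("m") or 4D ("M").
        $alt = { ( 6D | 4D ) 61 6C 77 61 72 65 }

        // Bounded jump: 1 to 3 arbitrary bytes after "mal", then "e".
        $jump = { 6D 61 6C [1-3] 65 }

        // Unbounded jump: 40 or more arbitrary bytes, then "ware".
        $far = { 6D 61 6C [40-] 77 61 72 65 }

    condition:
        any of them
}
```

Keeping a few fixed bytes on both sides of a jump tends to help the engine, since very short fixed parts can trigger performance warnings.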
Sometimes it can be challenging to write a pattern without using regular expressions, and there is a significant learning curve to crafting effective regex patterns. There’s no need to worry, though: AI tools like ChatGPT can help define regex patterns for you. Just remember to review any patterns generated by these tools before using them in your YARA rules.
Define Modifiers
To define a string pattern within a rule, the string itself must be declared as a variable. You can use special keywords, a.k.a. modifiers, to instruct the YARA engine on how to handle these string patterns. Some of the common modifiers I’ve used when writing rules are listed below. As mentioned multiple times in this series, refer to the official YARA documentation for more detailed and up-to-date information.
Keyword | String Types | Notes |
ascii | Text, Regex | Match ASCII characters (the default) |
nocase | Text, Regex | Ignore case; text strings in YARA are case-sensitive by default |
wide | Text, Regex | Match UTF-16 characters, typical in many executable binaries |
fullword | Text, Regex | Match only if delimited by non-alphanumeric characters |
xor | Text | Search for strings with a single-byte XOR applied to them |
base64 | Text | Search for strings that have been base64 encoded |
base64wide | Text | Search for strings that have been base64 encoded and then converted to wide |
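The modifiers are appended after the string definition. The following sketch shows a few combinations; the strings themselves are arbitrary examples and the rule name is hypothetical. Note that some modifiers cannot be combined (for example, xor with nocase), so the sketch keeps those on separate strings.

```yara
rule Example_String_Modifiers
{
    strings:
        // Match the ASCII and the UTF-16 form, in any letter case.
        $a = "powershell" ascii wide nocase

        // Match "cmd" only as a standalone word, not inside "cmdlet".
        $b = "cmd" fullword

        // Match the string under any single-byte XOR key.
        $c = "This program cannot" xor

        // Match the string when it appears base64 encoded.
        $d = "This program cannot" base64

    condition:
        any of them
}
```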
Define conditions
The condition section is mandatory and, in practice, quite straightforward. The condition section defines the criteria for the rule to trigger a successful match. I’ve included a few examples below, which are self-explanatory. More complex examples will be covered in the upcoming section, where I’ll explain their use cases.
all of them
any of them
2 of ($a,$b,$c)
3 of them
4 of ($a*)
$a and not $b
(not $a) and (filesize > 0)
math.entropy(0, filesize) >= 7.0
filesize < 60KB and ( 1 of ($x*) or all of ($s*) )
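For context, here is how two of these conditions might look inside complete rules. This is only a sketch: the marker strings and rule names are placeholders. One detail worth noting is that math.entropy comes from the math module, so it needs an import line at the top of the rule file.

```yara
import "math"   // required for math.entropy

rule Example_Condition_Strings
{
    strings:
        $x1 = "hypothetical high-confidence marker"
        $s1 = "hypothetical supporting marker one"
        $s2 = "hypothetical supporting marker two"

    condition:
        // Small file, and either one strong marker or all supporting markers.
        filesize < 60KB and ( 1 of ($x*) or all of ($s*) )
}

rule Example_Condition_Entropy
{
    condition:
        // High entropy over the whole file often suggests packed or encrypted data.
        filesize > 0 and math.entropy(0, filesize) >= 7.0
}
```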
Define Loops
Logical operators and loops can be used within the rule. Frankly speaking, I haven’t used loops much in most of my rules.
for all of them : ( # > 3 )
for all of ($a*) : ( @ > @b )
for any section in pe.sections : ( section.name == ".text" )
for any i in (0..pe.number_of_sections-1) : ( pe.sections[i].name == ".text" )
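A complete rule using loops could look like the sketch below. The strings and the rule name are invented, and the threshold of two occurrences is an arbitrary choice. Iterating directly over pe.sections assumes a YARA 4.x engine; on older versions the index-based form is the one to use.

```yara
import "pe"

rule Example_Loops
{
    strings:
        $a1 = "hypothetical marker one"
        $a2 = "hypothetical marker two"

    condition:
        // Every $a* string must occur at least twice (# is the occurrence count).
        for all of ($a*) : ( # >= 2 )
        // At least one PE section must be named ".text".
        and for any section in pe.sections : ( section.name == ".text" )
}
```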
Define Scope
We can use a few keywords, such as global, private, and include, to define the scope of the rules.
The global keyword helps to enforce restrictions across all rules simultaneously. For example, if you want your rules to apply only to files that are less than 1 MB, are Windows PE files, and are NOT signed, you can define these conditions once in a global rule instead of specifying them in each individual rule.
import "pe"
global rule GlobalRule
{
condition:
filesize < 1MB and
pe.is_pe and
not pe.is_signed
}
rule rule1 { …. }
rule rule2 { …. }
rule rule3 { …. }
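The private keyword suppresses a rule’s output when it matches a given file, which prevents cluttering the results. Private rules are typically used as building blocks that other rules reference in their conditions.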
private rule rule1 { …}
private rule rule2 { …}
private rule rule3 { …}
rule rule4{ …. }
rule rule5 { …. }
rule rule6 { …. }
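As a sketch of how a private rule is typically consumed, the example below defines a private helper and references it by name from a regular rule. Both rule names and the marker string are hypothetical.

```yara
private rule Is_Small_PE_Candidate
{
    condition:
        // MZ header and a size limit; this rule never appears in the output.
        uint16(0) == 0x5A4D and filesize < 1MB
}

rule Example_Uses_Private_Rule
{
    strings:
        $marker = "hypothetical marker string"

    condition:
        // Only this rule is reported when both parts are true.
        Is_Small_PE_Candidate and $marker
}
```

A referenced rule has to be defined before the rule that uses it, so helpers like this usually sit at the top of the file.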
The include keyword helps to organize rules across multiple rule files. For example, a webshell.yara file can include multiple webshell YARA rule files as shown below.
include "/sftp//yara/includes/c99.yar"
include "/sftp//yara/includes/chinaC.yar"
include "/sftp//yara/includes/asp.yar"
include "/sftp//yara/includes/php.yar"
include "/sftp//yara/includes/iis.yar"
include "/sftp//yara/includes/other.yar"
Define YARA Modules
Modules provide functions that can be used when writing a YARA rule. They often do the heavy lifting so that we can write less code when developing rules. Think of them like the modules we import in programming languages such as Python in order to reuse existing code. Here are some of the key modules available when writing YARA rules. YARA developers continuously add new features to existing modules and create new ones, so be sure to check the official documentation for the most up-to-date details.
There is one caveat, though. Modules are highly dependent on the YARA version, so it is important to check which YARA version the target application runs. If the target application runs a lower YARA version than the one you tested the rules with, there is a chance the rules won’t work. This is one reason people often complain that rules work perfectly fine in the test environment but not in production. Also, note that some tools do not implement all YARA features in their tech stack, often for performance reasons. Read the documentation of the respective tool before writing the rules and include only the modules supported by the target tool.
The official YARA documentation lists all available modules. Have a look before writing your next YARA rule; there may be something already there.
PE Module
The PE module exposes most of the fields present in the Microsoft Windows PE file format header. Here are some of the commonly used fields when writing rules related to .exe and .dll files:
pe.is_pe
pe.timestamp
pe.signatures[i].*
pe.signatures[i].serial
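Here is a sketch of how these fields might be combined. The timestamp threshold and the certificate serial are made-up values for illustration, and the rule name is hypothetical. The signature fields assume a YARA build with crypto support, and the loop over pe.signatures assumes YARA 4.x.

```yara
import "pe"

rule Example_PE_Fields
{
    condition:
        pe.is_pe and
        // Compiled after 2024-01-01 00:00:00 UTC (Unix epoch seconds).
        pe.timestamp > 1704067200 and
        pe.number_of_signatures > 0 and
        // Placeholder serial; replace with the certificate serial you are hunting.
        for any sig in pe.signatures : (
            sig.serial == "00:11:22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff"
        )
}
```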
Console Module
The console module helps analysts write and debug rules by logging information during execution, such as PE header details. I primarily use this module for debugging rules or for file analysis itself.
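A minimal debugging sketch could look like this. It assumes a YARA version that ships the console module (4.2 or later, as far as I know); the rule name and the message labels are arbitrary.

```yara
import "console"
import "pe"

rule Example_Console_Debug
{
    condition:
        pe.is_pe and
        // The console functions always evaluate to true, so they can be chained with "and".
        console.log("number of sections: ", pe.number_of_sections) and
        console.hex("entry point: ", pe.entry_point)
}
```

Because the logging calls never change the outcome of the condition, they can be dropped into an existing rule while debugging and removed afterwards.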
VT Module
The VT (VirusTotal) module is a significant topic and is covered in greater detail in another section of this series. This module offers a wide range of features that can be utilized on the VirusTotal platform for both live and retrospective hunts.
Other commonly used modules are listed below:
math.entropy
dotnet.number_of_resources
hash.sha256
magic.mime_type
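As a sketch of two of these in use, the rule below combines math.entropy and hash.sha256. The hash value is a placeholder of 64 zeros, not a real sample, and the rule name is hypothetical. The hash module assumes a YARA build with crypto support, and hash strings are compared in lowercase.

```yara
import "math"
import "hash"

rule Example_Math_And_Hash
{
    condition:
        filesize > 0 and
        (
            // Near-random content across the whole file.
            math.entropy(0, filesize) >= 7.0 or
            // Placeholder digest; replace with the SHA-256 of a known sample.
            hash.sha256(0, filesize) == "0000000000000000000000000000000000000000000000000000000000000000"
        )
}
```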
Identify File Types
When working with YARA, you may encounter different types of files, and identifying the file type can sometimes be daunting. However, don’t worry: YARA provides multiple ways to identify the file type, particularly based on byte sequences and their locations.
We can write rule conditions that depend on data stored at a certain file offset or memory virtual address, using the following functions:
int8 / int16 / int32 | read 8-, 16-, and 32-bit signed integers, little-endian format |
uint8 / uint16 / uint32 | read 8-, 16-, and 32-bit unsigned integers, little-endian format |
int8be / int16be / int32be | read 8-, 16-, and 32-bit signed integers, big-endian format |
uint8be / uint16be / uint32be | read 8-, 16-, and 32-bit unsigned integers, big-endian format |
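In little-endian format the least significant byte is stored first, so the bytes appear in the file in the reverse order of the hex value you compare against. For example, a file that starts with the bytes 4D 5A is matched by uint16(0) == 0x5A4D.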
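The difference between the two byte orders is easiest to see on the same two bytes. The sketch below checks the MZ header both ways; the rule name is invented, and both comparisons are true for any file that starts with the bytes 4D 5A.

```yara
rule Example_Endianness
{
    condition:
        // Little-endian read: bytes 4D 5A become the value 0x5A4D.
        uint16(0) == 0x5A4D and
        // Big-endian read: the value keeps the on-disk order, 0x4D5A.
        uint16be(0) == 0x4D5A
}
```

When a magic number is written the way it appears in a hex editor, the be variant is usually the one that matches it directly.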
Here are some of the most commonly used byte sequences, also known as magic numbers, that I have come across while writing rules:
Magic Number | Description |
uint16(0) == 0x5a4d | MZ signature at offset 0 |
uint16be(0) == 0x4D5A | MZ signature at offset 0 |
uint16(0) == 0x457f | Linux ELF signature at offset 0 |
uint32be(0) == 0x7f454c46 | Linux ELF signature at offset 0 |
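uint32(0) == 0xfeedface | macOS Mach-O, 32-bit (little-endian file) |
uint32(0) == 0xfeedfacf | macOS Mach-O, 64-bit (little-endian file) |
uint32(0) == 0xcefaedfe | macOS Mach-O, 32-bit (big-endian file) |
uint32(0) == 0xcffaedfe | macOS Mach-O, 64-bit (big-endian file) |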
uint16(0) == 0xcfd0 | Word/Office Document |
uint32(0) == 0x74725C7B | rtf signature at offset 0 |
uint32be(0) == 0x52617221 | rar signature at offset 0 |
uint32(0) == 0x04034b50 | zip signature at offset 0 |
uint16be(0) == 0x1f8b | gzip signature at offset 0 |
uint32be(0) == 0x377abcaf | 7zip signature at offset 0 |
uint32be(257) == 0x75737461 | tar (ustar) signature at offset 257 |
uint16(0) == 0x004c | Windows lnk signature at offset 0 |
uint32be(0) == 0x25504446 | pdf signature at offset 0 |
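A magic number check usually goes at the start of a condition so that non-matching files are discarded cheaply. The sketch below tightens the MZ check by also following the pointer at offset 0x3C to the PE signature, a pattern that appears in the official YARA documentation; the rule name is made up.

```yara
rule Example_PE_File_Type
{
    condition:
        // "MZ" at offset 0.
        uint16(0) == 0x5A4D and
        // The 32-bit value at 0x3C points to the "PE\0\0" signature.
        uint32(uint32(0x3C)) == 0x00004550
}
```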
I’ve tried to cover some key approaches to rule engineering and provided real-life examples. However, this is a broad and evolving topic. I highly recommend referring to the official documentation and other resources available online to learn more about rule engineering.