Published: September 4, 2020

grap is our tool to match binaries at the assembly code level, matching control flow graphs: https://github.com/QuoSecGmbH/grap/

This post demonstrates how to use grap to quickly find and analyse documented features (based on public reports) in published QakBot samples:

This is a tutorial demonstrating grap’s features with increasing complexity.

References

[1] - Reversing Qakbot - https://hatching.io/blog/reversing-qakbot/

[2] - Deep Analysis of QBot Banking Trojan - https://n1ght-w0lf.github.io/malware%20analysis/qbot-banking-trojan/

[3] - Malware Analysis: Qakbot [Part 2] - https://darkopcodes.wordpress.com/2020/06/07/malware-analysis-qakbot-part-2/

1 - Samples

QakBot samples are packed and the unpacking process consists in decrypting a buffer that will be written over the memory-mapped first-stage PE [1].

Thus Malpedia has 3 types of QakBot samples. They are renamed here so as not to disclose them, for instance:

2 - Approach

We focus on features (decryption, anti-analysis, parsers…) because:

Rules (grap patterns) detecting features:

3 - First-stage PE: PE parsing

QakBot unpacks itself and fixes headers [1]. We can thus expect comparisons against, or writes of, values such as “MZ” (0x5a4d) or “PE” (0x4550).
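These two constants are the ASCII signatures read as little-endian 16-bit words, which is why they look byte-swapped in the disassembly. A quick check in Python (the helper name `magic_as_word` is ours, for illustration only):

```python
import struct

def magic_as_word(signature: bytes) -> int:
    """Interpret a 2-byte signature as a little-endian 16-bit word."""
    (word,) = struct.unpack("<H", signature)
    return word

print(hex(magic_as_word(b"MZ")))  # DOS header signature
print(hex(magic_as_word(b"PE")))  # first two bytes of the PE signature "PE\0\0"
```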

The easiest way is to write a quick pattern that will be automatically converted into a full grap pattern. Looking for any instruction containing 0x5a4d can be done with grap 0x5a4d *:

$ grap 0x5a4d *
s05.grapcfg - 2035 instructions
1 matches: tmp0 (1)

tmp0 - match 1
1: 0x401933, cmp eax, 0x5a4d
---
s12.grapcfg - 1616 instructions
1 matches: tmp0 (1)

tmp0 - match 1
1: 0x402dfb, cmp word ptr [eax], 0x5a4d
---
s12_unpacked_1.grapcfg - 13891 instructions
1 matches: tmp0 (1)

tmp0 - match 1
1: 0x40451b, mov eax, 0x5a4d

This first run will disassemble the binaries using grap’s embedded disassembler (recursive disassembler based on Capstone) and save the disassembly into “.grapcfg” files so there is no need to disassemble them again.

Let’s investigate the generated pattern (grap -v will output its path, the IDA plugin can also be used) and see that this is done through a regex on the full instruction string:

PE_parsing_IDA2

A more precise condition can be written to match any instruction having 0x5a4d or 0x4550 as an argument: arg1 is 0x5a4d or arg2 is 0x5a4d or arg1 is 0x4550 or arg2 is 0x4550 (see trick_PEparsing.grapp):

PE_grapp

Within instruction conditions, most fields (inst, arg1, arg2, opcode) are matched as strings. Those strings are obtained through Capstone disassembly (even within IDA), so patterns should be written with Capstone syntax in mind.
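To illustrate why string matching matters, here is a toy Python model of the two approaches: a regex over the full instruction string versus an exact comparison per argument. This is not grap's implementation. The helper names are ours, and we assume the regex is unanchored and case-sensitive.

```python
import re

def split_instruction(inst: str):
    """Split 'opcode arg1, arg2' into (opcode, [args]) using plain string operations."""
    opcode, _, rest = inst.partition(" ")
    args = [a.strip() for a in rest.split(",")] if rest else []
    return opcode, args

def matches_regex(inst: str, needle: str) -> bool:
    # Toy stand-in for a quick pattern: unanchored regex on the full instruction string.
    return re.search(re.escape(needle), inst) is not None

def matches_arg(inst: str, values) -> bool:
    # Toy stand-in for "arg1 is X or arg2 is X": exact string comparison per argument.
    _, args = split_instruction(inst)
    return any(arg in values for arg in args[:2])

magics = {"0x5a4d", "0x4550"}
for inst in ["cmp eax, 0x5a4d", "mov eax, 0x5a4d1234", "cmp word ptr [eax], 0x5A4D"]:
    print(inst, matches_regex(inst, "0x5a4d"), matches_arg(inst, magics))
```

In this model the second instruction is caught by the regex but not by the argument comparison, and the third (written with upper-case hex, which is not how Capstone prints immediates) is missed by both.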

4 - Unpacked samples

4.1 - cpuid

QakBot uses the cpuid instruction to determine whether it is running in a VM [1]. It first gets the CPU vendor with eax=0 and the processor features with eax=1 [2].
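As a reminder of what the two leaves return, the sketch below rebuilds the vendor string from the registers filled by cpuid with eax=0, and tests the "hypervisor present" flag (bit 31 of ecx) returned with eax=1. The function names are ours, and whether QakBot checks exactly this bit is an assumption, not something stated in the reports quoted here.

```python
import struct

def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Rebuild the 12-character vendor ID returned by cpuid leaf 0 (order: EBX, EDX, ECX)."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")

def hypervisor_bit(ecx_leaf1: int) -> bool:
    """Bit 31 of ECX for cpuid leaf 1 is the 'hypervisor present' flag."""
    return bool((ecx_leaf1 >> 31) & 1)

# Register values returned by an Intel CPU for leaf 0
print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))
```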

Let’s find cpuid usage within our samples:

$ grap -q "opcode is cpuid" *.grapcfg
s02_unpacked_1.grapcfg (13898) - tmp0 (2)
s12_unpacked_1.grapcfg (13891) - tmp0 (2)
s06_dump.grapcfg (21194) - tmp0 (2)
s08.grapcfg (22464) - tmp0 (2)
s08_dump.grapcfg (21151) - tmp0 (2)
s10.grapcfg (12479) - tmp0 (2)
s10_dump_2.grapcfg (15130) - tmp0 (2)
s15.grapcfg (10483) - tmp0 (2)

Two unpacked samples match, and both have two ‘cpuid’ matches. Let’s investigate those matches by also looking at the instruction preceding cpuid:

$ grap "*->opcode is cpuid" s12_unpacked_1.grapcfg
s12_unpacked_1.grapcfg - 13891 instructions
2 matches: tmp0 (2)

tmp0 - match 1
1: 0x406260, mov eax, 1
2: 0x406265, cpuid

tmp0 - match 2
1: 0x4062a7, xor eax, eax
2: 0x4062a9, cpuid

The pattern *->opcode is cpuid means: look for any instruction that is sequentially followed by cpuid.

This sample has the expected behavior: one call with eax=0 (xor eax, eax) and one call with eax=1 (mov eax, 1).

We can use this pattern within IDA using bindings to find where these calls are located:

cpuid_IDA2

We can also use grap to quickly review all instructions preceding cpuid to find possible alternatives:

$ grap "PRE:* -> opcode is cpuid" *.grapcfg | grep PRE
PRE: 0x406260, mov eax, 1
PRE: 0x4062a7, xor eax, eax
PRE: 0x40b146, mov eax, 1
PRE: 0x40b19d, xor eax, eax
PRE: 0x404bef, xor eax, eax
PRE: 0x404bff, mov eax, 1
PRE: 0x40b160, mov eax, 1
PRE: 0x40b1ad, xor eax, eax
PRE: 0x4087ee, xor eax, eax
PRE: 0x4087fe, mov eax, 1
PRE: 0x403d27, xor eax, eax
PRE: 0x403db7, mov eax, 1
PRE: 0x406260, mov eax, 1
PRE: 0x4062a7, xor eax, eax
PRE: 0x44a5e9, xor eax, eax
PRE: 0x44a5f9, mov eax, 1
PRE: 0x405610, mov eax, 1
PRE: 0x405657, xor eax, eax

The pattern "PRE:* -> opcode is cpuid" defines the name of the preceding instruction (PRE), allowing for easy grepping.

The cpuid instruction is, as expected, always called with eax=0 or eax=1.
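With more samples, reading the grep output by eye gets tedious. A small Python sketch can tally the distinct preceding instructions. It assumes the "PRE: address, instruction" line format shown above, and `tally_pre` is a hypothetical helper, not part of grap.

```python
from collections import Counter

def tally_pre(output: str) -> Counter:
    """Count how often each instruction appears on 'PRE: <address>, <instruction>' lines."""
    counts = Counter()
    for line in output.splitlines():
        if line.startswith("PRE: "):
            # Drop the address, keep the instruction text
            _, _, inst = line[len("PRE: "):].partition(", ")
            counts[inst] += 1
    return counts

sample = """PRE: 0x406260, mov eax, 1
PRE: 0x4062a7, xor eax, eax
PRE: 0x40b146, mov eax, 1
PRE: 0x40b19d, xor eax, eax"""
print(tally_pre(sample))
```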

A simple pattern detecting this technique is (see trick_cpuid.grapp):

cpuid_grapp.png

4.2 - VMWare detection

Reports describe Qakbot’s attempts at detecting VMWare [1] through a technique described by VMWare:

We can look for a cmp instruction with the VMware hypervisor magic value as argument:

VMWare_IDA

The IDA plugin allows for interactive pattern creation:

The generated detection pattern shown is very specific: if the mov instructions were re-ordered, the code would still behave identically but the pattern would not detect it.

To base detection on the magic value and the in instruction, we can manually write a pattern that will:

We now have 3 patterns (see trick_vmware_detection.grapp):

VMWare_grapp

Let’s try our patterns on the samples:

$ grap -q trick_vmware_detection.grapp *.grapcfg
s02_unpacked_1.grapcfg (13898) - trick_vmware_detection_cmp (1), trick_vmware_detection_magic (2), trick_vmware_detection_v1 (1)
s06_dump.grapcfg (21194) - trick_vmware_detection_cmp (1), trick_vmware_detection_magic (2)
s08_dump.grapcfg (21151) - trick_vmware_detection_cmp (1), trick_vmware_detection_magic (2)
s10_dump_2.grapcfg (15130) - trick_vmware_detection_cmp (1), trick_vmware_detection_magic (2)
s12_unpacked_1.grapcfg (13891) - trick_vmware_detection_cmp (1), trick_vmware_detection_magic (2), trick_vmware_detection_v1 (1)
s21.grapcfg (11685) - trick_vmware_detection_cmp (1), trick_vmware_detection_magic (1), trick_vmware_detection_v1 (1)
s23_unpacked.grapcfg (10078) - trick_vmware_detection_magic (2), trick_vmware_detection_v1 (1)

7 samples match:

Let’s look at the trick_vmware_detection_magic matches (-m forces the output of matching instructions):

$ grap -m trick_vmware_detection.grapp s02_unpacked_1.grapcfg
s02_unpacked_1.grapcfg - 13898 instructions
4 matches: trick_vmware_detection_cmp (1), trick_vmware_detection_magic (2), trick_vmware_detection_v1 (1)

trick_vmware_detection_magic - match 1
1_magic: 0x406338, mov eax, 0x564d5868
2_other0: 0x40633d, mov ecx, 0x14
2_other1: 0x406342, mov dx, 0x5658
3_in: 0x406346, in eax, dx

trick_vmware_detection_magic - match 2
1_magic: 0x406d7f, mov eax, 0x564d5868
2_other0: 0x406d84, mov ecx, 0xa
2_other1: 0x406d89, mov dx, 0x5658
3_in: 0x406d8d, in eax, dx

There is a variant of the technique described by VMWare, this time with ecx=0x14.

In this case the in instruction calls the VMWare function to get the memory size, which can also be used for VMWare detection: https://www.aldeid.com/wiki/VMXh-Magic-Value.
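The constants in the matches above are easier to remember once decoded as ASCII. The Python sketch below does that; the command names in the dictionary are our labels (0x14 as described above, 0x0a as commonly documented for the version query).

```python
import struct

VMWARE_MAGIC = 0x564D5868  # value loaded into eax in the matches above
VMWARE_PORT = 0x5658       # value loaded into dx

# Read most significant byte first, both constants spell ASCII text
print(struct.pack(">I", VMWARE_MAGIC))
print(struct.pack(">H", VMWARE_PORT))

# Command numbers seen in ecx in the two matches (labels are ours)
COMMANDS = {0x0A: "get version", 0x14: "get memory size"}
```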

repeat and lazyrepeat

By default repeat will match the maximum number of sequential instructions which all have a single parent and a single child (thus within the same basic block, containing neither jump nor call).

With lazyrepeat=true, repeat will stop at the first instruction matching the next condition in the pattern:

Unfortunately this approach makes some code sequences difficult to match. Let’s try to match all basic blocks with this shape:

A potential pattern would be:

digraph any_xor_call {
any [cond=true, minrepeat=1, maxrepeat=4, lazyrepeat=true]
xor [cond="opcode is xor"]
call [cond="opcode is call"]

any -> xor
xor -> call
}

lazyrepeat_grapp1.png

Let’s try to match push, push, xor, xor, call with this pattern.

If, as shown, we set lazyrepeat=true, candidates for each node are:

Alternatively, if we set lazyrepeat=false (“any” node), we get the following candidates:

The pattern can thus not match the wanted instruction sequence.

One solution is to use repeat=+ with lazyrepeat=true when matching on xor:

lazyrepeat_grapp2.png

Candidates are now:

Be aware of the behavior of lazyrepeat when writing and testing patterns with repeated instructions: it can lead to unintuitive results (typically no match).
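The behavior on push, push, xor, xor, call can be reproduced with a toy Python model. This is a simplified, non-backtracking matcher written for this example only, not grap's actual engine. It assumes that a lazy node stops at the first instruction matching the next node's condition once its minimum repeat count is reached.

```python
def match_linear(nodes, opcodes):
    """Toy, non-backtracking matcher for a linear pattern.

    Each node is (name, predicate, min_repeat, max_repeat, lazy); max_repeat=None
    means unbounded. Returns {name: [opcodes]} or None when the pattern fails.
    """
    pos = 0
    result = {}
    for i, (name, pred, lo, hi, lazy) in enumerate(nodes):
        next_pred = nodes[i + 1][1] if i + 1 < len(nodes) else None
        taken = []
        while pos < len(opcodes) and (hi is None or len(taken) < hi) and pred(opcodes[pos]):
            # Lazy repeat: stop as soon as the next node's condition matches
            # (assumed here to apply only once min_repeat is satisfied).
            if lazy and next_pred and len(taken) >= lo and next_pred(opcodes[pos]):
                break
            taken.append(opcodes[pos])
            pos += 1
        if len(taken) < lo:
            return None
        result[name] = taken
    return result

any_ = lambda op: True
is_xor = lambda op: op == "xor"
is_call = lambda op: op == "call"
block = ["push", "push", "xor", "xor", "call"]

lazy_any = [("any", any_, 1, 4, True), ("xor", is_xor, 1, 1, False), ("call", is_call, 1, 1, False)]
greedy_any = [("any", any_, 1, 4, False), ("xor", is_xor, 1, 1, False), ("call", is_call, 1, 1, False)]
fixed = [("any", any_, 1, 4, True), ("xor", is_xor, 1, None, True), ("call", is_call, 1, 1, False)]

for label, pattern in [("lazy any", lazy_any), ("greedy any", greedy_any), ("xor repeat=+", fixed)]:
    print(label, match_linear(pattern, block))
```

In this model the first two patterns fail (the lazy "any" leaves the second xor to be matched by call, and the greedy "any" swallows both xor instructions), while the third assigns both xor instructions to the repeated xor node.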

4.3 - Obfuscation with empty loops

Qakbot uses unreachable empty loops as an obfuscation method [1]:

emptyloop_IDA.png

Basic blocks at addresses 0x404ce7 and 0x404ceb are unreachable.

A quick pattern can detect the loop: opcode is xor and arg2 is _arg1 -> je -> jmp -2> 1:

This pattern detects an unreachable empty loop using xor, je and jmp: when both xor arguments are identical the result is 0, so the conditional jump je is always taken.

We can look for potential variants with xor having different arguments and different conditional jumps, including cases where the jump is not taken: xor -> * -*> * -2> 1:
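The reasoning about the zero flag can be checked with a few lines of Python. This is only a model of the flag computation for a 32-bit xor, with a helper name of our choosing.

```python
def xor_sets_zero_flag(a: int, b: int) -> bool:
    """Return the zero flag after a 32-bit `xor a, b`."""
    return ((a ^ b) & 0xFFFFFFFF) == 0

# Same register on both sides: the result is 0 whatever the register holds,
# so a following je is always taken.
assert all(xor_sets_zero_flag(v, v) for v in (0, 1, 0xDEADBEEF, 0xFFFFFFFF))

# Different operands: the flag depends on the values, so je may or may not be taken.
print(xor_sets_zero_flag(1, 2))
```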

Matching both patterns on the samples:

$ grap -q "opcode is xor and arg2 is _arg1 -> je -> jmp -2> 1" -p "xor -> * -*> * -2> 1" *.grapcfg 
s01_unpacked.grapcfg (53616) - tmp0 (224), tmp1 (224)
s02_unpacked_1.grapcfg (13898) - tmp0 (87), tmp1 (87)
s08_dump.grapcfg (21151) - tmp0 (625), tmp1 (625)
s07_dump.grapcfg (48735) - tmp0 (219), tmp1 (219)
[...]

We find:

The full patterns for this technique are (see qakbot_trick_emptyloop.grapp):

emptyloop_grapp.png

Back into IDA, the plugin finds and colors matches:

emptyloop_IDA2.png

4.4 - RC4 Key Scheduling

Qakbot uses RC4 to decrypt an embedded resource [2,3].

RC4’s Key Scheduling algorithm begins with creating a permutation array with all values from 0 to 255: S=[i for i in range(0, 0x100)].
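For reference, here is a textbook RC4 in Python, with the key scheduling step separated out. It is the standard algorithm, not a transcription of QakBot's code; the 0x100 constant is what the patterns in this section key on.

```python
def rc4_ksa(key: bytes) -> list:
    """Textbook RC4 key scheduling: identity permutation, then key-dependent swaps."""
    S = list(range(0x100))  # permutation array holding 0..255
    j = 0
    for i in range(0x100):
        j = (j + S[i] + key[i % len(key)]) & 0xFF
        S[i], S[j] = S[j], S[i]
    return S

def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data (RC4 is symmetric) with the keystream derived from key."""
    S = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + S[i]) & 0xFF
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) & 0xFF])
    return bytes(out)

print(rc4_crypt(b"Key", b"Plaintext").hex())
```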

One documented sample (see “RC4 ENCRYPTION” in [3]) uses a mov eax, 0x100 followed by a push. We can find it in our samples with grap -q "mov eax, 0x100 -> push" *.grapcfg. Exploring the matches leads to the following function, from which we can create a more precise pattern:

rc4_IDA.png

In many samples the implementation is actually slightly different, without push esi:

rc4_IDA2.png

Our final pattern will have this push instruction as an optional match with repeat=? (see qakbot_rc4init.grapp):

rc4_grapp.png

5 - Sample navigation

Putting together the previous patterns in a single folder under the right plugin folder (on Linux in “~/idapro-7.5/plugins/idagrap/patterns/test/misc/files/qakbot_patterns”) will make them available within the plugin.

Matching the patterns on unpacked samples (-sa also shows non-matched samples):

$ grap -q -sa qakbot_patterns/ *_unpacked*.grapcfg
s02_unpacked_1.grapcfg (13898) - qakbot_rc4init_gen (1), qakbot_trick_emptyloop_generic (87), qakbot_trick_emptyloop_v1 (87), trick_PEparsing (3), trick_cpuid (2), trick_vmware_detection_cmp (1), trick_vmware_detection_magic (2), trick_vmware_detection_v1 (1)
s12_unpacked_4.grapcfg (5456)
s02_unpacked_2.grapcfg (25461) - qakbot_rc4init_gen (1), qakbot_trick_emptyloop_generic (84), qakbot_trick_emptyloop_v1 (84), trick_PEparsing (1)
s01_unpacked.grapcfg (53616) - qakbot_rc4init_gen (1), qakbot_trick_emptyloop_generic (224), qakbot_trick_emptyloop_v1 (224), trick_PEparsing (3)
s03_unpacked.grapcfg (47070) - qakbot_rc4init_gen (1), qakbot_trick_emptyloop_generic (219), qakbot_trick_emptyloop_v1 (219), trick_PEparsing (3)
s04_unpacked.grapcfg (52604) - qakbot_rc4init_gen (1), qakbot_trick_emptyloop_generic (224), qakbot_trick_emptyloop_v1 (224), trick_PEparsing (3)
s07_unpacked.grapcfg (47281) - qakbot_rc4init_gen (1), qakbot_trick_emptyloop_generic (219), qakbot_trick_emptyloop_v1 (219), trick_PEparsing (3)
s09_unpacked.grapcfg (35471) - trick_PEparsing (2)
s12_unpacked_1.grapcfg (13891) - qakbot_rc4init_gen (1), qakbot_trick_emptyloop_generic (87), qakbot_trick_emptyloop_v1 (87), trick_PEparsing (3), trick_cpuid (2), trick_vmware_detection_cmp (1), trick_vmware_detection_magic (2), trick_vmware_detection_v1 (1)
s19_unpacked.grapcfg (36736) - qakbot_rc4init_gen (1), qakbot_trick_emptyloop_generic (176), qakbot_trick_emptyloop_v1 (176)
s20_unpacked.grapcfg (38455) - qakbot_rc4init_gen (1), qakbot_trick_emptyloop_generic (219), qakbot_trick_emptyloop_v1 (219), trick_PEparsing (2)
s22_unpacked.grapcfg (34460) - trick_PEparsing (2)
s23_unpacked.grapcfg (10078) - trick_vmware_detection_magic (2), trick_vmware_detection_v1 (1)
[...]

There are some differences amongst unpacked samples:

Though we will not go further now, those differences can be a way to distinguish between versions and sample types.
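As a starting point for such a comparison, the quiet output can be parsed and samples grouped by the set of patterns they match. The Python sketch below assumes the "name (instructions) - pattern (count), ..." line format shown above; the function names are hypothetical and not part of grap.

```python
import re
from collections import defaultdict

LINE_RE = re.compile(r"^(?P<name>\S+) \((?P<size>\d+)\)(?: - (?P<matches>.*))?$")
MATCH_RE = re.compile(r"(\w+) \((\d+)\)")

def parse_quiet_output(text: str) -> dict:
    """Map each sample name to {pattern_name: match_count}."""
    samples = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            found = MATCH_RE.findall(m.group("matches") or "")
            samples[m.group("name")] = {name: int(count) for name, count in found}
    return samples

def group_by_features(samples: dict) -> dict:
    """Group sample names by the exact set of patterns they match."""
    groups = defaultdict(list)
    for sample, matches in samples.items():
        groups[frozenset(matches)].append(sample)
    return dict(groups)

output = """s12_unpacked_4.grapcfg (5456)
s09_unpacked.grapcfg (35471) - trick_PEparsing (2)
s22_unpacked.grapcfg (34460) - trick_PEparsing (2)
s23_unpacked.grapcfg (10078) - trick_vmware_detection_magic (2), trick_vmware_detection_v1 (1)"""
for features, names in group_by_features(parse_quiet_output(output)).items():
    print(sorted(features), names)
```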

Within IDA the patterns help navigate a sample by finding the documented features and leading to their implementations:

navigation_IDA.png

Besides the previously described patterns, the plugin includes pre-defined patterns such as:

The screenshot shows a match of a basic block containing a xor instruction with pattern bb_xor_loop. The highlighted instruction (and ebx, 0x3f) is actually part of the documented string decryption function [2] (the decryption script contains the following operation: idx&0x3F).
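The link between and ebx, 0x3f and idx&0x3F is that masking with 0x3f keeps the low 6 bits, which is the same as taking the index modulo 64. The Python sketch below checks this and shows, purely as a hypothetical illustration, how such a masked index could wrap over a 64-byte key in an XOR loop. The XOR routine is our assumption and not the documented QakBot function.

```python
def masked_index(idx: int) -> int:
    """`and reg, 0x3f` keeps the low 6 bits, i.e. idx modulo 64."""
    return idx & 0x3F

def xor_with_wrapping_key(data: bytes, key: bytes) -> bytes:
    # HYPOTHETICAL: XOR each byte with a 64-byte key indexed by idx & 0x3F.
    assert len(key) == 64
    return bytes(b ^ key[masked_index(i)] for i, b in enumerate(data))

print(masked_index(64), masked_index(65), masked_index(130))
```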

The strings decryption function will be further described in another post, along with a method to automatically decrypt the samples’ strings.

Conclusion

We demonstrated how to use grap to find and analyze documented malware features in public samples as a way to get up-to-speed on a malware family.

Creating grap patterns matching against malware samples is simplified by the IDA plugin and the quick pattern syntax. In complex cases you will still need to understand the pattern syntax and write them manually.

Patterns are useful to find implementation alternatives and to navigate and classify unknown samples.

Resources

More documentation on grap can be found here: