Learning miasm: Part 2: Analyzing instructions

January 12, 2020 #reverse-engineering #miasm #python #disassembly

Once we’ve used miasm to load data into a container, then its time to disassemble instructions. I’ve found that there are two useful modes of disassembling:

parse a few bytes into an instruction instance, or
parse many bytes, analyze the instructions, and recover a Control Flow Graph (CFG).

The first approach is useful when dealing with very small snippets, while the second is better when reasoning about sequences of instructions. If you’re picturing your algorithm in terms of IDA’s graph view, then you’ll want to reach for the CFG technique.

Disassembling single instructions

You can use miasm’s disassembler object to parse bytes into an instruction. Note that only the first instruction is parsed.

buf = b'\xB8\x01\x00\x00\x00\x90\x90\x90\x90'
container = miasm.analysis.binary.Container.from_string(buf)
machine = miasm.analysis.machine.Machine('x86_32')  # use container.arch when inspecting PE/ELF/etc.
disassembler = machine.dis_engine(container.bin_stream, follow_call=True)
insn = disassembler.dis_instr(offset=0x0)

print(insn)
>>> <miasm.arch.x86.arch.instruction_x86 at 0x7fbaa250f2e8>

print(str(insn))
>>> 'MOV        EAX, 0x1'

The mnemonic is accessible as name and the operands are called args. The length of the instruction is available as a property called l.

assert insn.l == 5

assert insn.name == 'MOV'

print(insn.args)
>>> [ExprId('EAX', 32), ExprInt(0x1, 32)]

An operand instance has some helpful accessors to help you determine the type and value:

arg0 = insn.args[0]
arg1 = insn.args[1]

print(arg0)
>>> ExprId('EAX', 32)

assert arg0.is_id() == True
assert arg0.is_int() == False
assert arg0.is_mem() == False
assert arg0.is_loc() == False
assert arg0.name == 'EAX'

print(arg1)
>>> ExprInt(0x1, 32)

assert arg1.is_id() == False
assert arg1.is_int() == True
assert arg1.is_mem() == False
assert arg1.is_loc() == False
assert arg1.arg == miasm.expression.modint.uint32(0x1)

Disassembling multiple instructions

Most of the time, I deal with long sequences of instructions. For these, I’ve found miasm’s CFG analysis more useful because it helps me understand the dependencies among instructions. In this mode, miasm parses many instructions that it groups by basic block. Miasm follows control flow instructions, such as jmp or call, and records links among the basic blocks.

The result of this operation is an AsmCFG instance, which stands for “AsSeMbly Control Flow Graph” (miasm also has IrCFG and IraCFG objects for reasoning about intermediate representations, too). This object provides accessors for the recovered basic blocks and their control flow edges. Within a basic block, you can enumerate the instruction instances and inspect them as shown in the first section.

While unlikely to be useful to large scale analysis, miasm can render AsmCFG instances to Graphviz Dot format. This is how I’ll demonstrate examples. An empty CFG has an empty image, of course:

import graphviz
cfg = miasm.core.asmblock.AsmCFG()
graphviz.Source(cfg.dot())

The output (empty):

back to disassembling…

To demonstrate features of miasm’s CFG analysis, let’s use the following disassembly snippet:

00000000 <A>:
0:  b8 01 00 00 00          mov    eax,0x1
5:  eb 00                   jmp    7 <B>
00000007 <B>:
7:  b8 02 00 00 00          mov    eax,0x2
c:  eb 00                   jmp    e <C>
0000000e <C>:
e:  b8 03 00 00 00          mov    eax,0x3
13: eb f9                   jmp    e <C>
00000015 <D>:
15: b8 04 00 00 00          mov    eax,0x4
1a: eb 00                   jmp    1c <E>
0000001c <E>:
1c: b8 05 00 00 00          mov    eax,0x5
21: eb eb                   jmp    e <C>

These instructions can be broken down into five basic blocks that have labels A - E. Basic block A flows to B flows to C, and D flows to E flows to C. The two entry points A and D both eventually flow to C. Basic block C flows to itself. If you were to lay them out in IDA, you might see a Y shape:

We’ll load the raw shellcode into a container, select the appropriate architecture, and create a disassembler. Then, dis_multiblock disassembles many instructions, grouping them into basic blocks and control flow edges.

buf = b'\xB8\x01\x00\x00\x00\xEB\x00\xB8\x02\x00\x00\x00\xEB\x00\xB8\x03\x00\x00\x00\xEB\xF9\xB8\x04\x00\x00\x00\xEB\x00\xB8\x05\x00\x00\x00\xEB\xEB'
container = miasm.analysis.binary.Container.from_string(buf)
machine = miasm.analysis.machine.Machine('x86_32')  # use container.arch when inspecting PE/ELF/etc.
disassembler = machine.dis_engine(container.bin_stream, follow_call=True)
cfg = disassembler.dis_multiblock(offset=0x0)

This results in the following CFG:

Some things to note:

miasm started disassembling from address 0x0, which we specified as offset=0x0 above.
miasm did not find basic blocks D or E, because they are not referenced from the entry point.
the disassembly looks consistent with our initial listing, that’s good!
miasm created its own names for these basic blocks using a reasonable default schema (loc_ + address).

Naming locations with LocationDB

Let’s start by addressing that final point and teach miasm about our names for these basic blocks. To do this, we’ll use a LocationDB to associate names with locations within a container. I’ve found that miasm uses the LocationDB heavily to refer to locations in a Container, so you’ll quickly get familiar with this abstraction.

container = miasm.analysis.binary.Container.from_string(buf)

loc_db = container.loc_db
loc_db.add_location(offset=0x0, name="A")
loc_db.add_location(offset=0x7, name="B")
loc_db.add_location(offset=0xE, name="C")
loc_db.add_location(offset=0x15, name="D")
loc_db.add_location(offset=0x1C, name="E")

machine = miasm.analysis.machine.Machine('x86_32')  # use container.arch when inspecting PE/ELF/etc.
disassembler = machine.dis_engine(container.bin_stream, loc_db=loc_db, follow_call=True)  # note: added reference to `loc_db`
cfg = disassembler.dis_multiblock(offset=0x0)

Good! Miasm now renders our basic blocks titles and labels using the names we expect. Since miasm never does this automatically, we should always apply relevant names when setting up a workspace. For example, we can apply the names of imported and exported symbols to a PE container:

pe.loc_db.add_location(offset=pe.entry_point, name='entry', strict=False)

for symbol, va in miasm.jitter.loader.pe.get_export_name_addr_list(pe.executable):
    if isinstance(symbol, int):
        # ordinal
        pe.loc_db.add_location(offset=va, name=f'#{symbol}', strict=False)
    else:
        pe.loc_db.add_location(offset=va, name=symbol, strict=False)

for ((dll, symbol), addresses) in miasm.jitter.loader.pe.get_import_address_pe(pe.executable).items():
    for va in addresses:
        pe.loc_db.add_location(offset=va, name=f'{dll}!{symbol}', strict=False)

Returning to our shellcode example, we can query the LocationDB by offset and name:

loc_db.get_offset_location(0x0)
>>> <LocKey 0>

loc_db.get_name_location('A')
>>> <LocKey 0>

loc_db.get_location_names(loc_db.get_name_location('A'))
>>> frozenset({b'A'})

That final line is neat to me: miasm supports assigning multiple names to the same location. This is helpful when a function is exported many times with different names.

Disassembling from many entry points

We know, from prior knowledge, that our shellcode snippet has multiple entry points: A and D. How can we get miasm to recognize both of these? I found that the best way is to generate a CFG for each entry point and then merge the graphs together into a single, master CFG:

job_done = set()
a_cfg = disassembler.dis_multiblock(loc_db.get_name_offset("A"), job_done=job_done)
d_cfg = disassembler.dis_multiblock(loc_db.get_name_offset("D"), job_done=job_done)
cfg = miasm.core.asmblock.AsmCFG(loc_db)
cfg.merge(a_cfg)
cfg.merge(d_cfg)
cfg.apply_splitting(disassembler.loc_db,
                    dis_block_callback=disassembler.dis_block_callback,
                    mn=disassembler.arch,
                    attrib=disassembler.attrib,
                    pool_bin=disassembler.bin_stream)

(In my testing, I discovered a couple ways to build a single, master CFG and that the above strategy shows the best performance. Please review the Appendix for further details.)

Conclusion

This is the second post in a series. Next, I’ll collect some thoughts on how to use miasm to identify functions and other constructs. Maybe we’ll get to emulation (still haven’t attempted this yet).

When it comes to disassembly, I like how miasm is very explicit about its analysis and results. It doesn’t hide side effects within stateful objects, rather it returns the results and lets you decide how to merge, diff, and/or serialize them. I can imagine building an “undo” feature when using miasm, while this would be nearly impossible in IDAPython or vivisect.

The downside, of course, is that you have to stitch together analyzers as you see fit. I wonder if somewhere there is a repository of reusable miasm analyzers. I haven’t found one yet.

Appendix I: Miasm disassembly internals

I’ve shared a technique for disassembling multiple entry points and merging their control flow graphs. For me, this construction worked the best, but it took multiple tries to get here. Let’s learn a bit more about miasm’s algorithm so we can maintain performance and avoid a subtle bug.

Here is the implementation of dis_multiblock:

    def dis_multiblock(self, offset, blocks=None, job_done=None):
        if job_done is None:
            job_done = set()
        if blocks is None:
            blocks = AsmCFG(self.loc_db)
        todo = [offset]

        while len(todo):
            ...
            target_offset = int(todo.pop(0))
            if (target_offset is None or
                    target_offset in job_done):
                continue
            cur_block, nexts = self._dis_block(target_offset, job_done)
            todo += nexts
            blocks.add_block(cur_block)
        ...

This is a straightforward recursive descent disassembler that uses a job queue of virtual addresses called todo. Starting from the given virtual address, miasm disassembles a basic block, saves it off, and adds the successors to the queue. It then pops a new virtual address from the queue and repeats, until the queue is empty.

To avoid looping endlessly over blocks that branch to themselves, miasm uses an index called job_done to track the virtual addresses of blocks it has already considered. It skips blocks that it’s already seen. You can provide an initialized set for job_done to restrict the paths that miasm will disassemble.

Miasm places the collected basic blocks into the AsmCFG instance named blocks. Again, you can provide an initialized instances for blocks, in which case miasm will merge the results together.

With this knowledge, we can create a naive solution in which we disassemble from multiple points, reusing the same AsmCFG instance:

cfg = miasm.core.asmblock.AsmCFG(loc_db=loc_db)
disassembler.dis_multiblock(offset=loc_db.get_name_offset("A"), blocks=cfg)
disassembler.dis_multiblock(offset=loc_db.get_name_offset("D"), blocks=cfg)

That graph looks great! It has all the basic blocks that we expect. It works by disassembling from each entry point, and having miasm place the results into the same mutable AsmCFG instance.

However, there are two issues with this naive solution:

the performance is sub-optimal, because job_done is not maintained across calls.
there’s unexpected O(n²) runtime thanks to internal basic block bookkeeping.

`job_done`

First, miasm does too much work because we’re not providing a shared job_done index to dis_multiblock. Instead, the local variable get reinitialized on each call, so miasm forgets which blocks it has already processed and repeats work - wasting our time.

In our example, we call dis_multiblock twice, for entry points A and D. Miasm will disassemble blocks A, B, and then C during the first invocation, then D, E, and then C (again!). The work for C (and any blocks that flow from it) is done twice. While our example is simple, and C is trivial to disassemble, C may be extremely complex in the real world. We might be left wondering why miasm seems slow at analysis.

Therefore, we should tweak our solution to share an instance of job_done across calls to dis_multiblock:

cfg = miasm.core.asmblock.AsmCFG(loc_db=loc_db)
job_done = set()
disassembler.dis_multiblock(offset=loc_db.get_name_offset("A"), job_done=job_done, blocks=cfg)
disassembler.dis_multiblock(offset=loc_db.get_name_offset("D"), job_done=job_done, blocks=cfg)

And the results are exactly the same:

Block splitting

However, I found that this solution still showed very poor performance when analyzing real programs. The culprit is that miasm invokes a routine called apply_splitting at the tail of dis_multiblock, whose purpose is to split basic blocks that have a jump target in the middle of them.

For example, consider the assembly snippet:

A:
mov eax, 1
B:
mov eax, 2
jmp B

Before splitting, miasm incorrectly computes the following CFG, which groups all instructions into a single basic block:

After splitting, miasm correctly recovers the CFG, splitting A to account for the jump target B:

This behavior is good; however, there is trouble in the way dis_multiblock interacts with apply_splitting. Every time apply_splitting is invoked, miasm checks all basic blocks in the CFG:

    def apply_splitting(self, loc_db, dis_block_callback=None, **kwargs):
        ...
        todo = set(self.blocks)

        while todo:
            cur_block = todo.pop()

If we use the same AsmCFG in across multiple calls to dis_multiblock, then we repeatedly process the same blocks many times. While it may look innocuous, when disassembling kernel32.dll with its 2000 exports, miasm spends most of its time re-re-re-checking blocks for splits. The runtime performance is O(n²).

The trick is to create a separate CFG for each entry point and merge them at the end (also using the job_done trick, too). This way, each basic block is checked for splitting just once, and runtime performance is back to O(n).

job_done = set()
a_cfg = disassembler.dis_multiblock(loc_db.get_name_offset("A"), job_done=job_done)
d_cfg = disassembler.dis_multiblock(loc_db.get_name_offset("D"), job_done=job_done)
cfg = miasm.core.asmblock.AsmCFG(loc_db)
cfg.merge(a_cfg)
cfg.merge(d_cfg)
cfg.apply_splitting(disassembler.loc_db,
                    dis_block_callback=disassembler.dis_block_callback,
                    mn=disassembler.arch,
                    attrib=disassembler.attrib,
                    pool_bin=disassembler.bin_stream)

The code is a little ugly, but its easily wrapped in a function. For me, this brought the runtime of disassembling 100 exports from kernel32.dll down from 90 seconds to 3 seconds!

Wrapping everything together, here’s the utility function I’ve been using recently:

def disassemble_all(container, entries):
    machine = miasm.analysis.machine.Machine(container.arch)
    mdis = machine.dis_engine(container.bin_stream, loc_db=container.loc_db, follow_call=True)
    
    job_done = set()
    cfgs = {}

    for va in entries:
        cfgs[va] = mdis.dis_multiblock(va, job_done=job_done)

    blocks = miasm.core.asmblock.AsmCFG(container.loc_db)
    for cfg in cfgs.values():
        blocks.merge(cfg)

    blocks.apply_splitting(container.loc_db,
                           dis_block_callback=mdis.dis_block_callback,
                           mn=mdis.arch,
                           attrib=mdis.attrib,
                           pool_bin=mdis.bin_stream)
    return blocks


entries = [pe.entry_point]
entries.extend([va for symbol, va in miasm.jitter.loader.pe.get_export_name_addr_list(pe.executable)])
cfg = disassemble_all(pe, entries)