Microsoft Windows uses a set of Registry keys known as “shellbags” to maintain the size, view, icon, and position of a folder when using Explorer. These keys are useful to a forensic investigator. Shellbags persist information for directories even after the directory is removed, which means that they can be used to enumerate past mounted volumes, deleted files, and user actions.
Yuandong Zhu, Pavel Gladyshev, and Joshua James provided a nice overview of the investigative value of shellbags in “Using shellbag information to reconstruct user activities” [pdf]; however, they do not describe how to programmatically access the data. Allan S Hay went into greater detail in his December 2004 document “MiTeC Registry Analyser” [pdf], although he also leaves out a thorough analysis of the format. TZWorks provides an effective closed-source shellbag parser, sbag, but does not explain its algorithm. Yogesh Khatri first described the basic structure of Windows Shell Items in his blog post for 42 LLC entitled “Shell BAG Format Analysis”. Joachim Metz went on to describe the binary format of the Windows Shell Item structures in great detail in “Windows Shell Item format specification” [pdf]. This page documents an approach to parsing shellbags in detail and introduces an open-source, cross-platform shellbag parser.
Shellbag locations
Shellbags may be found in a few locations, depending on operating system version and user profile. On a Windows XP system, shellbags may be found under:
- HKEY\_USERS\{USERID}\Software\Microsoft\Windows\Shell\
- HKEY\_USERS\{USERID}\Software\Microsoft\Windows\ShellNoRoam\
The NTUser.dat hive file persists the Registry key
HKEY\_USERS\{USERID}\.
On a Windows 7 system, shellbags may be found under:
- HKEY\_USERS\{USERID}\Local Settings\Software\Microsoft\Windows\Shell\
The UsrClass.dat hive file persists the registry key
HKEY\_USERS\{USERID}\.
Shellbag Parsing
Let us begin with the Shell\ key. The Shell\ key does not have any
values. Under the Shell\ key are two keys: Shell\Bags\ and
Shell\BagMRU\.
FOLDERDATA
The subkeys under Shell\Bags\ are named with increasing integers starting
from one, such as Shell\Bags\1\ or Shell\Bags\2\. Let us call these
subkeys FOLDERDATA, since they each represent one item viewed in
Explorer, and this is usually a folder. FOLDERDATA subkeys do not have
any values, but often have subkeys. The most common subkey is
Shell\Bags\{Int}\Shell\, but there are a few other possibilities
(ComDlg, Desktop, etc.). The subkeys under a FOLDERDATA describe the
settings, position, and icon when viewing the folder in Explorer. In
particular, a Registry value whose name begins with ItemPos specifies
the location of the icons for a given desktop resolution. For example,
on my Windows 7 system, the Registry key
HKEY\_USERS\{USERID}\Local Settings\Software\Microsoft\Windows\Shell\Bags\6\Shell\{5C4F28B5-F869-4E84-8E60-F11DB97C5CC7}
has 12 values that record various configurations. This set includes the
value ItemPos1427x820(1), which has type REG_BINARY and length 0x120. Its
hex dump is reproduced in the ITEMPOS values section below.
With no tools beyond Regedit (or
Regview.py),
Windows 8.3 filenames (e.g. MOZILL\~1.LNK) and Unicode filenames (e.g.
Mozilla Firefox.lnk) stand out. Fortunately, by applying the formats
found in Joachim’s paper, more details can be extracted. Throughout this
document, I refer to this Registry value type as an ITEMPOS value.
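As a small illustration of how a tool might recognise ITEMPOS values by name, the following sketch pulls the resolution out of a value name such as ItemPos1427x820(1). The function name is hypothetical, and the meaning of the trailing number in parentheses is not described here, so the sketch returns it uninterpreted.

```python
import re

# Matches names like "ItemPos1427x820(1)"; the parenthesised suffix is optional.
ITEMPOS_NAME = re.compile(r"^ItemPos(\d+)x(\d+)(?:\((\d+)\))?$")


def parse_itempos_name(name):
    """Return (width, height, suffix) for an ITEMPOS value name, else None.

    `suffix` is the number in parentheses, or None when it is absent.
    Its meaning is an open question, so it is returned as-is.
    """
    match = ITEMPOS_NAME.match(name)
    if match is None:
        return None
    width, height, suffix = match.groups()
    return (int(width), int(height), int(suffix) if suffix is not None else None)
```

Values whose names do not match, such as MRUListEx or NodeSlot, simply yield None and can be skipped.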
ITEMPOS values
The ITEMPOS value’s structure is a list of Windows File Entry Shell
Items (SHITEM_FILEENTRY) terminated by an entry whose size field is
zero. The list begins at offset 0x10. Each item is preceded by 0x8 bytes
whose meaning is unknown. The minimum size of a SHITEM_FILEENTRY
structure is 0x15 bytes, so entries whose size field is less than 0x15
should be skipped. The valid SHITEM_FILEENTRY items have the following
structure (in pseudo-C / 010 Editor
template format):
FILEREFERENCE is a 64-bit MFT file reference structure (a 48-bit MFT
record number and a 16-bit MFT sequence number). FILEATTRS is a 16-bit
set of flags that specifies attributes such as whether the item is read-only
or a system file. Applying this template to the ITEMPOS Registry
value, we see there are four list items: one invalid entry, and three
SHITEM_FILEENTRY items.
    0000  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    0010  15 00 00 00 51 00 00 00 14 00 1F 60 40 F0 5F 64  ....Q......`@._d
    0020  81 50 1B 10 9F 08 00 AA 00 2F 95 4E 15 00 00 00  .P......./.N....
    0030  A0 00 00 00 46 00 3A 00 02 02 00 00 10 3D 0C 8E  ....F.:......=..
    0040  20 00 43 79 67 77 69 6E 2E 6C 6E 6B 00 00 2C 00   .Cygwin.lnk..,.
    0050  03 00 04 00 EF BE 10 3D 0C 8E 10 3D 0C 8E 14 00  .......=...=....
    0060  00 00 43 00 79 00 67 00 77 00 69 00 6E 00 2E 00  ..C.y.g.w.i.n...
    0070  6C 00 6E 00 6B 00 00 00 1A 00 15 00 00 00 02 00  l.n.k...........
    0080  00 00 5A 00 3A 00 42 06 00 00 10 3D 91 7C 20 00  ..Z.:.B....=.| .
    0090  4D 4F 5A 49 4C 4C 7E 31 2E 4C 4E 4B 00 00 3E 00  MOZILL~1.LNK..>.
    00A0  03 00 04 00 EF BE 10 3D 91 7C 10 3D 61 85 14 00  .......=.|.=a...
    00B0  00 00 4D 00 6F 00 7A 00 69 00 6C 00 6C 00 61 00  ..M.o.z.i.l.l.a.
    00C0  20 00 46 00 69 00 72 00 65 00 66 00 6F 00 78 00   .F.i.r.e.f.o.x.
    00D0  2E 00 6C 00 6E 00 6B 00 00 00 1C 00 41 01 00 00  ..l.n.k.....A...
    00E0  51 00 00 00 30 00 31 00 00 00 00 00 10 3D 2C 81  Q...0.1......=,.
    00F0  10 00 4D 49 52 00 1E 00 03 00 04 00 EF BE 10 3D  ..MIR..........=
    0100  B0 80 10 3D A7 8C 14 00 00 00 4D 00 49 00 52 00  ...=......M.I.R.
    0110  00 00 12 00 41 01 00 00 51 00 00 00 00 00 00 00  ....A...Q.......

The regions of the dump are:

- header/footer: the 16 zero bytes at 0x000-0x00F; the zero size field at 0x11C ends the list
- unknown padding (item position?): the 8 bytes before each item, at 0x010, 0x02C, 0x07A, 0x0DC, and 0x114
- invalid SHITEM_FILEENTRY: 0x018-0x02B (size 0x14)
- SHITEM_FILEENTRY: 0x034-0x079, 0x082-0x0DB, and 0x0E4-0x113
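The list layout described above can be walked with a few lines of Python. This is a minimal sketch, not the code from shellbags.py: the function name and its defaults are illustrative, and it assumes only what the text states (the list starts at 0x10, each item is preceded by 8 unknown bytes, the size field is a little-endian 16-bit word that counts itself, and a zero size ends the list).

```python
import struct


def iter_itempos_items(data, start=0x10, min_size=0x15):
    """Yield (item_offset, size, is_valid) for each entry in an ITEMPOS blob.

    item_offset points at the entry's size field, i.e. just past the
    8 unknown bytes that precede every entry.
    """
    offset = start
    # Need room for the 8 unknown bytes plus the 2-byte size field.
    while offset + 10 <= len(data):
        item_offset = offset + 8
        (size,) = struct.unpack_from("<H", data, item_offset)
        if size == 0:
            break  # terminator entry
        # Entries shorter than the minimum are reported but flagged invalid.
        yield item_offset, size, size >= min_size
        offset = item_offset + size
```

Run against the 0x120-byte value shown above, this walk visits entries at 0x18 (size 0x14, flagged invalid), 0x34, 0x82, and 0xE4, then stops at the zero size field at 0x11C.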
Taking the first valid entry from offset 0x34, let’s parse out the fields from the binary. The following block visually maps out the relevant bytes, while the table translates each field into a human readable value.
    0000  46 00 3A 00 02 02 00 00 10 3D 0C 8E 20 00 43 79  F.:......=.. .Cy
    0010  67 77 69 6E 2E 6C 6E 6B 00 00 2C 00 03 00 04 00  gwin.lnk..,.....
    0020  EF BE 10 3D 0C 8E 10 3D 0C 8E 14 00 00 00 43 00  ...=...=......C.
    0030  79 00 67 00 77 00 69 00 6E 00 2E 00 6C 00 6E 00  y.g.w.i.n...l.n.
    0040  6B 00 00 00 1A 00                                k.....

The relevant bytes, by offset within the entry, are:

- SHITEM_FILEENTRY size: 2 bytes at 0x00
- filesize: 4 bytes at 0x04
- timestamp: 4 bytes each at 0x08, 0x22, and 0x26
- filename: starting at 0x0E (8.3) and 0x2E (Unicode)
| Offset | Field | Value |
|---|---|---|
| 0x00 | SHITEM_FILEENTRY size | 0x46 |
| 0x04 | Filesize | 0x202 |
| 0x08 | Modified Date | August 16, 2010 at 17:48:24 |
| 0x0E | 8.3 Filename | Cygwin.lnk |
| 0x22 | Created Date | August 16, 2010 at 17:48:24 |
| 0x26 | Accessed Date | August 16, 2010 at 17:48:24 |
| 0x2E | Unicode Filename | Cygwin.lnk |
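The fixed-offset fields in the table can be decoded as follows. This is a hedged sketch rather than the implementation in shellbags.py: the function names and the returned dictionary are illustrative, the timestamps are assumed to be packed DOS date and time words, and only the fields up to the 8.3 filename are handled because the later offsets depend on the length of that name.

```python
import struct
from datetime import datetime


def dos_datetime(date_word, time_word):
    """Convert packed DOS date/time words to a datetime, or None if invalid."""
    try:
        return datetime(
            ((date_word >> 9) & 0x7F) + 1980,  # years since 1980
            (date_word >> 5) & 0x0F,           # month
            date_word & 0x1F,                  # day
            (time_word >> 11) & 0x1F,          # hour
            (time_word >> 5) & 0x3F,           # minute
            (time_word & 0x1F) * 2,            # seconds, 2-second resolution
        )
    except ValueError:
        return None  # e.g. an all-zero timestamp


def parse_fileentry_head(item):
    """Decode the leading fields of one SHITEM_FILEENTRY (bytes from its size field)."""
    size, _flags, filesize, date_w, time_w, attrs = struct.unpack_from("<HHIHHH", item, 0)
    end = item.find(b"\x00", 0x0E)  # the 8.3 name is NUL-terminated ASCII
    if end == -1:
        end = len(item)
    return {
        "size": size,
        "filesize": filesize,
        "modified": dos_datetime(date_w, time_w),
        "attributes": attrs,
        "short_name": item[0x0E:end].decode("ascii", errors="replace"),
    }
```

Applied to the entry at offset 0x34, this reproduces the first four rows of the table: size 0x46, filesize 0x202, a modified time of 2010-08-16 17:48:24, and the short name Cygwin.lnk.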
At this point, it is easy to write a parser that explores the FOLDERDATA keys under the Shell Registry key. For each FOLDERDATA, the parser might enumerate each ITEMPOS value and consider the binary blob. By applying the binary template above, the tool could identify filenames, MACB timestamps, and other metadata independent of the filesystem MFT. Unfortunately, we’re still missing a key piece of information: the full file path.
BagMRU tree
To recover file paths from Shellbags, we’ll need to consider the
Registry keys under BagMRU. The subkeys under Shell\BagMRU form a
recursive, tree-like structure that mirrors the file system on disk.
Shell\BagMRU is the root of the tree. Each subkey is a node
representing a folder and, like a folder, may contain child nodes.
Yet, unlike (most) folders, the nodes are named with increasing integers
from zero. For example, the branch Shell\BagMRU\0 might have the
children 0, 1, and 2.
All nodes in this tree have a value named MRUListEx, and many have a
value named NodeSlot. NodeSlot is what interests us, as it forms the
link between the filesystem tree structure and the FOLDERDATA keys. A
NodeSlot value has type REG_DWORD and should be interpreted as a
pointer to the FOLDERDATA key with the same name. For example, on my
workstation, the key Shell\BagMRU\1\1\3\0 has a NodeSlot value of
144. This means that the FOLDERDATA Shell\Bags\144\ corresponds to a
folder with a path of four components. What are they? The components are
described by the values at Shell\BagMRU\1, Shell\BagMRU\1\1,
Shell\BagMRU\1\1\3, and Shell\BagMRU\1\1\3\0.
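To make the NodeSlot relationship concrete, the sketch below expands a BagMRU node path into the places where each path component is described, and maps a NodeSlot number to its FOLDERDATA key. Both helper names are hypothetical. Each component is reported as a (parent key, value name) pair, on the assumption that the value describing a node is stored in its parent key under the node's own name.

```python
def component_locations(node_path, root="Shell\\BagMRU"):
    """List (parent_key, value_name) pairs, one per path component of a node."""
    if node_path != root and not node_path.startswith(root + "\\"):
        raise ValueError("not a BagMRU node: %r" % node_path)
    relative = node_path[len(root):].strip("\\")
    locations = []
    parent = root
    for name in (relative.split("\\") if relative else []):
        locations.append((parent, name))
        parent = parent + "\\" + name
    return locations


def folderdata_key(node_slot, bags_root="Shell\\Bags"):
    """Return the FOLDERDATA key that a NodeSlot value points to."""
    return "%s\\%d" % (bags_root, node_slot)
```

For the node Shell\BagMRU\1\1\3\0 this yields four pairs, one for each component of the folder's path, and folderdata_key(144) gives Shell\Bags\144.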
SHITEMLIST
In addition to the values MRUListEx and NodeSlot, nodes of the
Shell\BagMRU tree have one value for each subkey. The values have the
same name as the subkey; since the subkeys are named as increasing
integers, so are the values. Each value records metadata about the
filesystem path component associated with the subkey. The values have
type REG_BINARY and have an internal binary structure known as an
SHITEMLIST. An SHITEMLIST is formed by contiguous items terminated
by an empty item. Practically, though, the SHITEMLIST of a BagMRU node
will have two entries: a relevant entry, and the empty terminator item.
The first word of each SHITEM gives the item’s size.
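Splitting an SHITEMLIST into its items needs nothing more than that size word. The following is a minimal sketch under two assumptions: the size is a little-endian 16-bit value that includes the size field itself, and a malformed size simply ends the walk. The function name is illustrative.

```python
import struct


def iter_shitems(data):
    """Yield the raw bytes of each SHITEM in an SHITEMLIST blob."""
    offset = 0
    while offset + 2 <= len(data):
        (size,) = struct.unpack_from("<H", data, offset)
        if size == 0:
            break  # the empty terminator item
        if size < 2 or offset + size > len(data):
            break  # malformed or truncated item; stop rather than guess
        yield data[offset:offset + size]
        offset += size
```

For a typical BagMRU value this yields a single item, which can then be handed to a type-specific parser.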
Joachim’s paper on Windows shell items is the best resource for
understanding the variations among SHITEM entries. At a high level,
there are at least ten types of items that range from SHITEM_FILEENTRY
and SHITEM_FOLDERENTRY to SHITEM_CONTROLPANELENTRY. For each of
these types, we can extract at least a path component such as “My
Documents” or “\myserver”. Fortunately, most items have type
SHITEM_FOLDERENTRY, which provides additional metadata including MAC
timestamps. A small number of items do not conform to the known
structure, although these do not usually contain any human readable
strings or hints.
Putting it all together
With the SHITEMLIST structure in hand, we now have enough information
to comprehensively parse Windows shellbags. To do this, first recurse
down the Shell\BagMRU keys while building complete directory paths. At each
node, record any available metadata and look up the associated
FOLDERDATA. Recall that the FOLDERDATA may indicate some of the items
contained by the directory, so record this metadata, too. Finally,
format and enjoy!
The following code block lists the algorithm in a Pythonish language for the programmers in the room.
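A minimal sketch of that recursion is given here. It is not the code from shellbags.py: to stay self-contained it walks a mock tree of nested dictionaries (each node has "values" and "subkeys") instead of a real hive, and it takes a hypothetical `describe` callback that turns a node's SHITEMLIST blob into a path component.

```python
def walk_bagmru(node, bags, describe, prefix=""):
    """Yield (path, node_slot, folderdata) for every node below `node`.

    node     -- mock key: {"values": {...}, "subkeys": {...}} (assumed layout)
    bags     -- mapping of NodeSlot number to FOLDERDATA (mock of Shell\\Bags)
    describe -- hypothetical callback: SHITEMLIST bytes -> component string
    """
    children = sorted(node["subkeys"].items(), key=lambda kv: int(kv[0]))
    for name, child in children:
        # The value named like the subkey describes that child's component.
        blob = node["values"].get(name)
        component = describe(blob) if blob is not None else "?"
        path = prefix + "\\" + component if prefix else component
        # NodeSlot, when present, links the node to its FOLDERDATA key.
        slot = child["values"].get("NodeSlot")
        yield path, slot, bags.get(slot)
        yield from walk_bagmru(child, bags, describe, path)
```

A real tool would replace the dictionaries with a Registry-parsing library and `describe` with the SHITEM parsers, then format each result together with any ITEMPOS entries found in the FOLDERDATA.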
Shellbags.py
Using these concepts, I’ve implemented a cross-platform shellbag parser for Windows XP and later in the Python programming language. The code is freely available here, so all algorithms and structures are accessible to interested parties. I’ve licensed the code under the Apache 2.0 license, so please feel encouraged to take and improve the routines as you see fit. As a benchmark, shellbags.py tends to identify at least the items returned by the sbag utility, and in some cases returns more.
Shellbags.py accepts the
path to a raw Registry hive acquired forensically as a command line
argument. To ensure interoperability, output is formatted according to
the Bodyfile specification by default. The following block lists a
demonstration of me running shellbags.py against a Windows XP
NTUSER.dat Registry hive.
To improve readability, I ran the output through the mactime utility to generate a timeline of activity. The following block lists a portion of this sample.
Help
For reference, the following code block lists the command line parameters accepted by shellbags.py. Now get going and try it out!