Because writing computer programs is hard, computer programmers are
taught to use encapsulation and modularity to hide complexity and
reduce the potential for errors. Their programs will have a
high-level, hierarchical structure that reflects their choice of
internal abstractions. We designed and forged a system, Laika, that
detects this structure in memory using Bayesian unsupervised learning.
Because almost all programs use data structures, their memory images
consist of many copies of a relatively small number of templates.
Given a memory image, Laika can find both the data structures and
We then used Laika to detect three common polymorphic botnets by
comparing their data structures. Because it avoids their code
polymorphism entirely, Laika is extremely accurate. Finally, we argue
that writing a data structure polymorphic virus is likely to be
considerably harder than writing a code polymorphic virus.