Friday, April 3, 2015

Bin-tropy analysis to detect cryptomalware

Bin-tropy calculation:

Analyzing the contents of the files in each path or drive is one of the preliminary steps in detecting crypto-malware execution. A key difference between an encrypted file and a normal file is that the characters in an encrypted file are more random than would be expected in a normal file.

The binary entropy calculation uses a statistical test from the NIST test suite that is based on the Discrete Fourier Transform (DFT) of the file's bit sequence.

The steps involved are:

Take the binary sequence of the file content to be analysed. Convert each 0 and 1 in the sequence to -1 and +1 respectively. For example, Seq = 10110101 becomes X = 1, -1, 1, 1, -1, 1, -1, 1.
Apply the discrete Fourier transform (DFT) to the sequence. The DFT decomposes the sequence into sinusoidal components at different frequencies, which reveals any periodic repetition in the input data, in this case periodic components of the bit sequence.
Calculate the modulus of the first half of the DFT output (the first n/2 values for an n-bit sequence). This gives the sequence of peak heights.
Compute the threshold peak height (the 95% peak-height value) for a sequence of n bits: T = √(ln(1/0.05) · n) ≈ √(2.996 n).
Under the assumption of randomness, 95% of the peak heights obtained from the sequence should be less than this threshold value.
To compare the theoretical and actual numbers of peaks below the threshold, compute N0 = 0.95 · (n / 2), the expected number of peaks with heights less than T, and N1, the actual observed number of peaks with heights less than T.
Find d, the normalized difference between the observed and expected numbers of peaks below the 95% threshold: d = (N1 - N0) / √(n · 0.95 · 0.05 / 4).
Compute the complementary error function value: E = erfc(|d| / √2).
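The steps above can be sketched in Python as follows. This is a minimal, dependency-free sketch of the NIST DFT (spectral) test, not the project's own code; the function names are hypothetical, and the naive DFT used here is only practical for short sequences (a real tool would use an FFT library).

```python
import cmath
import math
from typing import List, Tuple


def to_plus_minus(bits: List[int]) -> List[int]:
    """Step 1: map each bit 0 -> -1 and 1 -> +1."""
    return [2 * b - 1 for b in bits]


def peak_heights(x: List[int]) -> List[float]:
    """Steps 2-3: moduli of the first n/2 DFT coefficients.

    A naive O(n^2) DFT keeps this sketch free of dependencies; for real
    files an FFT (e.g. numpy.fft.fft) would be used instead.
    """
    n = len(x)
    heights = []
    for k in range(n // 2):
        s = sum(x[i] * cmath.exp(-2j * math.pi * i * k / n) for i in range(n))
        heights.append(abs(s))
    return heights


def spectral_test(bits: List[int]) -> Tuple[float, float]:
    """Steps 4-8: return (E, d) for the DFT (spectral) test."""
    n = len(bits)
    m = peak_heights(to_plus_minus(bits))
    t = math.sqrt(math.log(1 / 0.05) * n)   # 95% peak-height threshold T
    n0 = 0.95 * n / 2                       # expected peaks below T
    n1 = sum(1 for h in m if h < t)         # observed peaks below T
    d = (n1 - n0) / math.sqrt(n * 0.95 * 0.05 / 4)
    e = math.erfc(abs(d) / math.sqrt(2))    # P-value E
    return e, d


# A strictly alternating sequence has no energy in the measured half of the
# spectrum, so every peak falls below T, d is large and E is tiny.
e, d = spectral_test([0, 1] * 500)
print(e < 0.01, d > 0)  # True True
```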

If the computed E value is 0.01 or greater, conclude that the input sequence is random (encrypted); otherwise, it is a non-random sequence (normal file).
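To apply the rule to a file, its bytes first have to be expanded into bits. The sketch below assumes most-significant-bit-first order (the text does not specify one) and uses hypothetical helper names; with that order, the byte 0xB5 expands to the example sequence 10110101.

```python
from typing import List

ALPHA = 0.01  # significance level used in the decision rule


def bytes_to_bits(data: bytes) -> List[int]:
    """Expand bytes into bits, most significant bit first (an assumption)."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def classify(e: float, alpha: float = ALPHA) -> str:
    """Decision rule: E >= alpha means the sequence looks random."""
    return "random (encrypted)" if e >= alpha else "non-random (normal)"


print(bytes_to_bits(b"\xb5"))  # [1, 0, 1, 1, 0, 1, 0, 1]
print(classify(0.005))         # non-random (normal)
```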

A d value that is too low means that there are too few peaks (fewer than 95%) below T and too many peaks (more than 5%) above T.
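As a worked illustration, take a sequence of n = 1000 bits, with N_0 the expected and N_1 the observed number of peaks below T. The observed count N_1 = 465 is hypothetical, chosen only to show the arithmetic:

```latex
\begin{aligned}
T &= \sqrt{\ln(1/0.05)\cdot 1000} \approx 54.73, \qquad N_0 = 0.95\cdot\tfrac{1000}{2} = 475,\\
d &= \frac{N_1 - N_0}{\sqrt{1000\cdot 0.95\cdot 0.05/4}} = \frac{465 - 475}{\sqrt{11.875}} \approx \frac{-10}{3.446} \approx -2.90,\\
E &= \operatorname{erfc}\!\left(\frac{|d|}{\sqrt{2}}\right) \approx \operatorname{erfc}(2.05) \approx 0.0037 < 0.01.
\end{aligned}
```

Having ten fewer peaks below T than expected is therefore enough to classify this sequence as non-random.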

Limitations of the bin-entropy detection method:

The method is not reliable for very small files or for files encrypted by the user.

For example, consider a .txt file containing “SSN : 0123456789”.
The randomness test fails here (E is above the 0.01 threshold) because, among the 14 non-space characters, all except “S” are unique, so the content looks random. Even though it is valid text, its value would be above the threshold and it would be classified as encrypted.
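A quick check of this example, assuming the file is stored as plain ASCII: the whole string is only 128 bits long, well below the minimum of 1000 bits that NIST SP 800-22 recommends for the DFT test, which helps explain why the result is unreliable. The helper name is hypothetical.

```python
MIN_BITS = 1000  # minimum sequence length NIST SP 800-22 recommends for the DFT test


def too_short_for_dft_test(data: bytes, min_bits: int = MIN_BITS) -> bool:
    """True if the file has fewer bits than the recommended minimum."""
    return len(data) * 8 < min_bits


sample = "SSN : 0123456789".encode("ascii")
non_space = sample.replace(b" ", b"")
print(len(sample) * 8)                      # 128
print(len(non_space), len(set(non_space)))  # 14 13
print(too_short_for_dft_test(sample))       # True
```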
For user-encrypted files, the entropy is already high, so if malware starts encrypting the same file again, the script cannot differentiate between “legitimate user encryption” and “unauthorized encryption” and would therefore not be effective.

Source: Python source code for the implementation can be found at https://github.com/EC700/Charlie-2/tree/master/Entropy

References: The binary entropy calculation is based on ‘A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications’ (NIST SP 800-22 Rev. 1a),
published by the National Institute of Standards and Technology, U.S. Department of Commerce.
Source: http://csrc.nist.gov/groups/ST/toolkit/rng/documents/SP800-22rev1a.pdf

Malware File Detection

Packed PE file detection:

Advanced malware evades common detection methods and software through polymorphism or obfuscation. Malware can be transferred to and executed on different victims as packed executables, which avoid signature-based detection. We can detect packed files by extracting specific characteristics of packing and calculating the entropy of the entry-point section.

Entropy is a measure of the unpredictability, or randomness, of an information stream.

PEiD: the most common tool for detecting packed or encrypted PE malware through signature detection.
MRC: structured analysis based on file entropy. If encryption or suspicious packing is detected, a weight value is added so that the entry's score becomes high.

Recent malware executables are packed to avoid detection and to propagate quickly. A packed PE file is compressed and encrypted, so the data inside the pack is random, which can be detected with a byte entropy calculation. The original executable code is compressed in the “Packed Data” section of the image and has to be unpacked before binary analysis.

Byte entropy values range from 0 to 8 bits per byte. Specific file types tend to have entropy values within narrower bands of that range.
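The byte entropy referred to here is usually the Shannon entropy of the distribution of byte values, which is what bounds it between 0 and 8. A minimal sketch, assuming that definition (the function name is hypothetical):

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    # sum of p * log2(1/p) over every byte value that occurs
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


print(byte_entropy(b"AAAAAAAA"))        # 0.0: one repeated byte, fully predictable
print(byte_entropy(bytes(range(256))))  # 8.0: every byte value equally likely
```

Plain text typically lands well below the maximum because only a small alphabet of byte values appears, while compressed or encrypted data approaches 8.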

Two methods are used to differentiate a normal file from a packed file:

entropy calculation of the whole file, and entropy calculation of the entry-point section.

Calculating the entropy of the entry-point section is the better option.

An essential element of a packed PE file is the IMAGE_SECTION_HEADER. One important feature is that a packed file can execute only when the WRITE property (IMAGE_SCN_MEM_WRITE) is included in its section header, so this is one of the main points checked when detecting packing.

Packed PE files can be detected with two techniques: entropy-based detection and characteristic-based (behaviour-based) detection.

Entropy-based detection: based on previous data sets, a packed PE file has an entropy greater than 6.85 in the entry-point section.

Characteristic-based detection: normal and packed PE files have different characteristics, and different actions are taken when a packed file is unpacked. In packed executables, the WRITE property is required to unpack and execute the code. Therefore, a packed PE file can be identified by checking whether its entry-point section includes the WRITE property.

In a normal file, the EXECUTE, READ, CODE or DATA properties commonly occur in the entry-point section.
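These properties are bit flags in the Characteristics field of each section header. The flag values below come from the Microsoft PE/COFF specification; the helper names are hypothetical, and 0x60000020 is used only as an illustration of a typical code section.

```python
# Section characteristic flags (values from the PE/COFF specification)
SECTION_FLAGS = {
    0x00000020: "CODE",     # IMAGE_SCN_CNT_CODE
    0x00000040: "DATA",     # IMAGE_SCN_CNT_INITIALIZED_DATA
    0x20000000: "EXECUTE",  # IMAGE_SCN_MEM_EXECUTE
    0x40000000: "READ",     # IMAGE_SCN_MEM_READ
    0x80000000: "WRITE",    # IMAGE_SCN_MEM_WRITE
}


def decode_characteristics(flags: int) -> list:
    """List the names of the flags set in a section's Characteristics field."""
    return [name for bit, name in SECTION_FLAGS.items() if flags & bit]


def is_writable(flags: int) -> bool:
    """True if the section carries the WRITE property that unpacking needs."""
    return bool(flags & 0x80000000)


print(decode_characteristics(0x60000020))  # ['CODE', 'EXECUTE', 'READ']
print(decode_characteristics(0xE0000020))  # ['CODE', 'EXECUTE', 'READ', 'WRITE']
```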

Packed PE detection flow chart (faster and more efficient detection): the detector receives the whole drive and sequentially checks each file on the disk for a PE signature by looking for the “MZ” file signature.

If it is a PE file, find the entry-point section. Check whether that section includes the WRITE property and, if so, calculate its entropy to see whether it exceeds the threshold for a normal PE file (6.85).

This method gives a better detection rate and detection time on a huge dataset or a whole hard drive.
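A minimal sketch of this flow using only the Python standard library. It assumes a well-formed PE layout as described in the PE/COFF specification: the e_lfanew pointer at offset 0x3C, AddressOfEntryPoint 16 bytes into the optional header, and 40-byte section headers. The function names are hypothetical, and a production tool would more likely rely on a dedicated PE parser.

```python
import math
import os
import struct
from collections import Counter

IMAGE_SCN_MEM_WRITE = 0x80000000  # section characteristic: writable
ENTROPY_THRESHOLD = 6.85          # entry-section threshold quoted in the text


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def entry_section(pe: bytes):
    """Return (characteristics, raw bytes) of the entry-point section, or None."""
    if len(pe) < 0x40 or pe[:2] != b"MZ":
        return None                                       # no "MZ" signature
    try:
        (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)  # offset of "PE\0\0"
        if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
            return None
        # COFF header: NumberOfSections at +6, SizeOfOptionalHeader at +20
        num_sections, opt_size = struct.unpack_from("<H12xH", pe, e_lfanew + 6)
        # AddressOfEntryPoint sits 16 bytes into the optional header
        (entry_rva,) = struct.unpack_from("<I", pe, e_lfanew + 24 + 16)
        table = e_lfanew + 24 + opt_size                  # first IMAGE_SECTION_HEADER
        for i in range(num_sections):
            (_name, vsize, vaddr, raw_size, raw_ptr,
             _r1, _r2, _r3, _r4, flags) = struct.unpack_from(
                "<8sIIIIIIHHI", pe, table + 40 * i)
            if vaddr <= entry_rva < vaddr + max(vsize, raw_size):
                return flags, pe[raw_ptr:raw_ptr + raw_size]
    except struct.error:
        pass                                              # truncated or malformed header
    return None


def is_packed(pe: bytes) -> bool:
    """WRITE on the entry-point section plus entropy above the threshold."""
    section = entry_section(pe)
    if section is None:
        return False
    flags, raw = section
    if not flags & IMAGE_SCN_MEM_WRITE:
        return False                                      # typical of a normal file
    return byte_entropy(raw) > ENTROPY_THRESHOLD


def scan(root: str):
    """Walk a directory tree and yield paths of files flagged as packed PE."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    if f.read(2) != b"MZ":
                        continue                          # not a PE candidate
                    data = b"MZ" + f.read()
            except OSError:
                continue                                  # unreadable file
            if is_packed(data):
                yield path
```

For example, iterating over `scan("/mnt/drive")` (a hypothetical mount point) would list the files that meet both conditions. Checking the WRITE flag first keeps the scan fast, because the entropy is only computed for the few files whose entry-point section is writable.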

Implementation of crypto-malware detection using file entropy changes in the project

In the project, one of the pre-detection methods is detection by file entropy changes. The steps involved are:

Calculating the binary entropy of the files (.txt, .doc, .pdf) in the specified path or drive
Detecting changes in the ASCII percentage of the files.
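The text does not define “ASCII percent”. Assuming it means the share of printable ASCII bytes (plus tab, line feed and carriage return) in a file, the change between two snapshots of a file could be measured as below. The definition and the helper names are hypothetical.

```python
# Assumed definition of "ASCII": printable 0x20-0x7E plus tab, LF and CR
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def ascii_percent(data: bytes) -> float:
    """Percentage of bytes in the file that are printable ASCII."""
    if not data:
        return 0.0
    return 100.0 * sum(1 for b in data if b in PRINTABLE) / len(data)


def ascii_percent_change(before: bytes, after: bytes) -> float:
    """Change in percentage points between two snapshots of the same file."""
    return ascii_percent(after) - ascii_percent(before)


# A text file overwritten with uniformly distributed bytes loses most of its ASCII share.
print(ascii_percent_change(b"hello world", bytes(range(256))))  # -61.71875
```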

The process involves detecting repetitive patterns in the binary sequence that would indicate a deviation from the assumption of randomness.

The implementation of the binary entropy analysis to detect cryptomalware execution in real time is here.

Sources:

http://www.forensickb.com/2013/03/file-entropy-explained.html

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.120.9861&rep=rep1&type=pdf

http://csrc.nist.gov/groups/ST/toolkit/rng/documents/SP800-22rev1a.pdf

Every code is breakable!!!