Malware Analysis for Hedgehogs

Lecture 1 Intro

Analysis Process:

Triage: make assumptions -> select tools
Analysis: analysis -> make assumptions -> new analysis -> get facts
Report: write down facts -> verdict

lots of guessing

Lecture 2 VMs

Environment: classic VM installation. The interesting parts are number 7 (shared folder scripts), 8 (secure folders) and 9 (secure network). Jigsaw detonation. Pretty cool. Also the files contain a lot of additional info

Lecture 3 Triage

Decide what to tackle first (parable of the blind men and the elephant)

Get Overview -> Determine tools and step -> Choose samples -> Discover low hanging fruits

Triage Steps:

File type: TrID and hex editor. DetectItEasy in case of PE
- Polyglots: files that have more than one type
- Shift+RightClick -> Powershell in the context menu
Whole file examination: strings (strings.exe by Sysinternals), visualization, hex editor execution, ability to embed files, icons
Metadata viewer
- use specific parser for the filetype
- read specifications. Look for: magic bytes, ability for code
Automatic reports: sandboxes and antivirus
- Malware name components: Type|Platform|Family|Variant (signature or ID)|Modifier
- Defaults: Type=Trojan, Family=Agent (These do not mean anything)
- Be careful of specific names (small variant, concrete type, no defaults) vs unspecific names (long variant; use of "gen", "susp", "heur" [and other scoring types], "ml", "AI"; names of detection technologies; defaults). List of detection techs: Kazy, Razy, Zusy, Raftor, WisdomEyes, Artemis (McAfee only)
- Keywords in families(table file)
- CARO naming conventions
- Use malpedia when you get the name
First research: internet info on what you have
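The file type step above can be approximated in a few lines when TrID is not at hand. This is a minimal sketch, assuming a tiny hand-picked signature table (far from exhaustive) and a simple polyglot hint: a known header at offset 0 plus a ZIP end-of-central-directory record near the end of the file. The function names are made up for illustration.

```python
# Illustrative signature table, not exhaustive. All magics are checked at offset 0.
MAGIC_SIGNATURES = {
    b"MZ": "PE/DOS executable",
    b"\x7fELF": "ELF executable",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive (also DOCX, JAR, APK)",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound file (old Office, MSI)",
}

ZIP_EOCD = b"PK\x05\x06"          # ZIP end-of-central-directory signature
MAX_EOCD_DISTANCE = 22 + 65535    # EOCD record size plus maximum comment length


def guess_file_types(data):
    """Return the names of all signatures that match at offset 0."""
    return [name for magic, name in MAGIC_SIGNATURES.items()
            if data.startswith(magic)]


def has_trailing_zip(data):
    """ZIP readers locate the archive from the end, so look there."""
    return ZIP_EOCD in data[-MAX_EOCD_DISTANCE:]


def polyglot_hint(data):
    """True if the header says 'not ZIP' but a ZIP directory sits at the end."""
    head = guess_file_types(data)
    is_zip_head = any(name.startswith("ZIP") for name in head)
    return bool(head) and has_trailing_zip(data) and not is_zip_head
```

A hit from `polyglot_hint` is only a reason to look closer with a hex editor, not a verdict; short magics such as "MZ" match by accident fairly often.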

Quick Analysis

“&” separates commands on one line in cmd.exe, so read it like a new line

Lecture 4 Wrappers

You want the code that does the things, but usually the malware is packed and wrapped in other files. What you’ll find can be

The actual malware
Init code
Environments (wrappers or installers)
Static linked libs

Wrappers:

Interpreters for the interpreted script. In a binary this results in a three-part file (unpacker/runner, environment, script proper)
ex. Launch4J, Bat2Exe, PyInstaller

How to Unwrap?

Dynamically: usually unpackers drop the real script in %TEMP%, so monitor file writing -> run exe -> copy the files written
Statically: search in hexdump and search with strings.exe (you can find the region with magic numbers)
Statically (Encrypted): find the wrapper with DiE and search the extraction tool to get the code
Use procmon and apimonitor
Use deny deletion on folders to make the files not removable
Turn off clickable links in Notepad++
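For the static route above (finding the embedded region by its magic numbers), a small scanner can list candidate offsets inside a wrapper. This is a sketch under assumptions: the magic table is a short illustrative selection, and the PE check (MZ plus a valid pointer at 0x3C to "PE\0\0") is only there to cut down on accidental "MZ" hits.

```python
import struct

# Illustrative selection of magics to look for inside a wrapper binary.
EMBEDDED_MAGICS = {
    "zip": b"PK\x03\x04",
    "pe": b"MZ",
    "pdf": b"%PDF",
    "7z": b"7z\xbc\xaf\x27\x1c",
}


def is_pe_at(data, off):
    """Check that 'MZ' at off is followed by a pointer to a 'PE\\0\\0' signature."""
    if data[off:off + 2] != b"MZ" or off + 0x40 > len(data):
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, off + 0x3C)
    return data[off + e_lfanew:off + e_lfanew + 4] == b"PE\x00\x00"


def find_embedded(data, start=1):
    """Return sorted (offset, kind) pairs for magics found at or after start."""
    hits = []
    for kind, magic in EMBEDDED_MAGICS.items():
        pos = data.find(magic, start)
        while pos != -1:
            if kind != "pe" or is_pe_at(data, pos):
                hits.append((pos, kind))
            pos = data.find(magic, pos + 1)
    return sorted(hits)
```

Each reported offset is a place to start carving (`data[offset:]`) and then re-run file type identification on the carved piece.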

Installers: - builders with proprietary script - ex NSIS, Inno

How to obtain the binaries inside?
- Unarchivers (7z), extractors found online (use DiE); 7-Zip 15.05 has NSIS extraction compatibility
- Dynamic: monitor the file writes and grab the files
- Analyze installer code (try to use the installer yourself to understand better)

renamer: mass renamer
you have to learn the installer's scripting language

Lecture 5 ASEP

Auto Start Extensibility Points Types:

System persistence (provided by windows)
- Run, RunOnce, RunOnceEx
- Startup folder (path)
- Scheduled tasks (path)
- Services (SCM)
Program Loader Abuse (exploit windows loader)
- Image file execution option
- extension hijacking
- shortcut manipulation
- COM hijacking
- SHIM databases (path)
Application Abuse (exploit plugins)
- Trojanized system binaries
- Office add-ins
- Browser Helper Objects (BHO) [no longer used]
System Behavior (exploit windows)
- WinLogon (change file manager, change notification package, change userinit)
- DLL hijacking
- AppInit DLLs [no longer used]
- Active Setup

Examination Tools:
- Sysinternals Autoruns
- WineSap
- Farbar Recovery Scan Tool

Windows Registry: where the malware goes to get permission and persistence
- Value data types (might be useful): REG_SZ (path/name strings), REG_DWORD (32-bit number), REG_BINARY (can store whole files)
- 9 root keys HK– and 7 hives. Hive: set of keys with files associated

Tools:
- sc.exe: create services
- services.msc: see active services (you have to close it before removing a service from the registry)
- autoruns64.exe: check autoruns (show hidden). Be careful: it is better to delete entries from regmon than from Autoruns (it might not work or break things)

In the tool there are various techniques. Check lec 5 lab 6

Lecture 6 PE & .NET

In the PE header things are in little endian, except MSDOS header
Important offset: 0x3C -> pointer to PE header
After the sections you have the “overlay”, aka data appended to the file that is not covered by any section and not part of the PE specification
Pipeline to find things: 0x00 (MZ) -> 0x3C points to PE header -> +4 bytes you find the COFF header -> +20 bytes you find the Optional Header -> PE header +24+SizeOfOptionalHeader you find the section table -> 40 bytes per entry, N entries (N = number of sections), each pointing to its section data -> after the last section's data comes the overlay
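The pipeline above maps directly onto a few `struct` reads. A minimal sketch, assuming a well-formed file and no bounds checking beyond the two signatures; the function name and the returned dictionary keys are invented for illustration.

```python
import struct


def walk_pe(data):
    """Follow MZ -> 0x3C -> PE -> COFF -> optional header -> section table."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)   # little endian pointer
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")

    coff_off = pe_off + 4
    # COFF header is 20 bytes: machine, section count, timestamp, symbol table
    # pointer, symbol count, optional header size, characteristics.
    (machine, n_sections, _ts, _symtab, _nsyms,
     opt_size, _chars) = struct.unpack_from("<HHIIIHH", data, coff_off)

    opt_off = coff_off + 20
    sect_off = opt_off + opt_size                      # = pe_off + 24 + opt_size

    sections = []
    overlay_off = 0
    for i in range(n_sections):
        entry = sect_off + 40 * i                      # 40 bytes per entry
        name = data[entry:entry + 8].rstrip(b"\x00").decode("ascii", "replace")
        _vsize, _vaddr, raw_size, raw_ptr = struct.unpack_from("<IIII", data, entry + 8)
        sections.append((name, raw_ptr, raw_size))
        overlay_off = max(overlay_off, raw_ptr + raw_size)

    return {
        "pe_offset": pe_off,
        "machine": machine,
        "section_table_offset": sect_off,
        "sections": sections,
        "overlay_offset": overlay_off,   # overlay exists if this is < len(data)
    }
```

Comparing `overlay_offset` with `len(data)` is a quick way to tell whether there is an overlay worth dumping.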

Lecture 7 Analysis

Types:

Static
Dynamic
Meta inspection (aka basic)
Code inspection (aka advanced)
Can be combined (matrix 2x2)

When to use?

SM -> Triage
DM -> Triage and main starting point
SC -> Main analysis
DC -> Main analysis (aid for SC)

Verdicts: describe the analysis in a nutshell, basically a summary. Important to consider: general summary, possible danger, relevance, classifiability. Examples of verdict names: Malware, Riskware (hacking tools), Grayware (bad things, not malware), PUP, Corrupted, Clean

Clean or Malware?

Trojanized software
Packed programs that do not show anything (not explained in this lecture)
Grayware like cracks

It’s hard when the code you are analyzing is clean: you have to prove absence (know when to stop, set a timer) -> Check whether the metadata makes sense (like the name, certificates and so on) -> Check how widespread and how old the file is (VirusTotal) -> Check entry points to see if something else is doing the bad things

Diff binaries tools:

VBinDiff = very simple, good for small differences
Meld = for text comparison, good with decompiled code
BinDiff = for disassemblers (use BinExport with Ghidra)
PortexAnalyzer

Find certificates (use Analyzepesig)

Bytes after signature should be 0
Bytes after PKCS7 should be 0

Signature verification

There is an area in the file that holds the file’s calculated hash (PKCS)
There are areas excluded from the calculation (signature itself and padding, checksum and pointer to hash)
Data can be hidden in the digital signature
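The excluded areas listed above can be located from the PE header. The sketch below is a simplification, not a signature verifier: it assumes the standard optional header layout (checksum at offset 64, certificate table directory entry at offset 128 for PE32 and 144 for PE32+) and simply hashes everything outside those ranges, which leaves out the finer rules of real Authenticode hashing. Function names are illustrative.

```python
import hashlib
import struct


def authenticode_excluded_ranges(data):
    """Return (offset, size) pairs that are left out of the signed hash."""
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    opt_off = pe_off + 24
    (magic,) = struct.unpack_from("<H", data, opt_off)
    # Data directory entry 4 is the certificate table; 0x20B means PE32+.
    dir_off = opt_off + (144 if magic == 0x20B else 128)
    cert_ptr, cert_size = struct.unpack_from("<II", data, dir_off)

    ranges = [
        (opt_off + 64, 4),   # CheckSum field
        (dir_off, 8),        # pointer to (and size of) the certificate table
    ]
    if cert_size:
        ranges.append((cert_ptr, cert_size))   # the signature blob itself
    return ranges


def hash_without_ranges(data, ranges, algo="sha256"):
    """Hash the file while skipping the given ranges (simplified)."""
    digest = hashlib.new(algo)
    pos = 0
    for off, size in sorted(ranges):
        digest.update(data[pos:off])
        pos = off + size
    digest.update(data[pos:])
    return digest.hexdigest()
```

Because the certificate area is excluded, bytes changed or appended inside it do not alter this digest, which is exactly why data can hide there.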

Lecture 8 Reports

Not a fixed way to do it

Situation 1, you are in antivirus company

Hash
Submitter
Date
Reason for submission
Additional info
Description of what the file is doing
AV detection before and after
Verdict

Situation 2, blog

Tell a story
Technical details (infection vector, persistence, evasion techniques, idiosyncrasies, communication, potential damage)
Classification and type
Protection opportunities
IOC (hashes, filenames, C2 servers, URLs)

How to classify Malware (Type, Subtype, Family, Subfamily, Variant)

Types by propagation:

Virus: file infector
Worm: self-replicating
Other (Trojan): not self-replicating
Peter Szor infection strategies

Types by payload:

Ransomware: file, screen
Backdoor: RAT, webshell
Stealer: credential, cookies
Dropper (has other malware inside it)
Downloader (downloads malware)
Loader (loads malware without dropping it in disk)

Families:

Start from detection names
Use malpedia
Look for aliases
Search unique strings
Binary diff and code overlaps

Analysis Notes:

Hash
File Type
Malware Type
Malware Family
Communication
Persistence
Main Behavior

Cyberchef: the way to download/load/drop malware from other malware

Exiftool for images

Lecture 9 Ghidra

Symbol Tree

Import: thing you can see with a PE viewer
Export: all the entry points: exported functions and the PE entry point (called “entry”)
Labels: similar to functions but for data (ex structs)
Classes: for C++ classes (care for name mangling)
Namespaces: to avoid conflicts

Data Type Manager

Majority of types are already guessed by Ghidra
You should add yours here

How to find main:

MinGW: “mainSOMETHING” (mainCRTStartup), entry -> __tmainCRTStartup -> main
Visual Studio: entry -> common main -> scrt common main -> invoke_main

If not debug, every main function returns success or failure (0 or 1 as int). So go to the end and check what is returned (or passed to exit) and trace it back

In Ghidra, something like iVar1 = (int) (iVar2 & 0xffffffff) is used to convert from 64 to 32 bits
Always triage first. Check if it is interpreted or similar
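The 64-to-32-bit mask noted above can be reproduced outside the decompiler to check what value a register would hold afterwards. A small sketch; the function name is made up, and the sign handling assumes the target type is a signed 32-bit int, as in the `(int)` cast.

```python
def truncate_to_int32(value):
    """Keep the low 32 bits and reinterpret them as a signed 32-bit int."""
    low = value & 0xFFFFFFFF
    # If bit 31 is set, the signed interpretation is negative.
    return low - 0x1_0000_0000 if low & 0x8000_0000 else low
```

For example, `truncate_to_int32(0x1_0000_0005)` gives 5, because everything above bit 31 is dropped.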

Lecture 10 x64DBG

Memory breakpoints are implemented with page guards: the breakpoint covers the whole page

ASLR: moving from Ghidra to x64dbg. ASLR means that things are not loaded at the image base. Ways to make it work:

Rebasing: change the base address in Ghidra (memory map -> “house” icon -> image base address)
Patching the sample: clear the ASLR flag in the PE header (field name “DllCharacteristics”)
Turn off exploit protection :)
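The patching option above comes down to clearing one bit. A minimal sketch, assuming the standard layout where DllCharacteristics sits 70 bytes into the optional header and the dynamic base (ASLR) flag is 0x0040; the function name is illustrative and the header checksum is deliberately left untouched.

```python
import struct

DYNAMIC_BASE = 0x0040   # IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE


def clear_aslr_flag(data):
    """Return a copy of the PE file with the dynamic base flag cleared."""
    patched = bytearray(data)
    (pe_off,) = struct.unpack_from("<I", patched, 0x3C)
    if patched[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # PE signature (4) + COFF header (20) + offset 70 in the optional header.
    field_off = pe_off + 24 + 70
    (flags,) = struct.unpack_from("<H", patched, field_off)
    struct.pack_into("<H", patched, field_off, flags & ~DYNAMIC_BASE)
    return bytes(patched)
```

Always work on a copy of the sample and keep the original hash in the notes, since the patched file no longer matches it.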

nibble = 4 bits

Lecture 11 Legion Ransomware

Simulation top to bottom analysis (understand the ransomware and decrypt data)

First Part

Check the ransomware note -> an email, research that
Check the encrypted file -> the beginning seems to have a signature, also the end (patterns) -> also patterns in the naming scheme
DiE on the ransomware
Strings on ransomware -> something that is related to what we already know -> “PASSWORD_MARKER”, a C2 link, the API
Overlay -> Image Homework: Triage, VirusTotal, google strings
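The strings step above is easy to script when the output needs filtering or offsets. This is a rough stand-in for strings.exe, assuming only printable ASCII and UTF-16LE text matter; the function name and the minimum length default are choices made for this sketch.

```python
import re


def extract_strings(data, min_len=4):
    """Return sorted (offset, text) pairs for ASCII and UTF-16LE strings."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Wide strings: printable ASCII characters, each followed by a zero byte.
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = [(m.start(), m.group().decode("ascii"))
             for m in ascii_re.finditer(data)]
    found += [(m.start(), m.group().decode("utf-16-le"))
              for m in wide_re.finditer(data)]
    return sorted(found)
```

With the offsets in hand, anything that ties back to what is already known (a marker like “PASSWORD_MARKER”, a URL, API names) can be looked up directly in the hex editor or in Ghidra.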

Second part

Open Ghidra
Find the main (classic entry and follow the returns)
Use API calls to build a mental picture of the flow -> use loops the same way (in the graph, a loop is an arrow that goes up)
Apply names and rename (markup), but only on the important parts (in our case the encryption func)
The time checks are probably there to check for the demo

Third part

Find the encryption, quick way: identify API useful, identify loop, analyze the loop OR use API calls that do encryption or “write” calls
Check XREF and find the function we care about
Found the XOR and the encryption loop
Add the functions you renamed to a namespace
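Once the XOR loop is understood, it can be replayed outside the sample to test a decryption idea. The notes only record that an XOR loop was found, so the repeating-key scheme and the key itself are assumptions here; the real key and any extra steps have to come from the disassembly.

```python
from itertools import cycle


def xor_with_key(data, key):
    """XOR every byte of data with a repeating key (hypothetical scheme)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))
```

XOR is its own inverse, so the same function both encrypts and decrypts; running it over the known header signature of an encrypted file is a quick sanity check for a candidate key.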

Fourth part

Patch the ransomware so it always runs (not only on some days). You know how to do it brother
Run some monitoring tools (procmon, procexp) to monitor what is happening

Lecture 12 Packing

Types:

Compressor: shrink the code (UPX)
Crypter: evade AV
Protector: prevent RE. Sum of the 2

The stub might be given as input for some packers. Stubs can be different each time (especially for crypters)

How are they run?

Their own process
Inject in other processes (RunPE)

Scantime crypter = dropper builder (writes the file to disk and runs it, as a dropper)

Target Location: how does the packer find the encrypted file?

Start and end markers
Fixed locations (or using the PE file header). For example just append the file after the stub (called EOF)
Fix the section table and add a last section
Use resources
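For the marker-based variant above, static extraction is little more than two searches. A sketch with invented names: the marker byte strings are placeholders, and the real ones have to be read out of the stub.

```python
def extract_between_markers(data, start_marker, end_marker):
    """Return the bytes between the first start marker and the next end marker."""
    start = data.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # payload begins right after the marker
    end = data.find(end_marker, start)
    if end == -1:
        return None
    return data[start:end]
```

If the stub also contains the marker as a plain string constant, the first hit may be that constant rather than the payload; searching from the end of the stub's last section avoids this.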

Unpacking Methods

Debugger + BP: manual, run sample and bp on functions
Run and Dump: easy, tools available
Static unpacking: write a script/cyberchef to extract
Emulation: classic emulation
Self extraction: change the code to write somewhere the unpacked file. Usually for scripts

Unpacking Stubs

Own process unpackers
- 1.1. Empty section on disk (not existent), but once loaded in memory the section exists and is filled with 0s. You can spot this type of packer from the PE header. Uses a tail jump: jump to OEP
- 1.2. Creates a brand-new section in memory
Process injection unpacking (hollowing/RunPE)
- 2.1. Standard: same as before, but injects the unpacked code into another process. First it suspends the other process, creates a new section, extracts and sets the entry point to the target data. Finally it resumes the other process
- 2.2. Hollowing: same as before, but unmaps the whole image of the other process first, then allocates the memory and goes on as above
Hybrid

Unpacking steps

Create target location (create/find other process, create/modify new section)
Prepare location (permissions, like write)
Write the data
Prepare execution (ex. activate the process)
Execution

Unpacking WinUpack

CFF explorer “DLLs can move” - ASLR removed
fix x64dbg when it changes the code: go to the changed opcode: Analysis -> treat from selection as -> Byte. Then go to entry point line -> mark the bytes seen as data -> Analysis -> treat from selection as code

DiE
Check PE header
Find the OEP:
- 3.1. check memory map, use “Trace over” and set break condition “cip>= 401000 && cip<=40b00” and increase trace maximum steps
- 3.2. check the saved state (pushad) done before unpacking. After pushad, follow esp and set a memory bp on that (if there is a popad around, then it’s good) -> find the tail jmp
Dump (with Scylla). Rebuild imports

A generic approach

Identify
- if virtualizer (you’ll find them with strings): fuck
Skim for obvious encryption (big base64 strings, XOR areas, large int arrays) in interesting areas (EOF, overlay, last section, resources)
Run and dump (mal_unpack.exe and similar)
Debug (API monitoring -> bp on functions -> dump and fix)
- Log API calls to get an idea on how the stub works
- bp on those functions (transfer code, create process, allocate memory, decryption)
- Fixes (PE unmapping, imports, OEP, header)
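The "skim for obvious encryption" step in the list above can be partly automated for the base64 case. A minimal sketch, assuming long runs of base64 alphabet characters are worth decoding; the 200-character threshold and the function name are arbitrary choices.

```python
import base64
import re


def find_base64_blobs(data, min_len=200):
    """Return (offset, decoded bytes) for long base64-looking runs in data."""
    pattern = re.compile(rb"[A-Za-z0-9+/]{%d,}={0,2}" % min_len)
    blobs = []
    for match in pattern.finditer(data):
        core = match.group().rstrip(b"=")
        core += b"=" * (-len(core) % 4)      # restore padding to a multiple of 4
        try:
            blobs.append((match.start(), base64.b64decode(core, validate=True)))
        except ValueError:
            continue                          # not decodable, skip this run
    return blobs
```

Checking the first bytes of each decoded blob (for example for "MZ") is a cheap way to spot an embedded PE before reaching for a debugger.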

What is unmapping?