Malware Analysis for Hedgehogs
Lecture 1 Intro
Analysis Process:
- Triage: make assumptions -> select tools
- Analysis: analysis -> make assumptions -> new analysis -> get facts
- Report: write down facts -> verdict
lots of guessing
Lecture 2 VMs
Environment: classic VM installation. The interesting parts are numbers 7 (shared folder scripts), 8 (secure folders) and 9 (secure network). Jigsaw detonation, pretty cool. The files also contain a lot of additional info
Lecture 3 Triage
Decide what to tackle first (parable of the blind men and the elephant)
Get overview -> Determine tools and steps -> Choose samples -> Discover low-hanging fruit
Triage Steps:
- File type: TrID and hex editor. DetectItEasy in case of PE
- Polyglots: files that have more than one type
- Shift+RightClick -> Powershell in the context menu
- Whole file examination: strings (strings.exe by Sysinternals), visualization, hex editor execution, ability to embed files, icons
- Metadata viewer
- use specific parser for the filetype
- read specifications. Look for: magic bytes, ability to contain code
- Automatic reports: sandboxes and antivirus
- Malware name components: Type|Platform|Family|Variant (signature or id)|Modifier
- Defaults: Type=Trojan, Family=Agent (These do not mean anything)
- Be careful of specific names (short variant, concrete type, no defaults) vs unspecific names (long variant; use of "gen", "susp", "heur" [and other scoring terms], "ml", "AI"; names of specific detection technologies; defaults). List of detection techs: Kazy, Razy, Zusy, Raftor, WisdomEyes, Artemis (McAfee only)
- Keywords in families(table file)
- CARO naming conventions
- Use Malpedia when you get the name
- First research: internet info on what you have
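The specific vs unspecific distinction above can be turned into a rough helper. This is a minimal sketch, assuming the marker lists from these notes are representative; they are not exhaustive, and the function name `unspecific_hints` is made up for illustration.

```python
import re

# Marker lists come from the notes above; they are illustrative, not exhaustive.
SCORING_PREFIXES = ("gen", "susp", "heur")   # matched as token prefixes
SCORING_TOKENS = {"ml", "ai"}                # matched as whole tokens only
DETECTION_TECHS = {"kazy", "razy", "zusy", "raftor", "wisdomeyes", "artemis"}
DEFAULTS = {"trojan", "agent"}


def unspecific_hints(detection_name: str) -> list[str]:
    """Return the reasons why a detection name looks unspecific (empty list = no hints)."""
    # Split on anything that is not a letter or digit (".", ":", "/", "!" and so on).
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", detection_name.lower()) if t]
    hints = []
    for token in tokens:
        if token.startswith(SCORING_PREFIXES) or token in SCORING_TOKENS:
            hints.append(f"scoring term: {token}")
        elif token in DETECTION_TECHS:
            hints.append(f"detection technology: {token}")
        elif token in DEFAULTS:
            hints.append(f"default value: {token}")
    return hints
```

A name with no hints is not automatically trustworthy; the helper only flags the keywords listed in the notes, so a family name still needs to be checked by hand.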
Quick Analysis
- "&" separates commands in cmd.exe (it acts like a new line between two commands)
Lecture 4 Wrappers
You want the code that does the actual work, but usually the malware is packed and wrapped in other files. What you'll find can be:
- The actual malware
- Init code
- Environments (wrappers or installers)
- Static linked libs
Wrappers:
- interpreters for the interpreted script. In a binary this results in a 3-part binary (unpacker/runner, environment, script proper)
- ex. Launch4J, Bat2Exe, PyInstaller
How to Unwrap?
- Dynamically: unpackers usually drop the real script in %TEMP%, so monitor file writes -> run the exe -> copy the written files
- Statically: search in the hexdump and with strings.exe (you can find the region via magic numbers)
- Statically (encrypted): identify the wrapper with DiE and search for an extraction tool to get the code
- Use procmon and apimonitor
- Use deny deletion on folders so the files cannot be removed
- Turn off clickable links in Notepad++
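The static approach (finding the embedded region via magic numbers) can be sketched as a small scanner. This is only an illustration: the magic table is a tiny hand-picked subset, and short signatures such as "MZ" produce false positives, so every hit still needs a look in the hex editor.

```python
# A few well-known magic numbers; extend the table for the formats you expect.
MAGIC_NUMBERS = {
    b"MZ": "PE/MS-DOS executable",
    b"PK\x03\x04": "ZIP/JAR archive",
    b"\x7fELF": "ELF executable",
    b"%PDF": "PDF document",
}


def find_embedded(data: bytes, skip_offset_zero: bool = True) -> list[tuple[int, str]]:
    """Return (offset, description) for every magic number found in data."""
    hits = []
    for magic, description in MAGIC_NUMBERS.items():
        start = 0
        while (offset := data.find(magic, start)) != -1:
            # Offset 0 is the wrapper itself, which is usually not interesting.
            if not (skip_offset_zero and offset == 0):
                hits.append((offset, description))
            start = offset + 1
    return sorted(hits)
```

Carving is then a matter of slicing `data[offset:]` and letting a proper parser for that file type decide where the embedded file ends.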
Installers:
- builders with a proprietary script language
- ex. NSIS, Inno Setup
How to obtain the binaries inside?
- Unarchivers (7z), extractors found online (use DiE to identify the installer); 7-Zip v15.05 has NSIS extraction compatibility
- Dynamic: monitor the file writes and grab the files
- Analyze the installer code (try to use the installer builder yourself to understand it better)
- renamer: mass renamer
- you have to learn the installer's scripting language
Lecture 5 ASEP
Auto Start Extensibility Points Types:
- System persistence (provided by Windows)
  - Run, RunOnce, RunOnceEx
  - Startup folder (path)
  - Scheduled tasks (path)
  - Services (SCM)
- Program Loader Abuse (exploits the Windows loader)
  - Image File Execution Options
  - extension hijacking
  - shortcut manipulation
  - COM hijacking
  - SHIM databases (path)
- Application Abuse (exploits plugins)
  - Trojanized system binaries
  - Office add-ins
  - Browser Helper Objects (BHO) [no longer used]
- System Behavior (exploits Windows)
  - WinLogon (change file manager, change notification package, change userinit)
  - DLL hijacking
  - AppInit DLLs [no longer used]
  - Active Setup
Examination Tools
- Sysinternals Autoruns.exe
- WineSap
- Farbar Recovery Scan Tool
Windows Registry: where the malware goes to get permissions and persistence
- Value data types (might be useful): REG_SZ (path/name strings), REG_DWORD (usually binary), REG_BINARY (can store whole files)
- 9 root keys HK– and 7 hives. Hive: a set of keys with files associated
Tools:
- sc.exe: create services
- services.msc: see active services (you have to close it before removing a service from the registry)
- autoruns64.exe: check autoruns (show hidden). Be careful: it is better to delete entries from the regmon than from Autoruns (it might not work or might break things)
- The tool covers various techniques. Check lec 5 lab 6
Lecture 6 PE & .NET
- Multi-byte fields in the PE headers are little endian (this also holds for the MS-DOS header; magic values such as "MZ" are simply read byte by byte)
- Important offset: 0x3C -> pointer to the PE header
- After the sections comes the "overlay": appended data that is not part of the PE specification
- Pipeline to find things: 0x00 (MZ) -> 0x3C points to the PE header (signature "PE\0\0") -> +4 bytes: COFF header (20 bytes) -> +20 bytes: Optional Header -> PE header offset + 24 + SizeOfOptionalHeader: section table (40 bytes per section, N entries) -> after the section table: the sections themselves (each table entry gives the file offset) -> after everything: overlay
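The offset pipeline can be followed in a few lines of Python with the standard `struct` module. This is a minimal sketch, not a full parser: it assumes a well-formed file, and the function name `pe_layout` and the returned dictionary keys are made up for this example.

```python
import struct


def pe_layout(data: bytes) -> dict:
    """Walk MZ -> PE -> COFF -> Optional Header -> section table -> overlay."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic at offset 0x00")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew, little endian
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff_offset = pe_offset + 4
    # COFF header (20 bytes): Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    _, num_sections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, coff_offset)
    optional_offset = coff_offset + 20
    section_table = optional_offset + opt_size  # same as pe_offset + 24 + opt_size
    sections = []
    end_of_data = section_table + 40 * num_sections
    for i in range(num_sections):
        entry = section_table + 40 * i
        name = data[entry:entry + 8].rstrip(b"\x00").decode("ascii", "replace")
        # SizeOfRawData is at +16 and PointerToRawData at +20 in a section header.
        raw_size, raw_ptr = struct.unpack_from("<II", data, entry + 16)
        sections.append((name, raw_ptr, raw_size))
        end_of_data = max(end_of_data, raw_ptr + raw_size)
    return {
        "pe_offset": pe_offset,
        "coff_offset": coff_offset,
        "optional_offset": optional_offset,
        "section_table": section_table,
        "sections": sections,
        # Anything past the last section's raw data counts as overlay here.
        "overlay_offset": end_of_data if end_of_data < len(data) else None,
    }
```

For real samples a dedicated PE parser is the safer choice; the point of the sketch is only to make the chain of offsets concrete.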
Lecture 7 Analysis
Types:
- Static
- Dynamic
- Meta inspection (aka basic)
- Code inspection (aka advanced)
- Can be combined (2x2 matrix)
When to use?
- SM (static meta) -> Triage
- DM (dynamic meta) -> Triage and main starting point
- SC (static code) -> Main analysis
- DC (dynamic code) -> Main analysis (aid for SC)
Verdicts: describe the analysis in a nutshell, basically a summary. Important to consider: general summary, possible danger, relevance, classifiability. Examples of verdict names: Malware, Riskware (hacking tools), Grayware (bad things, not malware), PUP, Corrupted, Clean
Clean or Malware?
- Trojanized software
- Packed programs that do not show anything (not explained in this lecture)
- Grayware like cracks
It's hard when the code you are analyzing is clean, because you have to prove absence (know when to stop, set a timer):
- Check whether the metadata makes sense (name, certificates and so on)
- Check how widespread and how old the file is (VirusTotal)
- Check the entry points to see if something else is doing the bad things
Diff binaries tools:
- VBinDiff = very simple, good for small differences
- Meld = for text comparison, good with decompiled code
- BinDiff = for disassemblers (use BinExport with Ghidra)
- PortexAnalyzer
Find certificates (use AnalyzePESig)
- Bytes after signature should be 0
- Bytes after PKCS7 should be 0
Signature verification
- The signature (PKCS#7) stores a hash that is calculated over the file
- Some areas are excluded from the calculation (the signature itself and its padding, the checksum, and the pointer to the signature)
- Data can be hidden in the digital signature
Lecture 8 Reports
There is no fixed way to do it
Situation 1: you work at an antivirus company
- Hash
- Submitter
- Date
- Reason for submission
- Additional info
- Description of what the file is doing
- AV detection before and after
- Verdict
Situation 2: blog post
- Tell a story
- Technical details (infection vector, persistence, evasion techniques, idiosyncrasies, communication, potential damage)
- Classification and type
- Protection opportunities
- IOC (hashes, filenames, C2 servers, URLs)
How to classify Malware (Type, Subtype, Family, Subfamily, Variant)
Types by propagation:
- Virus: file infector
- Worm: self-replicates
- Other (Trojan): does not self-replicate
- Peter Szor's infection strategies
Types by payload:
- Ransomware: file, screen
- Backdoor: RAT, webshell
- Stealer: credential, cookies
- Dropper (has other malware inside it)
- Downloader (downloads malware)
- Loader (loads malware without dropping it to disk)
Families:
- Start from detection names
- Use Malpedia
- Look for aliases
- Search unique strings
- Binary diff and code overlaps
Analysis Notes:
0. Hash
1. File Type
2. Malware Type
3. Malware Family
4. Communication
5. Persistence
6. Main Behavior
CyberChef: the way to get at malware that is downloaded/loaded/dropped by other malware
- ExifTool for images
Lecture 9 Ghidra
Symbol Tree
- Import: thing you can see with a PE viewer
- Export: all the entry points: exported functions and the PE entry point (called "entry")
- Labels: similar to functions but for data (ex. structs)
- Classes: for C++ classes (watch out for name mangling)
- Namespaces: to avoid conflicts
Data Type Manager
- Majority of types are already guessed by Ghidra
- You should add yours here
How to find main:
- MinGW: "mainSOMETHING" (mainCRTStartup), entry -> __tmainCRTStartup -> main
- Visual Studio: entry -> __scrt_common_main -> __scrt_common_main_seh -> invoke_main -> main
If there are no debug symbols: every main function returns success or failure (0 or 1 as int), and that value ends up in exit. So go to the end of entry, check what is returned (or passed to exit) and trace it back to the call that produced it
- In Ghidra, something like iVar1 = (int)(iVar2 & 0xffffffff) is used to convert from 64 to 32 bits
- Always triage first. Check if it is interpreted or similar
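What the mask-and-cast idiom in the decompiler output does can be reproduced in Python, where integers are unbounded. A minimal sketch; `to_int32` is a made-up helper name.

```python
def to_int32(value: int) -> int:
    """Keep the low 32 bits of value and interpret them as a signed 32-bit int."""
    low = value & 0xFFFFFFFF  # drop everything above bit 31
    # If bit 31 is set, the 32-bit value is negative in two's complement.
    return low - 0x1_0000_0000 if low & 0x8000_0000 else low
```

So a 64-bit value like 0x100000005 becomes 5: the upper half is simply discarded, which is all the decompiler line expresses.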
Lecture 10 x64DBG
Memory breakpoints are implemented with guard pages: the breakpoint is on the whole page
ASLR: moving from Ghidra to x64dbg. ASLR means that things are not loaded at the image base. Ways to make it work:
- Rebasing: change the base address in Ghidra (Memory Map -> house icon -> image base)
- Patching the sample: clear the ASLR flag in the PE header (field name "DllCharacteristics")
- Turn off exploit protection :)
- nibble = 4 bits
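If neither tool is rebased, addresses can also be translated by hand, since only the image base changes and the offset into the image stays the same. A minimal sketch; the function names are made up, while 0x0040 is the documented value of the DYNAMIC_BASE (ASLR) bit in DllCharacteristics.

```python
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # the ASLR flag


def rebase(address: int, from_base: int, to_base: int) -> int:
    """Translate an address between two load bases by keeping its offset (RVA)."""
    rva = address - from_base
    if rva < 0:
        raise ValueError("address lies below the source image base")
    return to_base + rva


def clear_dynamic_base(dll_characteristics: int) -> int:
    """Return the DllCharacteristics value with the ASLR flag removed."""
    return dll_characteristics & ~IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE
```

With hypothetical bases, `rebase(0x140001234, 0x140000000, 0x7FF6A0000000)` maps a Ghidra address to the matching x64dbg address, and swapping the two bases maps it back.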
Lecture 11 Legion Ransomware
Simulated top-to-bottom analysis (understand the ransomware and decrypt the data)
First Part
- Check the ransomware note -> an email, research that
- Check the encrypted file -> the beginning seems to have a signature, also the end (patterns) -> also patterns in the naming scheme
- DiE on the ransomware
- Strings on ransomware -> something that is related to what we already know -> “PASSWORD_MARKER”, a C2 link, the API
- Overlay -> Image
Homework: Triage, VirusTotal, google strings
Second part
- Open Ghidra
- Find the main (classic entry and follow the returns)
- Use API calls to build a picture of the flow in your head -> use loops to do the same (arrows that go up)
- Apply names and rename (markup), but only on the important parts (in our case the encryption func)
- The time checks are probably checking for the demo
Third part
- Find the encryption, quick way: identify useful APIs, identify the loop, analyze the loop OR look for API calls that do encryption or "write" calls
- Check XREF and find the function we care about
- Found the XOR and the encryption loop
- Add the functions you renamed to a namespace
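Once an XOR loop like the one found above is understood, it is easy to reimplement outside the sample. The notes do not record the actual key handling of this ransomware, so the sketch below assumes a plain repeating-key XOR purely for illustration.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; applying it twice restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))
```

Because XOR is its own inverse, the same function encrypts and decrypts. It also means that XORing an encrypted file header with the known plaintext header of that file type reveals the key bytes used at those positions.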
Fourth part
- Patch the ransomware so it always runs (not only on some days). You know how to do it
- Run some monitoring tools (procmon, procexp) to monitor what is happening
Lecture 12 Packing
Types:
- Compressor: shrink the code (UPX)
- Crypter: evade AV
- Protector: prevents RE. Combination of the two above
- The stub might be given as input for some packers. Stubs can be different each time (especially for crypters)
How are they run?
- In their own process
- Injected into other processes (RunPE)
Scantime crypter = dropper builder (writes the file to disk and runs it, as a dropper)
Target Location: how does the packer find the encrypted file?
- Start and end markers
- Fixed locations (or using the PE file header). For example just append the file after the stub (called EOF)
- Fix the section table and add a last section
- Use resources
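The start and end marker variant is the easiest one to unpack statically: find both markers and cut out what sits between them. A minimal sketch; the marker values in the usage note are invented, a real stub uses its own.

```python
def extract_between_markers(data: bytes, start_marker: bytes, end_marker: bytes) -> bytes:
    """Return the bytes located between the first start marker and the next end marker."""
    start = data.find(start_marker)
    if start == -1:
        raise ValueError("start marker not found")
    start += len(start_marker)  # payload begins right after the marker
    end = data.find(end_marker, start)
    if end == -1:
        raise ValueError("end marker not found")
    return data[start:end]
```

For example, `extract_between_markers(sample, b"<<BEGIN>>", b"<<END>>")` returns the still encrypted payload, which then goes through whatever decryption the stub applies.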
Unpacking Methods
- Debugger + BP: manual, run sample and bp on functions
- Run and Dump: easy, tools available
- Static unpacking: write a script/cyberchef to extract
- Emulation: classic emulation
- Self extraction: change the code to write somewhere the unpacked file. Usually for scripts
Unpacking Stubs
1. Own process unpackers
  1.1. Empty section on disk (no raw data), but when loaded in memory there is a section filled with 0s. You can spot this type of packer from the PE header
    - Uses a tail jump: jump to OEP
  1.2. Creates a brand new section in memory
2. Process injection unpacking (hollowing/RunPE)
  2.1. Standard: same as before, but injects the unpacked code into another process. First it suspends the other process, creates a new section, extracts, and sets the entry point to the target data. Finally it resumes the other process
  2.2. Hollowing: same as before, but first unmaps the whole image of the other process, then allocates the memory and continues as described
3. Hybrid
Unpacking steps
- Create target location (create/find other process, create/modify new section)
- Prepare location (permissions, like write)
- Write the data
- Prepare execution (ex. activate the process)
- Execution
Unpacking WinUpack
- CFF Explorer: "DLL can move" -> ASLR removed
- Fix x64dbg when it changes the code: go to the changed opcode: Analysis -> treat from selection as -> Byte. Then go to the entry point line -> mark the bytes seen as data -> Analysis -> treat from selection as code
1. DiE
2. Check PE header
3. Find the OEP:
  3.1. Check the memory map, use "Trace over" with break condition "cip>= 401000 && cip<=40b00" and increase the maximum trace steps
  3.2. Check the saved state (pushad) done before unpacking. After pushad, follow esp and bp memory on that (if there is a popad around, then it's good) -> find the tail jmp
4. Dump (with Scylla). Rebuild imports
A generic approach
- Identify
- if it is a virtualizer (you'll find them with strings): you are in for a hard time
- Skim for obvious encryption (big base64 strings, XOR areas, large int arrays) in interesting areas (EOF, overlay, last section, resources)
- Run and dump (mal_unpack.exe and similar)
- Debug (API monitoring -> bp on functions -> dump and fix)
- Log API calls to get an idea on how the stub works
- bp on those functions (transfer code, create process, allocate memory, decryption)
- Fixes (PE unmapping, imports, OEP, header)
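The "skim for obvious encryption" step can be partly automated for the base64 case. This is a rough sketch, assuming standard base64 alphabet and a minimum run length of 64 characters (an arbitrary threshold); long runs of plain letters can also decode cleanly, so the hits are candidates, not proof.

```python
import base64
import re


def find_base64_blobs(data: bytes, min_length: int = 64) -> list[tuple[int, bytes]]:
    """Return (offset, decoded bytes) for long runs that decode as valid base64."""
    pattern = re.compile(rb"[A-Za-z0-9+/]{%d,}={0,2}" % min_length)
    blobs = []
    for match in pattern.finditer(data):
        candidate = match.group()
        if len(candidate) % 4 != 0:  # valid padded base64 is a multiple of 4
            continue
        try:
            decoded = base64.b64decode(candidate, validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            continue
        blobs.append((match.start(), decoded))
    return blobs
```

Running this over the interesting areas (EOF, overlay, last section, resources) and checking the decoded bytes for magic numbers is a quick way to decide whether static unpacking is worth trying before reaching for the debugger.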
What is unmapping?