Malware Analysis for Hedgehogs

Lecture 1 Intro

Analysis Process:

  1. Triage: make assumptions -> select tools
  2. Analysis: analysis -> make assumptions -> new analysis -> get facts
  3. Report: write down facts -> verdict

lots of guessing

Lecture 2 VMs

Environment: classic VM installation; the interesting parts are numbers 7 (shared folder scripts), 8 (secure folders) and 9 (secure network). Jigsaw detonation. Pretty cool. Also, the files contain a lot of additional info

Lecture 3 Triage

Decide what to tackle first (parable of the blind men and the elephant)

Get Overview -> Determine tools and steps -> Choose samples -> Discover low-hanging fruit

Triage Steps:

  1. File type: TrID and hex editor. DetectItEasy in case of PE
    • Polyglots: files that have more than one type
    • Shift+RightClick -> Powershell in the context menu
  2. Whole file examination: strings (strings.exe by Sysinternals), visualization, hex editor; look for execution, ability to embed files, icons
  3. Metadata viewer
    • use specific parser for the filetype
    • read specifications. Look for: magic bytes, ability for code
  4. Automatic reports: sandboxes and antivirus
    • Malware name components: Type|Platform|Family|Variant (signature or ID)|Modifier
    • Defaults: Type=Trojan, Family=Agent (these do not mean anything)
    • Be careful of specific names (short variant, concrete type, no defaults) vs unspecific names (long variant; use of “gen”, “susp”, “heur” [and other scoring types], “ml”, “AI”; names of detection technologies; defaults). List of detection techs: Kazy, Razy, Zusy, Raftor, WisdomEyes, Artemis (McAfee only)
    • Keywords in families(table file)
    • CARO naming conventions
    • Use malpedia when you get the name
  5. First research: internet info on what you have
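
The specific vs unspecific rule from step 4 can be sketched as a small heuristic. This is only a rough illustration: the keyword sets come straight from the notes above, the function names are made up for the example, and real vendor names vary a lot, so treat a hit as a hint and not as a verdict.

```python
import re

# Keywords the notes list as signs of an unspecific (generic) detection name.
GENERIC_MARKERS = {"gen", "susp", "heur", "ml", "ai"}
# Detection-technology names from the notes; they are not malware families.
DETECTION_TECHS = {"kazy", "razy", "zusy", "raftor", "wisdomeyes", "artemis"}
# Default components that carry no information.
DEFAULTS = {"trojan", "agent"}


def name_tokens(detection_name: str) -> list[str]:
    """Split a detection name on the usual separators and lowercase the parts."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", detection_name) if t]


def looks_unspecific(detection_name: str) -> bool:
    """True if the name has a generic marker or a detection-tech name,
    or consists only of default components and numeric variant IDs."""
    tokens = name_tokens(detection_name)
    if any(t in GENERIC_MARKERS or t in DETECTION_TECHS for t in tokens):
        return True
    informative = [t for t in tokens if t not in DEFAULTS and not t.isdigit()]
    return not informative
```

For example, looks_unspecific("Gen:Variant.Razy.1234") is True, while a name with a concrete type and family such as "Ransom.Win32.Jigsaw.A" (an illustrative name, not a real vendor label) is not flagged.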

Quick Analysis

  • “&” acts like a new line in cmd.exe: it separates commands, so “a & b” runs a and then b

Lecture 4 Wrappers

You want the code that does the things, but usually the malware is packed and wrapped in other files. What you’ll find can be

  1. The actual malware
  2. Init code
  3. Environments (wrappers or installers)
  4. Static linked libs

Wrappers:

  • interpreters for the interpreted script. In a binary this results in three parts (unpacker/runner, environment, the script proper)

  • ex. Launch4J, Bat2Exe, PyInstaller

How to Unwrap?

  • Dynamically: unpackers usually drop the real script in %TEMP%, so monitor file writes -> run exe -> copy the files written

  • Statically: search in hexdump and search with strings.exe (you can find the region with magic numbers)

  • Statically (Encrypted): identify the wrapper with DiE and search for the matching extraction tool to get the code

  • Use procmon and apimonitor

  • Use deny deletion on folders to make the files not removable

  • Turn off clickable links in Notepad++
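
The static approach (finding embedded regions by their magic numbers) can be sketched like this. The table of magics is a small sample of well-known values and find_magics is a made-up helper name; short magics such as "MZ" will produce false positives, so every hit still needs a look in the hex editor.

```python
# A few well-known magic numbers; extend the table for other formats.
MAGICS = {
    b"MZ": "DOS/PE executable",
    b"PK\x03\x04": "ZIP local file header (also JAR, DOCX)",
    b"%PDF-": "PDF document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
}


def find_magics(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, description) for every magic number found in data."""
    hits = []
    for magic, description in MAGICS.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, description))
            start = data.find(magic, start + 1)
    return sorted(hits)
```

Carving the region that starts at a reported offset into its own file is usually the next step.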

Installers:

  • builders with a proprietary script language
  • ex. NSIS, Inno

How to obtain the binaries inside?

  • Unarchivers (7z), extractors found online (use DiE to identify the installer); 7z v15.05 has NSIS extraction compatibility
  • Dynamic: monitor the file writes and get the files
  • Analyze installer code (try to use the installer yourself to understand better)

  • renamer: mass renamer
  • you have to learn the installer's scripting language
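
A crude stand-in for the dynamic approach (monitor file writes, then collect what was written) is to snapshot a folder before and after running the sample inside the VM. This sketch is much weaker than procmon: it misses files that are written and deleted between the two snapshots, which is where the deny-deletion trick helps. The function names are invented for the example.

```python
from pathlib import Path


def snapshot(folder: Path) -> dict[Path, tuple[int, int]]:
    """Map every file under folder to (size, modification time in ns)."""
    result = {}
    for path in folder.rglob("*"):
        if path.is_file():
            stat = path.stat()
            result[path] = (stat.st_size, stat.st_mtime_ns)
    return result


def new_or_changed(before: dict, after: dict) -> list[Path]:
    """Files that appeared or changed between two snapshots."""
    return sorted(p for p, meta in after.items() if before.get(p) != meta)
```

Typical use: snapshot %TEMP%, run the wrapper or installer, snapshot again, then copy every path returned by new_or_changed out of the VM.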

Lecture 5 ASEP

Auto Start Extensibility Points Types:

  1. System persistence (provided by windows)
    • Run, RunOnce, RunOnceEx
    • Startup folder (path)
    • Scheduled tasks (path)
    • Services (SCM)
  2. Program Loader Abuse (exploit windows loader)
    • Image File Execution Options
    • extension hijacking
    • shortcut manipulation
    • COM hijacking
    • SHIM databases (path)
  3. Application Abuse (exploit plugins)
    • Trojanized system binaries
    • Office add-ins
    • Browser Helper Objects (BHO) [no longer used]
  4. System Behavior (exploit windows)
    • WinLogon (change file manager, change notification package, change userinit)
    • DLL hijacking
    • AppInit DLLs [no longer used]
    • Active Setup
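
As a small illustration of the first category, the Run, RunOnce and RunOnceEx keys can be listed with Python's standard winreg module. This is a minimal sketch that only covers the per-user and machine-wide keys under Software\Microsoft\Windows\CurrentVersion; it ignores the 32-bit (WOW6432Node) view and every other ASEP in the list, so it is no replacement for Autoruns. The helper names are made up.

```python
import sys

# Autostart keys named in the list above, relative to each root key.
RUN_SUBKEYS = [
    r"Software\Microsoft\Windows\CurrentVersion\Run",
    r"Software\Microsoft\Windows\CurrentVersion\RunOnce",
    r"Software\Microsoft\Windows\CurrentVersion\RunOnceEx",
]
ROOTS = ["HKEY_CURRENT_USER", "HKEY_LOCAL_MACHINE"]


def run_key_paths() -> list[str]:
    """Full paths of the Run-style keys to inspect."""
    return [f"{root}\\{sub}" for root in ROOTS for sub in RUN_SUBKEYS]


def read_run_values() -> dict[str, dict[str, str]]:
    """Read value name -> command for each existing Run key (Windows only)."""
    if sys.platform != "win32":
        raise OSError("the registry is only available on Windows")
    import winreg  # standard library, Windows only

    found = {}
    for root in ROOTS:
        for sub in RUN_SUBKEYS:
            try:
                key = winreg.OpenKey(getattr(winreg, root), sub)
            except OSError:
                continue  # key does not exist on this machine
            with key:
                values = {}
                value_count = winreg.QueryInfoKey(key)[1]
                for i in range(value_count):
                    name, data, _type = winreg.EnumValue(key, i)
                    values[name] = str(data)
                found[f"{root}\\{sub}"] = values
    return found
```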

Examination Tools:

  • Sysinternals Autoruns
  • WineSap
  • Farbar Recovery Scan Tool

Windows Registry: where the malware goes to get permissions and persistence

  • Value data types (might be useful): REG_SZ (path/name strings), REG_DWORD (32-bit number), REG_BINARY (can store whole files)
  • 9 root keys (HK–) and 7 hives. Hive: a set of keys with associated files

Tools:

  • sc.exe: create services
  • services.msc: see active services (you have to close it before removing a service from the registry)
  • autoruns64.exe: check autoruns (show hidden). Be careful: it is better to delete entries from regmon than from Autoruns (deleting from Autoruns might not work or might break things)
  • In the tool there are various techniques. Check lec 5 lab 6

Lecture 6 PE & .NET

  • In the PE header things are in little endian, except MSDOS header
  • Important offset: 0x3C -> pointer to PE header
  • After the sections you have the “overlay”, aka data that is not part of the PE specification
  • Pipeline to find things: 0x00 (MZ) -> 0x3C points to the PE header -> +4 bytes you find the COFF header -> +20 bytes you find the Optional Header -> PE header +24+SizeOfOptionalHeader you find the section table -> 40 bytes per entry, one entry per section, each pointing to its section -> after everything, the overlay
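
The pipeline above can be followed by hand with Python's struct module. This is a minimal sketch with no bounds checking and a made-up function name; it reports the overlay as everything after the last section's raw data and does not try to tell a certificate table apart from other appended data. For real work a PE parser is the better choice.

```python
import struct


def walk_pe(data: bytes) -> dict:
    """Follow the header offsets from MZ down to the overlay."""
    if data[:2] != b"MZ":
        raise ValueError("no MZ magic at offset 0x00")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # pointer to PE header
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("no PE signature where 0x3C points")
    coff = pe_offset + 4  # COFF header follows the 4-byte signature
    fields = struct.unpack_from("<HHIIIHH", data, coff)
    n_sections, optional_size = fields[1], fields[5]
    optional = coff + 20  # the COFF header is 20 bytes long
    section_table = optional + optional_size
    sections = []
    end_of_raw_data = 0
    for i in range(n_sections):
        entry = section_table + 40 * i  # each section header is 40 bytes
        name = data[entry:entry + 8].rstrip(b"\x00").decode("ascii", "replace")
        # SizeOfRawData and PointerToRawData sit at +16 and +20 in the entry.
        raw_size, raw_offset = struct.unpack_from("<II", data, entry + 16)
        sections.append((name, raw_offset, raw_size))
        end_of_raw_data = max(end_of_raw_data, raw_offset + raw_size)
    overlay = end_of_raw_data if end_of_raw_data < len(data) else None
    return {
        "pe_header": pe_offset,
        "coff_header": coff,
        "optional_header": optional,
        "section_table": section_table,
        "sections": sections,
        "overlay_offset": overlay,
    }
```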

Lecture 7 Analysis

Types:

  • Static

  • Dynamic

  • Meta inspection (aka basic)

  • Code inspection (aka advanced)

  • Can be combined (matrix 2x2)

When to use?

  • SM -> Triage
  • DM -> Triage and main starting point
  • SC -> Main analysis
  • DC -> Main analysis (aid for SC)

Verdicts: describe the analysis in a nutshell, basically a summary. Important to consider: general summary, possible danger, relevance, classifiability.

Examples of verdict names: Malware, Riskware (hacking tools), Grayware (bad things, not malware), PUP, Corrupted, Clean

Clean or Malware?

  1. Trojanized software
  2. Packed programs that do not show anything (not explained in this lecture)
  3. Grayware like cracks

It’s hard when the code you are analyzing is clean: you have to prove absence (know when to stop, set a timer)

  • Check whether the metadata makes sense (name, certificates and so on)
  • Check how widespread and how old the code is (VirusTotal)
  • Check entry points to see if something else is doing the bad things

Diff binaries tools:

  • Ybindiff= very simple, good for small differences
  • Meld= for text comparison, good with decompiled code
  • Bindiff= for disassemblers (use Binexport with Ghidra)
  • Portexanalyzer

Find certificates (use Analyzepesig)

  • Bytes after signature should be 0
  • Bytes after PKCS7 should be 0
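
The "bytes after PKCS7 should be 0" check can be sketched on a single certificate-table entry. The assumptions here: the entry is a WIN_CERTIFICATE structure (4-byte length, 2-byte revision, 2-byte type, then the PKCS#7 blob), the blob is DER-encoded so its own length can be read from its header, and the entry bytes were already cut out of the file (for instance using the offsets that Analyzepesig reports). Function names are invented for the example.

```python
import struct


def der_total_length(blob: bytes) -> int:
    """Total size (header + content) of the DER element at the start of blob."""
    first_len = blob[1]
    if first_len < 0x80:  # short form: this byte is the content length
        return 2 + first_len
    n = first_len & 0x7F  # long form: next n bytes hold the length, big endian
    return 2 + n + int.from_bytes(blob[2:2 + n], "big")


def trailing_bytes(cert_entry: bytes) -> bytes:
    """Bytes between the end of the PKCS#7 blob and the end of the entry."""
    length, _revision, _cert_type = struct.unpack_from("<IHH", cert_entry, 0)
    pkcs7 = cert_entry[8:length]  # the 8-byte header precedes the blob
    return pkcs7[der_total_length(pkcs7):]
```

If any(trailing_bytes(entry)) is true, something other than zero padding sits after the signature and deserves a closer look.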

Signature verification

  • There is an area in the file that holds the file’s hash (PKCS #7)
  • Some areas are excluded from the hash calculation (the signature itself and its padding, the checksum, and the pointer to the signature data)
  • Data can be hidden in the digital signature

Lecture 8 Reports

Not a fixed way to do it

Situation 1, you are in an antivirus company

  1. Hash
  2. Submitter
  3. Date
  4. Reason for submission
  5. Additional info
  6. Description of what the file is doing
  7. AV detection before and after
  8. Verdict

Situation 2, blog

  1. Tell a story
  2. Technical details (infection vector, persistence, evasion techniques, idiosyncrasies, communication, potential damage)
  3. Classification and type
  4. Protection opportunities
  5. IOC (hashes, filenames, C2 servers, URLs)

How to classify Malware (Type, Subtype, Family, Subfamily, Variant)

Types by propagation:

  • Virus: file infector

  • Worm: self-replicates

  • Other (Trojan): does not self-replicate

  • See Peter Szor's infection strategies

Types by payload:

  • Ransomware: file, screen
  • Backdoor: RAT, webshell
  • Stealer: credential, cookies
  • Dropper (has other malware inside it)
  • Downloader (downloads malware)
  • Loader (loads malware without dropping it in disk)

Families:

  • Start from detection names
  • Use malpedia
  • Look for aliases
  • Search unique strings
  • Binary diff and code overlaps

Analysis Notes:

  0. Hash
  1. File Type
  2. Malware Type
  3. Malware Family
  4. Communication
  5. Persistence
  6. Main Behavior
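
The notes template above maps naturally onto a small record whose first field, the hash, is computed from the sample. A minimal sketch: the class and field names are made up, and SHA-256 is picked here only because it is the hash most lookup services accept; the notes do not prescribe one.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class AnalysisNotes:
    sha256: str
    file_type: str = "unknown"
    malware_type: str = "unknown"
    family: str = "unknown"
    communication: str = ""
    persistence: str = ""
    main_behavior: str = ""


def notes_for(sample: bytes) -> AnalysisNotes:
    """Start a notes record with the hash filled in and the rest left open."""
    return AnalysisNotes(sha256=hashlib.sha256(sample).hexdigest())
```

The remaining fields get filled in as triage and analysis produce facts.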

CyberChef: the way to work out how malware downloads/loads/drops other malware

  • Exiftool for images

Lecture 9 Ghidra

Symbol Tree

  • Import: thing you can see with a PE viewer
  • Export: all the entry points: exported functions and the PE entry point (called “entry”)
  • Labels: similar to functions but for data (ex structs)
  • Classes: for C++ classes (care for name mangling)
  • Namespaces: to avoid conflicts

Data Type Manager

  • Majority of types are already guessed by Ghidra
  • You should add yours here

How to find main:

  1. MinGW: “mainSOMETHING” (mainCRTStartup), entry -> __tmainCRTStartup -> main.
  2. Visual Studio: entry -> common main -> scrt common main -> invoke_main

If not debug, every main function returns success or failure (0 or 1 as int). So go to the end, check what is returned (or passed to exit) and trace it back

  • in Ghidra, something like iVar1 = (int)(iVar2 & 0xffffffff) is used to convert from 64 to 32 bits
  • Always triage first. Check if it is interpreted or similar
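
The 64-to-32-bit pattern from the Ghidra bullet is easy to check with plain integers. A small sketch (helper names invented): the mask keeps the low 32 bits, and the (int) cast then reads those bits as a signed value.

```python
def low32(value: int) -> int:
    """Keep the low 32 bits, like `x & 0xffffffff`."""
    return value & 0xFFFFFFFF


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits as a signed C int, like the (int) cast."""
    value = low32(value)
    return value - 0x100000000 if value & 0x80000000 else value
```

So a 64-bit 0xFFFFFFFFFFFFFFFF becomes -1 as a 32-bit int, which matters when tracing a return value back from main.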

Lecture 10 x64DBG

Memory breakpoints are implemented with page guards: the breakpoint covers the whole page

ASLR: Moving from Ghidra to x64DBG. ASLR means that things are not loaded at the image base. Ways to make it work:

  1. Rebasing: change the base address in Ghidra (memory map -> “house” icon -> image base address)
  2. Patching the sample: patch the ASLR flag in the PE header (field name “DllCharacteristics”)
  3. Turn off exploit protection :)
  • nibble = 4 bits
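
Both fixes are simple arithmetic. Rebasing is "same offset, different base", and the ASLR flag is the 0x0040 bit (IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE) of the DllCharacteristics field. The sketch below works on plain numbers and does not touch a file; the helper names and the example addresses are invented.

```python
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # the "DLL can move" bit


def rebase(address: int, old_base: int, new_base: int) -> int:
    """Translate an address from one image base to another."""
    return address - old_base + new_base


def clear_aslr(dll_characteristics: int) -> int:
    """Return the 16-bit DllCharacteristics value with the ASLR bit cleared."""
    return dll_characteristics & ~IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE & 0xFFFF
```

For example, a function Ghidra shows at 0x140001234 with image base 0x140000000 sits at rebase(0x140001234, 0x140000000, actual_base) in the debugger, where actual_base is whatever base x64DBG reports for the module.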

Lecture 11 Legion Ransomware

Simulation of a top-to-bottom analysis (understand the ransomware and decrypt data)

First Part

  1. Check the ransomware note -> an email, research that
  2. Check the encrypted file -> the beginning seems to have a signature, also the end (patterns) -> also patterns in the naming scheme
  3. DiE on the ransomware
  4. Strings on ransomware -> something that is related to what we already know -> “PASSWORD_MARKER”, a C2 link, the API
  5. Overlay -> Image

Homework: Triage, VirusTotal, google strings

Second part

  1. Open Ghidra
  2. Find the main (classic entry and follow the returns)
  3. Use API calls to form an idea of the flow -> use loops to do the same thing (an arrow that goes up)
  4. Apply names and rename (markup), but only on the important parts (in our case the encryption func)
  5. The time checks are probably checking for the demo

Third part

  1. Find the encryption, quick way: identify useful APIs, identify the loop, analyze the loop OR look for API calls that do encryption or “write” calls
  2. Check XREF and find the function we care about
  3. Found the XOR and the encryption loop
  4. Add the functions you renamed to a namespace
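
To see why finding the XOR loop matters, here is a generic repeating-key XOR with a known-plaintext trick. This is not the sample's actual routine (the notes do not record its key handling); it only shows the two properties such loops share: encryption and decryption are the same operation, and a known file header leaks key bytes.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; running it twice restores the input."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def recover_keystream(ciphertext: bytes, known_plaintext: bytes) -> bytes:
    """XOR ciphertext with known plaintext to expose the key bytes used."""
    return bytes(c ^ p for c, p in zip(ciphertext, known_plaintext))
```

With an encrypted file whose original type is known (for instance a PDF that must start with "%PDF-"), recover_keystream gives the first key bytes, which can then be compared with what the decompiled loop appears to use.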

Fourth part

  1. Patch the ransomware so it always runs (not only on some days). You know how to do it
  2. Run some monitoring tools (procmon, procexp) to monitor what is happening

Lecture 12 Packing

Types:

  1. Compressor: shrink the code (UPX)
  2. Crypter: evade AV
  3. Protector: prevent RE. Sum of the 2
  • The stub might be given as input for some packers. Stubs can be different each time (especially for crypters)

How are they run?

  1. In their own process
  2. Inject in other processes (RunPE)

Scantime crypter = dropper builder (writes the file to disk and runs it, as a dropper)

Target Location: how does the packer find the encrypted file?

  1. Start and end markers
  2. Fixed locations (or using the PE file header). For example just append the file after the stub (called EOF)
  3. Fix the section table and add a last section
  4. Use resources
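
The first option (start and end markers) is also the easiest to reproduce when extracting the payload statically. A minimal sketch with an invented function name; the marker values are whatever the stub searches for, so they have to be read out of the sample first.

```python
def extract_between(data: bytes, start_marker: bytes, end_marker: bytes) -> bytes:
    """Return the bytes between the first start marker and the next end marker."""
    start = data.find(start_marker)
    if start == -1:
        raise ValueError("start marker not found")
    start += len(start_marker)
    end = data.find(end_marker, start)
    if end == -1:
        raise ValueError("end marker not found")
    return data[start:end]
```

The result is usually still encrypted or compressed, so this only covers the "find" half of static unpacking.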

Unpacking Methods

  1. Debugger + BP: manual, run sample and bp on functions
  2. Run and Dump: easy, tools available
  3. Static unpacking: write a script/cyberchef to extract
  4. Emulation: classic emulation
  5. Self extraction: change the code to write somewhere the unpacked file. Usually for scripts

Unpacking Stubs

  1. Own process unpackers
    1.1. Empty section on disk (non-existent), but when loaded in memory there is an empty section filled with 0s. You can spot this type of packer from the PE header
      • Uses a tail jump: jump to OEP
    1.2. Creates a new section in memory from scratch
  2. Process injection unpacking (hollowing/RunPE)
    2.1. Standard: same as before, but injects the unpacked code into another process. First it suspends the other process, creates a new section, extracts, and sets the entry point to the target data. Finally it resumes the other process
    2.2. Hollowing: same as before, but unmaps the other process's image, then allocates the memory and goes on as above
  3. Hybrid

Unpacking steps

  1. Create target location (create/find other process, create/modify new section)
  2. Prepare location (permissions, like write)
  3. Write the data
  4. Prepare execution (ex. activate the process)
  5. Execution

Unpacking WinUpack

  • CFF Explorer: untick “DLL can move” to remove ASLR
  • fix x64dbg when it shows the code wrongly: go to the changed opcode: Analysis -> treat from selection as -> Byte. Then go to the entry point line -> select the bytes shown as data -> Analysis -> treat from selection as -> Code
  1. DiE
  2. Check PE header
  3. Find the OEP:
    3.1. check the memory map, use “Trace over” with break condition “cip>= 401000 && cip<=40b00” and increase the maximum trace steps
    3.2. check the saved state (pushad) done before unpacking. After pushad, follow esp and set a memory bp on it (if there is a popad around, then it’s good) -> find the tail jmp
  4. Dump (with Scylla). Rebuild imports

A generic approach

  1. Identify
    • if it is a virtualizer (you’ll find them with strings): you are in for a very hard time
  2. Skim for obvious encryption (big base64 strings, XOR areas, large int arrays) in interesting areas (EOF, overlay, last section, resources)
  3. Run and dump (mal_unpack.exe and similar)
  4. Debug (API monitoring -> bp on functions -> dump and fix)
    • Log API calls to get an idea on how the stub works
    • bp on those functions (transfer code, create process, allocate memory, decryption)
    • Fixes (PE unmapping, imports, OEP, header)
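
Step 2 (skimming for obvious encoding such as big base64 strings) can be partly automated. A rough sketch: it looks for long runs of base64 characters and keeps those that actually decode. The 64-character threshold and the function name are arbitrary choices, and a blob glued directly to other alphanumeric text will be missed because the run no longer decodes cleanly.

```python
import base64
import re


def find_base64_runs(data: bytes, min_len: int = 64) -> list[tuple[int, bytes]]:
    """Return (offset, decoded bytes) for long base64 runs that decode cleanly."""
    pattern = re.compile(rb"[A-Za-z0-9+/]{" + str(min_len).encode() + rb",}={0,2}")
    hits = []
    for match in pattern.finditer(data):
        try:
            decoded = base64.b64decode(match.group(), validate=True)
        except ValueError:  # wrong length or padding: not real base64
            continue
        hits.append((match.start(), decoded))
    return hits
```

A decoded hit that starts with "MZ" is a strong hint that the stub carries an embedded executable.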

What is unmapping?
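
The notes leave this question open. One common meaning, assumed here, is converting a PE dumped from memory (sections laid out at their virtual addresses) back to the file layout (sections at their raw offsets), so that static tools can parse it again. A minimal sketch under that assumption: the Section record and the function name are invented, and the section values would normally be read from the dumped header.

```python
from dataclasses import dataclass


@dataclass
class Section:
    virtual_address: int  # RVA of the section in the memory image
    raw_offset: int       # PointerToRawData in the file
    raw_size: int         # SizeOfRawData in the file


def unmap(mapped: bytes, sections: list[Section], header_size: int) -> bytes:
    """Rebuild the on-disk layout from a memory-mapped PE image."""
    file_size = max([header_size] + [s.raw_offset + s.raw_size for s in sections])
    raw = bytearray(file_size)
    raw[:header_size] = mapped[:header_size]  # headers sit at offset 0 in both
    for s in sections:
        chunk = mapped[s.virtual_address:s.virtual_address + s.raw_size]
        raw[s.raw_offset:s.raw_offset + len(chunk)] = chunk
    return bytes(raw)
```

Imports, OEP and header fields still need fixing afterwards, as listed under Fixes.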