Latest attack on PyPI users shows crooks are only getting better – Ars Technica
More than 400 malicious packages were recently uploaded to PyPI (the Python Package Index), the official code repository for the Python programming language, in the latest indication that targeting software developers with this form of attack isn't a passing fad.
All 451 packages recently found by security firm Phylum contained almost identical malicious payloads and were uploaded in rapid bursts. Once installed, the packages create a malicious JavaScript browser extension that loads every time a browser is opened on the infected device, a trick that gives the malware persistence across reboots.
The JavaScript monitors the infected developer’s clipboard for cryptocurrency addresses that may be copied there. If an address is found, the malware replaces it with an attacker’s address. The goal: to intercept payments that the developer intended to make to another party.
In November, Phylum identified dozens of packages that were downloaded hundreds of times and used heavily encoded JavaScript to covertly do the same thing. Specifically, the JavaScript:
- Created a textarea on the page
- Pasted any clipboard contents into it
- Used a series of regular expressions to search for common cryptocurrency address formats
- Replaced any identified addresses with attacker-controlled addresses in the previously created textarea
- Copied the textarea back to the clipboard
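The search step above depends on regular expressions that recognize wallet-address formats. The Python sketch below turns that idea around for defense: it finds address-shaped strings in text and flags a paste whose addresses differ from what was copied. The patterns (legacy and bech32 Bitcoin, Ethereum) and the function names are assumptions for illustration, not the expressions Phylum found in the malware, and they check only the shape of an address, not its checksum.

```python
import re

# Simplified, illustrative patterns (assumptions, not the malware's regexes).
# They match the general shape of an address and do not verify checksums.
ADDRESS_PATTERNS = {
    "bitcoin_legacy": re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),
    "bitcoin_bech32": re.compile(r"\bbc1[ac-hj-np-z02-9]{11,71}\b"),
    "ethereum": re.compile(r"\b0x[0-9a-fA-F]{40}\b"),
}


def find_crypto_addresses(text: str) -> list[str]:
    """Return every substring of text that looks like a known address format."""
    found: list[str] = []
    for pattern in ADDRESS_PATTERNS.values():
        found.extend(pattern.findall(text))
    return found


def clipboard_was_tampered(copied: str, pasted: str) -> bool:
    """Flag a paste whose wallet addresses differ from what was copied."""
    return find_crypto_addresses(copied) != find_crypto_addresses(pasted)


copied = "pay 0x" + "1" * 40
pasted = "pay 0x" + "2" * 40  # what a clipboard hijacker would leave behind
print(clipboard_was_tampered(copied, pasted))  # True
```

A wallet or payment tool could run a check like this between copy and paste; in practice, comparing the full address visually before sending funds achieves the same thing.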
“If at any point a compromised developer copies a wallet address, the malicious package will replace the address with an attacker-controlled address,” Louis Lang, Phylum’s chief technical officer, wrote in the November post. “This stealthy search/replace will result in the end user inadvertently sending their funds to the attacker.”
New obfuscation method
In addition to the huge increase in the number of malicious packages uploaded, the latest campaign also uses a significantly different way to cover its tracks. Whereas the packages released in November used encoding to obfuscate what the JavaScript did, the new packages write function and variable identifiers using seemingly random 16-bit combinations of the Chinese-language ideographs in the following table:
Unicode code point | Ideograph | Definition |
---|---|---|
0x4eba | 人 | man; people; humanity; someone else |
0x5200 | 刀 | knife; old coin; measure |
0x53e3 | 口 | mouth; opening; entrance, gate |
0x5973 | 女 | woman, girl; feminine |
0x5b50 | 子 | child; fruit, seed of |
0x5c71 | 山 | mountain, hill, summit |
0x65e5 | 日 | sun; day; daytime |
0x6708 | 月 | moon; month |
0x6728 | 木 | tree; wood, timber; wooden |
0x6c34 | 水 | water, liquid, lotion, juice |
0x76ee | 目 | eye; look, see; division, topic |
0x99ac | 馬 | horse; surname |
0x9a6c | 马 | horse; surname |
0x9ce5 | 鳥 | bird |
0x9e1f | 鸟 | bird |
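Python 3 allows non-ASCII letters such as these ideographs in identifiers, which is what makes this obfuscation possible. The sketch below builds a random name from the table's code points and uses it as a variable. The `random_identifier` helper and its default length of 8 are hypothetical; the article does not describe how the malicious packages generated their names.

```python
import random

# Code points from the table above; every one fits in 16 bits (<= 0xFFFF).
IDEOGRAPH_CODE_POINTS = [
    0x4EBA, 0x5200, 0x53E3, 0x5973, 0x5B50, 0x5C71, 0x65E5, 0x6708,
    0x6728, 0x6C34, 0x76EE, 0x99AC, 0x9A6C, 0x9CE5, 0x9E1F,
]


def random_identifier(rng: random.Random, length: int = 8) -> str:
    """Build a hypothetical obfuscated name from randomly chosen ideographs."""
    return "".join(chr(rng.choice(IDEOGRAPH_CODE_POINTS)) for _ in range(length))


# Python 3 accepts non-ASCII letters in identifiers (PEP 3131), so a name
# like this works anywhere an ASCII variable or function name would.
name = random_identifier(random.Random(0))
namespace: dict = {}
exec(f"{name} = 42", namespace)
print(name, name.isidentifier(), namespace[name])
```

Because such names carry no meaning to most readers, they slow down manual review without changing what the code does.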
The code also hides simple values behind convoluted expressions. For example, the line of code

`''.join(map(getattr(__builtins__, oct.__str__()[-3 << 0] + hex.__str__()[-1 << 2] + copyright.__str__()[4 << 0]), [(((1 << 4) - 1) << 3) - 1, ((((3 << 2) + 1)) << 3) + 1, (7 << 4) - (1 << 1), ((((3 << 2) + 1)) << 2) - 1, (((3 << 3) + 1) << 1)]))`

retrieves the built-in function `chr` and maps it over the list of integers `[119, 105, 110, 51, 50]`. The line then joins the results into a string that ultimately evaluates to `'win32'`.
Phylum researchers wrote:

"We can see a number of calls like `oct.__str__()[-3 << 0]`. The `[-3 << 0]` evaluates to `[-3]`, and `oct.__str__()` evaluates to the string `'<built-in function oct>'`. Using Python's index operator `[]` on a string with `-3`, as in this case, grabs the third character from the end of the string, so `'<built-in function oct>'[-3]` evaluates to `'c'`. Continuing with this on the other two, we get `'c' + 'h' + 'r'`, and simply evaluating the complex bitwise arithmetic tacked on the end leaves us with: `''.join(map(getattr(__builtins__, 'c' + 'h' + 'r'), [119, 105, 110, 51, 50]))`

"The `getattr(__builtins__, 'c' + 'h' + 'r')` just gives us the built-in function `chr`, which is then mapped over the list of ints `[119, 105, 110, 51, 50]`, and the results are joined together into a string that ultimately gives us `'win32'`. This technique is continued throughout the code."
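Running the pieces of the obfuscated line separately confirms this walkthrough. The sketch below evaluates each step on its own. It uses the `builtins` module instead of `__builtins__`, because in CPython `__builtins__` is a dict rather than a module outside the main script, and it assumes `copyright`, which the `site` module adds to the built-ins at startup, is available.

```python
import builtins

# Step 1: the three string-indexing tricks.
# str(oct) is "<built-in function oct>"; index -3 << 0 == -3 picks "c".
c = oct.__str__()[-3 << 0]
# str(hex) is "<built-in function hex>"; index -1 << 2 == -4 picks "h".
h = hex.__str__()[-1 << 2]
# The copyright text starts with "Copyright"; index 4 << 0 == 4 picks "r".
r = copyright.__str__()[4 << 0]
func_name = c + h + r  # "chr"

# Step 2: the bitwise expressions reduce to plain integers.
codes = [
    (((1 << 4) - 1) << 3) - 1,  # (15 << 3) - 1 == 119
    (((3 << 2) + 1) << 3) + 1,  # (13 << 3) + 1 == 105
    (7 << 4) - (1 << 1),        # 112 - 2 == 110
    (((3 << 2) + 1) << 2) - 1,  # (13 << 2) - 1 == 51
    ((3 << 3) + 1) << 1,        # 25 << 1 == 50
]

# Step 3: look up chr by name and map it over the integers.
decoded = "".join(map(getattr(builtins, func_name), codes))
print(func_name, codes, decoded)  # chr [119, 105, 110, 51, 50] win32
```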
Ultimately, although the technique gives the appearance of heavily obfuscated code, the researchers said the technique is easy to circumvent by simply observing what the code is doing when it’s executed.
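As one illustration of observing code at run time (not necessarily the method Phylum used), Python 3.8 and later support audit hooks, which report events such as file opens and `exec` calls no matter how the calling code was written. The event names in the sketch are real CPython audit events; the hook, the file name, and the stand-in "untrusted" code are hypothetical.

```python
import os
import sys
import tempfile

# Hypothetical watch list built from real CPython audit event names.
WATCHED_EVENTS = {"open", "exec", "subprocess.Popen", "os.system"}
observed: list[tuple[str, tuple]] = []


def record(event: str, args: tuple) -> None:
    """Audit hook: remember security-relevant events as they happen."""
    if event in WATCHED_EVENTS:
        observed.append((event, args))


# Audit hooks cannot be removed once added, so install this only in a
# throwaway interpreter used to observe untrusted code.
sys.addaudithook(record)

# Stand-in for untrusted code: it writes a file, however obfuscated its source.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "extension.js")
    with open(path, "w") as fh:
        fh.write("// placeholder")

for event, args in observed:
    print(event, args[0])
```

Audit hooks are a monitoring aid, not a sandbox: Python's documentation notes that malicious code can bypass them, so this kind of observation belongs in a disposable environment.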
The latest batch of malicious packages tries to capitalize on typos developers make when downloading any of these legitimate packages:
- bitcoinlib
- ccxt
- cryptocompare
- cryptofeed
- freqtrade
- selenium
- solana
- vyper
- websockets
- yfinance
- pandas
- matplotlib
- aiohttp
- beautifulsoup
- tensorflow
- scrapy
- colorama
- scikit-learn
- pytorch
- pygame
- pyinstaller
For example, packages targeting the legitimate vyper package used 13 names that omitted or duplicated a single character, or swapped two adjacent characters, of the correct name:
- yper
- vper
- vyer
- vype
- vvyper
- vyyper
- vypper
- vypeer
- vyperr
- yvper
- vpyer
- vyepr
- vypre
“This technique is trivially easy to automate with a script (we leave this as an exercise for the reader), and as the length of the legitimate package name increases, so do the possible typosquats,” the researchers wrote. “For example, our system caught 38 typosquats of the cryptocompare package published almost simultaneously by a user named pinigin.9494.”
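The researchers left the automation as an exercise; one minimal Python sketch of it generates every name that omits, doubles, or swaps two adjacent characters of a legitimate name, the three edits seen in the vyper list. The function name is hypothetical, and a fuller detector would likely also consider character substitutions, separators such as `-` and `_`, and look-alike characters.

```python
def typosquat_candidates(name: str) -> set[str]:
    """Generate single-edit look-alikes: omitted, doubled, or swapped characters."""
    candidates: set[str] = set()
    for i in range(len(name)):
        # Omit the character at position i.
        candidates.add(name[:i] + name[i + 1:])
        # Duplicate the character at position i.
        candidates.add(name[:i] + name[i] + name[i:])
    for i in range(len(name) - 1):
        # Swap the adjacent characters at positions i and i + 1.
        candidates.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    candidates.discard(name)  # swapping two identical letters recreates the name
    candidates.discard("")    # omitting the only character of a 1-letter name
    return candidates


print(sorted(typosquat_candidates("vyper")))
```

For "vyper" this yields 14 names (five omissions, five duplications, four swaps), including vvyper and vpyer from the list above. The count grows linearly with name length, which matches the researchers' point that longer names offer more typosquats.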
Planting malicious packages in official code repositories under names that closely resemble those of legitimate packages dates back at least to 2016, when a college student uploaded 214 booby-trapped packages with slightly modified names of legitimate packages to the PyPI, RubyGems, and npm repositories. The result: the impostor code was executed more than 45,000 times on more than 17,000 separate domains, and more than half the time it was given all-powerful administrative rights. Since then, so-called typosquatting attacks have flourished.
Phylum's blog post includes the names of all 451 malicious packages its researchers found. Anyone who intends to download one of the legitimate packages would do well to check that they haven't accidentally received a malicious look-alike.
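To act on that advice, a short script can compare the distributions installed in the current environment against a list of known-bad names. This sketch uses the standard library's `importlib.metadata` and the PEP 503 name normalization that PyPI applies. `SAMPLE_KNOWN_BAD` holds only a few of the vyper look-alikes quoted in this article as placeholders, not Phylum's full list.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a project name the way PyPI compares them (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_distribution_names() -> set[str]:
    """Normalized names of every distribution visible to this interpreter."""
    names: set[str] = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            names.add(normalize(name))
    return names


def flag_known_bad(installed: set[str], known_bad: set[str]) -> list[str]:
    """Return installed names that appear on a known-malicious list."""
    bad = {normalize(name) for name in known_bad}
    return sorted(name for name in installed if normalize(name) in bad)


# Placeholder sample; substitute the full list from Phylum's post.
SAMPLE_KNOWN_BAD = {"vvyper", "vyperr", "vpyer"}
print(flag_known_bad(installed_distribution_names(), SAMPLE_KNOWN_BAD))
```

An empty list means none of the sampled names is installed; it says nothing about packages missing from the known-bad list.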