Programming languages: This sneaky trick could allow attackers to hide ‘invisible’ vulnerabilities in code

If you’re using the Rust programming language, or JavaScript, Java, Go or Python, in a project, you may want to check for differences between the code that was reviewed and the code that is actually compiled.

The Rust Security Response working group (WG) has flagged a strange security vulnerability that is being tracked as CVE-2021-42574 and is urging developers to upgrade to Rust version 1.56.1.

News of the obscure bug was disseminated in a mailing list today. The Rust project has also flagged the Unicode “bidirectional override” issue in a blogpost. The bug is not specific to Rust, however: it affects code written in any popular language that accepts Unicode source.

Because the issue lies in how Unicode text is handled, it affects not just Rust but other top languages, such as Java, JavaScript, Python and C-based languages, and likely other modern languages as well, according to security researcher Ross Anderson.

Open-source projects such as operating systems often rely on human review of all new code to detect any potentially malicious contributions by volunteers. But security researchers at Cambridge University say that review can be subverted, because what a human reviewer sees on screen need not match what the compiler processes.

“We have discovered ways of manipulating the encoding of source code files so that human viewers and compilers see different logic. One particularly pernicious method uses Unicode directionality to override characters to display code as an anagram of its true logic. We’ve verified that this attack works against C, C++, C#, JavaScript, Java, Rust, Go, and Python, and suspect that it will work against most other modern languages,” writes Anderson, detailing this bug and a similar “homoglyph” issue tracked as CVE-2021-42694.

“The trick is to use Unicode control characters to reorder tokens in source code at the encoding level. These visually reordered tokens can be used to display logic that, while semantically correct, diverges from the logic presented by the logical ordering of source code tokens. Compilers and interpreters adhere to the logical ordering of source code, not the visual order,” the researchers said. The attack embeds control characters in comments and strings to reorder how the surrounding source code is displayed, so that the logic a reviewer sees differs from the logic the compiler actually processes.
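To illustrate the gap between visual and logical order, here is a minimal Python sketch that makes invisible format characters visible. The helper name `reveal_format_chars` and the sample line are hypothetical and are not taken from the researchers' paper; the sketch assumes that replacing every character in Unicode category "Cf" (which includes the bidirectional controls) with a named placeholder is enough for a reviewer to see what the interpreter receives.

```python
import unicodedata


def reveal_format_chars(line: str) -> str:
    """Replace invisible format characters (category Cf) with <NAME> placeholders."""
    out = []
    for ch in line:
        if unicodedata.category(ch) == "Cf":
            # Fall back to the codepoint if the character has no Unicode name.
            name = unicodedata.name(ch, f"U+{ord(ch):04X}")
            out.append(f"<{name}>")
        else:
            out.append(ch)
    return "".join(out)


# Illustrative line only: a right-to-left override hidden inside a comment.
sample = "is_admin = False  # \u202e check later"
print(reveal_format_chars(sample))
```

Printed this way, the override appears as a named placeholder in its logical position, whereas an editor that honours it would silently reorder the text that follows.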

Software development is international and Unicode, a foundation for text and emoji, supports both left-to-right languages, such as English, and right-to-left languages, such as Persian. It does this partly through “bidirectional override” characters, invisible control codepoints that enable embedding left-to-right words inside a right-to-left sentence and vice versa.

While these control characters are normally used to embed a word inside a sentence written in the opposite direction, Anderson and fellow Cambridge researcher Nicholas Boucher discovered that they could be used to change how source code is displayed in certain editors and code-review tools.
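One way a team might guard against this is to scan source files for bidirectional control characters before review. The following is a rough sketch under stated assumptions, not the fix shipped by Rust or any other project: the function name `find_bidi_controls` is hypothetical, and the table covers only the nine Unicode embedding, override and isolate controls (U+202A to U+202E and U+2066 to U+2069).

```python
# Unicode bidirectional embedding, override and isolate controls,
# mapped to their standard abbreviations.
BIDI_CONTROLS = {
    "\u202a": "LRE",  # left-to-right embedding
    "\u202b": "RLE",  # right-to-left embedding
    "\u202c": "PDF",  # pop directional formatting
    "\u202d": "LRO",  # left-to-right override
    "\u202e": "RLO",  # right-to-left override
    "\u2066": "LRI",  # left-to-right isolate
    "\u2067": "RLI",  # right-to-left isolate
    "\u2068": "FSI",  # first strong isolate
    "\u2069": "PDI",  # pop directional isolate
}


def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, abbreviation) for each bidi control, 1-indexed."""
    findings = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((lineno, col, BIDI_CONTROLS[ch]))
    return findings


# Illustrative input only: an override and its terminator inside a comment.
example = "x = 1  # \u202enote\u202c\ny = 2\n"
for lineno, col, kind in find_bidi_controls(example):
    print(f"line {lineno}, column {col}: {kind}")
```

A check like this could run as a pre-commit hook or CI step. Projects that legitimately need right-to-left text in strings or comments would have to allow specific files or lines, which this sketch does not attempt.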

It means that reviewed code can differ from the code that is compiled, and it shows how organizations could be compromised through tampered open-source code.