A 15-year-old Python vulnerability remains unpatched in hundreds of thousands of open source repositories, raising concerns about supply chain risk, according to new research by Trellix.
Kasimir Schulz, a vulnerability researcher at Trellix’s Advanced Research Center, rediscovered the directory traversal flaw that affects Python’s tarfile module while researching an unrelated vulnerability. He detailed CVE-2007-4559, which was never properly patched, in a blog post Wednesday that emphasized how easy it is for attackers to exploit the flaw.
Further analysis of the known vulnerability, or “N-day,” revealed a more urgent problem: potential supply chain issues. Python is a widely used open source programming language that threat actors have previously targeted in supply chain attacks, including an incident in May where malicious code was discovered in the “ctx” Python library.
If exploited, the Python vulnerability would give attackers the ability to overwrite files, which could lead to system access on Windows, Linux and Docker. Large companies such as Netflix, AWS and Facebook pull from libraries that use the vulnerable tarfile module. Schulz noted in his research that the original CVE received a CVSS score of 6.8. However, Trellix research confirmed that in most cases, an attacker can gain code execution, making the Python vulnerability more severe.
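The mechanics of the directory traversal can be illustrated with a minimal sketch. It builds a small archive in memory and computes where a naive extraction would place each member, without writing anything to disk. The member name ../../escaped.txt and the destination /srv/app/uploads are hypothetical values chosen for illustration, not taken from Trellix's research.

```python
import io
import os
import tarfile


def build_demo_archive() -> bytes:
    """Build an in-memory tar with one benign and one traversal-style entry."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        # Hypothetical member names; the second one climbs out of the target.
        for name in ("notes.txt", "../../escaped.txt"):
            payload = b"demo"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def resolved_targets(archive_bytes: bytes, dest: str) -> dict:
    """Map each member name to the path a naive extraction would write to."""
    targets = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as archive:
        for member in archive.getmembers():
            # Joining the destination with an unchecked member name is the
            # step that lets "../" segments escape the destination directory.
            joined = os.path.join(dest, member.name)
            targets[member.name] = os.path.normpath(joined)
    return targets


if __name__ == "__main__":
    demo = resolved_targets(build_demo_archive(), "/srv/app/uploads")
    for name, target in demo.items():
        print(name, "->", target)
```

In this sketch the benign entry stays under the destination directory, while the traversal entry resolves to a path two levels above it. That is how an attacker-supplied archive can overwrite files the application never intended to touch.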
Additionally, Doug McKee, principal engineer and director of vulnerability research at Trellix, told TechTarget Editorial the potential for remote access attacks depends on each individual application. From its research, Trellix found that 12% of the tarfile vulnerabilities exist in the web space, so for that percentage, remote access is very likely. However, 17% of flaws were discovered in the artificial intelligence and machine learning space, which would require social engineering techniques.
In a video demonstration, Trellix showed how an attacker could exploit the Python vulnerability for remote code execution on Spyder IDE, an open source development environment for Python programming. Using Universal Radio Hacker, an open source tool used for wireless protocol analysis, Trellix researchers were able to exploit the vulnerable tarfile module in Spyder and carry out several malicious actions to fully compromise the test environment.
“As we have demonstrated above, this vulnerability is incredibly easy to exploit, requiring little to no knowledge about complicated security topics,” Schulz wrote in his report. “Due to this fact and the prevalence of the vulnerability in the wild, Python’s tarfile module has become a massive supply chain issue threatening infrastructure around the world.”
The history of CVE-2007-4559
When the flaw was assigned a CVE 15 years ago, the Python Software Foundation (PSF) included security warnings in the official documentation but ultimately decided not to patch the bug because there was “no known or possible practical exploit.” McKee told TechTarget Editorial that he reached out to PSF immediately after Schulz reported his findings. According to McKee, PSF maintained its original stance, offering no plans to fix the issue and placing responsibility on developers.
TechTarget reached a member of PSF, but the organization was unable to comment at time of publishing.
While issuing warnings for a vulnerability is one step toward a fix, McKee said it is not a complete solution. He noted the problem for Python has gotten exponentially worse over the last 15 years. When Trellix researchers performed a Google search for how to extract a tar file in Python, they found all the tutorials were wrong.
“They’re probably not thinking about a directory traversal attack when they’re programming,” McKee said. “If you’re a mid-level developer and don’t know how to do it, you’re going to Google for it and get the wrong answer.”
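For comparison, a more defensive extraction routine might check every member before unpacking anything. The following is a minimal sketch rather than an official fix: the helper names is_within_directory and safe_extract_all are hypothetical, and rejecting all symbolic and hard links is a deliberately conservative assumption that some legitimate archives would fail.

```python
import os
import tarfile


def is_within_directory(base_dir: str, target: str) -> bool:
    """Return True if target resolves to a path inside base_dir."""
    base = os.path.realpath(base_dir)
    resolved = os.path.realpath(target)
    return os.path.commonpath([base, resolved]) == base


def safe_extract_all(archive_path: str, dest_dir: str) -> None:
    """Extract an archive only if every member stays inside dest_dir.

    Hypothetical helper for illustration; raises ValueError on the first
    suspicious member and extracts nothing in that case.
    """
    with tarfile.open(archive_path) as archive:
        for member in archive.getmembers():
            target = os.path.join(dest_dir, member.name)
            if not is_within_directory(dest_dir, target):
                raise ValueError(f"blocked path traversal: {member.name!r}")
            # Conservative assumption: links can redirect later writes,
            # so this sketch refuses them outright.
            if member.issym() or member.islnk():
                raise ValueError(f"blocked link member: {member.name!r}")
        # All members were checked, so the bulk extraction can proceed.
        archive.extractall(dest_dir)
```

A caller would replace a bare extractall() call with something like safe_extract_all("upload.tar", "unpacked"), where both arguments are placeholder values, and treat a ValueError as a rejected upload.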
In a separate blog post Wednesday, Trellix vulnerability researcher Charles McFarland expanded on potential attack scope for the Python vulnerability. Due to the exceptionally large data volume for vulnerable repositories, Trellix reached out to GitHub for additional access, which expanded the dataset to include more than 500,000 GitHub repos that used the tarfile package. Researchers discovered that more than 300,000 repositories, or 61%, were vulnerable to an attack.
Part of the issue, McFarland noted, is that while new machine learning tools such as GitHub Copilot have been introduced to identify vulnerable software code, those tools only go so far.
“There is a common saying also popular in the data science community, ‘Garbage in garbage out,’” McFarland wrote in the blog. “With 300,000 erroneous instances of tarfile.extract() or tarfile.extractall(), these machine learning tools are learning to do things insecurely. Not from any fault of the tool but from the fact that it learned from everyone else.”
TechTarget Editorial contacted Microsoft for comment, but the software giant did not provide a statement at press time.
Trellix released detection tools for vendors and currently has patches for 11,000 repositories.
“While we will fix as many repositories as possible, we cannot solve the overall problem. The number of vulnerable repositories we found beg the question, which other N-day vulnerabilities are lurking around in OSS [open source software], undetected or ignored for years?” McFarland wrote.
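As a rough illustration of how extract() and extractall() calls like those described above might be located in a codebase, the sketch below walks a Python file's syntax tree. It is not Trellix's detection tool; the function names are hypothetical, and the heuristic only flags candidates for manual review. It cannot tell whether a given call already validates member paths.

```python
import ast

RISKY_METHODS = {"extract", "extractall"}


def imports_tarfile(tree: ast.AST) -> bool:
    """Return True if the module imports tarfile in any form."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name == "tarfile" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom) and node.module == "tarfile":
            return True
    return False


def find_extract_calls(source: str) -> list:
    """Return sorted (line, method) pairs for extract-style method calls.

    Heuristic only: it matches on the method name, so it can produce both
    false positives and false negatives.
    """
    tree = ast.parse(source)
    if not imports_tarfile(tree):
        return []
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in RISKY_METHODS
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)
```

Running find_extract_calls() over each file in a repository and reviewing the reported line numbers by hand would be one simple way to triage a codebase under these assumptions.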