How I stumbled upon CVE-2021-21702 in PHP’s SOAP extension

Over the past year or so, I’ve been focused on fuzzing research and the different areas where I could apply the techniques and tools I’ve come across or created. During this time, I decided to take a break, mainly due to feeling burnt out, and went back to web pentesting. While looking for some classes of web vulnerabilities, I focused heavily on XXE (XML External Entity) injection as an attack vector. To understand how PHP 7 mitigates this class of vulnerability, I looked at how the SoapClient class parses XML returned from a SOAP server. After some trial and error, I identified a null pointer dereference in PHP’s SOAP extension, which was assigned CVE-2021-21702.

As a part of this break from fuzzing, I’m going to walk you through the steps I took, from initially identifying the bug, to the crash analysis, to the fix that was eventually applied by the PHP maintainers. Hopefully part of my approach will help other developers and security researchers test those “no way this will work…” types of scenarios.

Finding a bug in PHP

With PHP being such a large project, sometimes it can feel intimidating trying to find new bugs/vulnerabilities. The hardest part for me seems to be identifying the attack surface and seeing what hasn’t been looked at a million times so I have a better shot of actually finding something. Just quickly looking at the PHP source code, I knew that operations such as JSON parsing, unserialize, mbstring and a few others have already been looked at enough and also have fuzzers that are continuously looking for these types of vulnerabilities.

I decided a good place for me to start was to go through our own codebase for functions that took in user input and then use that knowledge to attack a specific part of PHP. When looking through several of our codebases, I found it easiest to go through specific endpoints that I knew took in user input in some way and see how we had implemented them. After a couple of hours, I stumbled upon usage of a SOAP client that would take in some XML as input, search for a specific value to use and then process the returned value. Now I know that XML parsers in a lot of areas have already been fuzzed to death, but my initial thought was to see if I could get XXE working. The idea behind XXE is to make the XML parser issue requests to another resource on behalf of the server, which can be used to retrieve files, perform Server Side Request Forgery (SSRF) attacks or even achieve remote code execution through PHP’s expect wrapper.

This turned out to be more difficult than I initially thought because the SOAP client requires some metadata in the XML file in order to parse it properly. Through tons of trial and error, I finally came up with something that parsed successfully when I ran it through an online XML validator. The eventual payload ended up looking like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<soap:definitions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:soap="http://schemas.xmlsoap.org/wsdl/">
<![CDATA[<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://localhost:8080/VULNERABLE"> %xxe;]><foo>l</foo>]]>
</soap:definitions>

When I was finally able to get this to parse properly, I became super excited, thinking I was about to have either an SSRF or, if I was really lucky and the expect wrapper was enabled, code execution. I prepared my XML locally and then wrote a quick PHP script to test whether it would work from my side. Something totally different occurred that caught me by surprise:

user@ubuntu:~# php crash.php
Segmentation fault (core dumped)


I couldn’t believe it...I really thought I had just broken my PHP install, as I had been messing with installing different libraries, instrumenting the PHP source with AddressSanitizer (which you should ALWAYS use on debug builds) and a whole bunch of other things. I tried it again on a fresh VM and got the exact same error...a SIGSEGV! I thought there was no way this was real, so I downloaded the PHP debug symbols from Launchpad onto my fresh VM and stepped through the crash in GDB to find where it was occurring and why.

Crash Analysis

Now there are two ways we can work through this: The first is standard debugging where we step through functions and see where in memory things are becoming corrupted and then trace that to our last instruction. There’s also the “time travel” debugging technique which would allow us to record the state of the program including registers all the way through our program’s crash and then step back from our crashing point to identify how something like our program counter, stack, heap, etc. were corrupted. We’ll be saving that technique for another article as it warrants its own.

For our analysis, we’ll be using GDB with GEF to view the state of our registers, backtrace and even source if we want all on the same screen through a split view. We’ll start up GDB like so and run the program to see what the SIGSEGV backtrace and register states look like:

GDB context
Trace
-------------
[#0] 0x7ffff647cda6 → __strcmp_sse2()
[#1] 0x7fffe14c3b0a → node_is_equal_ex(node=0x555555e49020, name=0x7fffe14c81ca "types", ns=0x0)
[#2] 0x7fffe14bf567 → load_wsdl_ex(this_ptr=0x7ffff3813140, struri=0x7ffff38561a0 "http://localhost/xxe.xml", ctx=0x7fffffff9580, include=0x0)
[#3] 0x7fffe14bfbe7 → load_wsdl(this_ptr=0x7ffff3813140, struri=0x7ffff38561a0 "http://localhost/xxe.xml")
[#4] 0x7fffe14c1226 → get_sdl(this_ptr=0x7ffff3813140, uri=0x7ffff38561a0 "http://localhost/xxe.xml", cache_wsdl=0x1)
[#5] 0x7fffe149744a → zim_SoapClient_SoapClient(execute_data=0x7ffff3813120, return_value=<optimized out>)
[#6] 0x5555558354d6 → ZEND_DO_FCALL_SPEC_HANDLER()
[#7] 0x5555557f004b → execute_ex(ex=<optimized out>)
[#8] 0x555555844677 → zend_execute(op_array=0x7ffff387e000, return_value=0x7ffff3813030)
[#9] 0x5555557af633 → zend_execute_scripts(type=0x8, retval=0x7ffff3813030, file_count=0x2)
Instruction
-------------
 → 0x7ffff647cda6 <__strcmp_sse2+22> movlpd xmm1, QWORD PTR [rdi]
Registers
--------------
$rax   : 0x0               
$rbx   : 0x0000555555e49020  →  0x0000000000000000
$rcx   : 0xa               
$rdx   : 0x0               
$rsp   : 0x00007fffffff9438  →  0x00007fffe14c3b0a  →  <node_is_equal_ex+26> test eax, eax
$rbp   : 0x0               
$rsi   : 0x00007fffe14c81ca  →  0x6f70007365707974 ("types"?)
$rdi   : 0x0 <-----Pointer to the null page
$rip   : 0x00007ffff647cda6  →  <__strcmp_sse2+22> movlpd xmm1, QWORD PTR [rdi]


The important parts here are the backtrace, the faulting instruction and the state of our registers. The current instruction, movlpd, tries to load 8 bytes from the address held in the RDI register into the xmm1 register. If we look at the current value of RDI in the register dump, we can see it’s set to 0x0, which is null. Under the x86-64 System V calling convention, RDI carries the first argument to a function, so strcmp was called with a null pointer as its first string, while RSI points at the second string, “types”.

So now you’re probably wondering, how did this occur? When we look at the backtrace, the “node_is_equal_ex” function takes in an XML node and a node name to search for. If the node doesn’t match, you would expect the function to report that so the caller can move on to the next node. In reality, the function passes node->name straight to strcmp along with the name it’s looking for, without first checking that node->name is set. What we end up with is a strcmp call that reads from the null page and crashes due to an invalid read. We can confirm that node->name is null by switching to “frame 1”, where the “node_is_equal_ex” function is stopped, dereferencing our xmlNodePtr struct and then printing the value of node->name:

gef➤ frame 1
#1 0x00007fffe14c3b0a in node_is_equal_ex (node=node@entry=0x555555e49020, name=name@entry=0x7fffe14c81ca "types", ns=ns@entry=0x0) at /build/php7.0-vQINr8/php7.0-7.0.33/ext/soap/php_xml.c:223
223 /build/php7.0-vQINr8/php7.0-7.0.33/ext/soap/php_xml.c: No such file or directory.
gef➤ p node
$1 = (xmlNodePtr) 0x555555e49020
gef➤ p *node
$2 = {
_private = 0x0,
type = XML_CDATA_SECTION_NODE,
name = 0x0, <------------------------ NULL DEREFERENCE BUG
children = 0x0,
last = 0x0,
parent = 0x555555e48e50,
next = 0x0,
prev = 0x0,
doc = 0x555555e4c990,
ns = 0x0,
content = 0x555555e490a0 " %xxe;]><foo>l</foo>",
properties = 0x0,
nsDef = 0x0,
psvi = 0x0,
line = 0x0,
extra = 0x0
}
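
To make the root cause concrete, here is a minimal sketch in C of the kind of guard this comparison was missing. It is not PHP’s actual source: demo_node is a hypothetical, stripped-down stand-in for libxml2’s xmlNode, and demo_node_name_equals and demo_find_sibling are names made up purely for illustration. The only thing carried over from the GDB output above is that a CDATA section node can show up with a NULL name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of an XML node. Only the fields that
 * matter for this crash are included; this is NOT libxml2's xmlNode. */
typedef enum { DEMO_ELEMENT_NODE, DEMO_CDATA_SECTION_NODE } demo_node_type;

typedef struct demo_node {
    demo_node_type type;
    const char *name;        /* NULL for the CDATA node, as in the GDB dump */
    struct demo_node *next;  /* next sibling in the list */
} demo_node;

/* Returns 1 when the node's name equals `wanted`, 0 otherwise.
 * The NULL check on node->name is the guard: without it, strcmp() would
 * receive a NULL first argument, which is what the backtrace shows. */
int demo_node_name_equals(const demo_node *node, const char *wanted)
{
    assert(wanted != NULL);  /* assumption: the caller always supplies a name */
    if (node == NULL || node->name == NULL) {
        return 0;            /* a nameless node can never match */
    }
    return strcmp(node->name, wanted) == 0;
}

/* Walks a sibling list and returns the first node named `wanted`,
 * or NULL if there is none. Nameless nodes are skipped instead of
 * crashing the walk. */
const demo_node *demo_find_sibling(const demo_node *first, const char *wanted)
{
    for (const demo_node *cur = first; cur != NULL; cur = cur->next) {
        if (demo_node_name_equals(cur, wanted)) {
            return cur;
        }
    }
    return NULL;
}
```

Given a sibling list made of a nameless CDATA node followed by an element named “types”, demo_find_sibling skips the first node and returns the second rather than crashing inside strcmp.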


We did it! We verified the vulnerability and know how to replicate it. But can we exploit this further?

Well, in this scenario, I was unable to exploit it further, mainly due to time constraints, so I decided to just report it to PHP as a security vulnerability, since it would crash both mod_php and the PHP CLI, causing a denial of service condition. I’ll leave it up to the reader to attempt to turn this into something more, if that’s even possible.

Vulnerability Triage

This is sometimes the hardest part for me because I’m so anxious to hear back from the maintainers about the issue. I reported the vulnerability to the PHP Bug Tracker on January 26th and received a response within two days. After they confirmed the existence of the vulnerability, a fix was applied by February 1st! A week’s turnaround seemed super fast to me, and I was really excited to see a vulnerability triaged and fixed in such a short period of time.


Lessons Learned

I actually think they might’ve been able to confirm it more quickly if I had included the debug symbols in my initial report, which is something I’ve learned to do since reporting this to them. Providing the developers with as much information as possible about the vulnerability can really help turn a triage from days or weeks into just hours if everything is there for them. I used to think that our job as security researchers was to just find things and pass them on to the teams to fix, without a root cause analysis. I’ve now realized that to understand the true impact of a vulnerability, it’s necessary to understand how difficult it is to trigger the condition and how exploitable it is. As I further develop my skills, I plan on diving deeper into this area so I can not only report these vulnerabilities, but also create proofs of concept that do more than just crash.

Conclusion

In this post, we attempted an XML External Entity attack against PHP, but instead found a null pointer dereference triggered when the SOAP extension compared the name of a node that has no name. We also walked through how the vulnerability was found through trial and error and responsibly disclosed to the PHP maintainers.

Vulnerabilities can be found everywhere in software, especially in large projects where there can never be enough eyes reviewing the code. All it takes is identifying an attack surface that other researchers haven’t dug into as much. I hope this brief post helps others rethink where to look for vulnerabilities and how sometimes the result can turn out to be something totally different from what you were originally expecting.
