Real Example

Discovering Prompt Injection Vulnerabilities in a Product Used by KPMG and 200+ Enterprise Customers’ Mail Template API

Security Disclosure Timeline

Friday, May 16, 5:02 PM - Reported Vulnerability

Saturday, May 17, 2025 10:12 AM - Get Feedback

Monday, May 19, 11:02 AM - Fixed critical vulnerability on LLM integrated API

Introduction

As AI-powered applications continue to gain popularity, ensuring their security against novel attack vectors is critical. Recently, during a security assessment of a product used by KPMG and over 200 enterprise customers, I identified a significant vulnerability related to prompt injection and system prompt manipulation in its mail template creation API. These attack techniques can allow adversaries to override or jailbreak the underlying AI model’s behavior, potentially leading to unauthorized content generation or misuse of AI capabilities.


Background: What is Prompt Injection?

Prompt injection is a technique where attackers craft inputs designed to manipulate the AI model’s instructions or “system prompt.” By embedding malicious instructions or jailbreak payloads, an attacker can bypass intended content restrictions, forcing the model to behave unpredictably or dangerously. This is particularly concerning in applications that expose AI-generated content to end users.


Vulnerability Discovery

While testing the mail template editor endpoint of this product, I used the MithraSec platform to simulate adversarial prompt injection attacks, including popular jailbreak methods such as:

  • Developer Mode / Do Anything Now (DAN) Attacks: Prompts that instruct the AI to ignore content policies and generate unrestricted outputs.

  • Classical Jailbreaks: Payloads that instruct the AI to behave as if it has unrestricted capabilities, often by requesting dual responses or overriding system prompts.

  • Anti-DAN Safeguards Bypass: Techniques to circumvent defensive filters that attempt to block such prompt injections.

My tests revealed that the API accepts input which allows these prompt injections to succeed, effectively enabling an attacker to override safety policies and produce unrestricted, potentially harmful content in mail templates.


Proof-of-Concept Summary

One example of a successful injection is the “Developer Mode” prompt, instructing the AI to enter a “developer mode” that ignores all previous restrictions and generates any requested content — including offensive or explicit material. This prompt is inserted directly into the mail template content and processed by the API, resulting in bypassing normal content filtering.

The platform responded with two distinct outputs: one adhering to standard restrictions, and the other simulating “Developer Mode” behavior with uncensored content. This demonstrates a critical failure in isolating user input from AI system prompts.


Impact

This vulnerability could allow malicious users to:

  • Generate harmful, offensive, or misleading email content sent via the platform.

  • Circumvent content policies, exposing users or recipients to inappropriate or dangerous messages.

  • Abuse AI capabilities for social engineering or phishing campaigns.

Such weaknesses undermine the trust and safety of AI-driven applications and pose legal and reputational risks for vendors.


Responsible Disclosure and Vendor Response

I reported these findings responsibly to the product vendor through official channels. While no public bug bounty program currently exists, the vendor acknowledged the report and committed to review the issues.


Recommendations for Mitigation

To address prompt injection vulnerabilities, AI-driven platforms should consider:

  • Strict sanitization and validation of user inputs, especially those that affect AI prompts.

  • Isolating user-controlled content from system-level instructions.

  • Implementing layered filtering and heuristic detection of injection patterns.

  • Employing AI model fine-tuning or guardrails to recognize and reject jailbreak attempts.

  • Continuous security testing with emerging prompt injection techniques.


Conclusion

Prompt injection represents a rising threat vector in AI-powered applications, demanding rigorous security controls. The discovery of this vulnerability in a product trusted by KPMG and hundreds of enterprises highlights the urgency for vendors to adopt proactive defenses and for security researchers to continue exploring AI-specific risks.

Last updated