1. Introduction
In the previous blog post, we explored how Promptfoo excels in functional testing of LLMs. Additionally, Promptfoo can perform non-functional testing, particularly in identifying vulnerabilities and AI security through Red teaming.
Red teaming, a cybersecurity practice, involves simulating attacks to uncover weaknesses. As an open-source tool, Promptfoo provides a robust framework for red teaming LLMs, aiding developers in testing, evaluating, and securing their AI applications.

2. Understanding Red Teaming for LLMs
Red teaming in AI involves creating adversarial inputs to test the weaknesses of an LLM. This process helps in finding potential vulnerabilities like prompt injection, unauthorized actions, or exposure of sensitive data. By systematically testing the model’s responses to these inputs, developers can measure risks and implement strategies to fix them.
LLMs, due to their complexity and wide attack surface, face various security challenges. Red teaming provides a way to measure risk, allowing developers to make informed decisions about acceptable risk levels before deployment. This proactive approach is essential for maintaining the integrity and reliability of AI systems in production environments.
3. Common Threats
3.1. Privacy violations
Gen AI apps depend on large data sources, and adversaries who could gain access to those data sources would pose massive threats to the companies behind the apps.
Even if user privacy isn’t directly violated, companies with AI apps likely don’t want outsiders to know the training data they use. An attacker can make an LLM share phone numbers it shouldn’t to sharing individual email addresses.
A leak of personally identifiable information (PII) is bad in itself, but once adversaries have that PII, they could use the stolen identities to gain unauthorized access to internal companies’ resources—to steal the resources, blackmail the company, or insert malware.

3.2. Prompt injections
Prompt injections resemble SQL injections but present differently. They are a type of attack that chains untrusted user input with trusted prompts built by a trusted developer.
In the example below, with one prompt injection, the attacker hijacked an LLM, convinced the user to disclose their names, and got the user to click on a link that redirected them to a malware website, for example.

3.3. Jailbreaking
Jailbreaking refers to attacks that intentionally subvert the foundational safety filters and guardrails built into the LLMs supporting AI apps. These attacks aim to make the model depart from its core constraints and behavioral limitations.
Jailbreaking can be surprisingly simple—sometimes as easy as copying and pasting a carefully crafted prompt to make a Gen AI app do things it’s fundamentally not supposed to do.


3.4. Generation of Unwanted Content
Separate from jailbreaking, AI apps can sometimes generate unwanted or unsavory content simply due to the broad knowledge base of the foundation model, which may not be limited to the specific use case of the app. This incorrect information can damage the reputation of the company or in worse cases can actually hurt users.

4. Promptfoo’s Red-teaming Features
- Automatically scans 50+ vulnerability types:
- Security & data privacy: jailbreaks, injections, RAG poisoning, etc.
- Compliance & ethics: harmful & biased content, content filter validation, OWASP/NIST/EU compliance, etc.
- Custom policies: enforce organizational guidelines.
- Generates dynamic attack probes tailored to your application using specialized uncensored models.
- Implements state-of-the-art adversarial ML research from Microsoft, Meta, and others.
- Integrates with CI/CD.
- Tests via HTTP API, browser, or direct model access.
5. Setting Up Promptfoo for Red Teaming
- Installation and Initialization:
- Install Promptfoo using npm:
npx promptfoo@latest init - Initialize your project and configure settings as needed.
- Install Promptfoo using npm:
- Generating Adversarial Inputs
- Create a diverse set of malicious intents targeting potential vulnerabilities.
- Use techniques like prompt injection and jailbreaking to craft these inputs.
- Executing Tests
- Analyzing Results
- Evaluate the model’s responses using deterministic and model-graded metrics.
- Identify weaknesses or undesirable behaviors and document them for further analysis.



6. Best Practices for Red Teaming LLMs
- Continuous Monitoring:
- Regularly update and run red teaming tests to keep up with evolving threats.
- Integrate Promptfoo into your CI/CD pipeline for automated security checks.
- Collaborative Approach:
- Share findings with your team and collaborate on mitigation strategies.
- Use Promptfoo’s reporting features to generate detailed vulnerability reports.
- Staying Informed:
- Keep up with emerging standards and best practices in AI security.
- Follow guidelines from organizations like OWASP, NIST, and the EU AI Act.
7. Conclusion
Red teaming is an essential practice for securing LLM applications, and Promptfoo provides a powerful toolset to facilitate this process. By systematically probing your AI systems for vulnerabilities, you can ensure they are robust, reliable, and ready for deployment. Embrace red teaming with Promptfoo to safeguard your AI innovations and contribute to a safer digital future.