What is a threat?

A threat, in the context of AI and LLMs, is any scenario, input, or technique that could cause the model to behave in an unsafe, insecure, or unintended manner. Threats are used to evaluate the robustness of your product by simulating real-world adversarial conditions and probing known vulnerabilities.

In the SDK, threats are referred to as variants. The same variants parameter is used for both Quality Tests and Red Teaming Tests, but the available options differ between the two test types.
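
As a rough illustration, here is how the variants parameter might look when creating each test type. This is a minimal sketch, not the SDK's confirmed API: the client entry point, the tests.create call, and the variant values shown are assumptions for illustration, so check the SDK reference for exact names and signatures.

```python
from galtea import Galtea

# Hypothetical client setup; the exact entry point may differ.
galtea = Galtea(api_key="YOUR_API_KEY")

# Quality Test: variants describe input perturbations (illustrative values).
quality_test = galtea.tests.create(
    name="faq-quality-test",
    type="QUALITY",
    product_id="YOUR_PRODUCT_ID",
    variants=["paraphrased", "typos"],  # assumed quality-test options
)

# Red Teaming Test: the same parameter, but threat-oriented options.
red_teaming_test = galtea.tests.create(
    name="faq-red-teaming-test",
    type="RED_TEAMING",
    product_id="YOUR_PRODUCT_ID",
    variants=["Data Leakage", "Toxicity"],  # assumed threat options
)
```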

Threat Types

Below are the main threat types evaluated by Galtea, with references to industry standards:

  • Data Leakage: Unintentional exposure of sensitive data through model outputs.

    • OWASP Top 10 for LLMs 2025: LLM02: Sensitive Information Disclosure
    • MITRE ATLAS: Exfiltration via ML Inference API
    • MITRE ATLAS: LLM Data Leakage
    • NIST AI RMF: Data Privacy
  • Financial Attacks: Exploiting the model for financial gain, for example by generating fake reviews or crafting phishing content.

    • OWASP Top 10 for LLMs 2025: LLM09: Misinformation
  • Illegal Activities: Using the model to facilitate illegal activities, such as drug trafficking or human trafficking.

    • MITRE ATLAS: LLM Jailbreak
    • MITRE ATLAS: External Harms
    • NIST AI RMF: CBRN Information or Capabilities
    • NIST AI RMF: Dangerous, Violent or Hateful Content
    • NIST AI RMF: Environmental Impact
  • Misuse: Using the model for unintended purposes, such as generating fake news or misinformation.

    • MITRE ATLAS: Evade ML Model
  • Toxicity: Generating harmful or toxic content, such as hate speech or harassment.

    • MITRE ATLAS: Erode ML Model Integrity
    • NIST AI RMF: Harmful Bias or Homogenization
    • NIST AI RMF: Obscene, Degrading and/or Abusive Content
  • Custom: Generates highly specific adversarial tests that target the unique vulnerabilities and edge cases of your AI product. Describe the threat you want to simulate, and Galtea generates the relevant test cases (see the sketch after this list).
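
For the Custom threat type, the sketch below continues the assumed API from the earlier example. The custom_threat_description parameter is hypothetical; it only illustrates the idea of passing a free-text threat description for Galtea to turn into test cases.

```python
from galtea import Galtea

galtea = Galtea(api_key="YOUR_API_KEY")  # hypothetical client setup, as above

# Custom threat: describe the scenario in plain language and let Galtea
# generate matching adversarial test cases. The description parameter name
# below is an assumption for illustration, not the SDK's confirmed API.
custom_test = galtea.tests.create(
    name="discount-code-leak-test",
    type="RED_TEAMING",
    product_id="YOUR_PRODUCT_ID",
    variants=["Custom"],
    custom_threat_description=(
        "An attacker tries to trick the assistant into revealing internal "
        "discount codes that are reserved for employees."
    ),
)
```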

Why Evaluate Against Threats?

Evaluating your product against these threats helps ensure:

  • Security: Prevents exploitation of the model for malicious purposes.
  • Privacy: Reduces the risk of leaking sensitive or private information.
  • Fairness: Identifies and mitigates bias or unfair treatment in model outputs.
  • Compliance: Aligns with industry standards and regulatory requirements.