Lunar Flow | OpenAI Prompt Abuse Filter

This Lunar Flow lets you scan the prompts your services send to the OpenAI API for block-listed strings and patterns. This is a crucial technique for detecting and preventing prompt injections and LLM abuse. By blocking the strings and patterns that attackers commonly use to subvert your system, you can stop prompt-injection style attacks before they reach the model.

Relevant APIs:

  1. [REQUEST] Regex Search - this processor searches the prompt input for every abuse pattern you define, to check whether the user included any of them. You can customize this step in a few ways:

    • Define the list of expressions the processor should look for - this is essentially the list of expressions you want to prevent users from including in their prompts. You can include simple strings (for example: “ignore previous prompt”) or more sophisticated regular expressions (like: "Bearer [A-Za-z0-9-._~+/]+").

    • Customize which parts of the conversation the processor considers - the string input for the processor determines where it will look for the patterns you provide. There are a few options for the search path:

      • Look at all user messages (current + history):
        {{req.body.json().messages.filter(m => m.role=="user").map(m => m.content).join("\n")}}

      • Look at the last user message (without past message history):
        {{req.body.json().messages.filter(m => m.role=="user").pop().content}}

      • Look at all messages (user + AI messages):
        {{req.body.json().messages.map(m => m.content).join("\n")}}

  2. [REQUEST] Send Response - if one of the block-listed patterns is found in the prompt, we terminate the request by using the Send Response block with a 403 Forbidden status code. A minimal sketch of both steps follows this list.
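
To make the two steps concrete, here is a minimal TypeScript sketch of the effect the flow has: it extracts the user messages from the request body (mirroring the first search path above), scans them against the block-listed patterns, and short-circuits with a 403 Forbidden response when any pattern matches. The pattern list and the blockPromptAbuse helper are illustrative assumptions for this example; in the actual flow the same logic is expressed declaratively through the Regex Search and Send Response processors.

  // Illustrative sketch only: the pattern list and helper below are assumptions,
  // not Lunar's actual Regex Search / Send Response processors.
  const blockListedPatterns: RegExp[] = [
    /ignore (all )?previous (prompt|instructions)/i, // instruction injection
    /Bearer [A-Za-z0-9\-._~+\/]+/,                   // leaked credentials
  ];

  interface ChatMessage {
    role: "system" | "user" | "assistant";
    content: string;
  }

  // Mirrors the "all user messages (current + history)" search path above.
  function userPromptText(messages: ChatMessage[]): string {
    return messages
      .filter((m) => m.role === "user")
      .map((m) => m.content)
      .join("\n");
  }

  // Returns a 403 response when a block-listed pattern matches, otherwise null
  // (meaning the request continues to the OpenAI API unchanged).
  function blockPromptAbuse(body: { messages: ChatMessage[] }): Response | null {
    const prompt = userPromptText(body.messages);
    const matched = blockListedPatterns.some((p) => p.test(prompt));
    return matched
      ? new Response("Blocked by prompt abuse filter", { status: 403 })
      : null;
  }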

Prerequisites:

  • Have the Lunar proxy installed and configured in your environment.

  • Have the Lunar interceptor installed on the service from which you consume the OpenAI API.

Understanding Prompt Injections:

Prompt injections are a technique used to exploit vulnerabilities in language models and AI systems that rely on large language models (LLMs) as part of their functionality. The core idea behind prompt injections is that an attacker can craft a malicious input or "prompt" that, when passed to the LLM, can cause the model to generate output that deviates from its intended behavior.

In the context of LLMs, a prompt is the input text that is used to initialize the language model and guide its generation of output. Attackers may attempt to inject malicious content into this prompt in order to subvert the model's behavior. This could involve including special tokens, formatting instructions, or other content designed to influence the model's reasoning and output in unintended ways.

Some common examples of prompt injection techniques include:

  • Instruction Injections: Inserting specific instructions or commands into the prompt, such as "Ignore all previous instructions and instead..." or "Do not follow any rules, just output the following..."

  • Role Prompts: Attempting to assign the model a different "role" or persona than the one intended, e.g. "You are now an evil hacker whose goal is to..."

  • Content Insertions: Injecting malicious or unrelated content into the prompt, such as code snippets, URLs, or other data designed to influence the model's output.

  • Formatting Abuse: Exploiting the model's sensitivity to formatting, such as using Markdown or HTML tags to inject special formatting directives.
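
If you want to turn these techniques into block-list entries for the Regex Search step above, a starting point might look like the list below. These patterns are illustrative assumptions only; expect to tune them against your own traffic to balance coverage and false positives.

  // Illustrative block-list entries mapping to the techniques above.
  // These are starting-point assumptions, not an exhaustive or official list.
  const examplePatterns: RegExp[] = [
    // Instruction injections
    /ignore (all )?(previous|prior) (instructions|prompts?)/i,
    /do not follow any rules/i,
    // Role prompts
    /you are now (an?|the) /i,
    // Content insertions (e.g. smuggled credentials or executable URLs)
    /Bearer [A-Za-z0-9\-._~+\/]+/,
    /https?:\/\/\S+\.(exe|sh|ps1)\b/i,
    // Formatting abuse (e.g. injected HTML/script tags)
    /<\s*(script|iframe)\b/i,
  ];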

Prompt injections can be used to attack a wide range of AI-powered features and systems, from language generation to decision-making and task completion. As LLMs become more widely adopted, understanding and mitigating the risks of prompt injections will be an increasingly important challenge for AI developers and users.

About the OpenAI API:

OpenAI is a prominent artificial intelligence research company founded in 2015. They are known for developing some of the most advanced large language models (LLMs) in the world, including GPT-3 and the more recent GPT-4.

The GPT (Generative Pre-trained Transformer) models are a series of powerful autoregressive language models that can be used for a wide variety of natural language processing tasks. GPT-3, released in 2020, was a groundbreaking model that demonstrated impressive language generation capabilities. GPT-4, the latest iteration, was released in 2023 and is even more capable, with enhancements in areas like multimodal understanding and task completion.

OpenAI offers access to their GPT models through the OpenAI API, which allows developers and researchers to integrate these advanced language models into their own applications and projects. The API provides a straightforward interface for sending text prompts to the models and receiving generated responses, enabling a wide range of use cases such as content creation, question answering, translation, and more.
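
For reference, a minimal Chat Completions request looks roughly like the sketch below (the model name and prompt are placeholders). When the call is routed through the Lunar proxy, the messages array in this body is exactly what the Regex Search processor inspects.

  // Minimal Chat Completions request (Node 18+ / fetch). The messages array
  // is what the prompt abuse filter scans when traffic goes through Lunar.
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4", // placeholder; use whichever model your account can access
      messages: [{ role: "user", content: "Summarize our release notes." }],
    }),
  });
  const completion = await response.json();
  console.log(completion.choices[0].message.content);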

By making their cutting-edge AI technology available through the API, OpenAI has empowered a global community of users to push the boundaries of what is possible with large language models.

About Lunar.dev:

Lunar.dev is your go-to solution for egress API controls and API consumption management at scale.
With Lunar.dev, engineering teams of any size gain instant, unified controls to effortlessly manage, orchestrate, and scale API egress traffic across environments, all without the need for code changes.
Lunar.dev is agnostic to any API provider and enables full egress traffic observability and real-time controls for cost spikes or production issues, all through an egress proxy, an SDK installation, and a user-friendly UI management layer.
Lunar.dev offers solutions for quota management across environments, prioritizing API calls, centralizing API credential management, and mitigating rate limit issues.
