- Competition invites teams to exploit vulnerabilities within a simulated email client integrated with a large language model through prompt injection attacks.
- Objective is to coerce the system into executing unintended commands, potentially leading to data leaks or other malicious activities.
- Unlike conventional hacking, participants in this challenge can only see the email content they design, unaware of how the LLMail service will interpret it, thereby creating a unique and intricate game of deception.
Microsoft, along with the Institute of Science and Technology Australia and ETH Zurich, has launched the LLMail-Inject challenge. The competition invites teams to exploit vulnerabilities within a simulated email client integrated with a large language model (LLM) through prompt injection attacks.
With a total prize pool of $10,000, the challenge aims to draw attention to the emerging threats posed by AI-driven applications, as well as to enhance the security measures surrounding their use.
The LLMail service, while not functioning in the real world, provides a realistic environment where participants assume the role of attackers seeking to manipulate LLM responses. The objective is to coerce the system into executing unintended commands, potentially leading to data leaks or other malicious activities.
Unlike conventional hacking, participants in this challenge can only see the email content they design, unaware of how the LLMail service will interpret it, thereby creating a unique and intricate game of deception.
The proliferation of LLMs in various applications—from email clients to job screening tools—has brought with it an array of security concerns. The increasing dependence on these models for user interaction necessitates robust defenses against potential exploits.
Microsoft’s prior experiences
Microsoft’s prior experiences, particularly regarding vulnerabilities in its Copilot assistant, underline the stakes involved. Attacks that utilised prompt injection to breach user data mandated swift corrective actions, emphasising the urgency of understanding these risks.
The challenge introduces multiple layers of security defenses, such as Spotlighting, PromptShield, and LLM-as-a-judge, each designed to thwart prompt injection attempts.
For example, Spotlighting utilises special delimiters to distinguish between data and instructions, while LLM-as-a-judge tests prompt integrity based on the model’s internal comprehension rather than solely on pre-defined classifiers.
The multi-faceted approach presents a formidable test for participants, as they are required to devise sophisticated strategies to circumvent these defenses.
Moreover, the opportunity to engage in this open contest enables participants to contribute to the broader discourse on AI safety and security. With the escalating integration of LLMs in sensitive applications, such challenges are essential for fostering a culture of proactive safeguarding against malicious actions.
To participate, sign into the official challenge website using a GitHub account, and create a team (ranging from one to five members). The contest opens at 1100 UTC on December 9 and ends at 1159 UTC on January 20.