
LLM CTF Competition
US-Canada, MENA, India

It’s time to think a little differently about the capabilities of generative AI.

With the rise of large language models (LLMs), AI systems are now capable of identifying software vulnerabilities and generating exploit code—skills that align closely with the goals of Capture the Flag (CTF) competitions, where participants solve security challenges to uncover hidden “flags.”

 

In this competition, your task is to build your own agentic AI to solve CTF challenges autonomously: create or extend an LLM-powered agent that analyzes and exploits challenges without human intervention. You may either bring your own agent framework or enhance a provided baseline agent, with technical support available. Your agent can leverage LLMs, either API-based models such as GPT, Claude, and Gemini, or open-source models deployed locally, to tackle common CTF categories such as crypto, forensics, pwn, reverse, web, and misc.

 

A successful submission includes the full logs of prompts and model responses, along with a brief write-up describing your prompting strategies, agent enhancements, and system design choices.

challenge

This competition uses NYU CTF Lite, a streamlined benchmark of 50 challenges spanning six categories, adapted from the original NYU CTF Bench. To support easy integration with LLM-based agents, all challenges are provided in the standardized NYU CTF Bench format and are fully compatible with the nyuctf PyPI package for loading challenges into autonomous agent frameworks.

While the true flags are included in accompanying metadata .json files, agents must independently solve each challenge and verify that the extracted flag matches the ground truth—no hardcoded answers allowed. A baseline agent system is provided in this repository, allowing competitors to build upon it with their own enhancements.
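For illustration only, here is a minimal sketch of how an agent harness might load a challenge's metadata and verify an extracted flag against the ground truth. The directory layout and field names (a challenge.json file with a "flag" field, and the example paths) are assumptions based on the format described above, not a specification of the nyuctf package API; adapt them to the actual benchmark layout.

```python
import json
from pathlib import Path

def load_challenge(challenge_dir: str) -> dict:
    """Load the metadata JSON shipped with a challenge.

    Assumes a file named challenge.json with fields such as "name",
    "category", and "flag" -- adjust to the actual NYU CTF Bench layout.
    """
    meta_path = Path(challenge_dir) / "challenge.json"
    with open(meta_path) as f:
        return json.load(f)

def verify_flag(extracted: str, metadata: dict) -> bool:
    """Check the agent's extracted flag against the ground truth.

    The ground-truth flag is used only for verification after the agent
    has solved the challenge on its own -- never as input to the agent.
    """
    return extracted.strip() == metadata["flag"].strip()

if __name__ == "__main__":
    meta = load_challenge("challenges/crypto/example_challenge")  # hypothetical path
    candidate = "flag{recovered_by_the_agent}"                    # produced by the agent
    print("solved" if verify_flag(candidate, meta) else "not solved")
```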

timeline

This competition is open to the public and will run until all 50 NYU CTF Lite challenges are solved.

No registration is required. The first submission containing valid and verifiable team information, including team members' names and a contact email, registers the team for the competition.

  • Start Date: July 01, 2025*

  • Monthly Leaderboard Update: 30th of every month*

  • Finalists Cut-off: October 08, 2025*

  • Finalists Announcement: October 15, 2025*

* Eastern Standard Time

Rules
  • Team Participation: Teams of up to 3 people are allowed. Individual participation is also possible, but teamwork is highly recommended.

  • Agentic Framework: Participants are encouraged to analyze the general patterns of the challenges and optimize their agentic systems for CTF automation; however, the final solutions must be generated entirely by an autonomous, LLM-powered agent, with no human in the loop during execution. Any technique applicable to building effective agentic AI systems may be used, including but not limited to prompt engineering, multi-agent collaboration, tool-augmented reasoning, and retrieval-augmented generation (RAG). These techniques may be applied broadly or tailored to specific challenge categories, but must remain generalizable: challenge-specific hints and hardcoded solutions are strictly prohibited. Prompts must not include direct solutions written by human players from any source; any solution that violates this rule will not be counted as solved. Participants must supply their own API tokens or model deployments for use within their autonomous frameworks. Any agentic framework may be used, whether an enhanced open-source framework or a system built from scratch (a minimal loop of this kind is sketched after this list). Frameworks must support full automation and may integrate real-time or pre-installed cybersecurity tools such as apk2jar, apktool, Ghidra, Hopper, Burp Suite, and Wireshark. Model selection, tool configuration, and system design are entirely open-ended and left to the discretion of the participants.

  • Model Requirements: Participants are free to use any language model architecture for their agentic systems, including models accessed via API service providers (e.g., OpenAI, Anthropic), self-hosted open-source models (e.g., LLaMA, Qwen), or custom fine-tuned variants. There are no restrictions on model size, origin, or hosting setup. However, all models must be free of contamination, meaning they must not have been trained on or contain leaked solutions or flags from the competition dataset. Any evidence of flag leakage or training contamination will result in disqualification.

  • Evaluation: Scoring is based entirely on the number of challenges successfully solved by the autonomous agent. Each correctly solved challenge contributes to the team’s final score, with no partial credit. The accuracy of the extracted flag, verified against the ground truth, is the sole criterion for success.

  • Submissions: For each solved CTF challenge, participants must submit the full trajectory generated by their autonomous agent, including the agent’s thoughts, actions, observations, and the final flag, in a machine-loadable format such as JSON or a structured log (an illustrative format is sketched after this list). The extracted flag must exactly match the ground-truth flag provided in the metadata. Manual editing or tampering with agent outputs is strictly prohibited and will result in disqualification. In addition, participants must provide a well-documented Git repository containing the complete codebase of their agentic framework. Open-sourcing is encouraged, but a private repository shared with the organizers is also acceptable. The repository should include all dependencies, configurations, and tools used, along with detailed technical documentation of the approach, such as prompting techniques, model usage, agent architecture, tool integration, and any other implementation details. If a custom or fine-tuned model is used, training code and model weights should also be provided for validation. The submission portal will open on July 1.

  • API Keys and Data: All competitors may request API keys for OpenAI, Anthropic, and Gemini models from the organizers, with an initial combined budget of up to $100 in credits per month for the duration of the competition. This budget may be extended as needed. Requesting API keys automatically registers participants for the competition. All competitors are required to open-source the code and data used in their submissions. Details about API keys will be announced soon.
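As a rough illustration of the fully autonomous, tool-augmented loop described in the Agentic Framework rule, the sketch below lets an LLM propose shell commands, executes them, and feeds the output back until the model reports a flag. The call_llm helper, the prompt wording, and the FLAG convention are placeholders of our own (assumptions), not part of any provided baseline; substitute your own model client, sandboxing, and budget limits.

```python
import re
import subprocess

def call_llm(messages: list[dict]) -> str:
    """Placeholder for your model client (an API provider or a locally
    hosted model). Returns the assistant's next message as plain text."""
    raise NotImplementedError("plug in your own API or local model here")

def run_tool(command: str, timeout: int = 60) -> str:
    """Execute a shell command proposed by the agent and capture its output."""
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=timeout)
    return (result.stdout + result.stderr)[-4000:]  # keep the tail of long output

def solve(challenge_description: str, max_steps: int = 30) -> str | None:
    """Drive the agent until it reports a flag or the step budget runs out."""
    messages = [
        {"role": "system", "content": (
            "You are an autonomous CTF agent. Reply with a single shell "
            "command to run next, or 'FLAG: <flag>' once you have the flag.")},
        {"role": "user", "content": challenge_description},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        match = re.search(r"FLAG:\s*(\S+)", reply)
        if match:                       # the agent believes it has the flag
            return match.group(1)
        observation = run_tool(reply)   # treat the reply as a tool action
        messages.append({"role": "user", "content": observation})
    return None
```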
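Likewise, a sketch of the kind of machine-loadable trajectory the Submissions rule asks for. The field names here are illustrative assumptions, not a required schema; the requirement is only that thoughts, actions, observations, and the final flag are all captured and loadable.

```python
import json

# One entry per agent step; append records as the loop above runs.
trajectory = {
    "challenge": "example_challenge",   # hypothetical challenge name
    "model": "your-model-identifier",
    "steps": [
        {
            "thought": "The binary is not stripped; enumerate its functions first.",
            "action": "nm ./chal | head",
            "observation": "...tool output captured verbatim...",
        },
        # ...one entry per step...
    ],
    "final_flag": "flag{recovered_by_the_agent}",
}

with open("trajectory_example_challenge.json", "w") as f:
    json.dump(trajectory, f, indent=2)
```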

Judging Criteria

There are 100 points in total; the final score is the weighted sum of the judging criteria below.

  • Challenges Solved (50%): The number of CTF challenges solved by the team’s autonomous agent, weighted by the score of each puzzle.

  • Creativity (30%): The methods used for finding the vulnerabilities and solving the challenges. Adding innovative features to the framework and trying unique approaches are both evaluated. Be sure to include a summary of how each puzzle was solved by the LLM. Using your own agent instead of the agent provided in the competition earns a bonus under this criterion.

  • Presentation Quality (20%: 10% for write-ups, 10% for the final presentation): The quality of the write-ups and the final presentation, which should reflect the approach actually taken by the LLM-powered agent you used. The presentation can be a recorded video or a live demonstration, and contestants should prepare slides presenting their findings and design decisions; the slides serve as the reference for grading.

  • Penalty Items (deduction of 10% of the challenge score per rule violation): The final solution must be produced by the autonomous framework, even if participants work out the correct solution themselves; a penalty applies whenever the submitted solution does not come from the generative AI. No points are awarded for a challenge if online write-ups or solution source code for that challenge were used to prompt or train the agent.

Awards 

Finalist teams from the USA/Canada will be invited to join the CSAW CTF Finals in New York in November. Details will be announced soon.

