17th Annual



CSAW’25
LLM CTF Attack Competition
US-Canada, MENA, India

It’s time to think a little differently about the capabilities of generative AI.
With the rising popularity of large language models (LLMs), the capabilities of new models include identifying software vulnerabilities and generating code to exploit them. Capture the Flag (CTF) events are cybersecurity competitions where players solve challenges to identify vulnerabilities and reveal 'flags' to score points.
Your job in this competition is to use generative autonomous AI to solve CTF challenges. An autonomous framework will follow your prompts and, powered by the LLM, autonomously perform steps to Capture the Flag (i.e. no human interaction!). For this competition, you can either bring your own autonomous framework (a.k.a. agent) to the table, or make feature enhancements to a provided baseline agent. We will offer one baseline agent and provide technical support.
Large language models such as ChatGPT and Claude, as well as open-source models, will help your agent navigate these challenges. The LLM CTF Attack Competition challenges will be drawn from previous CTF competitions and will include common categories (pwn, web, rev, forensics, misc.).
A successful submission will include:
- All the prompts and responses from the language model – this is typically provided in an agent’s transcripts/trajectories/logs output or your conversation history.
- A brief write-up detailing your strategies; any format is accepted as long as your idea is clearly conveyed. This will contribute to your Presentation Quality points.
Timeline
US-CANADA
Registration Deadline
TBD (1 week before the competition start)
Competition Start
TBD (14 days before the final presentation)
Competition End
TBD (start time of presentation)
Final Presentations & Award Ceremony
TBD (same day that CSAW 2025 ends; presentation time TBD)
MENA
Registration Deadline
TBD (international teams requiring a UAE visa: 3 weeks before the final presentation)
TBD (UAE teams and international teams not requiring a UAE visa: 1 week before the competition start)
Competition Start
TBD (14 days before the final presentation)
Competition End
TBD (start time of presentation)
Final Presentations & Award Ceremony
TBD (same day that CSAW 2025 ends; presentation time TBD)
RULES
- Team Participation: Teams of up to 3 people are allowed. Individual participation is also possible, but teamwork is highly recommended.
- Challenges: For ease of agent ingestion, this year’s challenges will also be available in a database format that can be downloaded and loaded into the autonomous baseline framework and custom agents. While the true flags of the challenges are provided in metadata .json files, if you choose to use an autonomous agent, the agent must validate that the flag it obtains matches the actual flag (a minimal validation sketch appears after this list). Points will be assigned based on the trajectory/transcription/conversation submissions matching the correct flag.
- Solution: Participants can analyze the challenges on their own, but the final solution must be generated by an LLM system. Participants may use any LLM or AI system as long as no flags or manual write-ups/solutions were involved. We provide two possible approaches that were commonly used in the previous year's LLM CTF Attack Competition; competitors are free to choose their approach per challenge.
- Human-In-The-Loop (HITL): Participants are allowed to interact with large language models (LLMs) by providing hints or refining inputs through prompt engineering techniques. Competitors can actively guide AI systems (such as ChatGPT, Claude, and Gemini, via their web interfaces or locally deployed backends) to solve challenges.
- Autonomous Framework: Participants are allowed to use any existing autonomous framework, such as crewAI (https://github.com/crewAIInc/crewAI), to solve challenges automatically, or to build their own autonomous framework. Participants should bring their own API tokens or model deployment to use within the autonomous framework (a bare-bones loop sketch appears after this list). Challenges solved by this approach will receive a bonus in Creativity points.
- Competitors can also bring their own approaches, such as a hybrid of an agentic framework with Human-In-The-Loop, in addition to the two suggested above, as long as the approach does not involve a manual solution. Good approaches will also receive a bonus in Creativity points.
- Report and Submissions: For each solved CTF challenge, participants must generate a solution report that includes:
- The flag obtained by the agent, matching the provided correct flag
- A human-readable record of the agent’s solution process, such as a trajectory, log file, or conversation history.
- If your solution uses Human-In-The-Loop or your own approach, you must submit the conversation history with all original conversation contents, including proof of the captured flag, and briefly describe the thought process and strategy used to solve the challenge.
- If your solution uses an autonomous framework, you must submit the generated transcript or trajectory as-is, including agent thoughts, actions, and observations, without any manual tampering.
- All code and data used during the competition should be uploaded to a git repository as part of the final submission. If competitors are using their own customized models, the model weights and the loading code should also be shared publicly or privately with the organizers.
- Teams should also prepare a final presentation report detailing the approach used, such as their prompting techniques, model usage, and tool usage. Teams may include in their report how specific challenges affected their agent engineering efforts.
- Generative AI Requirements:
- Human-In-The-Loop (HITL) and customized approaches: Participants can use any generative AI systems available to them (using multiple AI systems is even encouraged) as long as they provide the names of the AI systems they used in the report; all solution steps must come from the AI system.
- Autonomous Framework: Participants can use any available autonomous framework or create their own agents from scratch. Teams using their originally developed agents will receive a bonus score, as outlined in the Scoring Rubrics section. These frameworks allow the installation of cyber tools in real time or can be customized with pre-installed cyber tools such as apk2jar, apktool, Ghidra, Hopper, Burp Suite, and Wireshark. The autonomous framework must be fully automated, with no human-in-the-loop interaction.
- For all approaches, the prompts used to guide the AI may be crafted by humans, but they must not contain direct solutions to any of the challenges. Failure to comply with this rule will result in the submission being considered invalid, and a subjective penalty will be applied. All other choices, such as model selection, tool usage, and framework workflow, are open to the contestants.
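The exact challenge database layout and baseline-agent interfaces are not specified here, so the following is only a minimal sketch of the flag-validation step described under Challenges. It assumes each challenge directory contains a metadata .json file with a "flag" field; the field name, file name, and path layout are assumptions to adapt to the actual release.

    # Minimal flag-validation sketch (assumed layout: <challenge_dir>/metadata.json
    # with a "flag" field; adapt names to the actual challenge database).
    import json
    from pathlib import Path

    def load_true_flag(challenge_dir: str) -> str:
        """Read the ground-truth flag from the challenge's metadata file."""
        metadata = json.loads((Path(challenge_dir) / "metadata.json").read_text())
        return metadata["flag"]

    def validate_flag(challenge_dir: str, candidate: str) -> bool:
        """Return True if the agent's candidate flag matches the true flag."""
        return candidate.strip() == load_true_flag(challenge_dir).strip()

    # Hypothetical usage: only record the trajectory when the agent's flag checks out.
    # if validate_flag("challenges/pwn_example", agent_flag):
    #     save_trajectory(...)  # save_trajectory is a placeholder, not a provided API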
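Similarly, for teams building their own autonomous framework, the sketch below shows one bare-bones "command / observation" loop with no human in the loop, assuming an OpenAI-compatible client; the model name, flag pattern, prompts, and step limit are placeholders, not part of the competition infrastructure.

    # Bare-bones autonomous loop (sketch): the LLM proposes one shell command per
    # turn, the harness executes it, and the output is fed back as the observation.
    # Model name and flag regex are assumptions.
    import re
    import subprocess
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment
    FLAG_RE = re.compile(r"csawctf\{[^}]+\}")  # assumed flag format

    def run_agent(task_description: str, max_steps: int = 15) -> str | None:
        messages = [
            {"role": "system", "content": "You are a CTF agent. Reply with exactly "
                                          "one shell command per turn, nothing else."},
            {"role": "user", "content": task_description},
        ]
        for _ in range(max_steps):
            reply = client.chat.completions.create(model="gpt-4o", messages=messages)
            command = reply.choices[0].message.content.strip()
            try:
                result = subprocess.run(command, shell=True, capture_output=True,
                                        text=True, timeout=60)
                observation = (result.stdout + result.stderr)[-4000:]
            except subprocess.TimeoutExpired:
                observation = "command timed out"
            match = FLAG_RE.search(observation)
            if match:
                return match.group(0)  # candidate flag; still validate against metadata
            messages.append({"role": "assistant", "content": command})
            messages.append({"role": "user", "content": f"Output:\n{observation}"})
        return None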
Judging
100 points in total; the final grade is the weighted sum of all the judging criteria.
- Challenge Solved (30%): The number of CTF challenges solved by the participants, based on the score of each puzzle. The final score is the percentage of challenges solved in the provided challenge database multiplied by 30% (a worked example follows this list).
- Creativity (50%): The methods used for finding the vulnerabilities and solving the challenges. Key creativity criteria are shown below:
- Prompting technique: How innovative the prompts are in solving challenges
- Solution workflow: How innovative the workflows are in solving the challenge, including tool usage, AI system usage, etc.
- Autonomous Framework and customized approaches: Any attempt to use an autonomous framework or a customized approach earns an extra bonus of up to 25%, as long as the detailed approach and all related trajectories are provided, even if these approaches did not solve any challenge. The size of the bonus depends on the approach used: building your own agent, adding innovative features to an existing framework, and trying unique approaches are all vectors for evaluation.
- Ultimately, be sure to include a summary of how each puzzle was solved by the LLM.
- Presentation Quality (20%): The quality of the final presentation. It should reflect the same approach that the generative large language model you used actually produced. The presentation can be a recorded video or a live demonstration, and contestants should use slides to present their findings and thoughts; the slides serve as the reference for grading.
- Penalty items (subjective deduction for each rule violation): The final solution must come from the generative AI or automation framework via prompt engineering techniques; a penalty will be applied if it does not, even if participants find the correct solution independently. No points will be awarded for a challenge when participants use online write-ups or solution source code to form or train the agent.
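As a rough illustration of the rubric (all numbers below are hypothetical, not from the organizers): a team that solves 12 of 20 challenges in the database earns 12/20 × 30 = 18 of the 30 Challenge Solved points; if the judges then award 40 of the 50 Creativity points and 15 of the 20 Presentation Quality points, the weighted total is 18 + 40 + 15 = 73 out of 100, before any autonomous-framework or customized-approach bonus of up to 25% is added under the Creativity criterion.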
Awards
Finalist teams from the USA/Canada will be invited to join the CSAW CTF Finals in New York in November. Details will be announced soon.