CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729
Today, we're joined by Nidhi Rastogi, assistant professor at Rochester Institute of Technology, to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench, a benchmark for evaluating LLMs on real-world CTI tasks. Nidhi explains the evolution of AI in cybersecurity, from rule-based systems to LLMs that accelerate analysis by providing critical context for threat detection and defense. We dig into the advantages and challenges of using LLMs in CTI, how techniques like Retrieval-Augmented Generation (RAG) are essential for keeping LLMs up-to-date with emerging threats, and how CTIBench measures LLMs’ ability to perform a set of real-world cybersecurity analyst tasks. We unpack the process of building the benchmark, the tasks it covers, and key findings from benchmarking various LLMs. Finally, Nidhi discusses the importance of benchmarks in exposing model limitations and blind spots, the challenges of large-scale benchmarking, and the future directions of her AI4Sec Research Lab, including developing reliable mitigation techniques, monitoring "concept drift" in threat detection models, improving explainability in cybersecurity, and more.
The complete show notes for this episode can be found at https://twimlai.com/go/729.
748 episodes
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
All episodes
CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729 56:18
Generative Benchmarking with Kelly Hong - #728 54:17
Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727 1:34:06
Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726 51:45
Waymo's Foundation Model for Autonomous Driving with Drago Anguelov - #725 1:09:07
Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724 50:32
Scaling Up Test-Time Compute with Latent Reasoning with Jonas Geiping - #723 58:38
Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722 42:11
Inside s1: An o1-Style Reasoning Model That Cost Under $50 to Train with Niklas Muennighoff - #721 49:29
Accelerating AI Training and Inference with AWS Trainium2 with Ron Diamant - #720 1:07:05
π0: A Foundation Model for Robotics with Sergey Levine - #719 52:30
AI Trends 2025: AI Agents and Multi-Agent Systems with Victor Dibia - #718 1:44:59
Speculative Decoding and Efficient LLM Inference with Chris Lott - #717 1:16:30
Ensuring Privacy for Any LLM with Patricia Thaine - #716 51:33
AI Engineering Pitfalls with Chip Huyen - #715 57:37