Milestone Systems has released a Vision Language Model (VLM) specialising in traffic understanding, powered by NVIDIA Cosmos Reason.
The VLM is said to power two new products: a Video Summarization tool for XProtect Video Management Software and a VLM as a Service for third-party integrations.
Video Summarization for XProtect reportedly allows users to search summaries from visual data and automate reporting.
According to Milestone Systems, today’s video systems capture vast amounts of data, yet reviewing footage remains time-consuming and largely manual.
With Milestone Systems’ new Video Summarization tool, users and operators can now reportedly rely on a specialised product that automates operator workflows, saves time and significantly reduces false alarm fatigue.
Early reports are said to show that video summarisation could reduce operator false alarm fatigue by up to 30%.
The company says the Video Summarization tool analyses camera footage and describes what’s happening.
Users are said to be able to send a snippet of video along with a prompt describing their request, and the model will generate a text summary in seconds.
With Milestone’s Hafnia VLM as a Service (VLMaaS), developers, integrators and partners are said to get API access to production-ready video intelligence built on NVIDIA’s latest technology and fine-tuned on responsibly sourced data.
The VLMaaS reportedly helps developers create AI-powered solutions quickly without needing to set up, fine-tune or manage their own AI systems. It is said to enhance existing solutions with generative AI, regardless of the level of analytics currently in place.
Milestone Systems has said that this makes it fast and simple to add advanced video intelligence features to applications, whether for testing a minimum viable product (MVP) or scaling a platform.
With VLMaaS, the development of AI and analytics can reportedly be accelerated significantly, requiring up to 70 times less effort than fine-tuning a VLM to do the same work.
Andrew Burnett, Acting Chief Technology Officer, Milestone Systems, said: “With the Vision Language Model as a Service and Video Summarization for XProtect, we’re tackling some of the most challenging bottlenecks: video overload and time-consuming manual work.
“Operators get immediate insight directly within XProtect; builders get API-first access to production-ready intelligence without bespoke training or heavy infrastructure.
“Because this model is specialised for real-world traffic video and fine-tuned on responsibly sourced data, customers can trust the results, deploy with confidence and enhance all existing solutions in place.
“It’s the fastest, most advanced and impactful path to turning video into actionable outcomes.”