An AI assistant for network intelligence that runs entirely inside the customer environment. No topology data, configurations, or operational context leaves the site. The assistant works with local models grounded in a live digital twin, log data, and other systems, retrieving the data needed for each task.
Network topologies, running configurations, host inventories, naming conventions, and change history all carry security relevance. In critical or regulated environments, sending this data to external AI services is often not an option. A practical AI assistant therefore needs to run locally with targeted tool access, not blanket data upload to a cloud.
The architecture and design rationale were presented as a conference paper and talk at the German Federal Office for Information Security (BSI).
The stack consists of four layers. The assistant orchestrates tool calls through MCP (Model Context Protocol), keeping the local model focused on the data that matters for each question, and runs on tested open-source models.
User-facing interface and orchestration layer: chat, web client, API access, AI agent, context manager.
Inference engine and local model routing: Ollama, load balancer, Qwen 3, GLM-4.
Tool layer connecting models to data sources: 14 active MCP tools across network and log analytics, plus planned extensions for monitoring and automation (a minimal tool sketch follows this list).
Data layer: live digital twin data, log analytics engine, custom resources, and other integrations (monitoring, automation, ITSM).
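To make the tool layer concrete, here is a minimal sketch of an MCP server exposing a single digital-twin tool. It assumes the official MCP Python SDK (the mcp package); the tool name and the in-memory twin stand-in are illustrative, not the product's actual interface.

```python
"""Minimal sketch of one tool in the MCP tool layer.

Assumes the official MCP Python SDK (pip install mcp). The in-memory
TWIN dict is a hypothetical stand-in for the live digital twin.
"""
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("network-twin")

# Stand-in for the live digital twin; in production this would query
# the twin's local API instead of a static dict.
TWIN = {
    "core-sw-01": {"config": "hostname core-sw-01\ninterface Gi0/1\n shutdown"},
}

@mcp.tool()
def get_device_config(hostname: str) -> str:
    """Return the running configuration of a device from the digital twin."""
    device = TWIN.get(hostname)
    if device is None:
        return f"Unknown device: {hostname}"
    return device["config"]

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio to the orchestration layer
```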
The assistant handles project-grade analysis and day-to-day troubleshooting through natural language queries against live network and log data.
Network and SOC teams interact with the assistant through a chat-based interface. It handles security assessments, configuration checks, and topology questions.
Beyond project assessments, the assistant also supports operational workflows, providing context for network troubleshooting and root cause analysis as well as security incident handling and forensics.
We can show how this architecture works with real network data, demo the MCP tooling, and discuss deployment options for your environment.
We have tested numerous models and variants and can recommend Qwen 3 and GLM-4 for production use. Both excel at structured output and reliable tool calling, which is critical for interacting with the network digital twin. NVIDIA Nemotron is also showing promising results.
The AI stack runs on compact, high-performance mini PCs with integrated AI accelerators and a dedicated GPU – energy-efficient devices that can be deployed alongside existing infrastructure. Apple Mac Studio with M-series chips is also an interesting option thanks to its high unified memory capacity.
No network data leaves the site. The LLM operates on a live digital twin (topology, configurations, logs) via MCP. There is no training on customer data and no calls to external services; sensitive information never reaches an external API, eliminating the risk of unintentional disclosure.
Data sovereignty and privacy: All data remains entirely within the organization's control, mitigating risks associated with third-party breaches and data leakage. Compliance with regulations like GDPR and NIS2 is easier to achieve.
Infrastructure control: Organizations manage the entire stack – hardware (GPUs), networking, and software updates – themselves.
Air-gapped capability: For extreme security requirements, models can operate in fully air-gapped environments – completely disconnected from the internet – ensuring zero external data transfer.
Protection of intellectual property: Proprietary models and RAG (Retrieval-Augmented Generation) data remain in-house and are protected from unauthorized access.
Currently, the system works with configuration and operational data from network devices, log data, IPAM data, and other sources. The MCP-based architecture makes it straightforward to integrate additional systems that provide an API, for example monitoring platforms, ticketing systems, or automation solutions.
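As an illustration of how such an integration might look, the sketch below wraps a hypothetical monitoring platform's REST endpoint as one more MCP tool. The base URL, endpoint, and response shape are assumptions; any system with an HTTP API could be wrapped the same way.

```python
"""Sketch of integrating an additional system as an MCP tool.

The monitoring URL, endpoint, and JSON shape below are hypothetical;
they stand in for any monitoring, ticketing, or automation API.
"""
import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("monitoring")

MONITORING_API = "http://monitoring.internal:8080/api/v1"  # assumed base URL

@mcp.tool()
def get_active_alerts(severity: str = "critical") -> list:
    """List active alerts from the monitoring platform, filtered by severity."""
    resp = requests.get(
        f"{MONITORING_API}/alerts",
        params={"status": "active", "severity": severity},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["alerts"]  # assumed response shape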
The LLM is grounded in real device data via MCP tools: it queries configurations, topology, and logs rather than generating answers from training data. Responses include traceable evidence with concrete device names, ports, and timestamps.
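A minimal sketch of that grounding loop, assuming the ollama Python client and a hypothetical interface-status tool standing in for the MCP tool layer: the model requests a tool call, the result is fed back, and the final answer can cite the returned device, port, and timestamp.

```python
"""Sketch of the grounding loop, assuming the ollama Python client.

get_interface_status is a hypothetical stand-in for an MCP tool; its
hard-coded result illustrates the evidence the model can cite.
"""
import ollama

def get_interface_status(hostname: str, interface: str) -> str:
    """Hypothetical twin lookup standing in for an MCP tool."""
    return f"{hostname} {interface}: down since 2025-01-14T09:32:11Z (err-disabled)"

messages = [{"role": "user", "content": "Why is Gi0/1 on core-sw-01 down?"}]
resp = ollama.chat(model="qwen3", messages=messages, tools=[get_interface_status])

# Execute the tool calls the model requested and feed the results back,
# so the final answer is grounded in live data, not training data.
messages.append(resp.message)
for call in resp.message.tool_calls or []:
    output = get_interface_status(**call.function.arguments)
    messages.append({"role": "tool", "content": output, "name": call.function.name})

final = ollama.chat(model="qwen3", messages=messages)
print(final.message.content)  # cites the device, port, and timestamp above
```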
Yes, CPU-only inference is possible with quantized models (GGUF/Q4), but response times increase significantly. For production use, a GPU or a system with an integrated AI accelerator is recommended.
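For illustration, a minimal CPU-only call via the ollama Python client; the model tag is an assumption (Ollama's default tags are typically ~4-bit quantized GGUF builds), and response times will vary widely with hardware.

```python
"""Sketch of CPU-only inference with a quantized model via Ollama.

The model tag is an assumption; default Ollama tags are usually
Q4-quantized GGUF builds, which is what makes CPU-only use feasible.
"""
import time
import ollama

start = time.time()
resp = ollama.chat(
    model="qwen3:8b",  # assumed tag for a ~4-bit quantized build
    messages=[{"role": "user", "content": "Summarize: Gi0/1 on core-sw-01 is err-disabled."}],
)
print(resp.message.content)
print(f"elapsed: {time.time() - start:.1f}s")  # expect much longer on CPU than GPU
```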