On-Premises AI for Sovereign Network Analysis

An AI assistant for network intelligence that runs entirely inside the customer environment. No topology data, configurations, or operational context leaves the site. The assistant works with local models grounded in a live digital twin, log data, and other connected systems, retrieving only the data needed for each task.

Why on-premises matters for network AI

Network topologies, running configurations, host inventories, naming conventions, and change history all carry security relevance. In critical or regulated environments, sending this data to external AI services is often not an option. A practical AI assistant therefore needs to run locally with targeted tool access rather than blanket data upload to the cloud.

  • Sovereignty first: all inference stays on site, no cloud dependency required.
  • Grounding over guessing: answers are built from the digital twin, log data, and structured tooling, not generic training data.
  • Open and extensible: new sources such as logs, monitoring, or ticketing can be added without replacing the core stack.

Conference presentation

BSI IT-Sicherheitskongress · April 15–16, 2025

Architecture and design rationale presented as a conference paper and talk at the German Federal Office for Information Security (BSI).

Speaker: Dr. Tim Senn · Conference program · Paper download coming soon

Architecture overview

Four layers make up the stack. The assistant orchestrates tool calls through MCP, keeping the local model focused on the data that matters for each question, and it builds on tested open-source LLMs.

Narrowin Assistant

User-facing interface and orchestration layer: chat, web client, API access, AI agent, context manager.

  • Chat Interface: natural language queries
  • Web Client: browser-based UI
  • API Access: integration
  • AI Agent: query orchestration
  • Context Manager: state handling
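
To make the orchestration concrete, here is a minimal sketch of such an agent loop: the local model proposes tool calls, the orchestrator executes them against the MCP layer, and the results are fed back until the model can answer. The tool schema, the run_mcp_tool dispatcher, and the model tag are illustrative assumptions, not the production implementation.

```python
import json
import ollama


def run_mcp_tool(name: str, args: dict) -> str:
    """Dispatch a tool call to the MCP layer (hypothetical stub)."""
    # The real dispatcher would forward to e.g. the Network Explorer MCP server.
    return json.dumps({"tool": name, "status": "ok"})


def answer(question: str, model: str = "qwen3") -> str:
    """Let the local model call MCP tools until it can answer."""
    # One illustrative tool schema; the real stack advertises the full tool set.
    tools = [{
        "type": "function",
        "function": {
            "name": "detect_changes",
            "description": "Detect network changes in a recent time window",
            "parameters": {
                "type": "object",
                "properties": {"hours": {"type": "integer"}},
                "required": ["hours"],
            },
        },
    }]
    messages = [{"role": "user", "content": question}]
    while True:
        resp = ollama.chat(model=model, messages=messages, tools=tools)
        if not resp.message.tool_calls:
            return resp.message.content  # final, tool-grounded answer
        messages.append(resp.message)
        for call in resp.message.tool_calls:
            messages.append({
                "role": "tool",
                "content": run_mcp_tool(call.function.name,
                                        dict(call.function.arguments)),
            })
```
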
On-Premises LLM Handler

Inference engine and local model routing: Ollama, a load balancer, and task-based selection across Qwen 3, GLM-4, and custom fine-tuned models.

  • Ollama: local inference
  • Load Balancer: request distribution
  • Model Router: task-based selection
  • Qwen 3: primary reasoning
  • GLM-4: multi-task
  • Custom: fine-tuned models
Dual-Node Cluster · On-Premises Deployment
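
As a rough sketch of the routing idea, the snippet below distributes requests round-robin across two Ollama nodes and picks a model per task class. The host names, model tags, and routing table are illustrative assumptions, not the production configuration.

```python
import itertools
import ollama

# Round-robin load balancing across the two on-prem inference nodes
# (host names are placeholders).
NODES = itertools.cycle([
    ollama.Client(host="http://ai-node-1:11434"),
    ollama.Client(host="http://ai-node-2:11434"),
])

# Task-based model selection (illustrative routing table).
MODEL_FOR_TASK = {
    "reasoning": "qwen3",   # primary reasoning
    "multi_task": "glm4",   # broad multi-task work
}


def chat(task: str, prompt: str) -> str:
    """Route a request to the next node and the model fit for the task."""
    client = next(NODES)
    model = MODEL_FOR_TASK.get(task, "qwen3")
    resp = client.chat(model=model,
                       messages=[{"role": "user", "content": prompt}])
    return resp.message.content
```
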
Model Context Protocol (MCP)

Tool layer connecting models to data sources: 14 active MCP tools across network and log analytics, plus planned extensions for monitoring and automation.

Network Explorer MCP · Active

  • get_devices: device queries
  • analyze_network: topology & STP analysis
  • extract_config: config sections
  • detect_changes: change detection
  • assess_network: security & reliability report
  • get_network_stats: statistics overview

Log Analytics MCP · Active

  • query: log queries
  • hits: log volume over time
  • stats_query: aggregated statistics
  • streams: active log streams
  • facets: field value distribution
  • field_names: discover log fields

Additional MCPs · Flexible Extension

  • Monitoring MCP: metrics / alerting
  • Ansible MCP: config automation
  • Ticketing MCP: incident management
  • Docs MCP: knowledge base
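
For illustration, a tool like get_devices could be exposed over MCP roughly as follows, using the FastMCP class from the official Python SDK. The device data returned here is a hypothetical stub, not the real digital-twin query.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("network-explorer")


@mcp.tool()
def get_devices(site: str = "all") -> list[dict]:
    """Return devices known to the digital twin for a site."""
    # Stub: the real tool queries live topology data from the twin.
    return [{"hostname": "WSH-01", "platform": "switch", "site": site}]


if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```
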
Data Sources

Live digital twin data, log analytics engine, custom resources, and other integrations (monitoring, automation, ITSM).

Network Explorer Digital Twin

  • Topology: devices & neighbors
  • Configs: running configurations
  • Snapshots: historical states
  • Hosts: end devices & MACs
  • VLANs: segmentation data
  • Routing: routes & protocols

Custom Resources

  • Design Handbook: network standards
  • Templates: device configs
  • Naming Convention: hostnames & VLANs
  • IP Address Plan: subnet allocation

Log Analytics Engine

  • Syslog: UDP/TCP receiver
  • Log Storage: indexed log data
  • Query Engine: full-text search
  • Streams: per-device log streams
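
To make the ingest side concrete, here is a minimal sketch of a syslog UDP receiver that files each message into a per-device stream. The priority parsing and the index() hand-off are simplified stand-ins for the real engine.

```python
import re
import socket

PRI_RE = re.compile(r"^<(\d{1,3})>")  # RFC 3164 priority field


def index(stream: str, severity, raw: str) -> None:
    """Hand off to the indexed log store (hypothetical stub)."""
    ...


def serve(host: str = "0.0.0.0", port: int = 5514) -> None:
    # Note: the standard syslog port 514 requires elevated privileges.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, (src_ip, _src_port) = sock.recvfrom(8192)
        msg = data.decode("utf-8", errors="replace")
        m = PRI_RE.match(msg)
        severity = int(m.group(1)) % 8 if m else None
        # Per-device streams: key messages by source address.
        index(stream=src_ip, severity=severity, raw=msg)
```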

Other Integrations

  • Monitoring System: metrics & alerts
  • SIEM: security events
  • IDS / IPS: intrusion detection
  • Ansible: automation
  • ServiceNow: ITSM

What you can ask

The assistant handles project-grade analysis and day-to-day troubleshooting through natural language queries against live network and log data.

Security & Compliance

> "Do an overall security assessment"
> "Review the segmentation of my network"
> "Check port-security config across all switches"

Topology & Config

> "What devices are on the same VLAN as 10.1.5.22?"
> "What changed in the last 24 hours?"
> "Compare configs with last month's snapshot"

Log Analytics

> "Show me error logs from the last hour"
> "Which devices are logging the most errors?"
> "Correlate OSPF neighbor flaps with config changes"

Example exchange in the Narrowin Assistant:

> "Show me STP topology changes in the logs"

Tool calls: query · detect_changes

3 STP topology changes detected in the last 24 hours across 2 devices. WSH-01 saw 2 root bridge elections on VLAN 10 and VLAN 20, correlated with a port flap on ether5. WSJ-03 had 1 topology change on VLAN 30.

→ View affected devices in Explorer

The system in practice

Natural-language assistant interface

Network and SOC teams interact with the assistant through a chat-based interface. It handles security assessments, configuration checks, and topology questions.

Security assessment interaction in the assistant.

Network and security troubleshooting

Beyond project assessments, the assistant also supports operational workflows, providing context for network troubleshooting and root cause analysis as well as security incident handling and forensics.

Troubleshooting and incident response in the assistant.

On-premises deployment hardware

The inference stack runs on compact local hardware; no cloud is required. A dual-node cluster handles load balancing and model routing.

Dual-node on-premises deployment hardware.

Connect to your stack

The assistant is grounded in real network data, provided for example by our Network Explorer or by any other tool in your stack, so CMDB records can be combined with logs and topology data.

Network Explorer as one of many possible data sources.

Key capabilities

  • AI that knows your network: the assistant is not generic chat but is grounded in a live digital twin with real topology, configuration, and host data.
  • Two practical modes: project-grade security assessments and day-to-day operational troubleshooting.
  • Extensible through MCP: logs, monitoring, automation, and ticketing systems can be integrated without replacing the core stack.
  • No cloud dependency: all inference and orchestration runs inside the customer environment.

Interested in a pilot or architecture walkthrough?

We can show how this architecture works with real network data, demo the MCP tooling, and discuss deployment options for your environment.

Get in touch

Frequently asked questions about on-premises AI architecture

Which local models work well for this use case?

We have tested numerous models and variants and can recommend Qwen 3 and GLM-4 for production use. Both excel at structured outputs and reliable tool-calling, which is critical for interacting with the network digital twin. NVIDIA Nemotron is also showing promising results.

What hardware does the on-premises stack run on?

The AI stack runs on compact, high-performance mini PCs with integrated AI accelerators and a dedicated GPU – energy-efficient devices that can be deployed alongside existing infrastructure. Apple Mac Studio with M-series chips is also an interesting option thanks to its high unified memory capacity.

How is sensitive network data protected?

No network data leaves the site. The LLM operates on a live digital twin (topology, configurations, logs) via MCP. There is no training on customer data and no API calls to external services, which eliminates the risk of unintentional disclosure to third parties.

What are the advantages of on-premises AI deployment?

Data sovereignty and privacy: All data remains entirely within the organization's control, mitigating risks associated with third-party breaches and data leakage. Compliance with regulations like GDPR and NIS2 is easier to achieve.

Infrastructure control: Organizations manage the entire stack – hardware (GPUs), networking, and software updates – themselves.

Air-gapped capability: For extreme security requirements, models can operate in fully air-gapped environments – completely disconnected from the internet – ensuring zero external data transfer.

Protection of intellectual property: Proprietary models and retrieval-augmented generation (RAG) data remain in-house and are protected from unauthorized access.

Which data sources can be integrated?

Currently, the system works with configuration and operational data from network devices, log data, IPAM data, and other sources. The MCP-based architecture makes it straightforward to integrate additional systems that provide an API, for example monitoring platforms, ticketing systems, or automation solutions.
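
As an example of how such an integration can look, the sketch below wraps a hypothetical monitoring REST endpoint as an MCP tool; the URL, parameters, and response shape are assumptions.

```python
import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("monitoring")


@mcp.tool()
def get_alerts(severity: str = "critical") -> list[dict]:
    """Fetch open alerts from the monitoring platform (placeholder API)."""
    resp = requests.get(
        "https://monitoring.internal/api/v1/alerts",  # hypothetical endpoint
        params={"severity": severity, "state": "open"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    mcp.run()
```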

How does the assistant avoid hallucinated answers?

The LLM is grounded in real device data via MCP tools: it queries configurations, topology, and logs rather than generating answers from training data. Responses include traceable evidence with concrete device names, ports, and timestamps.

Can the system run without a GPU?

Yes, CPU-only inference is possible with quantized models (GGUF/Q4), but response times increase significantly. For production use, a GPU or a system with an integrated AI accelerator is recommended.
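
For reference, a CPU-only setup can look roughly like this with llama-cpp-python and a Q4-quantized GGUF file; the model path and parameter values are placeholders.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/qwen3-8b-q4_k_m.gguf",  # placeholder file name
    n_ctx=8192,      # context window
    n_threads=8,     # match the number of physical CPU cores
    n_gpu_layers=0,  # force CPU-only execution
)

out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Summarize the last hour of error logs."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```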