System Design Deep Dive
This section provides a technical deep dive into the internal workings of CIG, focusing on graph representation, API design, the semantic layer, and the dashboard's live/demo graph surface.
Graph Modeling & Schema
At the heart of CIG is a Neo4j graph database. CIG treats infrastructure as a set of interconnected nodes with scope metadata so the same engine can serve live, demo, managed, and self-hosted modes.
Node Categories
- Identity: User, Group, Role, Policy.
- Compute: EC2Instance, LambdaFunction, ECSCluster, ECSService.
- Network: VPC, Subnet, SecurityGroup, LoadBalancer, InternetGateway.
- Storage: S3Bucket, RDSInstance, DynamoDBTable.
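One way to picture the node categories in application code is a small typed catalog. The sketch below is illustrative only: the labels come from the list above, while the `GraphNode` shape and its property names (`id`, `source`, `workspaceId`) are assumptions about how scope metadata might be attached, not the actual CIG schema.

```typescript
// Node labels grouped by the four categories listed above.
const NODE_CATEGORIES = {
  Identity: ["User", "Group", "Role", "Policy"],
  Compute: ["EC2Instance", "LambdaFunction", "ECSCluster", "ECSService"],
  Network: ["VPC", "Subnet", "SecurityGroup", "LoadBalancer", "InternetGateway"],
  Storage: ["S3Bucket", "RDSInstance", "DynamoDBTable"],
} as const;

type Category = keyof typeof NODE_CATEGORIES;
type NodeLabel = (typeof NODE_CATEGORIES)[Category][number];

// Hypothetical node shape: every property name here is an assumption.
interface GraphNode {
  id: string;
  label: NodeLabel;
  source: "live" | "demo"; // scope metadata: which graph the node belongs to
  workspaceId?: string;    // scope metadata for managed deployments
}

// Return the category a label belongs to, or undefined for unknown labels.
function categoryOf(label: string): Category | undefined {
  const categories = Object.keys(NODE_CATEGORIES) as Category[];
  for (const category of categories) {
    const labels: readonly string[] = NODE_CATEGORIES[category];
    if (labels.indexOf(label) !== -1) return category;
  }
  return undefined;
}

const exampleNode: GraphNode = { id: "bucket-1", label: "S3Bucket", source: "demo" };
```

Deriving `NodeLabel` from the catalog keeps the type and the runtime list in sync, so a new label only has to be added in one place.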
Relationship Types
- HAS_PERMISSION: Link between Identity and Resource.
- MEMBER_OF: Link between User and Group/Role.
- CONNECTS_TO: Network-level connection between resources.
- DEPLOYS_TO: Link between Service and Cluster/VPC.
- CONTAINS: Hierarchy link (e.g., VPC contains Subnet).
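The endpoint constraints implied by these relationship types could be checked before an edge is written. The following is a minimal sketch under several assumptions: "Resource" is taken to mean any Compute, Network, or Storage label, "Service" is taken to mean `ECSService`, and `CONTAINS` is limited to the one pair the text mentions. The names `ENDPOINT_RULES` and `isValidEdge` are hypothetical.

```typescript
type RelationshipType =
  | "HAS_PERMISSION"
  | "MEMBER_OF"
  | "CONNECTS_TO"
  | "DEPLOYS_TO"
  | "CONTAINS";

interface EndpointRule {
  from: string[]; // labels allowed at the start of the edge
  to: string[];   // labels allowed at the end of the edge
}

const IDENTITY_LABELS = ["User", "Group", "Role", "Policy"];
// Assumption: "Resource" means every non-identity label.
const RESOURCE_LABELS = [
  "EC2Instance", "LambdaFunction", "ECSCluster", "ECSService",
  "VPC", "Subnet", "SecurityGroup", "LoadBalancer", "InternetGateway",
  "S3Bucket", "RDSInstance", "DynamoDBTable",
];

const ENDPOINT_RULES: Record<RelationshipType, EndpointRule> = {
  HAS_PERMISSION: { from: IDENTITY_LABELS, to: RESOURCE_LABELS },
  MEMBER_OF: { from: ["User"], to: ["Group", "Role"] },
  CONNECTS_TO: { from: RESOURCE_LABELS, to: RESOURCE_LABELS },
  // Assumption: "Service" is ECSService.
  DEPLOYS_TO: { from: ["ECSService"], to: ["ECSCluster", "VPC"] },
  // Only the pair given as an example in the text; a real schema likely has more.
  CONTAINS: { from: ["VPC"], to: ["Subnet"] },
};

// True when both endpoint labels are allowed for the relationship type.
function isValidEdge(type: RelationshipType, fromLabel: string, toLabel: string): boolean {
  const rule = ENDPOINT_RULES[type];
  return rule.from.indexOf(fromLabel) !== -1 && rule.to.indexOf(toLabel) !== -1;
}
```

For instance, `isValidEdge("MEMBER_OF", "User", "Role")` holds, while `isValidEdge("HAS_PERMISSION", "S3Bucket", "User")` does not because the direction is reversed.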
Graph Synergy
By mapping these relationships, CIG can answer security and operational questions with Cypher queries. For example, the query below lists users who can reach a public S3 bucket through a role:
MATCH (u:User)-[:MEMBER_OF]->(r:Role)-[:HAS_PERMISSION]->(s:S3Bucket)
WHERE s.is_public = true
RETURN u.name, s.name
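When a query like this is issued from the API, the graph scope would typically be passed as a parameter rather than concatenated into the query text. The sketch below only builds the query object and does not talk to Neo4j. It assumes the user-to-role hop is modeled as `MEMBER_OF`, as the relationship list describes, and that nodes carry a `source` property; both `source` and the `publicBucketQuery` helper are hypothetical.

```typescript
// A query string plus its parameters, in the shape most drivers accept.
interface CypherQuery {
  text: string;
  params: Record<string, unknown>;
}

// Build the public-bucket query for one graph source ("live" or "demo").
function publicBucketQuery(source: "live" | "demo"): CypherQuery {
  const text = [
    "MATCH (u:User)-[:MEMBER_OF]->(r:Role)-[:HAS_PERMISSION]->(s:S3Bucket)",
    // Assumption: scope metadata is stored in a `source` property on the node.
    "WHERE s.is_public = true AND s.source = $source",
    "RETURN u.name AS user, s.name AS bucket",
  ].join("\n");
  return { text, params: { source } };
}

const demoQuery = publicBucketQuery("demo");
```

Keeping `$source` as a parameter means the same query text serves both graphs and user input never becomes part of the Cypher itself.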
API Layer Implementation
The API (@cig/api) is built using Fastify for its high performance and low overhead.
Dual-Interface Strategy
- Fastify REST: Handles resource management, graph snapshots, demo provisioning, discovery status, authentication flows, and health checks.
- Graph Query Endpoints: Provide read-only graph queries plus a constrained refinement flow for approved writes.
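The split between read-only queries and approved writes suggests some kind of gate in front of the graph query endpoints. The following is a rough sketch of such a gate, not the CIG implementation: `classifyCypher` is a hypothetical helper, and a keyword scan like this is only a heuristic. A real deployment would likely also rely on read-only access enforced by the database itself.

```typescript
// Matches single- or double-quoted string literals, including escaped quotes.
const STRING_LITERAL = /'(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*"/g;

// Clauses that can modify the graph. CALL is included because procedures may write.
const WRITE_CLAUSE =
  /\b(?:CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|CALL|LOAD\s+CSV)\b/i;

type QueryKind = "read" | "needs-approval";

// Classify a Cypher string: anything containing a write clause is routed
// to the refinement flow instead of being executed directly.
function classifyCypher(cypher: string): QueryKind {
  // Blank out string literals so that values like 'CREATE' do not trigger a match.
  const withoutStrings = cypher.replace(STRING_LITERAL, "''");
  return WRITE_CLAUSE.test(withoutStrings) ? "needs-approval" : "read";
}
```

The check errs on the side of caution: a property that happens to be named `set` would be flagged for approval, which is the safer failure mode for a read-only endpoint.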
WebSocket Hub
A @fastify/websocket implementation allows for:
- Real-time progress updates during discovery jobs.
- Streaming responses from AI agents and node status updates.
- Live metrics visualization.
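Since one socket carries several kinds of traffic, a tagged message envelope is a common way to keep them apart. The sketch below shows one possible shape as a TypeScript discriminated union. The message type names and fields are assumptions made for illustration and are not taken from the CIG codebase; the code is independent of `@fastify/websocket` and only covers encoding and decoding.

```typescript
// Hypothetical envelope: one variant per kind of traffic listed above.
type HubMessage =
  | { type: "discovery.progress"; jobId: string; percent: number }
  | { type: "agent.chunk"; conversationId: string; text: string }
  | { type: "node.status"; nodeId: string; status: string }
  | { type: "metrics.sample"; metric: string; value: number };

const KNOWN_TYPES = ["discovery.progress", "agent.chunk", "node.status", "metrics.sample"];

function encodeMessage(message: HubMessage): string {
  return JSON.stringify(message);
}

// Parse a raw frame; returns undefined for invalid JSON or unknown types.
function decodeMessage(raw: string): HubMessage | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return undefined;
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const type = (parsed as { type?: unknown }).type;
  if (typeof type !== "string" || KNOWN_TYPES.indexOf(type) === -1) return undefined;
  // Only the discriminant is checked here; field-level validation is left out.
  return parsed as HubMessage;
}

// The switch is exhaustive, so adding a variant forces this function to be updated.
function describeMessage(message: HubMessage): string {
  switch (message.type) {
    case "discovery.progress":
      return `job ${message.jobId}: ${message.percent}%`;
    case "agent.chunk":
      return message.text;
    case "node.status":
      return `node ${message.nodeId} is ${message.status}`;
    case "metrics.sample":
      return `${message.metric}=${message.value}`;
  }
}
```

A client can then route on `message.type` without guessing which fields are present.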
Agentic Intelligence Layer
CIG uses a Retrieval-Augmented Generation (RAG) approach so that infrastructure data can be queried in natural language.
Reasoning Workflow
- Natural Language Query: The user asks, "Are there any public buckets with sensitive data?"
- Intent Recognition: The agent identifies the need for a graph traversal.
- Cypher Tool Execution: The agent generates a read query or a refinement proposal against Neo4j.
- Context Augmentation: The results are combined with semantic retrieval from Chroma and actual graph scope data.
- Synthesized Answer: The final response is delivered via the Dashboard or CLI.
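The workflow above can be sketched as a small pipeline with each step injected as a function. This is a simplified illustration, not the actual agent: the `AgentTools` interface and all its method names are hypothetical, the stubs stand in for the model, Neo4j, and Chroma, and everything is kept synchronous for brevity even though real calls to those services would be asynchronous.

```typescript
type Intent = "graph_traversal" | "semantic_only";
type Row = Record<string, unknown>;

// Hypothetical tool set; each member corresponds to one workflow step.
interface AgentTools {
  classify(question: string): Intent;                          // intent recognition
  generateCypher(question: string): string;                    // Cypher generation
  runReadQuery(cypher: string): Row[];                         // graph execution
  retrieveDocs(question: string, namespace: string): string[]; // semantic retrieval
  synthesize(question: string, rows: Row[], docs: string[]): string;
}

// Run the steps in order and return the synthesized answer.
function answer(question: string, namespace: string, tools: AgentTools): string {
  const intent = tools.classify(question);
  // Only touch the graph when the question actually needs a traversal.
  const rows = intent === "graph_traversal"
    ? tools.runReadQuery(tools.generateCypher(question))
    : [];
  const docs = tools.retrieveDocs(question, namespace);
  return tools.synthesize(question, rows, docs);
}

// Stub tools with canned results, used only to exercise the pipeline.
const stubTools: AgentTools = {
  classify: () => "graph_traversal",
  generateCypher: () => "MATCH (s:S3Bucket) WHERE s.is_public = true RETURN s.name AS bucket",
  runReadQuery: () => [{ bucket: "logs" }],
  retrieveDocs: () => ["Buckets tagged pii hold sensitive data."],
  synthesize: (_question, rows, docs) =>
    `${rows.length} public bucket(s); ${docs.length} supporting note(s)`,
};

const reply = answer("Are there any public buckets with sensitive data?", "demo", stubTools);
```

Injecting the tools keeps the orchestration testable without a database or model behind it.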
Live and Demo Graph Sources
The Dashboard and API can operate against two graph sources:
- live uses the real discovery-backed graph for managed or self-hosted environments.
- demo uses the shared seeded demo workspace and its own semantic namespace.
The selected source is carried through:
- graph snapshots
- resource search
- chat context
- semantic retrieval
- demo provisioning
This keeps the UI and AI responses anchored to the same source of truth.
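One way to carry the selected source through every call is a small scope object that each operation receives. The sketch below is an assumption-laden illustration: `RequestScope`, `parseGraphSource`, and `semanticNamespace` are hypothetical names, and the namespace format (`demo` versus `live:<workspaceId>`) is invented for the example.

```typescript
type GraphSource = "live" | "demo";

// Hypothetical per-request scope passed to snapshots, search, chat, and retrieval.
interface RequestScope {
  source: GraphSource;
  workspaceId: string;
}

// Validate an untrusted value (for example a query-string parameter).
function parseGraphSource(value: string | undefined): GraphSource | undefined {
  return value === "live" || value === "demo" ? value : undefined;
}

// Derive the semantic-retrieval namespace from the same scope the graph uses,
// so that graph results and retrieved documents cannot drift apart.
function semanticNamespace(scope: RequestScope): string {
  // Assumption: the demo workspace is shared, so it has one fixed namespace.
  return scope.source === "demo" ? "demo" : `live:${scope.workspaceId}`;
}

const demoScope: RequestScope = { source: "demo", workspaceId: "acme" };
const liveScope: RequestScope = { source: "live", workspaceId: "acme" };
```

Because both the graph query and the retrieval namespace are derived from one `RequestScope`, a chat answer in demo mode cannot accidentally mix in live data.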
Security & Isolation
CIG is designed for self-hosting with a "Privacy First" approach:
- JWT session management: All requests are authenticated via @cig/auth.
- RBAC: Fine-grained access control at the API level.
- Scoped graph data: Managed deployments scope graph and semantic data by tenant/workspace.
- Local Processing: Discovery data never leaves the self-hosted environment unless explicitly configured for external LLM processing.
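The RBAC and tenant-scoping points can be illustrated with two small checks that run after authentication. This is a minimal sketch under stated assumptions: the claims are presumed to be already verified (token verification itself is not shown), and the role names, permission strings, and the `SessionClaims` shape are all hypothetical rather than taken from `@cig/auth`.

```typescript
// Hypothetical claims extracted from an already-verified session token.
interface SessionClaims {
  sub: string;
  workspaceId: string;
  roles: string[];
}

// Hypothetical role-to-permission table.
const ROLE_PERMISSIONS: Record<string, string[]> = {
  viewer: ["graph:read"],
  operator: ["graph:read", "discovery:run"],
  admin: ["graph:read", "discovery:run", "graph:refine"],
};

// RBAC check: true when any of the caller's roles grants the permission.
function can(claims: SessionClaims, permission: string): boolean {
  return claims.roles.some((role) => {
    // Own-property check so unknown roles never resolve via the prototype chain.
    const known = Object.prototype.hasOwnProperty.call(ROLE_PERMISSIONS, role);
    const granted = known ? ROLE_PERMISSIONS[role] : [];
    return granted.indexOf(permission) !== -1;
  });
}

// Tenant scoping: keep only rows that belong to the caller's workspace.
function scopeToWorkspace<T extends { workspaceId: string }>(
  claims: SessionClaims,
  rows: T[],
): T[] {
  return rows.filter((row) => row.workspaceId === claims.workspaceId);
}

const viewer: SessionClaims = { sub: "u1", workspaceId: "acme", roles: ["viewer"] };
```

In practice the workspace filter would more likely be applied inside the graph query than on returned rows; filtering in memory here just keeps the example self-contained.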