In Part 1 (https://element61.be/en/resource/part-1-search-reasoning-why-your-ai-needs-brain-not-just-library), we argued that most enterprise AI assistants are still excellent librarians, not true analysts.
Traditional RAG systems can search through documents, retrieve relevant fragments, and produce a fluent summary. That is already valuable. But it breaks down when the question is not simply "Where is this written?" but "What does this mean for the business?"
The difference becomes visible in questions like this:
How will the delay in the supplier shipment from Shanghai impact our Q3 production targets for the Alpha product line?
This is not a document search question. It is a reasoning question.
To answer it properly, the AI has to connect a chain of business facts:
Supplier -> Part -> Product -> Customer / Production Target
A human planner understands that a delayed supplier shipment can affect a component, that the component can affect a product, and that the product can affect a customer commitment or production target. A traditional RAG system often sees only the words: Shanghai, delay, Alpha, Q3.
So for Part 2, we built a small but realistic benchmark to test the difference.
The goal was not to create a perfect academic benchmark. The goal was more practical: create a self-contained example that business and technical teams can run in a Jupyter Notebook, inspect visually, and use to understand where Vector RAG ends and GraphRAG begins.
The benchmark: same question, same data, different retrieval strategy
The first rule of this benchmark was simple: compare apples with apples.
Both approaches receive the exact same raw user question:
How will the delay in the supplier shipment from Shanghai impact our Q3 production targets for the Alpha product line?
Both approaches are given the same underlying business facts. The difference is only how those facts are represented and retrieved.
For Traditional RAG, the facts are represented as text chunks:
- Supplier ApexChip is located in Shanghai.
- ApexChip ships the part Chipset X.
- Chipset X is a component of Laptop Alpha.
- Customer TechCorp purchased Laptop Alpha to meet Q3 targets.
- There is a severe delay in supplier shipments coming from Shanghai.
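To make this concrete, here is a minimal sketch of the Vector RAG side in plain scikit-learn. We use TF-IDF as a stand-in for a neural embedding model, so the similarity scores will differ from the notebook run shown later:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# The five core facts as plain text chunks. The real notebook adds
# ~1,000 background records on top of these.
docs = [
    "Supplier ApexChip is located in Shanghai.",
    "Supplier ApexChip SHIPS the part Chipset X.",
    "Chipset X is a COMPONENT OF the product Laptop Alpha.",
    "Customer TechCorp PURCHASED the product Laptop Alpha to meet Q3 targets.",
    "There is a severe delay in supplier shipments coming from Shanghai.",
]

# Embed every chunk. TF-IDF stands in for a neural embedding model here.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)

def vector_search(question, k=3):
    """Return the k chunks most similar to the question."""
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [(round(float(scores[i]), 3), docs[i]) for i in top]

question = ("How will the delay in the supplier shipment from Shanghai "
            "impact our Q3 production targets for the Alpha product line?")
for score, doc in vector_search(question):
    print(f"[{score}] {doc}")
```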
For GraphRAG, the same facts are represented as entities and relationships:
- ApexChip SHIPS Chipset X
- Chipset X is a COMPONENT OF Laptop Alpha
- TechCorp PURCHASED Laptop Alpha
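In NetworkX, the graph library the notebook uses, that representation takes only a few lines. The attribute name `rel` and the choice to model Shanghai as a `LOCATED_IN` edge are our own modelling decisions:

```python
import networkx as nx

# A directed graph where every edge carries its relationship type.
G = nx.DiGraph()
G.add_edge("ApexChip", "Chipset X", rel="SHIPS")
G.add_edge("Chipset X", "Laptop Alpha", rel="COMPONENT_OF")
G.add_edge("TechCorp", "Laptop Alpha", rel="PURCHASED")
G.add_edge("ApexChip", "Shanghai", rel="LOCATED_IN")  # location fact from the chunk set
```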
We then added 1,000 background supply chain records. This creates the noise that exists in every real enterprise environment: other suppliers, other parts, other customers, other locations, and other products.
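Generating that noise is straightforward. Here is a sketch of how such background records could look; the exact shape and granularity of the notebook's records are our assumption:

```python
# Illustrative background records: unrelated supplier/part/product chains.
noise_docs, noise_edges = [], []
for i in range(1000):
    supplier, part, product = f"Supplier_{i}", f"Part_{i}", f"Product_{i}"
    noise_docs.append(f"Supplier {supplier} ships the part {part}, "
                      f"which is a component of {product}.")
    noise_edges.append((supplier, part, {"rel": "SHIPS"}))
    noise_edges.append((part, product, {"rel": "COMPONENT_OF"}))

# The text chunks join the Vector RAG corpus; the edges join the graph,
# e.g. via G.add_edges_from(noise_edges).
```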
The notebook runs fully locally. There is no Azure AI Search, no Cosmos DB, no external LLM API, and no cloud dependency. It uses common Python libraries such as NetworkX, scikit-learn, pandas, and matplotlib. This keeps the example transparent and easy to reproduce.
Why the query layer matters
One important correction came up while building the benchmark.
It would be unfair to pass the full question to Vector Search while passing only the word "Shanghai" to GraphRAG. That would give the graph an artificial advantage.
In a real GraphRAG system, the user still asks a normal natural-language question. The graph does not magically know where to start. It needs an entity extraction step first.
In production, this is often handled by an LLM. The model reads the question and identifies the relevant graph entry points, such as "Shanghai" or a supplier name. In our local notebook, we simulate that step with simple keyword matching, so the benchmark remains self-contained.
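In code, that simulated extraction step can be as small as this; the entity list and the matching rule are our simplification of what an LLM would do:

```python
# Entities known to the graph, e.g. collected from its node list.
KNOWN_ENTITIES = ["Shanghai", "ApexChip", "Chipset X", "Laptop Alpha", "TechCorp"]

def extract_entry_points(question):
    """Stand-in for the LLM entity-extraction step: plain keyword matching."""
    return [e for e in KNOWN_ENTITIES if e.lower() in question.lower()]

question = ("How will the delay in the supplier shipment from Shanghai "
            "impact our Q3 production targets for the Alpha product line?")
print(extract_entry_points(question))  # ['Shanghai']
```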
That gives us a fairer comparison:
- Vector RAG receives the raw natural-language prompt and searches for similar text.
- GraphRAG receives the same raw natural-language prompt, extracts the graph entry point, and then follows relationships.
This distinction is important for business leaders. GraphRAG is not just "better search". It is a different architecture. It separates language understanding from relationship traversal.
What Vector RAG found
In the notebook run, Vector Search returned the following top results:
[0.477] Doc 5: There is a severe delay in supplier shipments coming from Shanghai.
[0.468] Doc 4: Customer TechCorp PURCHASED the product Laptop Alpha to meet Q3 targets.
[0.382] Doc 1: Supplier ApexChip is located in Shanghai.
At first glance, this looks good.
The retrieved chunks mention Shanghai, the delay, TechCorp, Laptop Alpha, and Q3 targets. A chatbot using this context would probably produce a confident-sounding answer.
But the critical connecting fact is missing:
ApexChip ships Chipset X, and Chipset X is the component used in Laptop Alpha.
The vector search found relevant words, but it did not preserve the causal chain.
This is the central weakness of Traditional RAG in multi-hop business scenarios. It can retrieve fragments that look similar to the question, while missing the relationship that explains why the answer is true.
What GraphRAG found
GraphRAG returned a different type of output:
ApexChip --[SHIPS]--> Chipset X
Chipset X --[COMPONENT OF]--> Laptop Alpha
TechCorp --[PURCHASED]--> Laptop Alpha
This is not just a list of documents. It is an evidence path.
The graph starts from the extracted business entity, follows the relationship from supplier to component, then from component to product, and finally connects the product to the customer / Q3 target context.
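Here is a sketch of that traversal with NetworkX. We resolve the location entry point to its supplier first, then collect every relationship within three hops, which reproduces the evidence path above:

```python
import networkx as nx

# Rebuild the core graph (same facts as earlier; modelling Shanghai
# as a location node is our own choice).
G = nx.DiGraph()
G.add_edge("ApexChip", "Shanghai", rel="LOCATED_IN")
G.add_edge("ApexChip", "Chipset X", rel="SHIPS")
G.add_edge("Chipset X", "Laptop Alpha", rel="COMPONENT_OF")
G.add_edge("TechCorp", "Laptop Alpha", rel="PURCHASED")

def evidence_path(graph, entry, hops=3):
    """Resolve a location entry point to its supplier(s), then collect
    every relationship within `hops` hops, ignoring edge direction."""
    suppliers = [u for u, v, d in graph.edges(data=True)
                 if v == entry and d["rel"] == "LOCATED_IN"]
    undirected = graph.to_undirected(as_view=True)
    edges = []
    for s in suppliers:
        nearby = nx.ego_graph(undirected, s, radius=hops).nodes
        edges += [(u, d["rel"], v) for u, v, d in graph.edges(data=True)
                  if u in nearby and v in nearby and d["rel"] != "LOCATED_IN"]
    return edges

for u, rel, v in evidence_path(G, "Shanghai"):
    print(f"{u} --[{rel}]--> {v}")
```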
This is what we called the "Detective's Wall" in Part 1. The value is not only that the AI retrieves information. The value is that it can show how the information is connected.
For executive use cases, that difference matters. A recommendation is only useful if the business can understand the reasoning behind it.
The benchmark result
The local notebook produced the following comparison:
| Metric | Traditional RAG (Vector Search) | GraphRAG (Knowledge Graph) |
| --- | --- | --- |
| Input query | Raw natural-language prompt | Raw natural-language prompt |
| Dataset size | 4,005 text chunks | 4,004 nodes / 3,003 edges |
| Retrieval method | Top 3 text chunks | 3-hop graph traversal |
| Latency in local run | 0.01371 seconds | 0.00982 seconds |
| Found root cause: Chipset X | False | True |
| Required chain entities found | 3 / 4 | 4 / 4 |
| Full multi-hop context maintained | False | True |
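The last three rows are simple string checks over the retrieved context. A minimal sketch of how such a scoring function might look:

```python
REQUIRED_CHAIN = ["ApexChip", "Chipset X", "Laptop Alpha", "TechCorp"]

def score_context(context):
    """Check which entities of the causal chain survive into the context."""
    found = [e for e in REQUIRED_CHAIN if e in context]
    return {"found_root_cause": "Chipset X" in context,
            "chain_entities_found": f"{len(found)} / {len(REQUIRED_CHAIN)}",
            "full_multi_hop_context": len(found) == len(REQUIRED_CHAIN)}

# Top-3 chunks from the Vector RAG run vs. the GraphRAG evidence path:
vector_context = ("There is a severe delay in supplier shipments coming from Shanghai. "
                  "Customer TechCorp PURCHASED the product Laptop Alpha to meet Q3 targets. "
                  "Supplier ApexChip is located in Shanghai.")
graph_context = ("ApexChip --[SHIPS]--> Chipset X; "
                 "Chipset X --[COMPONENT OF]--> Laptop Alpha; "
                 "TechCorp --[PURCHASED]--> Laptop Alpha")

print(score_context(vector_context))  # root cause missed, 3 / 4
print(score_context(graph_context))   # root cause found, 4 / 4
```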
The most important result is not the milliseconds.
In a small local notebook, latency is illustrative only. In production, both vector search and graph traversal can be made very fast with the right platform. Vector search can scale extremely well across millions of embeddings. Graph traversal can also be highly performant when implemented in a specialised graph store such as Azure Cosmos DB with the Gremlin API.
The stronger result is the reasoning quality.
Traditional RAG retrieved relevant-looking information, but missed the root cause. GraphRAG recovered the full chain.
That is the difference between an AI assistant that can quote documents and an AI assistant that can support decisions.
What this means for business users
The benchmark demonstrates three lessons.
First, semantic similarity is not the same as causality. A vector database can find text that resembles the question. That does not mean it has found the business relationship that explains the answer.
Second, GraphRAG improves traceability. The graph can show the exact path behind the answer: supplier to component, component to product, product to customer or production target. This is how AI moves from a black box to a glass box.
Third, the ontology is the real asset. GraphRAG only works because the business has defined what the entities are and how they relate. Supplier, part, product, customer, shipment, inventory, and production plan are not just words. They are the blueprint of the business.
This is why ontology design should not be treated as a technical afterthought. It is a business exercise. It forces the organisation to define how decisions are actually made.
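A first version of such an ontology can be as small as a set of allowed entity types and relationships. The types and relations below are illustrative, not a complete supply chain model:

```python
# A minimal, illustrative ontology: which entity types exist and which
# relationships the business model allows between them.
ENTITY_TYPES = {"Supplier", "Part", "Product", "Customer",
                "Shipment", "Inventory", "ProductionPlan"}

ALLOWED_RELATIONS = {
    ("Supplier", "SHIPS", "Part"),
    ("Part", "COMPONENT_OF", "Product"),
    ("Customer", "PURCHASED", "Product"),
    ("Shipment", "DELIVERS", "Part"),
    ("Product", "PLANNED_IN", "ProductionPlan"),
}

def is_valid_fact(src_type, rel, dst_type):
    """Reject facts the business model does not allow."""
    return (src_type, rel, dst_type) in ALLOWED_RELATIONS

print(is_valid_fact("Supplier", "SHIPS", "Part"))      # True
print(is_valid_fact("Supplier", "PURCHASED", "Part"))  # False
```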
When Vector RAG is still the right answer
GraphRAG is not a replacement for every RAG use case.
If the question is document-centric, Traditional RAG is often the fastest and most cost-effective option. Examples include:
- Searching HR policies
- Summarising contracts
- Answering questions about manuals
- Finding clauses in procedures
- Retrieving knowledge-base articles
In those scenarios, the user usually needs a passage, a policy, or a summary. A library is enough.
GraphRAG becomes valuable when the question depends on connected business relationships:
- Supply chain impact analysis
- Fraud and compliance networks
- Customer-product-service dependencies
- Asset maintenance and root-cause analysis
- Financial exposure across legal entities
- Support-ticket trend analysis across products, customers, and regions
In these scenarios, the answer is not located in one document. It lives in the connections between facts.
From notebook to enterprise architecture
The notebook is deliberately simple. That is its strength. It lets teams see the mechanics without needing a full cloud deployment.
But the production architecture is different.
At element61, we typically separate the architecture into two layers:
The Brain: a graph store that contains business entities and relationships. In a Microsoft architecture, Azure Cosmos DB with the Gremlin API is a strong option because it can store and traverse graph relationships at enterprise scale.
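As an illustration, the notebook's 3-hop traversal would translate into a Gremlin query along these lines. The vertex labels and property names are assumptions, not a fixed schema, and in production the string would be submitted through a Gremlin client:

```python
# Illustrative Gremlin traversal for the same 3-hop question.
gremlin_query = (
    "g.V().has('supplier', 'location', 'Shanghai')"
    ".out('SHIPS')"           # supplier -> part
    ".out('COMPONENT_OF')"    # part -> product
    ".in('PURCHASED')"        # product <- customer
    ".path()"                 # return the full evidence path
)
```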
The Factory: an AI orchestration layer where models, prompts, tools, agents, monitoring, and governance come together. Microsoft Foundry is the natural place to industrialise these AI workflows.
In production, the flow looks like this:
- The user asks a natural-language business question.
- An AI agent extracts the relevant entities and intent.
- The graph is queried to retrieve the relationship path.
- The model generates an answer grounded in that path.
- The answer includes traceable evidence so the business can audit the reasoning.
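Sketched as code, with every production call replaced by a clearly labelled stub, that flow looks like this; the function names are illustrative, not a Foundry API:

```python
def extract_entities(question):
    # Production: an LLM/agent call. Here: the keyword matcher from earlier.
    known = ["Shanghai", "ApexChip", "Chipset X", "Laptop Alpha", "TechCorp"]
    return [e for e in known if e.lower() in question.lower()]

def query_graph(entities):
    # Production: a Gremlin traversal against the graph store. Here: the
    # canned evidence path from our local benchmark.
    return ["ApexChip --[SHIPS]--> Chipset X",
            "Chipset X --[COMPONENT OF]--> Laptop Alpha",
            "TechCorp --[PURCHASED]--> Laptop Alpha"]

def generate_answer(question, evidence):
    # Production: an LLM prompt grounded in (and citing) the evidence path.
    return "Grounded answer based on: " + "; ".join(evidence)

def answer_business_question(question):
    entities = extract_entities(question)            # step 2: entities and intent
    evidence = query_graph(entities)                 # step 3: relationship path
    answer = generate_answer(question, evidence)     # step 4: grounded answer
    return {"answer": answer, "evidence": evidence}  # step 5: auditable output
```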
This is the path from a demo chatbot to a governed reasoning engine.
The verdict: the real metric is trust
Traditional RAG is excellent when the task is to find and summarise information. GraphRAG is different. It is designed for questions where relationships matter.
In our local benchmark, both systems received the same question and the same underlying facts. Vector RAG found relevant fragments, but missed the connecting component. GraphRAG followed the chain and exposed the full reasoning path.
That is the practical value of GraphRAG: not just better retrieval, but better explainability.
For C-level leaders, this is the key point. AI adoption does not stall because models cannot produce fluent answers. It stalls because decision-makers cannot always trust how those answers were produced.
GraphRAG helps close that trust gap. It allows AI to show its work.
In Part 1, we said enterprise AI needs a brain, not just a library.
In Part 2, the benchmark shows why: the library can find the pages, but the brain can follow the chain.
Ready to move from document search to relationship-aware AI?
Get in touch with us.