Managing context effectively is a critical challenge when working with large language models, especially in environments like Google Colab, where resource constraints and long documents can quickly exceed the available token window. In this tutorial, we guide you through a practical implementation of the Model Context Protocol (MCP) by building a ModelContextManager that automatically chunks incoming text, generates semantic embeddings using Sentence-Transformers, and scores each chunk based on recency, importance, and relevance. You'll learn how to integrate this manager with a Hugging Face sequence-to-sequence model, demonstrated here with FLAN-T5, to add, optimize, and retrieve only the most pertinent pieces of context. Along the way, we'll cover token counting with a GPT-2 tokenizer, context-window optimization strategies, and interactive sessions that let you query and visualize your dynamic context in real time.
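If you are starting from a fresh Colab runtime, a single setup cell like the one below (a suggested convenience, not part of the original code) covers the dependencies used throughout this tutorial; the classes that follow also attempt to install missing packages on the fly, so this step is optional.
!pip install -q sentence-transformers transformers torch matplotlib pandas tqdm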
import torch
import numpy as np
from typing import List, Dict, Any, Optional, Union, Tuple
from dataclasses import dataclass
import time
import gc
from tqdm.notebook import tqdm
We import the essential libraries for building a dynamic context manager: torch and numpy handle tensor and numerical operations, while typing and dataclasses provide structured type annotations and data containers. Utility modules such as time and gc support timestamping and memory cleanup, and tqdm.notebook offers interactive progress bars for chunk processing in Colab.
@dataclass
class ContextChunk:
    """A chunk of text with metadata for the Model Context Protocol."""
    text: str
    embedding: Optional[torch.Tensor] = None
    importance: float = 1.0
    timestamp: float = 0.0
    metadata: Dict[str, Any] = None

    def __post_init__(self):
        if self.metadata is None:
            self.metadata = {}
        if self.timestamp == 0.0:
            self.timestamp = time.time()
The ContextChunk dataclass encapsulates a single segment of text along with its embedding, a user-assigned importance score, a timestamp, and arbitrary metadata. Its __post_init__ method ensures that each chunk is stamped with the current time upon creation and that metadata defaults to an empty dictionary if none is provided.
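As a quick sanity check, here is a minimal sketch of constructing a chunk directly; the sample text is made up, but the defaulting behavior follows the __post_init__ logic above.
chunk = ContextChunk(text="MCP keeps only the most relevant context in the window.", importance=0.8)
print(chunk.timestamp > 0)   # True: the timestamp was filled in automatically
print(chunk.metadata)        # {}: metadata defaulted to an empty dictionary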
class ModelContextManager:
    """
    Manager for implementing the Model Context Protocol in LLMs on Google Colab.
    Handles context window optimization, token management, and relevance scoring.
    """
    def __init__(
        self,
        max_context_length: int = 8192,
        embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2",
        relevance_threshold: float = 0.7,
        recency_weight: float = 0.3,
        importance_weight: float = 0.3,
        semantic_weight: float = 0.4,
        device: str = "cuda" if torch.cuda.is_available() else "cpu"
    ):
        """
        Initialize the Model Context Manager.
        Args:
            max_context_length: Maximum number of tokens in the context window
            embedding_model: Model to use for text embeddings
            relevance_threshold: Threshold for a chunk's relevance to be included
            recency_weight: Weight for recency in the relevance calculation
            importance_weight: Weight for importance in the relevance calculation
            semantic_weight: Weight for semantic similarity in the relevance calculation
            device: Device to run computations on
        """
        self.max_context_length = max_context_length
        self.device = device
        self.chunks = []
        self.current_token_count = 0
        self.relevance_threshold = relevance_threshold
        self.recency_weight = recency_weight
        self.importance_weight = importance_weight
        self.semantic_weight = semantic_weight
        try:
            from sentence_transformers import SentenceTransformer
            print(f"Loading embedding model {embedding_model}...")
            self.embedding_model = SentenceTransformer(embedding_model).to(self.device)
            print(f"Embedding model loaded successfully on {self.device}")
        except ImportError:
            print("Installing sentence-transformers...")
            import subprocess
            subprocess.run(["pip", "install", "sentence-transformers"])
            from sentence_transformers import SentenceTransformer
            self.embedding_model = SentenceTransformer(embedding_model).to(self.device)
            print(f"Embedding model loaded successfully on {self.device}")
        try:
            from transformers import GPT2Tokenizer
            self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
        except ImportError:
            print("Installing transformers...")
            import subprocess
            subprocess.run(["pip", "install", "transformers"])
            from transformers import GPT2Tokenizer
            self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    def add_chunk(self, text: str, importance: float = 1.0, metadata: Dict[str, Any] = None) -> None:
        """
        Add a new chunk of text to the context manager.
        Args:
            text: The text content to add
            importance: Importance score (0-1)
            metadata: Additional metadata for the chunk
        """
        with torch.no_grad():
            embedding = self.embedding_model.encode(text, convert_to_tensor=True)
        chunk = ContextChunk(
            text=text,
            embedding=embedding,
            importance=importance,
            timestamp=time.time(),
            metadata=metadata or {}
        )
        self.chunks.append(chunk)
        self.current_token_count += len(self.tokenizer.encode(text))
        if self.current_token_count > self.max_context_length:
            self.optimize_context()
    def optimize_context(self) -> None:
        """Optimize the context by removing less relevant chunks to fit within the token limit."""
        if not self.chunks:
            return
        print("Optimizing context window...")
        scores = self.score_chunks()
        sorted_indices = np.argsort(scores)[::-1]
        new_chunks = []
        new_token_count = 0
        for idx in sorted_indices:
            chunk = self.chunks[idx]
            chunk_tokens = len(self.tokenizer.encode(chunk.text))
            if new_token_count + chunk_tokens <= self.max_context_length:
                new_chunks.append(chunk)
                new_token_count += chunk_tokens
            else:
                if scores[idx] > self.relevance_threshold * 1.5:
                    # Try to swap out an already-included, low-scoring chunk for this high-scoring one
                    for i, included_chunk in enumerate(new_chunks):
                        included_idx = sorted_indices[i]
                        if scores[included_idx] < self.relevance_threshold:
                            included_tokens = len(self.tokenizer.encode(included_chunk.text))
                            if new_token_count - included_tokens + chunk_tokens <= self.max_context_length:
                                new_chunks.remove(included_chunk)
                                new_token_count -= included_tokens
                                new_chunks.append(chunk)
                                new_token_count += chunk_tokens
                                break
        removed_count = len(self.chunks) - len(new_chunks)
        self.chunks = new_chunks
        self.current_token_count = new_token_count
        print(f"Context optimized: Removed {removed_count} chunks, {len(new_chunks)} remaining, using {new_token_count}/{self.max_context_length} tokens")
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    def score_chunks(self, query: str = None) -> np.ndarray:
        """
        Score chunks based on recency, importance, and semantic relevance.
        Args:
            query: Optional query to calculate semantic relevance against
        Returns:
            Array of scores for each chunk
        """
        if not self.chunks:
            return np.array([])
        current_time = time.time()
        max_age = max(current_time - chunk.timestamp for chunk in self.chunks) or 1.0
        recency_scores = np.array([
            1.0 - ((current_time - chunk.timestamp) / max_age)
            for chunk in self.chunks
        ])
        importance_scores = np.array([chunk.importance for chunk in self.chunks])
        if query is not None:
            query_embedding = self.embedding_model.encode(query, convert_to_tensor=True)
            similarity_scores = np.array([
                torch.cosine_similarity(chunk.embedding, query_embedding, dim=0).item()
                for chunk in self.chunks
            ])
            similarity_scores = (similarity_scores - similarity_scores.min()) / (similarity_scores.max() - similarity_scores.min() + 1e-8)
        else:
            similarity_scores = np.ones(len(self.chunks))
        final_scores = (
            self.recency_weight * recency_scores +
            self.importance_weight * importance_scores +
            self.semantic_weight * similarity_scores
        )
        return final_scores
    def retrieve_context(self, query: str = None, k: int = None) -> str:
        """
        Retrieve the most relevant context for a given query.
        Args:
            query: The query to retrieve context for
            k: The maximum number of chunks to return (None = all relevant chunks)
        Returns:
            String containing the combined relevant context
        """
        if not self.chunks:
            return ""
        scores = self.score_chunks(query)
        relevant_indices = np.where(scores >= self.relevance_threshold)[0]
        relevant_indices = relevant_indices[np.argsort(scores[relevant_indices])[::-1]]
        if k is not None:
            relevant_indices = relevant_indices[:k]
        relevant_texts = [self.chunks[i].text for i in relevant_indices]
        return "\n\n".join(relevant_texts)
    def get_stats(self) -> Dict[str, Any]:
        """Get statistics about the current context state."""
        return {
            "chunk_count": len(self.chunks),
            "token_count": self.current_token_count,
            "max_tokens": self.max_context_length,
            "usage_percentage": self.current_token_count / self.max_context_length * 100 if self.max_context_length else 0,
            "avg_chunk_size": self.current_token_count / len(self.chunks) if self.chunks else 0,
            "oldest_chunk_age": time.time() - min(chunk.timestamp for chunk in self.chunks) if self.chunks else 0,
        }
    def visualize_context(self):
        """Visualize the current context window distribution."""
        try:
            import matplotlib.pyplot as plt
            import pandas as pd
            if not self.chunks:
                print("No chunks to visualize")
                return
            scores = self.score_chunks()
            chunk_sizes = [len(self.tokenizer.encode(chunk.text)) for chunk in self.chunks]
            timestamps = [chunk.timestamp for chunk in self.chunks]
            relative_times = [time.time() - ts for ts in timestamps]
            importance = [chunk.importance for chunk in self.chunks]
            df = pd.DataFrame({
                'Size (tokens)': chunk_sizes,
                'Age (seconds)': relative_times,
                'Importance': importance,
                'Score': scores
            })
            fig, axs = plt.subplots(2, 2, figsize=(14, 10))
            axs[0, 0].bar(range(len(chunk_sizes)), chunk_sizes)
            axs[0, 0].set_title('Token Distribution by Chunk')
            axs[0, 0].set_ylabel('Tokens')
            axs[0, 0].set_xlabel('Chunk Index')
            axs[0, 1].scatter(chunk_sizes, scores)
            axs[0, 1].set_title('Score vs Chunk Size')
            axs[0, 1].set_xlabel('Tokens')
            axs[0, 1].set_ylabel('Score')
            axs[1, 0].scatter(relative_times, scores)
            axs[1, 0].set_title('Score vs Chunk Age')
            axs[1, 0].set_xlabel('Age (seconds)')
            axs[1, 0].set_ylabel('Score')
            axs[1, 1].scatter(importance, scores)
            axs[1, 1].set_title('Score vs Importance')
            axs[1, 1].set_xlabel('Importance')
            axs[1, 1].set_ylabel('Score')
            plt.tight_layout()
            plt.show()
        except ImportError:
            print("Please install matplotlib and pandas for visualization")
            print('!pip install matplotlib pandas')
The ModelContextManager class orchestrates the end-to-end handling of context for LLMs by chunking input text, generating embeddings, and tracking token usage against a configurable limit. It implements relevance scoring (combining recency, importance, and semantic similarity), automatic context pruning, retrieval of the most pertinent chunks, and convenient utilities for monitoring and visualizing context statistics.
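Before wiring the manager to a model, a minimal standalone sketch like the following can help verify the scoring and retrieval behavior; the sample strings, the query, and the lowered threshold are assumptions made purely for illustration, and the weights in the comment are simply the constructor defaults.
manager = ModelContextManager(max_context_length=1024, relevance_threshold=0.5)  # lower threshold so the demo reliably returns a chunk
manager.add_chunk("MCP scores chunks by recency, importance, and semantic similarity.", importance=0.9)
manager.add_chunk("Unrelated filler text about something else entirely.", importance=0.2)
# With the default weights, each chunk's final score is:
# score = 0.3 * recency + 0.3 * importance + 0.4 * similarity
print(manager.score_chunks(query="How does MCP score chunks?"))
print(manager.retrieve_context(query="How does MCP score chunks?", k=1))
print(manager.get_stats())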
class MCPColabDemo:
    """Demonstration of the Model Context Protocol in Google Colab with a Language Model."""
    def __init__(
        self,
        model_name: str = "google/flan-t5-base",
        max_context_length: int = 2048,
        device: str = "cuda" if torch.cuda.is_available() else "cpu"
    ):
        """
        Initialize the MCP Colab demo with a specified model.
        Args:
            model_name: Hugging Face model name
            max_context_length: Maximum context length for the MCP manager
            device: Device to run the model on
        """
        self.device = device
        self.context_manager = ModelContextManager(
            max_context_length=max_context_length,
            device=device
        )
        try:
            from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
            print(f"Loading model {model_name}...")
            self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            print(f"Model loaded successfully on {device}")
        except ImportError:
            print("Installing transformers...")
            import subprocess
            subprocess.run(["pip", "install", "transformers"])
            from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
            self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            print(f"Model loaded successfully on {device}")
    def add_document(self, text: str, chunk_size: int = 512, overlap: int = 50) -> None:
        """
        Add a document to the context by chunking it appropriately.
        Args:
            text: Document text
            chunk_size: Size of each chunk in characters
            overlap: Overlap between chunks in characters
        """
        chunks = []
        for i in range(0, len(text), chunk_size - overlap):
            chunk = text[i:i + chunk_size]
            if len(chunk) > 20:
                chunks.append(chunk)
        print(f"Adding {len(chunks)} chunks to context...")
        for i, chunk in enumerate(tqdm(chunks)):
            # Chunks near the beginning and end of the document receive slightly higher importance
            pos = i / len(chunks)
            importance = 1.0 - 0.5 * min(pos, 1 - pos)
            self.context_manager.add_chunk(
                text=chunk,
                importance=importance,
                metadata={"source": "document", "position": i, "total_chunks": len(chunks)}
            )
    def process_query(self, query: str, max_new_tokens: int = 256) -> str:
        """
        Process a query using the context manager and model.
        Args:
            query: The query to process
            max_new_tokens: Maximum number of tokens in the response
        Returns:
            Model response
        """
        self.context_manager.add_chunk(query, importance=1.0, metadata={"type": "query"})
        relevant_context = self.context_manager.retrieve_context(query=query)
        prompt = f"Context: {relevant_context}\n\nQuestion: {query}\n\nAnswer:"
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
        print("Generating response...")
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                do_sample=True,
                temperature=0.7,
                top_p=0.9,
            )
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        self.context_manager.add_chunk(
            response,
            importance=0.9,
            metadata={"type": "response", "query": query}
        )
        return response
    def interactive_session(self):
        """Run an interactive session in the notebook."""
        from IPython.display import clear_output
        print("Starting interactive MCP session. Type 'exit' to end.")
        conversation_history = []
        while True:
            query = input("\nYour query: ")
            if query.lower() == 'exit':
                break
            if query.lower() == 'stats':
                print("\nContext Statistics:")
                stats = self.context_manager.get_stats()
                for key, value in stats.items():
                    print(f"{key}: {value}")
                self.context_manager.visualize_context()
                continue
            if query.lower() == 'clear':
                self.context_manager.chunks = []
                self.context_manager.current_token_count = 0
                conversation_history = []
                clear_output(wait=True)
                print("Context cleared!")
                continue
            response = self.process_query(query)
            conversation_history.append((query, response))
            print("\nResponse:")
            print(response)
            print("\n" + "-"*50)
            stats = self.context_manager.get_stats()
            print(f"Context usage: {stats['token_count']}/{stats['max_tokens']} tokens ({stats['usage_percentage']:.1f}%)")
The MCPColabDemo class ties the context manager to a seq2seq LLM, loading FLAN-T5 (or any specified Hugging Face model) on the chosen device, and provides utility methods for chunking and ingesting full documents, processing user queries by prepending only the most relevant context, and running an interactive Colab session complete with real-time stats, visualizations, and commands for clearing or inspecting the evolving context window.
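Here is a minimal sketch of driving the demo class outside the interactive loop; the placeholder document and question are assumptions made purely for illustration.
demo = MCPColabDemo(model_name="google/flan-t5-base", max_context_length=2048)
long_document = "Replace this with the full text of a long document you want to query. " * 50
demo.add_document(long_document, chunk_size=512, overlap=50)
answer = demo.process_query("What is this document about?")
print(answer)
# Or hand control to the notebook user:
# demo.interactive_session()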
def run_mcp_demo():
    """Run a simple demo of the Model Context Protocol."""
    print("Running Model Context Protocol Demo...")
    context_manager = ModelContextManager(max_context_length=4096)
    print("Adding sample chunks...")
    context_manager.add_chunk(
        "The Model Context Protocol (MCP) is a framework for managing context "
        "windows in large language models. It helps optimize token usage and improve relevance.",
        importance=1.0
    )
    context_manager.add_chunk(
        "Context management involves techniques like sliding windows, chunking, "
        "and relevance filtering to handle large documents efficiently.",
        importance=0.8
    )
    for i in range(10):
        context_manager.add_chunk(
            f"This is test chunk {i} with some filler content to simulate a larger context "
            f"window that needs optimization. This helps demonstrate the MCP functionality "
            f"for context window management in language models on Google Colab.",
            importance=0.5 - (i * 0.02)
        )
    stats = context_manager.get_stats()
    print("\nInitial Statistics:")
    for key, value in stats.items():
        print(f"{key}: {value}")
    query = "How does the Model Context Protocol work?"
    print(f"\nRetrieving context for: '{query}'")
    context = context_manager.retrieve_context(query)
    print(f"\nRelevant context:\n{context}")
    print("\nVisualizing context:")
    context_manager.visualize_context()
    print("\nDemo complete!")
The run_mcp_demo function ties everything together in a single script: it instantiates the ModelContextManager, adds a series of sample chunks with varying importance, prints initial statistics, retrieves and displays the most relevant context for a test query, and finally visualizes the context window, providing a complete, end-to-end demonstration of the Model Context Protocol in action.
if __name__ == "__main__":
    run_mcp_demo()
Finally, this standard Python entry-point guard ensures that the run_mcp_demo() function executes only when the script is run directly (rather than imported as a module), triggering the end-to-end demonstration of the Model Context Protocol workflow.
In conclusion, we now have a fully functional MCP system that not only curbs runaway token usage but also prioritizes the context fragments that genuinely matter for your queries. The ModelContextManager equips you with tools to balance semantic relevance, temporal freshness, and user-assigned importance, while the accompanying MCPColabDemo class provides an accessible framework for real-time experimentation and visualization. Armed with these patterns, you can extend the core principles by adjusting relevance thresholds, experimenting with different embedding models, or integrating with alternative LLM backends to tailor your domain-specific workflows. Ultimately, this approach enables you to create concise yet highly relevant prompts, resulting in more accurate and efficient responses from your language models.
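As one illustration of that kind of customization, the sketch below leans more heavily on semantic similarity and swaps in a different Sentence-Transformers checkpoint; the particular weights and model name here are assumptions for demonstration, not recommendations.
custom_manager = ModelContextManager(
    max_context_length=4096,
    embedding_model="sentence-transformers/all-mpnet-base-v2",  # assumed alternative checkpoint
    relevance_threshold=0.6,   # admit slightly less relevant chunks
    recency_weight=0.2,
    importance_weight=0.2,
    semantic_weight=0.6,       # favor semantic similarity over recency and importance
)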
Here is the Colab Notebook.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.