A Coding Tutorial of Model Context Protocol Focusing on Semantic Chunking, Dynamic Token Management, and Context Relevance Scoring for Efficient LLM Interactions


Managing context effectively is a critical challenge when working with large language models, especially in environments like Google Colab, where resource constraints and long documents can quickly exceed the available token window. In this tutorial, we guide you through a practical implementation of the Model Context Protocol (MCP) by building a ModelContextManager that automatically chunks incoming text, generates semantic embeddings using Sentence-Transformers, and scores each chunk based on recency, importance, and relevance. You'll learn how to integrate this manager with a Hugging Face sequence-to-sequence model, demonstrated here with FLAN-T5, to add, optimize, and retrieve only the most pertinent pieces of context. Along the way, we'll cover token counting with a GPT-2 tokenizer, context-window optimization strategies, and interactive sessions that let you query and visualize your dynamic context in real time.

import torch
import numpy as np
from typing import List, Dict, Any, Optional, Union, Tuple
from dataclasses import dataclass
import time
import gc
from tqdm.notebook import tqdm

We import the essential libraries for building a dynamic context manager: torch and numpy handle tensor and numerical operations, while typing and dataclasses provide structured type annotations and data containers. Utility modules such as time and gc support timestamping and memory cleanup, and tqdm.notebook offers interactive progress bars for chunk processing in Colab.
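If you prefer to install the dependencies up front rather than relying on the fallback installs built into the class constructors below, a single Colab cell along these lines (an assumed convenience step, not part of the original code) covers everything the tutorial uses:

# Assumed convenience cell for Colab (not in the original code); the classes below
# also fall back to installing sentence-transformers/transformers on demand.
!pip install -q sentence-transformers transformers matplotlib pandas tqdm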

@dataclass
class ContextChunk:
    """A chunk of text with metadata for the Model Context Protocol."""
    text: str
    embedding: Optional[torch.Tensor] = None
    importance: float = 1.0
    timestamp: float = 0.0
    metadata: Dict[str, Any] = None

    def __post_init__(self):
        if self.metadata is None:
            self.metadata = {}
        if self.timestamp == 0.0:
            self.timestamp = time.time()

The ContextChunk dataclass encapsulates a single segment of text along with its embedding, a user-assigned importance score, a timestamp, and arbitrary metadata. Its __post_init__ method ensures that each chunk is stamped with the current time upon creation and that metadata defaults to an empty dictionary if none is provided.
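As a quick illustration of how those defaults behave, here is a minimal sketch (the sample text is illustrative, not from the tutorial):

chunk = ContextChunk(text="MCP keeps only the most relevant context in the window.")
print(chunk.importance)       # 1.0, the default importance score
print(chunk.metadata)         # {} — filled in by __post_init__
print(chunk.timestamp > 0.0)  # True — stamped with time.time() on creation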

class ModelContextManager:
    """
    Manager for implementing the Model Context Protocol in LLMs on Google Colab.
    Handles context window optimization, token management, and relevance scoring.
    """
    def __init__(
        self,
        max_context_length: int = 8192,
        embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2",
        relevance_threshold: float = 0.7,
        recency_weight: float = 0.3,
        importance_weight: float = 0.3,
        semantic_weight: float = 0.4,
        device: str = "cuda" if torch.cuda.is_available() else "cpu"
    ):
        """
        Initialize the Model Context Manager.

        Args:
            max_context_length: Maximum number of tokens in the context window
            embedding_model: Model to use for text embeddings
            relevance_threshold: Threshold for a chunk's relevance to be included
            recency_weight: Weight for recency in relevance calculation
            importance_weight: Weight for importance in relevance calculation
            semantic_weight: Weight for semantic similarity in relevance calculation
            device: Device to run computations on
        """
        self.max_context_length = max_context_length
        self.device = device
        self.chunks = []
        self.current_token_count = 0
        self.relevance_threshold = relevance_threshold
        self.recency_weight = recency_weight
        self.importance_weight = importance_weight
        self.semantic_weight = semantic_weight

        try:
            from sentence_transformers import SentenceTransformer
            print(f"Loading embedding model {embedding_model}...")
            self.embedding_model = SentenceTransformer(embedding_model).to(self.device)
            print(f"Embedding model loaded successfully on {self.device}")
        except ImportError:
            print("Installing sentence-transformers...")
            import subprocess
            subprocess.run(["pip", "install", "sentence-transformers"])
            from sentence_transformers import SentenceTransformer
            self.embedding_model = SentenceTransformer(embedding_model).to(self.device)
            print(f"Embedding model loaded successfully on {self.device}")

        try:
            from transformers import GPT2Tokenizer
            self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
        except ImportError:
            print("Installing transformers...")
            import subprocess
            subprocess.run(["pip", "install", "transformers"])
            from transformers import GPT2Tokenizer
            self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

    def add_chunk(self, text: str, importance: float = 1.0, metadata: Dict[str, Any] = None) -> None:
        """
        Add a new chunk of text to the context manager.

        Args:
            text: The text content to add
            importance: Importance score (0-1)
            metadata: Additional metadata for the chunk
        """
        with torch.no_grad():
            embedding = self.embedding_model.encode(text, convert_to_tensor=True)

        chunk = ContextChunk(
            text=text,
            embedding=embedding,
            importance=importance,
            timestamp=time.time(),
            metadata=metadata or {}
        )

        self.chunks.append(chunk)
        self.current_token_count += len(self.tokenizer.encode(text))

        # Prune the context as soon as the token budget is exceeded
        if self.current_token_count > self.max_context_length:
            self.optimize_context()

    def optimize_context(self) -> None:
        """Optimize the context by removing less relevant chunks to fit within the token limit."""
        if not self.chunks:
            return

        print("Optimizing context window...")

        scores = self.score_chunks()
        sorted_indices = np.argsort(scores)[::-1]  # highest-scoring chunks first

        new_chunks = []
        new_token_count = 0

        for idx in sorted_indices:
            chunk = self.chunks[idx]
            chunk_tokens = len(self.tokenizer.encode(chunk.text))

            if new_token_count + chunk_tokens <= self.max_context_length:
                new_chunks.append(chunk)
                new_token_count += chunk_tokens
            else:
                # A highly relevant chunk may displace a weaker one that was already kept
                if scores[idx] > self.relevance_threshold * 1.5:
                    for i, included_chunk in enumerate(new_chunks):
                        included_idx = sorted_indices[i]
                        if scores[included_idx] < self.relevance_threshold:
                            included_tokens = len(self.tokenizer.encode(included_chunk.text))
                            if new_token_count - included_tokens + chunk_tokens <= self.max_context_length:
                                new_chunks.remove(included_chunk)
                                new_token_count -= included_tokens
                                new_chunks.append(chunk)
                                new_token_count += chunk_tokens
                                break

        removed_count = len(self.chunks) - len(new_chunks)
        self.chunks = new_chunks
        self.current_token_count = new_token_count

        print(f"Context optimized: Removed {removed_count} chunks, {len(new_chunks)} remaining, "
              f"using {new_token_count}/{self.max_context_length} tokens")

        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

    def score_chunks(self, query: str = None) -> np.ndarray:
        """
        Score chunks based on recency, importance, and semantic relevance.

        Args:
            query: Optional query to calculate semantic relevance against

        Returns:
            Array of scores for each chunk
        """
        if not self.chunks:
            return np.array([])

        current_time = time.time()
        max_age = max(current_time - chunk.timestamp for chunk in self.chunks) or 1.0

        recency_scores = np.array([
            1.0 - ((current_time - chunk.timestamp) / max_age)
            for chunk in self.chunks
        ])

        importance_scores = np.array([chunk.importance for chunk in self.chunks])

        if query is not None:
            query_embedding = self.embedding_model.encode(query, convert_to_tensor=True)
            similarity_scores = np.array([
                torch.cosine_similarity(chunk.embedding, query_embedding, dim=0).item()
                for chunk in self.chunks
            ])
            # Min-max normalize similarities so all three signals share the same scale
            similarity_scores = (similarity_scores - similarity_scores.min()) / (
                similarity_scores.max() - similarity_scores.min() + 1e-8
            )
        else:
            similarity_scores = np.ones(len(self.chunks))

        final_scores = (
            self.recency_weight * recency_scores +
            self.importance_weight * importance_scores +
            self.semantic_weight * similarity_scores
        )

        return final_scores

    def retrieve_context(self, query: str = None, k: int = None) -> str:
        """
        Retrieve the most relevant context for a given query.

        Args:
            query: The query to retrieve context for
            k: The maximum number of chunks to return (None = all relevant chunks)

        Returns:
            String containing the combined relevant context
        """
        if not self.chunks:
            return ""

        scores = self.score_chunks(query)
        relevant_indices = np.where(scores >= self.relevance_threshold)[0]
        relevant_indices = relevant_indices[np.argsort(scores[relevant_indices])[::-1]]

        if k is not None:
            relevant_indices = relevant_indices[:k]

        relevant_texts = [self.chunks[i].text for i in relevant_indices]
        return "\n\n".join(relevant_texts)

    def get_stats(self) -> Dict[str, Any]:
        """Get statistics about the current context state."""
        return {
            "chunk_count": len(self.chunks),
            "token_count": self.current_token_count,
            "max_tokens": self.max_context_length,
            "usage_percentage": self.current_token_count / self.max_context_length * 100 if self.max_context_length else 0,
            "avg_chunk_size": self.current_token_count / len(self.chunks) if self.chunks else 0,
            "oldest_chunk_age": time.time() - min(chunk.timestamp for chunk in self.chunks) if self.chunks else 0,
        }

    def visualize_context(self):
        """Visualize the current context window distribution."""
        try:
            import matplotlib.pyplot as plt
            import pandas as pd

            if not self.chunks:
                print("No chunks to visualize")
                return

            scores = self.score_chunks()
            chunk_sizes = [len(self.tokenizer.encode(chunk.text)) for chunk in self.chunks]
            timestamps = [chunk.timestamp for chunk in self.chunks]
            relative_times = [time.time() - ts for ts in timestamps]
            importance = [chunk.importance for chunk in self.chunks]

            df = pd.DataFrame({
                'Size (tokens)': chunk_sizes,
                'Age (seconds)': relative_times,
                'Importance': importance,
                'Score': scores
            })

            fig, axs = plt.subplots(2, 2, figsize=(14, 10))

            axs[0, 0].bar(range(len(chunk_sizes)), chunk_sizes)
            axs[0, 0].set_title('Token Distribution by Chunk')
            axs[0, 0].set_ylabel('Tokens')
            axs[0, 0].set_xlabel('Chunk Index')

            axs[0, 1].scatter(chunk_sizes, scores)
            axs[0, 1].set_title('Score vs Chunk Size')
            axs[0, 1].set_xlabel('Tokens')
            axs[0, 1].set_ylabel('Score')

            axs[1, 0].scatter(relative_times, scores)
            axs[1, 0].set_title('Score vs Chunk Age')
            axs[1, 0].set_xlabel('Age (seconds)')
            axs[1, 0].set_ylabel('Score')

            axs[1, 1].scatter(importance, scores)
            axs[1, 1].set_title('Score vs Importance')
            axs[1, 1].set_xlabel('Importance')
            axs[1, 1].set_ylabel('Score')

            plt.tight_layout()
            plt.show()
        except ImportError:
            print("Please install matplotlib and pandas for visualization")
            print('!pip install matplotlib pandas')

The ModelContextManager class orchestrates the end-to-end handling of context for LLMs by chunking input text, generating embeddings, and tracking token usage against a configurable limit. It implements relevance scoring (combining recency, importance, and semantic similarity), automatic context pruning, retrieval of the most pertinent chunks, and convenient utilities for monitoring and visualizing context statistics.
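Before wiring the manager to a language model, you can exercise it on its own. The following minimal sketch uses illustrative text and parameter values; a lower relevance_threshold is assumed here so that these short examples clear the cutoff:

manager = ModelContextManager(max_context_length=1024, relevance_threshold=0.5)
manager.add_chunk("MCP scores chunks by recency, importance, and semantic similarity.", importance=0.9)
manager.add_chunk("Sentence-Transformers embeddings drive the semantic term of the score.", importance=0.6)

query = "How are chunks scored?"
print(manager.score_chunks(query=query))           # one combined score per chunk
print(manager.retrieve_context(query=query, k=1))  # highest-scoring chunk above the threshold
print(manager.get_stats())                         # chunk count, token usage, oldest chunk age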

class MCPColabDemo:
    """Demonstration of the Model Context Protocol in Google Colab with a Language Model."""

    def __init__(
        self,
        model_name: str = "google/flan-t5-base",
        max_context_length: int = 2048,
        device: str = "cuda" if torch.cuda.is_available() else "cpu"
    ):
        """
        Initialize the MCP Colab demo with a specified model.

        Args:
            model_name: Hugging Face model name
            max_context_length: Maximum context length for the MCP manager
            device: Device to run the model on
        """
        self.device = device
        self.context_manager = ModelContextManager(
            max_context_length=max_context_length,
            device=device
        )

        try:
            from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
            print(f"Loading model {model_name}...")
            self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            print(f"Model loaded successfully on {device}")
        except ImportError:
            print("Installing transformers...")
            import subprocess
            subprocess.run(["pip", "install", "transformers"])
            from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
            self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            print(f"Model loaded successfully on {device}")

    def add_document(self, text: str, chunk_size: int = 512, overlap: int = 50) -> None:
        """
        Add a document to the context by chunking it appropriately.

        Args:
            text: Document text
            chunk_size: Size of each chunk in characters
            overlap: Overlap between chunks in characters
        """
        chunks = []
        for i in range(0, len(text), chunk_size - overlap):
            chunk = text[i:i + chunk_size]
            if len(chunk) > 20:
                chunks.append(chunk)

        print(f"Adding {len(chunks)} chunks to context...")
        for i, chunk in enumerate(tqdm(chunks)):
            # Chunks near the beginning and end of the document get slightly higher importance
            pos = i / len(chunks)
            importance = 1.0 - 0.5 * min(pos, 1 - pos)

            self.context_manager.add_chunk(
                text=chunk,
                importance=importance,
                metadata={"source": "document", "position": i, "total_chunks": len(chunks)}
            )

    def process_query(self, query: str, max_new_tokens: int = 256) -> str:
        """
        Process a query using the context manager and model.

        Args:
            query: The query to process
            max_new_tokens: Maximum number of tokens in the response

        Returns:
            Model response
        """
        self.context_manager.add_chunk(query, importance=1.0, metadata={"type": "query"})

        relevant_context = self.context_manager.retrieve_context(query=query)
        prompt = f"Context: {relevant_context}\n\nQuestion: {query}\n\nAnswer:"
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)

        print("Generating response...")
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                do_sample=True,
                temperature=0.7,
                top_p=0.9,
            )

        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

        self.context_manager.add_chunk(
            response,
            importance=0.9,
            metadata={"type": "response", "query": query}
        )

        return response

    def interactive_session(self):
        """Run an interactive session in the notebook."""
        from IPython.display import clear_output

        print("Starting interactive MCP session. Type 'exit' to end.")

        conversation_history = []
        while True:
            query = input("\nYour query: ")
            if query.lower() == 'exit':
                break

            if query.lower() == 'stats':
                print("\nContext Statistics:")
                stats = self.context_manager.get_stats()
                for key, value in stats.items():
                    print(f"{key}: {value}")
                self.context_manager.visualize_context()
                continue

            if query.lower() == 'clear':
                self.context_manager.chunks = []
                self.context_manager.current_token_count = 0
                conversation_history = []
                clear_output(wait=True)
                print("Context cleared!")
                continue

            response = self.process_query(query)
            conversation_history.append((query, response))

            print("\nResponse:")
            print(response)
            print("\n" + "-"*50)

            stats = self.context_manager.get_stats()
            print(f"Context usage: {stats['token_count']}/{stats['max_tokens']} tokens "
                  f"({stats['usage_percentage']:.1f}%)")

The MCPColabDemo class ties the context manager to a seq2seq LLM, loading FLAN-T5 (or any specified Hugging Face model) on the chosen device, and provides utility methods for chunking and ingesting entire documents, processing user queries by prepending only the most relevant context, and running an interactive Colab session complete with real-time stats, visualizations, and commands for clearing or inspecting the evolving context window.
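A hypothetical end-to-end sketch of the demo class might look like this; the document text and query below are placeholders, and FLAN-T5 plus the embedding model are downloaded on first instantiation:

demo = MCPColabDemo(model_name="google/flan-t5-base", max_context_length=2048)

long_document = "The Model Context Protocol manages context windows for LLMs. " * 50
demo.add_document(long_document, chunk_size=512, overlap=50)

answer = demo.process_query("What does the Model Context Protocol manage?")
print(answer)

# demo.interactive_session()  # uncomment for the live loop with 'stats', 'clear', and 'exit' commands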

def run_mcp_demo():
    """Run a simple demo of the Model Context Protocol."""
    print("Running Model Context Protocol Demo...")

    context_manager = ModelContextManager(max_context_length=4096)

    print("Adding sample chunks...")

    context_manager.add_chunk(
        "The Model Context Protocol (MCP) is a framework for managing context "
        "windows in large language models. It helps optimize token usage and improve relevance.",
        importance=1.0
    )

    context_manager.add_chunk(
        "Context management involves techniques like sliding windows, chunking, "
        "and relevance filtering to handle large documents efficiently.",
        importance=0.8
    )

    for i in range(10):
        context_manager.add_chunk(
            f"This is test chunk {i} with some filler content to simulate a larger context "
            f"window that needs optimization. This helps demonstrate the MCP functionality "
            f"for context window management in language models on Google Colab.",
            importance=0.5 - (i * 0.02)
        )

    stats = context_manager.get_stats()
    print("\nInitial Statistics:")
    for key, value in stats.items():
        print(f"{key}: {value}")

    query = "How does the Model Context Protocol work?"
    print(f"\nRetrieving context for: '{query}'")
    context = context_manager.retrieve_context(query)
    print(f"\nRelevant context:\n{context}")

    print("\nVisualizing context:")
    context_manager.visualize_context()

    print("\nDemo complete!")

The run_mcp_demo function ties everything together in a single script: it instantiates the ModelContextManager, adds a series of sample chunks with varying importance, prints initial statistics, retrieves and displays the most relevant context for a test query, and finally visualizes the context window, providing a complete, end-to-end demonstration of the Model Context Protocol in action.

if __name__ == "__main__":
    run_mcp_demo()

Finally, this standard Python entry-point guard ensures that the run_mcp_demo() function executes only when the script is run directly (rather than imported as a module), triggering the end-to-end demonstration of the Model Context Protocol workflow.

In conclusion, we now have a fully functional MCP system that not only curbs runaway token usage but also prioritizes the context fragments that genuinely matter for your queries. The ModelContextManager equips you with tools to balance semantic relevance, temporal freshness, and user-assigned importance, while the accompanying MCPColabDemo class provides an accessible framework for real-time experimentation and visualization. Armed with these patterns, you can extend the core principles by adjusting relevance thresholds, experimenting with different embedding models, or integrating alternative LLM backends to tailor your domain-specific workflows. Ultimately, this approach enables you to create concise yet highly relevant prompts, resulting in more accurate and efficient responses from your language models.
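For example, a tuned configuration along those lines might look like the sketch below; the weights, threshold, and alternative embedding model are illustrative assumptions rather than recommendations from the tutorial:

tuned_manager = ModelContextManager(
    max_context_length=4096,
    embedding_model="sentence-transformers/all-mpnet-base-v2",  # assumed alternative embedder
    relevance_threshold=0.6,  # admit more chunks into the retrieved context
    recency_weight=0.2,
    importance_weight=0.3,
    semantic_weight=0.5,      # the three weights are combined linearly in score_chunks
)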


Here is the Colab Notebook.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
