RAG-powered conversational research assistants address the limitations of traditional language models by combining them with information retrieval systems. The system searches through specific knowledge bases, retrieves relevant information, and presents it conversationally with proper citations. This approach reduces hallucinations, handles domain-specific knowledge, and grounds responses in retrieved text. In this tutorial, we will demonstrate building such an assistant using the open-source model TinyLlama-1.1B-Chat-v1.0 from Hugging Face, FAISS from Meta, and the LangChain framework to answer questions about scientific papers.
First, let’s install the necessary libraries:
!pip install langchain-community langchain pypdf sentence-transformers faiss-cpu transformers accelerate einops
Now, let’s import the required libraries:
import os
import torch
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain_community.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import pandas as pd
from IPython.display import display, Markdown
We will mount Google Drive to save the paper in a later step:
from google.colab import drive
drive.mount('/content/drive')
print("Google Drive mounted")
For our knowledge base, we’ll use PDF documents of scientific papers. Let’s create a function to load and process these documents:
def load_documents(pdf_folder_path):
    documents = []

    if not pdf_folder_path:
        print("Downloading a sample paper...")
        !wget -q https://arxiv.org/pdf/1706.03762.pdf -O attention.pdf
        pdf_docs = ["attention.pdf"]
    else:
        pdf_docs = [os.path.join(pdf_folder_path, f) for f in os.listdir(pdf_folder_path)
                    if f.endswith('.pdf')]

    print(f"Found {len(pdf_docs)} PDF documents")

    for pdf_path in pdf_docs:
        try:
            loader = PyPDFLoader(pdf_path)
            documents.extend(loader.load())
            print(f"Loaded: {pdf_path}")
        except Exception as e:
            print(f"Error loading {pdf_path}: {e}")

    return documents
documents = load_documents("")
Next, we need to split these documents into smaller chunks for efficient retrieval:
def split_documents(documents):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Split {len(documents)} documents into {len(chunks)} chunks")
    return chunks
chunks = split_documents(documents)
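If you want to verify that the chunking behaves as expected, you can peek at one chunk’s text and metadata. This is an optional check and assumes at least one chunk was produced:

# Optional: inspect the first chunk to confirm chunk size and overlap look reasonable
print(chunks[0].page_content[:300])
print(chunks[0].metadata)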
We’ll use sentence-transformers to create vector embeddings for our document chunks:
def create_vector_store(chunks):
    print("Loading embedding model...")
    embedding_model = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={'device': 'cuda' if torch.cuda.is_available() else 'cpu'}
    )

    print("Creating vector store...")
    vector_store = FAISS.from_documents(chunks, embedding_model)
    print("Vector store created successfully!")

    return vector_store
vector_store = create_vector_store(chunks)
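Before wiring in the language model, it can help to confirm that retrieval alone returns sensible passages. The snippet below is an optional sanity check; the query string is just an example:

# Optional: run a direct similarity search against the FAISS index
sample_hits = vector_store.similarity_search("What is self-attention?", k=2)
for hit in sample_hits:
    print(hit.metadata.get("page"), "-", hit.page_content[:120], "...")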
Now, let’s load an open-source language model to generate responses. We’ll use TinyLlama, which is small enough to run on Colab but still powerful enough for our task:
def load_language_model():
    print("Loading language model...")
    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

    try:
        import subprocess
        print("Installing/updating bitsandbytes...")
        subprocess.check_call(["pip", "install", "-U", "bitsandbytes"])
        print("Successfully installed/updated bitsandbytes")
    except Exception:
        print("Could not update bitsandbytes, will proceed without 8-bit quantization")

    from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
    import torch

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    if torch.cuda.is_available():
        try:
            # Load in 8-bit to fit comfortably in Colab GPU memory
            quantization_config = BitsAndBytesConfig(
                load_in_8bit=True,
                llm_int8_threshold=6.0,
                llm_int8_has_fp16_weight=False
            )

            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
                device_map="auto",
                quantization_config=quantization_config
            )
            print("Model loaded with 8-bit quantization")

        except Exception as e:
            print(f"Error with quantization: {e}")
            print("Falling back to standard model loading without quantization")
            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
                device_map="auto"
            )
    else:
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            torch_dtype=torch.float32,
            device_map="auto"
        )

    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=2048,
        temperature=0.2,
        top_p=0.95,
        repetition_penalty=1.2,
        return_full_text=False
    )

    from langchain_community.llms import HuggingFacePipeline
    llm = HuggingFacePipeline(pipeline=pipe)

    print("Language model loaded successfully!")
    return llm
llm = load_language_model()
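As an optional sanity check, you can prompt the wrapped model directly before adding retrieval; the exact wording of the output will vary from run to run:

# Optional: query the raw LLM (no retrieval yet)
print(llm.invoke("Briefly, what is a transformer in machine learning?"))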
Now, let’s build our research assistant by combining the vector store and language model:
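The notebook’s create_research_assistant definition is not reproduced in this excerpt, so here is a minimal sketch of how it could be implemented with the ConversationalRetrievalChain imported earlier; the retriever setting (k=3) and the closure-based chat history are assumptions, not the notebook’s exact code:

def create_research_assistant(vector_store, llm):
    # Retrieve the top-k most similar chunks for each question (k=3 is an assumption)
    retriever = vector_store.as_retriever(search_kwargs={"k": 3})
    chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        return_source_documents=True,
    )
    chat_history = []

    def ask(query, return_sources=False):
        # The chain expects the question plus the accumulated chat history
        result = chain({"question": query, "chat_history": chat_history})
        answer = result["answer"]
        chat_history.append((query, answer))
        if return_sources:
            return answer, result["source_documents"]
        return answer

    return ask

With the assistant in place, the helper below formats each answer together with previews of the retrieved source chunks: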
import textwrap

def format_research_assistant_output(query, response, sources):
    output = f"\n{'=' * 50}\n"
    output += f"USER QUERY: {query}\n"
    output += f"{'-' * 50}\n\n"
    output += f"ASSISTANT RESPONSE:\n{response}\n\n"
    output += f"{'-' * 50}\n"
    output += f"SOURCES REFERENCED:\n\n"

    for i, doc in enumerate(sources):
        output += f"Source #{i+1}:\n"
        content_preview = doc.page_content[:200] + "..." if len(doc.page_content) > 200 else doc.page_content
        wrapped_content = textwrap.fill(content_preview, width=80)
        output += f"{wrapped_content}\n\n"

    output += f"{'=' * 50}\n"
    return output
research_assistant = create_research_assistant(vector_store, llm)

test_queries = [
    "What is the key idea behind the Transformer model?",
    "Explain the self-attention mechanism in simple terms.",
    "Who are the authors of the paper?",
    "What are the main advantages of using attention mechanisms?"
]

for query in test_queries:
    response, sources = research_assistant(query, return_sources=True)
    formatted_output = format_research_assistant_output(query, response, sources)
    print(formatted_output)
In this tutorial, we built a conversational research assistant using Retrieval-Augmented Generation with open-source models. RAG enhances language models by integrating document retrieval, reducing hallucination, and improving domain-specific accuracy. The guide walks through setting up the environment, processing scientific papers, creating vector embeddings using FAISS and sentence-transformers, and integrating an open-source language model like TinyLlama. The assistant retrieves relevant document chunks and generates responses with citations. This implementation lets users query a knowledge base, making AI-powered research more reliable and efficient for answering domain-specific questions.