How To Build An Anime Recommendation System With Hugging Face?


A few years ago, I fell into the world of anime, from which I'd never escape. As my watchlist was growing thinner and thinner, finding the next best anime became harder and harder. There are so many hidden gems out there, but how do I discover them? That's when I thought: why not let the Machine Learning sensei do the hard work? Sounds exciting, right?

In our digital era, recommendation systems are the silent entertainment heroes that power our daily online experiences. Whether it involves suggesting TV series, creating a personalized music playlist, or recommending products based on browsing history, these algorithms run in the background to improve user engagement.

This guide walks you through building a production-ready anime recommendation engine that runs 24/7 without the need for traditional cloud platforms. With hands-on use cases, code snippets, and a detailed exploration of the architecture, you'll be equipped to build and deploy your own recommendation engine.

Learning Objectives

  • Understand the full data processing and model training workflows to ensure efficiency and scalability.
  • Build and deploy an engaging, user-friendly recommendation system on Hugging Face Spaces with a dynamic interface.
  • Gain hands-on experience in developing end-to-end recommendation engines using machine learning approaches such as SVD, collaborative filtering, and content-based filtering.
  • Seamlessly containerize your project using Docker for consistent deployment across different environments.
  • Combine various recommendation strategies within one interactive application to deliver personalized recommendations.

This article was published as a part of the Data Science Blogathon.

Anime Recommendation System with Hugging Face: Data Collection

The foundation of any recommendation system lies in quality data. For this project, datasets were sourced from Kaggle and then stored in the Hugging Face Datasets Hub for streamlined access and integration. The primary datasets used include:

  • Animes: A dataset detailing anime titles and associated metadata.
  • Anime_UserRatings: User rating data for each anime.
  • UserRatings: General user ratings providing insights into viewing habits.

Prerequisites for the Anime Recommendation App

Before we begin, ensure that you have completed the following steps:

1. Sign Up and Log In

  • Go to Hugging Face and create an account if you haven't already.
  • Log in to your Hugging Face account to access the Spaces section.

2. Create a New Space

  • Navigate to the “Spaces” section from your profile or dashboard.
  • Click on the “Create New Space” button.
  • Provide a unique name for your space and choose the “Streamlit” option for the app interface.
  • Set your space to public or private based on your preference.

3. Clone nan Space Repository

  • After creating the Space, you'll be redirected to the repository page for your new space.
  • Clone the repository to your local machine using Git with the following command:
git clone https://huggingface.co/spaces/your-username/your-space-name

4. Set Up nan Virtual Environment

  • Navigate to your project directory and create a new virtual environment using Python's built-in venv tool.
# Creating the virtual environment
## For macOS and Linux:
python3 -m venv env
## For Windows:
python -m venv env

# Activating the environment
## For macOS and Linux:
source env/bin/activate
## For Windows:
.\env\Scripts\activate

5. Install Dependencies

  • In the cloned repository, create a requirements.txt file that lists all the dependencies your app requires (e.g., Streamlit, pandas, etc.). A sample listing is shown after the install command below.
  • Install the dependencies using the command:
pip install -r requirements.txt
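Your exact dependency list may differ, but based on the libraries this project uses later (Streamlit, pandas, NumPy, SciPy, scikit-learn, scikit-surprise for SVD, the datasets and huggingface_hub packages, and joblib), a plausible requirements.txt looks like:

streamlit
pandas
numpy
scipy
scikit-learn
scikit-surprise
datasets
huggingface_hub
joblib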

Before diving into the code, it is essential to understand how the various components of the system interact. Check out the project architecture below.

[Image: Project architecture]

Folder Structure

This project adopts a modular folder structure designed to align with industry standards, ensuring scalability and maintainability.

ANIME-RECOMMENDATION-SYSTEM/                  # Project directory
├── anime_recommender/                        # Main package containing all the modules
│   ├── __init__.py                           # Package initialization
│   ├── components/                           # Core components of the recommendation system
│   │   ├── __init__.py
│   │   ├── collaborative_recommender.py      # Collaborative filtering model
│   │   ├── content_based_recommender.py      # Content-based filtering model
│   │   ├── data_ingestion.py                 # Fetches and loads data
│   │   ├── data_transformation.py            # Preprocesses and transforms the data
│   │   ├── top_anime_recommenders.py         # Filters top animes
│   ├── constant/
│   │   ├── __init__.py                       # Stores constant values used across the project
│   ├── entity/                               # Defines structured entities like configs and artifacts
│   │   ├── __init__.py
│   │   ├── artifact_entity.py                # Data structures for model artifacts
│   │   ├── config_entity.py                  # Configuration parameters and settings
│   ├── exception/                            # Custom exception handling
│   │   ├── __init__.py
│   │   ├── exception.py                      # Handles errors and exceptions
│   ├── loggers/                              # Logging and monitoring setup
│   │   ├── __init__.py
│   │   ├── logging.py                        # Configures log settings
│   ├── model_trainer/                        # Model training scripts
│   │   ├── __init__.py
│   │   ├── collaborative_modelling.py        # Train collaborative filtering model
│   │   ├── content_based_modelling.py        # Train content-based model
│   │   ├── top_anime_filtering.py            # Filters top animes based on ratings
│   ├── pipelines/                            # End-to-end ML pipelines
│   │   ├── __init__.py
│   │   ├── training_pipeline.py              # Training pipeline
│   ├── utils/                                # Utility functions
│   │   ├── __init__.py
│   │   ├── main_utils/
│   │   │   ├── __init__.py
│   │   │   ├── utils.py                      # Utility functions for specific processing
├── notebooks/                                # Jupyter notebooks for EDA and experimentation
│   ├── EDA.ipynb                             # Exploratory Data Analysis
│   ├── final_ARS.ipynb                       # Final implementation notebook
├── .gitattributes                            # Git configuration for handling file formats
├── .gitignore                                # Specifies files to ignore in version control
├── app.py                                    # Main Streamlit app
├── Dockerfile                                # Docker configuration for containerization
├── README.md                                 # Project documentation
├── requirements.txt                          # Dependencies and libraries
├── run_pipeline.py                           # Runs the full training pipeline
├── setup.py                                  # Setup script for package installation

Constants

The constant/__init__.py file defines all essential constants, such as file paths, directory names, and model filenames. These constants standardize configurations across the data ingestion, transformation, and model training stages. This ensures consistency, maintainability, and easy access to key project configurations.

"""Defining communal changeless variables for training pipeline""" PIPELINE_NAME: str = "AnimeRecommender" ARTIFACT_DIR: str = "Artifacts" ANIME_FILE_NAME: str = "Animes.csv" RATING_FILE_NAME:str = "UserRatings.csv" MERGED_FILE_NAME:str = "Anime_UserRatings.csv" ANIME_FILE_PATH:str = "krishnaveni76/Animes" RATING_FILE_PATH:str = "krishnaveni76/UserRatings" ANIMEUSERRATINGS_FILE_PATH:str = "krishnaveni76/Anime_UserRatings" MODELS_FILEPATH = "krishnaveni76/anime-recommendation-models" """Data Ingestion related changeless commencement pinch DATA_INGESTION VAR NAME""" DATA_INGESTION_DIR_NAME: str = "data_ingestion" DATA_INGESTION_FEATURE_STORE_DIR: str = "feature_store" DATA_INGESTION_INGESTED_DIR: str = "ingested" """Data Transformation related changeless commencement pinch DATA_VALIDATION VAR NAME""" DATA_TRANSFORMATION_DIR:str = "data_transformation" DATA_TRANSFORMATION_TRANSFORMED_DATA_DIR:str = "transformed" """Model Trainer related changeless commencement pinch MODEL TRAINER VAR NAME""" MODEL_TRAINER_DIR_NAME: str = "trained_models" MODEL_TRAINER_COL_TRAINED_MODEL_DIR: str = "collaborative_recommenders" MODEL_TRAINER_SVD_TRAINED_MODEL_NAME: str = "svd.pkl" MODEL_TRAINER_ITEM_KNN_TRAINED_MODEL_NAME: str = "itembasedknn.pkl" MODEL_TRAINER_USER_KNN_TRAINED_MODEL_NAME: str = "userbasedknn.pkl" MODEL_TRAINER_CON_TRAINED_MODEL_DIR:str = "content_based_recommenders" MODEL_TRAINER_COSINESIMILARITY_MODEL_NAME:str = "cosine_similarity.pkl"

Utils

The utils/main_utils/utils.py file contains utility functions for operations such as saving/loading data, exporting dataframes, saving models, and uploading models to Hugging Face. These reusable functions streamline processes throughout the project.

import os
import joblib
import pandas as pd
from huggingface_hub import HfApi

def export_data_to_dataframe(dataframe: pd.DataFrame, file_path: str) -> pd.DataFrame:
    dir_path = os.path.dirname(file_path)
    os.makedirs(dir_path, exist_ok=True)
    dataframe.to_csv(file_path, index=False, header=True)
    return dataframe

def load_csv_data(file_path: str) -> pd.DataFrame:
    df = pd.read_csv(file_path)
    return df

def save_model(model: object, file_path: str) -> None:
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    with open(file_path, "wb") as file_obj:
        joblib.dump(model, file_obj)

def load_object(file_path: str) -> object:
    if not os.path.exists(file_path):
        error_msg = f"The file: {file_path} does not exist."
        raise Exception(error_msg)
    with open(file_path, "rb") as file_obj:
        return joblib.load(file_obj)

def upload_model_to_huggingface(model_path: str, repo_id: str, filename: str):
    api = HfApi()
    api.upload_file(
        path_or_fileobj=model_path,
        path_in_repo=filename,
        repo_id=repo_id,
        repo_type="model"
    )

Configuration Setup

The entity/config_entity.py file holds configuration details for the different stages of the training pipeline. This includes paths for data ingestion, transformation, and model training for both the collaborative and content-based recommendation systems. These configurations ensure a structured and organized workflow throughout the project.

import os
from datetime import datetime
from anime_recommender.constant import *

class TrainingPipelineConfig:
    def __init__(self, timestamp=datetime.now()):
        timestamp = timestamp.strftime("%m_%d_%Y_%H_%M_%S")
        self.pipeline_name = PIPELINE_NAME
        self.artifact_dir = os.path.join(ARTIFACT_DIR, timestamp)
        self.model_dir = os.path.join("final_model")
        self.timestamp: str = timestamp

class DataIngestionConfig:
    def __init__(self, training_pipeline_config: TrainingPipelineConfig):
        self.data_ingestion_dir: str = os.path.join(training_pipeline_config.artifact_dir, DATA_INGESTION_DIR_NAME)
        self.feature_store_anime_file_path: str = os.path.join(self.data_ingestion_dir, DATA_INGESTION_FEATURE_STORE_DIR, ANIME_FILE_NAME)
        self.feature_store_userrating_file_path: str = os.path.join(self.data_ingestion_dir, DATA_INGESTION_FEATURE_STORE_DIR, RATING_FILE_NAME)
        self.anime_filepath: str = ANIME_FILE_PATH
        self.rating_filepath: str = RATING_FILE_PATH

class DataTransformationConfig:
    def __init__(self, training_pipeline_config: TrainingPipelineConfig):
        self.data_transformation_dir: str = os.path.join(training_pipeline_config.artifact_dir, DATA_TRANSFORMATION_DIR)
        self.merged_file_path: str = os.path.join(self.data_transformation_dir, DATA_TRANSFORMATION_TRANSFORMED_DATA_DIR, MERGED_FILE_NAME)

class CollaborativeModelConfig:
    def __init__(self, training_pipeline_config: TrainingPipelineConfig):
        self.model_trainer_dir: str = os.path.join(training_pipeline_config.artifact_dir, MODEL_TRAINER_DIR_NAME)
        self.svd_trained_model_file_path: str = os.path.join(self.model_trainer_dir, MODEL_TRAINER_COL_TRAINED_MODEL_DIR, MODEL_TRAINER_SVD_TRAINED_MODEL_NAME)
        self.user_knn_trained_model_file_path: str = os.path.join(self.model_trainer_dir, MODEL_TRAINER_COL_TRAINED_MODEL_DIR, MODEL_TRAINER_USER_KNN_TRAINED_MODEL_NAME)
        self.item_knn_trained_model_file_path: str = os.path.join(self.model_trainer_dir, MODEL_TRAINER_COL_TRAINED_MODEL_DIR, MODEL_TRAINER_ITEM_KNN_TRAINED_MODEL_NAME)

class ContentBasedModelConfig:
    def __init__(self, training_pipeline_config: TrainingPipelineConfig):
        self.model_trainer_dir: str = os.path.join(training_pipeline_config.artifact_dir, MODEL_TRAINER_DIR_NAME)
        self.cosine_similarity_model_file_path: str = os.path.join(self.model_trainer_dir, MODEL_TRAINER_CON_TRAINED_MODEL_DIR, MODEL_TRAINER_COSINESIMILARITY_MODEL_NAME)

Artifacts entity

The entity/artifact_entity.py file defines classes for the artifacts generated at various stages. These artifacts help track and manage intermediate outputs such as processed datasets and trained models.

from dataclasses import dataclass

@dataclass
class DataIngestionArtifact:
    feature_store_anime_file_path: str
    feature_store_userrating_file_path: str

@dataclass
class DataTransformationArtifact:
    merged_file_path: str

@dataclass
class CollaborativeModelArtifact:
    svd_file_path: str
    item_based_knn_file_path: str
    user_based_knn_file_path: str

@dataclass
class ContentBasedModelArtifact:
    cosine_similarity_model_file_path: str

Recommendation System – Model Training

In this project, we implement three types of recommendation systems to enhance the anime recommendation experience:

  1. Collaborative Recommendation System
  2. Content-Based Recommendation System
  3. Top Anime Recommendation System

Each approach plays a unique role in delivering personalized recommendations. By breaking down each component, we will gain a deeper understanding.

1. Collaborative Recommendation System

This collaborative recommendation system suggests items to users based on the preferences and behaviours of other users. It operates under the assumption that if two users have shown similar interests in the past, they are likely to have similar preferences in the future. This approach is widely used by platforms like Netflix, Amazon, and anime recommendation engines to provide personalized suggestions. In our case, we use this recommendation technique to identify users with similar preferences and suggest anime based on their shared interests.
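To make that intuition concrete, here is a minimal, self-contained sketch (not part of the project code) that scores user-to-user similarity with cosine similarity on a toy rating matrix; the ratings are made up purely for illustration:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy user x anime rating matrix (rows: users, columns: anime; 0 = unrated)
ratings = np.array([
    [10, 8, 0, 2],   # user A
    [9, 9, 1, 0],    # user B (tastes similar to A)
    [1, 0, 9, 10],   # user C (opposite tastes)
])

sim = cosine_similarity(ratings)
print(sim[0])  # user A scores far more similar to B than to C

A collaborative recommender builds exactly this kind of matrix at scale, then recommends to user A the titles that high-similarity neighbours like B have rated highly.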

We will follow the workflow below to build our recommendation system. Each step is carefully structured to ensure seamless integration, starting with data collection, followed by transformation, and finally training a model to generate meaningful recommendations.

[Image: Collaborative recommendation system workflow]

A. Data Ingestion

Data ingestion is the process of collecting, importing, and transferring data from various sources into a data storage system or pipeline for further processing and analysis. It is a crucial first step in any data-driven application, as it enables the system to access and work with the raw data required to generate insights, train models, or perform other tasks.

Data Ingestion Component

We define a DataIngestion class in the components/data_ingestion.py file, which handles the process of fetching datasets from the Hugging Face Datasets Hub and loading them into Pandas DataFrames. It utilizes DataIngestionConfig to obtain the necessary file paths and configurations for the ingestion process. The ingest_data method loads the anime and user rating datasets, exports them as CSV files to the feature store, and returns a DataIngestionArtifact containing the paths of the ingested files. This class encapsulates the data ingestion logic, ensuring that data is properly fetched, stored, and made accessible for the further stages of the pipeline.

class DataIngestion:
    def __init__(self, data_ingestion_config: DataIngestionConfig):
        self.data_ingestion_config = data_ingestion_config

    def fetch_data_from_huggingface(self, dataset_path: str, split: str = None) -> pd.DataFrame:
        dataset = load_dataset(dataset_path, split=split)
        df = pd.DataFrame(dataset['train'])
        return df

    def ingest_data(self) -> DataIngestionArtifact:
        anime_df = self.fetch_data_from_huggingface(self.data_ingestion_config.anime_filepath)
        rating_df = self.fetch_data_from_huggingface(self.data_ingestion_config.rating_filepath)
        export_data_to_dataframe(anime_df, file_path=self.data_ingestion_config.feature_store_anime_file_path)
        export_data_to_dataframe(rating_df, file_path=self.data_ingestion_config.feature_store_userrating_file_path)
        dataingestionartifact = DataIngestionArtifact(
            feature_store_anime_file_path=self.data_ingestion_config.feature_store_anime_file_path,
            feature_store_userrating_file_path=self.data_ingestion_config.feature_store_userrating_file_path
        )
        return dataingestionartifact
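For reference, a one-off ingestion run wired up by hand might look like the sketch below; the import paths are assumed from the folder structure shown earlier:

from anime_recommender.entity.config_entity import TrainingPipelineConfig, DataIngestionConfig
from anime_recommender.components.data_ingestion import DataIngestion

# Wire up the configs and trigger a single ingestion run
pipeline_config = TrainingPipelineConfig()
ingestion = DataIngestion(data_ingestion_config=DataIngestionConfig(pipeline_config))
artifact = ingestion.ingest_data()
print(artifact.feature_store_anime_file_path)  # path to the exported Animes.csv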

B. Data Transformation

Data transformation is the process of converting raw data into a format or structure that is suitable for analysis, modelling, or integration into a system. It is a crucial step in the data preprocessing pipeline, especially for machine learning, as it helps ensure that the data is clean, consistent, and formatted in a way that models can effectively use.

Data Transformation Component

In the components/data_transformation.py file, we implement the DataTransformation class to manage the transformation of raw data into a cleaned and merged dataset, ready for further processing. The class includes methods to read data from CSV files, merge the two datasets (anime and ratings), and clean and filter the merged data. Specifically, the merge_data method combines the datasets based on a common column (anime_id), while the clean_filter_data method handles tasks like replacing missing values, converting columns to numeric types, filtering rows based on conditions, and removing unnecessary columns. The initiate_data_transformation method coordinates the entire transformation process, storing the resulting transformed dataset in the specified location using a DataTransformationArtifact entity.

class DataTransformation:
    def __init__(self, data_ingestion_artifact: DataIngestionArtifact, data_transformation_config: DataTransformationConfig):
        self.data_ingestion_artifact = data_ingestion_artifact
        self.data_transformation_config = data_transformation_config

    @staticmethod
    def read_data(file_path) -> pd.DataFrame:
        return pd.read_csv(file_path)

    @staticmethod
    def merge_data(anime_df: pd.DataFrame, rating_df: pd.DataFrame) -> pd.DataFrame:
        merged_df = pd.merge(rating_df, anime_df, on="anime_id", how="inner")
        return merged_df

    @staticmethod
    def clean_filter_data(merged_df: pd.DataFrame) -> pd.DataFrame:
        # Replace placeholder values, coerce to numeric, and impute missing ratings with the median
        merged_df['average_rating'] = merged_df['average_rating'].replace('UNKNOWN', np.nan)
        merged_df['average_rating'] = pd.to_numeric(merged_df['average_rating'], errors='coerce')
        merged_df['average_rating'] = merged_df['average_rating'].fillna(merged_df['average_rating'].median())
        merged_df = merged_df[merged_df['average_rating'] > 6]
        cols_to_drop = [
            'username', 'overview', 'type', 'episodes', 'producers',
            'licensors', 'studios', 'source', 'rank', 'popularity',
            'favorites', 'scored by', 'members'
        ]
        cleaned_df = merged_df.copy()
        cleaned_df.drop(columns=cols_to_drop, inplace=True)
        return cleaned_df

    def initiate_data_transformation(self) -> DataTransformationArtifact:
        anime_df = DataTransformation.read_data(self.data_ingestion_artifact.feature_store_anime_file_path)
        rating_df = DataTransformation.read_data(self.data_ingestion_artifact.feature_store_userrating_file_path)
        merged_df = DataTransformation.merge_data(anime_df, rating_df)
        transformed_df = DataTransformation.clean_filter_data(merged_df)
        export_data_to_dataframe(transformed_df, self.data_transformation_config.merged_file_path)
        data_transformation_artifact = DataTransformationArtifact(
            merged_file_path=self.data_transformation_config.merged_file_path)
        return data_transformation_artifact

C. Collaborative Recommender

Collaborative filtering is widely used in recommendation systems, where predictions are made based on user-item interactions rather than the explicit features of the items.

Collaborative Modelling

The CollaborativeAnimeRecommender class is designed to provide personalized anime recommendations using collaborative filtering techniques. It employs three different models:

  1. Singular Value Decomposition (SVD): A matrix factorization technique that learns latent factors representing user preferences and anime characteristics, enabling personalized recommendations based on past ratings.
  2. Item-Based K-Nearest Neighbors (KNN): Finds similar anime titles based on user rating patterns, recommending shows similar to a given anime.
  3. User-Based K-Nearest Neighbors (KNN): Identifies users with similar preferences and suggests anime that like-minded users have enjoyed.

The class processes raw user ratings, constructs interaction matrices, and trains the models to generate tailored recommendations. The recommender provides predictions for individual users, recommends similar anime titles, and suggests new shows based on user similarity. By leveraging collaborative filtering techniques, this system enhances the user experience by offering personalized and relevant anime recommendations.

import pandas as pd
from collections import Counter
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

class CollaborativeAnimeRecommender:
    def __init__(self, df):
        self.df = df
        self.svd = None
        self.knn_item_based = None
        self.knn_user_based = None
        self.prepare_data()

    def prepare_data(self):
        self.df = self.df.drop_duplicates()
        reader = Reader(rating_scale=(1, 10))
        self.data = Dataset.load_from_df(self.df[['user_id', 'anime_id', 'rating']], reader)
        self.anime_pivot = self.df.pivot_table(index='name', columns='user_id', values='rating').fillna(0)
        self.user_pivot = self.df.pivot_table(index='user_id', columns='name', values='rating').fillna(0)

    def train_svd(self):
        self.svd = SVD()
        cross_validate(self.svd, self.data, cv=5)
        trainset = self.data.build_full_trainset()
        self.svd.fit(trainset)

    def train_knn_item_based(self):
        item_user_matrix = csr_matrix(self.anime_pivot.values)
        self.knn_item_based = NearestNeighbors(metric='cosine', algorithm='brute')
        self.knn_item_based.fit(item_user_matrix)

    def train_knn_user_based(self):
        user_item_matrix = csr_matrix(self.user_pivot.values)
        self.knn_user_based = NearestNeighbors(metric='cosine', algorithm='brute')
        self.knn_user_based.fit(user_item_matrix)

    def print_unique_user_ids(self):
        unique_user_ids = self.df['user_id'].unique()
        return unique_user_ids

    def get_svd_recommendations(self, user_id, n=10, svd_model=None) -> pd.DataFrame:
        svd_model = svd_model or self.svd
        if svd_model is None:
            raise ValueError("SVD model is not provided or trained.")
        if user_id not in self.df['user_id'].unique():
            return f"User ID '{user_id}' not found in the dataset."
        anime_ids = self.df['anime_id'].unique()
        predictions = [(anime_id, svd_model.predict(user_id, anime_id).est) for anime_id in anime_ids]
        predictions.sort(key=lambda x: x[1], reverse=True)
        recommended_anime_ids = [pred[0] for pred in predictions[:n]]
        recommended_anime = self.df[self.df['anime_id'].isin(recommended_anime_ids)].drop_duplicates(subset='anime_id')
        recommended_anime = recommended_anime.head(n)
        return pd.DataFrame({
            'Anime Name': recommended_anime['name'].values,
            'Genres': recommended_anime['genres'].values,
            'Image URL': recommended_anime['image url'].values,
            'Rating': recommended_anime['average_rating'].values})

    def get_item_based_recommendations(self, anime_name, n_recommendations=10, knn_item_model=None):
        knn_item_based = knn_item_model or self.knn_item_based
        if knn_item_based is None:
            raise ValueError("Item-based KNN model is not provided or trained.")
        if anime_name not in self.anime_pivot.index:
            return f"Anime title '{anime_name}' not found in the dataset."
        query_index = self.anime_pivot.index.get_loc(anime_name)
        distances, indices = knn_item_based.kneighbors(
            self.anime_pivot.iloc[query_index, :].values.reshape(1, -1),
            n_neighbors=n_recommendations + 1
        )
        recommendations = []
        for i in range(1, len(distances.flatten())):
            anime_title = self.anime_pivot.index[indices.flatten()[i]]
            distance = distances.flatten()[i]
            recommendations.append((anime_title, distance))
        recommended_anime_titles = [rec[0] for rec in recommendations]
        filtered_df = self.df[self.df['name'].isin(recommended_anime_titles)].drop_duplicates(subset='name')
        filtered_df = filtered_df.head(n_recommendations)
        return pd.DataFrame({
            'Anime Name': filtered_df['name'].values,
            'Image URL': filtered_df['image url'].values,
            'Genres': filtered_df['genres'].values,
            'Rating': filtered_df['average_rating'].values
        })

    def get_user_based_recommendations(self, user_id, n_recommendations=10, knn_user_model=None) -> pd.DataFrame:
        knn_user_based = knn_user_model or self.knn_user_based
        if knn_user_based is None:
            raise ValueError("User-based KNN model is not provided or trained.")
        user_id = float(user_id)
        if user_id not in self.user_pivot.index:
            return f"User ID '{user_id}' not found in the dataset."
        user_idx = self.user_pivot.index.get_loc(user_id)
        distances, indices = knn_user_based.kneighbors(
            self.user_pivot.iloc[user_idx, :].values.reshape(1, -1),
            n_neighbors=n_recommendations + 1
        )
        user_rated_anime = set(self.user_pivot.columns[self.user_pivot.iloc[user_idx, :] > 0])
        all_neighbor_ratings = []
        for i in range(1, len(distances.flatten())):
            neighbor_idx = indices.flatten()[i]
            neighbor_rated_anime = self.user_pivot.iloc[neighbor_idx, :]
            neighbor_ratings = neighbor_rated_anime[neighbor_rated_anime > 0]
            all_neighbor_ratings.extend(neighbor_ratings.index)
        anime_counter = Counter(all_neighbor_ratings)
        recommendations = [(anime, count) for anime, count in anime_counter.items() if anime not in user_rated_anime]
        recommendations.sort(key=lambda x: x[1], reverse=True)
        recommended_anime_titles = [rec[0] for rec in recommendations[:n_recommendations]]
        filtered_df = self.df[self.df['name'].isin(recommended_anime_titles)].drop_duplicates(subset='name')
        filtered_df = filtered_df.head(n_recommendations)
        return pd.DataFrame({
            'Anime Name': filtered_df['name'].values,
            'Image URL': filtered_df['image url'].values,
            'Genres': filtered_df['genres'].values,
            'Rating': filtered_df['average_rating'].values
        })

Collaborative Model Trainer Component

The CollaborativeModelTrainer automates the training, saving, and deployment of the models. It ensures that trained models are stored locally and also uploaded to Hugging Face, making them easily accessible for generating recommendations.

class CollaborativeModelTrainer:
    def __init__(self, collaborative_model_trainer_config: CollaborativeModelConfig,
                 data_transformation_artifact: DataTransformationArtifact):
        self.collaborative_model_trainer_config = collaborative_model_trainer_config
        self.data_transformation_artifact = data_transformation_artifact

    def initiate_model_trainer(self) -> CollaborativeModelArtifact:
        df = load_csv_data(self.data_transformation_artifact.merged_file_path)
        recommender = CollaborativeAnimeRecommender(df)

        # Train and save the SVD model
        recommender.train_svd()
        save_model(model=recommender.svd, file_path=self.collaborative_model_trainer_config.svd_trained_model_file_path)
        upload_model_to_huggingface(
            model_path=self.collaborative_model_trainer_config.svd_trained_model_file_path,
            repo_id=MODELS_FILEPATH,
            filename=MODEL_TRAINER_SVD_TRAINED_MODEL_NAME
        )
        svd_model = load_object(self.collaborative_model_trainer_config.svd_trained_model_file_path)
        svd_recommendations = recommender.get_svd_recommendations(user_id=436, n=10, svd_model=svd_model)

        # Train and save the Item-Based KNN model
        recommender.train_knn_item_based()
        save_model(model=recommender.knn_item_based, file_path=self.collaborative_model_trainer_config.item_knn_trained_model_file_path)
        upload_model_to_huggingface(
            model_path=self.collaborative_model_trainer_config.item_knn_trained_model_file_path,
            repo_id=MODELS_FILEPATH,
            filename=MODEL_TRAINER_ITEM_KNN_TRAINED_MODEL_NAME
        )
        item_knn_model = load_object(self.collaborative_model_trainer_config.item_knn_trained_model_file_path)
        item_based_recommendations = recommender.get_item_based_recommendations(
            anime_name='One Piece', n_recommendations=10, knn_item_model=item_knn_model
        )

        # Train and save the User-Based KNN model
        recommender.train_knn_user_based()
        save_model(model=recommender.knn_user_based, file_path=self.collaborative_model_trainer_config.user_knn_trained_model_file_path)
        upload_model_to_huggingface(
            model_path=self.collaborative_model_trainer_config.user_knn_trained_model_file_path,
            repo_id=MODELS_FILEPATH,
            filename=MODEL_TRAINER_USER_KNN_TRAINED_MODEL_NAME
        )
        user_knn_model = load_object(self.collaborative_model_trainer_config.user_knn_trained_model_file_path)
        user_based_recommendations = recommender.get_user_based_recommendations(
            user_id=817, n_recommendations=10, knn_user_model=user_knn_model
        )

        return CollaborativeModelArtifact(
            svd_file_path=self.collaborative_model_trainer_config.svd_trained_model_file_path,
            item_based_knn_file_path=self.collaborative_model_trainer_config.item_knn_trained_model_file_path,
            user_based_knn_file_path=self.collaborative_model_trainer_config.user_knn_trained_model_file_path
        )

2. Content-Based Recommendation System

This content-based recommendation system suggests items to users by analyzing the attributes of items, such as genre, keywords, or descriptions, to generate recommendations based on similarity.

For example, in an anime recommendation system, if a user enjoys a particular anime, the model identifies similar anime based on attributes like genre, voice actors, or themes. Techniques such as TF-IDF (Term Frequency-Inverse Document Frequency), cosine similarity, and machine learning models help in ranking and suggesting relevant items.

Unlike collaborative filtering, which depends on user interactions, content-based filtering is independent of other users' preferences, making it effective even in cases with few user interactions (the cold start problem).
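As a quick, standalone illustration of this idea (separate from the project code), the sketch below vectorizes a few made-up genre strings with TF-IDF and compares them with cosine similarity:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical genre strings for three shows
genres = [
    "Action Adventure Fantasy",
    "Action Fantasy Shounen",
    "Romance Slice-of-Life",
]

tfidf = TfidfVectorizer()
matrix = tfidf.fit_transform(genres)
sim = cosine_similarity(matrix)
print(sim[0])  # the first two shows overlap on genres; the third barely does

The project's ContentBasedRecommender, shown later, applies the same two steps to the real genre column of the anime dataset.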

[Image: Content-based recommendation system workflow]

A. Data Ingestion

We use the artifacts from the data ingestion component discussed earlier to train the content-based recommender.

B. Content-Based Recommender

The content-based recommender is responsible for training recommendation models that analyse item attributes to generate personalized suggestions. It processes data, extracts relevant features, and builds models that identify similarities between items based on their content.

Content-Based Modelling

The ContentBasedRecommender class leverages TF-IDF (Term Frequency-Inverse Document Frequency) and cosine similarity to suggest anime based on their genre similarities. The model first processes the dataset by removing missing values and converting the textual genre information into numerical feature vectors using TF-IDF vectorization. It then computes the cosine similarity between anime titles to measure their content similarity. The trained model is saved and later used to provide personalized recommendations by retrieving the most similar anime for a given title.

import os
import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class ContentBasedRecommender:
    def __init__(self, df):
        self.df = df.dropna()
        self.indices = pd.Series(self.df.index, index=self.df['name']).drop_duplicates()
        self.tfv = TfidfVectorizer(
            min_df=3,
            strip_accents='unicode',
            analyzer='word',
            token_pattern=r'\w{1,}',
            ngram_range=(1, 3),
            stop_words='english'
        )
        self.tfv_matrix = self.tfv.fit_transform(self.df['genres'])
        self.cosine_sim = cosine_similarity(self.tfv_matrix, self.tfv_matrix)

    def save_model(self, model_path):
        os.makedirs(os.path.dirname(model_path), exist_ok=True)
        with open(model_path, 'wb') as f:
            joblib.dump((self.tfv, self.cosine_sim), f)

    def get_rec_cosine(self, title, model_path, n_recommendations=5):
        with open(model_path, 'rb') as f:
            self.tfv, self.cosine_sim = joblib.load(f)
        if self.df is None:
            raise ValueError("The DataFrame is not loaded, cannot generate recommendations.")
        if title not in self.indices.index:
            return f"Anime title '{title}' not found in the dataset."
        idx = self.indices[title]
        cosinesim_scores = list(enumerate(self.cosine_sim[idx]))
        cosinesim_scores = sorted(cosinesim_scores, key=lambda x: x[1], reverse=True)[1:n_recommendations + 1]
        anime_indices = [i[0] for i in cosinesim_scores]
        return pd.DataFrame({
            'Anime name': self.df['name'].iloc[anime_indices].values,
            'Image URL': self.df['image url'].iloc[anime_indices].values,
            'Genres': self.df['genres'].iloc[anime_indices].values,
            'Rating': self.df['average_rating'].iloc[anime_indices].values
        })

Content-Based Model Trainer Component

The ContentBasedModelTrainer class is responsible for automating the training and deployment of the content-based recommendation model. It loads the processed anime dataset from the data ingestion artifact, initializes the ContentBasedRecommender, and trains it using TF-IDF vectorization and cosine similarity. The trained model is then saved and uploaded to Hugging Face.

class ContentBasedModelTrainer:
    def __init__(self, content_based_model_trainer_config: ContentBasedModelConfig,
                 data_ingestion_artifact: DataIngestionArtifact):
        self.content_based_model_trainer_config = content_based_model_trainer_config
        self.data_ingestion_artifact = data_ingestion_artifact

    def initiate_model_trainer(self) -> ContentBasedModelArtifact:
        df = load_csv_data(self.data_ingestion_artifact.feature_store_anime_file_path)
        recommender = ContentBasedRecommender(df=df)
        recommender.save_model(model_path=self.content_based_model_trainer_config.cosine_similarity_model_file_path)
        upload_model_to_huggingface(
            model_path=self.content_based_model_trainer_config.cosine_similarity_model_file_path,
            repo_id=MODELS_FILEPATH,
            filename=MODEL_TRAINER_COSINESIMILARITY_MODEL_NAME
        )
        cosine_recommendations = recommender.get_rec_cosine(
            title="One Piece",
            model_path=self.content_based_model_trainer_config.cosine_similarity_model_file_path,
            n_recommendations=10
        )
        content_model_trainer_artifact = ContentBasedModelArtifact(
            cosine_similarity_model_file_path=self.content_based_model_trainer_config.cosine_similarity_model_file_path
        )
        return content_model_trainer_artifact

3. Top Anime Recommendation System

It is common for newcomers to anime to seek out the most popular titles first. This top anime recommendation system is designed to help those new to the anime world easily discover popular, highly rated, and top-ranked anime all in one place, using simple sorting and filtering.

[Image: Top anime recommendation system workflow]

A. Data Ingestion

We utilize the artifacts from the previously discussed data ingestion component in this recommendation system.

B. Top Anime Recommender Component

Top anime filtering

The PopularityBasedFiltering class is responsible for ranking and sorting anime using predefined popularity-based parameters. It analyzes the dataset by evaluating attributes such as rating, number of favorites, community size, and ranking position. The class includes specialized functions to extract the top-performing anime within each category, ensuring a structured approach to filtering. Additionally, it manages missing data and refines the output for readability. By providing data-driven insights, this class plays a crucial role in identifying popular and highly rated anime for recommendation purposes.

import numpy as np
import pandas as pd

class PopularityBasedFiltering:
    def __init__(self, df):
        self.df = df
        self.df['average_rating'] = pd.to_numeric(self.df['average_rating'], errors='coerce')
        self.df['average_rating'] = self.df['average_rating'].fillna(self.df['average_rating'].median())

    def popular_animes(self, n=10):
        sorted_df = self.df.sort_values(by=['popularity'], ascending=True)
        top_n_anime = sorted_df.head(n)
        return self._format_output(top_n_anime)

    def top_ranked_animes(self, n=10):
        self.df['rank'] = self.df['rank'].replace('UNKNOWN', np.nan).astype(float)
        df_filtered = self.df[self.df['rank'] > 1]
        sorted_df = df_filtered.sort_values(by=['rank'], ascending=True)
        top_n_anime = sorted_df.head(n)
        return self._format_output(top_n_anime)

    def overall_top_rated_animes(self, n=10):
        sorted_df = self.df.sort_values(by=['average_rating'], ascending=False)
        top_n_anime = sorted_df.head(n)
        return self._format_output(top_n_anime)

    def favorite_animes(self, n=10):
        sorted_df = self.df.sort_values(by=['favorites'], ascending=False)
        top_n_anime = sorted_df.head(n)
        return self._format_output(top_n_anime)

    def top_animes_members(self, n=10):
        sorted_df = self.df.sort_values(by=['members'], ascending=False)
        top_n_anime = sorted_df.head(n)
        return self._format_output(top_n_anime)

    def popular_anime_among_members(self, n=10):
        sorted_df = self.df.sort_values(by=['members', 'average_rating'], ascending=[False, False]).drop_duplicates(subset='name')
        popular_animes = sorted_df.head(n)
        return self._format_output(popular_animes)

    def top_avg_rated(self, n=10):
        self.df['average_rating'] = pd.to_numeric(self.df['average_rating'], errors='coerce')
        median_rating = self.df['average_rating'].median()
        self.df['average_rating'] = self.df['average_rating'].fillna(median_rating)
        top_animes = (
            self.df.drop_duplicates(subset='name')
                   .nlargest(n, 'average_rating')[['name', 'average_rating', 'image url', 'genres']]
        )
        return self._format_output(top_animes)

    def _format_output(self, anime_df):
        return pd.DataFrame({
            'Anime name': anime_df['name'].values,
            'Image URL': anime_df['image url'].values,
            'Genres': anime_df['genres'].values,
            'Rating': anime_df['average_rating'].values
        })

Top anime recommenders

The PopularityBasedRecommendor class is responsible for recommending anime based on different popularity metrics. It utilizes the anime dataset stored at feature_store_anime_file_path, which comes from the DataIngestionArtifact. The class integrates the PopularityBasedFiltering class to generate anime recommendations according to various filtering criteria, such as top-ranked anime, most popular choices, community favorites, and highest-rated shows. By selecting a specific filter_type, users can retrieve the best match based on their preferred criteria.

class PopularityBasedRecommendor:
    def __init__(self, data_ingestion_artifact: DataIngestionArtifact):
        self.data_ingestion_artifact = data_ingestion_artifact

    def initiate_model_trainer(self, filter_type: str):
        df = load_csv_data(self.data_ingestion_artifact.feature_store_anime_file_path)
        recommender = PopularityBasedFiltering(df)
        # Dispatch to the filtering method that matches the requested criterion
        if filter_type == 'popular_animes':
            return recommender.popular_animes(n=10)
        elif filter_type == 'top_ranked_animes':
            return recommender.top_ranked_animes(n=10)
        elif filter_type == 'overall_top_rated_animes':
            return recommender.overall_top_rated_animes(n=10)
        elif filter_type == 'favorite_animes':
            return recommender.favorite_animes(n=10)
        elif filter_type == 'top_animes_members':
            return recommender.top_animes_members(n=10)
        elif filter_type == 'popular_anime_among_members':
            return recommender.popular_anime_among_members(n=10)
        elif filter_type == 'top_avg_rated':
            return recommender.top_avg_rated(n=10)

Training Pipeline

[Image: End-to-end training pipeline]

This machine learning training pipeline is designed to automate and streamline the process of building recommender models efficiently. The pipeline follows a structured workflow, beginning with data ingestion from Hugging Face, followed by data transformation to preprocess and prepare the data for model training. It incorporates different modelling techniques, such as collaborative filtering, content-based approaches, and popularity-based filtering, ensuring optimal performance. The final trained models are stored in a Model Hub, enabling seamless deployment and continuous refinement. This structured approach ensures scalability, efficiency, and reproducibility in machine learning workflows.

class TrainingPipeline:
    def __init__(self):
        self.training_pipeline_config = TrainingPipelineConfig()

    def start_data_ingestion(self) -> DataIngestionArtifact:
        data_ingestion_config = DataIngestionConfig(self.training_pipeline_config)
        data_ingestion = DataIngestion(data_ingestion_config=data_ingestion_config)
        data_ingestion_artifact = data_ingestion.ingest_data()
        return data_ingestion_artifact

    def start_data_transformation(self, data_ingestion_artifact: DataIngestionArtifact) -> DataTransformationArtifact:
        data_transformation_config = DataTransformationConfig(self.training_pipeline_config)
        data_transformation = DataTransformation(
            data_ingestion_artifact=data_ingestion_artifact,
            data_transformation_config=data_transformation_config
        )
        data_transformation_artifact = data_transformation.initiate_data_transformation()
        return data_transformation_artifact

    def start_collaborative_model_training(self, data_transformation_artifact: DataTransformationArtifact) -> CollaborativeModelArtifact:
        collaborative_model_config = CollaborativeModelConfig(self.training_pipeline_config)
        collaborative_model_trainer = CollaborativeModelTrainer(
            collaborative_model_trainer_config=collaborative_model_config,
            data_transformation_artifact=data_transformation_artifact
        )
        collaborative_model_trainer_artifact = collaborative_model_trainer.initiate_model_trainer()
        return collaborative_model_trainer_artifact

    def start_content_based_model_training(self, data_ingestion_artifact: DataIngestionArtifact) -> ContentBasedModelArtifact:
        content_based_model_config = ContentBasedModelConfig(self.training_pipeline_config)
        content_based_model_trainer = ContentBasedModelTrainer(
            content_based_model_trainer_config=content_based_model_config,
            data_ingestion_artifact=data_ingestion_artifact
        )
        content_based_model_trainer_artifact = content_based_model_trainer.initiate_model_trainer()
        return content_based_model_trainer_artifact

    def start_popularity_based_filtering(self, data_ingestion_artifact: DataIngestionArtifact):
        filtering = PopularityBasedRecommendor(data_ingestion_artifact=data_ingestion_artifact)
        recommendations = filtering.initiate_model_trainer(filter_type='popular_animes')
        return recommendations

    def run_pipeline(self):
        # Data Ingestion
        data_ingestion_artifact = self.start_data_ingestion()
        # Content-Based Model Training
        content_based_model_trainer_artifact = self.start_content_based_model_training(data_ingestion_artifact)
        # Popularity-Based Filtering
        popularity_recommendations = self.start_popularity_based_filtering(data_ingestion_artifact)
        # Data Transformation
        data_transformation_artifact = self.start_data_transformation(data_ingestion_artifact)
        # Collaborative Model Training
        collaborative_model_trainer_artifact = self.start_collaborative_model_training(data_transformation_artifact)

Now that we've finished creating the pipeline, run the run_pipeline.py script (the root-level entry point from the folder structure) using the command below to view the artifacts generated in the previous steps.

python run_pipeline.py
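For completeness, a minimal run_pipeline.py entry point, sketched from the pipeline class above, could look like this:

from anime_recommender.pipelines.training_pipeline import TrainingPipeline

if __name__ == "__main__":
    # Kick off the full workflow: ingestion, transformation, and model training
    pipeline = TrainingPipeline()
    pipeline.run_pipeline()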

Streamlit App

The recommendation application is built using Streamlit, a lightweight and interactive framework for creating data-driven web apps. It is deployed on Hugging Face Spaces, allowing users to explore and interact with the anime recommendation system seamlessly. This setup provides an intuitive UI for discovering anime recommendations in real time. Each time you push new changes, Hugging Face will redeploy your app automatically.

[Image: Streamlit app interface]
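The full app.py is beyond the scope of this section, but a minimal sketch of the Streamlit front end is shown below. It assumes the trained cosine-similarity model is fetched from the model repo with hf_hub_download, reusing the dataset and file names from the constants above; treat it as a starting point rather than the project's exact UI code.

import joblib
import pandas as pd
import streamlit as st
from datasets import load_dataset
from huggingface_hub import hf_hub_download

st.title("Anime Recommendation System")

@st.cache_resource
def load_assets():
    # Pull the anime metadata and the trained content-based model from the Hub
    df = pd.DataFrame(load_dataset("krishnaveni76/Animes")["train"]).dropna().reset_index(drop=True)
    model_path = hf_hub_download(repo_id="krishnaveni76/anime-recommendation-models",
                                 filename="cosine_similarity.pkl")
    tfv, cosine_sim = joblib.load(model_path)
    return df, cosine_sim

df, cosine_sim = load_assets()
title = st.selectbox("Pick an anime you like:", df["name"].unique())

if st.button("Recommend"):
    # Look up the row for the chosen title and rank all others by similarity
    idx = int(df.index[df["name"] == title][0])
    scores = sorted(enumerate(cosine_sim[idx]), key=lambda x: x[1], reverse=True)[1:11]
    st.table(df.loc[[i for i, _ in scores], ["name", "genres", "average_rating"]])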

Docker Integration for Deployment

The Dockerfile sets up a lightweight Python environment using the official Python 3.10 slim-buster image. It configures the working directory, copies the application files, and installs the dependencies from requirements.txt. Finally, it exposes port 8501 and runs the Streamlit app, making it accessible within the containerized environment.

# Use the official Python image as a base
FROM python:3.10-slim-buster

# Set the working directory in the container
WORKDIR /app

# Copy the app files into the container
COPY . .

# Install required packages
RUN pip install -r requirements.txt

# Expose the port that Streamlit uses
EXPOSE 8501

# Run the Streamlit app
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
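To test the container locally before pushing to your Space, you can build and run the image with standard Docker commands (the image tag below is arbitrary):

# Build the image from the project root
docker build -t anime-recommender .
# Run it and map Streamlit's port to the host
docker run -p 8501:8501 anime-recommender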

Key Takeaways

  • We have designed an efficient, end-to-end pipeline that ensures smooth data flow from ingestion to recommendation, making the system scalable, robust, and production-ready.
  • New users receive trending anime suggestions via a popularity-based engine, while returning users get hyper-personalized picks through collaborative filtering models.
  • By deploying on Hugging Face Spaces with model versioning, you achieve cost-free productionization without paying any AWS/GCP bills while maintaining scalability!
  • The system leverages Docker for containerization, ensuring consistent environments across different deployments.
  • Built using Streamlit, the app provides a clean, dynamic, and engaging user experience, making anime discovery fun and intuitive.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.

Conclusion

Congratulations! You have finished building the recommendation app in no time. From acquiring data and preprocessing it to model training and deployment, this project highlights the power of getting things out there into the world! But hold up... we're not done yet! 💥 There's a whole lot more fun to come! You're now ready to build something even cooler, like a movie recommendation app!

This is just the beginning of our adventure together, so buckle up: there are many more exciting projects ahead! Let's keep learning and building!

Frequently Asked Questions

Q1. Can I tweak this for K-dramas or Hollywood movies?

Ans. Absolutely! Swap the dataset, adjust the genre weights in constants.py, and voilà: you've got a Squid Game or Marvel recommender in no time!

Q2. Can I add a “Surprise Me” button for random anime picks?

Ans. Yes! A “Surprise Me” button can easily be added using random.choice(), helping users discover hidden anime gems at random!
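For instance, a minimal version inside the Streamlit app could look like this (hypothetical snippet, reusing the Animes dataset from earlier):

import random
import pandas as pd
import streamlit as st
from datasets import load_dataset

df = pd.DataFrame(load_dataset("krishnaveni76/Animes")["train"])
if st.button("Surprise Me"):
    # Pick one random title from the catalog
    st.write(random.choice(df["name"].dropna().tolist()))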

Q3. Will Hugging Face charge me when my app goes viral?

Ans. Their free tier handles ~10K monthly visits. If you hit Demon Slayer levels of popularity, upgrade to PRO ($9/month) for priority servers.

Krishnaveni Ponna

Hello! I'm a passionate AI and Machine Learning enthusiast currently exploring the exciting realms of Deep Learning, MLOps, and Generative AI. I enjoy diving into new projects and uncovering innovative techniques that push the boundaries of technology. I'll be sharing guides, tutorials, and project insights based on my own experiences, so we can learn and grow together. Join me on this journey as we explore, experiment, and build amazing solutions in the world of AI and beyond!
