AiMi: RAG-Optimized Anime Dataset
🎌 AiMi Ultimate Anime Dataset (1917-2025)
You've unlocked the complete anime dataset that powers AiMi's intelligent recommendation system.
📦 What's Included
✅ Complete Anime Dataset (8,248 titles)
- Format: Parquet file (optimized for data analysis)
- Coverage: 1917-2025 (108 years of anime)
- Size: ~20.4MB
- Commercial license (see license.md)
- Two thank-you cards (take your pick)
📊 Dataset Features
Everything you need for RAG & Analysis:
- ✅ Canonical Embedding Text: Proprietary RAG-optimized descriptions (exclusive to this version).
- ✅ Rich Metadata: Main titles, season, studios, and ratings (0-10).
- ✅ Visual Assets: Filenames for Posters, Logos, and Backdrops.
- ✅ Classification: Type (TV/Movie), Year (1917-2025), and Tags.
- ✅ Content Info: Full synopsis, episode counts, and durations.
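To give a feel for how the canonical embedding text can drive retrieval, here is a minimal sketch of a text search over that field. It uses a plain bag-of-words cosine similarity from the standard library as a stand-in for a real embedding model, so it only matches shared words. The sample documents, their `TITLE | TAGS | SYNOPSIS` layout, and the helper names `tokenize`, `cosine`, and `search` are assumptions for illustration, not the dataset's actual contents.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, docs, top_k=2):
    """Return up to top_k titles whose text shares words with the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), title) for title, text in docs.items()]
    # Highest score first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored[:top_k] if score > 0]


# Hypothetical rows: title -> canonical_embedding_text (layout is assumed).
docs = {
    "Title A": "TITLE: Title A | TAGS: action, demons | SYNOPSIS: a swordsman hunts demons",
    "Title B": "TITLE: Title B | TAGS: romance, school | SYNOPSIS: two students fall in love",
    "Title C": "TITLE: Title C | TAGS: sports | SYNOPSIS: a team trains for the national tournament",
}

results = search("demons and swordsman action", docs)
print(results)
```

With the real file, `docs` could be built from the loaded DataFrame, for example `dict(zip(df['Main Title'], df['canonical_embedding_text']))`. For true semantic search you would swap the word-count vectors for embeddings from a model of your choice.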
📋 Full Schema (38 Columns)
• Main Title • Official Title (en)
• Official Title (ja) • Synopsis
• Animation Work • Chief Animation Direction
• Direction • Chief Direction
• Original Work • Series Composition
• Animation Character Design • Original Plan
• Music • Character Design
• Work • Episode
• Title • Duration
• Air Date • Cast
• Type • Season
• Year • Tags
• Char Tags • Max Rating
• synopsis_length • processed_tags
• tag_count • filter_type
• filter_year • Resources
• canonical_embedding_text • semantic_density
• Image Link Path • Logo Image Link Path
• Backdrop Image Link Path • Poster Image Link Path
🎯 Perfect For
- 📊 Data Scientists - Train ML models, analyze trends
- 🔬 Researchers - Study anime industry patterns
- 📈 Analysts - Market research and statistics
- 💻 Developers - Build your own anime apps
- 🎓 Students - Academic projects and learning
- 📝 Writers - Content creation and analysis
📁 Project Structure
aimi-rag-system
├── anime_dataset_nomic.parquet # Complete parquet dataset file
├── license.md # Must read license
├── thank_you_card1_tier1.png # Token of appreciation (Thank You Card)
└── thank_you_card2_tier1.png # Token of appreciation (Thank You Card)
🚀 Getting Started
Load the Dataset (Python)
import pandas as pd
# Load the parquet file
df = pd.read_parquet('anime_dataset_nomic.parquet')
# Explore the data
print(f"Total anime: {len(df):,}")
print(f"Columns: {df.columns.tolist()}")
# Example: Find all anime from 2020
anime_2020 = df[df['filter_year'] == 2020]
print(f"Anime released in 2020: {len(anime_2020)}")
# Example: Top rated anime
top_rated = df.nlargest(10, 'Max Rating')
print(top_rated[['Main Title', 'Max Rating', 'filter_year']])
Load in R
library(arrow)
# Read parquet file
df <- read_parquet('anime_dataset_nomic.parquet')
# View structure
str(df)
head(df)
💡 Use Case Ideas
Data Analysis
- Analyze anime trends over decades
- Compare ratings by studio, year, or type
- Genre popularity analysis
- Studio output and quality metrics
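As one way to start on the trend ideas above, the sketch below averages ratings per decade. It works on plain records (dicts) so it runs without the dataset file. The sample rows are made up, and the helper names `is_missing` and `average_rating_by_decade` are illustrative; only the column names `filter_year` and `Max Rating` come from the schema.

```python
from collections import defaultdict
from statistics import mean


def is_missing(value):
    """True for None or NaN (NaN is the only float not equal to itself)."""
    return value is None or (isinstance(value, float) and value != value)


def average_rating_by_decade(records):
    """Average 'Max Rating' per decade, skipping rows with missing values."""
    buckets = defaultdict(list)
    for row in records:
        year, rating = row.get("filter_year"), row.get("Max Rating")
        if is_missing(year) or is_missing(rating):
            continue
        decade = (int(year) // 10) * 10
        buckets[decade].append(float(rating))
    return {decade: round(mean(vals), 2) for decade, vals in sorted(buckets.items())}


# Made-up rows shaped like the dataset's columns.
sample = [
    {"Main Title": "Title A", "filter_year": 1998, "Max Rating": 8.0},
    {"Main Title": "Title B", "filter_year": 1995, "Max Rating": 7.0},
    {"Main Title": "Title C", "filter_year": 2019, "Max Rating": 8.92},
    {"Main Title": "Title D", "filter_year": 2011, "Max Rating": 8.0},
    {"Main Title": "Title E", "filter_year": 2020, "Max Rating": None},
]

by_decade = average_rating_by_decade(sample)
print(by_decade)
```

On the loaded DataFrame, `df.to_dict('records')` yields rows in this shape. The same grouping idea extends to studios or types by changing the bucket key.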
Machine Learning
- Build custom recommendation systems
- Train sentiment analysis models
- Predict anime success factors
- Clustering similar anime
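A simple baseline for the recommendation and clustering ideas above is tag overlap. The sketch below scores titles by Jaccard similarity of their tag sets. It assumes `processed_tags` is a comma-separated string, as the schema example suggests; the catalog entries and the function names `tag_set`, `jaccard`, and `most_similar` are hypothetical.

```python
def tag_set(processed_tags):
    """Split a comma-separated tag string into a set of lowercase tags."""
    if not isinstance(processed_tags, str):
        return set()
    return {t.strip().lower() for t in processed_tags.split(",") if t.strip()}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(target_title, catalog, top_n=3):
    """Rank the other titles in catalog by tag overlap with target_title."""
    target = tag_set(catalog[target_title])
    scored = [
        (jaccard(target, tag_set(tags)), title)
        for title, tags in catalog.items()
        if title != target_title
    ]
    # Highest similarity first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(title, round(score, 3)) for score, title in scored[:top_n]]


# Hypothetical catalog: title -> processed_tags string.
catalog = {
    "Title A": "action, supernatural, historical",
    "Title B": "action, supernatural, school",
    "Title C": "romance, school",
    "Title D": "action, historical",
}

recommendations = most_similar("Title A", catalog, top_n=2)
print(recommendations)
```

With the real data, the catalog could be built as `dict(zip(df['Main Title'], df['processed_tags']))`. Tag overlap ignores synopsis content, so treat it as a starting point to compare against embedding-based approaches.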
Visualization
- Create anime history timelines
- Generate studio comparison charts
- Build interactive dashboards
- Map genre evolution
Content Creation
- Write data-driven articles
- Create YouTube analytics videos
- Build anime statistics websites
- Generate infographics
📋 Full Dataset Schema (38 Columns)
| Column Name | Type | Description | Example |
|---------------------------|---------|------------------------------------------|------------------------------------------------------|
| Main Title | string | Primary Romanized/English title | "Kimetsu no Yaiba" |
| Official Title (en) | string | Official English release title | "Demon Slayer: Kimetsu no Yaiba" |
| Official Title (ja) | string | Original Japanese title (Kanji/Kana) | "鬼滅の刃" |
| Synopsis | string | Full plot description | "Tanjiro sets out to become a..." |
| Animation Work | string | Production Studio(s) | "ufotable" |
| Chief Animation Direction | string | Chief Animation Director credits | "Matsushima Akira" |
| Direction | string | Series/Movie Director | "Sotozaki Haruo" |
| Chief Direction | string | Chief Director credits | "N/A" |
| Original Work | string | Original Creator / Mangaka | "Gotouge Koyoharu" |
| Series Composition | string | Main Writer / Script Composition | "ufotable" |
| Animation Character Design| string | Animation Character Designer | "Matsushima Akira" |
| Original Plan | string | Original Planner / Concept Creator | "N/A" |
| Music | string | Music Composer(s) | "Kajiura Yuki, Shiina Gou" |
| Character Design | string | Original Character Designer | "N/A" |
| Work | string | Work / Production Credits | "Aniplex, Shueisha" |
| Episode | string | List of episode numbers/counts | "['1', '2', '3', ...]" |
| Title | string | List of individual episode titles | "['Cruelty', 'Trainer Sakonji...', ...]" |
| Duration | string | List of episode durations | "['24m', '24m', ...]" |
| Air Date | string | List of episode air dates | "['2019-04-06', ...]" |
| Cast | string | Voice Actor / Cast List | "Hanae Natsuki, Kitou Akari" |
| Type | string | Raw Media Type (Source) | "TV Series" |
| Season | string | Broadcast Season | "Spring 2019" |
| Year | string | Raw Year String | "2019" |
| Tags | string | Raw Tags / Genres from source | "Action, Demons, Historical..." |
| Char Tags | string | Character Archetype Tags | "Swordfighter, Siblings..." |
| Max Rating | float | Highest Community Rating (0-10) | 8.92 |
| synopsis_length | int | Character count of synopsis | 450 |
| processed_tags | string | Cleaned tags for filtering | "action, supernatural, historical" |
| tag_count | int | Number of associated tags | 15 |
| filter_type | string | Standardized Type for UI filters | "TV Series" |
| filter_year | int | Standardized Integer Year | 2019 |
| Resources | string | External Link Resources | "['Official Site', 'Twitter']" |
| canonical_embedding_text | string | RAG-Optimized Semantic Text | "TITLE: Demon Slayer | ..." |
| semantic_density | float | Information Density Score | 8.9 |
| Image Link Path | string | Poster Image Filename | "demon_slayer_poster.jpg" |
| Logo Image Link Path | string | Logo Image Filename | "demon_slayer_logo.png" |
| Backdrop Image Link Path | string | Horizontal Backdrop Filename | "demon_slayer_backdrop.jpg" |
| Poster Image Link Path | string | High-Res Poster Filename | "demon_slayer_poster_hr.jpg" |
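Several columns in the table (Episode, Title, Duration, Air Date, Resources) are typed as strings but hold list-like values. The sketch below shows one way to turn such cells back into Python lists and to convert durations to minutes. It assumes the cells are Python-literal list strings and that durations look like `'24m'`, as the examples suggest; verify both against your copy of the file. The helper names `parse_list_cell` and `duration_to_minutes` are illustrative.

```python
import ast


def parse_list_cell(cell):
    """Turn a stringified list such as "['1', '2']" into a real list.

    Returns an empty list for missing or unparseable values such as "N/A".
    """
    if not isinstance(cell, str) or not cell.strip():
        return []
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []
    return list(value) if isinstance(value, (list, tuple)) else []


def duration_to_minutes(text):
    """Convert a duration like '24m' to 24; return None for other formats."""
    if not isinstance(text, str):
        return None
    text = text.strip().lower()
    if text.endswith("m") and text[:-1].isdigit():
        return int(text[:-1])
    return None


# Hypothetical cell values shaped like the examples in the schema table.
episodes = parse_list_cell("['1', '2', '3']")
durations = parse_list_cell("['24m', '24m', '23m']")
minutes = [duration_to_minutes(d) for d in durations]
total_runtime = sum(m for m in minutes if m is not None)
print(episodes, minutes, total_runtime)
```

On the loaded DataFrame, the parser can be applied per column, for example `df['Duration'].apply(parse_list_cell)`.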
🔑 License & Rights
✅ What You CAN Do (Make Money):
- Build a SaaS: Deploy a recommendation site/app and charge users.
- Freelance: Build projects for clients (deploy the app for them).
- Sell Outputs: Sell the receipts, API access, or recommendations.
- White-label: Remove AiMi branding and use your own logos.
- Deploy: Host on any server (AWS, Vercel, DigitalOcean).
❌ What You CANNOT Do (Piracy):
- Resell the Code: You cannot sell the raw source code/zip file itself.
- Open Source: You cannot upload the code to public GitHub/Kaggle.
- Redistribute Data: You cannot sell the raw parquet/index files separately.
See license.md for specific details on Client Work and Asset Usage.
Contact
For licensing questions, commercial usage inquiries, or copyright notices:
Note: For installation help or technical issues, please refer to the documentation and troubleshooting guides included in your download package.
🆙 Upgrade Options
Want more than just data?
Tier 2 - Premium ($99.99)
Everything in Tier 1, PLUS:
- ✅ Complete source code (RAG pipeline, backend API, frontend)
- ✅ 8,248 anime images (posters + backdrops + logos)
- ✅ Pre-built FAISS index (ready to use)
- ✅ Deployment guides
Upgrade to Tier 2 Now
👑 Tier 3 - Ultimate ($149.99)
Everything in Tier 2, PLUS:
- ✅ Anime Receipt Generator (full app)
- ✅ Premium receipt templates
Upgrade to Tier 3 Now
🎉 Thank You
Your support helps us maintain and expand the AiMi dataset. We're constantly adding new anime and improving data quality.
Share your projects! I'd love to see what you build with this data. Tag me and AiMi on social media or send me an email!
💬 Made with ❤️
Created by Divyanshu Singh - Passionate Programmer & Die Hard Anime Fan
Version 1.0 | Last Updated: 2025
Your success is my success. Now go build something amazing! 🎌
"Start where you are. Use what you have. Do what you can." — Arthur Ashe
Accelerate your machine learning and RAG projects with the AiMi Ultimate Anime Dataset, a production-grade collection spanning 108 years of history from 1917 to 2025. This dataset features 8,248 curated entries specifically engineered for high-performance semantic search, including a proprietary canonical embedding text field designed for superior vector accuracy. With 38 rich metadata columns covering Studios, Community Ratings, Episode Durations, Staff credits, and verified visual asset filenames, this fully cleaned and normalized resource is the perfect foundation for building professional recommendation engines, discovery apps, and industry analysis dashboards.