
Analyzing Emotions in Studio Ghibli Films

A deep dive into the multilingual emotion analysis behind Spiriteddata — using HuggingFace transformers to decode the emotional DNA of Miyazaki's masterpieces across 5 languages.

nlp · machine-learning · python · huggingface · spiriteddata

Studio Ghibli films have a way of making you feel things that other animated movies don’t. But what exactly is happening emotionally? Is there a pattern to how Miyazaki builds emotional tension and release? And — here’s what made this project fascinating — do those emotions translate?

I built Spiriteddata to answer these questions with data.

The Challenge: 22 Films × 5 Languages × 28 Emotions

Most sentiment analysis projects stop at “positive/negative.” I wanted to go deeper:

  • 22 Studio Ghibli films — from My Neighbor Totoro to The Wind Rises
  • 5 languages — English, French, Spanish, Dutch, and Arabic
  • 28 emotion dimensions — not just sentiment polarity, but specific emotions like joy, sadness, anger, fear, curiosity, and surprise
  • ~100,000 dialogue entries processed

The goal wasn’t just to analyze emotions — it was to understand if emotional patterns survive translation.

Choosing the Model: GoEmotions

After testing several options, I selected the AnasAlokla/multilingual_go_emotions model — a BERT-based multilingual classifier trained on the GoEmotions dataset.

  • Base Model: bert-base-multilingual-cased
  • Emotion Labels: 28 GoEmotions categories
  • Classification Type: Multi-label (returns scores for all 28 emotions)
  • Language Support: EN, FR, ES, NL, AR

Why this model? It’s one of the few that handles multiple languages with fine-grained emotion categories, not just basic sentiment.

Important note: Japanese (JA) subtitles were excluded from emotion analysis due to model limitations. The model wasn’t trained on Japanese text, so results would be unreliable. I preserved JA subtitles for future analysis (perhaps a project for another day).

The Analysis Pipeline

Here’s how the emotion analysis actually works:

from transformers import pipeline
import pandas as pd
from tqdm import tqdm

# Initialize the multilingual emotion classifier
emotion_classifier = pipeline(
    "text-classification",
    model="AnasAlokla/multilingual_go_emotions",
    return_all_scores=True,
    device=0  # first CUDA device; set to -1 to run on CPU
)

def analyze_dialogue_emotions(dialogue_df: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze emotions for each dialogue line across 28 dimensions.
    """
    results = []
    
    for idx, row in tqdm(dialogue_df.iterrows(), total=len(dialogue_df)):
        text = row['dialogue_clean']
        
        # Skip very short lines (less meaningful); create_neutral_record is a
        # small helper (defined elsewhere) that records a neutral placeholder row.
        if len(text.split()) < 3:
            results.append(create_neutral_record(idx))
            continue
        
        # Get emotion predictions (28 scores per line)
        predictions = emotion_classifier(text)[0]
        
        # Parse into structured format
        emotion_scores = {
            f"emotion_{p['label']}": p['score'] 
            for p in predictions
        }
        
        # Find primary emotion
        primary = max(predictions, key=lambda x: x['score'])
        
        results.append({
            'line_id': idx,
            'primary_emotion': primary['label'],
            'emotion_confidence': primary['score'],
            **emotion_scores
        })
    
    return pd.DataFrame(results)
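
A quick usage sketch of the function above on a couple of hand-written lines (the example sentences are purely illustrative; the dialogue_clean column name matches the pipeline):

sample = pd.DataFrame({
    'dialogue_clean': [
        "I promise I'll come back for you.",
        "Don't be afraid. I just want to help you.",
    ]
})

emotion_df = analyze_dialogue_emotions(sample)
print(emotion_df[['line_id', 'primary_emotion', 'emotion_confidence']])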

Temporal Aggregation: 1-Minute Buckets

Individual dialogue lines are too granular for narrative-level analysis. A single powerful line can create a spike that obscures broader patterns.

I aggregate emotions into 1-minute buckets, then apply a roughly 10-minute rolling window (5 minutes on either side of each point) for smoothing:

-- dbt model: mart_film_emotion_timeseries.sql
SELECT
    film_slug,
    language_code,
    minute_offset,
    
    -- Rolling window smoothing (11-point symmetric window)
    AVG(emotion_joy) OVER (
        PARTITION BY film_slug, language_code
        ORDER BY minute_offset
        ROWS BETWEEN 5 PRECEDING AND 5 FOLLOWING
    ) AS emotion_joy_smoothed,
    
    AVG(emotion_sadness) OVER (
        PARTITION BY film_slug, language_code
        ORDER BY minute_offset
        ROWS BETWEEN 5 PRECEDING AND 5 FOLLOWING
    ) AS emotion_sadness_smoothed
    
    -- ... repeat for all 28 emotions

FROM film_with_metadata
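
Upstream of this model, the line-level scores first get collapsed into the 1-minute buckets the window operates on. A minimal pandas sketch of that bucketing step, assuming film_slug, language_code, and a start_seconds timestamp column are already joined onto the emotion scores:

def bucket_by_minute(df: pd.DataFrame) -> pd.DataFrame:
    """Average emotion scores per film, language, and minute of runtime."""
    emotion_cols = [
        c for c in df.columns
        if c.startswith('emotion_') and c != 'emotion_confidence'
    ]
    df = df.assign(minute_offset=(df['start_seconds'] // 60).astype(int))
    return (
        df.groupby(['film_slug', 'language_code', 'minute_offset'])[emotion_cols]
        .mean()
        .reset_index()
    )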

Why rolling window smoothing? Raw emotion data is inherently noisy:

  • A single powerful line creates spikes
  • Technical dialogue (names, actions) has low emotional content
  • Short scenes create rapid fluctuations

The 10-minute window balances noise reduction with temporal precision.
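
For reference, the same centered smoothing expressed in pandas, mirroring the SQL window of 5 preceding rows, the current row, and 5 following rows:

def smooth_timeline(series: pd.Series, window: int = 11) -> pd.Series:
    """Centered rolling mean; the window shrinks gracefully at the film's start and end."""
    return series.rolling(window=window, center=True, min_periods=1).mean()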

The Key Finding: Emotions Drift, But Patterns Hold

Here’s what surprised me most: emotions varied significantly across translations, but the underlying patterns remained similar.

When I compared emotion arcs across languages, I found:

  • High consistency (>80%) for major emotional moments — climaxes, resolutions
  • Moderate divergence (60-80%) in quieter scenes where translation choices matter more
  • Significant divergence (<60%) only in edge cases with cultural adaptation

For example, in Spirited Away:

  • The bathhouse scenes show consistent fear/curiosity patterns across all 5 languages
  • The reunion with Haku shows consistent joy peaks regardless of translation
  • But quieter dialogue scenes? French and Arabic translations sometimes emphasized different emotional undertones than English

This finding became the reason I continued the project. The question shifted from “Can we analyze Ghibli emotions?” to “Which emotions and lines of dialogue define each language’s version?”

Cross-Language Consistency Analysis

I built a consistency metric to quantify how similar emotion patterns are across translations:

def calculate_cross_language_consistency(
    film_slug: str,
    emotion: str,
    languages: list = ['en', 'fr', 'es', 'nl', 'ar']
) -> dict:
    """
    Calculate Pearson correlation between language pairs
    for a specific emotion timeline.
    """
    correlations = {}
    
    # Get English as baseline
    en_arc = get_emotion_timeline(film_slug, 'en', emotion)
    
    for lang in languages:
        if lang == 'en':
            continue
            
        lang_arc = get_emotion_timeline(film_slug, lang, emotion)
        
        # Pearson correlation
        corr = en_arc['smoothed_value'].corr(lang_arc['smoothed_value'])
        correlations[f'en_vs_{lang}'] = corr
    
    return correlations
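
Calling it for one film and emotion looks like this (the slug string is an assumption about how films are keyed in the dataset):

joy_consistency = calculate_cross_language_consistency('spirited-away', 'joy')
for pair, corr in joy_consistency.items():
    print(f"{pair}: {corr:.2f}")  # e.g. en_vs_fr: 0.87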

Results for Spirited Away (joy emotion):

  • English ↔ French: 0.87 correlation
  • English ↔ Spanish: 0.84 correlation
  • English ↔ Dutch: 0.79 correlation
  • English ↔ Arabic: 0.76 correlation

The high correlations suggest that emotional content survives translation — and that our analysis captures genuine narrative patterns, not linguistic artifacts.

What Came Next: The Sora AI Project

With this rich emotion dataset, I started building something more ambitious: an AI assistant called Sora that could answer questions about the emotional landscape of Ghibli films.

The architecture used LangChain with 6 custom tools, including:

  • Query sentiment analysis for specific films
  • Calculate cross-language emotion correlations
  • Find emotional peaks with timestamps
  • Compare director styles (Miyazaki vs. Takahata)
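
As a rough illustration of how one of those tools might be wired up (the helper function, tool name, and description below are hypothetical rather than the actual Sora implementation, and the import path can vary across LangChain versions):

from langchain.tools import Tool

def find_emotional_peaks(film_slug: str) -> str:
    """Hypothetical helper: summarize a film's strongest emotion spikes with timestamps."""
    peaks = query_emotion_peaks(film_slug)  # assumed data-access function over the mart tables
    return "\n".join(
        f"{p['minute_offset']} min: {p['emotion']} ({p['score']:.2f})" for p in peaks
    )

peak_tool = Tool.from_function(
    func=find_emotional_peaks,
    name="find_emotional_peaks",
    description="Find timestamped emotional peaks for a given Ghibli film.",
)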

Sora showed promise — excelling at straightforward sentiment queries and multilingual comparisons. But I ultimately deprecated it.

Why? The interactive emotion visualizations answered the same questions faster, cheaper, and more reliably than a chatbot interface. The RAG system added complexity without proportional value for a portfolio project.

The code lives on as a case study in engineering decision-making: knowing when to build, and when to pivot. You can explore that story in the “Memories of Sora” section of the live app.

Technical Challenges

GPU Memory Management

With 100,000+ dialogue entries, GPU memory became a bottleneck:

import torch
import gc

def batch_analyze(df: pd.DataFrame, batch_size: int = 100):
    """Process in batches to manage GPU memory."""
    results = []
    
    for i in range(0, len(df), batch_size):
        batch = df.iloc[i:i+batch_size]
        batch_results = analyze_dialogue_emotions(batch)
        results.append(batch_results)
        
        # Explicit cleanup every 1000 entries
        if i % (batch_size * 10) == 0:
            torch.cuda.empty_cache()
            gc.collect()
    
    return pd.concat(results, ignore_index=True)
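
A related lever, when memory allows, is letting the transformers pipeline batch at the model level by passing a list of texts with a batch_size argument instead of scoring lines one at a time. A rough sketch of that variant (the column handling and batch size are assumptions):

def analyze_texts_batched(texts: list[str], batch_size: int = 32) -> pd.DataFrame:
    """Score many lines in one call; the pipeline handles GPU batching internally."""
    all_scores = emotion_classifier(texts, batch_size=batch_size)
    rows = []
    for scores in all_scores:
        primary = max(scores, key=lambda s: s['score'])
        rows.append({
            'primary_emotion': primary['label'],
            'emotion_confidence': primary['score'],
            **{f"emotion_{s['label']}": s['score'] for s in scores},
        })
    return pd.DataFrame(rows)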

Model Limitations

The GoEmotions model was trained on English-heavy data. This introduces potential bias:

  • Non-English accuracy may be lower
  • No anime/film-specific training
  • Cultural context can affect emotion interpretation

I acknowledge these limitations transparently in the app’s methodology section. Transparency over perfection.

Explore It Yourself

The full analysis is available at spiriteddata.streamlit.app:

  • 🎬 The Spirit Archives — Deep-dive film emotion analysis with interactive timelines
  • 🌍 Echoes Across Languages — Cross-language emotion comparison
  • 🎭 Architects of Emotion — Director style profiles (Miyazaki vs. Takahata)
  • 📊 The Alchemy of Data — Methodology transparency and data quality metrics

What I Learned

  1. Scale matters in time-series analysis — Providing both raw and smoothed timelines lets users choose the right resolution for their analysis goal.

  2. Translation preserves more than you’d expect — Emotional arcs are remarkably consistent across languages, even when specific word choices differ.

  3. Transparency builds trust — Documenting limitations and methodology decisions makes the analysis more credible, not less.

  4. Know when to pivot — A working AI chatbot isn’t always better than interactive visualizations. Choose the tool that serves the user.

The intersection of data engineering and storytelling analysis has been fascinating. Data doesn’t diminish the magic of these films — it reveals how meticulously crafted that magic really is.