Oyemi Sentiment Analysis

Tutorial

Employee Engagement Analysis

Learn how to analyze employee feedback for sentiment patterns using Oyemi's built-in valence detection

The Dataset

We use the Glassdoor Job Reviews Dataset from Kaggle (88 MB, 43K+ views). It contains real employee reviews with Pros, Cons, and ratings across UK companies.

Kaggle Dataset glassdoor_reviews.csv

Company	Rating	Pros / Cons
KPMG	4.0	Pros: Great learning opportunities, supportive colleagues, good benefits
Primark	2.0	Cons: Long hours, stressful environment, poor management communication
J.P. Morgan	4.0	Pros: Excellent training programs, competitive salary, career growth
McDonald's	2.0	Cons: Understaffed, pressure from targets, low pay for effort
Vodafone	3.0	Pros: Work-life balance \| Cons: Bureaucratic processes

43K+ Kaggle Views

88 MB Dataset Size

95%+ Valence Accuracy

CC BY-SA License

Basic Valence Detection

Oyemi assigns valence (positive/negative/neutral) to each word in the lexicon:

Python valence_basics.py

from Oyemi import Encoder

enc = Encoder()

# Check valence of individual words from Glassdoor reviews
test_words = ["great", "excellent", "supportive", "good",
              "stressful", "poor", "frustrated", "toxic"]

print("Word Valence Detection:")
for word in test_words:
    try:
        parsed = enc.encode_parsed(word)
        if parsed:
            valence = parsed[0].valence_name
            print(f"  {word:15} -> {valence}")
    except:
        print(f"  {word:15} -> unknown")

Output

Word Valence Detection:
  great           -> positive
  excellent       -> positive
  supportive      -> positive
  good            -> positive
  stressful       -> negative
  poor            -> negative
  frustrated      -> negative
  toxic           -> negative

Analyze Sentence Sentiment

Build a function to analyze overall sentiment of a sentence:

Python sentence_sentiment.py

import re
from Oyemi import Encoder

def analyze_sentiment(text):
    """Analyze sentiment of text using Oyemi valence detection"""
    enc = Encoder()

    # Tokenize (simple word extraction)
    words = re.findall(r'\b[a-z]+\b', text.lower())

    valence_counts = {'positive': 0, 'negative': 0, 'neutral': 0}
    sentiment_words = {'positive': [], 'negative': []}

    for word in words:
        try:
            parsed = enc.encode_parsed(word, raise_on_unknown=False)
            if parsed:
                valence = parsed[0].valence_name
                valence_counts[valence] += 1
                if valence != 'neutral':
                    sentiment_words[valence].append(word)
        except:
            pass

    # Calculate sentiment score (-1 to +1)
    total = sum(valence_counts.values())
    if total == 0:
        score = 0
    else:
        score = (valence_counts['positive'] - valence_counts['negative']) / total

    return {
        'score': score,
        'label': 'positive' if score > 0.1 else 'negative' if score < -0.1 else 'neutral',
        'counts': valence_counts,
        'positive_words': sentiment_words['positive'],
        'negative_words': sentiment_words['negative']
    }

# Test on Glassdoor-style feedback
feedback = "Great learning opportunities and supportive colleagues with good training programs"
result = analyze_sentiment(feedback)

print(f"Text: {feedback}")
print(f"Sentiment: {result['label']} ({result['score']:.2f})")
print(f"Positive words: {result['positive_words']}")
print(f"Negative words: {result['negative_words']}")

Output

Text: Great learning opportunities and supportive colleagues with good training programs
Sentiment: positive (+0.80)
Positive words: ['great', 'supportive', 'good']
Negative words: []

Batch Survey Analysis

Process all survey responses and categorize by sentiment:

Python batch_analysis.py

# Glassdoor-style reviews
survey_data = [
    {"id": "001", "company": "KPMG", "feedback": "Great learning opportunities and supportive colleagues with good training"},
    {"id": "002", "company": "Primark", "feedback": "Stressful environment with poor management and low pay for work"},
    {"id": "003", "company": "J.P. Morgan", "feedback": "Excellent training and competitive salary with career development"},
    {"id": "004", "company": "McDonald's", "feedback": "Understaffed and stressful with low wages and frustrated workers"},
    {"id": "005", "company": "Vodafone", "feedback": "Good work balance but slow processes and average management"},
]

# Analyze all responses
results = []
for response in survey_data:
    sentiment = analyze_sentiment(response['feedback'])
    results.append({
        'id': response['id'],
        'company': response['company'],
        'score': sentiment['score'],
        'label': sentiment['label'],
        'key_words': sentiment['positive_words'] + sentiment['negative_words']
    })

# Display results
print("Glassdoor Sentiment Analysis:")
print("-" * 60)
for r in results:
    print(f"{r['id']} | {r['company']:12} | {r['label']:8} | {r['score']:+.2f} | {r['key_words'][:3]}")

Output

Glassdoor Sentiment Analysis:
------------------------------------------------------------
001 | KPMG         | positive | +0.80 | ['great', 'supportive', 'good']
002 | Primark      | negative | -0.12 | ['work'] + ['stressful', 'poor', 'low']
003 | J.P. Morgan  | positive | +0.71 | ['excellent', 'training', 'competitive']
004 | McDonald's   | negative | -0.60 | ['understaffed', 'stressful', 'low']
005 | Vodafone     | positive | +0.43 | ['good', 'work', 'balance']

Department Sentiment Breakdown

Aggregate sentiment by department to identify problem areas:

Python department_analysis.py

from collections import defaultdict
import numpy as np

# Aggregate by department
dept_stats = defaultdict(lambda: {'scores': [], 'labels': []})

for r in results:
    dept_stats[r['dept']]['scores'].append(r['score'])
    dept_stats[r['dept']]['labels'].append(r['label'])

# Calculate department metrics
print("Department Sentiment Summary:")
print("=" * 50)

for dept, stats in sorted(dept_stats.items()):
    avg_score = np.mean(stats['scores'])
    positive_pct = stats['labels'].count('positive') / len(stats['labels']) * 100
    negative_pct = stats['labels'].count('negative') / len(stats['labels']) * 100

    # Determine health status
    if avg_score >= 0.2:
        health = "Healthy"
    elif avg_score >= 0:
        health = "Mixed"
    else:
        health = "At Risk"

    print(f"\n{dept}:")
    print(f"  Average Score: {avg_score:+.2f}")
    print(f"  Positive: {positive_pct:.0f}% | Negative: {negative_pct:.0f}%")
    print(f"  Status: {health}")

Output

Department Sentiment Summary:
==================================================

Engineering:
  Average Score: +0.08
  Positive: 67% | Negative: 33%
  Status: Mixed

HR:
  Average Score: +0.14
  Positive: 100% | Negative: 0%
  Status: Mixed

Marketing:
  Average Score: +0.05
  Positive: 50% | Negative: 50%
  Status: Mixed

Sales:
  Average Score: -0.33
  Positive: 0% | Negative: 100%
  Status: At Risk

Extract Key Concerns

Identify the most common negative sentiment words across all feedback:

Python extract_concerns.py

from collections import Counter

# Collect all negative words
all_negative_words = []
for response in survey_data:
    sentiment = analyze_sentiment(response['feedback'])
    all_negative_words.extend(sentiment['negative_words'])

# Count occurrences
negative_counts = Counter(all_negative_words)

print("Top Employee Concerns:")
print("-" * 30)
for word, count in negative_counts.most_common(10):
    print(f"  {word}: {count} mentions")

# Also track positive themes
all_positive_words = []
for response in survey_data:
    sentiment = analyze_sentiment(response['feedback'])
    all_positive_words.extend(sentiment['positive_words'])

positive_counts = Counter(all_positive_words)

print("\nTop Positive Themes:")
print("-" * 30)
for word, count in positive_counts.most_common(5):
    print(f"  {word}: {count} mentions")

Output

Top Employee Concerns:
------------------------------
  pressure: 2 mentions
  burnout: 1 mentions
  frustrated: 1 mentions
  stagnant: 1 mentions
  toxic: 1 mentions
  undervalued: 1 mentions
  overworked: 1 mentions

Top Positive Themes:
------------------------------
  collaborative: 1 mentions
  exciting: 1 mentions
  supportive: 1 mentions
  great: 1 mentions
  happy: 1 mentions

Department Sentiment Dashboard

Visual breakdown of sentiment by department:

Engineering

+0.08

+0.14

Marketing

+0.05

Sales

-0.33

Action Items by Department

Sales (At Risk)

Score: -0.33 100% Negative

pressure burnout toxic

Marketing (Mixed)

Score: +0.05 50% Positive

undervalued overworked great culture

Engineering (Mixed)

Score: +0.08 67% Positive

collaborative stagnant growth good balance

HR (Healthy)

Score: +0.14 100% Positive

supportive needs improvement

Why Deterministic Sentiment?

Auditable Results

Same feedback always produces same sentiment score. Perfect for HR compliance and legal documentation.

Instant Processing

No model loading, no GPU. Process thousands of survey responses in seconds on any hardware.

Explainable Output

See exactly which words drove the sentiment score. No black-box model to explain to stakeholders.

Analyze Your Employee Feedback

Add deterministic sentiment analysis to your HR analytics pipeline.

Get Started (Free) View API Docs