A step-by-step tutorial on extracting negative sentiment from real-world employee feedback data
Learn how to analyze 800K+ employee reviews to extract actionable insights about workplace sentiment
We're using the Glassdoor Job Reviews dataset from Kaggle, containing 838,566 employee reviews from various companies. Here's a sample of the data:
| firm | overall_rating | pros | cons |
|---|---|---|---|
| IBM | 1 | Good benefits package | Management is completely out of touch with reality. Constant layoffs create fear... |
| Oracle | 2 | Good salary | No career growth opportunities. Managers play favorites and the work environment is toxic... |
| Microsoft | 2 | Great perks | Work-life balance is terrible. Constant reorgs make it impossible to focus... |
| Google | 2 | Amazing campus | Too much bureaucracy. Hard to get promoted without politics... |
| Apple | 1 | Prestigious brand | Extremely long hours expected. No room for creativity, just follow orders... |
First, we load the CSV file and filter for negative reviews (1-2 star ratings) from tech companies:
```python
import pandas as pd

# Load the Glassdoor reviews dataset
df = pd.read_csv('glassdoor_reviews.csv', encoding='latin-1')

# Define tech companies to analyze (lowercase, for case-insensitive matching)
tech_companies = ['ibm', 'oracle', 'microsoft', 'google', 'apple']

# Filter for tech companies (case-insensitive)
tech_df = df[df['firm'].str.lower().isin(tech_companies)]

# Keep negative reviews (1-2 star ratings)
negative_reviews = tech_df[tech_df['overall_rating'] <= 2]

# Extract the "cons" column (this is what we'll analyze), dropping empty entries
cons_text = negative_reviews[['firm', 'cons']].dropna()
print(f"Found {len(cons_text):,} negative reviews to analyze")
```

```
Found 20,587 negative reviews to analyze
```
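To check the filtering logic without the full CSV, here is a small self-contained sketch that runs the same three steps on four rows from the sample table above (cons text shortened) plus three hypothetical rows, one for each filter to remove. It needs only pandas and assumes nothing about the real file beyond the column names shown in the table:

```python
import pandas as pd

# Rows from the sample table (cons shortened), plus three hypothetical
# rows added only to show what each filter removes.
df = pd.DataFrame({
    "firm": ["IBM", "Oracle", "Microsoft", "Apple", "Microsoft", "Google", None],
    "overall_rating": [1, 2, 2, 1, 5, 1, 1],
    "cons": [
        "Management is completely out of touch with reality.",
        "No career growth opportunities.",
        "Work-life balance is terrible.",
        "Extremely long hours expected.",
        "hypothetical: 5-star review",      # removed by the rating filter
        None,                               # removed by dropna()
        "hypothetical: firm name missing",  # removed by the firm filter
    ],
})

tech_companies = ["ibm", "oracle", "microsoft", "google", "apple"]

# Missing firm names stay missing after str.lower(), and isin() never matches them
tech_df = df[df["firm"].str.lower().isin(tech_companies)]
negative_reviews = tech_df[tech_df["overall_rating"] <= 2]
cons_text = negative_reviews[["firm", "cons"]].dropna()

print(f"Found {len(cons_text):,} negative reviews to analyze")
print(cons_text["firm"].tolist())
```

Only the four sample-table rows survive: the 5-star row fails the rating filter, the row without a firm fails the company filter, and the row with no cons text is dropped by `dropna()`.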
Import KeyNeg and initialize the analyzer. For enterprise environments, the model is bundled and works offline:
```python
# For open source version
from keyneg import KeyNeg
# Or for enterprise version (air-gapped, no internet required)
# from keyneg_enterprise import KeyNeg

# Initialize the analyzer
kn = KeyNeg()

# KeyNeg comes with 95+ built-in sentiment labels including:
# - incompetent management
# - no growth opportunities
# - hostile work environment
# - layoffs
# - work life imbalance
# - and many more...
print("KeyNeg initialized successfully")
```

```
KeyNeg initialized successfully
```
Let's start by analyzing a single employee review to understand the output format:
```python
# Sample review from an IBM employee
review = """Management is completely out of touch with reality.
Constant layoffs create fear and uncertainty. No clear career path
and promotions are based on politics, not merit."""

# Analyze the review
result = kn.analyze(review)

# View the results
print("Top Sentiment:", result['top_sentiment'])
print("Negativity Score:", result['negativity_score'])
print("All Sentiments:", result['sentiments'])
```

```
Top Sentiment: incompetent management
Negativity Score: 0.45
All Sentiments: ['incompetent management', 'layoffs', 'no growth opportunities']
```
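The printout suggests each result is a plain dict with `top_sentiment`, `negativity_score`, and `sentiments` keys. Assuming that shape (it is inferred from the output above, not taken from KeyNeg's documentation), a hypothetical helper such as `results_to_frame` can put each review next to its result in one pandas DataFrame, which makes later filtering and grouping easier:

```python
import pandas as pd


def results_to_frame(texts, results):
    """Hypothetical helper: one row per review. Assumes each result is a dict
    with 'top_sentiment', 'negativity_score' and 'sentiments' keys."""
    if len(texts) != len(results):
        raise ValueError("texts and results must have the same length")
    return pd.DataFrame({
        "text": texts,
        "top_sentiment": [r["top_sentiment"] for r in results],
        "negativity_score": [r["negativity_score"] for r in results],
        "sentiments": [list(r["sentiments"]) for r in results],
        "n_labels": [len(r["sentiments"]) for r in results],
    })


# The single-review result printed above, rebuilt by hand as a dict
result = {
    "top_sentiment": "incompetent management",
    "negativity_score": 0.45,
    "sentiments": ["incompetent management", "layoffs", "no growth opportunities"],
}
frame = results_to_frame(["Management is completely out of touch..."], [result])
print(frame[["top_sentiment", "negativity_score", "n_labels"]])
```

The length check guards against silently misaligning reviews and results when the two lists come from different steps.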
Next, let's draw a random sample of 200 reviews and analyze them all at once, using batch processing for efficiency:
```python
from collections import Counter

# Sample 200 reviews for analysis (fixed seed for reproducibility)
sample_reviews = cons_text.sample(n=200, random_state=42)

# Convert to a list for batch processing
review_texts = sample_reviews['cons'].tolist()

# Analyze all reviews in one batch
results = kn.analyze_batch(review_texts)

# Count how often each sentiment label appears
all_sentiments = []
for r in results:
    all_sentiments.extend(r['sentiments'])
sentiment_counts = Counter(all_sentiments)

# Display the top 10 sentiments
print("Top 10 Negative Sentiments:")
for sentiment, count in sentiment_counts.most_common(10):
    print(f" {sentiment}: {count}")
```

```
Top 10 Negative Sentiments:
 incompetent management: 72
 no growth opportunities: 14
 career stagnation: 12
 organizational instability: 10
 layoffs: 9
 hostile work environment: 8
 poor customer service: 6
 lack of collaboration: 4
 dismissive management: 4
 poor leadership: 4
```
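The `Counter` above counts label mentions. If a single review could return the same label more than once, a mention count would overstate how many reviews raise that issue. A minimal sketch, using hypothetical toy results in the same dict shape as above, that counts each label at most once per review and converts the counts into a share of reviews:

```python
from collections import Counter


def review_share(results):
    """Count each label at most once per review and return
    {label: (review_count, percent_of_reviews)}."""
    doc_freq = Counter()
    for r in results:
        doc_freq.update(set(r["sentiments"]))  # set() drops repeats within one review
    n = len(results)
    return {label: (count, round(100 * count / n, 1))
            for label, count in doc_freq.most_common()}


# Hypothetical toy results, shaped like the kn.analyze_batch output above
toy_results = [
    {"sentiments": ["incompetent management", "layoffs"]},
    {"sentiments": ["incompetent management", "incompetent management"]},
    {"sentiments": ["no growth opportunities"]},
    {"sentiments": []},
]
for label, (count, pct) in review_share(toy_results).items():
    print(f"{label}: {count} reviews ({pct}%)")
```

If each review's `sentiments` list never repeats a label, both counts agree, so 72 mentions of "incompetent management" across 200 reviews corresponds to 36% of the reviews analyzed.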
Let's break down the analysis by company to compare sentiment patterns:
```python
# Reset the index left over from sampling; each row keeps its firm next to its review
sample_reviews = sample_reviews.reset_index(drop=True)

# Analyze by company
company_results = {}
for company in tech_companies:
    # Select this company's reviews (firm names matched case-insensitively)
    mask = sample_reviews['firm'].str.lower() == company
    company_reviews = sample_reviews.loc[mask, 'cons'].tolist()

    if len(company_reviews) > 0:
        # Analyze this company's reviews
        company_analysis = kn.analyze_batch(company_reviews)

        # Average negativity score across the company's reviews
        avg_neg = sum(r['negativity_score'] for r in company_analysis) / len(company_analysis)

        # Count sentiment labels
        sentiments = []
        for r in company_analysis:
            sentiments.extend(r['sentiments'])

        company_results[company] = {
            'count': len(company_reviews),
            'avg_negativity': round(avg_neg, 2),
            'top_sentiments': Counter(sentiments).most_common(3),
        }

# Display results
for company, data in company_results.items():
    print(f"\n{company.upper()}")
    print(f" Reviews: {data['count']}")
    print(f" Avg Negativity: {data['avg_negativity']}")
    print(f" Top Issues: {data['top_sentiments']}")
```

```
IBM
 Reviews: 108
 Avg Negativity: 0.42
 Top Issues: [('incompetent management', 49), ('no growth opportunities', 9), ('layoffs', 7)]

ORACLE
 Reviews: 48
 Avg Negativity: 0.43
 Top Issues: [('incompetent management', 14), ('organizational instability', 5), ('hostile work environment', 3)]

MICROSOFT
 Reviews: 25
 Avg Negativity: 0.42
 Top Issues: [('incompetent management', 6), ('work life imbalance', 3), ('organizational instability', 3)]

GOOGLE
 Reviews: 8
 Avg Negativity: 0.33
 Top Issues: [('no growth opportunities', 2), ('hostile work environment', 1), ('overworked', 1)]

APPLE
 Reviews: 11
 Avg Negativity: 0.38
 Top Issues: [('poor customer service', 2), ('career stagnation', 2), ('incompetent management', 2)]
```
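The per-company loop calls `kn.analyze_batch` again for every company, so each of the 200 reviews is analyzed a second time. Assuming `analyze_batch` returns exactly one result per input text, in input order (an assumption about KeyNeg that this tutorial does not confirm), the `results` from the earlier batch step could instead be attached to `sample_reviews` by position and grouped with pandas. The sketch below uses small hypothetical stand-ins for both so it runs on its own:

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-ins for sample_reviews and the earlier batch results
sample_reviews = pd.DataFrame({
    "firm": ["IBM", "IBM", "Oracle", "Google"],
    "cons": ["review a", "review b", "review c", "review d"],
})
results = [
    {"negativity_score": 0.5, "sentiments": ["incompetent management", "layoffs"]},
    {"negativity_score": 0.4, "sentiments": ["incompetent management"]},
    {"negativity_score": 0.3, "sentiments": ["organizational instability"]},
    {"negativity_score": 0.2, "sentiments": ["no growth opportunities"]},
]

# Attach results by position (assumes one result per review, in input order)
scored = sample_reviews.reset_index(drop=True)
scored["firm"] = scored["firm"].str.lower()
scored["negativity"] = [r["negativity_score"] for r in results]
scored["sentiments"] = [r["sentiments"] for r in results]

company_results = {}
for firm, group in scored.groupby("firm"):
    labels = Counter(label
                     for review_labels in group["sentiments"]
                     for label in review_labels)
    company_results[firm] = {
        "count": len(group),
        "avg_negativity": round(group["negativity"].mean(), 2),
        "top_sentiments": labels.most_common(3),
    }

for firm, data in company_results.items():
    print(f"{firm.upper()}: {data}")
```

The resulting dictionary has the same shape as `company_results` in the tutorial. `groupby` only visits firms that actually occur in the sample, matching the `if len(company_reviews) > 0` check, but it orders them alphabetically rather than in `tech_companies` order.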
[Figure: Sentiment breakdown by company]
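"Incompetent management" appeared in 72 of the 200 sampled reviews (36%), making it by far the most common complaint overall and the top issue at IBM, Oracle, and Microsoft.
"No growth opportunities" (14) and "career stagnation" (12) together appear 26 times, equivalent to 13% of the 200 reviews: employees want clear advancement paths.
Google has the lowest average negativity score (0.33), compared with 0.42 for IBM and 0.43 for Oracle, suggesting its negative reviews are somewhat less severe. That figure rests on only 8 reviews, though, so treat the gap as tentative.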
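A plain random sample mirrors each company's share of the negative reviews, which is why IBM contributes 108 of the 200 sampled reviews and Google only 8. When the goal is to compare companies, a stratified sample with a fixed number of reviews per firm puts each average on a more similar footing. A minimal pandas sketch, using a hypothetical stand-in for `cons_text` and a hypothetical per-firm size `PER_FIRM`:

```python
import pandas as pd

# Hypothetical stand-in for cons_text with an unbalanced mix of firms
cons_text = pd.DataFrame({
    "firm": ["IBM"] * 50 + ["Google"] * 5 + ["Apple"] * 20,
    "cons": [f"review {i}" for i in range(75)],
})

PER_FIRM = 10  # hypothetical number of reviews to keep per company

# Shuffle once, then keep the first PER_FIRM rows of each firm.
# Firms with fewer than PER_FIRM reviews keep all of theirs.
balanced = (
    cons_text.sample(frac=1, random_state=42)
    .groupby("firm")
    .head(PER_FIRM)
)
print(balanced["firm"].value_counts())
```

`DataFrameGroupBy.sample(n=PER_FIRM, random_state=42)` is a more direct alternative, but without `replace=True` it raises an error when any firm has fewer than `PER_FIRM` rows; shuffling and taking `head` degrades gracefully instead.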
The same approach works for employee surveys, customer feedback, or any other text data in which you want to find patterns of negative sentiment.
Dataset: Glassdoor Job Reviews by David Gauthier on Kaggle (838K+ reviews)