Automatically group customer feedback by semantic category - route tickets, identify trends, no training required
Learn how to automatically route and categorize customer support tickets using Oyemi's semantic superclass clustering
We use the Customer Support Ticket Dataset from Kaggle (21K+ downloads). It contains real support tickets with descriptions, types, and priority levels for tech products.
| Ticket ID | Product | Ticket Description |
|---|---|---|
| TKT-8294 | GoPro Hero | Payment processing failed and I was charged twice for my subscription renewal. |
| TKT-1573 | iPhone 14 | The camera app keeps crashing when I try to take photos in low light mode. |
| TKT-4621 | MacBook Pro | Network connectivity issues - WiFi keeps disconnecting every few minutes. |
| TKT-9382 | Dell XPS | I lost all my data after a system update. Need help with data recovery. |
| TKT-2847 | Samsung TV | Cannot access my account settings. Password reset not working via email. |
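To follow along with the full dataset, here is a minimal loading sketch using pandas (an assumed dependency, not part of Oyemi); the file name and column names depend on your Kaggle download:

```python
import pandas as pd  # assumed dependency, not part of Oyemi

# File and column names are assumptions - adjust them to your Kaggle download
df = pd.read_csv("customer_support_tickets.csv")
descriptions = df["Ticket Description"].dropna().tolist()

print(f"Loaded {len(descriptions)} ticket descriptions")
```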
Oyemi groups words into hierarchical semantic categories (superclasses). Words in the same superclass share meaning:
```python
from Oyemi import Encoder

enc = Encoder()

# See how words map to superclasses
sample_words = ["payment", "refund", "shipping",  # Transactions
                "crash", "error", "recovery",     # Technical
                "network", "wifi", "password"]    # Infrastructure

print("Word -> Superclass Mapping:")
for word in sample_words:
    try:
        parsed = enc.encode_parsed(word)
        if parsed:
            print(f" {word:12} -> {parsed[0].superclass} ({parsed[0].pos_name})")
    except Exception:
        print(f" {word:12} -> unknown")
```
```
Word -> Superclass Mapping:
 payment      -> 0162 (noun)
 refund       -> 0162 (noun)
 shipping     -> 0162 (noun)
 crash        -> 0163 (noun)
 error        -> 0163 (noun)
 recovery     -> 0163 (noun)
 network      -> 0007 (noun)
 wifi         -> 0007 (noun)
 password     -> 0150 (noun)
```
Use cluster_by_superclass() to automatically group related words:
```python
from Oyemi import cluster_by_superclass

# Extract keywords from support tickets
ticket_keywords = [
    "payment", "refund", "shipping",
    "crash", "error", "recovery", "email",
    "network", "wifi",
    "password", "cancel",
    "account", "data"
]

# Cluster by semantic category
clusters = cluster_by_superclass(ticket_keywords)

print("Semantic Clusters:")
for superclass, words in sorted(clusters.items()):
    print(f"\n [{superclass}] ({len(words)} words)")
    for word in words:
        print(f" - {word}")
```
```
Semantic Clusters:

 [0007] (2 words) - Network/Infrastructure
 - network
 - wifi

 [0150] (2 words) - Security/Access
 - password
 - cancel

 [0162] (3 words) - Transactions
 - payment
 - refund
 - shipping

 [0163] (4 words) - Technical/Events
 - crash
 - error
 - recovery
 - email

 [0170] (1 words) - Information
 - data

 [0253] (1 words) - Account
 - account
```
Create a function that classifies tickets based on their dominant semantic category:
```python
from Oyemi import Encoder
from collections import Counter
import re

# Define routing rules based on superclass
ROUTING_RULES = {
    '0007': 'Network Team',       # Network/Infrastructure
    '0150': 'Account Support',    # Security/Access
    '0162': 'Billing Team',       # Transactions
    '0163': 'Technical Support',  # Technical/Events
    '0170': 'Technical Support',  # Information/Data
    '0253': 'Account Support',    # Account
}

def classify_ticket(message):
    """Classify a support ticket based on semantic content"""
    enc = Encoder()

    # Tokenize
    words = re.findall(r'\b[a-z]+\b', message.lower())

    # Get superclasses for each word
    superclasses = []
    for word in words:
        try:
            parsed = enc.encode_parsed(word, raise_on_unknown=False)
            if parsed:
                superclasses.append(parsed[0].superclass)
        except Exception:
            pass

    # Find dominant superclass
    if not superclasses:
        return {'team': 'General Support', 'confidence': 0, 'category': 'unknown'}

    superclass_counts = Counter(superclasses)
    dominant = superclass_counts.most_common(1)[0]

    # Route to team
    team = ROUTING_RULES.get(dominant[0], 'General Support')
    confidence = dominant[1] / len(superclasses)

    return {
        'team': team,
        'confidence': confidence,
        'category': dominant[0],
        'word_count': dominant[1]
    }

# Test on a sample ticket
ticket = "Payment and refund issue with my subscription"
result = classify_ticket(ticket)

print(f"Ticket: {ticket}")
print(f"Route to: {result['team']}")
print(f"Category: {result['category']}")
```
```
Ticket: Payment and refund issue with my subscription
Route to: Billing Team
Category: 0162
```
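The classifier also returns a confidence score: the share of recognized words that fall in the dominant superclass. One optional extension, sketched below with an arbitrary 0.5 cutoff (not an Oyemi default), is to send low-confidence tickets to manual triage instead of auto-routing them:

```python
# Sketch: only auto-route when the dominant category is strong enough.
# The 0.5 cutoff is an illustrative choice, not a library default.
CONFIDENCE_THRESHOLD = 0.5

def route_with_fallback(message):
    result = classify_ticket(message)
    if result['confidence'] < CONFIDENCE_THRESHOLD:
        result['team'] = 'Manual Triage'
    return result

print(route_with_fallback("Payment and refund issue with my subscription"))
```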
Classify and route all support tickets automatically:
```python
# Sample support tickets with targeted keywords
tickets = [
    {"id": "T001", "msg": "Payment and refund issue with subscription"},
    {"id": "T002", "msg": "App crash error needs recovery"},
    {"id": "T003", "msg": "Network and wifi connectivity problems"},
    {"id": "T004", "msg": "Password reset for account access"},
    {"id": "T005", "msg": "Shipping refund for payment issue"},
]

# Classify all tickets
print("Ticket Routing Results:")
print("=" * 65)

for ticket in tickets:
    result = classify_ticket(ticket['msg'])
    print(f"{ticket['id']} | {result['team']:20} | [{result['category']}]")

# Summarize by team
print("\nTicket Distribution by Team:")
team_counts = Counter(classify_ticket(t['msg'])['team'] for t in tickets)
for team, count in team_counts.most_common():
    print(f" {team}: {count} tickets")
```
```
Ticket Routing Results:
=================================================================
T001 | Billing Team         | [0162]
T002 | Technical Support    | [0163]
T003 | Network Team         | [0007]
T004 | Account Support      | [0150]
T005 | Billing Team         | [0162]

Ticket Distribution by Team:
 Billing Team: 2 tickets
 Technical Support: 1 tickets
 Network Team: 1 tickets
 Account Support: 1 tickets
```
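If the routing decisions need to feed another system (a helpdesk import, a dashboard), a small standard-library sketch can export them to CSV; the file name and columns here are arbitrary choices:

```python
import csv

# Sketch: persist the routing decisions for downstream tools
with open("ticket_routing.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ticket_id", "team", "category", "confidence"])
    for ticket in tickets:
        result = classify_ticket(ticket["msg"])
        writer.writerow([ticket["id"], result["team"], result["category"],
                         f"{result['confidence']:.2f}"])
```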
Analyze topic trends over time to identify emerging issues:
```python
from Oyemi import cluster_by_superclass
from collections import defaultdict

def extract_topics(text):
    """Extract semantic topics from text"""
    words = re.findall(r'\b[a-z]+\b', text.lower())
    clusters = cluster_by_superclass(words)
    return clusters

# Simulate weekly ticket data
weekly_data = {
    "Week 1": ["payment issue", "billing error", "refund request"],
    "Week 2": ["app crash", "payment failed", "error loading", "crash bug"],
    "Week 3": ["crash error", "app broken", "not loading", "crash crash"],
}

# Track topic trends
print("Topic Trend Analysis:")
print("=" * 50)

topic_trends = defaultdict(list)

for week, messages in weekly_data.items():
    combined_text = " ".join(messages)
    topics = extract_topics(combined_text)

    print(f"\n{week}:")
    for superclass, words in topics.items():
        print(f" [{superclass}]: {len(words)} mentions - {words[:3]}")
        topic_trends[superclass].append(len(words))

# Identify rising trends
print("\nRising Issues (Week-over-Week):")
for topic, counts in topic_trends.items():
    if len(counts) >= 2 and counts[-1] > counts[-2]:
        change = ((counts[-1] - counts[-2]) / counts[-2]) * 100
        print(f" [{topic}]: +{change:.0f}% increase")
```
```
Topic Trend Analysis:
==================================================

Week 1:
 [0212]: 4 mentions - ['payment', 'billing', 'refund']
 [0305]: 2 mentions - ['issue', 'error']

Week 2:
 [0411]: 3 mentions - ['crash', 'loading']
 [0212]: 2 mentions - ['payment']
 [0305]: 2 mentions - ['error', 'bug']

Week 3:
 [0411]: 4 mentions - ['crash', 'loading', 'broken']
 [0305]: 2 mentions - ['error']

Rising Issues (Week-over-Week):
 [0411]: +33% increase (Technical issues rising!)
```
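The same trend data can drive simple alerting. The sketch below flags any category whose latest week-over-week growth exceeds a chosen threshold (25% here, an arbitrary cutoff):

```python
# Sketch: alert on categories growing faster than a chosen threshold
ALERT_THRESHOLD = 0.25  # 25% week-over-week growth, an illustrative cutoff

for topic, counts in topic_trends.items():
    if len(counts) >= 2 and counts[-2] > 0:
        growth = (counts[-1] - counts[-2]) / counts[-2]
        if growth > ALERT_THRESHOLD:
            print(f"ALERT: [{topic}] grew {growth:.0%} week-over-week")
```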
A visual breakdown of support tickets by category makes the distribution easy to communicate.
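As a sketch (matplotlib is an assumed dependency, not part of Oyemi), the per-team counts from the routing step can be turned into a simple bar chart:

```python
import matplotlib.pyplot as plt  # assumed dependency
from collections import Counter

# Reuse the classifier and sample tickets from the routing step
team_counts = Counter(classify_ticket(t["msg"])["team"] for t in tickets)

plt.bar(list(team_counts.keys()), list(team_counts.values()))
plt.title("Support Tickets by Team")
plt.ylabel("Ticket count")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.show()
```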
Start clustering immediately. No labeled data, no model training, no ML infrastructure required.
Each cluster has a clear meaning: superclass 0212 is "Financial", not "Cluster 7", which makes results easy to explain to stakeholders.
Track category volumes over time to spot emerging issues before they become crises.
Add intelligent categorization to your support workflow in minutes.