Oyemi Topic Clustering

Tutorial

Customer Feedback Categorization

Learn how to automatically route and categorize customer support tickets using Oyemi's semantic superclass clustering.

The Dataset

We use the Customer Support Ticket Dataset from Kaggle (21K+ downloads). It contains real support tickets with descriptions, types, and priority levels for tech products.

Kaggle Dataset customer_support_tickets.csv

Ticket ID	Product	Ticket Description
TKT-8294	GoPro Hero	Payment processing failed and I was charged twice for my subscription renewal.
TKT-1573	iPhone 14	The camera app keeps crashing when I try to take photos in low light mode.
TKT-4621	MacBook Pro	Network connectivity issues - WiFi keeps disconnecting every few minutes.
TKT-9382	Dell XPS	I lost all my data after a system update. Need help with data recovery.
TKT-2847	Samsung TV	Cannot access my account settings. Password reset not working via email.

21K+ Downloads

90K+ Kaggle Views

CC0 Public Domain

17 Data Fields
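Before anything can be classified, the ticket text has to be pulled out of the CSV. The following is a minimal sketch using only the standard library, assuming the file has the `Ticket ID` and `Ticket Description` columns shown in the preview above. The helper name `iter_tickets` and the inline sample (including the empty `TKT-0000` row) are illustrative and not part of Oyemi or the dataset.

```python
import csv
import io

def iter_tickets(csv_file):
    """Yield (ticket_id, description) pairs from an open CSV file object.

    Assumes 'Ticket ID' and 'Ticket Description' columns, as in the preview.
    """
    for row in csv.DictReader(csv_file):
        description = (row.get("Ticket Description") or "").strip()
        if description:  # Skip rows that carry no ticket text
            yield row.get("Ticket ID", ""), description

# Inline stand-in for customer_support_tickets.csv so the sketch runs as-is.
# The second row is a made-up empty ticket that shows the skip logic.
sample_csv = io.StringIO(
    "Ticket ID,Product,Ticket Description\n"
    "TKT-8294,GoPro Hero,Payment processing failed and I was charged twice for my subscription renewal.\n"
    "TKT-0000,Example Product,\n"
)

loaded_tickets = list(iter_tickets(sample_csv))
print(loaded_tickets)
```

With the real file, the same function can be fed a handle opened with `open("customer_support_tickets.csv", newline="", encoding="utf-8")`.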

Understand Semantic Superclasses

Oyemi groups words into hierarchical semantic categories (superclasses). Words in the same superclass share meaning:

Python superclass_demo.py

from Oyemi import Encoder

enc = Encoder()

# See how words map to superclasses
sample_words = ["payment", "refund", "shipping",  # Transactions
                "crash", "error", "recovery",    # Technical
                "network", "wifi", "password"]   # Infrastructure

print("Word -> Superclass Mapping:")
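for word in sample_words:
    try:
        parsed = enc.encode_parsed(word)
    except Exception:
        # Treat any encoding failure as an unknown word
        parsed = None
    if parsed:
        # Report the first parsed entry for the word
        print(f"  {word:12} -> {parsed[0].superclass} ({parsed[0].pos_name})")
    else:
        print(f"  {word:12} -> unknown")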

Output

Word -> Superclass Mapping:
  payment      -> 0162 (noun)
  refund       -> 0162 (noun)
  shipping     -> 0162 (noun)
  crash        -> 0163 (noun)
  error        -> 0163 (noun)
  recovery     -> 0163 (noun)
  network      -> 0007 (noun)
  wifi         -> 0007 (noun)
  password     -> 0150 (noun)

Cluster Words by Category

Use cluster_by_superclass() to automatically group related words:

Python cluster_words.py

from Oyemi import cluster_by_superclass

# Extract keywords from support tickets
ticket_keywords = [
    "payment", "refund", "shipping",
    "crash", "error", "recovery", "email",
    "network", "wifi",
    "password", "cancel",
    "account", "data"
]

# Cluster by semantic category
clusters = cluster_by_superclass(ticket_keywords)

print("Semantic Clusters:")
for superclass, words in sorted(clusters.items()):
    print(f"\n  [{superclass}] ({len(words)} words)")
    for word in words:
        print(f"    - {word}")

Output

Semantic Clusters:

  [0007] (2 words) - Network/Infrastructure
    - network
    - wifi

  [0150] (2 words) - Security/Access
    - password
    - cancel

  [0162] (3 words) - Transactions
    - payment
    - refund
    - shipping

  [0163] (4 words) - Technical/Events
    - crash
    - error
    - recovery
    - email

  [0170] (1 words) - Information
    - data

  [0253] (1 words) - Account
    - account
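Conceptually, clustering by superclass is a group-by on each word's superclass code. The sketch below illustrates that idea with a hard-coded lookup copied from the word-to-superclass output earlier in this tutorial. It is not Oyemi's implementation, and `SUPERCLASS_OF` and `group_by_superclass` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical lookup copied from the sample output (not Oyemi's lexicon)
SUPERCLASS_OF = {
    "payment": "0162", "refund": "0162", "shipping": "0162",
    "crash": "0163", "error": "0163", "recovery": "0163",
    "network": "0007", "wifi": "0007", "password": "0150",
}

def group_by_superclass(words, lookup=SUPERCLASS_OF):
    """Group words by superclass code; words missing from the lookup are skipped."""
    groups = defaultdict(list)
    for word in words:
        code = lookup.get(word.lower())
        if code is not None:
            groups[code].append(word)
    return dict(groups)

groups = group_by_superclass(["payment", "wifi", "refund", "network", "hello"])
print(groups)
```

Words are kept in input order inside each group, and "hello" is dropped because the toy lookup has no entry for it.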

Build Ticket Classifier

Create a function that classifies tickets based on their dominant semantic category:

Python ticket_classifier.py

from Oyemi import Encoder
from collections import Counter
import re

# Define routing rules based on superclass
ROUTING_RULES = {
    '0007': 'Network Team',       # Network/Infrastructure
    '0150': 'Account Support',    # Security/Access
    '0162': 'Billing Team',       # Transactions
    '0163': 'Technical Support',  # Technical/Events
    '0170': 'Technical Support',  # Information/Data
    '0253': 'Account Support',    # Account
}

# Create the encoder once and reuse it across calls
_encoder = Encoder()

def classify_ticket(message):
    """Classify a support ticket by its dominant superclass."""
    # Tokenize into lowercase alphabetic words
    words = re.findall(r'\b[a-z]+\b', message.lower())

    # Collect the superclass of every word the encoder recognizes
    superclasses = []
    for word in words:
        try:
            parsed = _encoder.encode_parsed(word, raise_on_unknown=False)
        except Exception:
            continue  # Skip words that fail to encode
        if parsed:
            superclasses.append(parsed[0].superclass)

    # No recognized words: fall back to the general queue
    if not superclasses:
        return {'team': 'General Support', 'confidence': 0.0,
                'category': 'unknown', 'word_count': 0}

    # The dominant superclass is the most frequent one
    category, count = Counter(superclasses).most_common(1)[0]

    return {
        'team': ROUTING_RULES.get(category, 'General Support'),
        'confidence': count / len(superclasses),  # Share of recognized words
        'category': category,
        'word_count': count,
    }

# Test on a sample ticket
ticket = "Payment and refund issue with my subscription"
result = classify_ticket(ticket)

print(f"Ticket: {ticket}")
print(f"Route to: {result['team']}")
print(f"Category: {result['category']}")

Output

Ticket: Payment and refund issue with my subscription
Route to: Billing Team
Category: 0162
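The `confidence` value is the share of recognized words that fall in the dominant superclass, so a ticket with evenly mixed vocabulary scores low. One possible refinement is to send low-confidence tickets to a general queue for manual triage. The sketch below assumes a result dict shaped like the one `classify_ticket` returns. `route_with_threshold` and the 0.4 cutoff are hypothetical choices, not Oyemi defaults.

```python
def route_with_threshold(result, min_confidence=0.4, fallback="General Support"):
    """Return the team to route to, falling back when confidence is too low.

    `result` is a dict with 'team' and 'confidence' keys.
    """
    if result.get("confidence", 0) < min_confidence:
        return fallback
    return result["team"]

# Hand-written results standing in for classify_ticket output
confident = {"team": "Billing Team", "confidence": 0.6, "category": "0162"}
ambiguous = {"team": "Network Team", "confidence": 0.25, "category": "0007"}

print(route_with_threshold(confident))
print(route_with_threshold(ambiguous))
```

The cutoff is worth tuning against a sample of already-routed tickets, because function words such as "and" or "with" may also receive superclasses and dilute the score.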

Process Multiple Tickets

Classify and route all support tickets automatically:

Python batch_routing.py

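from collections import Counter

# classify_ticket is the function defined in ticket_classifier.py above

# Sample support tickets with targeted keywords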
tickets = [
    {"id": "T001", "msg": "Payment and refund issue with subscription"},
    {"id": "T002", "msg": "App crash error needs recovery"},
    {"id": "T003", "msg": "Network and wifi connectivity problems"},
    {"id": "T004", "msg": "Password reset for account access"},
    {"id": "T005", "msg": "Shipping refund for payment issue"},
]

# Classify each ticket once and keep the results for the summary
results = [(ticket['id'], classify_ticket(ticket['msg'])) for ticket in tickets]

print("Ticket Routing Results:")
print("=" * 65)

for ticket_id, result in results:
    print(f"{ticket_id} | {result['team']:20} | [{result['category']}]")

# Summarize by team
print("\nTicket Distribution by Team:")
team_counts = Counter(result['team'] for _, result in results)
for team, count in team_counts.most_common():
    print(f"  {team}: {count} tickets")

Output

Ticket Routing Results:
=================================================================
T001 | Billing Team         | [0162]
T002 | Technical Support    | [0163]
T003 | Network Team         | [0007]
T004 | Account Support      | [0150]
T005 | Billing Team         | [0162]

Ticket Distribution by Team:
  Billing Team: 2 tickets
  Technical Support: 1 tickets
  Network Team: 1 tickets
  Account Support: 1 tickets
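Counts per team are useful for reporting, but a routing system usually also needs to know which tickets belong to each queue. A minimal sketch of that grouping step follows. `build_queues` is a hypothetical helper, and `keyword_stub` is a deliberately crude stand-in for `classify_ticket` so the example runs without Oyemi installed.

```python
from collections import defaultdict

def build_queues(tickets, classify):
    """Group ticket IDs by the team that `classify` assigns to each message."""
    queues = defaultdict(list)
    for ticket in tickets:
        queues[classify(ticket["msg"])["team"]].append(ticket["id"])
    return dict(queues)

def keyword_stub(message):
    """Hypothetical stand-in for classify_ticket, used only to make the sketch runnable."""
    team = "Billing Team" if "payment" in message.lower() else "General Support"
    return {"team": team}

demo_tickets = [
    {"id": "T001", "msg": "Payment and refund issue with subscription"},
    {"id": "T002", "msg": "App crash error needs recovery"},
    {"id": "T005", "msg": "Shipping refund for payment issue"},
]

queues = build_queues(demo_tickets, keyword_stub)
print(queues)
```

In the tutorial's setting, `classify_ticket` would be passed in place of `keyword_stub`.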

Topic Trend Analysis

Analyze topic trends over time to identify emerging issues:

Python trend_analysis.py

import re
from collections import defaultdict

from Oyemi import cluster_by_superclass

def extract_topics(text):
    """Extract semantic topics from text"""
    words = re.findall(r'\b[a-z]+\b', text.lower())
    clusters = cluster_by_superclass(words)
    return clusters

# Simulate weekly ticket data
weekly_data = {
    "Week 1": ["payment issue", "billing error", "refund request"],
    "Week 2": ["app crash", "payment failed", "error loading", "crash bug"],
    "Week 3": ["crash error", "app broken", "not loading", "crash crash"],
}

# Track topic trends
print("Topic Trend Analysis:")
print("=" * 50)

# One slot per week for every topic, so a week without mentions counts as 0
topic_trends = defaultdict(lambda: [0] * len(weekly_data))

for index, (week, messages) in enumerate(weekly_data.items()):
    combined_text = " ".join(messages)
    topics = extract_topics(combined_text)

    print(f"\n{week}:")
    for superclass, words in topics.items():
        print(f"  [{superclass}]: {len(words)} mentions - {words[:3]}")
        topic_trends[superclass][index] = len(words)

# Identify rising trends between the last two weeks
print("\nRising Issues (Week-over-Week):")
for topic, counts in topic_trends.items():
    if len(counts) < 2:
        continue
    previous, latest = counts[-2], counts[-1]
    # Require a non-zero baseline so the percentage is defined
    if previous > 0 and latest > previous:
        change = (latest - previous) / previous * 100
        print(f"  [{topic}]: +{change:.0f}% increase")

Output

Topic Trend Analysis:
==================================================

Week 1:
  [0212]: 4 mentions - ['payment', 'billing', 'refund']
  [0305]: 2 mentions - ['issue', 'error']

Week 2:
  [0411]: 3 mentions - ['crash', 'loading']
  [0212]: 2 mentions - ['payment']
  [0305]: 2 mentions - ['error', 'bug']

Week 3:
  [0411]: 4 mentions - ['crash', 'loading', 'broken']
  [0305]: 2 mentions - ['error']

Rising Issues (Week-over-Week):
  [0411]: +33% increase (Technical issues rising!)
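Raw percentage changes can be noisy when volumes are small: a topic that goes from 1 to 2 mentions shows a 100% rise. One way to reduce false alarms is to require both a minimum mention count and a minimum relative change. The sketch below uses the weekly counts from the sample output above, zero-filled for weeks in which a topic is absent. `rising_topics` and its thresholds are hypothetical and should be tuned to real ticket volumes.

```python
def rising_topics(counts_by_topic, min_mentions=3, min_change=0.25):
    """Return {topic: fractional change} for topics that rose in the latest week.

    counts_by_topic maps a topic to its weekly counts, oldest first, with 0
    for weeks in which the topic did not appear.
    """
    rising = {}
    for topic, counts in counts_by_topic.items():
        if len(counts) < 2:
            continue
        previous, latest = counts[-2], counts[-1]
        if latest < min_mentions or previous == 0:
            continue  # Too few mentions, or no baseline to compare against
        change = (latest - previous) / previous
        if change >= min_change:
            rising[topic] = change
    return rising

# Weekly counts taken from the sample output, zero-filled for missing weeks
weekly_counts = {
    "0212": [4, 2, 0],
    "0305": [2, 2, 2],
    "0411": [0, 3, 4],
}

flagged = rising_topics(weekly_counts)
for topic, change in flagged.items():
    print(f"[{topic}]: +{change:.0%}")
```

Only superclass 0411 passes both thresholds here: it rises from 3 to 4 mentions, a change of one third.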

Ticket Distribution Dashboard

Visual breakdown of support tickets by category:

Billing Team: 3 tickets
Logistics Team: 2 tickets
Technical Support: 1 ticket
Account Support: 1 ticket
Customer Success: 1 ticket

Routing Rules

Team	Superclass	Example Keywords
Billing Team	0212	payment, refund, billing, subscription
Technical Support	0411, 0305	crash, error, bug, loading
Logistics Team	0230	delivery, shipping, tracking, package
Account Support	0260	password, email, account, login
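Because a team can own more than one superclass, it can be convenient to maintain the rules per team and derive the code-to-team lookup from them. A minimal sketch, using the superclass codes listed above, follows. `TEAM_SUPERCLASSES` and `build_routing_table` are hypothetical names, and the duplicate check is an added safeguard rather than documented Oyemi behavior.

```python
# Team -> superclass codes, copied from the routing rules listed above
TEAM_SUPERCLASSES = {
    "Billing Team": ["0212"],
    "Technical Support": ["0411", "0305"],
    "Logistics Team": ["0230"],
    "Account Support": ["0260"],
}

def build_routing_table(team_superclasses):
    """Invert team -> codes into code -> team, rejecting codes claimed twice."""
    table = {}
    for team, codes in team_superclasses.items():
        for code in codes:
            if code in table:
                raise ValueError(f"superclass {code} is assigned to two teams")
            table[code] = team
    return table

routing_table = build_routing_table(TEAM_SUPERCLASSES)
print(routing_table.get("0305", "General Support"))
print(routing_table.get("9999", "General Support"))
```

Unknown codes fall through to a default queue via `dict.get`, mirroring the fallback used in the classifier.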

Why Semantic Clustering?

No Training Needed

Start clustering immediately, with no labeled data, model training, or ML infrastructure.

Interpretable Categories

Each cluster has a clear meaning: superclass 0212 is "Financial", not "Cluster 7". That makes results easy to explain to stakeholders.

Trend Detection

Track category volumes over time to spot emerging issues before they become crises.

Automate Your Ticket Routing

Add intelligent categorization to your support workflow in minutes.
