Oyemi in Action

Deterministic semantic encoding: word similarity, valence detection, and clustering without machine learning

Tutorial

Semantic Word Encoding

Learn how to encode words into deterministic numeric codes that capture semantic meaning, part of speech, abstractness, and sentiment valence

1. What is Oyemi?

Oyemi is an offline semantic lexicon that converts words into structured numeric codes. Unlike word embeddings (Word2Vec, GloVe), Oyemi codes are deterministic and human-readable, and the lexicon ships with the package:

- 145K+ words in the lexicon
- 100% deterministic output
- 0 runtime dependencies
- 95%+ valence accuracy
Code format: HHHH-LLLLL-P-A-V

Component   Meaning                           Example
HHHH        Semantic superclass (category)    0121 = emotion.fear
LLLLL       Local synset ID                   00003 = specific sense
P           Part of speech                    1 = noun, 2 = verb, 3 = adj, 4 = adv
A           Abstractness                      0 = concrete, 1 = mixed, 2 = abstract
V           Valence (sentiment)               0 = neutral, 1 = positive, 2 = negative
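
The code format is fixed and hyphen-separated, so a code string can be decoded without the library. The sketch below uses a hypothetical helper, `decode_code`, which is not part of Oyemi's API. It names the P, A, and V digits using the table above and leaves the superclass as raw digits, because the full superclass catalogue is not listed here.

```python
# Hypothetical helper (not Oyemi's API): decode a code "HHHH-LLLLL-P-A-V"
# using the component meanings from the table above.
POS = {"1": "noun", "2": "verb", "3": "adj", "4": "adv"}
ABSTRACTNESS = {"0": "concrete", "1": "mixed", "2": "abstract"}
VALENCE = {"0": "neutral", "1": "positive", "2": "negative"}


def decode_code(code: str) -> dict:
    """Split a code into its five components and name the P, A and V digits."""
    parts = code.split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 hyphen-separated fields, got {code!r}")
    superclass, synset, pos, abstractness, valence = parts
    return {
        "superclass": superclass,  # 4-digit category, kept as a string
        "synset": synset,          # 5-digit local synset ID
        "pos": POS[pos],
        "abstractness": ABSTRACTNESS[abstractness],
        "valence": VALENCE[valence],
    }


print(decode_code("0121-00003-1-2-2"))
# {'superclass': '0121', 'synset': '00003', 'pos': 'noun', 'abstractness': 'abstract', 'valence': 'negative'}
```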

2. Installation

Install Oyemi from PyPI. The package bundles the pre-built lexicon, so no additional downloads are required:

Bash terminal
pip install oyemi
Output
Successfully installed oyemi-3.0.1
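
To confirm the installation from inside Python, you can use the standard library's `importlib.metadata`, which reports the installed version of a distribution. This is a minimal sketch. `installed_version` is a hypothetical helper, and it looks up the PyPI distribution name `oyemi`, which need not match the module name used in `import`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


v = installed_version("oyemi")
print(f"oyemi {v}" if v else "oyemi is not installed")
```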

3. Basic Word Encoding

Encode words to get their semantic codes. Words with multiple meanings return multiple codes:

Python basic_encoding.py
from Oyemi import Encoder

# Initialize encoder
enc = Encoder()

# Encode a simple word
codes = enc.encode("happy")
print("Codes for 'happy':", codes)

# Polysemous word (multiple meanings)
codes = enc.encode("bank")
print("Codes for 'bank':", codes[:3])  # First 3 senses

# Check lexicon size
print(f"Lexicon contains {enc.word_count:,} words")
Output
Codes for 'happy': ['3010-00001-3-1-1', '3999-05469-3-1-1', '3999-05731-3-1-1']
Codes for 'bank': ['0174-00012-1-0-0', '0045-00089-1-0-0', '2030-00156-2-1-0']
Lexicon contains 145,014 words
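
As the output above shows, `encode()` returns plain code strings. You can therefore filter the senses of a polysemous word by their P digit without further lookups. The minimal sketch below uses the three 'bank' codes from the output as literal data. `senses_with_pos` is a hypothetical helper, not an Oyemi function.

```python
# Hypothetical helper (not Oyemi's API): keep only senses with a given part of speech.
POS_DIGIT = {"noun": "1", "verb": "2", "adj": "3", "adv": "4"}


def senses_with_pos(codes: list[str], pos: str) -> list[str]:
    """Return the codes whose third field (the P digit) matches `pos`."""
    digit = POS_DIGIT[pos]
    return [c for c in codes if c.split("-")[2] == digit]


bank_codes = ['0174-00012-1-0-0', '0045-00089-1-0-0', '2030-00156-2-1-0']
print(senses_with_pos(bank_codes, "noun"))  # ['0174-00012-1-0-0', '0045-00089-1-0-0']
print(senses_with_pos(bank_codes, "verb"))  # ['2030-00156-2-1-0']
```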

4. Parsed Semantic Codes

Use encode_parsed() to get structured SemanticCode objects with named attributes:

Python parsed_encoding.py
from Oyemi import Encoder

enc = Encoder()

# Get parsed semantic codes (SemanticCode objects)
word = "fear"
parsed = enc.encode_parsed(word)

# Examine the primary (first-listed) sense
primary = parsed[0]
print(f"Word: {word}")
print(f"  Code: {primary.raw}")
print(f"  Superclass: {primary.superclass}")
print(f"  Part of Speech: {primary.pos_name}")
print(f"  Abstractness: {primary.abstractness_name}")
print(f"  Valence: {primary.valence_name}")

# Compare positive, negative, and neutral words by their primary sense
for w in ["love", "hate", "table"]:
    p = enc.encode_parsed(w)[0]
    print(f"{w:10} -> {p.valence_name}")
Output
Word: fear
  Code: 0121-00003-1-2-2
  Superclass: 0121
  Part of Speech: noun
  Abstractness: abstract
  Valence: negative

love       -> positive
hate       -> negative
table      -> neutral
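
The examples here read only the first parsed sense (`[0]`). For polysemous words you may prefer to consider every sense. The minimal sketch below takes a majority vote over the V digit of each code string, using the three 'happy' codes from the basic encoding output as data. `majority_valence` is hypothetical. Resolving ties to neutral is an assumption of this sketch, not documented Oyemi behavior.

```python
from collections import Counter

VALENCE = {"0": "neutral", "1": "positive", "2": "negative"}


def majority_valence(codes: list[str]) -> str:
    """Most common valence across all senses; empty input or a tie gives 'neutral'."""
    counts = Counter(VALENCE[c.split("-")[4]] for c in codes)
    if not counts:
        return "neutral"
    ranked = counts.most_common()
    # A tie between the top two labels is treated as neutral (an assumption).
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"
    return ranked[0][0]


happy_codes = ['3010-00001-3-1-1', '3999-05469-3-1-1', '3999-05731-3-1-1']
print(majority_valence(happy_codes))  # positive
```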

5. Sentiment Valence Detection

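Oyemi provides built-in sentiment detection without any ML models, which makes it suitable for deterministic text analysis. This example counts the valence of each word's primary sense and combines the counts into a single score: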

Python valence_detection.py
# Analyze sentiment of a sentence (assumes `enc = Encoder()` from the earlier steps)
sentence = "The manager was incompetent and the layoffs were devastating"

# Tokenize on whitespace (punctuation is not stripped)
words = sentence.lower().split()
valence_counts = {'positive': 0, 'negative': 0, 'neutral': 0}

for word in words:
    try:
        parsed = enc.encode_parsed(word, raise_on_unknown=False)
    except Exception:
        continue  # skip any word the lexicon cannot encode
    if parsed:  # words that return no codes are not counted
        valence = parsed[0].valence_name
        valence_counts[valence] += 1
        if valence != 'neutral':
            print(f"  {word}: {valence}")

print(f"\nValence Summary: {valence_counts}")

# Sentiment score in [-1, 1]: (positive - negative) / counted words
total = sum(valence_counts.values())
score = (valence_counts['positive'] - valence_counts['negative']) / total if total else 0.0
print(f"Sentiment Score: {score:.2f}")
Output
  incompetent: negative
  layoffs: negative
  devastating: negative

Valence Summary: {'positive': 0, 'negative': 3, 'neutral': 5}
Sentiment Score: -0.38
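
The score is (positive - negative) divided by the number of counted words, so it ranges from -1 (all negative) to +1 (all positive). For the output above, that is (0 - 3) / 8 = -0.375, printed as -0.38. The sketch below expresses the same arithmetic as a small pure-Python function that takes the valence labels directly and guards against the case where no word was recognized. `sentiment_score` is a hypothetical name, not an Oyemi function.

```python
from collections import Counter


def sentiment_score(valences: list[str]) -> float:
    """(positive - negative) / number of labelled words; 0.0 when the list is empty."""
    total = len(valences)
    if total == 0:
        return 0.0
    counts = Counter(valences)  # missing labels count as 0
    return (counts["positive"] - counts["negative"]) / total


labels = ["negative"] * 3 + ["neutral"] * 5  # the counts from the output above
print(f"{sentiment_score(labels):.2f}")  # -0.38
```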

6. Synonym Discovery

Find synonyms with WordNet synset matching, i.e. words that share a synset (the same sense) with the query word:

Python synonyms.py
from Oyemi import find_synonyms

# Find synonyms for emotional words
for word in ["happy", "angry", "fired"]:
    syns = find_synonyms(word, limit=5)
    print(f"{word}: {syns}")

# Get weighted synonyms (higher weight = closer match)
weighted = find_synonyms("fear", return_weighted=True, limit=5)
print("\nWeighted synonyms for 'fear':")
for syn, weight in weighted:
    print(f"  {syn}: {weight:.2f}")
Output
happy: ['felicitous', 'glad', 'well-chosen']
angry: ['furious', 'raging', 'tempestuous', 'wild']
fired: ['discharged', 'dismissed', 'laid-off', 'pink-slipped']

Weighted synonyms for 'fear':
  dread: 1.00
  fearfulness: 1.00
  fright: 0.85
  reverence: 0.50
  awe: 0.50
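
The weights let you trade precision for recall, for example in query expansion (one of the applications listed below). The minimal sketch below uses the weighted pairs for 'fear' from the output above as literal data. `expand_query` and the 0.85 cut-off are illustrative choices, not Oyemi defaults.

```python
def expand_query(term: str, weighted: list[tuple[str, float]],
                 min_weight: float = 0.85) -> list[str]:
    """Return the term plus every synonym whose weight is at least `min_weight`."""
    expanded = [term]
    for syn, weight in weighted:
        if weight >= min_weight and syn not in expanded:
            expanded.append(syn)
    return expanded


fear_weighted = [("dread", 1.00), ("fearfulness", 1.00), ("fright", 0.85),
                 ("reverence", 0.50), ("awe", 0.50)]
print(expand_query("fear", fear_weighted))  # ['fear', 'dread', 'fearfulness', 'fright']
```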

7. Semantic Similarity

Calculate similarity between words from their semantic codes, with no embeddings required:

Python similarity.py
from Oyemi import semantic_similarity

# Compare word pairs
pairs = [
    ("happy", "joyful"),     # Synonyms
    ("happy", "sad"),        # Antonyms
    ("dog", "cat"),          # Same category
    ("dog", "computer"),     # Different categories
    ("layoff", "fired"),     # Related workplace terms
]

print("Semantic Similarity Scores:")
for w1, w2 in pairs:
    sim = semantic_similarity(w1, w2)
    print(f"  {w1:12} <-> {w2:12}: {sim:.2f}")
Output
Semantic Similarity Scores:
  happy        <-> joyful      : 0.85
  happy        <-> sad         : 0.42
  dog          <-> cat         : 0.78
  dog          <-> computer    : 0.15
  layoff       <-> fired       : 0.72
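
This tutorial does not describe how `semantic_similarity` weighs the code components. The following is therefore only a toy illustration of comparing words through their codes. It scores matching superclass, part-of-speech, abstractness, and valence digits with made-up weights (6/2/1/1 out of 10), ignores the synset ID, and takes the best match over all sense pairs. `code_overlap` and `toy_similarity` are hypothetical helpers, and their scores will not reproduce the numbers above. The 'bank' codes from the basic encoding output serve only as sample strings.

```python
from itertools import product

# Made-up integer weights (out of 10) for illustration; not Oyemi's scheme.
WEIGHTS = {"superclass": 6, "pos": 2, "abstractness": 1, "valence": 1}


def code_overlap(a: str, b: str) -> float:
    """Weighted share of matching components between two codes (synset ID ignored)."""
    ha, _, pa, aa, va = a.split("-")
    hb, _, pb, ab, vb = b.split("-")
    points = (WEIGHTS["superclass"] * (ha == hb)
              + WEIGHTS["pos"] * (pa == pb)
              + WEIGHTS["abstractness"] * (aa == ab)
              + WEIGHTS["valence"] * (va == vb))
    return points / sum(WEIGHTS.values())


def toy_similarity(codes_a: list[str], codes_b: list[str]) -> float:
    """Best overlap over all sense pairs; 0.0 if either word has no codes."""
    if not codes_a or not codes_b:
        return 0.0
    return max(code_overlap(a, b) for a, b in product(codes_a, codes_b))


bank_codes = ['0174-00012-1-0-0', '0045-00089-1-0-0', '2030-00156-2-1-0']
print(code_overlap(bank_codes[0], bank_codes[1]))  # 0.4
```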

8. Topic Clustering by Superclass

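Group words by their semantic category (superclass) for automatic topic clustering. Note that the loop prints only the bracketed superclass code; the category names shown beside each code in the output (Leadership, Compensation, and so on) are descriptive labels: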

Python clustering.py
from Oyemi import cluster_by_superclass

# Words from employee feedback
words = [
    "manager", "boss", "supervisor",  # Leadership
    "salary", "bonus", "compensation",  # Money
    "layoff", "fired", "terminated",  # Employment
    "stress", "anxiety", "fear",  # Emotions
]

# Cluster by semantic category
clusters = cluster_by_superclass(words)

print("Semantic Clusters:")
for superclass, cluster_words in clusters.items():
    print(f"\n  [{superclass}]")
    for w in cluster_words:
        print(f"    - {w}")
Output
Semantic Clusters:

  [0214] Leadership
    - manager
    - boss
    - supervisor

  [0220] Compensation
    - salary
    - bonus
    - compensation

  [0233] Employment Actions
    - layoff
    - fired
    - terminated

  [0121] Emotions
    - stress
    - anxiety
    - fear
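
Grouping by superclass only needs the first four digits of a code, so the idea fits in a few lines with `collections.defaultdict`. This sketch is not the implementation of `cluster_by_superclass`. It assumes you already have each word's code strings (for example from `enc.encode`) and clusters on the first listed sense only. `group_by_superclass` is hypothetical, and the synset IDs in the sample codes are made-up placeholders, not real lexicon entries.

```python
from collections import defaultdict


def group_by_superclass(word_codes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each superclass (first 4 digits of a word's first code) to its words."""
    clusters = defaultdict(list)
    for word, codes in word_codes.items():
        if codes:  # skip words with no codes
            clusters[codes[0].split("-")[0]].append(word)
    return dict(clusters)


# Placeholder codes for illustration only (hypothetical, not from the lexicon)
sample = {
    "stress": ["0121-00010-1-2-2"],
    "anxiety": ["0121-00011-1-2-2"],
    "salary": ["0220-00001-1-2-0"],
    "unknown": [],
}
print(group_by_superclass(sample))
# {'0121': ['stress', 'anxiety'], '0220': ['salary']}
```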

Oyemi Capabilities Summary

What you can do with deterministic semantic encoding:

- Word encoding: 145K words
- Valence detection: 95% accuracy
- Synonym matching: WordNet-based
- Semantic similarity: no ML needed
- Topic clustering: 100+ categories

Why Use Oyemi?

100% Deterministic

The same input always produces the same output, with no model randomness and no training variance. This reproducibility suits regulated industries.

Zero Dependencies

No NLTK, no transformers, and no GPU required at runtime: just pure Python with a bundled SQLite lexicon.

Explainable Output

Every code component has a defined meaning. Superclass 0121 means "emotion.fear", whereas a 768-dimensional embedding vector is a black box.

Real-World Applications

HR Analytics

Employee feedback, exit interviews, survey analysis

Financial Analysis

10-K risk factors, earnings calls, news sentiment

Customer Intelligence

Review analysis, complaint routing, feedback clustering

Search & Discovery

Semantic search, query expansion, similar items


Oyemi v3.0.1 | Lexicon built from Princeton WordNet + SentiWordNet