Similarity Search

Match resumes to jobs, expand search queries, and find related content using deterministic semantic similarity

Tutorial

Resume-to-Job Matching

Learn how to build a semantic resume matcher that ranks candidates by skill similarity, with no embeddings or ML models required.

1. The Dataset

We use the Resume and Job Description Dataset from Kaggle (9.4 MB, 3,140+ downloads). Below are sample records from the dataset.

job_descriptions.csv (Kaggle dataset)

Job ID  | Title             | Required Skills
JD-2847 | Data Scientist    | python, machine learning, statistics, data analysis, SQL, tensorflow
JD-1923 | Software Engineer | java, spring boot, microservices, REST API, docker, kubernetes
JD-3156 | Product Manager   | leadership, roadmap, stakeholder management, agile, strategy

resumes.csv (Kaggle dataset)

Resume ID | Category     | Skills Extracted
RES-4521  | Data Science | python, analytics, modeling, scikit-learn, pandas, visualization
RES-2189  | Engineering  | java, spring, AWS, containerization, CI/CD, unit testing
RES-6734  | Management   | project management, coordination, planning, communication, scrum

Dataset details: 3,140+ downloads, 9.4 MB, CC0 license, sourced from Kaggle.
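
Before matching, each comma-separated skills cell has to be split into a list of terms. A minimal sketch using only the standard library, assuming the CSV headers match the sample table above (`Job ID`, `Required Skills`); the real file's column names may differ, and `parse_skills` and `load_job_skills` are illustrative helper names, not part of Oyemi:

```python
import csv
import io

def parse_skills(cell):
    """Split a comma-separated skills cell into clean, lowercase terms."""
    return [s.strip().lower() for s in cell.split(",") if s.strip()]

def load_job_skills(file_obj, id_col="Job ID", skills_col="Required Skills"):
    """Map each job ID to its parsed skill list.

    The column names are assumptions taken from the sample table.
    """
    reader = csv.DictReader(file_obj)
    return {row[id_col]: parse_skills(row[skills_col]) for row in reader}

# In-memory sample standing in for job_descriptions.csv
sample = io.StringIO(
    'Job ID,Title,Required Skills\n'
    'JD-2847,Data Scientist,"python, machine learning, statistics, data analysis, SQL, tensorflow"\n'
    'JD-3156,Product Manager,"leadership, roadmap, stakeholder management, agile, strategy"\n'
)
job_skills_by_id = load_job_skills(sample)
print(job_skills_by_id["JD-2847"])
```

With a real file, pass `open("job_descriptions.csv", newline="", encoding="utf-8")` in place of the `StringIO` object.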

2. Calculate Word Similarity

Oyemi's semantic_similarity() compares words based on their semantic codes, not string matching:

Python similarity_basics.py
from Oyemi import semantic_similarity

# Compare skill pairs (same semantic category = high similarity)
pairs = [
    ("programming", "coding"),         # Same superclass
    ("management", "supervision"),    # Same superclass
    ("analysis", "research"),         # Same superclass
    ("teamwork", "collaboration"),   # Same superclass
    ("python", "programming"),       # Different superclass
]

print("Skill Similarity Scores:")
for skill1, skill2 in pairs:
    sim = semantic_similarity(skill1, skill2)
    print(f"  {skill1:15} <-> {skill2:15}: {sim:.2f}")
Output
Skill Similarity Scores:
  programming     <-> coding         : 1.00
  management      <-> supervision    : 1.00
  analysis        <-> research       : 1.00
  teamwork        <-> collaboration  : 1.00
  python          <-> programming    : 0.65
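
For contrast, a plain string-matching baseline scores these synonym pairs low because the words share few characters. The sketch below uses Python's standard `difflib`. It is not how Oyemi computes similarity; it only illustrates what character-level matching alone would report, and `string_similarity` is an illustrative helper name:

```python
from difflib import SequenceMatcher

def string_similarity(a, b):
    """Character-level similarity in [0, 1]; knows nothing about meaning."""
    return SequenceMatcher(None, a, b).ratio()

pairs = [
    ("programming", "coding"),
    ("management", "supervision"),
    ("teamwork", "collaboration"),
]

print("String Similarity Scores:")
for a, b in pairs:
    print(f"  {a:15} <-> {b:15}: {string_similarity(a, b):.2f}")
```

The "programming"/"coding" pair, for example, comes out below 0.5 here, since the two words share little beyond the "ing" suffix.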

3. Expand Skills with Synonyms

Use find_synonyms() to expand both job requirements and candidate skills for better matching:

Python skill_expansion.py
from Oyemi import find_synonyms

def expand_skills(skills, limit=3):
    """Expand skill list with synonyms for broader matching"""
    expanded = []

    for skill in skills:
        expanded.append(skill)
        try:
            expanded.extend(find_synonyms(skill, limit=limit))
        except Exception:
            pass  # Skill not in lexicon

    # Drop duplicates while keeping first-seen order (set order varies between runs)
    return list(dict.fromkeys(expanded))

# Expand job requirements
job_skills = ["leadership", "planning", "communication"]
expanded = expand_skills(job_skills)

print("Original skills:", job_skills)
print("Expanded skills:", expanded[:10])
Output
Original skills: ['leadership', 'planning', 'communication']
Expanded skills: ['leadership', 'leading', 'direction', 'planning', 'preparation',
                  'scheduling', 'communication', 'dialogue', 'discourse', 'exchange']
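
To see why expanding both sides helps, compare the overlap of two skill sets before and after expansion. This is a sketch under stated assumptions: `TOY_SYNONYMS` is a made-up stand-in for `find_synonyms()` so the example runs without the library, and Jaccard overlap is used here only as a simple illustration, not as Oyemi's scoring method:

```python
def expand(skills, synonyms_of):
    """Return the skills plus whatever synonyms_of() yields for each one."""
    expanded = set(skills)
    for skill in skills:
        expanded.update(synonyms_of(skill))
    return expanded

def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical stand-in for find_synonyms(); the entries are illustrative only.
TOY_SYNONYMS = {
    "leadership": ["leading", "direction"],
    "management": ["direction", "supervision"],
    "planning": ["preparation", "scheduling"],
}

def toy_synonyms(skill):
    return TOY_SYNONYMS.get(skill, [])

job = ["leadership", "planning"]
candidate = ["management", "scheduling"]

raw_overlap = jaccard(set(job), set(candidate))
expanded_overlap = jaccard(expand(job, toy_synonyms), expand(candidate, toy_synonyms))
print(f"raw: {raw_overlap:.2f}, expanded: {expanded_overlap:.2f}")
```

With these toy entries the raw lists share nothing, while the expanded sets meet on "direction" and "scheduling".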

4. Build the Resume Matcher

Create a function that scores candidates based on semantic skill similarity:

Python resume_matcher.py
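from Oyemi import semantic_similarity
import numpy as np

def match_resume_to_job(candidate_skills, job_skills):
    """Calculate semantic match score between candidate and job"""
    if not job_skills:
        # No requirements to score against; avoids dividing by zero below
        return {'overall_score': 0.0, 'coverage': 0.0, 'matched_skills': []}

    match_scores = []
    matched_skills = []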

    for job_skill in job_skills:
        best_match = 0
        best_candidate_skill = None

        for candidate_skill in candidate_skills:
            try:
                sim = semantic_similarity(job_skill, candidate_skill)
                if sim > best_match:
                    best_match = sim
                    best_candidate_skill = candidate_skill
            except Exception:
                continue  # Skill not in lexicon; treat as no match

        match_scores.append(best_match)
        if best_match > 0.6:  # Threshold for "match"
            matched_skills.append((job_skill, best_candidate_skill, best_match))

    return {
        'overall_score': np.mean(match_scores),
        'coverage': len(matched_skills) / len(job_skills),
        'matched_skills': matched_skills
    }

# Test with matching skill pairs
job = ["programming", "analysis", "teamwork", "planning"]
candidate = ["coding", "research", "collaboration", "coordination"]

result = match_resume_to_job(candidate, job)
print(f"Match Score: {result['overall_score']:.2f}")
print(f"Skill Coverage: {result['coverage']:.0%}")
print("Matched Skills:")
for job_s, cand_s, score in result['matched_skills']:
    print(f"  {job_s} <- {cand_s} ({score:.2f})")
Output
Match Score: 0.97
Skill Coverage: 100%
Matched Skills:
  programming <- coding (1.00)
  analysis <- research (1.00)
  teamwork <- collaboration (1.00)
  planning <- coordination (0.90)
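
In formula form, the function above computes the following. The symbols are introduced here for illustration: J is the list of job skills, C the candidate's skills, and sim stands for `semantic_similarity`. A pair that raises an exception contributes nothing, so a job skill with no comparable candidate skill keeps a best match of 0.

```latex
\[
\text{overall\_score} = \frac{1}{|J|} \sum_{j \in J} \max_{c \in C} \operatorname{sim}(j, c),
\qquad
\text{coverage} = \frac{\bigl|\{\, j \in J : \max_{c \in C} \operatorname{sim}(j, c) > 0.6 \,\}\bigr|}{|J|}
\]
```

The threshold comparison is strict, so a best match of exactly 0.6 does not count toward coverage.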

5. Rank All Candidates

Process multiple candidates and rank them by semantic fit:

Python rank_candidates.py
# Define candidates and jobs
candidates = {
    "Alice Chen": ["python", "analytics", "modeling", "visualization"],
    "Bob Smith": ["react", "typescript", "ui design", "frontend"],
    "Carol Davis": ["management", "coordination", "scheduling", "presenting"],
}

jobs = {
    "Data Scientist": ["python", "machine learning", "statistics", "analysis"],
    "Frontend Dev": ["javascript", "react", "css", "responsive"],
    "Project Manager": ["leadership", "planning", "communication", "teamwork"],
}

# Match each candidate to each job
for job_title, job_skills in jobs.items():
    print(f"\n=== {job_title} ===")

    rankings = []
    for name, skills in candidates.items():
        result = match_resume_to_job(skills, job_skills)
        rankings.append((name, result['overall_score'], result['coverage']))

    # Sort by score
    rankings.sort(key=lambda x: x[1], reverse=True)

    for rank, (name, score, coverage) in enumerate(rankings, 1):
        print(f"  {rank}. {name}: {score:.2f} ({coverage:.0%} coverage)")
Output
=== Data Scientist ===
  1. Alice Chen: 0.76 (75% coverage)
  2. Carol Davis: 0.31 (25% coverage)
  3. Bob Smith: 0.22 (0% coverage)

=== Frontend Dev ===
  1. Bob Smith: 0.82 (75% coverage)
  2. Alice Chen: 0.28 (25% coverage)
  3. Carol Davis: 0.19 (0% coverage)

=== Project Manager ===
  1. Carol Davis: 0.79 (100% coverage)
  2. Alice Chen: 0.35 (25% coverage)
  3. Bob Smith: 0.24 (0% coverage)

6. Query Expansion for Search

Use semantic similarity to expand search queries and find more relevant results:

Python query_expansion.py
from Oyemi import find_synonyms

def expand_search_query(query_terms, expansion_limit=5):
    """Expand search query with semantically similar terms"""

    expanded_query = []
    expansions = {}

    for term in query_terms:
        expanded_query.append(term)
        try:
            # Get synonyms
            synonyms = list(find_synonyms(term, limit=expansion_limit))
        except Exception:
            synonyms = []  # Term not in lexicon
        expanded_query.extend(synonyms)
        expansions[term] = synonyms

    # Drop duplicates while keeping first-seen order (set order varies between runs)
    return list(dict.fromkeys(expanded_query)), expansions

# Expand a job search query
search_terms = ["manager", "leadership", "salary"]
expanded, details = expand_search_query(search_terms)

print("Original query:", search_terms)
print("\nExpansions:")
for term, synonyms in details.items():
    print(f"  {term}: {synonyms}")
print(f"\nExpanded query ({len(expanded)} terms):", expanded[:12])
Output
Original query: ['manager', 'leadership', 'salary']

Expansions:
  manager: ['director', 'supervisor', 'administrator', 'executive', 'handler']
  leadership: ['leading', 'direction', 'guidance']
  salary: ['wage', 'pay', 'earnings', 'remuneration', 'compensation']

Expanded query (16 terms): ['manager', 'director', 'supervisor', 'administrator',
  'executive', 'leadership', 'leading', 'direction', 'guidance', 'salary',
  'wage', 'pay']
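
The expanded terms can then be fed to whatever search backend is in use. As a rough sketch, the keyword scorer below counts how many query terms appear in each document. The `documents` are made up for illustration, `score_document` and `search` are hypothetical helper names, and the expanded terms are copied from the output above:

```python
def score_document(text, terms):
    """Count how many distinct query terms occur as whole words in the text."""
    words = set(text.lower().split())
    return sum(1 for term in set(terms) if term.lower() in words)

def search(documents, terms):
    """Return (doc_id, score) pairs with score > 0, best first."""
    scored = [(doc_id, score_document(text, terms)) for doc_id, text in documents.items()]
    hits = [(doc_id, s) for doc_id, s in scored if s > 0]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical job postings, made up for illustration
documents = {
    "post-1": "senior director role with competitive compensation",
    "post-2": "team manager wanted",
    "post-3": "barista needed for weekend shifts",
}

original_terms = ["manager", "leadership", "salary"]
# Expanded terms copied from the expansion output above
expanded_terms = original_terms + [
    "director", "supervisor", "administrator", "executive", "handler",
    "leading", "direction", "guidance",
    "wage", "pay", "earnings", "remuneration", "compensation",
]

original_hits = search(documents, original_terms)
expanded_hits = search(documents, expanded_terms)
print("original:", original_hits)
print("expanded:", expanded_hits)
```

The original query finds only the posting that literally says "manager"; the expanded query also surfaces the director posting through "director" and "compensation".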

Candidate-Job Match Matrix

Breakdown of semantic matching scores, showing each candidate's best-fit job and per-skill scores:

Candidate   | Best Match      | Score | Skill Scores
Alice Chen  | Data Scientist  | 0.76  | python (1.00), analytics (0.91), modeling (0.68)
Bob Smith   | Frontend Dev    | 0.82  | react (1.00), typescript (0.88), ui design (0.72)
Carol Davis | Project Manager | 0.79  | management (0.85), presenting (0.78), scheduling (0.71)
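
The best-match column can be derived from the ranking results in step 5 by inverting the job-to-candidate scores. A minimal sketch, with the overall scores copied from that output; `best_job_per_candidate` is an illustrative helper name, not part of Oyemi:

```python
# Overall scores copied from the ranking output in step 5: scores[job][candidate]
scores = {
    "Data Scientist": {"Alice Chen": 0.76, "Carol Davis": 0.31, "Bob Smith": 0.22},
    "Frontend Dev": {"Bob Smith": 0.82, "Alice Chen": 0.28, "Carol Davis": 0.19},
    "Project Manager": {"Carol Davis": 0.79, "Alice Chen": 0.35, "Bob Smith": 0.24},
}

def best_job_per_candidate(scores):
    """Invert the job -> candidate table and keep each candidate's top job."""
    best = {}
    for job_title, by_candidate in scores.items():
        for name, score in by_candidate.items():
            # Keep the first job seen, then replace it only on a strictly higher score
            if name not in best or score > best[name][1]:
                best[name] = (job_title, score)
    return best

best = best_job_per_candidate(scores)
for name, (job_title, score) in best.items():
    print(f"{name}: {job_title} ({score:.2f})")
```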

Why Semantic Matching?

Beyond Keywords

"analytics" matches "data analysis" even though they share no words. Semantic matching understands meaning, not just strings.

Consistent Results

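Oyemi's scoring is deterministic: the same word pair always produces the same score, whereas embedding-based scores can change when the underlying model or its version changes. That reproducibility suits auditable hiring processes.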

No GPU Required

Process thousands of resumes on ordinary CPU hardware. There are no embeddings to compute and no models to load.
