Match resumes to jobs, expand search queries, and find related content using deterministic semantic similarity
Learn how to build a semantic resume matcher that finds candidates based on skill similarity, with no embeddings or ML models required.
We use the Resume and Job Description Dataset from Kaggle (9.4 MB, 3,140+ downloads). Below are sample records from the dataset.
| Job ID | Title | Required Skills |
|---|---|---|
| JD-2847 | Data Scientist | python, machine learning, statistics, data analysis, SQL, tensorflow |
| JD-1923 | Software Engineer | java, spring boot, microservices, REST API, docker, kubernetes |
| JD-3156 | Product Manager | leadership, roadmap, stakeholder management, agile, strategy |

| Resume ID | Category | Skills Extracted |
|---|---|---|
| RES-4521 | Data Science | python, analytics, modeling, scikit-learn, pandas, visualization |
| RES-2189 | Engineering | java, spring, AWS, containerization, CI/CD, unit testing |
| RES-6734 | Management | project management, coordination, planning, communication, scrum |
Oyemi's semantic_similarity() compares words based on their semantic codes, not string matching:
from Oyemi import semantic_similarity

# Compare skill pairs (same semantic category = high similarity)
pairs = [
    ("programming", "coding"),        # Same superclass
    ("management", "supervision"),    # Same superclass
    ("analysis", "research"),         # Same superclass
    ("teamwork", "collaboration"),    # Same superclass
    ("python", "programming"),        # Different superclass
]

print("Skill Similarity Scores:")
for skill1, skill2 in pairs:
    sim = semantic_similarity(skill1, skill2)
    print(f" {skill1:15} <-> {skill2:15}: {sim:.2f}")
Skill Similarity Scores:
 programming     <-> coding         : 1.00
 management      <-> supervision    : 1.00
 analysis        <-> research       : 1.00
 teamwork        <-> collaboration  : 1.00
 python          <-> programming    : 0.65
Use find_synonyms() to expand both job requirements and candidate skills for better matching:
from Oyemi import find_synonyms

def expand_skills(skills, limit=3):
    """Expand skill list with synonyms for broader matching"""
    expanded = []
    for skill in skills:
        expanded.append(skill)
        try:
            expanded.extend(find_synonyms(skill, limit=limit))
        except Exception:
            pass  # Skill not in lexicon
    # Deduplicate while keeping insertion order, so the result is stable across runs
    return list(dict.fromkeys(expanded))

# Expand job requirements
job_skills = ["leadership", "planning", "communication"]
expanded = expand_skills(job_skills)

print("Original skills:", job_skills)
print("Expanded skills:", expanded[:10])
Original skills: ['leadership', 'planning', 'communication']
Expanded skills: ['leadership', 'leading', 'direction', 'planning', 'preparation',
'scheduling', 'communication', 'dialogue', 'discourse', 'exchange']
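Once both sides are expanded, one simple way to compare them is plain set overlap. The sketch below is an assumption about how the expanded lists might be used and is not part of Oyemi: `skill_overlap` is a hypothetical helper, and the two hard-coded lists stand in for `expand_skills()` output so the example runs on its own.

```python
def skill_overlap(job_terms, candidate_terms):
    """Return (shared terms, Jaccard score) for two term lists."""
    job_set = {t.lower() for t in job_terms}
    cand_set = {t.lower() for t in candidate_terms}
    shared = job_set & cand_set
    union = job_set | cand_set
    # Jaccard index: size of the intersection over size of the union
    score = len(shared) / len(union) if union else 0.0
    return sorted(shared), score

# Hard-coded stand-ins for expand_skills() output (illustrative only)
job_expanded = ["planning", "preparation", "scheduling", "leadership"]
candidate_expanded = ["scheduling", "coordination", "planning", "presenting"]

shared, score = skill_overlap(job_expanded, candidate_expanded)
print("Shared terms:", shared)
print(f"Jaccard overlap: {score:.2f}")
```

Exact overlap only rewards terms that appear on both sides after expansion, so it is stricter than pairwise similarity scoring and may work best as a cheap first-pass filter.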
Create a function that scores candidates based on semantic skill similarity:
from Oyemi import semantic_similarity
import numpy as np

def match_resume_to_job(candidate_skills, job_skills):
    """Calculate semantic match score between candidate and job"""
    if not job_skills:
        # Nothing to match against; avoid dividing by zero below
        return {'overall_score': 0.0, 'coverage': 0.0, 'matched_skills': []}

    match_scores = []
    matched_skills = []

    for job_skill in job_skills:
        # Find the candidate skill most similar to this job skill
        best_match = 0
        best_candidate_skill = None
        for candidate_skill in candidate_skills:
            try:
                sim = semantic_similarity(job_skill, candidate_skill)
            except Exception:
                continue  # Skip pairs the library cannot score
            if sim > best_match:
                best_match = sim
                best_candidate_skill = candidate_skill

        match_scores.append(best_match)
        if best_match > 0.6:  # Threshold for "match"
            matched_skills.append((job_skill, best_candidate_skill, best_match))

    return {
        'overall_score': np.mean(match_scores),
        'coverage': len(matched_skills) / len(job_skills),
        'matched_skills': matched_skills
    }

# Test with matching skill pairs
job = ["programming", "analysis", "teamwork", "planning"]
candidate = ["coding", "research", "collaboration", "coordination"]

result = match_resume_to_job(candidate, job)
print(f"Match Score: {result['overall_score']:.2f}")
print(f"Skill Coverage: {result['coverage']:.0%}")
print("Matched Skills:")
for job_s, cand_s, score in result['matched_skills']:
    print(f" {job_s} <- {cand_s} ({score:.2f})")
Match Score: 0.97
Skill Coverage: 100%
Matched Skills:
 programming <- coding (1.00)
 analysis <- research (1.00)
 teamwork <- collaboration (1.00)
 planning <- coordination (0.90)
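The 0.6 cutoff affects coverage but not the overall score, which is the mean of all best-match values. The sketch below shows that relationship on its own, without calling Oyemi. `summarize` is a hypothetical helper that aggregates scores the same way as `match_resume_to_job`, the score list is made up for illustration, and returning zeros for empty input is an assumption rather than library behavior.

```python
from statistics import mean

def summarize(best_scores, threshold=0.6):
    """Aggregate per-skill best-match scores into overall score and coverage."""
    if not best_scores:
        # Assumption: an empty job description scores zero
        return {"overall_score": 0.0, "coverage": 0.0}
    matched = [s for s in best_scores if s > threshold]
    return {
        "overall_score": mean(best_scores),
        "coverage": len(matched) / len(best_scores),
    }

# Hypothetical best-match scores for a four-skill job (not real Oyemi output)
best_scores = [1.0, 0.9, 0.65, 0.3]

for threshold in (0.6, 0.7, 0.95):
    summary = summarize(best_scores, threshold)
    print(f"threshold={threshold}: coverage={summary['coverage']:.0%}, "
          f"overall={summary['overall_score']:.2f}")
```

Raising the threshold lowers coverage while the overall score stays the same, so reporting both numbers gives a fuller picture than either alone.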
Process multiple candidates and rank them by semantic fit:
# Define candidates and jobs
candidates = {
    "Alice Chen": ["python", "analytics", "modeling", "visualization"],
    "Bob Smith": ["react", "typescript", "ui design", "frontend"],
    "Carol Davis": ["management", "coordination", "scheduling", "presenting"],
}

jobs = {
    "Data Scientist": ["python", "machine learning", "statistics", "analysis"],
    "Frontend Dev": ["javascript", "react", "css", "responsive"],
    "Project Manager": ["leadership", "planning", "communication", "teamwork"],
}

# Match each candidate to each job
for job_title, job_skills in jobs.items():
    print(f"\n=== {job_title} ===")
    rankings = []
    for name, skills in candidates.items():
        result = match_resume_to_job(skills, job_skills)
        rankings.append((name, result['overall_score'], result['coverage']))

    # Sort by score
    rankings.sort(key=lambda x: x[1], reverse=True)
    for rank, (name, score, coverage) in enumerate(rankings, 1):
        print(f" {rank}. {name}: {score:.2f} ({coverage:.0%} coverage)")
=== Data Scientist ===
 1. Alice Chen: 0.76 (75% coverage)
 2. Carol Davis: 0.31 (25% coverage)
 3. Bob Smith: 0.22 (0% coverage)

=== Frontend Dev ===
 1. Bob Smith: 0.82 (75% coverage)
 2. Alice Chen: 0.28 (25% coverage)
 3. Carol Davis: 0.19 (0% coverage)

=== Project Manager ===
 1. Carol Davis: 0.79 (100% coverage)
 2. Alice Chen: 0.35 (25% coverage)
 3. Bob Smith: 0.24 (0% coverage)
Use semantic similarity to expand search queries and find more relevant results:
from Oyemi import find_synonyms

def expand_search_query(query_terms, expansion_limit=5):
    """Expand search query with semantically similar terms"""
    expanded_query = []
    expansions = {}
    for term in query_terms:
        try:
            # Get synonyms
            synonyms = find_synonyms(term, limit=expansion_limit)
        except Exception:
            synonyms = []  # Term not in lexicon
        expanded_query.append(term)
        expanded_query.extend(synonyms)
        expansions[term] = synonyms
    # Deduplicate while keeping insertion order, so the result is stable across runs
    return list(dict.fromkeys(expanded_query)), expansions

# Expand a job search query
search_terms = ["manager", "leadership", "salary"]
expanded, details = expand_search_query(search_terms)

print("Original query:", search_terms)
print("\nExpansions:")
for term, synonyms in details.items():
    print(f" {term}: {synonyms}")
print(f"\nExpanded query ({len(expanded)} terms):", expanded[:12])

Original query: ['manager', 'leadership', 'salary']

Expansions:
 manager: ['director', 'supervisor', 'administrator', 'executive', 'handler']
 leadership: ['leading', 'direction', 'guidance']
 salary: ['wage', 'pay', 'earnings', 'remuneration', 'compensation']

Expanded query (16 terms): ['manager', 'director', 'supervisor', 'administrator', 'executive', 'handler', 'leadership', 'leading', 'direction', 'guidance', 'salary', 'wage']
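To see why expansion helps recall, here is a minimal sketch that runs a keyword search with and without the extra terms. None of it comes from Oyemi: `search` is a hypothetical helper, the postings are invented, and the expanded list is a hard-coded subset of the synonyms shown above so the example runs without the library. Tokenization is a plain whitespace split, which is an intentional simplification.

```python
def search(documents, query_terms):
    """Rank documents by how many distinct query terms they contain."""
    terms = {t.lower() for t in query_terms}
    results = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        hits = sorted(terms & words)
        if hits:
            results.append((doc_id, hits))
    # Most matching terms first; ties broken by document id for stable output
    results.sort(key=lambda item: (-len(item[1]), item[0]))
    return results

# Hypothetical postings (illustrative data, not from the dataset)
documents = {
    "post-1": "Seeking a supervisor with strong guidance skills and competitive pay",
    "post-2": "Engineering director role with excellent compensation",
    "post-3": "Junior developer position in a small team",
}

original = ["manager", "leadership", "salary"]
# Hard-coded subset of the expansion, standing in for expand_search_query()
expanded_terms = original + ["director", "supervisor", "guidance", "pay", "compensation"]

print("Original query hits:", search(documents, original))
print("Expanded query hits:", search(documents, expanded_terms))
```

With these made-up postings the original query finds nothing, while the expanded query surfaces two of the three. A production search would also need proper tokenization and some weighting so that expansion terms count for less than the user's own words.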
Visual breakdown of semantic matching scores:
"analytics" matches "data analysis" even though they share no words. Semantic matching understands meaning, not just strings.
Unlike ML embeddings, Oyemi produces identical scores for the same inputs every time, which makes it a good fit for auditable hiring processes.
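One way to put that repeatability to work in an audit trail is to store a digest of each ranking and compare it on re-run. The sketch below uses only the Python standard library. `fingerprint` is a hypothetical helper, not an Oyemi function, and the scores are copied from the Data Scientist ranking above.

```python
import hashlib
import json

def fingerprint(record):
    """Return a SHA-256 hex digest of a JSON-serializable ranking record."""
    # sort_keys and fixed separators give a canonical serialization,
    # so dictionary key order does not change the digest
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

ranking = [["Alice Chen", 0.76], ["Carol Davis", 0.31], ["Bob Smith", 0.22]]

# Two runs that produced the same ranking, with keys in a different order
run_1 = {"job": "Data Scientist", "ranking": ranking}
run_2 = {"ranking": ranking, "job": "Data Scientist"}

# A run where one score drifted
run_3 = {"job": "Data Scientist",
         "ranking": [["Alice Chen", 0.77], ["Carol Davis", 0.31], ["Bob Smith", 0.22]]}

print("Runs 1 and 2 match:", fingerprint(run_1) == fingerprint(run_2))
print("Runs 1 and 3 match:", fingerprint(run_1) == fingerprint(run_3))
```

Storing the digest alongside the job ID and library version would let a reviewer confirm later that a re-run reproduces the original decision inputs.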
Process thousands of resumes on any hardware. No embeddings to compute, no models to load.
Add intelligent similarity matching to your applications in minutes.