ONES-RS

Enterprise NLP engine for sentiment analysis, semantic similarity, domain detection, and taxonomy classification. 10-100x faster than pure Python.

What is ONES-RS?

ONES stands for Oyemi-Negated Expansion Similarity - a sophisticated algorithm combining semantic expansion, negation handling, and domain-specific analysis. Built in Rust with Python bindings for maximum performance.

60,000+ Texts/Sec

Native Rust performance with SIMD optimization. Process massive datasets in seconds.

~0.1ms Latency

Sub-millisecond latency per operation. Perfect for real-time applications and APIs.

6 Domain Lexicons

Finance, Legal, Cybersecurity, HR, Healthcare, and AFINN built-in with auto-detection.

501 Taxonomy Phrases

Enterprise complaint taxonomy across 24 industry verticals with sentiment weights.

Installation

ONES-RS is distributed via our private PyPI server with pre-built wheels for all major platforms.

pip install ones-rs \
    --index-url https://pypi.grandnasser.com/simple/ \
    --trusted-host pypi.grandnasser.com

Supported Platforms

| Platform | Architecture | Python Versions |
|---|---|---|
| Linux | x86_64 | 3.9, 3.10, 3.11, 3.12 |
| Windows | x64 | 3.9, 3.10, 3.11, 3.12 |
| macOS | Intel (x86_64) | 3.10, 3.11, 3.12 |
| macOS | Apple Silicon (ARM64) | 3.10, 3.11, 3.12 |

Quick Start

from ones_rs import OnesEngine

# Initialize the engine
engine = OnesEngine()

# Load a lexicon (JSON format with word -> valence mappings)
engine.load_lexicon("path/to/lexicon.json")

# Basic sentiment analysis
text = "The product quality is excellent and customer service was amazing!"

sentiment = engine.classify_sentiment(text)
print(f"Sentiment: {sentiment}")  # "positive"

valence = engine.calculate_valence(text)
print(f"Valence: {valence:.4f}")  # 0.6234

# Semantic similarity
result = engine.compute_similarity(
    "I am happy",
    "I am glad"
)
print(f"Similarity: {result.jaccard_score:.4f}")  # 0.8521

# Domain detection
domain, confidence, count, keywords = engine.detect_domain(
    "The plaintiff filed a breach of contract lawsuit"
)
print(f"Domain: {domain} (confidence: {confidence:.2f})")  # "blacks_law"

Sentiment Analysis

ONES-RS provides lexicon-based sentiment analysis with domain-specific weights and advanced negation handling.

Basic Sentiment Classification

# Classify as positive/negative/neutral
sentiment = engine.classify_sentiment("This is terrible!")
print(sentiment)  # "negative"

# Get numeric valence score [-1.0, 1.0]
valence = engine.calculate_valence("I love this product")
print(f"Valence: {valence:.4f}")  # Positive score

# Negation is handled automatically
valence = engine.calculate_valence("I don't love this product")
print(f"Valence: {valence:.4f}")  # Flipped to negative

Classification Thresholds
  • Positive: valence > 0.05
  • Negative: valence < -0.05
  • Neutral: -0.05 ≤ valence ≤ 0.05
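The thresholds above amount to a simple mapping from valence to label. A minimal pure-Python sketch (independent of the engine, shown only to make the cutoffs concrete):

```python
def classify_from_valence(valence: float) -> str:
    """Map a valence score to a sentiment label using the documented thresholds."""
    if valence > 0.05:
        return "positive"
    if valence < -0.05:
        return "negative"
    # Everything in [-0.05, 0.05] is treated as neutral.
    return "neutral"

print(classify_from_valence(0.62))   # positive
print(classify_from_valence(-0.30))  # negative
```

Note that the boundary values ±0.05 themselves fall in the neutral band.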

Auto-Domain Sentiment

Automatically detect the domain and apply the appropriate lexicon weights:

# Auto-detect domain and analyze
sentiment, domain, confidence = engine.classify_sentiment_auto(
    "The revenue growth exceeded expectations despite market volatility"
)

print(f"Sentiment: {sentiment}")      # "positive"
print(f"Domain: {domain}")            # "loughran_mcdonald"
print(f"Confidence: {confidence:.2f}")  # 0.85

Blended Multi-Domain Sentiment

For text spanning multiple domains, use blended analysis to get weighted contributions from each domain:

text = """
The plaintiff's lawsuit regarding a data breach caused significant
financial liability and the company implemented new security controls.
"""

# Get blended sentiment with domain breakdown
sentiment, result = engine.classify_sentiment_blended(text)

print(f"Sentiment: {sentiment}")
print(f"Blended Valence: {result.blended_valence:.4f}")
print(f"Domain Mix: {result.domain_mix()}")
# "40% blacks_law / 35% cybersecurity / 25% loughran_mcdonald"

# See individual domain contributions
for contrib in result.contributions:
    print(f"  {contrib.domain}: {contrib.valence:.2f} (weight: {contrib.weight:.1%})")
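To see how the per-domain contributions combine, here is a sketch of a confidence-weighted average. The numbers are made up for illustration and are not the engine's actual output for the text above:

```python
# Hypothetical (domain, valence, weight) contributions; weights sum to 1.0.
contributions = [
    ("blacks_law", -0.20, 0.40),
    ("cybersecurity", -0.35, 0.35),
    ("loughran_mcdonald", -0.10, 0.25),
]

# Blended valence = sum of each domain's valence scaled by its weight.
blended = sum(valence * weight for _, valence, weight in contributions)
print(f"Blended valence: {blended:.4f}")
```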

BlendedResultPy Properties

| Property | Type | Description |
|---|---|---|
| blended_valence | float | Final weighted sentiment score |
| domain_weights | dict | Domain-to-weight mapping (sums to 1.0) |
| word_count | int | Total tokens in text |
| matched_words | int | Tokens found in lexicon(s) |
| contributions | list | List of DomainContributionPy objects |
| domain_mix() | str | Human-readable domain distribution |
| dominant_domain() | str | Domain with highest weight |

Semantic Similarity

Compute semantic similarity using weighted Jaccard with synonym/antonym expansion:

# Basic similarity between two texts
result = engine.compute_similarity(
    "The service was excellent",
    "The support was amazing"
)

print(f"Jaccard Score: {result.jaccard_score:.4f}")
print(f"Weighted Score: {result.weighted_score:.4f}")
print(f"Shared Words: {result.shared_words}")

# With auto-domain detection
result = engine.compute_similarity_auto(text1, text2)

# With blended multi-domain context
score, result1, result2 = engine.compute_similarity_blended(text1, text2)

SimilarityResult Properties

| Property | Type | Description |
|---|---|---|
| jaccard_score | float | Weighted Jaccard similarity [0, 1] |
| weighted_score | float | Valence-adjusted similarity score |
| shared_words | list | Words found in both texts |
| text1_size | int | Expanded set size for text1 |
| text2_size | int | Expanded set size for text2 |
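For intuition, the plain (unweighted) Jaccard measure that the engine's expanded, weighted variant builds on can be sketched as:

```python
def jaccard(a: set, b: set) -> float:
    """Plain Jaccard similarity on token sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

s1 = set("the service was excellent".split())
s2 = set("the support was excellent".split())
# 3 shared tokens out of 5 distinct tokens -> 0.6
print(jaccard(s1, s2))
```

The engine additionally expands each token set with synonyms/antonyms and weights tokens by valence, so its scores will differ from this baseline.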

Batch Processing & Similarity Matrix

# Batch similarity for multiple pairs
pairs = [
    ("good product", "great item"),
    ("bad service", "poor support"),
    ("fast delivery", "quick shipping"),
]
scores = engine.compute_similarity_batch(pairs)
for (t1, t2), score in zip(pairs, scores):
    print(f"{t1} vs {t2}: {score:.4f}")

# Find most similar text from candidates
query = "excellent quality"
candidates = ["poor quality", "amazing stuff", "awful experience"]
best_idx, score = engine.find_most_similar(query, candidates)
print(f"Best match: {candidates[best_idx]} (score: {score:.4f})")

# Compute full similarity matrix (flattened upper triangle)
texts = ["good", "great", "bad", "terrible"]
matrix = engine.similarity_matrix(texts)
# Returns: [good-great, good-bad, good-terrible, great-bad, great-terrible, bad-terrible]
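Since the matrix comes back as a flattened upper triangle, you may want to expand it into a full symmetric matrix. An indexing sketch (pairs are enumerated in the same `(i, j)` order as `itertools.combinations`):

```python
import itertools

def unflatten_upper(texts, flat):
    """Expand a flattened upper-triangle similarity list into a full
    symmetric matrix with 1.0 on the diagonal (self-similarity)."""
    n = len(texts)
    m = [[1.0] * n for _ in range(n)]
    for (i, j), score in zip(itertools.combinations(range(n), 2), flat):
        m[i][j] = m[j][i] = score
    return m

# For 3 texts the flat list holds pairs (0,1), (0,2), (1,2) in order.
matrix = unflatten_upper(["good", "great", "bad"], [0.9, 0.1, 0.2])
```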

Clustering by Similarity

# Group texts by similarity threshold
texts = [
    "good product",
    "great item",
    "excellent purchase",
    "bad service",
    "poor experience",
    "terrible support",
]

groups = engine.group_by_similarity(texts, threshold=0.5)
print(groups)  # [0, 0, 0, 1, 1, 1] - two clusters

# Visualize clusters
from collections import defaultdict
clusters = defaultdict(list)
for text, group in zip(texts, groups):
    clusters[group].append(text)

for group_id, members in clusters.items():
    print(f"Cluster {group_id}: {members}")
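Threshold grouping of this kind can be illustrated with a greedy single-link pass: assign each text to the first earlier group whose representative clears the threshold. This is a sketch of the idea only, not the engine's actual algorithm, and it uses a toy token-overlap similarity in place of the engine's weighted score:

```python
def token_jaccard(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of whitespace tokens."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def group_greedy(texts, sim, threshold):
    """Assign each text to the first group whose representative is
    at least `threshold` similar; otherwise start a new group."""
    groups, reps = [], []
    for t in texts:
        for gid, rep in enumerate(reps):
            if sim(t, rep) >= threshold:
                groups.append(gid)
                break
        else:
            groups.append(len(reps))
            reps.append(t)
    return groups

print(group_greedy(["good product", "good item", "bad service"], token_jaccard, 0.3))
```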

Domain Detection

Automatically detect the domain of text using keyword-based Aho-Corasick matching:

# Detect primary domain
domain, confidence, keyword_count, keywords = engine.detect_domain(
    "The vulnerability in the firewall allowed unauthorized access to the database"
)

print(f"Domain: {domain}")              # "cybersecurity"
print(f"Confidence: {confidence:.2f}")  # 0.92
print(f"Keywords found: {keywords}")    # ["vulnerability", "firewall", "unauthorized", "access"]

# Get scores for all domains
scores = engine.get_domain_scores(text)
for domain, score in sorted(scores.items(), key=lambda x: -x[1]):
    print(f"  {domain}: {score:.4f}")

# Batch domain detection
texts = ["lawsuit filed", "revenue growth", "data breach"]
results = engine.detect_domain_batch(texts)
for text, (domain, conf, count, kw) in zip(texts, results):
    print(f"{text}: {domain}")
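Conceptually, keyword-based domain scoring boils down to counting hits against per-domain keyword sets. A naive sketch with made-up keyword lists (the engine uses Aho-Corasick over much larger lexicons, so real scores and confidences will differ):

```python
# Hypothetical keyword sets for two domains -- illustration only.
DOMAIN_KEYWORDS = {
    "cybersecurity": {"vulnerability", "firewall", "breach", "malware"},
    "blacks_law": {"plaintiff", "lawsuit", "contract", "liability"},
}

def score_domains(text: str) -> dict:
    """Count how many of each domain's keywords appear in the text."""
    tokens = set(text.lower().split())
    return {domain: len(tokens & kws) for domain, kws in DOMAIN_KEYWORDS.items()}

print(score_domains("The vulnerability in the firewall allowed access"))
```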

Multi-Domain Blending

For complex text spanning multiple domains, get weighted analysis from all relevant domains:

# Standard blended valence
result = engine.compute_blended_valence(text)

print(f"Blended Valence: {result.blended_valence:.4f}")
print(f"Domain Mix: {result.domain_mix()}")
print(f"Dominant Domain: {result.dominant_domain()}")

# Custom blending parameters
result = engine.compute_blended_valence_custom(
    text,
    min_confidence=0.1,   # Include domains above this confidence
    max_domains=4         # Maximum domains to blend
)

# Just get the domain mix string
mix = engine.get_domain_mix(text)
print(mix)  # "70% Finance / 30% Legal"

# Batch blended analysis
results = engine.compute_blended_batch(texts)
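The domain-mix string can be thought of as a formatted view of the weight dictionary. A formatting sketch (the engine's actual casing and rounding may differ):

```python
def domain_mix(weights: dict) -> str:
    """Render a weight dict like {'finance': 0.7, 'legal': 0.3}
    as a '70% finance / 30% legal' style string, largest first."""
    ordered = sorted(weights.items(), key=lambda kv: -kv[1])
    return " / ".join(f"{round(w * 100)}% {d}" for d, w in ordered)

print(domain_mix({"finance": 0.7, "legal": 0.3}))
```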

Supported Domains

| Domain ID | Description | Use Case |
|---|---|---|
| loughran_mcdonald | Financial sentiment | 10-K filings, earnings reports, financial news |
| blacks_law | Legal terminology | Contracts, lawsuits, legal documents |
| cybersecurity | Security & threats | Incident reports, vulnerability assessments |
| hr_workforce | HR & employment | Employee reviews, HR documents |
| healthcare | Medical terminology | Clinical notes, patient feedback |
| afinn | General sentiment | Social media, reviews (fallback domain) |

# Load custom domain lexicon
engine.load_domain_lexicon("custom_finance.json", "custom_finance")

# Set active domain manually
engine.set_domain("loughran_mcdonald")
sentiment = engine.classify_sentiment(text)  # Uses financial lexicon

# Reset to auto-detection
engine.set_domain(None)

# Check available domains
domains = engine.available_domains()
print(domains)  # ["loughran_mcdonald", "blacks_law", ...]

Enterprise Taxonomy

Detect complaint phrases from a built-in taxonomy of 501 phrases across 24 industry verticals:

text = "The rating downgrade methodology was flawed and the credit assessment was unfair"

result = engine.detect_taxonomy(text)

print(f"Total matches: {result.total_matches}")
print(f"Primary industry: {result.dominant_industry}")
print(f"Primary category: {result.dominant_category}")
print(f"Aggregate sentiment: {result.aggregate_sentiment:.2f}")

# List all matches
for match in result.matches:
    print(f"  '{match.phrase}' - {match.industry}")
    print(f"    Category: {match.category_path}")
    print(f"    Sentiment: {match.sentiment_weight:.2f}")

# Get industry distribution
distribution = result.industry_percentages()
for industry, pct in distribution.items():
    print(f"  {industry}: {pct:.1%}")

TaxonomyResultPy Properties

| Property | Type | Description |
|---|---|---|
| matches | list | List of TaxonomyMatchPy objects |
| total_matches | int | Number of phrases matched |
| dominant_industry | str | Most common industry |
| dominant_category | str | Most common category |
| aggregate_sentiment | float | Average sentiment of matches |
| industry_counts | dict | Industry-to-count mapping |
| category_counts | dict | Category-to-count mapping |
| industry_percentages() | dict | Industry-to-percentage mapping |
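The percentage distribution is just the per-industry match counts normalized by the total. A sketch of that computation (illustrative, not the library's internal code):

```python
from collections import Counter

def industry_percentages(match_industries: list) -> dict:
    """Turn a list of per-match industry labels into a fractional
    distribution, e.g. ['finance', 'finance', 'legal'] -> {finance: 2/3, ...}."""
    counts = Counter(match_industries)
    total = sum(counts.values())
    return {industry: count / total for industry, count in counts.items()}

print(industry_percentages(["finance", "finance", "legal"]))
```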

Industry-Specific Detection

# Filter taxonomy detection by industry
result = engine.detect_taxonomy_for_industry(text, "financial_intelligence")

# Check if text has specific industry complaints
has_finance = engine.has_taxonomy_industry(text, "financial_intelligence")
print(f"Has financial complaints: {has_finance}")

# Get best category match for an industry
category, matches = engine.classify_taxonomy_category(text, "financial_intelligence")
print(f"Category: {category}")

# List all available industries
industries = engine.available_taxonomy_industries()
for industry in industries:
    desc = engine.get_taxonomy_industry_description(industry)
    print(f"  {industry}: {desc}")

# Total phrases in taxonomy
count = engine.taxonomy_phrase_count()
print(f"Total taxonomy phrases: {count}")  # 501

Supported Industries (24 Total)

Financial Intelligence
Banking & Capital Markets
Insurance Claims
Healthcare Providers
Pharmaceutical
Telecommunications
Retail & E-commerce
Hospitality & Travel
Government Services
Legal Services
Technology Services
Manufacturing
Real Estate
Transportation & Logistics
Energy & Utilities
Education
ESG & Sustainable Finance
Private Equity
HR & Workforce
Cybersecurity
Environmental Services
Consumer Products
Media & Entertainment
Financial Services

Text Expansion

See how text is expanded with synonyms, antonyms, and negation detection:

expanded = engine.expand_text("I don't like the terrible service")

print(f"Original words: {expanded.original_words}")
print(f"Synonyms: {expanded.synonyms}")
print(f"Antonyms: {expanded.antonyms}")
print(f"Negated words: {expanded.negated_words}")
print(f"Has positive modal: {expanded.has_positive_modal}")
print(f"Has negative modal: {expanded.has_negative_modal}")

Negation Handling Features
  • NegEx-style window: 4-word negation scope
  • 60+ negation markers: not, never, no, neither, etc.
  • Verbal negators: fail, refuse, prevent, deny, reject, miss, lose
  • Double negation: Cancellation of negation effects
  • Negation walls: but, however, although reset scope
  • Prefix negation: un-, in-, im-, dis-, ir-, il- (250+ words)
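The NegEx-style window and negation walls described above can be sketched as a single pass over the tokens. This is a simplified illustration with tiny marker sets, not the engine's full implementation (which handles 60+ markers, verbal negators, and prefix negation):

```python
NEGATORS = {"not", "never", "no", "don't", "cannot"}
WALLS = {"but", "however", "although"}

def negated_tokens(text: str, window: int = 4) -> set:
    """Mark tokens within a `window`-word scope after a negator as
    negated; a 'wall' word resets the scope early."""
    negated, scope = set(), 0
    for tok in text.lower().split():
        if tok in NEGATORS:
            scope = window       # open a fresh 4-word negation scope
        elif tok in WALLS:
            scope = 0            # wall word closes the scope
        elif scope > 0:
            negated.add(tok)
            scope -= 1
    return negated

print(negated_tokens("i do not like the terrible service but great staff"))
```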

Comprehensive Analysis

Get everything in one call: sentiment, domain detection, blending, and taxonomy:

analysis = engine.analyze_comprehensive(text)

# Sentiment
print(f"Sentiment: {analysis.sentiment}")
print(f"Valence: {analysis.valence:.4f}")

# Domain Detection
print(f"Detected Domain: {analysis.detected_domain}")
print(f"Domain Confidence: {analysis.domain_confidence:.2f}")
print(f"Domain Keywords: {analysis.domain_keywords}")

# Blending
print(f"Blended Valence: {analysis.blended_valence:.4f}")
print(f"Domain Mix: {analysis.domain_mix}")

# Taxonomy
print(f"Taxonomy Matches: {analysis.taxonomy_matches}")
print(f"Taxonomy Industry: {analysis.taxonomy_industry}")
print(f"Taxonomy Category: {analysis.taxonomy_category}")
print(f"Taxonomy Sentiment: {analysis.taxonomy_sentiment:.2f}")

Batch Auto-Analysis

# Process multiple texts with auto-domain per text
texts = [
    "The lawsuit was dismissed",
    "Revenue exceeded expectations",
    "Security vulnerability detected",
]

results = engine.analyze_batch_auto(texts)
for idx, domain, sentiment, valence in results:
    print(f"Text {idx}: {domain} - {sentiment} ({valence:.2f})")

Label Classification

Classify text to the best matching label from a set of descriptions:

# Define labels with descriptions
labels = [
    ("positive_feedback", "positive customer feedback expressing satisfaction"),
    ("negative_feedback", "negative customer feedback expressing dissatisfaction"),
    ("feature_request", "customer requesting new features or improvements"),
    ("bug_report", "customer reporting a bug or technical issue"),
]

text = "The app keeps crashing when I try to save my work"
label, score = engine.classify_to_label(text, labels)
print(f"Classification: {label} (confidence: {score:.4f})")
# "bug_report"

# Batch classification
texts = ["Love this product!", "Please add dark mode", "Error on checkout"]
results = engine.classify_batch(texts, labels)
for text, (label, score) in zip(texts, results):
    print(f"{text}: {label}")
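Conceptually, label classification picks the label whose description is most similar to the input text. A sketch of that selection, using a toy token-overlap similarity in place of the engine's weighted score (names here are illustrative, not part of the library API):

```python
def overlap(a: str, b: str) -> int:
    """Toy similarity: number of shared lowercase tokens."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def classify_by_description(text, labels, similarity):
    """Pick the (name, description) pair whose description scores
    highest against the text; return the name and its score."""
    name, desc = max(labels, key=lambda lb: similarity(text, lb[1]))
    return name, similarity(text, desc)

labels = [
    ("bug_report", "bug or crash report"),
    ("feature_request", "new feature request"),
]
print(classify_by_description("the app crash bug", labels, overlap))
```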

Performance

Benchmarks on Intel i7-12700K, 32GB RAM:

| Operation | Throughput | Latency |
|---|---|---|
| Sentiment Classification | 60,000+ texts/sec | ~0.02ms |
| Similarity Computation | 10,000+ pairs/sec | ~0.1ms |
| Domain Detection | 100,000+ texts/sec | ~0.01ms |
| Taxonomy Detection | 50,000+ texts/sec | ~0.02ms |
| Comprehensive Analysis | 20,000+ texts/sec | ~0.05ms |

Why is it fast?
  • Native Rust with zero-copy memory operations
  • Aho-Corasick O(n) multi-pattern matching
  • SIMD-optimized hashbrown HashMap
  • Rayon parallel processing for batch operations
  • Lazy static initialization for global structures

API Reference

OnesEngine Class

Initialization

| Method | Description |
|---|---|
| OnesEngine() | Create new engine instance |
| load_lexicon(path) | Load JSON lexicon file |
| load_domain_lexicon(path, domain) | Load domain-specific lexicon |
| set_domain(domain) | Set active domain (None for auto) |
| get_domain() | Get current active domain |
| available_domains() | List all loaded domains |
| lexicon_size() | Get lexicon entry count |

Sentiment Analysis

| Method | Description |
|---|---|
| classify_sentiment(text) | Returns "positive"/"negative"/"neutral" |
| calculate_valence(text) | Returns numeric score [-1, 1] |
| classify_sentiment_auto(text) | Returns (sentiment, domain, confidence) |
| classify_sentiment_blended(text) | Returns (sentiment, BlendedResultPy) |

Similarity

| Method | Description |
|---|---|
| compute_similarity(t1, t2) | Returns SimilarityResult |
| compute_similarity_auto(t1, t2) | Similarity with auto-domain detection |
| compute_similarity_blended(t1, t2) | Returns (score, result1, result2) |
| compute_similarity_batch(pairs) | Returns list of scores |
| find_most_similar(text, candidates) | Returns (index, score) |
| similarity_matrix(texts) | Returns flattened upper triangle |
| group_by_similarity(texts, threshold) | Returns cluster assignments |

Domain Detection

| Method | Description |
|---|---|
| detect_domain(text) | Returns (domain, conf, count, keywords) |
| get_domain_scores(text) | Returns dict of all domain scores |
| detect_domain_batch(texts) | Batch domain detection |

Domain Blending

| Method | Description |
|---|---|
| compute_blended_valence(text) | Returns BlendedResultPy |
| compute_blended_valence_custom(text, min_conf, max_domains) | Custom blending parameters |
| get_domain_mix(text) | Returns a string like "70% Finance / 30% Legal" |
| compute_blended_batch(texts) | Batch blended analysis |

Taxonomy

| Method | Description |
|---|---|
| detect_taxonomy(text) | Returns TaxonomyResultPy |
| detect_taxonomy_for_industry(text, industry) | Industry-filtered detection |
| has_taxonomy_industry(text, industry) | Returns bool |
| get_taxonomy_industry_distribution(text) | Returns percentage dict |
| classify_taxonomy_category(text, industry) | Returns (category, matches) |
| available_taxonomy_industries() | List all 24 industries |
| get_taxonomy_industry_description(industry) | Get industry description |
| taxonomy_phrase_count() | Returns total phrase count (501) |
| detect_taxonomy_batch(texts) | Batch taxonomy detection |

Advanced

| Method | Description |
|---|---|
| expand_text(text) | Returns ExpandedSetPy |
| analyze_comprehensive(text) | Returns ComprehensiveAnalysisPy |
| analyze_batch_auto(texts) | Batch analysis with auto-domain |
| classify_to_label(text, labels) | Returns (label, score) |
| classify_batch(texts, labels) | Batch classification |

Licensing

ONES-RS includes a 30-day free trial. After the trial, activate an enterprise license to continue using the library.

Check Trial Status

from ones_rs import check_trial_status

status = check_trial_status()
print(f"License: {status.license_type}")
print(f"Days remaining: {status.days_remaining}")
print(f"Valid: {status.valid}")

Activate Enterprise License

from ones_rs import activate_license

# Activate with your license key
status = activate_license("ONES-eyJjb21wYW55IjogIll...")
print(f"Activated: {status.company}")
print(f"Expires in: {status.days_remaining} days")

Skip License Check (Evaluation Mode)

For restricted environments like Snowflake where license validation may fail, use evaluation mode:

from ones_rs import OnesEngine

# Skip license check for evaluation in restricted environments
engine = OnesEngine(skip_license_check=True)

# Use normally
result = engine.classify_sentiment("Revenue exceeded expectations")

License Key Format

Enterprise license keys are bound to your company domain and have the format: ONES-{encoded_data}-{signature}

Contact admin@grandnasser.com for pricing and to request a license key.

Support

Ready to Get Started?

Install ONES-RS and start analyzing text at enterprise scale.

pip install ones-rs --index-url https://pypi.grandnasser.com/simple/