ONES-RS
Enterprise NLP engine for sentiment analysis, semantic similarity, domain detection, and taxonomy classification. 10-100x faster than pure Python.
What is ONES-RS?
ONES stands for Oyemi-Negated Expansion Similarity, a sophisticated algorithm combining semantic expansion, negation handling, and domain-specific analysis. It is built in Rust with Python bindings for maximum performance.
60,000+ Texts/Sec
Native Rust performance with SIMD optimization. Process massive datasets in seconds.
~0.1ms Latency
Sub-millisecond per operation. Perfect for real-time applications and APIs.
6 Domain Lexicons
Finance, Legal, Cybersecurity, HR, Healthcare, and AFINN built-in with auto-detection.
501 Taxonomy Phrases
Enterprise complaint taxonomy across 24 industry verticals with sentiment weights.
Installation
ONES-RS is distributed via our private PyPI server with pre-built wheels for all major platforms.
pip install ones-rs \
--index-url https://pypi.grandnasser.com/simple/ \
--trusted-host pypi.grandnasser.com
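If you prefer not to pass the index flags on every install, the same settings can be persisted with pip's standard configuration mechanism (a sketch; scope the keys to your environment as needed):
# Persist the private index so a plain `pip install ones-rs` works
pip config set global.index-url https://pypi.grandnasser.com/simple/
pip config set global.trusted-host pypi.grandnasser.com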
Supported Platforms
| Platform | Architecture | Python Versions |
|---|---|---|
| Linux | x86_64 | 3.9, 3.10, 3.11, 3.12 |
| Windows | x64 | 3.9, 3.10, 3.11, 3.12 |
| macOS | Intel (x86_64) | 3.10, 3.11, 3.12 |
| macOS | Apple Silicon (ARM64) | 3.10, 3.11, 3.12 |
Quick Start
from ones_rs import OnesEngine
# Initialize the engine
engine = OnesEngine()
# Load a lexicon (JSON format with word -> valence mappings)
engine.load_lexicon("path/to/lexicon.json")
# Basic sentiment analysis
text = "The product quality is excellent and customer service was amazing!"
sentiment = engine.classify_sentiment(text)
print(f"Sentiment: {sentiment}") # "positive"
valence = engine.calculate_valence(text)
print(f"Valence: {valence:.4f}") # 0.6234
# Semantic similarity
result = engine.compute_similarity(
"I am happy",
"I am glad"
)
print(f"Similarity: {result.jaccard_score:.4f}") # 0.8521
# Domain detection
domain, confidence, count, keywords = engine.detect_domain(
"The plaintiff filed a breach of contract lawsuit"
)
print(f"Domain: {domain} (confidence: {confidence:.2f})") # "blacks_law"
Sentiment Analysis
ONES-RS provides lexicon-based sentiment analysis with domain-specific weights and advanced negation handling.
Basic Sentiment Classification
# Classify as positive/negative/neutral
sentiment = engine.classify_sentiment("This is terrible!")
print(sentiment) # "negative"
# Get numeric valence score [-1.0, 1.0]
valence = engine.calculate_valence("I love this product")
print(f"Valence: {valence:.4f}") # Positive score
# Negation is handled automatically
valence = engine.calculate_valence("I don't love this product")
print(f"Valence: {valence:.4f}") # Flipped to negative
- Positive: valence > 0.05
- Negative: valence < -0.05
- Neutral: -0.05 ≤ valence ≤ 0.05
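Classification is a direct thresholding of the valence score; a pure-Python equivalent of the cutoffs above:
def label_from_valence(valence: float) -> str:
    # Thresholds as documented: +/-0.05 bounds the neutral band
    if valence > 0.05:
        return "positive"
    if valence < -0.05:
        return "negative"
    return "neutral"

assert label_from_valence(0.6234) == "positive"
assert label_from_valence(-0.01) == "neutral"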
Auto-Domain Sentiment
Automatically detect the domain and apply the appropriate lexicon weights:
# Auto-detect domain and analyze
sentiment, domain, confidence = engine.classify_sentiment_auto(
"The revenue growth exceeded expectations despite market volatility"
)
print(f"Sentiment: {sentiment}") # "positive"
print(f"Domain: {domain}") # "loughran_mcdonald"
print(f"Confidence: {confidence:.2f}") # 0.85
Blended Multi-Domain Sentiment
For text spanning multiple domains, use blended analysis to get weighted contributions from each domain:
text = """
The plaintiff's lawsuit regarding a data breach caused significant
financial liability and the company implemented new security controls.
"""
# Get blended sentiment with domain breakdown
sentiment, result = engine.classify_sentiment_blended(text)
print(f"Sentiment: {sentiment}")
print(f"Blended Valence: {result.blended_valence:.4f}")
print(f"Domain Mix: {result.domain_mix()}")
# "40% blacks_law / 35% cybersecurity / 25% loughran_mcdonald"
# See individual domain contributions
for contrib in result.contributions:
print(f" {contrib.domain}: {contrib.valence:.2f} (weight: {contrib.weight:.1%})")
BlendedResultPy Properties
| Property | Type | Description |
|---|---|---|
| blended_valence | float | Final weighted sentiment score |
| domain_weights | dict | Domain to weight mapping (sums to 1.0) |
| word_count | int | Total tokens in text |
| matched_words | int | Tokens found in lexicon(s) |
| contributions | list | List of DomainContributionPy objects |
| domain_mix() | str | Human-readable domain distribution |
| dominant_domain() | str | Domain with highest weight |
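Properties not shown in the snippet above can be read directly off the same result object returned by classify_sentiment_blended; for example:
# Inspect coverage and the raw weight map behind domain_mix()
print(result.dominant_domain())
print(result.domain_weights)  # weights sum to 1.0
print(f"{result.matched_words}/{result.word_count} tokens matched a lexicon")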
Semantic Similarity
Compute semantic similarity using weighted Jaccard with synonym/antonym expansion:
# Basic similarity between two texts
result = engine.compute_similarity(
"The service was excellent",
"The support was amazing"
)
print(f"Jaccard Score: {result.jaccard_score:.4f}")
print(f"Weighted Score: {result.weighted_score:.4f}")
print(f"Shared Words: {result.shared_words}")
# With auto-domain detection
result = engine.compute_similarity_auto(text1, text2)
# With blended multi-domain context
score, result1, result2 = engine.compute_similarity_blended(text1, text2)
SimilarityResult Properties
| Property | Type | Description |
|---|---|---|
| jaccard_score | float | Weighted Jaccard similarity [0, 1] |
| weighted_score | float | Valence-adjusted similarity score |
| shared_words | list | Words found in both texts |
| text1_size | int | Expanded set size for text1 |
| text2_size | int | Expanded set size for text2 |
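The two size fields report how far each input was expanded with synonyms and antonyms before comparison:
# Larger expanded sets mean more synonym/antonym expansion occurred
print(f"text1 expanded to {result.text1_size} terms, text2 to {result.text2_size}")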
Batch Processing & Similarity Matrix
# Batch similarity for multiple pairs
pairs = [
("good product", "great item"),
("bad service", "poor support"),
("fast delivery", "quick shipping"),
]
scores = engine.compute_similarity_batch(pairs)
for (t1, t2), score in zip(pairs, scores):
print(f"{t1} vs {t2}: {score:.4f}")
# Find most similar text from candidates
query = "excellent quality"
candidates = ["poor quality", "amazing stuff", "awful experience"]
best_idx, score = engine.find_most_similar(query, candidates)
print(f"Best match: {candidates[best_idx]} (score: {score:.4f})")
# Compute full similarity matrix (flattened upper triangle)
texts = ["good", "great", "bad", "terrible"]
matrix = engine.similarity_matrix(texts)
# Returns: [good-great, good-bad, good-terrible, great-bad, great-terrible, bad-terrible]
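Because only the upper triangle is returned, rebuilding a full symmetric matrix takes a short helper; a minimal sketch in pure Python, assuming a self-similarity of 1.0 on the diagonal:
def unflatten_upper(flat, n):
    # Rebuild the full n x n symmetric matrix from the flattened upper triangle
    full = [[1.0] * n for _ in range(n)]
    it = iter(flat)
    for i in range(n):
        for j in range(i + 1, n):
            full[i][j] = full[j][i] = next(it)
    return full

full = unflatten_upper(matrix, len(texts))
print(full[0][1])  # good vs great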
Clustering by Similarity
# Group texts by similarity threshold
texts = [
"good product",
"great item",
"excellent purchase",
"bad service",
"poor experience",
"terrible support",
]
groups = engine.group_by_similarity(texts, threshold=0.5)
print(groups) # [0, 0, 0, 1, 1, 1] - two clusters
# Visualize clusters
from collections import defaultdict
clusters = defaultdict(list)
for text, group in zip(texts, groups):
clusters[group].append(text)
for group_id, members in clusters.items():
print(f"Cluster {group_id}: {members}")
Domain Detection
Automatically detect the domain of text using keyword-based Aho-Corasick matching:
# Detect primary domain
domain, confidence, keyword_count, keywords = engine.detect_domain(
"The vulnerability in the firewall allowed unauthorized access to the database"
)
print(f"Domain: {domain}") # "cybersecurity"
print(f"Confidence: {confidence:.2f}") # 0.92
print(f"Keywords found: {keywords}") # ["vulnerability", "firewall", "unauthorized", "access"]
# Get scores for all domains
scores = engine.get_domain_scores(text)
for domain, score in sorted(scores.items(), key=lambda x: -x[1]):
print(f" {domain}: {score:.4f}")
# Batch domain detection
texts = ["lawsuit filed", "revenue growth", "data breach"]
results = engine.detect_domain_batch(texts)
for text, (domain, conf, count, kw) in zip(texts, results):
print(f"{text}: {domain}")
Multi-Domain Blending
For complex text spanning multiple domains, get weighted analysis from all relevant domains:
# Standard blended valence
result = engine.compute_blended_valence(text)
print(f"Blended Valence: {result.blended_valence:.4f}")
print(f"Domain Mix: {result.domain_mix()}")
print(f"Dominant Domain: {result.dominant_domain()}")
# Custom blending parameters
result = engine.compute_blended_valence_custom(
text,
min_confidence=0.1, # Include domains above this confidence
max_domains=4 # Maximum domains to blend
)
# Just get the domain mix string
mix = engine.get_domain_mix(text)
print(mix) # "70% Finance / 30% Legal"
# Batch blended analysis
results = engine.compute_blended_batch(texts)
Supported Domains
| Domain ID | Description | Use Case |
|---|---|---|
| loughran_mcdonald | Financial sentiment | 10-K filings, earnings reports, financial news |
| blacks_law | Legal terminology | Contracts, lawsuits, legal documents |
| cybersecurity | Security & threats | Incident reports, vulnerability assessments |
| hr_workforce | HR & employment | Employee reviews, HR documents |
| healthcare | Medical terminology | Clinical notes, patient feedback |
| afinn | General sentiment | Social media, reviews (fallback domain) |
# Load custom domain lexicon
engine.load_domain_lexicon("custom_finance.json", "custom_finance")
# Set active domain manually
engine.set_domain("loughran_mcdonald")
sentiment = engine.classify_sentiment(text) # Uses financial lexicon
# Reset to auto-detection
engine.set_domain(None)
# Check available domains
domains = engine.available_domains()
print(domains) # ["loughran_mcdonald", "blacks_law", ...]
Enterprise Taxonomy
Detect complaint phrases from a built-in taxonomy of 501 phrases across 24 industry verticals:
text = "The rating downgrade methodology was flawed and the credit assessment was unfair"
result = engine.detect_taxonomy(text)
print(f"Total matches: {result.total_matches}")
print(f"Primary industry: {result.dominant_industry}")
print(f"Primary category: {result.dominant_category}")
print(f"Aggregate sentiment: {result.aggregate_sentiment:.2f}")
# List all matches
for match in result.matches:
print(f" '{match.phrase}' - {match.industry}")
print(f" Category: {match.category_path}")
print(f" Sentiment: {match.sentiment_weight:.2f}")
# Get industry distribution
distribution = result.industry_percentages()
for industry, pct in distribution.items():
print(f" {industry}: {pct:.1%}")
TaxonomyResultPy Properties
| Property | Type | Description |
|---|---|---|
| matches | list | List of TaxonomyMatchPy objects |
| total_matches | int | Number of phrases matched |
| dominant_industry | str | Most common industry |
| dominant_category | str | Most common category |
| aggregate_sentiment | float | Average sentiment of matches |
| industry_counts | dict | Industry to count mapping |
| category_counts | dict | Category to count mapping |
| industry_percentages() | dict | Industry to percentage mapping |
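The raw counts behind industry_percentages() are also exposed directly:
# Absolute match counts per industry and per category
print(result.industry_counts)
print(result.category_counts)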
Industry-Specific Detection
# Filter taxonomy detection by industry
result = engine.detect_taxonomy_for_industry(text, "financial_intelligence")
# Check if text has specific industry complaints
has_finance = engine.has_taxonomy_industry(text, "financial_intelligence")
print(f"Has financial complaints: {has_finance}")
# Get best category match for an industry
category, matches = engine.classify_taxonomy_category(text, "financial_intelligence")
print(f"Category: {category}")
# List all available industries
industries = engine.available_taxonomy_industries()
for industry in industries:
desc = engine.get_taxonomy_industry_description(industry)
print(f" {industry}: {desc}")
# Total phrases in taxonomy
count = engine.taxonomy_phrase_count()
print(f"Total taxonomy phrases: {count}") # 501
Supported Industries (24 Total)
The full list of industry IDs and their descriptions can be enumerated at runtime via available_taxonomy_industries() and get_taxonomy_industry_description(), as shown in the snippet above.
Text Expansion
See how text is expanded with synonyms, antonyms, and negation detection:
expanded = engine.expand_text("I don't like the terrible service")
print(f"Original words: {expanded.original_words}")
print(f"Synonyms: {expanded.synonyms}")
print(f"Antonyms: {expanded.antonyms}")
print(f"Negated words: {expanded.negated_words}")
print(f"Has positive modal: {expanded.has_positive_modal}")
print(f"Has negative modal: {expanded.has_negative_modal}")
- NegEx-style window: 4-word negation scope
- 60+ negation markers: not, never, no, neither, etc.
- Verbal negators: fail, refuse, prevent, deny, reject, miss, lose
- Double negation: Cancellation of negation effects
- Negation walls: but, however, although reset scope
- Prefix negation: un-, in-, im-, dis-, ir-, il- (250+ words)
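These behaviors can be probed directly with the calls above; a small sketch (outputs are illustrative, not guaranteed values):
# Negation wall: "but" resets the scope, so "good" should not be negated
expanded = engine.expand_text("not bad but good")
print(expanded.negated_words)
# Prefix negation: "unhappy" is handled via the un- prefix rule
print(engine.calculate_valence("I am unhappy"))
# Double negation: "not unhappy" should score less negative than "unhappy"
print(engine.calculate_valence("I am not unhappy"))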
Comprehensive Analysis
Get everything in one call - sentiment, domain, blending, and taxonomy:
analysis = engine.analyze_comprehensive(text)
# Sentiment
print(f"Sentiment: {analysis.sentiment}")
print(f"Valence: {analysis.valence:.4f}")
# Domain Detection
print(f"Detected Domain: {analysis.detected_domain}")
print(f"Domain Confidence: {analysis.domain_confidence:.2f}")
print(f"Domain Keywords: {analysis.domain_keywords}")
# Blending
print(f"Blended Valence: {analysis.blended_valence:.4f}")
print(f"Domain Mix: {analysis.domain_mix}")
# Taxonomy
print(f"Taxonomy Matches: {analysis.taxonomy_matches}")
print(f"Taxonomy Industry: {analysis.taxonomy_industry}")
print(f"Taxonomy Category: {analysis.taxonomy_category}")
print(f"Taxonomy Sentiment: {analysis.taxonomy_sentiment:.2f}")
Batch Auto-Analysis
# Process multiple texts with auto-domain per text
texts = [
"The lawsuit was dismissed",
"Revenue exceeded expectations",
"Security vulnerability detected",
]
results = engine.analyze_batch_auto(texts)
for idx, domain, sentiment, valence in results:
print(f"Text {idx}: {domain} - {sentiment} ({valence:.2f})")
Label Classification
Classify text to the best matching label from a set of descriptions:
# Define labels with descriptions
labels = [
("positive_feedback", "positive customer feedback expressing satisfaction"),
("negative_feedback", "negative customer feedback expressing dissatisfaction"),
("feature_request", "customer requesting new features or improvements"),
("bug_report", "customer reporting a bug or technical issue"),
]
text = "The app keeps crashing when I try to save my work"
label, score = engine.classify_to_label(text, labels)
print(f"Classification: {label} (confidence: {score:.4f})")
# "bug_report"
# Batch classification
texts = ["Love this product!", "Please add dark mode", "Error on checkout"]
results = engine.classify_batch(texts, labels)
for text, (label, score) in zip(texts, results):
print(f"{text}: {label}")
Performance
Benchmarks on Intel i7-12700K, 32GB RAM:
| Operation | Throughput | Latency |
|---|---|---|
| Sentiment Classification | 60,000+ texts/sec | ~0.02ms |
| Similarity Computation | 10,000+ pairs/sec | ~0.1ms |
| Domain Detection | 100,000+ texts/sec | ~0.01ms |
| Taxonomy Detection | 50,000+ texts/sec | ~0.02ms |
| Comprehensive Analysis | 20,000+ texts/sec | ~0.05ms |
- Native Rust with zero-copy memory operations
- Aho-Corasick O(n) multi-pattern matching
- SIMD-optimized hashbrown HashMap
- Rayon parallel processing for batch operations
- Lazy static initialization for global structures
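Throughput on your own hardware is easy to sanity-check with a timing loop; a minimal sketch (assumes engine is initialized with a lexicon as in Quick Start; numbers will vary by machine):
import time

docs = ["The product quality is excellent"] * 10_000
start = time.perf_counter()
for doc in docs:
    engine.classify_sentiment(doc)
elapsed = time.perf_counter() - start
print(f"{len(docs) / elapsed:,.0f} texts/sec ({elapsed / len(docs) * 1000:.3f} ms/text)")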
API Reference
OnesEngine Class
| Method | Description |
|---|---|
| Initialization | |
| OnesEngine() | Create new engine instance |
| load_lexicon(path) | Load JSON lexicon file |
| load_domain_lexicon(path, domain) | Load domain-specific lexicon |
| set_domain(domain) | Set active domain (None for auto) |
| get_domain() | Get current active domain |
| available_domains() | List all loaded domains |
| lexicon_size() | Get lexicon entry count |
| Sentiment Analysis | |
| classify_sentiment(text) | Returns "positive"/"negative"/"neutral" |
| calculate_valence(text) | Returns numeric score [-1, 1] |
| classify_sentiment_auto(text) | Returns (sentiment, domain, confidence) |
| classify_sentiment_blended(text) | Returns (sentiment, BlendedResultPy) |
| Similarity | |
| compute_similarity(t1, t2) | Returns SimilarityResult |
| compute_similarity_auto(t1, t2) | Similarity with auto-domain |
| compute_similarity_blended(t1, t2) | Returns (score, result1, result2) |
| compute_similarity_batch(pairs) | Returns list of scores |
| find_most_similar(text, candidates) | Returns (index, score) |
| similarity_matrix(texts) | Returns flattened upper triangle |
| group_by_similarity(texts, threshold) | Returns cluster assignments |
| Domain Detection | |
| detect_domain(text) | Returns (domain, conf, count, keywords) |
| get_domain_scores(text) | Returns dict of all domain scores |
| detect_domain_batch(texts) | Batch domain detection |
| Domain Blending | |
| compute_blended_valence(text) | Returns BlendedResultPy |
| compute_blended_valence_custom(text, min_conf, max_domains) | Custom blending params |
| get_domain_mix(text) | Returns string like "70% Finance / 30% Legal" |
| compute_blended_batch(texts) | Batch blended analysis |
| Taxonomy | |
| detect_taxonomy(text) | Returns TaxonomyResultPy |
| detect_taxonomy_for_industry(text, industry) | Industry-filtered detection |
| has_taxonomy_industry(text, industry) | Returns bool |
| get_taxonomy_industry_distribution(text) | Returns percentage dict |
| classify_taxonomy_category(text, industry) | Returns (category, matches) |
| available_taxonomy_industries() | List all 24 industries |
| get_taxonomy_industry_description(industry) | Get industry description |
| taxonomy_phrase_count() | Returns 501 |
| detect_taxonomy_batch(texts) | Batch taxonomy detection |
| Advanced | |
| expand_text(text) | Returns ExpandedSetPy |
| analyze_comprehensive(text) | Returns ComprehensiveAnalysisPy |
| analyze_batch_auto(texts) | Batch with auto-domain |
| classify_to_label(text, labels) | Returns (label, score) |
| classify_batch(texts, labels) | Batch classification |
Licensing
ONES-RS includes a 30-day free trial. After the trial, activate an enterprise license to continue using the library.
Check Trial Status
from ones_rs import check_trial_status
status = check_trial_status()
print(f"License: {status.license_type}")
print(f"Days remaining: {status.days_remaining}")
print(f"Valid: {status.valid}")
Activate Enterprise License
from ones_rs import activate_license
# Activate with your license key
status = activate_license("ONES-eyJjb21wYW55IjogIll...")
print(f"Activated: {status.company}")
print(f"Expires in: {status.days_remaining} days")
Skip License Check (Evaluation Mode)
For restricted environments like Snowflake where license validation may fail, use evaluation mode:
from ones_rs import OnesEngine
# Skip license check for evaluation in restricted environments
engine = OnesEngine(skip_license_check=True)
# Use normally
result = engine.classify_sentiment("Revenue exceeded expectations")
Enterprise license keys are bound to your company domain and have the format ONES-{encoded_data}-{signature}.
Contact admin@grandnasser.com for pricing and to request a license key.
Support
- Email: admin@grandnasser.com
- GitHub: Issue Tracker
- Response Time: Within 24-48 hours
Ready to Get Started?
Install ONES-RS and start analyzing text at enterprise scale.
pip install ones-rs --index-url https://pypi.grandnasser.com/simple/ --trusted-host pypi.grandnasser.com