ONES-RS
Enterprise NLP engine for sentiment analysis, semantic similarity, domain detection, and taxonomy classification. 10-100x faster than pure Python.
What is ONES-RS?
ONES stands for Oyemi-Negated Expansion Similarity, a sophisticated algorithm combining semantic expansion, negation handling, and domain-specific analysis. It is built in Rust with Python bindings for maximum performance.
60,000+ Texts/Sec
Native Rust performance with SIMD optimization. Process massive datasets in seconds.
~0.1ms Latency
Sub-millisecond per operation. Perfect for real-time applications and APIs.
6 Domain Lexicons
Finance, Legal, Cybersecurity, HR, Healthcare, and AFINN built-in with auto-detection.
501 Taxonomy Phrases
Enterprise complaint taxonomy across 24 industry verticals with sentiment weights.
Installation
ONES-RS is distributed via our private PyPI server with pre-built wheels for all major platforms.
pip install ones-rs \
--index-url https://pypi.grandnasser.com/simple/ \
--trusted-host pypi.grandnasser.com
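If you prefer not to pass the index flags on every install, the same settings can be persisted with pip's standard configuration mechanism (a sketch; scope the keys to your environment as needed):
# Persist the private index so a plain `pip install ones-rs` works
pip config set global.index-url https://pypi.grandnasser.com/simple/
pip config set global.trusted-host pypi.grandnasser.com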
Supported Platforms
| Platform | Architecture | Python Versions |
|---|---|---|
| Linux | x86_64 | 3.9, 3.10, 3.11, 3.12 |
| Windows | x64 | 3.9, 3.10, 3.11, 3.12 |
| macOS | Intel (x86_64) | 3.10, 3.11, 3.12 |
| macOS | Apple Silicon (ARM64) | 3.10, 3.11, 3.12 |
Quick Start
from ones_rs import OnesEngine
# Initialize the engine
engine = OnesEngine()
# Load a lexicon (JSON format with word -> valence mappings)
engine.load_lexicon("path/to/lexicon.json")
# Basic sentiment analysis
text = "The product quality is excellent and customer service was amazing!"
sentiment = engine.classify_sentiment(text)
print(f"Sentiment: {sentiment}") # "positive"
valence = engine.calculate_valence(text)
print(f"Valence: {valence:.4f}") # 0.6234
# Semantic similarity
result = engine.compute_similarity(
"I am happy",
"I am glad"
)
print(f"Similarity: {result.jaccard_score:.4f}") # 0.8521
# Domain detection
domain, confidence, count, keywords = engine.detect_domain(
"The plaintiff filed a breach of contract lawsuit"
)
print(f"Domain: {domain} (confidence: {confidence:.2f})") # "blacks_law"
Sentiment Analysis
ONES-RS provides lexicon-based sentiment analysis with domain-specific weights and advanced negation handling.
Basic Sentiment Classification
# Classify as positive/negative/neutral
sentiment = engine.classify_sentiment("This is terrible!")
print(sentiment) # "negative"
# Get numeric valence score [-1.0, 1.0]
valence = engine.calculate_valence("I love this product")
print(f"Valence: {valence:.4f}") # Positive score
# Negation is handled automatically
valence = engine.calculate_valence("I don't love this product")
print(f"Valence: {valence:.4f}") # Flipped to negative
- Positive: valence > 0.05
- Negative: valence < -0.05
- Neutral: -0.05 ≤ valence ≤ 0.05
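Classification is a direct thresholding of the valence score; a pure-Python equivalent of the cutoffs above:
def label_from_valence(valence: float) -> str:
    # Thresholds as documented: +/-0.05 bounds the neutral band
    if valence > 0.05:
        return "positive"
    if valence < -0.05:
        return "negative"
    return "neutral"

assert label_from_valence(0.6234) == "positive"
assert label_from_valence(-0.01) == "neutral"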
Auto-Domain Sentiment
Automatically detect the domain and apply the appropriate lexicon weights:
# Auto-detect domain and analyze
sentiment, domain, confidence = engine.classify_sentiment_auto(
"The revenue growth exceeded expectations despite market volatility"
)
print(f"Sentiment: {sentiment}") # "positive"
print(f"Domain: {domain}") # "loughran_mcdonald"
print(f"Confidence: {confidence:.2f}") # 0.85
Blended Multi-Domain Sentiment
For text spanning multiple domains, use blended analysis to get weighted contributions from each domain:
text = """
The plaintiff's lawsuit regarding a data breach caused significant
financial liability and the company implemented new security controls.
"""
# Get blended sentiment with domain breakdown
sentiment, result = engine.classify_sentiment_blended(text)
print(f"Sentiment: {sentiment}")
print(f"Blended Valence: {result.blended_valence:.4f}")
print(f"Domain Mix: {result.domain_mix()}")
# "40% blacks_law / 35% cybersecurity / 25% loughran_mcdonald"
# See individual domain contributions
for contrib in result.contributions:
print(f" {contrib.domain}: {contrib.valence:.2f} (weight: {contrib.weight:.1%})")
BlendedResultPy Properties
| Property | Type | Description |
|---|---|---|
| blended_valence | float | Final weighted sentiment score |
| domain_weights | dict | Domain to weight mapping (sums to 1.0) |
| word_count | int | Total tokens in text |
| matched_words | int | Tokens found in lexicon(s) |
| contributions | list | List of DomainContributionPy objects |
| domain_mix() | str | Human-readable domain distribution |
| dominant_domain() | str | Domain with highest weight |
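Properties not shown in the snippet above can be read directly off the same result object returned by classify_sentiment_blended; for example:
# Inspect coverage and the raw weight map behind domain_mix()
print(result.dominant_domain())
print(result.domain_weights)  # weights sum to 1.0
print(f"{result.matched_words}/{result.word_count} tokens matched a lexicon")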
Semantic Similarity
Compute semantic similarity using weighted Jaccard with synonym/antonym expansion:
# Basic similarity between two texts
result = engine.compute_similarity(
"The service was excellent",
"The support was amazing"
)
print(f"Jaccard Score: {result.jaccard_score:.4f}")
print(f"Weighted Score: {result.weighted_score:.4f}")
print(f"Shared Words: {result.shared_words}")
# With auto-domain detection
result = engine.compute_similarity_auto(text1, text2)
# With blended multi-domain context
score, result1, result2 = engine.compute_similarity_blended(text1, text2)
SimilarityResult Properties
| Property | Type | Description |
|---|---|---|
| jaccard_score | float | Weighted Jaccard similarity [0, 1] |
| weighted_score | float | Valence-adjusted similarity score |
| shared_words | list | Words found in both texts |
| text1_size | int | Expanded set size for text1 |
| text2_size | int | Expanded set size for text2 |
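The two size fields report how far each input was expanded with synonyms and antonyms before comparison:
# Larger expanded sets mean more synonym/antonym expansion occurred
print(f"text1 expanded to {result.text1_size} terms, text2 to {result.text2_size}")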
Batch Processing & Similarity Matrix
# Batch similarity for multiple pairs
pairs = [
("good product", "great item"),
("bad service", "poor support"),
("fast delivery", "quick shipping"),
]
scores = engine.compute_similarity_batch(pairs)
for (t1, t2), score in zip(pairs, scores):
print(f"{t1} vs {t2}: {score:.4f}")
# Find most similar text from candidates
query = "excellent quality"
candidates = ["poor quality", "amazing stuff", "awful experience"]
best_idx, score = engine.find_most_similar(query, candidates)
print(f"Best match: {candidates[best_idx]} (score: {score:.4f})")
# Compute full similarity matrix (flattened upper triangle)
texts = ["good", "great", "bad", "terrible"]
matrix = engine.similarity_matrix(texts)
# Returns: [good-great, good-bad, good-terrible, great-bad, great-terrible, bad-terrible]
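Because only the upper triangle is returned, rebuilding a full symmetric matrix takes a short helper; a minimal sketch in pure Python, assuming a self-similarity of 1.0 on the diagonal:
def unflatten_upper(flat, n):
    # Rebuild the full n x n symmetric matrix from the flattened upper triangle
    full = [[1.0] * n for _ in range(n)]
    it = iter(flat)
    for i in range(n):
        for j in range(i + 1, n):
            full[i][j] = full[j][i] = next(it)
    return full

full = unflatten_upper(matrix, len(texts))
print(full[0][1])  # good vs great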
Clustering by Similarity
# Group texts by similarity threshold
texts = [
"good product",
"great item",
"excellent purchase",
"bad service",
"poor experience",
"terrible support",
]
groups = engine.group_by_similarity(texts, threshold=0.5)
print(groups) # [0, 0, 0, 1, 1, 1] - two clusters
# Visualize clusters
from collections import defaultdict
clusters = defaultdict(list)
for text, group in zip(texts, groups):
clusters[group].append(text)
for group_id, members in clusters.items():
print(f"Cluster {group_id}: {members}")
Domain Detection
Automatically detect the domain of text using keyword-based Aho-Corasick matching:
# Detect primary domain
domain, confidence, keyword_count, keywords = engine.detect_domain(
"The vulnerability in the firewall allowed unauthorized access to the database"
)
print(f"Domain: {domain}") # "cybersecurity"
print(f"Confidence: {confidence:.2f}") # 0.92
print(f"Keywords found: {keywords}") # ["vulnerability", "firewall", "unauthorized", "access"]
# Get scores for all domains
scores = engine.get_domain_scores(text)
for domain, score in sorted(scores.items(), key=lambda x: -x[1]):
print(f" {domain}: {score:.4f}")
# Batch domain detection
texts = ["lawsuit filed", "revenue growth", "data breach"]
results = engine.detect_domain_batch(texts)
for text, (domain, conf, count, kw) in zip(texts, results):
print(f"{text}: {domain}")
Multi-Domain Blending
For complex text spanning multiple domains, get weighted analysis from all relevant domains:
# Standard blended valence
result = engine.compute_blended_valence(text)
print(f"Blended Valence: {result.blended_valence:.4f}")
print(f"Domain Mix: {result.domain_mix()}")
print(f"Dominant Domain: {result.dominant_domain()}")
# Custom blending parameters
result = engine.compute_blended_valence_custom(
text,
min_confidence=0.1, # Include domains above this confidence
max_domains=4 # Maximum domains to blend
)
# Just get the domain mix string
mix = engine.get_domain_mix(text)
print(mix) # "70% Finance / 30% Legal"
# Batch blended analysis
results = engine.compute_blended_batch(texts)
Supported Domains
| Domain ID | Description | Use Case |
|---|---|---|
| loughran_mcdonald | Financial sentiment | 10-K filings, earnings reports, financial news |
| blacks_law | Legal terminology | Contracts, lawsuits, legal documents |
| cybersecurity | Security & threats | Incident reports, vulnerability assessments |
| hr_workforce | HR & employment | Employee reviews, HR documents |
| healthcare | Medical terminology | Clinical notes, patient feedback |
| afinn | General sentiment | Social media, reviews (fallback domain) |
# Load custom domain lexicon
engine.load_domain_lexicon("custom_finance.json", "custom_finance")
# Set active domain manually
engine.set_domain("loughran_mcdonald")
sentiment = engine.classify_sentiment(text) # Uses financial lexicon
# Reset to auto-detection
engine.set_domain(None)
# Check available domains
domains = engine.available_domains()
print(domains) # ["loughran_mcdonald", "blacks_law", ...]
Enterprise Taxonomy
Detect complaint phrases from a built-in taxonomy of 501 phrases across 24 industry verticals:
text = "The rating downgrade methodology was flawed and the credit assessment was unfair"
result = engine.detect_taxonomy(text)
print(f"Total matches: {result.total_matches}")
print(f"Primary industry: {result.dominant_industry}")
print(f"Primary category: {result.dominant_category}")
print(f"Aggregate sentiment: {result.aggregate_sentiment:.2f}")
# List all matches
for match in result.matches:
print(f" '{match.phrase}' - {match.industry}")
print(f" Category: {match.category_path}")
print(f" Sentiment: {match.sentiment_weight:.2f}")
# Get industry distribution
distribution = result.industry_percentages()
for industry, pct in distribution.items():
print(f" {industry}: {pct:.1%}")
TaxonomyResultPy Properties
| Property | Type | Description |
|---|---|---|
| matches | list | List of TaxonomyMatchPy objects |
| total_matches | int | Number of phrases matched |
| dominant_industry | str | Most common industry |
| dominant_category | str | Most common category |
| aggregate_sentiment | float | Average sentiment of matches |
| industry_counts | dict | Industry to count mapping |
| category_counts | dict | Category to count mapping |
| industry_percentages() | dict | Industry to percentage mapping |
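The raw counts behind industry_percentages() are also exposed directly:
# Absolute match counts per industry and per category
print(result.industry_counts)
print(result.category_counts)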
Industry-Specific Detection
# Filter taxonomy detection by industry
result = engine.detect_taxonomy_for_industry(text, "financial_intelligence")
# Check if text has specific industry complaints
has_finance = engine.has_taxonomy_industry(text, "financial_intelligence")
print(f"Has financial complaints: {has_finance}")
# Get best category match for an industry
category, matches = engine.classify_taxonomy_category(text, "financial_intelligence")
print(f"Category: {category}")
# List all available industries
industries = engine.available_taxonomy_industries()
for industry in industries:
desc = engine.get_taxonomy_industry_description(industry)
print(f" {industry}: {desc}")
# Total phrases in taxonomy
count = engine.taxonomy_phrase_count()
print(f"Total taxonomy phrases: {count}") # 501
Supported Industries (24 Total)
The full list of industry IDs and their descriptions can be enumerated at runtime via available_taxonomy_industries() and get_taxonomy_industry_description(), as shown in the snippet above.
Text Expansion
See how text is expanded with synonyms, antonyms, and negation detection:
expanded = engine.expand_text("I don't like the terrible service")
print(f"Original words: {expanded.original_words}")
print(f"Synonyms: {expanded.synonyms}")
print(f"Antonyms: {expanded.antonyms}")
print(f"Negated words: {expanded.negated_words}")
print(f"Has positive modal: {expanded.has_positive_modal}")
print(f"Has negative modal: {expanded.has_negative_modal}")
- NegEx-style window: 4-word negation scope
- 60+ negation markers: not, never, no, neither, etc.
- Verbal negators: fail, refuse, prevent, deny, reject, miss, lose
- Double negation: Cancellation of negation effects
- Negation walls: but, however, although reset scope
- Prefix negation: un-, in-, im-, dis-, ir-, il- (250+ words)
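These behaviors can be probed directly with the calls above; a small sketch (outputs are illustrative, not guaranteed values):
# Negation wall: "but" resets the scope, so "good" should not be negated
expanded = engine.expand_text("not bad but good")
print(expanded.negated_words)
# Prefix negation: "unhappy" is handled via the un- prefix rule
print(engine.calculate_valence("I am unhappy"))
# Double negation: "not unhappy" should score less negative than "unhappy"
print(engine.calculate_valence("I am not unhappy"))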
Comprehensive Analysis
Get everything in one call - sentiment, domain, blending, and taxonomy:
analysis = engine.analyze_comprehensive(text)
# Sentiment
print(f"Sentiment: {analysis.sentiment}")
print(f"Valence: {analysis.valence:.4f}")
# Domain Detection
print(f"Detected Domain: {analysis.detected_domain}")
print(f"Domain Confidence: {analysis.domain_confidence:.2f}")
print(f"Domain Keywords: {analysis.domain_keywords}")
# Blending
print(f"Blended Valence: {analysis.blended_valence:.4f}")
print(f"Domain Mix: {analysis.domain_mix}")
# Taxonomy
print(f"Taxonomy Matches: {analysis.taxonomy_matches}")
print(f"Taxonomy Industry: {analysis.taxonomy_industry}")
print(f"Taxonomy Category: {analysis.taxonomy_category}")
print(f"Taxonomy Sentiment: {analysis.taxonomy_sentiment:.2f}")
Batch Auto-Analysis
# Process multiple texts with auto-domain per text
texts = [
"The lawsuit was dismissed",
"Revenue exceeded expectations",
"Security vulnerability detected",
]
results = engine.analyze_batch_auto(texts)
for idx, domain, sentiment, valence in results:
print(f"Text {idx}: {domain} - {sentiment} ({valence:.2f})")
Label Classification
Classify text to the best matching label from a set of descriptions:
# Define labels with descriptions
labels = [
("positive_feedback", "positive customer feedback expressing satisfaction"),
("negative_feedback", "negative customer feedback expressing dissatisfaction"),
("feature_request", "customer requesting new features or improvements"),
("bug_report", "customer reporting a bug or technical issue"),
]
text = "The app keeps crashing when I try to save my work"
label, score = engine.classify_to_label(text, labels)
print(f"Classification: {label} (confidence: {score:.4f})")
# "bug_report"
# Batch classification
texts = ["Love this product!", "Please add dark mode", "Error on checkout"]
results = engine.classify_batch(texts, labels)
for text, (label, score) in zip(texts, results):
print(f"{text}: {label}")
Performance
Benchmarks on Intel i7-12700K, 32GB RAM:
| Operation | Throughput | Latency |
|---|---|---|
| Sentiment Classification | 60,000+ texts/sec | ~0.02ms |
| Similarity Computation | 10,000+ pairs/sec | ~0.1ms |
| Domain Detection | 100,000+ texts/sec | ~0.01ms |
| Taxonomy Detection | 50,000+ texts/sec | ~0.02ms |
| Comprehensive Analysis | 20,000+ texts/sec | ~0.05ms |
- Native Rust with zero-copy memory operations
- Aho-Corasick O(n) multi-pattern matching
- SIMD-optimized hashbrown HashMap
- Rayon parallel processing for batch operations
- Lazy static initialization for global structures
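Throughput on your own hardware is easy to sanity-check with a timing loop; a minimal sketch (assumes engine is initialized with a lexicon as in Quick Start; numbers will vary by machine):
import time

docs = ["The product quality is excellent"] * 10_000
start = time.perf_counter()
for doc in docs:
    engine.classify_sentiment(doc)
elapsed = time.perf_counter() - start
print(f"{len(docs) / elapsed:,.0f} texts/sec ({elapsed / len(docs) * 1000:.3f} ms/text)")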
API Reference
OnesEngine Class
| Method | Description |
|---|---|
| Initialization | |
| OnesEngine() | Create new engine instance |
| load_lexicon(path) | Load JSON lexicon file |
| load_domain_lexicon(path, domain) | Load domain-specific lexicon |
| set_domain(domain) | Set active domain (None for auto) |
| get_domain() | Get current active domain |
| available_domains() | List all loaded domains |
| lexicon_size() | Get lexicon entry count |
| Sentiment Analysis | |
| classify_sentiment(text) | Returns "positive"/"negative"/"neutral" |
| calculate_valence(text) | Returns numeric score [-1, 1] |
| classify_sentiment_auto(text) | Returns (sentiment, domain, confidence) |
| classify_sentiment_blended(text) | Returns (sentiment, BlendedResultPy) |
| Similarity | |
| compute_similarity(t1, t2) | Returns SimilarityResult |
| compute_similarity_auto(t1, t2) | Similarity with auto-domain |
| compute_similarity_blended(t1, t2) | Returns (score, result1, result2) |
| compute_similarity_batch(pairs) | Returns list of scores |
| find_most_similar(text, candidates) | Returns (index, score) |
| similarity_matrix(texts) | Returns flattened upper triangle |
| group_by_similarity(texts, threshold) | Returns cluster assignments |
| Domain Detection | |
| detect_domain(text) | Returns (domain, conf, count, keywords) |
| get_domain_scores(text) | Returns dict of all domain scores |
| detect_domain_batch(texts) | Batch domain detection |
| Domain Blending | |
| compute_blended_valence(text) | Returns BlendedResultPy |
| compute_blended_valence_custom(text, min_conf, max_domains) | Custom blending params |
| get_domain_mix(text) | Returns string like "70% Finance / 30% Legal" |
| compute_blended_batch(texts) | Batch blended analysis |
| Taxonomy | |
| detect_taxonomy(text) | Returns TaxonomyResultPy |
| detect_taxonomy_for_industry(text, industry) | Industry-filtered detection |
| has_taxonomy_industry(text, industry) | Returns bool |
| get_taxonomy_industry_distribution(text) | Returns percentage dict |
| classify_taxonomy_category(text, industry) | Returns (category, matches) |
| available_taxonomy_industries() | List all 24 industries |
| get_taxonomy_industry_description(industry) | Get industry description |
| taxonomy_phrase_count() | Returns 501 |
| detect_taxonomy_batch(texts) | Batch taxonomy detection |
| Advanced | |
| expand_text(text) | Returns ExpandedSetPy |
| analyze_comprehensive(text) | Returns ComprehensiveAnalysisPy |
| analyze_batch_auto(texts) | Batch with auto-domain |
| classify_to_label(text, labels) | Returns (label, score) |
| classify_batch(texts, labels) | Batch classification |
Licensing
ONES-RS includes a 30-day free trial. After the trial, activate an enterprise license to continue using the library.
Check Trial Status
from ones_rs import check_trial_status
status = check_trial_status()
print(f"License: {status.license_type}")
print(f"Days remaining: {status.days_remaining}")
print(f"Valid: {status.valid}")
Activate Enterprise License
from ones_rs import activate_license
# Activate with your license key
status = activate_license("ONES-eyJjb21wYW55IjogIll...")
print(f"Activated: {status.company}")
print(f"Expires in: {status.days_remaining} days")
Skip License Check (Evaluation Mode)
For restricted environments like Snowflake where license validation may fail, use evaluation mode:
from ones_rs import OnesEngine
# Skip license check for evaluation in restricted environments
engine = OnesEngine(skip_license_check=True)
# Use normally
result = engine.classify_sentiment("Revenue exceeded expectations")
Enterprise license keys are bound to your company domain and have the format ONES-{encoded_data}-{signature}.
Contact admin@grandnasser.com for pricing and to request a license key.
Support
- Email: admin@grandnasser.com
- GitHub: Issue Tracker
- Response Time: Within 24-48 hours
Ready to Get Started?
Install ONES-RS and start analyzing text at enterprise scale.
pip install ones-rs --index-url https://pypi.grandnasser.com/simple/ --trusted-host pypi.grandnasser.com