ONES-RS Financial Analysis

High-performance financial sentiment analysis using the Loughran-McDonald lexicon at 780K+ texts/second

Tutorial

Financial PhraseBank Sentiment Analysis

Learn how to analyze financial news sentences using ONES-RS with the Loughran-McDonald financial lexicon

1

The Dataset

This tutorial uses the Financial PhraseBank dataset from HuggingFace, which contains 4,840 sentences from financial news articles annotated by 16 experts. We use the "sentences_allagree" subset (2,264 sentences), in which all annotators agreed on the sentiment label.

Sample Data: Financial PhraseBank (AllAgree)

Sentence | Label
Operating profit rose to EUR 13.1 mn from EUR 8.7 mn in the corresponding period in 2007. | positive
Sales in Finland decreased by 10.5% in January, while sales outside Finland dropped by 17%. | negative
The company has its own operations in Finland, Sweden and Norway. | neutral
Net sales increased to EUR 1.4 bn from EUR 1.3 bn in 2005. | positive
The situation of coated magazine printing paper is expected to remain weak. | negative

Total sentences: 2,264
Neutral: 1,391 (61%)
Positive: 570 (25%)
Negative: 303 (13%)
2

Install ONES-RS

Install the ONES-RS library with pip. The library is written in Rust and exposes Python bindings via PyO3. The wheel named below targets CPython 3.11 on 64-bit Windows (the cp311 and win_amd64 tags in its filename):

Bash installation
# Install from wheel (enterprise version)
pip install ones_rs-1.1.0-cp311-cp311-win_amd64.whl

# Verify installation
python -c "from ones_rs.ones_rs import version; print(version())"
Output
1.1.0
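Because a wheel filename encodes a Python version tag and a platform tag, pip rejects it on an interpreter that does not match. The following is a rough standard-library sketch for comparing the running interpreter against those tags. The helper name `interpreter_tags` is illustrative and not part of ONES-RS, and the check assumes a CPython interpreter and a simple five-part wheel filename.

```python
import sys
import sysconfig


def interpreter_tags():
    """Return (python_tag, platform_tag) for the running interpreter."""
    # e.g. "cp311" for CPython 3.11 (assumes CPython)
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    # sysconfig reports e.g. "win-amd64"; wheel filenames use underscores
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


wheel = "ones_rs-1.1.0-cp311-cp311-win_amd64.whl"
# Wheel filenames follow name-version-python-abi-platform.whl
name, version, py_tag, abi_tag, plat_tag = wheel[: -len(".whl")].split("-")

python_tag, platform_tag = interpreter_tags()
compatible = (python_tag == py_tag) and (platform_tag == plat_tag)

print(f"Interpreter: {python_tag} / {platform_tag}")
print(f"Wheel:       {py_tag} / {plat_tag}")
print(f"Compatible:  {compatible}")
```

If the last line prints False, a wheel built for your Python version and platform is needed.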
3

Initialize the ONES Engine

Initialize the OnesEngine and load the Loughran-McDonald financial lexicon containing 1,425 domain-specific terms:

Python initialize_engine.py
from ones_rs.ones_rs import OnesEngine, version
from pathlib import Path

# Initialize the ONES engine
engine = OnesEngine()

# Load the Loughran-McDonald financial lexicon
lexicon_path = Path("data/domain_lexicons/loughran_mcdonald_lexicon.json")
num_terms = engine.load_lexicon(str(lexicon_path))

print(f"ONES-RS Version: {version()}")
print(f"Lexicon loaded: {num_terms} financial terms")
print("Engine ready for analysis")
Output
ONES-RS Version: 1.1.0
Lexicon loaded: 1425 financial terms
Engine ready for analysis
4

Load the Financial PhraseBank Dataset

Load the dataset and parse the sentence@label format:

Python load_dataset.py
import pandas as pd
from pathlib import Path

# Load the AllAgree subset
data_path = Path("data/FinancialPhraseBank-v1.0/Sentences_AllAgree.txt")

sentences = []
labels = []

with open(data_path, 'r', encoding='latin-1') as f:
    for line in f:
        if '@' in line:
            parts = line.strip().rsplit('@', 1)
            if len(parts) == 2:
                sentences.append(parts[0])
                labels.append(parts[1])

df = pd.DataFrame({'sentence': sentences, 'label': labels})

print(f"Total sentences: {len(df)}")
print("Label distribution:")
print(df['label'].value_counts())
Output
Total sentences: 2264
Label distribution:
neutral     1391
positive     570
negative     303
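The loader splits each line on the last `@` only, so a sentence that itself contains an `@` is still parsed correctly as long as the label follows the final one. Here is a small self-contained sketch of that behavior. The sample lines and the helper name `parse_line` are made up for illustration and are not taken from the dataset.

```python
def parse_line(line):
    """Split 'sentence@label' on the last '@'; return None if there is no '@'."""
    line = line.strip()
    if "@" not in line:
        return None
    sentence, label = line.rsplit("@", 1)
    return sentence, label


samples = [
    "Net sales rose 5 % .@positive\n",
    "Contact ir@example.com for details .@neutral\n",  # '@' inside the sentence
    "a line without a label\n",
]

parsed = [parse_line(s) for s in samples]
for item in parsed:
    print(item)
```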
5

Analyze a Single Sentence

Let's analyze a single financial sentence to understand the output:

Python single_analysis.py
# Sample financial sentence
text = "Operating profit rose to EUR 13.1 mn from EUR 8.7 mn in 2007."

# Get sentiment classification
sentiment = engine.classify_sentiment(text)
valence = engine.calculate_valence(text)
domain = engine.detect_domain(text)

print(f"Text: {text}")
print(f"Sentiment: {sentiment}")
print(f"Valence Score: {valence:.4f}")
print(f"Detected Domain: {domain}")
Output
Text: Operating profit rose to EUR 13.1 mn from EUR 8.7 mn in 2007.
Sentiment: positive
Valence Score: 0.7000
Detected Domain: loughran_mcdonald
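For intuition about what a lexicon-based valence score is, here is a toy sketch. It is not the ONES-RS algorithm. The word lists, the scoring formula, and the threshold are all assumptions made for illustration, and the engine's own scores (such as the 0.7000 above) come from its own lexicon and weighting.

```python
import re

# Hypothetical mini-lexicon for illustration; NOT the Loughran-McDonald lists
POSITIVE = {"gain", "improved", "strong", "profitable"}
NEGATIVE = {"loss", "decline", "weak", "impairment"}


def toy_valence(text):
    """Return (pos - neg) / (pos + neg) over lexicon hits, or 0.0 if none."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def toy_sentiment(text, threshold=0.05):
    """Map the toy valence to a label using an assumed threshold."""
    v = toy_valence(text)
    if v > threshold:
        return "positive"
    if v < -threshold:
        return "negative"
    return "neutral"


examples = [
    "Strong gain despite one loss",
    "Demand remains weak",
    "The firm is based in Helsinki",
]
for s in examples:
    print(f"{toy_sentiment(s):8} {toy_valence(s):+.2f}  {s}")
```

In this sketch a sentence with no lexicon hits scores 0.0 and is labeled neutral.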
6

High-Performance Batch Analysis

ONES-RS uses Rayon for parallel processing, achieving 780K+ texts per second. Let's analyze all 2,264 sentences:

Python batch_analysis.py
import time

# Get all sentences
all_sentences = df['sentence'].tolist()

# Batch analysis with automatic parallelization
# perf_counter() is a monotonic, high-resolution clock suited to short timings
start_time = time.perf_counter()
results = engine.analyze_batch_auto(all_sentences)
elapsed = time.perf_counter() - start_time

# Results format: (index, domain, sentiment, valence)
df['ones_sentiment'] = [r[2] for r in results]
df['ones_valence'] = [r[3] for r in results]

# Calculate throughput
throughput = len(all_sentences) / elapsed

print(f"Processed {len(all_sentences):,} texts in {elapsed:.4f} seconds")
print(f"Throughput: {throughput:,.0f} texts/second")
Output
Processed 2,264 texts in 0.0029 seconds
Throughput: 782,389 texts/second
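A single run lasting a few milliseconds is sensitive to timer resolution and scheduling noise, so throughput figures are usually more stable when taken from the fastest of several runs. Here is a minimal sketch using `time.perf_counter`. The `measure_throughput` helper and the stand-in workload are assumptions for illustration. In practice you would pass `engine.analyze_batch_auto` as the function.

```python
import time


def measure_throughput(fn, items, repeats=5):
    """Return items/second based on the fastest of `repeats` timed runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(items)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse timers
    return len(items) / best if best > 0 else float("inf")


def stand_in(texts):
    # Placeholder workload; replace with engine.analyze_batch_auto
    return [len(t.split()) for t in texts]


texts = ["Net sales increased to EUR 1.4 bn"] * 10_000
rate = measure_throughput(stand_in, texts)
print(f"Throughput: {rate:,.0f} texts/second")
```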
7

Calculate Accuracy Metrics

Compare ONES-RS predictions against ground truth labels:

Python calculate_metrics.py
from sklearn.metrics import accuracy_score, classification_report

# Calculate overall accuracy
accuracy = accuracy_score(df['label'], df['ones_sentiment'])

print(f"Overall Accuracy: {accuracy:.2%}")
print()
print("Classification Report:")
print(classification_report(df['label'], df['ones_sentiment']))
Output
Overall Accuracy: 63.12%

Classification Report:
              precision    recall  f1-score   support

    negative       0.21      0.09      0.12       303
     neutral       0.77      0.76      0.76      1391
    positive       0.46      0.62      0.53       570

    accuracy                           0.63      2264
   macro avg       0.48      0.49      0.47      2264
weighted avg       0.62      0.63      0.62      2264
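The macro and weighted averages in the report can be recomputed from the per-class rows. The class counts also give a useful reference point: the accuracy of always predicting the most frequent class. Here is a short sketch using the rounded F1 values and supports printed above.

```python
# Per-class F1 and support, copied from the classification report
report = {
    "negative": {"f1": 0.12, "support": 303},
    "neutral": {"f1": 0.76, "support": 1391},
    "positive": {"f1": 0.53, "support": 570},
}

total = sum(row["support"] for row in report.values())

# Macro average: every class counts equally
macro_f1 = sum(row["f1"] for row in report.values()) / len(report)

# Weighted average: each class counts in proportion to its support
weighted_f1 = sum(row["f1"] * row["support"] for row in report.values()) / total

# Baseline: accuracy of always predicting the most frequent class
majority_baseline = max(row["support"] for row in report.values()) / total

print(f"Macro F1:          {macro_f1:.2f}")
print(f"Weighted F1:       {weighted_f1:.2f}")
print(f"Majority baseline: {majority_baseline:.2%}")
```

The macro average (0.47) sits well below the weighted average (0.62) because the negative class, which has the lowest F1, counts as much as the far larger neutral class. The majority baseline works out to about 61.4%, which puts the 63.12% overall accuracy in context.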
8

Custom Label Classification

ONES-RS can classify text into custom labels using keyword matching. Each label is paired with a comma-separated string of keywords:

Python custom_labels.py
# Define custom financial labels
custom_labels = [
    ("bullish", "strong growth, positive outlook, exceeds expectations, profitable, increase"),
    ("bearish", "decline, loss, negative outlook, underperform, weak, decrease"),
    ("neutral", "stable, unchanged, maintains, as expected")
]

# Classify sample texts
test_texts = [
    "Revenue grew 25% and exceeded all analyst expectations.",
    "The company reported a significant quarterly loss.",
    "Operations continued as normal with stable margins."
]

for text in test_texts:
    label, score = engine.classify_to_label(text, custom_labels)
    print(f"[{label:8}] ({score:.3f}) {text}")
Output
[bullish ] (0.021) Revenue grew 25% and exceeded all analyst expectations.
[bearish ] (0.046) The company reported a significant quarterly loss.
[neutral ] (0.516) Operations continued as normal with stable margins.
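How the engine turns keywords into a score is not described in this tutorial. As a rough mental model only, the sketch below scores each label by the fraction of its keyword phrases that share at least one word with the text. The `keyword_overlap_label` helper and its scoring rule are assumptions, not the ONES-RS implementation, and its scores will not match the output above.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_overlap_label(text, labels):
    """Pick the label whose keyword phrases best overlap the text.

    `labels` has the same (name, "comma, separated, keywords") shape as in
    the tutorial. The scoring rule is an illustrative assumption.
    """
    words = tokenize(text)
    best_label, best_score = None, -1.0
    for name, keywords in labels:
        phrases = [p.strip() for p in keywords.split(",") if p.strip()]
        # A phrase counts as a hit if it shares any word with the text
        hits = sum(1 for p in phrases if tokenize(p) & words)
        score = hits / len(phrases) if phrases else 0.0
        if score > best_score:
            best_label, best_score = name, score
    return best_label, best_score


labels = [
    ("bullish", "strong growth, positive outlook, exceeds expectations, profitable, increase"),
    ("bearish", "decline, loss, negative outlook, underperform, weak, decrease"),
    ("neutral", "stable, unchanged, maintains, as expected"),
]

result = keyword_overlap_label("The company reported a significant quarterly loss.", labels)
print(result)
```

Exact word matching is a deliberate simplification here: "exceeded" in a text would not match the keyword "exceeds", which a production matcher might handle with stemming or fuzzy matching.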
9

Group Similar Texts

ONES-RS can group texts by lexical similarity using Jaccard-based clustering:

Python similarity_grouping.py
# Texts to cluster
texts = [
    "Revenue increased significantly",
    "Revenue grew strongly",
    "Sales declined sharply",
    "Sales dropped significantly",
    "Earnings were stable"
]

# Group by similarity (threshold 0.2)
groups = engine.group_by_similarity(texts, 0.2)

print("Clustering Results:")
for text, cluster in zip(texts, groups):
    print(f"  Cluster {cluster}: {text}")
Output
Clustering Results:
  Cluster 0: Revenue increased significantly
  Cluster 1: Revenue grew strongly
  Cluster 2: Sales declined sharply
  Cluster 0: Sales dropped significantly
  Cluster 3: Earnings were stable
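Jaccard similarity between two texts is the size of the intersection of their token sets divided by the size of their union. Here is a minimal sketch with lowercase whitespace tokenization. The tokenization is an assumption: the engine's own tokenization and clustering procedure may differ, so this is not expected to reproduce the cluster assignments above.

```python
def jaccard(a, b):
    """Jaccard similarity of the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


texts = [
    "Revenue increased significantly",
    "Revenue grew strongly",
    "Sales declined sharply",
    "Sales dropped significantly",
    "Earnings were stable",
]

# Print the similarity for every pair of texts
for i in range(len(texts)):
    for j in range(i + 1, len(texts)):
        print(f"{jaccard(texts[i], texts[j]):.2f}  {texts[i]!r} vs {texts[j]!r}")
```

With these three-word texts, any two that share exactly one word score 1/5 = 0.20, and pairs with no shared words score 0.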

Performance Results

On the 2,264-sentence AllAgree subset, ONES-RS combines high throughput with 63.12% overall accuracy:

782K+ Texts/Second
63.12% Overall Accuracy
0.471 Macro F1
1,425 Lexicon Terms

Per-Class Performance

F1 scores by sentiment class:

Neutral
F1: 0.760
Positive
F1: 0.529
Negative
F1: 0.124

Key Insights

Extreme Speed

Processing 780K+ texts per second, ONES-RS is ideal for real-time financial data feeds, high-frequency trading signals, and bulk document processing.

No ML Overhead

A pure lexicon-based approach means no model loading, no GPU requirements, and deterministic results every time.

Domain Optimized

The Loughran-McDonald lexicon is specifically designed for financial text, capturing terms like "impairment", "litigation", and "restructuring".

ONES-RS Features

Capabilities demonstrated in this tutorial, listed with related functions in each area:

Sentiment Analysis

classify_sentiment(), calculate_valence(), classify_sentiment_blended()

Batch Processing

analyze_batch_auto() with Rayon parallel processing

Custom Labels

classify_to_label(), classify_batch() with custom categories

Similarity Clustering

group_by_similarity(), compute_similarity(), similarity_matrix()

Domain Detection

detect_domain(), get_domain_scores(), get_domain_mix()

Multi-Domain Lexicons

Financial, Healthcare, Legal, Technology domains

Analyze Your Financial Data

Use ONES-RS for high-performance sentiment analysis on earnings calls, SEC filings, financial news, and analyst reports.

Dataset: Financial PhraseBank by Malo et al. on HuggingFace (4,840 sentences)