High-performance financial sentiment analysis using the Loughran-McDonald lexicon at 780K+ texts/second
Learn how to analyze financial news sentences using ONES-RS with the Loughran-McDonald financial lexicon
We use the Financial PhraseBank dataset from Hugging Face, which contains 4,840 sentences from financial news articles annotated by 16 experts. This tutorial works with the "sentences_allagree" subset: the 2,264 sentences on which all annotators agreed on the sentiment label.
| Sentence | Label |
|---|---|
| Operating profit rose to EUR 13.1 mn from EUR 8.7 mn in the corresponding period in 2007. | positive |
| Sales in Finland decreased by 10.5% in January, while sales outside Finland dropped by 17%. | negative |
| The company has its own operations in Finland, Sweden and Norway. | neutral |
| Net sales increased to EUR 1.4 bn from EUR 1.3 bn in 2005. | positive |
| The situation of coated magazine printing paper is expected to remain weak. | negative |
Install the ONES-RS library via pip. The library is built in Rust for maximum performance with Python bindings via PyO3:
# Install from wheel (enterprise version)
pip install ones_rs-1.1.0-cp311-cp311-win_amd64.whl
# Verify installation
python -c "from ones_rs.ones_rs import version; print(version())"
1.1.0
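The wheel filename above carries compatibility tags: `cp311-cp311-win_amd64` means CPython 3.11 on 64-bit Windows. If pip rejects the wheel, a quick check like the following sketch can show whether the running interpreter matches. The helper name `parse_wheel_tags` is made up for this example and is not part of ONES-RS.

```python
import sys

def parse_wheel_tags(filename: str) -> dict:
    """Split a wheel filename into name, version, and compatibility tags.

    Assumes the common form {name}-{version}-{python}-{abi}-{platform}.whl
    (an optional build tag, if present, is ignored).
    """
    stem = filename.removesuffix(".whl")
    parts = stem.split("-")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }

tags = parse_wheel_tags("ones_rs-1.1.0-cp311-cp311-win_amd64.whl")

# Tag for the running interpreter, assuming CPython (e.g. "cp311" for 3.11)
current_python = f"cp{sys.version_info.major}{sys.version_info.minor}"

print(tags)
print(f"Interpreter tag: {current_python}, matches wheel: {current_python == tags['python']}")
```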
Initialize the OnesEngine and load the Loughran-McDonald financial lexicon containing 1,425 domain-specific terms:
from ones_rs.ones_rs import OnesEngine, version
from pathlib import Path
# Initialize the ONES engine
engine = OnesEngine()
# Load the Loughran-McDonald financial lexicon
lexicon_path = Path("data/domain_lexicons/loughran_mcdonald_lexicon.json")
num_terms = engine.load_lexicon(str(lexicon_path))
print(f"ONES-RS Version: {version()}")
print(f"Lexicon loaded: {num_terms} financial terms")
print(f"Engine ready for analysis")
ONES-RS Version: 1.1.0
Lexicon loaded: 1425 financial terms
Engine ready for analysis
Load the dataset and parse the sentence@label format:
import pandas as pd
from pathlib import Path
# Load the AllAgree subset
data_path = Path("data/FinancialPhraseBank-v1.0/Sentences_AllAgree.txt")
sentences = []
labels = []
with open(data_path, 'r', encoding='latin-1') as f:
    for line in f:
        if '@' in line:
            # Split on the last '@' so an '@' inside the sentence is preserved
            parts = line.strip().rsplit('@', 1)
            if len(parts) == 2:
                sentences.append(parts[0])
                labels.append(parts[1])
df = pd.DataFrame({'sentence': sentences, 'label': labels})
print(f"Total sentences: {len(df)}")
print(f"Label distribution:")
print(df['label'].value_counts())
Total sentences: 2264
Label distribution:
neutral     1391
positive     570
negative     303
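The parser splits on the last `@` only, which matters if a sentence itself contains that character. The sketch below isolates that step in a small function so it can be tried without the dataset. The function name `parse_line` and the sample line are invented for illustration.

```python
from typing import Optional, Tuple

def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Split one 'sentence@label' line; return None if no label is present."""
    parts = line.strip().rsplit("@", 1)  # maxsplit=1: split on the LAST '@' only
    if len(parts) != 2 or not parts[1]:
        return None
    return parts[0], parts[1]

# Hypothetical line whose sentence contains an '@' of its own
example = "Contact ir@example.com for the full report .@neutral"

print(parse_line(example))
print(parse_line("a line without a label"))
```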
Let's analyze a single financial sentence to understand the output:
# Sample financial sentence
text = "Operating profit rose to EUR 13.1 mn from EUR 8.7 mn in 2007."
# Get sentiment classification
sentiment = engine.classify_sentiment(text)
valence = engine.calculate_valence(text)
domain = engine.detect_domain(text)
print(f"Text: {text}")
print(f"Sentiment: {sentiment}")
print(f"Valence Score: {valence:.4f}")
print(f"Detected Domain: {domain}")
Text: Operating profit rose to EUR 13.1 mn from EUR 8.7 mn in 2007.
Sentiment: positive
Valence Score: 0.7000
Detected Domain: loughran_mcdonald
ONES-RS uses Rayon for parallel processing and reached 780K+ texts per second in the run below. Let's analyze all 2,264 sentences:
import time
# Get all sentences
all_sentences = df['sentence'].tolist()
# Batch analysis with automatic parallelization
start_time = time.perf_counter()  # high-resolution timer, better suited to millisecond-scale runs
results = engine.analyze_batch_auto(all_sentences)
elapsed = time.perf_counter() - start_time
# Results format: (index, domain, sentiment, valence)
df['ones_sentiment'] = [r[2] for r in results]
df['ones_valence'] = [r[3] for r in results]
# Calculate throughput
throughput = len(all_sentences) / elapsed
print(f"Processed {len(all_sentences):,} texts in {elapsed:.4f} seconds")
print(f"Throughput: {throughput:,.0f} texts/second")
Processed 2,264 texts in 0.0029 seconds
Throughput: 782,389 texts/second
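A single run of a few milliseconds is sensitive to timer resolution and scheduling noise, so a throughput figure is more stable when taken as the median of several runs after a warm-up. The following is a minimal sketch of such a harness. `measure_throughput` and the stand-in workload `count_tokens` are made up for this example so that it runs without ONES-RS installed.

```python
import statistics
import time
from typing import Callable, Sequence

def measure_throughput(
    analyze: Callable[[Sequence[str]], object],
    texts: Sequence[str],
    repeats: int = 20,
    warmup: int = 3,
) -> float:
    """Return the median texts/second over `repeats` timed calls."""
    for _ in range(warmup):          # untimed runs to warm caches and thread pools
        analyze(texts)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        analyze(texts)
        timings.append(time.perf_counter() - start)
    return len(texts) / statistics.median(timings)

# Hypothetical stand-in workload (NOT the ONES-RS engine)
def count_tokens(texts: Sequence[str]) -> list:
    return [len(t.split()) for t in texts]

sample = ["Operating profit rose to EUR 13.1 mn from EUR 8.7 mn in 2007."] * 2264
print(f"{measure_throughput(count_tokens, sample):,.0f} texts/second")
```

With the engine from this tutorial, `engine.analyze_batch_auto` could presumably be passed as `analyze` in place of the stand-in.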
Compare ONES-RS predictions against ground truth labels:
from sklearn.metrics import accuracy_score, classification_report
# Calculate overall accuracy
accuracy = accuracy_score(df['label'], df['ones_sentiment'])
print(f"Overall Accuracy: {accuracy:.2%}")
print()
print("Classification Report:")
print(classification_report(df['label'], df['ones_sentiment']))
Overall Accuracy: 63.12%
Classification Report:
              precision    recall  f1-score   support

    negative       0.21      0.09      0.12       303
     neutral       0.77      0.76      0.76      1391
    positive       0.46      0.62      0.53       570

    accuracy                           0.63      2264
   macro avg       0.48      0.49      0.47      2264
weighted avg       0.62      0.63      0.62      2264
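To put these numbers in context, it can help to compare the accuracy with a majority-class baseline and to see how the per-class figures are derived. The sketch below uses only the standard library. The label counts and F1 values are copied from the outputs above, the toy label lists are invented, and the helper names are made up for this example.

```python
from collections import Counter

def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from two parallel label lists."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores

def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent label."""
    return max(Counter(y_true).values()) / len(y_true)

# Label counts from the dataset summary (2,264 sentences in total)
y_true = ["neutral"] * 1391 + ["positive"] * 570 + ["negative"] * 303
print(f"Majority-class baseline accuracy: {majority_baseline(y_true):.2%}")

# Macro F1 is the unweighted mean of the per-class F1 values in the report
macro_f1 = (0.12 + 0.76 + 0.53) / 3
print(f"Macro F1 from per-class values: {macro_f1:.2f}")

# Invented toy labels, just to exercise per_class_scores
toy_true = ["positive", "negative", "neutral", "neutral"]
toy_pred = ["positive", "neutral", "neutral", "positive"]
for label, (p, r, f) in per_class_scores(toy_true, toy_pred).items():
    print(f"{label:8} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Always predicting "neutral" yields 1391 / 2264, about 61.44% accuracy, so per-class F1 is the more informative view of how the lexicon handles positive and negative sentences.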
ONES-RS supports classification to custom labels using keyword matching:
# Define custom financial labels
custom_labels = [
    ("bullish", "strong growth, positive outlook, exceeds expectations, profitable, increase"),
    ("bearish", "decline, loss, negative outlook, underperform, weak, decrease"),
    ("neutral", "stable, unchanged, maintains, as expected")
]
# Classify sample texts
test_texts = [
    "Revenue grew 25% and exceeded all analyst expectations.",
    "The company reported a significant quarterly loss.",
    "Operations continued as normal with stable margins."
]
for text in test_texts:
    label, score = engine.classify_to_label(text, custom_labels)
    print(f"[{label:8}] ({score:.3f}) {text}")
[bullish ] (0.021) Revenue grew 25% and exceeded all analyst expectations.
[bearish ] (0.046) The company reported a significant quarterly loss.
[neutral ] (0.516) Operations continued as normal with stable margins.
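The tutorial does not say how `classify_to_label` scores a match, so the following is only a rough illustration of the general idea of keyword matching, not the ONES-RS algorithm. It scores each label by the fraction of its keyword tokens that occur in the text, which is an assumption chosen for simplicity. The function names are made up, and the scores will not match the values printed above.

```python
import re
from typing import List, Set, Tuple

def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def classify_by_keywords(text: str, labels: List[Tuple[str, str]]) -> Tuple[str, float]:
    """Pick the label whose keyword tokens overlap most with the text.

    Score = (keyword tokens found in text) / (keyword tokens for the label).
    Ties go to the label listed first.
    """
    text_tokens = tokenize(text)
    best_label, best_score = labels[0][0], -1.0
    for name, keywords in labels:
        keyword_tokens = tokenize(keywords)
        score = len(text_tokens & keyword_tokens) / len(keyword_tokens) if keyword_tokens else 0.0
        if score > best_score:
            best_label, best_score = name, score
    return best_label, best_score

custom_labels = [
    ("bullish", "strong growth, positive outlook, exceeds expectations, profitable, increase"),
    ("bearish", "decline, loss, negative outlook, underperform, weak, decrease"),
    ("neutral", "stable, unchanged, maintains, as expected"),
]
test_texts = [
    "Revenue grew 25% and exceeded all analyst expectations.",
    "The company reported a significant quarterly loss.",
    "Operations continued as normal with stable margins.",
]
for text in test_texts:
    label, score = classify_by_keywords(text, custom_labels)
    print(f"[{label:8}] ({score:.3f}) {text}")
```

Exact token overlap misses inflected forms ("exceeded" versus "exceeds"), which is one reason a production matcher would likely add stemming or weighting.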
ONES-RS can group texts by lexical similarity using Jaccard-based clustering:
# Texts to cluster
texts = [
    "Revenue increased significantly",
    "Revenue grew strongly",
    "Sales declined sharply",
    "Sales dropped significantly",
    "Earnings were stable"
]
# Group by similarity (threshold 0.2)
groups = engine.group_by_similarity(texts, 0.2)
print("Clustering Results:")
for text, cluster in zip(texts, groups):
    print(f"  Cluster {cluster}: {text}")
Clustering Results:
  Cluster 0: Revenue increased significantly
  Cluster 1: Revenue grew strongly
  Cluster 2: Sales declined sharply
  Cluster 0: Sales dropped significantly
  Cluster 3: Earnings were stable
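For reference, Jaccard similarity between two texts is the size of the intersection of their token sets divided by the size of their union. Here is a minimal sketch of the plain word-level version. ONES-RS may tokenize or weight terms differently, so these values need not match its internal scores or reproduce the cluster assignments above.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity: |A ∩ B| / |A ∪ B| over lowercase tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0

pairs = [
    ("Revenue increased significantly", "Sales dropped significantly"),
    ("Revenue increased significantly", "Earnings were stable"),
    ("Sales declined sharply", "Sales declined sharply"),
]
for left, right in pairs:
    print(f"{jaccard(left, right):.2f}  {left!r} vs {right!r}")
```

The first pair shares one token ("significantly") out of five distinct tokens, giving 1/5 = 0.2. The second pair shares none, and identical texts score 1.0.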
On the 2,264-sentence AllAgree subset, ONES-RS processed about 782K texts per second with 63.12% overall accuracy.
F1 scores by sentiment class: negative 0.12, neutral 0.76, positive 0.53.
Processing 780K+ texts per second, ONES-RS is ideal for real-time financial data feeds, high-frequency trading signals, and bulk document processing.
Pure lexicon-based approach means no model loading, no GPU requirements, and deterministic results every time.
The Loughran-McDonald lexicon is specifically designed for financial text, capturing terms like "impairment", "litigation", and "restructuring".
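To make the lexicon-based idea concrete, here is a toy sketch of dictionary scoring: count positive and negative lexicon hits and normalize the difference. The word sets are a small hand-picked illustration and not the real lexicon file. The formula and the name `lexicon_valence` are assumptions for this example and do not describe how ONES-RS computes valence.

```python
import re

# Hand-picked illustrative words, NOT the actual Loughran-McDonald lexicon
POSITIVE = {"profitable", "gain", "improved", "strong"}
NEGATIVE = {"impairment", "litigation", "restructuring", "loss", "weak"}

def lexicon_valence(text: str) -> float:
    """Return (pos - neg) / (pos + neg) over lexicon hits, or 0.0 with no hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

examples = [
    "Impairment charges and litigation costs weighed on results",
    "Strong quarter despite a restructuring loss",
    "The company has its own operations in Finland, Sweden and Norway.",
]
for text in examples:
    print(f"{lexicon_valence(text):+.2f}  {text}")
```

Because the score depends only on the input text and a fixed word list, repeated calls always return the same value, which is the determinism property described above.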
Capabilities demonstrated in this tutorial:
classify_sentiment(), calculate_valence(), classify_sentiment_blended()
analyze_batch_auto() with Rayon parallel processing
classify_to_label(), classify_batch() with custom categories
group_by_similarity(), compute_similarity(), similarity_matrix()
detect_domain(), get_domain_scores(), get_domain_mix()
Financial, Healthcare, Legal, Technology domains
Use ONES-RS for high-performance sentiment analysis on earnings calls, SEC filings, financial news, and analyst reports.
Dataset: Financial PhraseBank by Malo et al. on HuggingFace (4,840 sentences)