SEC 10K Filing Analysis

Extract risk indicators and compliance concerns from financial documents using KeyNeg

Tutorial

Financial Q&A 10K Dataset Analysis

Learn how to identify negative language patterns in SEC 10K filings from 69 publicly traded companies

1

The Dataset

We're using the Financial Q&A 10K dataset from Kaggle, containing 7,000 question-answer pairs extracted from SEC 10K filings of 69 companies including NVIDIA, Apple, Amazon, Goldman Sachs, and more.

Sample Data (Financial-QA-10k.csv)

ticker | filing | question | context
NVDA | 2023_10K | What area did NVIDIA initially focus on? | Since our original focus on PC graphics, we have expanded to several other large and important computationally intensive fields...
BAC | 2023_10K | What regulatory requirements affect operations? | We are subject to extensive regulation and supervision under federal and state banking laws...
AMZN | 2023_10K | What are key risk factors? | Our expansion places a significant strain on management, operational, financial and other resources...
JNJ | 2023_10K | What legal proceedings are disclosed? | The Company and certain of its subsidiaries are involved in various lawsuits and claims...
GS | 2023_10K | How does market volatility affect business? | Our businesses may be adversely affected by conditions in global financial markets...

7,000 Q&A Pairs
69 Companies
500 Contexts Analyzed
30% With Negative Sentiment
2

Load the Financial Data

Download the dataset from Kaggle and load it using pandas. The "context" column contains the actual 10K filing text we'll analyze:

Python load_financial_data.py
import kagglehub
import pandas as pd

# Download the Financial Q&A 10K dataset
path = kagglehub.dataset_download('yousefsaeedian/financial-q-and-a-10k')

# Load the CSV file
df = pd.read_csv(f'{path}/Financial-QA-10k.csv')

# View the dataset structure
print(f"Dataset shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
print(f"Unique companies: {df['ticker'].nunique()}")
Output
Dataset shape: (7000, 5)
Columns: ['question', 'answer', 'context', 'ticker', 'filing']
Unique companies: 69
3

Initialize KeyNeg for Financial Analysis

KeyNeg's sentiment labels include categories relevant to financial documents, such as compliance issues, technical debt, and lack of transparency:

Python initialize_keyneg.py
from keyneg import KeyNeg

# Initialize the analyzer
kn = KeyNeg()

# KeyNeg includes sentiment labels relevant to financial documents:
# - compliance issues
# - technical debt
# - lack of transparency
# - safety concerns
# - ethical violations
# - downsizing
# - bureaucracy
# - and 90+ more...

print("KeyNeg initialized for financial document analysis")
Output
KeyNeg initialized for financial document analysis
4

Analyze a Single 10K Context

Let's analyze a single context from a 10K filing to understand the output:

Python single_analysis.py
# Sample context from a bank's 10K filing
context = """We are subject to extensive regulation and supervision
under federal and state banking laws. Failure to comply with
applicable regulatory requirements could result in significant
penalties, restrictions on business activities, and reputational harm."""

# Analyze the context
result = kn.analyze(context)

# View the results
print("Top Sentiment:", result['top_sentiment'])
print("Negativity Score:", result['negativity_score'])
print("All Sentiments:", [s[0] for s in result['sentiments']])
print("Categories:", result['categories'])
Output
Top Sentiment: compliance issues
Negativity Score: 0.38
All Sentiments: ['compliance issues', 'lack of transparency']
Categories: ['policy_systemic_issues', 'customer_market_discontent']
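
The result dictionary can be reduced to a simple review flag. The sketch below works on a hard-coded dictionary shaped like the output above, so it runs without KeyNeg installed. The `flag_context` helper, the 0.3 threshold, and the watch list are hypothetical choices for illustration, not part of KeyNeg. The second element of each `sentiments` tuple is assumed to be a score, and its values here are made up.

```python
# Mock result shaped like the output above (hard-coded so this runs without KeyNeg).
# Assumption: each 'sentiments' entry is a (label, score) tuple; the scores are invented.
result = {
    "top_sentiment": "compliance issues",
    "negativity_score": 0.38,
    "sentiments": [("compliance issues", 0.52), ("lack of transparency", 0.41)],
    "categories": ["policy_systemic_issues", "customer_market_discontent"],
}


def flag_context(result, threshold=0.3, watch_labels=("compliance issues",)):
    """Hypothetical helper: decide whether a context deserves manual review."""
    labels = [s[0] for s in result["sentiments"]]
    watched = [label for label in labels if label in watch_labels]
    return {
        # Flag when the overall score is high or a watched label is present
        "needs_review": result["negativity_score"] >= threshold or bool(watched),
        "watched_labels": watched,
        "all_labels": labels,
    }


summary = flag_context(result)
print(summary)
```

The threshold would need tuning against your own documents. In the company comparison later in this tutorial, average scores for flagged contexts cluster between 0.35 and 0.37, so a fixed cutoff may not separate them well.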
5

Batch Analysis of 500 Contexts

Analyze a sample of 500 10K contexts to identify common negative language patterns across companies:

Python batch_analysis.py
from collections import Counter

# Sample 500 contexts for analysis
sample_df = df.sample(n=500, random_state=42)
contexts = sample_df['context'].tolist()

# Analyze all contexts in batch
results = kn.analyze_batch(contexts)

# Count contexts with negative sentiment
negative_contexts = [r for r in results if r['sentiments']]
print(f"Contexts with negative sentiment: {len(negative_contexts)}/500")

# Aggregate all sentiments
all_sentiments = []
for r in results:
    if r['sentiments']:
        sentiment_names = [s[0] for s in r['sentiments']]
        all_sentiments.extend(sentiment_names)

# Display top sentiments
sentiment_counts = Counter(all_sentiments)
print("\nTop 10 Negative Sentiments in 10K Filings:")
for sentiment, count in sentiment_counts.most_common(10):
    print(f"  {sentiment}: {count}")
Output
Contexts with negative sentiment: 148/500

Top 10 Negative Sentiments in 10K Filings:
  technical debt: 46
  compliance issues: 40
  lack of transparency: 14
  no growth opportunities: 14
  safety concerns: 12
  undervalued: 11
  bureaucracy: 9
  false advertising: 8
  downsizing: 7
  unfair compensation: 6
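
To put the counts in proportion, the arithmetic behind the "30% With Negative Sentiment" figure can be reproduced from the output above. This is a small sketch that hard-codes the printed numbers instead of re-running the analysis. The per-label percentages are relative to the 148 flagged contexts and need not sum to 100%, since one context can carry several labels.

```python
from collections import Counter

# Figures copied from the batch output above
total_contexts = 500
negative_contexts = 148
sentiment_counts = Counter({
    "technical debt": 46,
    "compliance issues": 40,
    "lack of transparency": 14,
    "no growth opportunities": 14,
    "safety concerns": 12,
    "undervalued": 11,
    "bureaucracy": 9,
    "false advertising": 8,
    "downsizing": 7,
    "unfair compensation": 6,
})

# Share of sampled contexts that carry at least one negative sentiment
negative_share = negative_contexts / total_contexts
print(f"Contexts with negative sentiment: {negative_share:.1%}")

# Label counts relative to the number of flagged contexts
for label, count in sentiment_counts.most_common(3):
    print(f"  {label}: {count} ({count / negative_contexts:.1%} of flagged contexts)")
```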
6

Compare Companies by Negativity

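Group results by ticker symbol to compare how negative each company's 10K language is. The average negativity score is computed only over contexts that carry at least one negative sentiment, and companies with fewer than three such contexts in the sample are excluded: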

Python company_comparison.py
# Track negativity by company
tickers = sample_df['ticker'].tolist()
company_data = {}

for i, r in enumerate(results):
    ticker = tickers[i]
    if r['sentiments']:
        if ticker not in company_data:
            company_data[ticker] = {'count': 0, 'sentiments': [], 'total_neg': 0}

        company_data[ticker]['count'] += 1
        company_data[ticker]['total_neg'] += r['negativity_score']
        sentiment_names = [s[0] for s in r['sentiments']]
        company_data[ticker]['sentiments'].extend(sentiment_names)

# Calculate average negativity and sort
company_scores = []
for ticker, data in company_data.items():
    if data['count'] >= 3:
        avg_neg = data['total_neg'] / data['count']
        top_issues = Counter(data['sentiments']).most_common(3)
        company_scores.append((ticker, avg_neg, data['count'], top_issues))

company_scores.sort(key=lambda x: x[1], reverse=True)

# Display top companies
print("Companies with Most Negative 10K Language:")
for ticker, avg_neg, count, top_issues in company_scores[:10]:
    issues = ', '.join([f'{i[0]}({i[1]})' for i in top_issues])
    print(f"  {ticker}: {avg_neg:.2f} ({count} contexts) - {issues}")
Output
Companies with Most Negative 10K Language:
  BAC: 0.37 (3 contexts) - compliance issues(1), technical debt(1), lack of transparency(1)
  AXP: 0.36 (4 contexts) - technical debt(3), undervalued(1), unfair compensation(1)
  EFX: 0.36 (4 contexts) - technical debt(3), compliance issues(1), bureaucracy(1)
  LVS: 0.36 (3 contexts) - technical debt(1), unrealistic deadlines(1), compliance issues(1)
  V: 0.36 (5 contexts) - compliance issues(4), bureaucracy(1), technical debt(1)
  GS: 0.35 (4 contexts) - undervalued(2), technical debt(2), compliance issues(1)
  AMZN: 0.35 (3 contexts) - no growth opportunities(2), technical debt(1)
  LLY: 0.35 (5 contexts) - compliance issues(2), technical debt(1), false advertising(1)
  JNJ: 0.35 (4 contexts) - technical debt(2), disengagement(1), skills obsolescence(1)
  GILD: 0.35 (4 contexts) - safety concerns(2), compliance issues(2), false advertising(1)
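
Because the average above only covers flagged contexts, it says nothing about how many of a company's sampled contexts were flagged in the first place. The sketch below is one possible variation that tracks both numbers. It uses a small mock `tickers`/`results` pair with invented scores so it runs on its own, and the `stats` and `summary` structures are hypothetical.

```python
from collections import Counter, defaultdict

# Mock data (invented values): tickers[i] belongs to results[i], as in the code above
tickers = ["BAC", "BAC", "BAC", "V", "V"]
results = [
    {"negativity_score": 0.40, "sentiments": [("compliance issues", 0.5)]},
    {"negativity_score": 0.0, "sentiments": []},
    {"negativity_score": 0.34, "sentiments": [("technical debt", 0.4)]},
    {"negativity_score": 0.36, "sentiments": [("compliance issues", 0.5)]},
    {"negativity_score": 0.0, "sentiments": []},
]

# Track all sampled contexts per ticker, not only the flagged ones
stats = defaultdict(
    lambda: {"total": 0, "flagged": 0, "score_sum": 0.0, "labels": Counter()}
)
for ticker, r in zip(tickers, results):
    entry = stats[ticker]
    entry["total"] += 1
    if r["sentiments"]:
        entry["flagged"] += 1
        entry["score_sum"] += r["negativity_score"]
        entry["labels"].update(s[0] for s in r["sentiments"])

summary = {}
for ticker, e in stats.items():
    summary[ticker] = {
        "flagged_share": e["flagged"] / e["total"],
        "avg_flagged_score": e["score_sum"] / e["flagged"] if e["flagged"] else 0.0,
        "top_labels": e["labels"].most_common(3),
    }

for ticker, s in summary.items():
    print(f"{ticker}: {s['flagged_share']:.0%} flagged, "
          f"avg {s['avg_flagged_score']:.2f}, {s['top_labels']}")
```

Sorting on `flagged_share` instead of the average score gives a different ranking, which may be more informative when average scores are as close together as they are in the output above.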

Results Visualization

Top negative sentiments found in SEC 10K filings (counts from the 500-context sample):

Technical Debt: 46
Compliance Issues: 40
Lack of Transparency: 14
No Growth Opportunities: 14
Safety Concerns: 12
Undervalued: 11
Bureaucracy: 9

Company Analysis

Companies with the highest negativity scores in their 10K filings:

BAC (Bank of America)

3 contexts, negativity 0.37
Top labels: Compliance Issues, Technical Debt, Lack of Transparency

V (Visa)

5 contexts, negativity 0.36
Top labels: Compliance Issues (4), Bureaucracy, Technical Debt

GS (Goldman Sachs)

4 contexts, negativity 0.35
Top labels: Undervalued (2), Technical Debt (2), Compliance Issues

GILD (Gilead Sciences)

4 contexts, negativity 0.35
Top labels: Safety Concerns (2), Compliance Issues (2), False Advertising

JNJ (Johnson & Johnson)

4 contexts, negativity 0.35
Top labels: Technical Debt (2), Disengagement, Skills Obsolescence

Key Insights

Compliance Dominates

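"Compliance issues" appears among the top labels for all three financial services companies shown (BAC, V, GS), most prominently for Visa, where it was assigned 4 times across 5 flagged contexts. This reflects the heavily regulated nature of the industry.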

Technical Debt Everywhere

"Technical debt" was the most frequent label, with 46 occurrences in the 500-context sample. Companies frequently disclose technology infrastructure challenges and legacy system issues.

Healthcare Safety Focus

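Among the pharmaceutical companies in the sample (GILD, LLY, JNJ), only GILD shows "safety concerns" among its top labels, while GILD and LLY both show "compliance issues" and "false advertising". This is consistent with FDA oversight and product liability exposure, though the per-company samples are small (4 to 5 flagged contexts each).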
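
Sector-level observations like these can be checked by summing label counts per sector. The sketch below hard-codes the top-three label counts printed in the company comparison output, so it only sees each company's three most frequent labels, not its full label set. The `sectors` mapping follows the groupings named in this section, and the variable names are hypothetical.

```python
from collections import Counter

# Top-3 label counts copied from the company comparison output above
company_labels = {
    "BAC": {"compliance issues": 1, "technical debt": 1, "lack of transparency": 1},
    "V": {"compliance issues": 4, "bureaucracy": 1, "technical debt": 1},
    "GS": {"undervalued": 2, "technical debt": 2, "compliance issues": 1},
    "GILD": {"safety concerns": 2, "compliance issues": 2, "false advertising": 1},
    "LLY": {"compliance issues": 2, "technical debt": 1, "false advertising": 1},
    "JNJ": {"technical debt": 2, "disengagement": 1, "skills obsolescence": 1},
}

# Sector groupings as named in this section
sectors = {
    "financial services": ["BAC", "V", "GS"],
    "pharmaceuticals": ["GILD", "LLY", "JNJ"],
}

# Sum the label counts of every company in a sector
sector_totals = {}
for sector, members in sectors.items():
    totals = Counter()
    for ticker in members:
        totals.update(company_labels[ticker])
    sector_totals[sector] = totals

for sector, totals in sector_totals.items():
    print(sector, totals.most_common(3))
```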

Financial Analysis Use Cases

KeyNeg can help with various financial document analysis tasks:

Risk Assessment

Identify potential risks and red flags in 10K filings before they become material issues.

Due Diligence

Screen companies for negative language patterns during M&A or investment analysis.

Regulatory Compliance

Monitor disclosure language for compliance concerns across your portfolio.

Competitive Analysis

Compare negative sentiment patterns across competitors in the same industry.
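
As a rough illustration of the screening use cases above, the following sketch ranks contexts by negativity score and optionally filters by label. It assumes the analysis results have already been flattened into one row per context. The row layout, the `top_flagged` helper, and the sample values are hypothetical, and the use of `None` for contexts without a negative sentiment is an assumption of this sketch, not documented KeyNeg behavior.

```python
import heapq

# Hypothetical flattened rows: one per analyzed context (values are illustrative)
rows = [
    {"ticker": "BAC", "negativity_score": 0.38, "top_sentiment": "compliance issues"},
    {"ticker": "AMZN", "negativity_score": 0.35, "top_sentiment": "no growth opportunities"},
    {"ticker": "NVDA", "negativity_score": 0.0, "top_sentiment": None},
    {"ticker": "GILD", "negativity_score": 0.36, "top_sentiment": "safety concerns"},
]


def top_flagged(rows, n=3, label=None):
    """Return the n highest-scoring flagged rows, optionally limited to one label."""
    candidates = [
        r for r in rows
        if r["top_sentiment"] is not None
        and (label is None or r["top_sentiment"] == label)
    ]
    return heapq.nlargest(n, candidates, key=lambda r: r["negativity_score"])


# Due diligence: the two most negative contexts overall
for r in top_flagged(rows, n=2):
    print(r["ticker"], r["negativity_score"], r["top_sentiment"])

# Compliance monitoring: only contexts whose top label is "compliance issues"
print([r["ticker"] for r in top_flagged(rows, label="compliance issues")])
```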

Analyze Your Financial Documents

Use KeyNeg to extract risk indicators and compliance concerns from SEC filings, earnings calls, and financial reports.

Dataset: Financial Q&A 10K by Yousef Saeedian on Kaggle (7,000 Q&A pairs)