Signal Monitor: Commercial Intelligence for the Events Sector
An automated monitoring pipeline that replaces hours of manual industry tracking with a structured five-minute weekly brief. Built for the UK exhibitions and events sector, it monitors over thirty companies across website, RSS, LinkedIn, and industry news sources, extracting commercially relevant signals using AI classification and delivering a formatted intelligence report to Discord each Monday morning.
The Problem
In the UK exhibitions sector, the moves that matter tend to happen quietly. A competitor picks up a niche organiser. A new commercial director arrives from a rival. A venue operator announces a show format that overlaps with yours. None of this lands in a single place, and by the time it surfaces in a general news feed, the window for responding has often passed.
Fragmented sources
Relevant signals appear across company newsrooms, industry publications such as Exhibition News, RSS feeds from trade bodies, LinkedIn posts by senior executives, and events calendars. There is no single feed or database that aggregates this at a useful level of specificity.
High noise, low signal
Even when a team subscribes to all the relevant feeds, the volume of content is significant. Most of it is irrelevant: event recaps, generic marketing copy, sponsor announcements, procedural updates. Separating the commercially material items from the noise is itself a time-consuming task.
Manual tracking does not scale
A researcher tracking thirty companies across four or five source types per company is dealing with 120 to 150 distinct data points per week. Keeping that up, every week, without gaps, is not a realistic expectation of any individual. Coverage drifts. Sources get missed. Signals arrive late or not at all.
Time sensitivity
Signals like a new hire, a confirmed acquisition, or a market entry announcement have a short window of relevance. Intelligence that arrives late, or is buried in an unsorted inbox, has diminished value.
The Solution
The goal was to take humans out of the loop between source and report. That meant building ingestion, classification, and delivery as a single automated pipeline with no manual steps between them.
Automated weekly monitoring
The pipeline runs on a fixed schedule, Monday morning London time, covering all target companies in a single pass, with no manual intervention required.
Multi-source aggregation
For each company, the system pulls from multiple source types: RSS feeds, company websites, LinkedIn posts via search APIs, and events-specific content. Industry news sources are treated as global feeds and scanned for company mentions.
AI-powered signal classification
A two-stage language model pipeline handles extraction and reporting. Stage 1 identifies and categorises signals using strict criteria. Stage 2 synthesises the output into a structured weekly brief.
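The separation between the two stages can be sketched as a small orchestration function. This is a minimal illustration, not the production code: `run_pipeline`, `Signal`, and the two stage callables are hypothetical stand-ins for whatever wraps the actual model calls. The point it shows is structural: Stage 2 only ever receives Stage 1's structured output, never the raw fetched content.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Signal:
    company: str
    category: str
    confidence: float
    summary: str


# Hypothetical stage signatures: Stage 1 maps (company, raw_text) to structured
# signals; Stage 2 maps structured signals to a formatted brief.
ExtractFn = Callable[[str, str], list[Signal]]
ReportFn = Callable[[list[Signal]], str]


def run_pipeline(items: Iterable[tuple[str, str]], extract: ExtractFn, report: ReportFn) -> str:
    """Run Stage 1 over (company, raw_text) pairs, then pass only structured signals to Stage 2."""
    signals: list[Signal] = []
    for company, raw_text in items:
        signals.extend(extract(company, raw_text))
    # Stage 2 never sees raw_text: it works from the Signal records alone.
    return report(signals)
```

Keeping the stages behind plain callables also makes each one easy to test in isolation with a fake in place of the model.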
Structured output and archival
The final report is delivered to Discord in a readable format, with a PDF archive generated alongside it. Operational summaries and source health alerts are posted to dedicated channels.
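Discord caps the content of a single message at 2,000 characters, so a full weekly brief usually has to be split before posting. A minimal sketch, assuming the brief is plain text with line breaks; `chunk_message` is a hypothetical helper, and the HTTP posting and rate-limit handling are deliberately left out.

```python
DISCORD_LIMIT = 2000  # Discord's per-message content limit, in characters


def chunk_message(text: str, limit: int = DISCORD_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring line boundaries."""
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit is hard-split on its own.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        if len(current) + len(line) > limit:
            # Adding this line would overflow: close the current chunk first.
            chunks.append(current)
            current = line
        else:
            current += line
    if current:
        chunks.append(current)
    return chunks
```

Splitting on line boundaries keeps category headings and bullet points intact across messages; the hard split is only a fallback for unusually long lines.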
Architecture
The system is structured as a sequential pipeline with clearly separated concerns: fetch, classify, deduplicate, report.
- Checksum filtering. Content is checksummed on ingestion; unchanged content is skipped, avoiding redundant processing and unnecessary token usage.
- Stage 1: Signal extraction. A structured prompt extracts signals with category, confidence, summary, commercial significance, evidence, and named entities. Low-confidence signals are discarded early.
- Entity-level deduplication. Signals are matched on company, category, primary entity, and time bucket, so multi-source announcements collapse to a single signal. Deduplication state persists on a 12-week rolling basis.
- Stage 2: Report generation. Structured signals are transformed into a formatted, category-based weekly brief. This stage handles synthesis and formatting only, not classification.
- Output layer. Discord delivery with chunking and rate-limit handling; PDF generation via ReportLab; operational logging and source health alerts posted to dedicated channels.
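The entity-level deduplication step can be sketched as follows. Several details here are assumptions the source does not specify: the time bucket is taken to be the ISO week of the signal's publication date (so the same announcement picked up in consecutive runs maps to the same key), names are normalised by lower-casing and collapsing whitespace, and a plain dict stands in for whatever persistent store the pipeline actually uses.

```python
from datetime import date, timedelta

RETENTION = timedelta(weeks=12)  # 12-week rolling persistence window


def _norm(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variations match."""
    return " ".join(text.lower().split())


def dedup_key(company: str, category: str, entity: str, published: date) -> tuple[str, str, str, str]:
    """Entity-level key: company, category, primary entity, and an ISO-week time bucket."""
    year, week, _ = published.isocalendar()
    return (_norm(company), category, _norm(entity), f"{year}-W{week:02d}")


def dedup(signals: list[dict], store: dict[tuple, date], today: date) -> list[dict]:
    """Return only unseen signals, recording them in `store` and pruning expired keys."""
    # Forget keys first recorded more than 12 weeks ago.
    for key, first_seen in list(store.items()):
        if today - first_seen > RETENTION:
            del store[key]
    fresh = []
    for s in signals:
        key = dedup_key(s["company"], s["category"], s["entity"], s["published"])
        if key in store:
            continue  # already reported: another source this week, or a previous run
        store[key] = today
        fresh.append(s)
    return fresh
```

Bucketing on publication date rather than run date is what lets a story that resurfaces the following Monday still collide with its earlier key.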
Signal categories
Stage 1 classifies against predefined categories to ensure consistency across runs: executive appointments, mergers and acquisitions, product launches, and expansion activity. Signals that do not meet confidence thresholds are discarded before persistence.
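Validating Stage 1 output against a fixed taxonomy and a confidence threshold might look like this. The category identifiers, the `0.7` threshold, and the raw field names are illustrative assumptions, not values taken from the production system.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    EXECUTIVE_APPOINTMENT = "executive_appointment"
    MERGER_ACQUISITION = "merger_acquisition"
    PRODUCT_LAUNCH = "product_launch"
    EXPANSION = "expansion"


@dataclass(frozen=True)
class Signal:
    company: str
    category: Category
    confidence: float
    summary: str


MIN_CONFIDENCE = 0.7  # hypothetical threshold


def parse_and_filter(raw: list[dict], min_confidence: float = MIN_CONFIDENCE) -> list[Signal]:
    """Validate raw Stage 1 output; drop off-taxonomy, malformed, and low-confidence signals."""
    kept: list[Signal] = []
    for item in raw:
        try:
            category = Category(item["category"])  # ValueError if not a known category
            confidence = float(item["confidence"])
        except (KeyError, ValueError, TypeError):
            continue  # malformed or off-taxonomy output is discarded, not guessed at
        if confidence < min_confidence:
            continue
        kept.append(Signal(item.get("company", ""), category, confidence, item.get("summary", "")))
    return kept
```

Rejecting unknown categories outright, instead of mapping them to a catch-all, is what keeps the taxonomy consistent from run to run.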
Scheduling and operation
The pipeline runs on a weekly cadence using Python’s schedule library, managed in production via system services. A dry-run mode supports safe validation before deploying configuration changes. New companies are added via YAML without any code changes.
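Config-driven onboarding and a dry-run switch can be sketched together. This assumes each YAML entry has already been parsed (for example with PyYAML's `yaml.safe_load`) into a mapping; the field names `name`, `website`, `rss`, and `aliases`, and the `deliver` helper, are hypothetical rather than the production schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# The kind of YAML entry this expects (field names are hypothetical):
#   - name: Example Events Ltd
#     website: https://example.com/news
#     rss: https://example.com/feed.xml
#     aliases: [Example Events]


@dataclass
class CompanyConfig:
    name: str
    website: Optional[str] = None
    rss: Optional[str] = None
    aliases: list[str] = field(default_factory=list)


def load_companies(parsed: list[dict]) -> list[CompanyConfig]:
    """Build CompanyConfig objects from parsed YAML, rejecting entries without a name."""
    companies = []
    for entry in parsed:
        if not entry.get("name"):
            raise ValueError(f"company entry missing 'name': {entry!r}")
        companies.append(CompanyConfig(
            name=entry["name"],
            website=entry.get("website"),
            rss=entry.get("rss"),
            aliases=list(entry.get("aliases", [])),
        ))
    return companies


def deliver(report: str, dry_run: bool, send: Callable[[str], None]) -> bool:
    """Send the report unless in dry-run mode; return True if it was sent."""
    if dry_run:
        print(report)  # dry run: show the output locally and post nothing
        return False
    send(report)
    return True
```

Failing fast on a malformed entry means a bad config edit is caught during a dry run, before it can silently drop a company from Monday's brief.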
AI Integration
Language models are used selectively, only where deterministic approaches break down. The signals targeted do not follow predictable linguistic patterns and require contextual judgement; a rules-based classifier would be brittle and expensive to maintain.
Stage 1: Signal extraction
The extraction layer is designed for high selectivity. The prompt enforces strict criteria: signals must be recent, commercially relevant, and actionable. Low-confidence outputs are discarded before persistence. Global industry news sources are processed with company attribution logic, allowing a single feed to contribute signals across multiple monitored organisations.
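The source does not describe how company attribution works for global feeds. One plausible deterministic pre-filter is whole-word, case-insensitive matching of each company's name and aliases, sketched below; `attribute_mentions` and the alias mapping are hypothetical, and the production logic may rely on the model instead.

```python
import re


def attribute_mentions(article: str, companies: dict[str, list[str]]) -> list[str]:
    """Return the monitored companies mentioned in an article from a global feed.

    `companies` maps a canonical company name to its aliases. Matching is
    case-insensitive and whole-word, so "Acme" does not match "Acmeville".
    """
    matched = []
    for name, aliases in companies.items():
        terms = [name, *aliases]
        # Lookarounds instead of \b so terms ending in punctuation (e.g. "Co.") still work.
        pattern = r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)"
        if re.search(pattern, article, flags=re.IGNORECASE):
            matched.append(name)
    return matched
```

A match like this only decides which companies an article is a candidate for; whether it contains a signal is still left to Stage 1.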
Stage 2: Report generation
Stage 2 operates on structured data, not raw content. It performs formatting and synthesis only, presenting signals coherently within the weekly brief format, not making classification decisions. This separation keeps each stage’s responsibility narrow and its output predictable.
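A deterministic step that could sit between the stages is grouping Stage 1 signals by category and serialising them for the Stage 2 prompt. The key names and JSON shape below are assumptions; the source says only that Stage 2 receives structured data rather than raw content.

```python
import json
from collections import defaultdict


def build_stage2_payload(signals: list[dict]) -> str:
    """Group Stage 1 signals by category into a JSON document for the Stage 2 prompt.

    Each signal is assumed to carry 'category', 'company', and 'summary' keys.
    Categories and companies are sorted so identical inputs produce identical prompts.
    """
    grouped: dict[str, list[dict]] = defaultdict(list)
    for s in signals:
        grouped[s["category"]].append({"company": s["company"], "summary": s["summary"]})
    ordered = {cat: sorted(items, key=lambda i: i["company"]) for cat, items in sorted(grouped.items())}
    return json.dumps(ordered, indent=2, ensure_ascii=False)
```

Doing the grouping in code leaves the model with presentation work only, which is the narrow responsibility described above.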
The Journey
The current architecture did not arrive fully formed. Each phase fixed a failure mode in what came before.
- Phase 1: Manual monitoring. Manual review of company websites and trade publications. Output depended on who had time that week, which meant inconsistent cadence and gaps in coverage.
- Phase 2: Feed aggregation. RSS feeds and newsletters cut the navigation overhead. But the volume of content grew faster than the ability to filter it: more to read, not more to act on.
- Phase 3: Web scraping. Scraping extended coverage beyond what feeds provided. Page structure varies, access constraints are real, and getting stable extraction working took iteration. Ingestion eventually settled.
- Phase 4: AI classification. AI classification moved the bottleneck. Human review became the exception, only needed when the model flagged low-confidence signals or a source went dark. Coverage across all thirty companies became consistent for the first time.
- Phase 5: Accuracy and efficiency refinements. Entity-level deduplication collapsed multi-source announcements into single signals. Checksum-based filtering meant unchanged content skipped processing entirely, cutting token usage and run time.
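Checksum-based filtering can be sketched with the standard library's `hashlib`. Assumptions not stated in the source: SHA-256 as the hash, whitespace normalisation before hashing (so cosmetic reflows do not count as changes), and a plain dict keyed by source URL standing in for the persisted checksum store. `content_checksum` and `filter_changed` are hypothetical names.

```python
import hashlib


def content_checksum(text: str) -> str:
    """SHA-256 hex digest of whitespace-normalised content."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def filter_changed(items: dict[str, str], seen: dict[str, str]) -> dict[str, str]:
    """Return only the sources whose content changed since the last run.

    `items` maps a source URL to its fetched text; `seen` maps a source URL to the
    checksum recorded on the previous run and is updated in place.
    """
    changed: dict[str, str] = {}
    for url, text in items.items():
        digest = content_checksum(text)
        if seen.get(url) != digest:
            changed[url] = text
            seen[url] = digest
    return changed
```

Only the returned sources go on to Stage 1, so an unchanged page costs one hash computation instead of a model call.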
Challenges and Trade-offs
| Challenge | Approach |
|---|---|
| Signal precision versus coverage. Aggressive extraction increases recall but reduces reliability. | System prioritises high-confidence signals, accepting that some lower-confidence content is excluded. Precision over recall. |
| Source reliability. Feeds and websites change unpredictably, creating silent coverage gaps. | Failure tracking surfaces gaps early via operational summaries posted to a dedicated Discord channel after each run. |
| LinkedIn constraints. Direct API access is heavily restricted. | Indirect retrieval via Serper search API with a scrape fallback, introducing dependency on third-party indexing latency. |
| LLM cost versus quality. Higher-capability models improve extraction but increase operating cost. | Multi-provider fallback chain balances cost and output quality, with providers selected based on performance and reliability per task type. |
| Batch scheduling. Weekly cadence limits recovery from a failed run to the next scheduled execution. | Weekly execution simplifies operation and monitoring. The trade-off is accepted: the value window for most signals spans days, not hours. |
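A multi-provider fallback chain reduces to trying providers in order until one returns usable output. A minimal sketch: the provider names and callables are hypothetical wrappers around real model APIs, and the ordering (cheaper first or stronger first) would be a per-task choice as described in the table.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain fails or returns nothing."""


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Try each (name, call) provider in order; return (provider_name, output) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            output = call(prompt)
        except Exception as exc:  # any provider failure moves on to the next in the chain
            errors.append(f"{name}: {exc}")
            continue
        if output and output.strip():
            return name, output
        errors.append(f"{name}: empty response")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the output makes it possible to log which model actually produced each week's extraction.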
Impact
Signal Monitor runs in production for Momentum Works. Each Monday morning, the pipeline processes over thirty companies across four source types and posts a structured intelligence brief to Discord with no human involved between source pull and report delivery.
What previously took several hours of manual checking takes the pipeline minutes. The brief itself takes under five minutes to read. Teams doing this work by hand were not keeping up with thirty companies at any consistent frequency; now coverage runs every week without gaps.
The rolling deduplication store means the same announcement does not appear twice across consecutive weeks. Archived reports are available for audit. When a source fails, it surfaces in a dedicated Discord channel the same morning, rather than being discovered retroactively when someone notices a company has gone quiet.
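Surfacing failed sources the same morning needs some per-source failure state carried between runs. A minimal sketch, assuming a simple consecutive-failure counter; `SourceHealth`, its `threshold`, and the alert wording are hypothetical, and posting the alert lines to Discord is left out.

```python
from dataclasses import dataclass, field


@dataclass
class SourceHealth:
    """Tracks consecutive fetch failures per source across runs (hypothetical structure)."""

    failures: dict[str, int] = field(default_factory=dict)
    threshold: int = 1  # alert on the first failed run; raise to tolerate flaky sources

    def record(self, source: str, ok: bool) -> None:
        """Reset the counter on success, increment it on failure."""
        self.failures[source] = 0 if ok else self.failures.get(source, 0) + 1

    def alerts(self) -> list[str]:
        """Lines suitable for posting to a source-health channel after a run."""
        return [
            f"{src}: {n} consecutive failed run(s)"
            for src, n in sorted(self.failures.items())
            if n >= self.threshold
        ]
```

Counting consecutive failures, rather than alerting only on the current run, distinguishes a one-off timeout from a source that has been dark for weeks.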
Future Enhancements
- Event-to-CRM integration. Extending the events capture pipeline to push structured event data into a CRM via API, enabling agencies to automatically track upcoming shows, align outreach with event timelines, and embed market activity directly into commercial workflows.
- Historical trend analysis. Extending the signal store to support time-series insights across companies and categories.
- Entity resolution. Normalising entity references across signals to improve cross-source aggregation accuracy.
- Alert prioritisation. Introducing urgency tiers to surface high-impact signals (such as M&A activity) above the standard weekly cadence.
- Feedback loops. Capturing user feedback to refine extraction quality and signal taxonomy over time.
- Dashboard. Providing a navigable interface for exploring historical signals beyond weekly reports.
- Additional sources. Expanding ingestion to include podcasts, press syndication feeds, and investor relations content.
Need a system like this?
The underlying approach, pulling from fragmented sources, classifying for commercial relevance, and delivering on a fixed schedule, applies well beyond the events sector. Any market with enough public signal and not enough time to watch it is a candidate.
If you are tracking competitors manually, or have a market you cannot follow closely enough, get in touch.