Shedding Light on the Unknown — Leveraging OSINT and AI for Leadership Decisions
| Type | Status | Published | Supervisors | |
| --- | --- | --- | --- | --- |
| BA/MA/MP/IS | Assigned | 16 September 2025 | Andy Aidoo, Francisco Enguix | aidoo@ifi.uzh.ch, enguix@ifi.uzh.ch |
The history of cyber threats begins long before the widespread use of the modern internet. In 1988, the Morris worm became one of the first large-scale network incidents, spreading rapidly across ARPANET and forcing researchers and administrators to realize how fragile digital infrastructures really were. Around the same time, computer viruses emerged in Eastern Europe, spreading on floppy disks and exploiting trust within early personal computing environments. Over the past three decades, as more and more services, many of them critical, moved to interconnected digital systems, this trend has intensified dramatically. What were once isolated acts of curiosity have evolved into the work of Advanced Persistent Threats (APTs): highly coordinated groups that are often state-sponsored or linked to organized crime.
Open-Source Intelligence (OSINT) is one of the richest sources of cyber threat information available today. Security vendors, Computer Emergency Response Teams (CERTs), research blogs, government advisories, academic papers and even social media report on vulnerabilities, incidents and attack campaigns. This information, however, is often highly fragmented, unstructured, voluminous and noisy. For researchers and organisations, OSINT is nevertheless essential because it provides early insights into new threats. While technical experts are able to digest reports based on Structured Threat Information Expression (STIX) or Indicator of Compromise (IoC) feeds, top management, which is responsible for deciding budgets, approving security investments and prioritizing risks, typically cannot.
To address this communication gap, modern cyber threat intelligence (CTI) tools must include executive-friendly outputs, such as clear summaries, FAQs, chat-based exploration and visualizations that show trends and impacts at a glance. An optimal system is capable of extracting IoCs, threat actors, malware families and vulnerabilities from sources such as OSINT reports and security blog posts. Furthermore, the tool should be able to detect novel, emerging attack patterns and to summarize and translate technical reports into understandable pieces of information for a heterogeneous audience. Meeting these requirements calls for Artificial Intelligence (AI), more concretely natural language processing (NLP), machine learning (ML) based clustering, and the integration of large language models (LLMs) and agentic AI.
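As a rough illustration of the extraction step described above, the following minimal sketch pulls a few indicator types out of free text with regular expressions. The pattern set, the function name `extract_iocs` and the sample report are assumptions made for illustration only; they are not part of the project description, and a real system would likely combine such rules with NLP-based entity recognition for threat actors and malware families.

```python
import re

# Illustrative patterns only (assumption): real OSINT text also contains
# defanged indicators such as 203[.]0[.]113[.]42, which these do not handle.
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
}


def extract_iocs(text):
    """Return a dict mapping each indicator type to a sorted list of unique matches."""
    found = {}
    for name, pattern in IOC_PATTERNS.items():
        found[name] = sorted({m.group(0) for m in pattern.finditer(text)})
    return found


# Hypothetical report snippet; the IP address is from a documentation range.
sample_report = (
    "The sample beaconed to 203.0.113.42 and exploited CVE-2021-44228. "
    "Payload hash: " + "a" * 64
)
iocs = extract_iocs(sample_report)
for kind, values in iocs.items():
    print(kind, values)
```

Rule-based extraction of this kind is cheap and transparent, which makes it a plausible baseline against which learned extractors could be compared.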
Prerequisites
- Basic knowledge of malware, vulnerabilities, threat actors, and IoCs
- Ability to work with APIs
- Basic NLP skills
- Programming in Python
- Experience with matplotlib/Plotly or basic dashboards
- Understanding of how to evaluate ML/NLP models, basics of reproducibility
- Advanced cybersecurity concepts
- Experience fine-tuning transformer models (BERT, GPT, LLaMA), clustering methods, and embeddings
- Ability to clean, preprocess, and handle noisy/unstructured OSINT data at scale
- Knowledge of appropriate metrics for NLP tasks and clustering validity indices (e.g., Silhouette score)
- Skills in building a pipeline that integrates multiple AI components (scraper → extractor → summarizer → dashboard)
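
To make the clustering validity index named in the prerequisites more concrete, here is a minimal sketch of the Silhouette score in plain Python. It assumes Euclidean distance and assigns a score of 0 to singleton clusters; the toy points and labels are invented for illustration. In practice, a library implementation such as `sklearn.metrics.silhouette_score` would normally be used on report embeddings instead.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the given members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention assumed here for singleton clusters
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy 2-D "embeddings" (hypothetical): two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(f"good={good:.3f} bad={bad:.3f}")
```

Labels that follow the two groups score close to 1, while labels that cut across them score below 0, which is the behaviour one would look for when comparing clusterings of threat reports.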