Benchmarking Large Language Models for Knowledge Graph Construction in Prediction Markets

Type: MA/MP
Status: Open
Published: 11 March 2026
Supervisors: Reza Abtahi, Ahmad Abtahi, Nasim Nezhadsistani
Email: rabtahi@ifi.uzh.ch, abtahi@ifi.uzh.ch

1. Introduction
Decentralized prediction markets (e.g., Polymarket) rely on natural language contracts to define financial stakes. Unlike traditional finance, where assets have standardized identifiers (ISINs), prediction markets embed logic in free-form text (e.g., "Will Bitcoin hit $100k?" vs. "Will Bitcoin be above $90k?"). To detect pricing inefficiencies, these unstructured descriptions must be mapped to a formal Knowledge Graph (KG) whose edges represent probabilistic constraints (e.g., Subset, Mutually Exclusive, Complement).
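To make the edge semantics concrete, the following sketch shows one possible encoding of the three constraint types and the pricing check each one implies. The relation names and the `violates_constraint` helper are illustrative assumptions, not part of the thesis specification:

```python
from enum import Enum

class Relation(Enum):
    SUBSET = "subset"              # A true implies B true, so P(A) <= P(B)
    MUTUALLY_EXCLUSIVE = "mutex"   # A and B cannot both be true: P(A) + P(B) <= 1
    COMPLEMENT = "complement"      # exactly one of A, B is true: P(A) + P(B) == 1

def violates_constraint(rel: Relation, p_a: float, p_b: float,
                        eps: float = 1e-9) -> bool:
    """Return True if the quoted probabilities violate the edge's constraint,
    i.e. the market pair is mispriced relative to the KG."""
    if rel is Relation.SUBSET:
        return p_a > p_b + eps
    if rel is Relation.MUTUALLY_EXCLUSIVE:
        return p_a + p_b > 1 + eps
    if rel is Relation.COMPLEMENT:
        return abs(p_a + p_b - 1) > eps
    return False

# "Will Bitcoin hit $100k?" (A) is a Subset of "Will Bitcoin be above $90k?" (B):
# if A is priced above B, the pair is arbitrageable.
print(violates_constraint(Relation.SUBSET, 0.42, 0.38))  # True: P(A) > P(B)
```

In practice `eps` would be set to a tolerance reflecting fees and spread rather than a numerical epsilon, but the check itself is the core of how a KG turns free-form contracts into auditable constraints.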


2. Problem Statement
Extracting strict logical dependencies from text is prone to "structural hallucination," where models infer relationships that do not exist or miss obvious negations. While Large Language Models (LLMs) are capable of zero-shot reasoning, their reliability in high-stakes financial auditing is unproven. There is a lack of rigorous benchmarking comparing different LLM architectures (Encoder-Decoder vs. Decoder-Only) for converting financial natural language into rigid graph topologies.


3. Thesis Objectives & Tracks
The student will design and implement a benchmarking framework to evaluate LLMs as "Semantic Parsers."
a. Pipeline Engineering: Develop a modular Python pipeline that ingests raw Polymarket contract data and utilizes various LLM backends (e.g., GPT-4o, Llama-3, Mistral-7B via Ollama) to extract entity-relationship triplets.
b. Comparative Analysis: Evaluate models based on three specific metrics derived from neuro-symbolic research:

  •  Extraction Precision: Accuracy in classifying edge types (e.g., distinguishing Implication from Correlation).
  •  Structural Hallucination Rate: The frequency of generating non-existent logical constraints.
  •  Inference Efficiency: The trade-off between computational cost (latency/token usage) and logical accuracy.
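A minimal version of the extraction step in (a) might look like the sketch below. The prompt wording, the JSON output schema, and the `stub_llm` backend are illustrative assumptions; the point is that the pipeline treats every backend (GPT-4o, Llama-3 via Ollama, etc.) as an interchangeable prompt-to-text callable:

```python
import json
from typing import Callable

def build_prompt(markets: list[str]) -> str:
    """Assemble an extraction prompt for a batch of market questions."""
    return ("Given prediction-market questions, output a JSON list of "
            "triplets with keys head, relation, tail, using only the "
            "relations subset, mutually_exclusive, complement.\n\n"
            "Markets:\n" + "\n".join(markets))

def extract_triplets(markets: list[str],
                     llm: Callable[[str], str]) -> list[dict]:
    """Run one LLM backend over a batch of markets and parse its triplets.

    `llm` abstracts the backend: it takes a prompt string and returns the
    model's raw text completion.
    """
    raw = llm(build_prompt(markets))
    try:
        triplets = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed output counts against the model in the benchmark
    if not isinstance(triplets, list):
        return []
    # Keep only well-formed triplets with a known relation type.
    allowed = {"subset", "mutually_exclusive", "complement"}
    return [t for t in triplets
            if isinstance(t, dict) and t.get("relation") in allowed]

# A stub backend standing in for a real model, to show the interface:
def stub_llm(prompt: str) -> str:
    return json.dumps([{"head": "Will Bitcoin hit $100k?",
                        "relation": "subset",
                        "tail": "Will Bitcoin be above $90k?"}])

edges = extract_triplets(
    ["Will Bitcoin hit $100k?", "Will Bitcoin be above $90k?"], stub_llm)
print(edges[0]["relation"])  # subset
```

Treating unparsable output as an empty edge set (rather than crashing) keeps the benchmark comparable across models: a backend that cannot follow the schema simply scores zero recall on that instance.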

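The first two metrics in (b) can be computed directly from set comparisons against a gold-standard graph. The tuple encoding and the exact definitions below are one plausible operationalization, not a prescribed one; in particular, "hallucinated" is taken here to mean a predicted edge between a node pair that shares no gold edge at all, as distinct from a mistyped edge:

```python
def edge_metrics(predicted: set, gold: set) -> dict:
    """Score one model's extracted edges against a gold-standard graph.

    Edges are (head, relation, tail) tuples. Precision measures correctly
    typed edges; the hallucination rate is the share of predicted edges
    whose node pair has no gold edge of any relation type.
    """
    if not predicted:
        return {"precision": 0.0, "hallucination_rate": 0.0}
    correct = predicted & gold
    gold_pairs = {(h, t) for h, _, t in gold}
    hallucinated = {(h, r, t) for h, r, t in predicted
                    if (h, t) not in gold_pairs}
    return {"precision": len(correct) / len(predicted),
            "hallucination_rate": len(hallucinated) / len(predicted)}

gold = {("A", "subset", "B"), ("B", "complement", "C")}
pred = {("A", "subset", "B"),              # correct
        ("B", "mutually_exclusive", "C"),  # wrong relation type, not hallucinated
        ("A", "subset", "C")}              # hallucinated: no gold edge A-C
print(edge_metrics(pred, gold))  # precision = 1/3, hallucination_rate = 1/3
```

The third metric, inference efficiency, would be logged alongside these scores (wall-clock latency and token counts per backend) so the cost/accuracy trade-off can be plotted per model.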
4. Technical Stack
Languages: Python (LangChain, PyTorch/HuggingFace)
Data: Polymarket snapshots (Kaggle/The Graph)
Concepts: Prompt Engineering, Ontology Mapping, Graph Construction

References
[1] Pan, S., et al. (2024). Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering.
[2] Lee, C., et al. (2025). Knowledge graph construction for stock markets with LLM-based explainable reasoning. CIKM '25.
[3] Nie, J., Hou, X., & Song, W. (2024). Knowledge graph efficient construction: Embedding chain-of-thought into LLMs. VLDB 2024 Workshop.

Prerequisites

None