Bitcoin Blockchain Data Processing Using the Apache Hadoop Framework

State: completed by Dominik Sommer

Published: 2018-06-18

The Bitcoin blockchain [3], introduced in 2009, is widely considered the first blockchain implementation, or at least one of the first. Since its public release, it has grown to more than 160 GB [2], and it continues to grow as public interest in cryptocurrencies increases. Its size is therefore becoming a challenge both for peers, who must store and maintain it, and for researchers, who can hardly process this much data without big-data frameworks such as Apache Hadoop [1].

Apache Hadoop is a framework designed to process massive amounts of data in parallel. It relies on the MapReduce programming model and on distributed storage to process such amounts of data efficiently. Using this framework might therefore help researchers process and analyze the Bitcoin blockchain to find patterns in, and gain insights about, transaction flows in the network.
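To make the MapReduce model mentioned above more concrete, the following is a minimal in-memory sketch in plain Java. It does not use the Hadoop API; it only imitates the three phases (map, shuffle, reduce) on a single machine. The record layout "txId,address,satoshis", the class name MapReduceSketch, and the sample data are assumptions made for illustration and do not come from the thesis.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceSketch {

    // Map phase: turn one input record into zero or more (key, value) pairs.
    // Assumed (hypothetical) record layout: "txId,address,satoshis".
    public static List<Map.Entry<String, Long>> map(String line) {
        String[] fields = line.split(",");
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        if (fields.length == 3) { // silently skip malformed records
            pairs.add(new SimpleEntry<>(fields[1].trim(), Long.parseLong(fields[2].trim())));
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key.
    // In Hadoop the framework performs this step across the cluster.
    public static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values of one key into a single result.
    public static long reduce(String address, List<Long> values) {
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in sequence: total amount received per address.
    public static Map<String, Long> run(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : shuffle(pairs).entrySet()) {
            totals.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("tx1,addrA,50", "tx2,addrB,20", "tx3,addrA,30");
        System.out.println(run(lines)); // prints {addrA=80, addrB=20}
    }
}
```

Because each map call looks at one record only and each reduce call at one key only, both phases can in principle be spread over many machines, which is the property a framework such as Hadoop exploits.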

The goal of this thesis is to investigate how Apache Hadoop can help in processing data gathered from the Bitcoin blockchain. In addition, a set of heuristics can be applied while processing the data to provide insights about the Bitcoin economy.
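The description does not name the heuristics in question. Purely as an illustration of what such a heuristic can look like, the sketch below implements the well-known common-input-ownership heuristic: all input addresses of one transaction are assumed to belong to the same owner, an observation already noted in [3]. The class name CommonInputClustering, the method names, and the sample addresses are hypothetical, and the heuristic itself is only an approximation (collaborative transactions such as CoinJoin violate it).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommonInputClustering {

    // Parent pointers of a union-find (disjoint-set) forest over addresses.
    private final Map<String, String> parent = new HashMap<>();

    // Returns the representative address of the cluster containing 'address'.
    public String find(String address) {
        parent.putIfAbsent(address, address);
        String root = address;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every visited node directly at the root.
        String current = address;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    // Merges the clusters of two addresses.
    public void union(String a, String b) {
        String rootA = find(a);
        String rootB = find(b);
        if (!rootA.equals(rootB)) {
            parent.put(rootA, rootB);
        }
    }

    // Heuristic: all inputs of one transaction are attributed to one owner.
    public void addTransaction(List<String> inputAddresses) {
        if (inputAddresses.isEmpty()) {
            return;
        }
        String first = inputAddresses.get(0);
        for (String address : inputAddresses) {
            union(first, address);
        }
    }

    public boolean sameCluster(String a, String b) {
        return find(a).equals(find(b));
    }

    public static void main(String[] args) {
        CommonInputClustering clusters = new CommonInputClustering();
        clusters.addTransaction(Arrays.asList("addrA", "addrB"));
        clusters.addTransaction(Arrays.asList("addrB", "addrC"));
        clusters.addTransaction(Arrays.asList("addrD"));
        System.out.println(clusters.sameCluster("addrA", "addrC")); // prints true
        System.out.println(clusters.sameCluster("addrA", "addrD")); // prints false
    }
}
```

This version keeps all state in one in-memory map, so it only shows the logic; on the full blockchain the clustering would have to be distributed or run in several passes.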

[1] Apache Software Foundation. Apache Hadoop 3.0.0. Available at https://hadoop.apache.org/docs/r3.0.0/ Accessed 01 Jun, 2018.
[2] Blockchain Luxembourg S.A. Blockchain Size, 2018. Available at https://blockchain.info/charts/blocks-size Accessed 01 Jun, 2018.
[3] S. Nakamoto. Bitcoin: A peer-to-peer electronic cash system. Available at http://bitcoin.org/bitcoin.pdf Accessed 09 May, 2018.

30% Design, 60% Implementation, 10% Documentation

Blockchain basics, Java, Linux

Supervisors: Dr. Eder John Scheid
