VIOLA - Video Consumption in Overlay Networks

VIOLA measures the BitTorrent network in a distributed manner. In contrast to existing BitTorrent measurement systems (cf. Section II), which typically take snapshots of the overlay network, VIOLA can monitor a large number of swarms over an extended period of time. VIOLA is deployed on one master node, which gives instructions to slave nodes deployed on smaller machines. The data gathered by the slaves is returned to the master and stored in a database. VIOLA discovers torrents from a torrent portal and starts measuring the torrents it discovers.

A paper using this data was presented at NOMS 2016.

If you have questions, contact Andri Lareida.

The VIOLA data model
Fig. 1: The VIOLA data model.

The data model used to persist the collected data follows the actual objects measured (see Fig. 1). The relational database consists of three tables: TORRENTS, ANNOUNCE RESULT, and PEERS. The TORRENTS table contains information about the torrent itself, such as title and size. It also contains metadata used in the measurement, such as the ACTIVE flag, which defines whether a torrent should still be measured. The ANNOUNCE RESULT table stores metadata from the announces executed by slaves, e.g., the IP address of the slave executing the announce, the tracker the reply was received from, and the number of seeders and leechers reported by that tracker. Since a torrent is identified by its info hash, the info hash is used as a foreign key to link the announce data to the torrent metadata. Finally, the IP addresses of peers returned by the tracker are stored in the PEERS table, which contains, among others, the IP address, port number, AS number, and country code. The AS number and country code are resolved through GeoIP databases from MaxMind.
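The three-table layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names follow the description, but the SQL types, the constraints, the `announce_id` column name, and the choice of SQLite are assumptions, not the actual VIOLA schema.

```python
import sqlite3

# Hypothetical schema. Names follow the text above; types and constraints
# are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE torrents (
    info_hash     TEXT PRIMARY KEY,  -- 160-bit info hash, hex encoded
    torrent_title TEXT,
    torrent_size  INTEGER,           -- size in bytes
    active        INTEGER            -- 1 while the torrent is still measured
);
CREATE TABLE announce_result (
    id          INTEGER PRIMARY KEY,
    info_hash   TEXT NOT NULL REFERENCES torrents(info_hash),
    tracker_uri TEXT,
    slave_ip    TEXT,
    seeders     INTEGER,
    leechers    INTEGER
);
CREATE TABLE peers (
    announce_id INTEGER NOT NULL REFERENCES announce_result(id),
    ip          TEXT,
    port        INTEGER,
    asnumber    INTEGER,
    country     TEXT
);
"""

def open_database(path=":memory:"):
    """Create the three tables in a fresh SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

conn = open_database()
info_hash = "a" * 40  # placeholder hash, not a real torrent
conn.execute("INSERT INTO torrents VALUES (?, ?, ?, ?)",
             (info_hash, "example", 1000, 1))
conn.execute("INSERT INTO announce_result VALUES (?, ?, ?, ?, ?, ?)",
             (1, info_hash, "udp://tracker.example:80", "192.0.2.1", 5, 7))
conn.execute("INSERT INTO peers VALUES (?, ?, ?, ?, ?)",
             (1, "198.51.100.7", 6881, 64500, "CH"))

# Follow the links: peer -> announce result -> torrent.
rows = conn.execute("""
    SELECT t.torrent_title, a.seeders, p.country
    FROM peers p
    JOIN announce_result a ON a.id = p.announce_id
    JOIN torrents t ON t.info_hash = a.info_hash
""").fetchall()
```

The join at the end shows how the info hash and the announce ID connect a peer observation back to the torrent it belongs to.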

April 2015 Data

Start: 07.04.2015 19:00
End: 20.04.2015 11:00
Slaves: 10
Interval: 20 min
Index: Kickass Torrents
Category: Movies

The measurement period started at 19:00 hours on April 7 and lasted until 11:00 hours on April 20. Ten VIOLA slaves were used, all located at the premises of the University of Zurich. The announce interval, i.e., the period at which each slave queries the trackers of each torrent, was 20 minutes. New torrents were discovered from the Kickass Torrents portal, and only torrents released after the start of the measurement were considered.
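As a quick plausibility check on the parameters above, the number of announce rounds that fit into the measurement period can be computed directly. The sketch assumes rounds run back to back for the whole period, so the result is an upper bound per torrent and slave; torrents discovered later or deactivated earlier will have fewer rounds.

```python
from datetime import datetime, timedelta

start = datetime(2015, 4, 7, 19, 0)    # 07.04.2015 19:00
end = datetime(2015, 4, 20, 11, 0)     # 20.04.2015 11:00
interval = timedelta(minutes=20)       # announce interval

duration = end - start
# Dividing two timedeltas with // yields an integer count.
max_rounds = duration // interval

print(duration)     # 12 days, 16:00:00
print(max_rounds)   # 912
```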

The data set is available as three comma-separated values (CSV) files; the download is about 50 GB in size. The columns of each file are described in the following tables:

Torrents CSV column description:

Column                 Type    Description
INFO_HASH              String  Hexadecimal representation of the 160-bit info hash.
TORRENT_TITLE          String  Title of the download.
TORRENT_SIZE           String  Size of the download in bytes.
TORRENT_TRACKER_COUNT  String  Number of trackers used.
TORRENT_COMMENT        String  Comment from the torrent portal.
PUBLISH_DATE           Number  Timestamp at which the torrent was first published on the portal.
MAGNET_URI             String  The magnet link for this torrent.
TIME_ADDED             Number  Timestamp from which the torrent was measured.
TIME_DEACTIVATED       Number  Timestamp at which VIOLA stopped measuring the torrent.
TORRENT_LINK           String  Link for downloading the torrent file.
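Since TORRENT_SIZE and TORRENT_TRACKER_COUNT are listed as strings although they hold numbers, a reader will probably want to convert them when loading the file. The following is a minimal sketch using Python's csv module. It assumes the file has a header row with the column names listed above, which is not confirmed here; the sample row is made up for illustration.

```python
import csv
import io

def read_torrents(fileobj):
    """Yield one dict per torrent, converting numeric string fields to int."""
    for row in csv.DictReader(fileobj):
        row["TORRENT_SIZE"] = int(row["TORRENT_SIZE"])
        row["TORRENT_TRACKER_COUNT"] = int(row["TORRENT_TRACKER_COUNT"])
        yield row

# Made-up sample with a subset of the columns; a real file would be opened
# with open(path, newline="") instead.
sample = io.StringIO(
    "INFO_HASH,TORRENT_TITLE,TORRENT_SIZE,TORRENT_TRACKER_COUNT\n"
    "0123456789abcdef0123456789abcdef01234567,Example,734003200,3\n"
)
torrents = list(read_torrents(sample))
```

Because the function is a generator, it can stream a large file row by row instead of loading it into memory at once.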

Announces CSV column description:

Column              Type     Description
ID                  Number   Used to link peer rows to the announce result.
INFO_HASH           String   Hexadecimal representation of the 160-bit info hash.
TRACKER_URI         String   URI of the tracker that returned this result.
INTERVAL_NUMBER     Number   Count of the request round of the VIOLA system.
ANNOUNCE_COMPLETED  Boolean  Indicates whether the announce request completed successfully.
SEEDERS             Number   Number of seeders in the swarm as reported by the tracker.
LEECHERS            Number   Number of leechers in the swarm as reported by the tracker.
TOTAL_PEERS         Number   Sum of reported seeders and leechers, which equals the swarm size.
RETURNED_PEERS      Number   Number of IP addresses returned by the tracker.
SLAVE_IP            String   IP address of the machine that queried the tracker.
SLAVE_PORT          Number   Port number of the connection used by the slave to contact the master.
TIMESTAMP           Number   Timestamp of the moment when this announce response was stored to disk.
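The relation between SEEDERS, LEECHERS, and TOTAL_PEERS described above can be checked while filtering out failed announces. The sketch below is illustrative only: the header row and the string encoding of the Boolean column (`TRUE_VALUES`) are assumptions, and the sample rows are invented.

```python
import csv
import io

# Assumption: booleans are serialized as one of these strings.
TRUE_VALUES = {"1", "true", "t", "yes"}

def completed_announces(fileobj):
    """Yield completed announces with integer counts and a consistency flag."""
    for row in csv.DictReader(fileobj):
        if row["ANNOUNCE_COMPLETED"].strip().lower() not in TRUE_VALUES:
            continue  # skip announce requests that did not complete
        seeders = int(row["SEEDERS"])
        leechers = int(row["LEECHERS"])
        yield {
            "id": int(row["ID"]),
            "seeders": seeders,
            "leechers": leechers,
            # TOTAL_PEERS is described as the sum of seeders and leechers.
            "total_matches": int(row["TOTAL_PEERS"]) == seeders + leechers,
        }

# Invented sample: one consistent row, one failed announce, one mismatch.
sample = io.StringIO(
    "ID,ANNOUNCE_COMPLETED,SEEDERS,LEECHERS,TOTAL_PEERS\n"
    "1,true,40,60,100\n"
    "2,false,0,0,0\n"
    "3,true,5,7,13\n"
)
announces = list(completed_announces(sample))
```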

Peers CSV column description:

Column       Type    Description
ID           Number  Used to link peer rows to the announce result.
INFO_HASH    String  Hexadecimal representation of the 160-bit info hash.
TIMESTAMP    Number  Timestamp of the moment when this announce response was stored to disk.
HEX_IP_HASH  String  Hexadecimal representation of the salted and hashed IP address.
PORT         Number  Port number of the peer.
ASNUMBER     Number  Number of the Autonomous System (AS).
CONTINENT    String  Continent code.
COUNTRY      String  Country code.
CITY         String  City name.
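Because the ID column links peer rows to announce results, the two files can be combined with a simple in-memory grouping. The sketch below works on already parsed rows (dicts keyed by the column names above); the sample values are invented, and for the full 50 GB data set a database or a streaming approach would likely be more appropriate.

```python
from collections import defaultdict

def peers_per_announce(announces, peers):
    """Pair each announce row with the peer rows that share its ID."""
    grouped = defaultdict(list)
    for peer in peers:
        grouped[peer["ID"]].append(peer)
    # .get() avoids creating empty entries for announces without peers.
    return [(a, grouped.get(a["ID"], [])) for a in announces]

# Invented sample rows with a subset of the columns.
announces = [
    {"ID": 1, "RETURNED_PEERS": 2},
    {"ID": 2, "RETURNED_PEERS": 0},
]
peers = [
    {"ID": 1, "HEX_IP_HASH": "aa", "PORT": 6881, "COUNTRY": "CH"},
    {"ID": 1, "HEX_IP_HASH": "bb", "PORT": 51413, "COUNTRY": "DE"},
]

joined = peers_per_announce(announces, peers)
for announce, rows in joined:
    # Compare the stored peer rows with the count the tracker returned.
    print(announce["ID"], announce["RETURNED_PEERS"], len(rows))
```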

Results

The data collected by VIOLA combines aspects of different measurement systems, so these results can now be obtained in a single measurement run. Two examples of data contained in the set are given here.

Swarm Composition
Fig. 3: Swarm composition of Fast and Furious 7.

Fig. 3 provides detailed insight into the composition of the largest swarm, fast7, as reported by trackers. A swarm consists of seeders (peers that have the complete file) and leechers (peers that are still downloading the file). It took three days after the release of the torrent until the numbers of seeders and leechers broke even. The number of seeders increases steadily, while the number of leechers decreases after the initial peak. This means that leechers become seeders and do not leave the system immediately after completing their download. Furthermore, the total number of peers increases again after April 15. Peers thus show altruistic behavior, and free riding is not a problem in this case.
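The break-even point mentioned above can be located in the announce data with a few lines of Python. The sketch assumes a series of (timestamp, seeders, leechers) samples for a single torrent, for example from one tracker; how reports from several trackers should be combined is left open here. The function name and the sample values are illustrative only.

```python
def break_even_time(samples):
    """Return the first timestamp at which seeders catch up with leechers.

    samples: iterable of (timestamp, seeders, leechers) tuples.
    Samples before the leechers first outnumber the seeders are ignored,
    so the initial seeder at release time does not count as break-even.
    Returns None if no break-even occurs.
    """
    leechers_led = False
    for timestamp, seeders, leechers in sorted(samples):
        if leechers > seeders:
            leechers_led = True
        elif leechers_led:
            return timestamp
    return None

# Invented series: timestamps are hours since release.
series = [(0, 1, 0), (24, 200, 1000), (48, 600, 800), (72, 900, 900), (96, 1200, 600)]
print(break_even_time(series))  # 72
```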

Daily Fluctuations
Fig. 4: Global distribution of measured peers.

Fig. 4 depicts the number of unique IP addresses measured per continent over the 24 hours of April 13. Although India had the most unique IP addresses on this day, Europe in total had more. The time zone patterns are clearly visible, even for continents with few IP addresses, e.g., Oceania (OC) and South America (SA). North America (NA) and SA are closely in sync, with their peak at 04:00 hours, followed by Asia (AS) and Europe (EU). Europe, spanning 3 hours of time difference, has the narrowest peak, while Asia, spanning 9 hours, has a very smooth peak. NA and the other continents with even fewer peers show smooth transitions as well.
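A count like the one in Fig. 4 can be approximated from the peers file by collecting distinct HEX_IP_HASH values per continent and hour. The sketch below assumes that TIMESTAMP holds Unix time in seconds and that hours are taken in UTC; neither the unit nor the time zone is stated here, so both should be checked against the data. The sample rows are invented.

```python
from collections import defaultdict
from datetime import datetime, timezone

def unique_ips_per_hour(peers):
    """Map (continent, hour of day in UTC) to the number of distinct hashed IPs."""
    seen = defaultdict(set)
    for peer in peers:
        # Assumption: TIMESTAMP is Unix time in seconds.
        hour = datetime.fromtimestamp(peer["TIMESTAMP"], tz=timezone.utc).hour
        seen[(peer["CONTINENT"], hour)].add(peer["HEX_IP_HASH"])
    return {key: len(ips) for key, ips in seen.items()}

base = int(datetime(2015, 4, 13, 4, 0, tzinfo=timezone.utc).timestamp())
sample_peers = [
    {"TIMESTAMP": base, "CONTINENT": "NA", "HEX_IP_HASH": "aa"},
    {"TIMESTAMP": base + 600, "CONTINENT": "NA", "HEX_IP_HASH": "aa"},   # same peer, same hour
    {"TIMESTAMP": base + 1200, "CONTINENT": "NA", "HEX_IP_HASH": "bb"},
    {"TIMESTAMP": base + 3600, "CONTINENT": "EU", "HEX_IP_HASH": "cc"},  # next hour
]
counts = unique_ips_per_hour(sample_peers)
```

Using a set per (continent, hour) key means a peer that appears in several announce rounds within the same hour is counted only once.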