Labs

Installing and configuring all the software needed for this course on your own machine can be tedious. We have therefore prepared cloud-based virtual machines (VMs), one per student, which can be accessed remotely during the current semester. Please find here a comprehensive guide on how to connect to the VMs, which are provided by the Laboratory for Internet Computing (LInC).

A (not very) short introduction to Python can be found here.

Week | Description | Material
1 | No Lab |
2 | NLTK and Apache OpenNLP | LAB01.pdf, OpenNLP.zip
3 | NLTK in Python - Exercises | LAB02.pdf
4 | Public Holiday | NO LAB
5 | Apache Lucene | LAB03.pdf, dataset.zip, Lucene Example 1
6 | Apache Hadoop 1: Background information for MapReduce; Introduction to Hadoop (please read the message about the virtual machine above); http://hadoop.apache.org/ | LAB04.pdf, Hadoop 1 Source Code, Dataset, Hadoop 1 Solution
7 | Apache Hadoop 2 - Exercises | LAB05.pdf, Hadoop 2 Source Code -- WordCount.java, SalesJan2009.csv
8 | Apache Hadoop 3 - Exercises | LAB06.pdf, Hadoop 3 Source Code
9 | ElasticSearch | LAB07.pdf
10 | ElasticSearch - Exercises | LAB08.pdf, lab8.zip
11 | Apache Spark | LAB09.pdf, kmeans-example.py, network-word-count.py
12 | Apache Spark - Exercises | LAB10.pdf
13 | No Lab (project presentations week) |