Data Analytics

Professors Christos Doulkeridis
Konstantinos Moutselos
Course category Core
Course ID DS-529
Credits 5
Lecture hours 3 hours
Lab hours 2 hours
Digital resources View on Aristarchus (Open e-Class)

Learning Outcomes

This course teaches methods and techniques for data analysis: visualization methods for data exploration, data modeling, data mining, and applications of data analysis and of data use. The aim of the course is to familiarize students with the concepts of data analysis and to develop skills in managing and analyzing data sets in real-life applications.

Upon successful completion of the course, students will be able:

  • to understand the basic concepts of data analytics
  • to use tools and techniques for exploratory data analysis
  • to understand the properties and characteristics of a given data set
  • to solve practical problems of data analysis using real data sets
  • to model data analysis problems and use the model to draw conclusions about a given data set
  • to apply predictive models and algorithms on data sets

Course Contents

  • Introduction to data analysis: data, data types, data quality, data preprocessing, similarity measures, similarity of multidimensional data, string similarity, similarity between sets and lists, text similarity.
  • Univariate and bivariate analysis: visualization, histograms, cumulative distribution function, elements of descriptive statistics, measures of position and spread, correlation, alternative mapping techniques using plots.
  • Time-series analysis: trend, seasonality, noise, smoothing methods, moving averages, autocorrelation function, analyzing time-series in practice.
  • Introduction to predictive modeling: feature selection, entropy, information gain, decision trees.
  • Model fitting: linear models, linear regression, logistic regression, support vector machines, k-NN classification, Bayes classification.
  • Overfitting and model evaluation: classification algorithms, training, testing, evaluation, the problem of overfitting, fitting graph, holdout data, cross-validation, learning graph, evaluation metrics.
  • Clustering: the problem of clustering, pre-processing and post-processing, clustering methods, center-seekers, tree builders, neighborhood growers.
  • Association analysis: frequent itemsets, the Apriori algorithm, association rules, maximal frequent itemsets.
  • Principal component analysis (PCA): the problem of finding important attributes, feature selection methods, application of PCA in practice.
  • Probability theory and statistics: Binomial distribution and Bernoulli trials, the significance of the Normal distribution, Central Limit Theorem, power-law distributions, methods for constructing generators of random data that follow a given distribution.
  • Anomaly detection: typical problems, characteristics of anomaly detection methods, proximity-based approaches, density-based approaches, clustering-based approaches, evaluation of anomaly detection methods.

Recommended Readings

  • Mohammed J. Zaki, Wagner Meira Jr. (2014): Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press.
  • Jure Leskovec, Anand Rajaraman, Jeff Ullman (2014): Mining of Massive Datasets. Cambridge University Press, 2nd edition.
  • Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar (2016): Introduction to Data Mining. Pearson.
  • Philipp K. Janert (2011): Data Analysis with Open Source Tools. O’Reilly Media.