Data Analytics


Learning Outcomes

This course teaches methods and techniques for data analysis: visualization methods for data exploration, data modeling, data mining, and applications of data analysis and the use of data. The aim of the course is to familiarize students with the concepts of data analysis and to help them acquire skills in managing and analyzing data sets in real-life applications.

Upon successful completion of the course, students will be able:

  • to understand the basic concepts of data analytics
  • to use tools and techniques for exploratory data analysis
  • to understand the properties and characteristics of a given data set
  • to solve practical problems of data analysis using real data sets
  • to model problems concerning data analysis and use the model for drawing conclusions for any given data set
  • to apply predictive models and algorithms on data sets

Course Contents

  • Introduction to data analysis: data, data types, data quality, data preprocessing, similarity measures, similarity of multidimensional data, string similarity, similarity between sets and lists, text similarity.
  • Univariate and bivariate analysis: visualization, histograms, cumulative distribution function, elements of descriptive statistics, measures of position and spread, correlation, alternative mapping techniques using plots.
  • Time-series analysis: trend, seasonality, noise, smoothing methods, moving averages, autocorrelation function, analyzing time-series in practice.
  • Multivariate analysis: using visualization techniques for multivariate data analysis, the curse of dimensionality, empty space phenomenon, dimensionality reduction techniques.
  • Modeling: computations and estimations, model building, from descriptive modeling to mathematical modeling.
  • Probability theory and statistics: Binomial distribution and Bernoulli trials, the significance of the Normal distribution, Central Limit Theorem, power-law distributions, methods for constructing generators of random data that follow a given distribution.
  • Simulation: how simulations can be used for extracting information from data, Monte-Carlo simulations, simulation when analytical modeling is complex, model building with simulations, model validation with simulations.
  • Clustering: the problem of clustering, pre-processing and post-processing, clustering methods (center-seekers, tree builders, neighborhood growers).
  • Principal component analysis (PCA): the problem of finding important attributes, feature selection methods, application of PCA in practice.
  • Predictive analytics: the problem of classification, classification algorithms, training, testing, evaluation of classification results, techniques for improving precision.
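
As an informal illustration of two of the topics listed above (similarity between sets, and smoothing a time series with a moving average), a minimal Python sketch follows. The function names jaccard_similarity and moving_average are hypothetical and chosen only for this example; they are not taken from the course material. The Jaccard coefficient is one common set-similarity measure, and the course may use others.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumption made for this sketch: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def moving_average(values, window):
    """Simple (unweighted) moving average over a sliding window of fixed size."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # One output value per full window; the result is shorter than the input.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))
    print(moving_average([1, 2, 3, 4, 5], window=3))
```

A larger window smooths more aggressively but yields fewer output points, which is the usual trade-off when separating trend from noise.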

Bibliography

  • Mohammed J. Zaki, Wagner Meira Jr. (2014): Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press.
  • Jure Leskovec, Anand Rajaraman, Jeff Ullman (2014): Mining of Massive Datasets, 2nd edition. Cambridge University Press.
  • Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar (2016): Introduction to Data Mining. Pearson.
  • Philipp K. Janert (2011): Data Analysis with Open Source Tools. O’Reilly Media.