Data Analytics

Professors	Christos Doulkeridis, Vassiliki Koufi
Course category	Core
Course ID	DS-529
Credits	5
Lecture hours	3 hours
Lab hours	2 hours
Digital resources	View on Aristarchus (Open e-Class)

Learning Outcomes

This course teaches methods and techniques for data analysis: visualization methods for data exploration, data modeling, data mining, and applications of data analysis. Its aim is to familiarize students with the concepts of data analysis and to help them acquire skills in managing and analyzing data sets in real-life applications.

Upon successful completion of the course, students will be able:

to understand the basic concepts of data analytics
to use tools and techniques for exploratory data analysis
to understand the properties and characteristics of a given data set
to solve practical problems of data analysis using real data sets
to model data analysis problems and use the model to draw conclusions about a given data set
to apply predictive models and algorithms to data sets

Course Contents

Introduction to data analysis: data, data types, data quality, data preprocessing, similarity measures, similarity of multidimensional data, string similarity, similarity between sets and lists, text similarity.
Univariate and bivariate analysis: visualization, histograms, cumulative distribution function, elements of descriptive statistics, measures of position and spread, correlation, alternative mapping techniques using plots.
Time-series analysis: trend, seasonality, noise, smoothing methods, moving averages, autocorrelation function, analyzing time-series in practice.
Introduction to predictive modeling: feature selection, entropy, information gain, decision trees.
Model fitting: linear models, linear regression, logistic regression, support vector machines, k-NN classification, Bayes classification.
Overfitting and model evaluation: classification algorithms, training, testing, evaluation, the problem of overfitting, fitting graph, holdout data, cross-validation, learning graph, evaluation metrics.
Clustering: the clustering problem, pre-processing and post-processing, clustering methods (center-seekers, tree builders, neighborhood growers).
Association analysis: frequent itemsets, the Apriori algorithm, association rules, maximal frequent itemsets.
Principal component analysis (PCA), the problem of finding important attributes, feature selection methods, application of PCA in practice.
Probability theory and statistics: Binomial distribution and Bernoulli trials, the significance of the Normal distribution, Central Limit Theorem, power-law distributions, methods for constructing generators of random data that follow a given distribution.
Anomaly detection: typical problems, characteristics of anomaly detection methods, proximity-based approaches, density-based approaches, clustering-based approaches, evaluation of anomaly detection methods.
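
As a rough illustration of the kind of techniques listed above, the following minimal Python sketch implements three of the named topics: similarity between sets (here, Jaccard similarity), a simple moving average for time-series smoothing, and the entropy of a list of class labels. It is not taken from the course material; the function names and the choice of Jaccard as the set-similarity measure are assumptions made for illustration only.

```python
import math
from collections import Counter


def jaccard_similarity(a, b):
    """Similarity between two sets: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


def moving_average(series, window):
    """Simple moving average; returns len(series) - window + 1 values."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, `jaccard_similarity({1, 2, 3}, {2, 3, 4})` compares two sets that share two of four distinct elements, and `entropy(["a", "a", "b", "b"])` measures the impurity of an evenly split two-class sample, the quantity that information gain for decision trees is built on.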

