Data Warehouses and Data Mining

Learning Outcomes

Upon successful completion of the course, students will be able:

  • to evaluate the quality of the data to be analyzed and apply the appropriate data pre-processing techniques,
  • to select the appropriate data mining technique based on requirements and data type,
  • to design and develop data warehouses,
  • to use the appropriate data mining techniques and tools to extract knowledge from data collections,
  • to evaluate the quality of data mining results.

Course Contents

  • Introduction to fundamental data mining concepts and techniques: main steps of knowledge discovery from data, requirements for developing data mining approaches.
  • Data pre-processing: data cleaning, transformation, dimensionality reduction.
  • Data warehouses: multidimensional models, architecture, implementation of data warehouses, OLAP.
  • Clustering: partitional, hierarchical, density-based, grid-based, spectral clustering, clustering applications.
  • Classification: Bayesian classifiers, decision trees, k-nearest neighbors.
  • Association rules: Apriori, representative association rules.
  • Quality assessment in data mining: evaluation of classification models, interestingness measures for association rules, cluster validity.
  • Web mining: link analysis, text mining, web search, PageRank.

Recommended Readings

  • Han J. & Kamber M. (2006): Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann.
  • Chakrabarti S. (2002): Mining the Web: Discovering Knowledge from Hypertext Data, Morgan Kaufmann Publishers.

Pattern Recognition

Learning Outcomes

Pattern recognition is the scientific field concerned with assigning a label to a given input. One example is classification, which assigns each input to one of a given set of classes. The course aims to cover the pattern recognition techniques most widely used in the literature, as typically employed in practical applications such as speech and audio recognition, image and video analysis, biometrics, and bioinformatics. It covers the most commonly used classification algorithms, feature selection techniques, data transformation methods, and data clustering.

Students, upon successful completion of the course, will be able to:

A) Understand the key pattern recognition methodologies.

B) Analyze problems in various areas of application, such as voice and audio recognition, image and video analysis, biometrics and bioinformatics.

C) Choose the most suitable classifiers, feature selection methods, data transformations, and clustering techniques for a given problem.

D) Evaluate pattern recognition systems.


Course Contents

  • Introduction to pattern recognition systems
  • Parametric estimation of probability density functions (maximum likelihood estimation, maximum a posteriori estimation)
  • Bayesian classifiers and Bayesian Networks
  • k-nearest neighbor
  • Nonparametric estimation of probability density functions (Parzen windows)
  • Linear classifiers, nonlinear classifiers. Perceptron algorithm. Multilayer neural networks, deep learning
  • Unsupervised pattern recognition – clustering
  • Feature generation: contour representation and contour tracing, chain code, polygon, signatures, linear transforms, Fourier transform, regional features, image recognition, bias and variance, texture
  • Feature selection and kernels
  • Pattern recognition tools

Recommended Readings

  • Theodoridis S. & Koutroumbas K. (2008): Pattern Recognition, 4th Edition, Academic Press.