Learning Outcomes
Upon successful completion of the course, students will be able:
- to evaluate the quality of the data to be analyzed and apply the appropriate data pre-processing techniques,
- to select the appropriate data mining technique based on requirements and data type,
- to design and develop data warehouses,
- to use the appropriate data mining techniques and tools to extract knowledge from data collections,
- to evaluate the quality of data mining results.
Course Contents
- Introduction to the fundamental data mining concepts and techniques: main steps of knowledge discovery in databases (KDD), requirements for developing data mining approaches.
- Data pre-processing: data cleaning, transformation, dimensionality reduction.
- Data warehouses: multidimensional models, architecture, implementation of data warehouses, OLAP.
- Clustering: partitional, hierarchical, density-based, grid-based, spectral clustering, clustering applications.
- Classification: Bayesian classifiers, decision trees, k-nearest neighbors.
- Association rules: Apriori, representative association rules.
- Quality assessment in data mining: evaluation of classification models, interestingness measures for association rules, cluster validity.
- Web mining: link analysis, text mining, web search, PageRank.
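As an illustration of the association rules topic listed above, the following is a minimal sketch of level-wise frequent itemset mining in the style of Apriori. The function name `apriori`, the `min_support` parameter (an absolute transaction count) and the sample baskets are assumptions made for this example, not course material.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count the transactions containing each candidate, keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    frequent = count({frozenset([i]) for i in items})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

The prune step relies on the property that every subset of a frequent itemset is also frequent, which is what lets the search discard most candidates before counting them.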
Recommended Readings
- Han J. & Kamber M. (2006): Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann.
- Chakrabarti S. (2002): Mining the Web: Discovering Knowledge from Hypertext Data, Morgan Kaufmann Publishers.
Learning Outcomes
The aim of this course is to teach the fundamental concepts of information retrieval systems. The course contents cover all stages of system design and implementation for the collection, indexing and searching of text documents, as well as evaluation methods. Recent trends in information retrieval, such as information retrieval from the WWW, are also covered.
Upon successful completion of the course, students will be able:
- to describe the representation models used for text documents,
- to use techniques for indexing, compression, retrieval and scoring of documents,
- to develop applications that manage large volumes of text,
- to build the functionality of a search engine,
- to apply machine learning techniques for text classification.
Course Contents
- Introduction and basic IR concepts
- System architecture of IR systems
- Dictionaries and inverted indexes
- Construction and compression of dictionaries
- Information retrieval models (Boolean model, vector space model, probabilistic models)
- Scoring and ranking documents
- Language models
- Information retrieval from XML documents
- Basic concepts of information retrieval from the WWW
- Web crawling and indexing
- Text classification with machine learning techniques: text classification algorithms, support vector machines
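As an illustration of the inverted index, Boolean retrieval and scoring topics listed above, the following is a minimal sketch. The function names (`build_index`, `boolean_and`, `rank`), the whitespace tokenization and the sample documents are assumptions made for this example. The scoring is a simplified tf-idf sum without document length normalization, not a full vector space model.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_id: term frequency} (a simple inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def boolean_and(index, query):
    """Boolean model: ids of the documents that contain every query term."""
    postings = [set(index.get(t, {})) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()


def rank(index, n_docs, query):
    """Order documents by the sum of tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical three-document collection, for illustration only.
docs = {
    1: "web search engines crawl the web",
    2: "inverted index for text search",
    3: "compression of the inverted index",
}
index = build_index(docs)
matches = boolean_and(index, "inverted index")
ranking = rank(index, len(docs), "web search")
```

Here `matches` holds the documents containing both query terms, while `ranking` orders every document that contains at least one query term, so rarer terms such as "web" contribute more to the score than common ones.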
Recommended Readings
- Manning C. D., Raghavan P. & Schütze H. (2008): Introduction to Information Retrieval, Cambridge University Press.
- Baeza-Yates R. A. & Ribeiro-Neto B. (1999): Modern Information Retrieval, Addison-Wesley Longman, Boston, MA, USA.