Identifying Data 2024/25
Subject (*) Data Engineering Code 614544002
Study programme
Máster Universitario en Intelixencia Artificial
Descriptors
Cycle: Official Master's Degree
Period: 1st four-month period
Year: First
Type: Obligatory
Credits: 3
Language
English
Teaching method Face-to-face
Prerequisites
Department Ciencias da Computación e Tecnoloxías da Información
Coordinator
Bernardo Roca, Guillermo de
E-mail
guillermo.debernardo@udc.es
Lecturers
Bernardo Roca, Guillermo de
E-mail
guillermo.debernardo@udc.es
Web
General description The aim of the course is to introduce the basic aspects of data engineering, mainly in the
field of Big Data. The competences acquired will enable the efficient analysis and management of
heterogeneous information, both structured and unstructured, within the development of AI applications,
wherever traditional methods prove insufficient.

Competencies / Study results
Code Study programme competences / results
A17 CE16 - Knowledge of the process and tools for processing and preparing data, from their acquisition, extraction, and cleansing to their transformation, loading, organisation and access
B2 CG02 - Successfully addressing each and every stage of an AI project
B3 CG03 - Searching for and selecting the useful information required to solve complex problems, with a confident handling of bibliographical sources in the field
B4 CG04 - Suitably elaborating written essays or motivated arguments, including some point of originality, writing plans, work projects, scientific papers and formulating reasonable hypotheses in the field
B5 CG05 - Working in teams, especially of multidisciplinary nature, and being skilled in the management of time, people and decision making
B6 CB01 - Acquiring and understanding knowledge that provides a basis or opportunity to be original in the development and/or application of ideas, frequently in a research context
B7 CB02 - The students will be able to apply the acquired knowledge and to use their capacity of solving problems in new or poorly explored environments inside wider (or multidisciplinary) contexts related to their field of study
B8 CB03 - The students will be able to integrate different pieces of knowledge, to face the complexity of formulating opinions (from information that may be incomplete or limited) and to include considerations about social and ethical responsibilities linked to the application of their knowledge and opinions
C3 CT03 - Use of the basic tools of Information and Communications Technology (ICT) required for the student's professional practice and lifelong learning
C7 CT07 - Developing the ability to work in interdisciplinary or cross-disciplinary teams to provide proposals that contribute to a sustainable environmental, economic, political and social development
C8 CT08 - Appreciating the importance of research, innovation and technological development in the socioeconomic and cultural progress of society
C9 CT09 - Being able to manage time and resources: outlining plans, prioritising activities, identifying the critical ones, fixing deadlines and sticking to them

Learning aims
Learning outcomes Study programme competences / results
RA1: Develop the capacity to analyse and model data for processing in intelligent systems. AC16 BC6 BC7 CC3 CC9
RA2: Know and understand the process of extraction, cleaning, transformation, load and preprocessing of data. AC16 BC2 BC3 BC8 CC3 CC7 CC9
RA3: Know and learn how to use multidimensional and NoSQL databases. BC3 BC4 BC7 CC8
RA4: Know the foundations of data lakes and data warehouses. BC2 BC5 BC7 BC8 CC3 CC7 CC8

Contents
Topic: Sub-topic
Concepts and foundations of Data Engineering: Concepts and basic definitions, problems of efficient data load in Big Data scenarios, massive data storage and access.
Techniques of data cleaning and preparation: Common techniques. Definition of processing flows. Quality metrics.
Efficient advanced structures and data warehouses for Big Data: Data warehouses and multidimensional databases, data lakes, NoSQL databases.

Planning
Methodologies / tests Competencies / Results Teaching hours (in-person & virtual) Student’s personal work hours Total hours
Guest lecture / keynote speech B4 B5 C3 C9 12 0 12
Laboratory practice A17 B2 B5 B7 C3 10 30 40
Mixed objective/subjective test A17 B2 B3 B6 B7 B8 C7 C8 3 20 23
 
Personalized attention 0 0 0
 
(*)The information in the planning table is for guidance only and does not take into account the heterogeneity of the students.

Methodologies
Methodologies Description
Guest lecture / keynote speech The teacher presents selected topics so that students acquire
information relevant to a specific scope.

CONTINUOUS EVALUATION:
Mandatory activity
Optional attendance
GLOBAL EVALUATION:
Mandatory activity
Laboratory practice One or more practical problems whose solution requires understanding and
applying the theoretical and practical contents covered by the course.
Students may work on the proposed problems individually or in groups.
CONTINUOUS EVALUATION:
Mandatory activity
Mandatory attendance (min. 75% of lab practices)
GLOBAL EVALUATION:
Mandatory activity
Mixed objective/subjective test The exam covers all the topics of the course. Students must develop,
relate, organise and present the knowledge they have on each given
topic in a reasoned and well-articulated answer. Learning results
evaluated: RA1, RA3, RA4

Personalized attention
Methodologies
Guest lecture / keynote speech
Laboratory practice
Description
Questions about the methodologies and case studies discussed in class will be addressed (lectures).

Questions about the case studies to be analyzed will be addressed (labs).


Assessment
Methodologies Competencies / Results Description Qualification
Mixed objective/subjective test A17 B2 B3 B6 B7 B8 C7 C8 The exam covers all the topics of the course. Students must develop, relate, organise and present the knowledge they have on each given topic in a reasoned and well-articulated answer. Learning results evaluated: RA1, RA3, RA4 40
Laboratory practice A17 B2 B5 B7 C3 Several laboratory assignments designed to assess understanding of the material presented in the theory and/or practical classes. Learning results evaluated: RA3, RA4 60
 
Assessment comments

CONTINUOUS EVALUATION SYSTEM

  • Lab. practice

Qualification: 60%

To pass this part of the course, the student must obtain a grade equal to or greater than 5 points (out of 10).

  • Exam (mixed objective/subjective test)

Qualification: 40%

To pass this part of the course, the student must obtain a grade equal to or greater than 5 points (out of 10).

GLOBAL EVALUATION SYSTEM

Procedure for choosing the global evaluation modality: students are considered to have chosen the global evaluation system if they do not take part 1 (lab practice) of the continuous evaluation system.

  • Lab. practice

Qualification: 60%

To pass this part of the course, the student must obtain a grade equal to or greater than 5 points (out of 10).

  • Exam

Qualification: 40%

To pass this part of the course, the student must obtain a grade equal to or greater than 5 points (out of 10).

EVALUATION CRITERIA FOR EXTRAORDINARY AND END-OF-DEGREE CALLS

The continuous and global evaluation systems described above will be used.

MINUTES QUALIFICATION PROCESS

Regardless of the evaluation system and the call, if a student fails any part of the evaluation but the overall score is higher than 4 (out of 10), the grade recorded in the minutes will be 4.
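
As an illustration only, the rules above can be sketched as a short Python function. The weights (60% lab practice, 40% exam), the pass mark of 5 and the cap of 4 are taken from this guide. The function name `minutes_grade` and the assumption that the overall score is the weighted average of the two parts are hypothetical and are not stated in the official regulations.

```python
LAB_WEIGHT = 0.6    # lab practice: 60% of the qualification
EXAM_WEIGHT = 0.4   # exam: 40% of the qualification
PASS_MARK = 5.0     # minimum grade required in each part (out of 10)
FAILED_CAP = 4.0    # grade recorded when a part is failed but the overall is higher


def minutes_grade(lab: float, exam: float) -> float:
    """Return the grade recorded in the minutes (0-10 scale).

    Assumption: the overall score is the weighted average of both parts.
    """
    overall = LAB_WEIGHT * lab + EXAM_WEIGHT * exam
    failed_a_part = lab < PASS_MARK or exam < PASS_MARK
    if failed_a_part and overall > FAILED_CAP:
        return FAILED_CAP
    return overall


if __name__ == "__main__":
    # A strong lab grade does not compensate for a failed exam.
    print(minutes_grade(lab=9.0, exam=3.0))
```

Under these assumptions, a student with 9 in the lab practice and 3 in the exam has a weighted average above 4 but has failed one part, so the recorded grade is 4.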

OTHER CONSIDERATIONS

If translation errors cause any contradictions between the various versions of this syllabus, the English version will prevail.

All aspects related to "academic exemption," "study dedication," "continuity," and "academic fraud" will be governed in accordance with the current academic regulations of the UDC.


Sources of information
Basic Ihab F. Ilyas, Xu Chu (2019). Data Cleaning. Association for Computing Machinery. ACM
Avi Silberschatz, Henry F. Korth, S. Sudarshan (2010). Database System Concepts. McGraw-Hill
Sadalage, Fowler (2012). NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley
Alex Gorelik (). The Enterprise Big Data Lake: Delivering the Promise of Big Data and Data Science. O'Reilly

Complementary Matt Casters, Roland Bouman, Jos van Dongen (2013). Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration. Wiley


Recommendations
Subjects that it is recommended to have taken before

Subjects that are recommended to be taken simultaneously

Subjects that continue the syllabus

Other comments

Students are advised to follow the proposed methodology: attend classes, devote the necessary time to study, carry out the assignments, and solve specific problems with the help of the teachers in tutorial sessions.



(*)The teaching guide is the document in which the URV publishes the information about all its courses. It is a public document and cannot be modified. Only in exceptional cases can it be revised by the competent agent or duly revised so that it is in line with current legislation.