Identifying Data 2023/24
Subject (*) Parallel Processing Code 614G02023
Study programme
Grao en Ciencia e Enxeñaría de Datos
Descriptors
Cycle: Graduate
Period: 1st four-month period
Year: Third
Type: Obligatory
Credits: 6
Language
Spanish
Teaching method Face-to-face
Prerequisites
Department Enxeñaría de Computadores
Coordinator
Enes Álvarez, Jonatan
E-mail
jonatan.enes@udc.es
Lecturers
Enes Álvarez, Jonatan
E-mail
jonatan.enes@udc.es
Web
General description In this subject, students will study the role that parallelism plays in accelerating the execution of programs in general, and of data processing in particular.

The theoretical content will start from the most basic technical concepts of parallelism, including its usefulness, its applicability, the technical context in which parallel programs run, and its historical evolution (Chapter 1). Next, the main hardware technologies currently available for parallel processing will be presented, together with the related underlying technical capabilities that are needed to exploit parallelism (Chapter 2). Parallelism will then be studied in more depth, covering additional concepts, classifications, possible designs for its software implementation, and ways to analyze its performance (Chapter 3). Finally, all the previously acquired knowledge will be applied by studying modern tools and technologies for massive data processing, that is, with a focus on Big Data (Chapter 4).

In the practical part, students will carry out several sessions following an incremental approach, in order to acquire the knowledge and skills needed to program and/or deploy processing solutions. The sessions will start with practices aimed at acquiring simpler, more technical skills, and will progress towards more complete solutions that are increasingly related to data processing. The practices will be self-contained and strongly focused on solving specific problems or scenarios.

This subject depends strongly on previous subjects such as "Fundamentals of Programming I and II", mainly because of the programming skills it requires, and "Design and Analysis of Algorithms", for its theoretical knowledge of algorithm complexity analysis. To a lesser extent, prior knowledge from "Fundamentals of Computers" is required to understand the empirical behavior and general performance of some programs when they run on a computer.

Study programme competencies
Code Study programme competences
A12 CE12 - Ability to know and apply the fundamental principles, main paradigms and techniques of parallel and distributed programming to the development of algorithms for massive data processing and analysis.
B2 CB2 - Students should be able to apply their knowledge to their work or vocation in a professional manner and possess the competences usually demonstrated through the elaboration and defense of arguments and the solving of problems within their field of study.
B3 CB3 - Students should have the ability to gather and interpret relevant data (usually within their field of study) in order to make judgments that include reflection on relevant social, scientific or ethical issues.
B4 CB4 - Students should be able to convey information, ideas, problems and solutions to both specialized and non-specialized audiences.
B7 CG2 - Prepare, appropriately and with some originality, written compositions or reasoned arguments; draft plans, work projects and scientific articles; and formulate reasonable hypotheses.
B8 CG3 - Be able to maintain and extend well-founded theoretical approaches to enable the introduction and exploitation of new and advanced technologies in the field.
B9 CG4 - Ability to successfully tackle all the stages of a data project: preliminary data exploration, preprocessing, analysis, visualization and communication of results.
B10 CG5 - Be able to work in a team, especially a multidisciplinary one, and be skilled at managing time and people and at making decisions.
C1 CT1 - Use the basic tools of information and communication technologies (ICT) needed for professional practice and for lifelong learning.
C4 CT4 - Appreciate the importance of research, innovation and technological development for the socioeconomic and cultural progress of society.

Learning aims
Learning outcomes Study programme competences
Know and understand the technical requirements and the current technologies that enable parallelism. A12
B8
B9
Know the different technologies currently available to implement parallelism, together with their applicability, limits, advantages and disadvantages. A12
B4
B8
B9
Be able to use parallelism techniques to adapt existing solutions so that they allow parallel processing. A12
B2
B4
B7
B8
B9
B10
C1
Be able to analyze the performance of a processing solution, with and without parallelization. A12
B2
B4
B7
B8
B9
B10
C1
Understand the role that parallelization plays today in key data processing tasks for society, business and research. A12
B3
B4
B8
B10
C4

Contents
Topic Sub-topic
Chapter 1 - Introduction and previous concepts * The process and sequential program
* Lifecycle of a process
* Threads
* Parallel program
* Usefulness of parallelism
Chapter 2 - Hardware parallelism, hierarchy * Levels of parallelism
* Internal processor parallelism (hidden)
* Processor functionalities (low-level parallelism)
* Processor accessible resources (high-level parallelism)
* Pool of machines (Cluster and Supercomputer)
* Distributed computing
* Specific devices
* State of the art of processors
Chapter 3 - Software parallelism, design and implementation * Flynn's taxonomy
* Frameworks and languages for parallelism
* Key concepts
* Paradigms for parallel processing
* Parallel program analysis
* Parallel program design
Chapter 4 - Parallelism for Big Data * Data storage
* Resource and execution management
* Batch processing
* Stream processing

Planning
Methodologies / tests Competencies Ordinary class hours Student’s personal work hours Total hours
Guest lecture / keynote speech A12 B3 B8 B9 C4 20 30 50
Laboratory practice A12 B2 B4 B7 B9 B10 C1 20 60 80
Objective test A12 B2 B4 B7 B9 C1 C4 3 11 14
 
Personalized attention 6 0 6
 
(*)The information in the planning table is for guidance only and does not take into account the heterogeneity of the students.

Methodologies
Methodologies Description
Guest lecture / keynote speech * Theory sessions will introduce the basic knowledge later used in the practice sessions.

* Other concepts will also be explained in detail, either because they are key to understanding the technologies and techniques used in the practice sessions, or because they are more advanced and crucial to understanding the role that parallelism plays in today's society.
________________________________________________________________________________
Laboratory practice * Each practice lesson will be briefly explained by the teacher during a practice class, and students are expected to start working on it right away.

* Practice sessions will be self-contained and will deal with several specific problems or scenarios where parallelism plays an important role and where previously explained techniques or technologies are used.

* Each practice will focus on a single scenario or problem and will consist of an initial description and explanation, proposed code to be analyzed and used, and a series of questions to work on. Students will start working on the practice during its first session and continue in their own time outside the classroom. The questions may range from extending the code, to carrying out an empirical study of its performance under several parallelism configurations, to describing its behavior or operation, or other types of questions aimed overall at assessing how well the student has understood the problem and its solution.
________________________________________________________________________________
Objective test * At the end of the term, an exam will be held to evaluate all of the subject's content, primarily the concepts from the theory sessions but also, to a lesser extent, those from the practice sessions.

Personalized attention
Methodologies
Guest lecture / keynote speech
Laboratory practice
Description
* Personalized attention will focus on supporting the students with the overall subject.

* On the one hand, personalized attention will be available for students who have difficulty understanding any concept presented in the theory sessions, so that no student falls behind in the classes or in the topics that will be assessed.

* On the other hand, personalized attention will also be available for any student who needs help with specific issues arising from the practice lessons, whether these are technical problems or deeper difficulties in understanding the key concepts involved. Although this help will be available for any practice lesson throughout the term, it is advisable to raise any question or problem during the practice lesson or shortly afterwards.
____________________________________________________________________________

Students with an approved dispensation from class attendance can also use this personalized attention to request the practice briefing as it was given during the ordinary practice classes.

Assessment
Methodologies Competencies Description Weight (%)
Laboratory practice A12 B2 B4 B7 B9 B10 C1 * All the practice lessons will be assessed and graded. These assessments may be individual, through a questionnaire, or carried out in groups, through a submission. Groups will be formed in advance and, once created, cannot be changed during the course.

* The dates and timelines for practice assessments and submissions will be announced to the students in advance.
________________________________________________________________________
50
Objective test A12 B2 B4 B7 B9 C1 C4 * Written exam carried out individually at the end of the term.

* It will mainly evaluate and assess concepts from the theory lessons.

* To a lesser extent, some questions will also reassess key concepts from the practice lessons.
50
 
Assessment comments
  • In order to pass the subject:
    • a minimum of 40% is required on the objective test, or final exam (2 points out of 5).
    • a minimum of 40% is required on the practice lessons (2 points out of 5).
  • Practice sessions are NOT REPEATABLE for the second opportunity.
  • Part-time students may attend any practice class group, provided they give prior notice.
  • Part-time students or students with an approved dispensation from class attendance may submit their practice lessons by the latest of the group-specific deadlines. If a practice lesson is assessed with a quiz, an alternative date will be agreed in advance if needed.
  • To comply with current legislation on gender equality, two measures will be taken:
    • Gender-balanced groups will be formed whenever possible.
    • All the quizzes and the final objective test will be graded blind to ensure the students' anonymity.

Sources of information
Basic Julio Ortega Lopera (2005). Arquitectura de computadores. Madrid : Thomson
David A. Patterson (2014). Computer organization and design: the hardware/software interface. Waltham, MA : Morgan Kaufmann
Sarah L. Harris (2021). Digital design and computer architecture. Amsterdam : Elsevier, Morgan Kaufmann
Francisco Almeida (2008). Introducción a la programación paralela. Madrid : Paraninfo Cengage Learning
Tomasz Drabas (2017). Learning PySpark. Packt Publishing
Jan Palach (2014). Parallel programming with Python. Packt Publishing
Giancarlo Zaccone (2015). Python parallel programming cookbook. Packt Publishing
Jesús Carretero Pérez (2021). Sistemas operativos: una visión aplicada. Madrid : McGraw-Hill

Complementary Jorge Luis Ortega-Arjona (2010). Patterns for parallel software design. Sussex, UK: Wiley series in software design patterns
Peter S. Pacheco (2021). An introduction to parallel programming. Burlington, MA : Morgan Kaufmann
Vijay Srinivas Agneeswaran (2014). Big Data analytics beyond Hadoop: real-time applications with Storm, Spark, and more Hadoop alternatives. Upper Saddle River, NJ : Pearson Education
John L. Hennessy (2019). Computer architecture: a quantitative approach. Cambridge, Massachusetts : Morgan Kaufmann
Bertil Schmidt (2017). Parallel programming: concepts and practice. Cambridge, MA : Morgan Kaufmann
William Stallings (2005). Sistemas operativos: aspectos internos y principios de diseño. Madrid : Pearson


Recommendations
Subjects that it is recommended to have taken before
Design and Analysis of Algorithms/614G02011
Fundamentals of Computers/614G02005
Fundamentals of Programming II/614G02009
Fundamentals of Programming I/614G02004

Subjects that are recommended to be taken simultaneously
Algorithms/614G03008

Subjects that continue the syllabus
Advanced Parallel Processing/614G02034

Other comments
  • Some knowledge of and ability to program in Python is recommended, as it will be the language used in all the practice lessons.
  • Some experience with the Linux operating system is recommended, mainly process and filesystem management.


(*)The teaching guide is the document in which the University publishes the information about all its courses. It is a public document and cannot be modified. Only in exceptional cases can it be modified by the competent body or duly revised so that it is in line with current legislation.