Identifying Data 2024/25
Subject (*) Data Mining Code 710G04030
Study programme
Grao en Xestión Dixital de Información e Documentación
Descriptors Cycle Period Year Type Credits
Graduate 2nd four-month period Third Optional 6
Language
Spanish
Galician
Teaching method Face-to-face
Prerequisites
Department Matemáticas
Coordinator
Gómez Rodríguez, Marcos
E-mail
marcos.gomez.rodriguez@udc.es
Lecturers
Gómez Rodríguez, Marcos
E-mail
marcos.gomez.rodriguez@udc.es
Web
General description The main objective of this subject is for students to learn the fundamental concepts and main models of data mining, and their application in the field of information and documentation sciences.

Competencies / Study results
Code Study programme competences / results

Learning aims
Learning outcomes Study programme competences / results
Acquisition of skills for the selection, treatment, management and analysis of information through data mining techniques.
Knowledge and skills for the application of the main classification techniques.
Acquire knowledge of computational data analysis, including programs such as R statistical software.
Choosing the quantitative techniques appropriate to the objectives for research, administration and management tasks.
Capacity for analysis and synthesis applied to the management and organization of information.
Knowledge and skills for the application of regression techniques, anomaly detection and time series.
Acquisition of decision-making skills based on statistical analysis of complex databases.

Contents
Topic Sub-topic
Introduction to data mining.
Preliminary concepts.
Types of data mining problems: description, classification, prediction, clustering, anomaly detection, etc.
Types of learning: supervised and unsupervised.
Unsupervised classification or clustering methods.
Basic concepts.
Hierarchical classification methods.
Partitioning clustering methods.
Case studies in information science and documentation.
Supervised classification methods.
Basic concepts.
Main models of supervised classification or pattern recognition.
Validation of classification models (how well do they predict?).
Case studies in information science and documentation.
Advanced regression methods.
Introduction.
Univariate and multivariate regression models.
Selection of relevant variables.
Validation of regression models (how well do they fit the data and how well do they predict?).
Case studies in information science and documentation.
Time series.
Basic concepts.
Descriptive time series analysis.
Practical use of time series models.
Case studies.
Statistical techniques for text mining and information retrieval.
Basic concepts.
Practical cases of application of text mining.

Planning
Methodologies / tests Competencies / Results Teaching hours (in-person & virtual) Student’s personal work hours Total hours
Guest lecture / keynote speech 19 0 19
ICT practicals 13 0 13
Case study 7 7 14
Supervised projects 1 101 102
Objective test 1 0 1
Personalized attention 1 0 1
(*)The information in the planning table is for guidance only and does not take into account the heterogeneity of the students.

Methodologies
Methodologies Description
Guest lecture / keynote speech Expository sessions in which the topics of the subject are introduced and described through presentations (using appropriate audiovisual media) that include theory and examples.
ICT practicals Practical classes using statistical software, introducing its programming and application through real and simulated cases.
Case study The statistical techniques taught in the subject will be applied to solve exercises and real and simulated case studies in the field of digital information management.
Supervised projects Individual and/or group work, supervised by the lecturers of the subject, in which students solve practical exercises or case studies related to the field of communication and information sciences by applying statistical techniques and R software. A review study of a specific topic of the subject or of the software used may also be carried out. Projects may be proposed by the lecturers or by the students themselves (student proposals are accepted at the lecturer's discretion).
Objective test It will consist of a multiple-choice test on the contents taught in the subject.

Personalized attention
Methodologies
ICT practicals
Case study
Guest lecture / keynote speech
Supervised projects
Description
In the lectures, discussion among the students and between the students and the lecturer will be encouraged at all times. When solving problems, it is important to attend to students individually to resolve any doubts that arise. This attention also helps the lecturer to detect possible problems in the teaching methodology, and helps the students to consolidate theoretical knowledge and to raise their concerns about the subject. Personalized attention is also essential during the ICT practical classes, especially until students become familiar with the statistical software, and when solving case studies.

Assessment
Methodologies Competencies / Results Description Qualification
ICT practicals Attendance and/or student performance with the statistical software will be assessed. 20
Objective test A multiple-choice test of 10 to 20 questions, each with 3 possible answers. 40
Supervised projects Individual and/or group work, supervised by the lecturers of the subject, in which students solve practical exercises or case studies related to the field of communication and information sciences by applying statistical techniques and R software. A review study of a specific topic of the subject or of the software used may also be carried out. Projects may be proposed by the lecturers or by the students themselves (student proposals are accepted at the lecturer's discretion). 40
Assessment comments
First opportunity
There will be a multiple-choice test of 10 to 20 questions, worth 40% of the grade. Continuous assessment consists of attendance at and/or submission of practicals on learning and applying the statistical software R to solve problems in the field of digital information management (20% of the overall grade), plus the submission of one or more projects applying statistics to case studies in digital documentation (alternatively, projects that review or extend the subject), worth 40% of the total grade.
Second opportunity
In the second opportunity, the same criteria as in the first opportunity will apply.
Advanced call
All the previous observations apply to students who request the advanced call of the exam.
No-show grade
In either of the two annual opportunities, a grade of NON PRESENTADO (no-show) is recorded when the student does not attend the official exam of the subject.
Students with recognition of part-time dedication and academic dispensation of exemption from attendance
Students with recognised part-time dedication and an academic exemption from attendance who decide not to attend classes will be assessed, in both opportunities, in the same way as the other students in a similar situation.
Fraudulent conduct in tests or assessment activities will directly result in a grade of "0" for the subject in the corresponding call, which invalidates any grade obtained in all assessment activities for the extraordinary call.

Sources of information
Basic Williams, G. (2011). Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery. Springer Science & Business Media.
Cirillo, A. (2017). R Data Mining: Implement Data Mining Techniques Through Practical Use Cases and Real-world Datasets. Packt Publishing.
Jockers, M.L. (2014). Text Analysis with R for Students of Literature. Springer.
Silge, J. and Robinson, D. (2017). Text Mining with R: A Tidy Approach. O'Reilly.

Complementary


Recommendations
Subjects that it is recommended to have taken before
Data Science/710G04026
Fundamentals of Statistics/710G04040

Subjects that are recommended to be taken simultaneously

Subjects that continue the syllabus

Other comments
To help achieve a sustainable environment and meet the objective of action number 5, "Healthy and sustainable environmental and social teaching and research", of the "Green Campus Ferrol Action Plan":
1. Delivery of the documentary work carried out in this subject:
1.1. It will be requested in virtual format and/or on computer media.
1.2. It will be done through Moodle, in digital format, without the need to print.
1.3. If done on paper:
- Plastics will not be used.
- Double-sided printing will be used.
- Recycled paper will be used.
- Printing of drafts will be avoided.
2. Resources must be used sustainably and negative impacts on the natural environment must be prevented.
3. The importance of ethical principles related to the values of sustainability in personal and professional behavior must be taken into account.
4. As set out in the various regulations that apply to university teaching, the gender perspective must be incorporated in this subject (non-sexist language will be used, bibliography by authors of both sexes will be used, student participation in class will be encouraged...).
5. Work will be done to identify and change sexist prejudices and attitudes, and to influence the environment in order to change them and promote values of respect and equality.
6. Situations of gender-based discrimination must be detected, and actions and measures will be proposed to correct them.
7. The full integration of students who, for physical, sensory, psychological or sociocultural reasons, experience difficulties in gaining suitable, equal and beneficial access to university life will be facilitated.


(*)The teaching guide is the document in which the URV publishes the information about all its courses. It is a public document and cannot be modified. Only in exceptional cases can it be revised by the competent agent or duly revised so that it is in line with current legislation.