The aim is for students to acquire the skills to identify situations in which probability theory and statistical inference methods are suitable tools for the quantitative analysis of data sets generated in the field of bioinformatics. To that end, students will consolidate their knowledge of the basic concepts of probability and statistical inference, gain fluency with the statistical software R by using a wide range of its resources, and be introduced to programming in this environment. Students are also expected to become familiar with probabilistic models of discrete-time stochastic processes and to acquire basic training in resampling techniques (Bootstrap) as a tool for implementing and evaluating different statistical algorithms.
Competencies
STUDY PROGRAMME COMPETENCES / RESULTS
Type A
Code
Job guided
AJ5
CE5 - Development of skills in the management of statistical techniques and their application to data sets from the bioinformatics field.
AJ6
CE6 - Ability to identify the most relevant software tools and bioinformatics data sources, and to acquire skill in their use.
AJ10
CE10 - Draft a bioinformatics research project, anticipating obstacles and possible alternative strategies to resolve them.
Type B
Code
Job guided
BJ1
CB6 - Possess and understand knowledge that provides a basis or opportunity for originality in the development and/or application of ideas, often in a research context.
BJ4
CB9 - Students should know how to communicate their conclusions, and the knowledge and ultimate reasons underpinning them, to specialized and non-specialized audiences in a clear and unambiguous way.
BJ5
CB10 - Students should possess learning skills that allow them to continue studying in a way that will largely be self-directed or autonomous.
Type C
Code
Job guided
CJ3
CT3 - Use the basic tools of information and communication technology (ICT) necessary for the exercise of their profession and for lifelong learning.
CJ6
CT6 - Critically assess the knowledge, technology and information available to solve the problems they face.
CJ8
CT8 - Value the importance of research, innovation and technological development in the socio-economic and cultural progress of society.
Learning aims
Contents
Topic
Sub-topic
1. Basic concepts of probability and statistics revisited.
a. Probability. Random variables and main discrete and continuous distributions. Multivariate distributions.
b. Statistical inference: estimation, hypothesis testing and confidence intervals.
2. R statistical programming language revisited.
a. Introduction to R. First steps. Internal functions. Help in R. Functions, loops, vectors. Statistical functions. Plots. Recursion. RStudio.
b. Main probability distributions in R.
c. Introduction to simulation in R.
d. Descriptive statistics in R.
e. Hypothesis testing and confidence intervals with R.
3. Linear statistical models.
a. The simple linear regression model. Basic assumptions. Estimation. Testing. Prediction. Model diagnostics.
b. The multivariate linear regression model. Basic assumptions. Estimation. Testing. Prediction. Model diagnostics.
c. Basic models in experimental design. One-way and two-way Analysis of Variance (ANOVA), with or without interaction. Basic assumptions. Estimation. Testing. Model diagnostics.
d. The multiple testing problem. False discovery rate.
4. Introduction to stochastic processes.
a. Simple random walk.
b. Poisson process and renewal processes. Birth-death processes.
c. Markov processes. Markov Chains.
5. Introduction to resampling methods.
a. The uniform Bootstrap. Computing the bootstrap distribution: exact distribution and approximate distribution using Monte Carlo. Examples. Application of the bootstrap for estimating the precision and the bias of an estimator.
b. Variations of the uniform Bootstrap. Parametric Bootstrap, symmetrized Bootstrap and smoothed Bootstrap. Discussion and examples.
c. Bootstrap methods to construct confidence intervals: percentile method, percentile-t method, symmetrized percentile-t method. Examples. Simulation studies.
Planning
Methodologies / tests | Competencies / Results | Teaching hours (in-person & virtual) | Student’s personal work hours | Total hours
Oral presentation | A5 A6 A10 B1 B4 B5 C8 | 24 | 36 | 60
ICT practicals | A5 A6 A10 B4 B5 C3 C6 | 18 | 36 | 54
Multiple-choice questions | A5 B1 B5 C8 | 1 | 9 | 10
Problem solving | A5 A6 A10 B1 B4 B5 C3 C6 C8 | 4 | 16 | 20
Personalized attention | | 6 | 0 | 6
(*) The information in the planning table is for guidance only and does not take into account the heterogeneity of the students.
Methodologies
Methodologies
Description
Oral presentation
Computer-based presentation.
ICT practicals
Statistical analysis of datasets using R.
Multiple-choice questions
Multiple-choice test on concepts.
Problem solving
Choosing statistical tools and strategies for problem solving. Linear model formulation. Design of experiments. Formulation of resampling plans.
Personalized attention
Methodologies
ICT practicals
Problem solving
Description
Attendance and participation in lectures.
Written multiple choice test.
Participation in workshops and seminars.
Practicals to be performed by the student.
Assessment
Methodologies | Competencies / Results | Description | Qualification (%)
ICT practicals | A5 A6 A10 B4 B5 C3 C6 | Computer lab using the open statistical software R. | 20
Problem solving | A5 A6 A10 B1 B4 B5 C3 C6 C8 | Original work on one or more course topics, applied to a setting of interest in bioinformatics. | 40
Multiple-choice questions | A5 B1 B5 C8 | Comprehension test. | 40
Assessment comments
The assessment consists of a test on the R labs, an individual piece of student work, and a written concept test. The concept test counts for 40% of the final grade, the test on the R labs for 20%, and the individual work, which must be presented orally, for the remaining 40%.
To pass the subject, an overall score of at least 5 out of 10 is required.
At the July opportunity, students may skip any test in which they scored at least 4 out of 10 in January.
Only students who did not take any test will be graded as NON ATTENDANT at the first opportunity (January-February). At the July opportunity, only students who did not take the final exam will be graded as NON ATTENDANT.
Sources of information
Basic
Cao Abad, R., Francisco Fernández, M., Naya Fernández, S., Presedo Quindimil, M.A., Vázquez Brage, M. (2001). Introducción a la Estadística y sus Aplicaciones. Pirámide
Ewens, W.J. and Grant, G.R. (2005). Statistical Methods in Bioinformatics. Springer
Peña Sánchez de Rivera, D. (2000). Estadística: Modelos y Métodos. Alianza Editorial
Ross, S.M. (1995). Stochastic Processes. Wiley
Efron, B. and Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman and Hall
Davison, A.C. and Hinkley, D.V. (1997). Bootstrap Methods and their Application. Cambridge University Press
Complementary
Recommendations
Subjects that it is recommended to have taken before
Subjects that are recommended to be taken simultaneously
Introduction to databases/614522002
Genomics/614522006
Fundamentals of bioinformatics/614522008
Introduction to programming/614522001
Foundations of Artificial Intelligence/614522003
Subjects that continue the syllabus
Data structures and algorithmics for biological sequences/614522013
Advanced processing of biological sequences/614522020
Computational intelligence for high dimensional data/614522024
Master thesis/614522025
Computational intelligence for bioinformatics/614522012
Advanced statistical methods in bioinformatics/614522009
Other comments