The aim is for students to acquire the ability to identify situations in which probability theory and the methods of statistical inference are suitable tools for the quantitative analysis of databases generated in the field of bioinformatics. To this end, students will build on their knowledge of the basic concepts of probability and statistical inference and gain fluency with the statistical software R, using a large number of its resources and being introduced to programming in this environment. Students are also expected to become familiar with probabilistic models of discrete-time stochastic processes and to acquire basic training in resampling (bootstrap) techniques as a tool for implementing and evaluating different statistical algorithms.
Competencies / Study results
Code
Study programme competences / results
A1
CE1 - Ability to know the scope of Bioinformatics and its most important aspects
A2
CE2 - To define, evaluate and select the architecture and the most suitable software for solving a problem in the field of bioinformatics
A3
CE3 - To analyze, design, develop, implement, verify and document efficient software solutions based on an adequate knowledge of the theories, models and techniques in the field of Bioinformatics
A5
CE5 - Development of skills in the management of statistical techniques and their application to data sets from the bioinformatics field.
A6
CE6 - Ability to identify software tools and most relevant bioinformatics data sources, and acquire skill in their use
A7
CE7 - Ability to identify the applicability of the use of bioinformatics tools to clinical areas.
A10
CE10 - Draft a bioinformatics research project, anticipating obstacles and possible alternative strategies to resolve them.
B1
CB6 - Possess and understand knowledge that provides a basis or opportunity for originality in developing and/or applying ideas, often in a research context
B2
CB7 - Students should know how to apply the acquired knowledge and their problem-solving ability in new or little-known environments within broader (or multidisciplinary) contexts related to their field of study
B4
CB9 - Students should know how to communicate their conclusions, and the knowledge and ultimate reasons underpinning them, to specialized and non-specialized audiences in a clear and unambiguous way
B5
CB10 - Students should possess learning skills that allow them to continue studying in a way that will largely be self-directed or autonomous.
B6
CG1 - Search for and select the useful information needed to solve complex problems, handling the bibliographical sources of the field fluently
C3
CT3 - Use the basic information and communication technology (ICT) tools necessary for the exercise of their profession and for lifelong learning
C6
CT6 - Critically assess the knowledge, technology and information available to solve the problems they face
C8
CT8 - Value the importance of research, innovation and technological development in the socio-economic and cultural progress of society
Learning aims
Learning outcomes
Study programme competences / results
G2 - Ability to apply problem-solving algorithms and to use the appropriate software.
AJ5 AJ6 AJ10
BJ1
CJ3
G3 - Ability to work in a team and independently.
AJ5 AJ6
BJ1 BJ4 BJ5
CJ3 CJ6 CJ8
G4 - Ability to formulate problems in statistical terms and to solve them using appropriate techniques.
AJ5 AJ6 AJ10
BJ1
CJ3 CJ6
G11 - Acquire skill in software development.
AJ5 AJ6
BJ5
CJ3
G14 - Represent a real problem by means of a suitable statistical model.
AJ5 AJ6 AJ10
BJ1 BJ5
E5 - Know problem-solving algorithms and use the appropriate software.
AJ5 AJ6 AJ10
BJ1 BJ5
CJ3 CJ6 CJ8
E12 - Make inferences about the parameters that appear in the model.
AJ5 AJ6 AJ10
BJ1 BJ4 BJ5
CJ3 CJ6 CJ8
E19 - Data processing and statistical analysis of the results obtained.
AJ5 AJ6 AJ10
BJ1 BJ4 BJ5
CJ3
E27 - Acquire the knowledge needed for a critical and rigorous analysis of the results.
AJ5 AJ10
BJ1 BJ4 BJ5
CJ6 CJ8
E82 - The student will be able to understand the importance of statistical inference as a tool for obtaining information about the population under study from the data observed in a representative sample of it. To this end, the student must recognize the difference between parametric and nonparametric statistics.
AJ5 AJ10
BJ1 BJ4 BJ5
CJ6 CJ8
E84 - Be able to use various software packages (in particular R) and interpret the results they provide in the corresponding practical studies.
AJ5 AJ6 AJ10
BJ4 BJ5
CJ3
E86 - Fluency in probability theory and random variables.
AJ5 AJ10
BJ1 BJ4 BJ5
CJ6
Knowledge of the basic concepts of probability and statistical inference and their application in bioinformatics.
AJ1 AJ5
Knowledge and application of statistical techniques for the quantitative analysis of databases generated in the field of bioinformatics.
AJ1 AJ2 AJ3 AJ5 AJ6 AJ7
BJ2 BJ4 BJ5 BJ6
CJ3
Gain fluency with the statistical software R, using a substantial number of its resources, and introduce students to programming in this environment.
AJ3 AJ6
CJ3
Become familiar with probabilistic models of discrete-time stochastic processes.
AJ1
BJ2 BJ4
Training in resampling (bootstrap) techniques as a tool for applying and/or evaluating different statistical algorithms.
AJ1 AJ2 AJ3 AJ5 AJ7
BJ2
CJ3
Contents
Topic
Sub-topic
1. Basic concepts of probability and statistics revisited.
a. Probability. Random variables and main discrete and continuous distributions. Multivariate distributions.
b. Statistical inference: estimation, hypothesis testing and confidence intervals.
2. R statistical programming language revisited.
a. Introduction to R. First steps. Built-in functions. Help in R. Functions, loops, vectors. Statistical functions. Plots. Recursion. RStudio.
b. Main probability distributions in R.
c. Introduction to simulation in R.
d. Descriptive statistics in R.
e. Hypothesis testing and confidence intervals with R.
3. Linear statistical models.
a. The simple linear regression model. Basic assumptions. Estimation. Testing. Prediction. Model diagnostics.
b. The multivariate linear regression model. Basic assumptions. Estimation. Testing. Prediction. Model diagnostics.
c. Basic models in experimental design. One-way and two-way Analysis of Variance (ANOVA), with or without interaction. Basic assumptions. Estimation. Testing. Model diagnostics.
d. The multiple testing problem. False discovery rate.
4. Introduction to stochastic processes.
a. Simple random walk.
b. Poisson process and renewal processes. Birth-death processes.
c. Markov processes. Markov Chains.
5. Introduction to resampling methods.
a. The uniform bootstrap. Computing the bootstrap distribution: exact distribution and approximate distribution using Monte Carlo. Examples. Application of the bootstrap to estimate the precision and the bias of an estimator.
b. Variations of the uniform bootstrap. Parametric bootstrap, symmetrized bootstrap and smoothed bootstrap. Discussion and examples.
c. Bootstrap methods to construct confidence intervals: percentile method, percentile-t method, symmetrized percentile-t method. Examples. Simulation studies.
6. Review of numerical optimization methods.
a. Basic concepts in optimization.
b. Unconstrained numerical optimization.
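As an illustration of topic 5a, the Monte Carlo approximation of the uniform bootstrap distribution can be sketched in a few lines. This is only a minimal sketch and not course material: the course itself works in R, Python is used here purely for illustration, and the function name `bootstrap_mean`, the data values, the number of resamples and the seed are all hypothetical. It resamples the data with replacement, then uses the replicates to estimate the bias and standard error of the sample mean and a percentile confidence interval (topic 5c).

```python
import random
import statistics


def bootstrap_mean(sample, n_resamples=2000, seed=0):
    """Monte Carlo approximation of the uniform bootstrap for the sample mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    theta_hat = statistics.fmean(sample)  # estimate on the original sample

    replicates = []
    for _ in range(n_resamples):
        # Uniform bootstrap: draw n observations with replacement, equal weights.
        resample = rng.choices(sample, k=n)
        replicates.append(statistics.fmean(resample))

    # Bootstrap estimates of the bias and precision (standard error).
    bias = statistics.fmean(replicates) - theta_hat
    std_error = statistics.stdev(replicates)

    # Percentile method: empirical 2.5% and 97.5% quantiles of the replicates.
    replicates.sort()
    lower = replicates[int(0.025 * n_resamples)]
    upper = replicates[int(0.975 * n_resamples) - 1]

    return {
        "estimate": theta_hat,
        "bias": bias,
        "std_error": std_error,
        "ci95": (lower, upper),
    }


# Hypothetical measurements, invented for this example only.
data = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1]
result = bootstrap_mean(data)
print(result)
```

The exact bootstrap distribution would enumerate every possible resample, which is feasible only for very small samples; the loop above replaces that enumeration with random draws.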
Planning
Methodologies / tests | Competencies / Results | Teaching hours (in-person & virtual) | Student’s personal work hours | Total hours
Oral presentation | A5 A6 A10 B1 B4 B5 C8 | 24 | 36 | 60
ICT practicals | A5 A6 A10 B4 B5 C3 C6 | 18 | 36 | 54
Multiple-choice questions | A5 B1 B5 C8 | 1 | 9 | 10
Problem solving | A5 A6 A10 B1 B4 B5 C3 C6 C8 | 4 | 16 | 20
Personalized attention | | 6 | 0 | 6
(*)The information in the planning table is for guidance only and does not take into account the heterogeneity of the students.
Methodologies
Methodologies
Description
Oral presentation
Presentation using the computer.
ICT practicals
Statistical data analysis using R.
Multiple-choice questions
Multiple-choice test on concepts.
Problem solving
Deciding statistical tools and strategies for problem solving. Linear model formulation. Design of Experiments. Formulation of resampling plans.
Personalized attention
Methodologies
ICT practicals
Problem solving
Description
Attendance and participation in lectures.
Written multiple choice test.
Participation in workshops and seminars.
Practicals to be performed by the student.
Assessment
Methodologies | Competencies / Results | Description | Qualification (%)
Oral presentation | A5 A6 A10 B1 B4 B5 C8 | Oral presentation of the original work mentioned in the "Problem solving" item, and continuous and objectifiable monitoring of active participation. | 10
ICT practicals | A5 A6 A10 B4 B5 C3 C6 | Computer lab using the open statistical software R (final exam). | 30
Problem solving | A5 A6 A10 B1 B4 B5 C3 C6 C8 | Original work on some of the topics of the course concerning an interesting setup in Bioinformatics, and continuous and objectifiable monitoring of active participation. | 30
Multiple-choice questions | A5 B1 B5 C8 | Comprehension test (final exam). | 30
Assessment comments
Assessment consists of a test on the R labs, an individual piece of student work, and a written concept test. The concept test counts for 30% of the final grade, the R lab test for 30%, and the individual work, which must be presented orally, for the remaining 40%. One quarter of the mark for the individual work (10% of the final grade) corresponds to its oral presentation.
In order to pass the subject it is necessary to obtain a score of, at least, 5 out of 10 overall.
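The weighting described above can be written as a single formula. The symbols T (concept test), L (R lab test), W (written part of the individual work) and O (its oral presentation) are introduced here only for illustration, and the sketch assumes that every component is marked on a 0 to 10 scale, which the guide does not state explicitly. The marks in the worked example are hypothetical.

```latex
\[
\text{Final} = 0.30\,T + 0.30\,L + 0.30\,W + 0.10\,O,
\qquad \text{pass if } \text{Final} \ge 5 .
\]
% Hypothetical marks: T = 6, L = 5, W = 4, O = 7
\[
\text{Final} = 0.30(6) + 0.30(5) + 0.30(4) + 0.10(7)
             = 1.8 + 1.5 + 1.2 + 0.7 = 5.2 \ge 5 .
\]
```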
At the July opportunity, students may skip any test in which they scored at least 4 out of 10 in January.
Part-time students will be evaluated under the continuous assessment system. They will therefore be supervised and allowed to present their individual work during the four-month term.
Only students who took no test and did not submit the individual work will be graded as NON ATTENDANT at the first opportunity (January-February). In July (second opportunity), only students who did not take the final exam will be graded as NON ATTENDANT.
If a student wants to take a test in a specific official language (Spanish or Galician), he/she must inform the professor at least 1 week in advance.
Fraud in tests or assessment activities, once verified, directly results in a failing grade for the call in which it is committed: the student will be graded "fail" (numerical grade 0) in the corresponding call of the academic year, whether the misconduct occurs at the first or the second opportunity. In this case, the grade for the first opportunity will be modified if necessary.
Sources of information
Basic
Efron, B. and Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman and Hall
Davison, A.C. and Hinkley, D.V. (1997). Bootstrap Methods and their Application. Cambridge University Press
Peña Sánchez de Rivera, D. (2000). Estadística: Modelos y Métodos. Alianza Editorial
Cao Abad, R., Francisco Fernández, M., Naya Fernández, S., Presedo Quindimil, M.A., Vázquez Brage, M (2001). Introducción a la Estadística y sus Aplicaciones. Pirámide
Ewens, W.J. and Grant, G.R. (2005). Statistical Methods in Bioinformatics. Springer
Ross, S.M. (1995). Stochastic Processes. Wiley
Complementary
Recommendations
Subjects that it is recommended to have taken before
Subjects that are recommended to be taken simultaneously
Introduction to databases/614522002
Genomics/614522006
Fundamentals of bioinformatics/614522008
Introduction to programming/614522001
Foundations of Artificial Intelligence/614522003
Subjects that continue the syllabus
Data structures and algorithmics for biological sequences/614522013
Advanced processing of biological sequences/614522020
Computational intelligence for high dimensional data/614522024
Master thesis/614522025
Computational intelligence for bioinformatics/614522012
Advanced statistical methods in bioinformatics/614522009
Other comments
As stated in the different applicable regulations for university teaching, the gender perspective must be incorporated in this course (non-sexist language will be used, bibliography of authors of both genders will be used, intervention in class of both male and female students will be encouraged, etc.)
Work will be done to identify and modify prejudices and sexist attitudes. The environment will be influenced to modify these prejudices and attitudes, to promote values of respect and equality.
Situations of discrimination based on gender must be detected. Actions and measures to correct them will be proposed.