Teaching Guide | Faculty of Computer Science |
Grao en Ciencia e Enxeñaría de Datos |
Identifying Data | 2023/24
Subject | Statistical Modeling of High Dimensional Data | Code | 614G02013
Study programme |
Descriptors | Cycle | Period | Year | Type | Credits
| Graduate | 1st four-month period | Second | Obligatory | 6
Topic | Sub-topic
0. Multidimensional distributions |
0.1 Concept of a multidimensional distribution
0.2 Variance-covariance matrix. Linear transformations
0.3 Multivariate normal distribution: definition and properties
1. Dimension reduction methods |
1.1 Objectives of Principal Component Analysis (PCA)
1.2 Transformations to obtain uncorrelated variables
1.3 Obtaining the principal components
1.4 Principal components and scale changes
1.5 Interpretation of the principal components
1.6 Factor analysis
1.7 Multidimensional scaling
2. Unsupervised classification |
2.1 Objectives of unsupervised classification: hierarchical and non-hierarchical methods
2.2 Cluster analysis: approach and objectives
2.3 Hierarchical tree or dendrogram
2.4 Similarities and dissimilarities between observations
2.5 Criteria for group formation: single linkage, complete linkage, group average, centroid method, Ward's method
2.6 Non-hierarchical distance-based methods: nearest neighbors, k-means, methods based on density estimation
3. Supervised classification |
3.1 Objectives of supervised classification: classification rules and error criteria
3.2 Discriminant factor analysis: approach, objectives and calculation of discriminant factors
3.3 Fisher's linear discriminant analysis and quadratic discriminant analysis
3.4 Maximum likelihood discriminant rule, Bayes rule, nonparametric discriminant rules
3.5 Relationship with regression models with binary response
3.6 Estimation of the probability of misclassification: cross-validation and bootstrap
4. Models for high-dimensional data |
4.1 Variable selection in regression: significance tests
4.2 The multiple testing problem: false discovery rate (FDR) and familywise error rate (FWER)
4.3 Sparse coefficient regression models: ridge regression, lasso and their variants
4.4 Variable selection and sparse coefficient models for classification
|
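As an illustration of sub-topics 1.3 and 1.4 (obtaining the principal components, and their behaviour under scale changes), the following is a minimal sketch and not part of the official course material. It assumes NumPy is available; the function name `principal_components`, the synthetic data and the factor of 1000 are hypothetical choices made only for this example. It computes the components from the eigendecomposition of the sample covariance matrix and, optionally, of the correlation matrix.

```python
import numpy as np


def principal_components(X, standardize=False):
    """Eigen-decompose the sample covariance matrix of X.

    With standardize=True the columns are scaled to unit variance first,
    which is equivalent to working with the correlation matrix.
    Returns eigenvalues (descending) and the matching eigenvectors.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each variable
    if standardize:
        Xc = Xc / Xc.std(axis=0, ddof=1)     # unit sample variance
    S = np.cov(Xc, rowvar=False)             # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return eigvals[order], eigvecs[:, order]


# Synthetic data, invented for illustration: three correlated variables.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5, 0.2],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.0]])
X = rng.normal(size=(200, 3)) @ A

# Change the measurement unit of the first variable only.
X_rescaled = X.copy()
X_rescaled[:, 0] *= 1000.0

ev_cov, _ = principal_components(X)
ev_cov_rescaled, _ = principal_components(X_rescaled)
ev_cor, _ = principal_components(X, standardize=True)
ev_cor_rescaled, _ = principal_components(X_rescaled, standardize=True)

# Share of total variance explained by the first component in each case.
share_cov = ev_cov[0] / ev_cov.sum()
share_cov_rescaled = ev_cov_rescaled[0] / ev_cov_rescaled.sum()
share_cor = ev_cor[0] / ev_cor.sum()
share_cor_rescaled = ev_cor_rescaled[0] / ev_cor_rescaled.sum()

print("covariance PCA, original units:", round(share_cov, 3))
print("covariance PCA, rescaled units:", round(share_cov_rescaled, 3))
print("correlation PCA, original units:", round(share_cor, 3))
print("correlation PCA, rescaled units:", round(share_cor_rescaled, 3))
```

In this sketch the covariance-based components change when one variable is expressed in different units, because that variable then dominates the total variance, while the correlation-based components are unaffected. This is one common argument for standardizing variables measured on different scales before PCA.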