Basic R

Vector

A collection of elements that are all of the same type

v <- c(1, 2, 3, 4)
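
If a vector mixes types, R coerces every element to a single common type (here, character):

c(1, "a")
[1] "1" "a"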

List

A collection that can hold elements of different types

ficha <- list(nombre = "John Doe",
              edad = 28)

ficha$edad
[1] 28
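
Elements of a list can also be retrieved by name with double square brackets:

ficha[["nombre"]]
[1] "John Doe"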

Factor

x <- factor(c("Alfa", "Gamma", "Beta"),
            levels = c("Alfa", "Beta", "Gamma", "Delta"),
            ordered = TRUE)
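
Because ordered = TRUE, the levels have a defined order and can be compared:

x
[1] Alfa  Gamma Beta 
Levels: Alfa < Beta < Gamma < Delta

x[1] < x[2]
[1] TRUE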

Data frame

A list of vectors or factors, all with the same number of values.

df <- data.frame(nombre = c("John Doe", "Jane Doe"),
                 edad = c(28, 31))

A data frame is indexed by putting the row and column numbers inside square brackets: df[row, column]

Index      Result
df[1, ]    the first row
df[, 1]    the first column
df[, ]     everything
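
Column names and logical conditions can also be used as indices; for example, with the df defined above:

df[, "edad"]
[1] 28 31

df[df$edad > 28, ]    # only the rows where edad is greater than 28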

Matrices

m <- matrix(c(1,2,3,4), nrow=2)

Matrices are filled column by column.
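
Printing m shows the column-wise filling; to fill row by row instead, matrix() accepts byrow = TRUE:

m
     [,1] [,2]
[1,]    1    3
[2,]    2    4

matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
     [,1] [,2]
[1,]    1    2
[2,]    3    4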

Importing from and exporting to other formats

RStudio offers simple ways to import and export.

.csv

Import

data <- read.csv("file.csv",
                 sep = ",",
                 header = TRUE,
                 stringsAsFactors = TRUE)

Export

write.csv(data, file = "data.csv",
          row.names = FALSE)

Operators

%in%

%in% checks whether each value on the left is contained in the object on the right (usually a vector):

people <- c("Anna", "Bob", "Charlie", "Diana")
girl <- c("Anna", "Diana")
girl_names <- people %in% girl
table(girl_names)
girl_names
FALSE  TRUE 
    2     2 
gmodels::CrossTable(people, girl_names,
                    chisq = TRUE)
Warning in chisq.test(t, correct = FALSE, ...): Chi-squared approximation may
be incorrect

 
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  4 

 
             | girl_names 
      people |     FALSE |      TRUE | Row Total | 
-------------|-----------|-----------|-----------|
        Anna |         0 |         1 |         1 | 
             |     0.500 |     0.500 |           | 
             |     0.000 |     1.000 |     0.250 | 
             |     0.000 |     0.500 |           | 
             |     0.000 |     0.250 |           | 
-------------|-----------|-----------|-----------|
         Bob |         1 |         0 |         1 | 
             |     0.500 |     0.500 |           | 
             |     1.000 |     0.000 |     0.250 | 
             |     0.500 |     0.000 |           | 
             |     0.250 |     0.000 |           | 
-------------|-----------|-----------|-----------|
     Charlie |         1 |         0 |         1 | 
             |     0.500 |     0.500 |           | 
             |     1.000 |     0.000 |     0.250 | 
             |     0.500 |     0.000 |           | 
             |     0.250 |     0.000 |           | 
-------------|-----------|-----------|-----------|
       Diana |         0 |         1 |         1 | 
             |     0.500 |     0.500 |           | 
             |     0.000 |     1.000 |     0.250 | 
             |     0.000 |     0.500 |           | 
             |     0.000 |     0.250 |           | 
-------------|-----------|-----------|-----------|
Column Total |         2 |         2 |         4 | 
             |     0.500 |     0.500 |           | 
-------------|-----------|-----------|-----------|

 
Statistics for All Table Factors


Pearson's Chi-squared test 
------------------------------------------------------------
Chi^2 =  4     d.f. =  3     p =  0.2614641 


 

Utilities

lapply

List apply: applies a function to every element of a list (of vectors). When applied to a data frame, the function is applied to each of its columns.

It can be used, for example, to normalize every column of a data frame:

normalize <- function(x) {
    return ((x - min(x)) / (max(x) - min(x)))
}

normalized_df <- as.data.frame(lapply(old_df, normalize))
# The result has to be converted back into a data frame
# because 'lapply' returns a list
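
As a quick check with a small made-up data frame (old_df above is just a placeholder name):

old_df <- data.frame(a = c(0, 5, 10), b = c(2, 4, 6))
normalized_df <- as.data.frame(lapply(old_df, normalize))
normalized_df
    a   b
1 0.0 0.0
2 0.5 0.5
3 1.0 1.0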