Working with R and RStudio.

R is a programming language for statistical computing, and RStudio is an IDE for it. If you want to use it, install R first and then the IDE (linked on the IDE website).

I use it for a project at university and want to sum up some important information, as a reminder for myself and for other people who are struggling with it and searching the Neuland for help.

Or maybe…

I’ll go back to work and write this article later. Tomorrow. Or the day after tomorrow. Or the year after. Hmmm…

Some commands

General

  • print("Hello world")
  • # Yeah, this is a comment!
  • print(result) # Prints a variable/object. See also the note on print further down.
  • plot(x, y) # Draws a graph in the plot window from x and y values, like at school.
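A minimal sketch that puts these commands together. The variables x and y and their values are made up for illustration.

```r
# A comment: everything after # is ignored
x <- c(1, 2, 3, 4, 5)   # assign a numeric vector to x
y <- x^2                # element-wise square

print("Hello world")    # print a string
print(y)                # print the contents of a variable

plot(x, y)              # scatter plot of y against x in the plot window
```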

Libraries

You need libraries for doing cool stuff and even for some basic stuff. Just install them via the package manager (or with install.packages() in the console) and load them in your script with library(). Here are my libraries at this point in time. See also: support.rstudio.com/ Quick list of useful R packages

library(RMariaDB) # MariaDB/MySQL driver for DBI
library(DBI) # Generic database interface (dbConnect, dbSendQuery, ...)
library(dbplyr) # Lets dplyr work on database tables by translating to SQL
library(dplyr) # Data manipulation: select, filter, count, %>%
library(dbplot) # Plot things from db? I have no working example. :/
library(sparklyr) # Interface to Apache Spark, not really in use...
library(tidytext) # Text analysis, working sample, for later
library(base) # Not needed, base is always loaded.
library(lubridate) # Nice time formatting! Helped me a lot.

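Be careful! A missing library is easy to overlook. library() itself stops with an error if the package is not installed, but if you forget the library() line you only get a "could not find function" error later, and that message does not tell you which package is missing.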
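One way to check up front which packages are available is sketched below. The vector needed is just a selection from my list above, and the install line is commented out on purpose.

```r
# Packages this script expects (a selection from the list above)
needed <- c("DBI", "RMariaDB", "dplyr", "lubridate")

# requireNamespace() returns TRUE/FALSE instead of stopping with an error
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All packages are installed.")
}
```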

With Database

  • con <- dbConnect(RMariaDB::MariaDB(), dbname = "test", group = "my-db") # .my.cnf with [my-db] and database = XY
  • dbReadTable(con, "TheDBtable") # Reads the whole table. Don't use it.
  • dbListFields(con, "TheDBtable") # Show table fields.
  • res <- dbSendQuery(con, "SELECT * FROM TheDBtable WHERE <column whatever> LIKE '%mars%' ORDER BY <time whatever> LIMIT 100") # Nice, SQL! <3
  • result <- dbFetch(res) # Get result
  • dbClearResult(res) # Free the result set. Just dd…
  • dbDisconnect(con) # Close the connection. …ooo it!
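The steps above can also be wrapped in one function. This is only a sketch: fetch_titles is a hypothetical helper, and it assumes a [my-db] group in ~/.my.cnf that names the database, plus a table dbs1test with the columns a_title and a_timestamp. It uses dbGetQuery, which DBI offers as a shortcut for send, fetch and clear.

```r
# Hypothetical helper: runs one query and always cleans up afterwards.
# Assumes a [my-db] group in ~/.my.cnf and a table dbs1test(a_title, a_timestamp).
fetch_titles <- function(pattern, group = "my-db") {
  con <- DBI::dbConnect(RMariaDB::MariaDB(), group = group)
  on.exit(DBI::dbDisconnect(con), add = TRUE)  # runs even if the query fails

  # dbGetQuery() = dbSendQuery() + dbFetch() + dbClearResult() in one call.
  # The ? placeholder is filled from params, so the pattern is not pasted into the SQL.
  DBI::dbGetQuery(
    con,
    "SELECT a_title, a_timestamp FROM dbs1test WHERE a_title LIKE ? ORDER BY a_timestamp LIMIT 100",
    params = list(pattern)
  )
}

# Usage (needs a reachable database):
# result <- fetch_titles("%mars%")
```

The on.exit call is there so the connection gets closed even when the query stops with an error.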

dbFetch returns a data.frame, so result now holds the table.

You can access a column of the data.frame with result$column (name of the column), e.g. print(result$column).

Add columns by assigning the output of cbind back to result (assigning is the right word, and result is called a variable or object). Like:

result <- cbind(result, newcolumn = "this text in all n-col rows")

Other example:

library(lubridate)
ergebnis <- cbind(ergebnis, Year2 = year(ergebnis$a_timestamp))

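Important: cbind and similar commands do not change the data.frame in place, they return a new one. At first they only seemed to work for me together with print, and the likely reason is this: the output has to be assigned back (result <- cbind(result, ...)), and in a sourced script nothing is shown unless you print it. If something seems broken without a reason, check the assignment and try print.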
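A small sketch of what happens with and without the assignment. The data.frame here is made up, and the column names flag and source are only for illustration.

```r
# Made-up data.frame for illustration
result <- data.frame(id = 1:3, title = c("a", "b", "c"))

cbind(result, flag = TRUE)            # returns a new data.frame, result itself is unchanged
print(ncol(result))                   # still 2

result <- cbind(result, flag = TRUE)  # assign the return value to keep the new column
print(ncol(result))                   # now 3

result$source <- "test"               # alternative: assign a new column directly
print(names(result))
```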


From now on I’ll use some German words in the examples, don’t be confused. E.g. "ergebnis" means "result".

Another example: journalists don’t publish articles between 1 and 6 am.

Generated with about 10,000 articles and the following source code:

library(RMariaDB)
library(DBI)
library(dbplyr)
library(dplyr)
library(dbplot)
library(sparklyr)
library(tidytext)
library(base)
library(lubridate)
library(stringr)

# Connect to my-db as defined in ~/.my.cnf
con <- dbConnect(RMariaDB::MariaDB(), dbname='test', group = "my-db")

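# Fetch up to 1000 titles containing "der", together with their timestamps
res <- dbSendQuery(con, "SELECT a_title, a_timestamp FROM dbs1test WHERE a_title LIKE '%der%' ORDER BY a_title LIMIT 1000")
ergebnis <- dbFetch(res)

# Free the result set and close the connection
dbClearResult(res)
dbDisconnect(con)

# Add the hour and minute of each timestamp as new columns
ergebnis2 <- cbind(ergebnis, Hour = hour(ergebnis$a_timestamp))
ergebnis2 <- cbind(ergebnis2, Minute = minute(ergebnis$a_timestamp))

print(ergebnis2)

# Minute on the x axis, hour on the y axis
plot(ergebnis2$Minute, ergebnis2$Hour)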

print('END!')
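To see the quiet hours as numbers instead of reading them off the plot, the articles can be counted per hour. A sketch without a database: the four timestamps are made up and stand in for ergebnis$a_timestamp, and it uses base R instead of lubridate.

```r
# Made-up timestamps standing in for ergebnis$a_timestamp
stamps <- as.POSIXct(
  c("2019-03-01 07:15:00", "2019-03-01 07:45:00",
    "2019-03-01 12:05:00", "2019-03-02 23:30:00"),
  tz = "UTC"
)

hours <- as.POSIXlt(stamps)$hour   # base-R counterpart of lubridate::hour()

# Count articles for every hour of the day, including hours with zero articles
per_hour <- table(factor(hours, levels = 0:23))
print(per_hour)

# Hours without any article in this sample
quiet_hours <- names(per_hour)[per_hour == 0]
print(quiet_hours)
```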

Working with data tables

I’m happy that I can easily sort and limit the imports from MariaDB with the simple SQL support.

But R should be more powerful and good at data mining (that usually means working with huge tables), so here are some basics for working with the data.frame (result) we built last round.

See also: blog.exploratory.io/filter-with-text-data…

ergebnis2 %>% # Our results data.frame
  select(a_timestamp) # Select column

Quantiles & Co.
medium.com/@andreaschandra/data-exploration-mtcars-using-r-c5669aded3ed
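A short sketch of quantiles on the mtcars data set, which ships with R. The choice of the mpg column and of the three probabilities is only for illustration.

```r
# mtcars ships with R, so this runs without any database
mpg <- mtcars$mpg

print(summary(mpg))                              # min, quartiles, median, mean, max
q <- quantile(mpg, probs = c(0.25, 0.5, 0.75))   # named vector: 25%, 50%, 75%
print(q)

# Rows above the 75% quantile
top <- mtcars[mpg > q["75%"], ]
print(rownames(top))
```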

Filter in console:

ergebnis2 %>%
  select(a_title, a_timestamp) %>%
  filter(str_detect(a_title, "Merkel")) %>%
  count(a_title)
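The same kind of filter can be tried without a database. In this sketch the data.frame articles and its titles are made up and stand in for ergebnis2. It needs the dplyr and stringr packages.

```r
library(dplyr)
library(stringr)

# Made-up titles standing in for ergebnis2
articles <- data.frame(
  a_title = c("Merkel in Paris", "Mars mission", "Merkel in Paris", "Merkel speaks"),
  stringsAsFactors = FALSE
)

counts <- articles %>%
  filter(str_detect(a_title, "Merkel")) %>%  # keep rows whose title contains "Merkel"
  count(a_title, sort = TRUE)                # one row per title, most frequent first

print(counts)
```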

More coming soon. Good night!
