Organise your data the R way
NEW! More features, now in a GitHub repo: https://github.com/ifknot/data_frame
I like R for statistics. The variables in R are lexically scoped and dynamically typed.
I like C++ for just about everything else. C++ is strongly and statically typed: every object has a type, and that type never changes.
I want to do some simple statistics in C++, but I can't imagine doing that without a heterogeneous data frame.
I want to be able to do what I do in R - desiderata:
# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)
But in C++ - ipsa:
#include <iostream>
#include <string>
#include <variant>
#include "r_data_frame.h"

int main() {
    std::cout << "heterogeneous container\n\n";
    R::data_frame d;
    d["id"] = { 1, 2, 3, 4, 5 };
    d["name"] = { "Rick", "Dan", "Michelle", "Ryan", "Gary" };
    d["salary"] = { 623.3, 515.2, 611.0, 729.0, 843.25 };
    d["start_date"] = R::as_dates({ "2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27" });
    // print out the data table
    std::cout << d << '\n';
    // accessing data does need prior knowledge of the column data type
    auto money = std::get<double>(d["salary"][1]);
    // but C++ is strongly typed so there we go
    std::cout << std::get<std::string>(d["name"][1]) << " earns $" << money << "\n\n";
    // print a single column
    std::cout << d["name"] << '\n';
}
It does this (unlike R, indexing in C++ begins at 0):
heterogeneous container
   id  name      salary  start_date
0  1   Rick      623.3   2012-01-01
1  2   Dan       515.2   2013-09-23
2  3   Michelle  611     2014-11-15
3  4   Ryan      729     2014-05-11
4  5   Gary      843.25  2015-03-27
Dan earns $515.2
Rick Dan Michelle Ryan Gary
Here's how...