
Saturday, 5 September 2020

A heterogeneous data frame in C++

 

Organise your data the R way

NEW! More material, now in a GitHub repo: https://github.com/ifknot/data_frame

I like R for statistics. The variables in R are lexically scoped and dynamically typed. 

I like C++ for just about everything else. C++ is a strongly typed language and it is also statically typed; every object has a type and that type never changes.

I want to do some simple statistics in C++, but I can't imagine doing that without a heterogeneous data frame.

I want to be able to do what I do in R - desiderata:

# Create the data frame.
emp.data <- data.frame(
  emp_id = c(1:5),
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                         "2015-03-27")),
  stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)

But in C++ - ipsa:

#include <iostream>
#include <string>
#include <variant>
#include "r_data_frame.h"

int main() {
    std::cout << "heterogeneous container\n\n";
    R::data_frame d;
    d["id"] = { 1, 2, 3, 4, 5 };
    d["name"] = { "Rick", "Dan", "Michelle", "Ryan", "Gary" };
    d["salary"] = { 623.3, 515.2, 611.0, 729.0, 843.25 };
    d["start_date"] = R::as_dates({ "2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27" });
    // print out the data table
    std::cout << d << '\n';
    // accessing data requires prior knowledge of the column's data type
    auto money = std::get<double>(d["salary"][1]);
    // but C++ is strongly typed, so there we go
    std::cout << std::get<std::string>(d["name"][1]) << " earns $" << money << "\n\n";
    // print a single column
    std::cout << d["name"] << '\n';
}

It does this (unlike in R, indexing in C++ begins from 0):

heterogeneous container
   id  name      salary  start_date
0  1   Rick      623.3   2012-01-01
1  2   Dan       515.2   2013-09-23
2  3   Michelle  611     2014-11-15
3  4   Ryan      729     2014-05-11
4  5   Gary      843.25  2015-03-27
Dan earns $515.2
Rick Dan Michelle Ryan Gary

Here's how...