Saturday 17 October 2020

Reading dumb data into the C++ heterogeneous data_frame


Ingest some CSV

Last month I presented an R-ish data_frame class as a small side project; this month I present a C++ equivalent of the R read.csv() function for importing data into a data_frame.

Ongoing development is at the GitHub repo: https://github.com/ifknot/rpp

To recap, the motivation for the C++ heterogeneous data_frame class was two-fold:
  1. Runtime handling of dumb data whose format, types, and fields are unknown.
  2. Enable data science skill transfer from the R functional programming environment into C++.
The motivation for the read_csv() function remains the same as that for the heterogeneous data_frame class:

"I want to be able to do the same sort of thing that I do in R, but in C++."

Which, this time around, means that I want to be able to use one of the easiest and most reliable ways of getting data in: text files.

In particular, CSV (comma-separated values) files. The CSV format uses commas to separate the fields within a line, and each record sits on its own line of the text file, which makes CSV ideal for representing tabular data, i.e. the data_frame class.
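To make the "commas between fields, one record per line" idea concrete, here is a minimal sketch of reading CSV text into column-oriented storage. It is not the rpp read_csv() or data_frame implementation: the names csv_table, split_csv_line and read_csv_sketch are hypothetical, every value is kept as a std::string (no runtime type inference), and the splitting is naive, so quoted fields that contain commas are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical column-oriented table used only for this sketch;
// it is NOT the rpp data_frame class.
struct csv_table {
    std::vector<std::string> names;                 // field names from the header line
    std::vector<std::vector<std::string>> columns;  // one vector of values per field
};

// Split one line on commas. Naive: quoted fields containing commas are not handled.
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) {
        fields.push_back(field);
    }
    // std::getline does not report a trailing empty field ("a,b," yields only "a" and "b")
    if (!line.empty() && line.back() == ',') {
        fields.push_back("");
    }
    return fields;
}

// Read CSV text: the first non-blank line is the header, each following line is one record.
csv_table read_csv_sketch(std::istream& in) {
    csv_table table;
    std::string line;
    bool is_header = true;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF line endings
        if (line.empty()) continue;                                  // skip blank lines
        std::vector<std::string> fields = split_csv_line(line);
        if (is_header) {
            table.names = fields;
            table.columns.resize(fields.size());
            is_header = false;
            continue;
        }
        if (fields.size() != table.names.size()) {
            throw std::runtime_error("row has " + std::to_string(fields.size()) +
                                     " fields, header has " + std::to_string(table.names.size()));
        }
        for (std::size_t i = 0; i < fields.size(); ++i) {
            table.columns[i].push_back(fields[i]);  // append value to its column
        }
    }
    // Invariant: every column holds the same number of values
    for (const auto& col : table.columns) {
        assert(col.size() == table.columns.front().size());
    }
    return table;
}
```

Feeding it `std::istringstream("name,age\nAda,36\n")` would give names {"name", "age"} and columns {"Ada"} and {"36"}. Storing values column by column mirrors R, where a data.frame is a list of equal-length column vectors; a fuller reader would then infer each column's type at runtime, which is where the "dumb data" problem really begins.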