Grammars for everything data

Related: Tidy data

Abstract: A huge amount of effort is spent cleaning data to get it ready for analysis, but there has been little research on how to make data cleaning as easy and effective as possible. This paper tackles a small, but important, component of data cleaning: data tidying. Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. This framework makes it easy to tidy messy datasets because only a small set of tools are needed to deal with a wide range of un-tidy datasets. This structure also makes it easier to develop tidy tools for data analysis, tools that both input and output tidy datasets. The advantages of a consistent data structure and matching tools are demonstrated with a case study free from mundane data manipulation chores.
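The paper's own tooling is in R, but the core reshaping step — melting a wide table so each row becomes one observation — is easy to sketch in plain Python. This is my own illustration with invented sample data and column names, not code from the paper:

```python
# Messy (wide) layout: one row per person, one column per treatment,
# so values of the "treatment" variable are hidden in column names.
messy = [
    {"name": "John Smith", "treatment_a": None, "treatment_b": 2},
    {"name": "Jane Doe",   "treatment_a": 16,   "treatment_b": 11},
]

def melt(rows, id_col, var_name, value_name):
    """Turn every non-id cell into its own row, producing the tidy
    layout: one variable per column, one observation per row."""
    tidy = []
    for row in rows:
        for col, val in row.items():
            if col != id_col:
                tidy.append({id_col: row[id_col],
                             var_name: col,
                             value_name: val})
    return tidy

tidy = melt(messy, "name", "treatment", "result")
# tidy now has 4 rows, one per (person, treatment) observation.
```

In the tidy form, "treatment" is an ordinary column, so filtering, grouping, and plotting by it need no special-casing.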

via Kevin Lynagh / Andy Chu

What’s a good intro to model-based reinforcement learning?

This is one: Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

via Stephen Tu

Modernized, sped-up SICP in Clojure? Sounds good to me.

SICP Distilled

OpenNMT: An open source neural machine translation system.

Modules over monads and initial semantics

Abstract: Inspired by the classical theory of modules over a monoid, we introduce the natural notion of module over a monad. The associated notion of morphism of left modules ("linear" natural transformations) captures an important property of compatibility with substitution, not only in the so-called homogeneous case but also in the heterogeneous case where "terms" and variables therein could be of different types. In this paper, we present basic constructions of modules and we show how modules allow a new point of view concerning higher-order syntax and semantics.

I’ve been assured this is a good paper.
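As a very rough sketch of the idea (my own gloss, not the paper's notation): a monad gives a substitution operation, and a left module over it is any structure carrying an action compatible with that substitution. Here is the list monad in Python, with a labeled-list module over it, where `act`, `unit`, and `bind` are names I chose:

```python
# The list monad: unit wraps a value, bind substitutes f into every position.
def unit(x):
    return [x]

def bind(xs, f):
    return [y for x in xs for y in f(x)]

# A left module over the list monad: a labeled list (label, xs), where the
# action substitutes into the list component and leaves the label fixed.
# Module laws (compatibility with substitution):
#   act(t, unit) == t
#   act(act(t, f), g) == act(t, lambda x: bind(f(x), g))
def act(labeled, f):
    label, xs = labeled
    return (label, bind(xs, f))
```

Every monad is a module over itself with `act = bind`; the labeled list is a small "heterogeneous" example where the carrier differs from the monad.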