Data Science & Tech Blog

Building a Recommender System, Part 2

By Matthew Mahowald on Wed, Jun 13, 2029

In a previous post, we looked at neighborhood-based methods for building recommender systems. This post explores an alternative technique for collaborative filtering using latent factor models. The technique we’ll use naturally generalizes to deep learning approaches (such as autoencoders), so we’ll also implement our approach using TensorFlow and Keras.

Tags: recommender, tensorflow

Communicating between Go and Python or R

By Matthew Mahowald on Thu, Jun 13, 2019

Data science and engineering teams at Open Data Group are polyglot by design: we like to choose the best tool for the task at hand. Most of the time, this means our services and components communicate through things like client libraries and RESTful APIs. But sometimes, we need code written in one language to call code written in another language directly. In this post, we’ll take a short look at how to do that, using C foreign function interfaces (FFI) to call functions written in Go from Python.

Tags: python, R, golang

An Introduction to Hierarchical Models

By Steve Avsec on Thu, May 23, 2019

In a previous post we gave an introduction to Stan and PyStan using a basic Bayesian logistic regression model. There isn’t generally a compelling reason to use sophisticated Bayesian techniques to build a logistic regression model, since the same result could easily be replicated using simpler techniques. In this post, we unlock the real power of Stan and full Bayesian inference in the form of a hierarchical model. Suppose we have a dataset which is stratified into N groups. We have a couple of choices for how to handle this situation. We can ignore the stratification, pool all of the data together, and train a single model on all of it at once. The cost of this is that we lose the information added by the stratification. Alternatively, we can fit a separate model for each group, but this runs the risk of overfitting. As we shall see below, groups with few observations will typically appear as outliers, but this does not imply that all observations in the group will behave the same way. If a group has few samples and shows significant outlying behavior, it is likely that the behavior is driven by the small sample size rather than by the group genuinely deviating from the mean behavior.
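
To make the two baseline choices concrete, here is a minimal sketch in plain Python. It uses made-up binary outcomes and hypothetical group names, and it is not the Stan model from the post. It only contrasts a single pooled estimate with per-group estimates, where the group with two observations comes out as an extreme value.

```python
# Hypothetical toy data: binary outcomes (1 = success, 0 = failure) per group.
# Group names and values are invented purely for illustration.
groups = {
    "a": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0] * 5,  # 50 observations, 30 successes
    "b": [0, 1, 0, 0, 1, 1, 0, 1, 0, 1] * 5,  # 50 observations, 25 successes
    "c": [1, 1],                               # only 2 observations
}


def pooled_rate(groups):
    """Complete pooling: ignore the groups and estimate one shared rate."""
    all_obs = [y for obs in groups.values() for y in obs]
    return sum(all_obs) / len(all_obs)


def unpooled_rates(groups):
    """No pooling: estimate a separate rate for each group."""
    return {name: sum(obs) / len(obs) for name, obs in groups.items()}


if __name__ == "__main__":
    print("pooled:", round(pooled_rate(groups), 3))
    for name, rate in unpooled_rates(groups).items():
        print(name, "n =", len(groups[name]), "rate =", round(rate, 3))
```

The pooled estimate hides the difference between groups "a" and "b", while the unpooled estimate for group "c" is driven entirely by its two observations. A hierarchical model sits between these two extremes.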

Tags: bayesian, modeling, hierarchical