Using only Stitch, AWS and Google, build out a complete data pipeline for any small organization at a fraction of the cost of traditional platforms

Photo by Wolfgang Hasselmann on Unsplash
Is your data really that big?

Do you have big data? I suspect that you do not. Many people hear “big data” and think that the millions of rows they have across a few services qualify, but even hundreds of millions of rows are a drop in the bucket in the enterprise big data space. If you are building a pipeline for a small company or non-profit, you do not need to spend enterprise-sized chunks of your budget to get a data pipeline with all the features and speed you need.

The data science platform market is crowded and confusing…

The segmentation that provides an at-a-glance view for your entire program

There are so many ways to slice and dice fundraising data that it can seem like an endless maze of analyses and visualizations. However, one view stands out above the rest for its immediately actionable insights: you can use it to check the results of last night's test, or to ring the alarm bell as soon as something starts underperforming. This analysis gives you a quick view of your entire program in a single visual that you should review every day. …

Things I only learned after a couple years on the job

Your mind visualizing a DB after 10 years writing SQL
Photo by fabio on Unsplash

Like many new data scientists, I went to a bootcamp to start my career. Mine was a great program run by Galvanize, and I graduated well prepared to build models at my first job. The bootcamp spent about a week on SQL before moving on to the serious DS topics. My first job was 75% SQL and 25% modeling, so I needed far more SQL skill than one week of training provided if I was going to do the majority of my job well.

I don’t think the bootcamp should have focused more on SQL; we didn’t need those…

How non-profits can maximize small-dollar fundraising using machine learning

Picture this: you open your mailbox and see an envelope from the local 501(c)(3) you gave to last week. “$5, $10, $20, any amount helps” it reads. It has boxes to allow you to select your giving level. It probably offers tiers with benefits that accumulate as you increase your donation.

“$2.70 [] $5.00 [] $25 [] $100 []”: an email with checkboxes from the candidate you gave to just this morning. It asks you to make a recurring donation until the election.

When a potential donor receives the mailed tiers or the email checkboxes, she is supposed to have the…

How to transform your raw data into decisions

This question is the core of all data analytic work. By striving to answer this question, we will transform raw data into analysis and ultimately, decisions. In order for analysis to catalyze decisions, it must convey not only the particular levels of each measurement, but also the framework by which these measurements can be understood. This creates a twofold challenge: dig for insights and then connect those insights to human processes.

So many stories to choose from
Photo by Robyn Budlender on Unsplash

“What is the story of your data?” is ultimately meaningless without context. Data analysis becomes a story only within a particular context. To develop this story, we start with…

The most important questions people have about machine learning are the hardest ones to answer. People don’t wonder about the nuts and bolts of gradient descent and neural networks, even though there are thousands of great posts to answer those questions. They wonder what machine learning is going to do for them: How is machine learning going to help me? What will an implementation look like at my organization?

Photo by Pietro Jeng on Unsplash

This post is going to take you through two real world examples of how machine learning can make you better at whatever you do by increasing the efficiency of your…

Questions I’m going to ask you on our first data science consulting phone call

Let me set the scene: You are a person with data that holds a lot of value. It could be the navigation habits of people on your website, the responses to a survey you fielded, or the purchase history of your email list. Troves of data are dug up daily at organizations across the country. You want to get at the value that data holds, but you need to know how much the data is worth and what the practical application will be before you can even scope a project. There are too many unknowns: how much can I…

A data scientist continuously tracks a huge number of metrics as she builds a model to home in on the best outcome. Beyond building the best model, a skilled data scientist can also calibrate a model to a given situation. A classic example is the medical test designed to be a cheap screener, which accepts some false positives in exchange for not missing any true positives. Every model is a unique blend of situation, available data, and the final implementation.
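The screener trade-off can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the labels and scores are made-up toy values, and the helper names `confusion_counts` and `screening_threshold` are hypothetical, not taken from the article or from any library. It compares a default cutoff of 0.5 with a lower cutoff chosen so that no true positive is missed.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, fn, tn), predicting positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def screening_threshold(labels, scores):
    """Highest cutoff that still flags every actual positive (hypothetical helper)."""
    positive_scores = [s for label, s in zip(labels, scores) if label]
    if not positive_scores:
        raise ValueError("need at least one positive example")
    return min(positive_scores)


# Toy data, invented for illustration: 1 = condition present, 0 = absent.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.92, 0.40, 0.35, 0.10, 0.55, 0.71, 0.05, 0.30]

default_counts = confusion_counts(labels, scores, 0.5)
cutoff = screening_threshold(labels, scores)
screener_counts = confusion_counts(labels, scores, cutoff)

print("default cutoff 0.5 :", default_counts)
print("screener cutoff", cutoff, ":", screener_counts)
```

On this toy data, lowering the cutoff removes the one missed positive at the cost of one extra false positive, which is the trade a cheap screener is designed to make. In practice the cutoff would be chosen on held-out data rather than on the examples used to fit the model.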

Given the huge number of tools available to a data scientist, it is impossible to boil…

During the great human population boom over the last 200 years, the population rose from 1.1 billion in 1820 to nearly 8 billion today. At the same time, the proportion of people experiencing suffering plummeted. These opposing trends create a fraught dynamic: is the absolute number of people experiencing suffering at any given time increasing or decreasing? Are current trends leading towards less total suffering in the future or is population growth outpacing our quality of life improvements?

The absolute number of people experiencing suffering at any given time in human history is a combination of two factors:

  1. The total…

Joel Shuman

Data scientist and pythonista, former fundraising analyst for Bernie 2020. I help non-profits improve their digital fundraising. For more go to

