Digging into the details

Questions I’m going to ask you on our first data science consulting phone call

Let me set the scene: You are a person with data that holds a lot of value. It could be the navigation habits of people on your website, the responses to a survey you fielded, or the purchase history of your email list. Troves of data are dug up daily at organizations across the country. You want to get at the value that data holds, but you need to know how much the data is worth and what the practical application will be before you can even scope a project. There are too many unknowns: How much can I spend to work on this? How long will it take? What else would need to change?

You know a lot about your organization, but you are unclear how data is going to take you from point A to point B. You might not even be sure what point B looks like, or at least what the reasonable options are. So, you set up a phone call to talk this through.

Enter the data science consultant. I need to learn a lot about your organization before I can make the most of your data. On this phone call, my goal will be to learn as much of the context around your org and problem as I can. I’ll also have my eye out for repeatable processes to create and/or automate. Ultimately, we want this phone call to produce a rough ballpark for the value of a project we could do together.

In this post, I’ll go through the questions I’ll ask to understand your organization and data. For each question, I’ll talk you through what I’m trying to understand by asking it. As we talk, we will outline the problem statement and grapple with its size. At the end of an hour, we’ll have a rough scope of the project that we can fill in as we start digging into the data.

Digging into the data
Let’s dig in

Tell me about your organization — and yourself

Can you tell me a bit about yourself and how you started at your role? Can you tell me about your organization? What is the main mission? How big is it? How long have you been operating?

Realistically, I’ll have already researched a lot of this before we start, but it is helpful to understand the backstory to any project before diving into details. Hearing more about your background specifically can also help me tailor the discussion. A discussion with the head of engineering looks different than one with the head of marketing.

Similarly, I need to understand what processes might look like. A small non-profit might operate entirely on manual processes and spreadsheets, while a brand new startup might have all the fancy tools, but no established reporting.

Tell me about the project or problem

Is there an issue we are trying to solve or a strategic project to tackle? What teams are affected by this issue/project? Is this core to the value proposition, or a secondary process? How much value does this drive? Are there any downstream processes this affects?

Savvy operator that you are, you probably already have a project in mind. I want to listen fully and take notes on your thoughts. I’m listening intently here for pain points, systems touched, and repeating processes. At the end of this, I want to be able to repeat back in my own words the project or problem we are tackling. This step is crucial: if we don’t share the same understanding now, our pictures of the final outcome will start to diverge from the very beginning.

This is also where we begin to estimate the value of the project. If the core mission of your organization is fundraising, then the effect of an algorithm could be huge immediately. On the flip side, automating a small process at an enterprise could be worth tens of thousands of dollars spread across many repetitions.

If you don’t have a project in mind but know that your data holds value, then we can discuss the core processes that drive value at your organization. We want the data to be put to its best use, and that will usually be found by optimizing processes close to the core mission.

Tell me about your customers

Who are you mainly targeting right now? Could you describe a typical customer profile? Are there multiple groups of typical customers? Do you have different products that cater to different customer groups? How do you reach out to these groups?

I cannot truly know the project unless I understand everyone it affects, from the employee working the system to the customer buying the product. (For non-profits, this means donors rather than the people served; donors are the real customer of a non-profit.)

I am also trying to understand any current targeting strategy for those customers. This will help me establish the baseline for how much a model could improve on targeting.

If not much is known about current customers, we might start from a different set of analyses than if we are building on years of customer profiling.

Tell me about your systems

What systems do you use in your role? Where do different systems interact with each other? Is siloing between systems an issue? How do you get data into and out of your main system?

By system, I mean any technology used in a workflow. This could mean communication systems like Gmail or Slack, production systems, procurement, or HR: anything under the sun that is causing a pain point or touches a core process. Most organizations already have all of their data available digitally, even if they have not yet accessed it. Many systems like Salesforce or ActionKit provide easy backend access to all of their data, but no obvious tools to work with that backend data.

Here, I’ll be trying to understand what your data landscape is. By figuring out what systems there are, how they talk with each other, and what processes each one is used for, I can understand what building blocks are available for me to build with. Ultimately, I’ll be trying to put upper and lower limits on the kind of improvements I could drive with the data.

If you have individual-level data on millions of customers, there are more opportunities for machine learning than if you only have aggregate measures like total revenues and costs. If the right data is available, we can use more advanced approaches like modeling and clustering. With less data, we are limited to less specific analysis.

Tell me about your reporting

What metrics do you currently track? How do you see a daily view? Weekly? Yearly? How is it updated and when? Does the reporting contain any automated alarms? How often are reports discussed at status meetings?

You only improve what you measure. Here I am trying to understand if there is reporting in place to track any improvement. A data science project is only valuable if it can show its results. This also applies to a company culture; the kind of reporting that is effective varies by organization. The project will only be a success if our reports prove it was effective and those reports themselves effectively present that case.

The final question is perhaps the most important — there needs to be a culture of responding to reports in order to make them effective. My goal is to set the bar now: reading reporting is an active task that requires effort. Generating a report does not have to be.

Can I run this by you?

Now it is my turn to verify what I’ve learned: I’ll repeat your org’s mission and issue, talk about the data landscape, review who your customers are, and talk about what tools I could bring to bear and what kinds of improvements these could make in dollar terms.

First and foremost, I want to make sure I heard you correctly so we are on the same page. I’ll go back through the high-level takeaways I got from our conversation.

As we’ve been talking, I’ve been thinking of what an approach could look like. In order to decide if the benefit of a project can outweigh its costs, we will need to ballpark the value based on the issues we just discussed. In the last portion of our call, I’ll walk you through an example like one of those below.


Automation

For automating a process, the formula would look something like:

Value = Process Frequency × Process Cost × % Savings

Here % Savings would be my best guess based on my experience and how advanced the organization is already.
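
As a rough illustration, the automation formula can be written as a few lines of Python. This is only a sketch: the function name and the example numbers (a weekly report costing $400 of staff time per run, with a guessed 25% savings) are hypothetical and not taken from any real engagement.

```python
def automation_value(process_frequency: float, process_cost: float, pct_savings: float) -> float:
    """Value = Process Frequency x Process Cost x % Savings.

    pct_savings is a fraction (0.25 means 25%). The result covers the same
    period that process_frequency is counted over, for example one year.
    """
    if not 0.0 <= pct_savings <= 1.0:
        raise ValueError("pct_savings must be a fraction between 0 and 1")
    return process_frequency * process_cost * pct_savings


# Hypothetical inputs: 52 runs per year, $400 per run, 25% savings (a guess).
estimate = automation_value(process_frequency=52, process_cost=400.0, pct_savings=0.25)
print(f"Estimated value: ${estimate:,.0f} per year")
```

The useful part of writing it down this way is that the % Savings guess becomes an explicit input, so it is easy to rerun the estimate with a more pessimistic number.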

Machine Learning

For something like a pricing model it would look more like this:

Value = Price × % Efficiency Increase × (Units Sold + Incremental Units Sold)

Again, % Efficiency Increase would be based on my experience and the particular situation. Other machine learning models would generalize to a similar equation.
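
The pricing formula translates to Python in the same way. Again, this is a sketch under stated assumptions: the function name and the example inputs (a $20 product, 10,000 units sold, 500 incremental units, a guessed 10% efficiency increase) are made up for illustration.

```python
def pricing_model_value(
    price: float,
    pct_efficiency_increase: float,
    units_sold: float,
    incremental_units_sold: float,
) -> float:
    """Value = Price x % Efficiency Increase x (Units Sold + Incremental Units Sold).

    pct_efficiency_increase is a fraction (0.10 means 10%).
    """
    if pct_efficiency_increase < 0:
        raise ValueError("pct_efficiency_increase must not be negative")
    return price * pct_efficiency_increase * (units_sold + incremental_units_sold)


# Hypothetical inputs, not drawn from a real client.
estimate = pricing_model_value(
    price=20.0,
    pct_efficiency_increase=0.10,
    units_sold=10_000,
    incremental_units_sold=500,
)
print(f"Estimated value: ${estimate:,.0f}")
```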

In both of these situations, you would want the project to be viable at a savings or efficiency increase of 10% or less, because that is likely the smallest improvement we can measure with any certainty. If the project only pays off at a larger improvement than that, then we need to rethink the engagement. If the benefit is there, we can talk about a scope.
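
One way to run that check is to turn the formula around and ask what improvement the project needs just to cover its cost. The sketch below assumes a single project cost figure and a "baseline" equal to the value formula with the % term left out (for example, Process Frequency × Process Cost over the period you care about). Both names and all numbers are hypothetical.

```python
def breakeven_improvement(project_cost: float, baseline: float) -> float:
    """Fractional improvement at which the project's value equals its cost."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return project_cost / baseline


def is_viable(project_cost: float, baseline: float, threshold: float = 0.10) -> bool:
    """True if the project pays for itself at `threshold` improvement or less."""
    return breakeven_improvement(project_cost, baseline) <= threshold


# Hypothetical inputs: a process costing $500,000 per year and a $30,000 project.
baseline = 500_000.0
project_cost = 30_000.0
print(f"Break-even improvement: {breakeven_improvement(project_cost, baseline):.0%}")
print("Viable at 10%:", is_viable(project_cost, baseline))
```

The baseline and the project cost need to cover the same time horizon for the comparison to mean anything. Which horizon to use is a judgment call that this sketch leaves open.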

What a scope looks like

A good scope at this stage will be extremely high-level. This is the first phone call after all. The most important thing is to think about these three things:

  1. Value — which we calculated above
  2. Timeline — we shouldn’t be scoping longer than 6 weeks at this point
  3. Implementation — what systems we are thinking about touching and what processes would be affected.

Whew! That’s a lot for one phone call

Yeah, it probably took us the whole hour. But having this whole discussion will help us figure out if there is a good reason for a second call and ultimately a project. If you have had a chance to read this before our first phone call, I hope this helped you come into our call prepared!

Data scientist and pythonista, former fundraising analyst for Bernie 2020. I help non-profits improve their digital fundraising. For more go to shoveldata.me
