The "D" In Data Stands For Discipline

Sep 17, 2025

Hi, fellow future and current Data Leaders; Ben here 👋

Before diving in, I wanted to let y’all know that I’ll be running several in-person events in the UK, Seattle and Denver. The first will be in the UK, so if you’d like to join me after Big Data London, you can sign up here!

Also, special thanks to Estuary for partnering on this event!

Food and drinks will be provided as well as a live band!

Now let’s jump into the article!

A seasoned data leader once told me something that stuck: the d in data stands for discipline.

Now to be clear, pretty much every field requires discipline.

But we often focus on the exciting parts of data work in articles.

Not the tedious, repetitive, mundane, just-doing-the-right-thing type of work.

It is those unglamorous but critical habits that help you deliver good work:

Resisting the urge to over-engineer a simple problem
Enforcing standards even when it feels tedious
Saying “no” to the ad-hoc requests that would derail your team from finishing meaningful work
Committing to an idea even if it doesn’t provide immediate results

None of these are flashy. They don’t exactly make for a good case study or conference talk. But, looking back at my article from a couple of weeks ago, they are the habits that I’ve seen separate good data teams from great ones.

There are plenty of benefits from doing all the small things well.

Below are three ways discipline shows up in high-performing data teams.

Practicing Restraint In Data Infrastructure Choices

Here I often think back to my early cooking days. As a young cook, I’d pile every technique I knew onto a single plate. I wasn’t thinking about the diner’s experience. I really was just showing off.

Over time I learned that every component must earn its place.

In the same way, data infrastructure is a tempting place to use tool after tool and build a massively complex system to report churn.

The tools pile up (Iceberg, Databricks, Snowflake, Airflow, Sigma, Unity Catalog) until the stack looks impressive on paper. Yet I’ve seen teams spend millions building these sprawling setups only to hear the same complaint from business leaders:

“I can’t find the numbers I need”

So technically, you might have built an amazing data infrastructure stack, but functionally, no one in the business uses it.

So why did you build it?

Saying No to the Endless Ad-Hoc Ask - And Actually Finishing Work

Data teams can quickly become the company’s catch-all IT fill-in department.

One day it’s a finance team automation, the next it’s a one-off dashboard or an urgent “just pull this number” request. Each ask might seem harmless, but together they create a flood that pulls the team away from meaningful, long-term work.

Data teams need to have the discipline to say “No”.

Now, I say “No,” but what I really mean is that you need to be able to communicate what your team can and can’t get done and what the trade-offs are.

It’s also about ensuring your team isn’t constantly getting ripped from one project to another.

After all:

“you only get value from projects when they finish: to make progress, above all else, you must ensure that some of your projects finish.”
― Will Larson, An Elegant Puzzle: Systems of Engineering Management

If you let your data team get steamrolled and just say yes to every ad-hoc request, other leaders might enjoy the attention at first. Then they will keep asking for more and more. Your data team will get burnt out, projects won’t get fully delivered and things will just continue to unravel.

Following Standards Relentlessly

Standards cover everything from how tables are named to how version control is handled, how SQL is formatted, and how data models are structured.

I don’t view these as just cosmetic choices. Consistency ensures that a query written today will still be understandable (and trustworthy) six months from now.

Now in the past this meant I’d have to go over my SQL scripts and ensure that I tabbed everything as expected.

Luckily, you can now automate this part of “discipline” by making those standards part of the process. Use linting tools, CI/CD checks, and code reviews so that every pull request is validated before it ever hits production. When we did this at Facebook, nits went down 98% (or at least that’s what it felt like).
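To make that a little more concrete, here is a rough sketch of what an automated standards check could look like. It is not the setup we used at Facebook, and the convention itself (lower snake_case table names with a stg_, dim_ or fct_ prefix) is purely an assumption for illustration, as are the function names. Swap in whatever your team has actually agreed on.

```python
import re

# Hypothetical convention (an assumption, not from any specific team):
# lower snake_case names that start with a layer prefix.
ALLOWED_PREFIXES = ("stg_", "dim_", "fct_")
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_table_name(name: str) -> list[str]:
    """Return human-readable violations for a single table name."""
    problems = []
    if not SNAKE_CASE.match(name):
        problems.append(f"{name}: not lower snake_case")
    if not name.startswith(ALLOWED_PREFIXES):
        problems.append(f"{name}: missing one of the prefixes {ALLOWED_PREFIXES}")
    return problems


def check_all(names: list[str]) -> list[str]:
    """Collect violations across every table name in a change set."""
    return [problem for name in names for problem in check_table_name(name)]


def main(names: list[str]) -> int:
    """Print violations and return an exit code a CI job can act on."""
    violations = check_all(names)
    for violation in violations:
        print(violation)
    # Non-zero means "fail the check", which is what blocks the pull request.
    return 1 if violations else 0
```

In a CI job you would gather the table names touched by the pull request (how you do that depends on your stack) and pass the result of main(...) to sys.exit. The point is less the regex and more that the check runs on every change, so no one has to remember to be disciplined.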

Documentation is part of the standard. A living style guide and onboarding guide help new team members ramp up quickly, reducing the “legacy knowledge” that can paralyze a team when someone leaves.

This also makes it easier for external team members to quickly understand what your naming conventions and workflows look like.

It also makes future migrations, searching through code for specific patterns, and a whole host of other larger projects considerably easier.

But Wait There Is More!

Now, there are plenty of other ways data teams remain disciplined when it comes to delivering the right end products to the business. If you’d like to read some of my past articles on these topics, you can see them below.

Final Thoughts

I’ve seen so many projects and data stacks fall apart because of small compromises that built up over time.

There was no clear owner for a task or larger process.
The data engineering team used a tool to patch a problem rather than inquiring why they were having the issue.
No one set standards, which allowed for fast development at first, but only until the first migration.

Each small compromise seems harmless in the moment, but together they create technical debt, brittle pipelines, and future pains.

There is always a balance, of course. You don’t want to put so much process in the way that nothing moves.

But you also want to avoid building a big ball of mud that no one wants to touch.

As always, thanks for reading.

Upcoming Data Events

Articles Worth Reading

There are thousands of new articles posted daily all over the web! I have spent a lot of time sifting through some of these articles, as well as TechCrunch and company tech blogs, and wanted to share some of my favorites!

Is TV's Golden Age (Officially) Over? A Statistical Analysis

Daniel Parris

A lot can change in forty-five months. Think back to November of 2021: the world had yet to see a Tesla Cybertruck, HBO Max was an ascendant streaming service, Will Smith had slapped zero people on live television, Sam Bankman-Fried was a benevolent billionaire and model citizen, and Netflix's stock was soaring, buoyed by the pandemic.

And then the unthinkable happened: Netflix reported a quarterly loss of 200,000 subscribers. This earnings miss—coupled with broader economic uncertainty—triggered widespread panic across the entertainment industry. Streaming platforms slashed their content budgets, media conglomerates like Disney and Warner Bros. laid off thousands, and Netflix's stock fell by 51%. This industry-wide contraction culminated in a six-month writers’ strike, as unions demanded higher pay, standardized compensation, and greater residuals. Amid full work stoppages and a volatile economy, industry pundits began speculating whether streaming could recapture its pre-2022 momentum.

The Pedantic Layer

Joe Reis

In the data world, we love to argue about definitions. What is unstructured data? Is JSON structured or semi-structured? Are PDFs unstructured, or do they contain “implicit structure”? Do LLM embeddings of text count as structured data? And WTF is a semantic layer? Entire threads, articles, and even conference talks spin out of these debates.

This obsession forms what I call the Pedantic Layer…