Data Contracts: The Future of Data Management is Here

Oops, Did I Break the AI? Why How We Handle Data Needs a Big Change

Hey everyone, John here! Welcome back to the blog. Today, we’re diving into something that might sound a bit technical—data management—but I promise, it’s a story about making life easier for everyone and preventing those “oops, I broke everything” moments. And as always, my assistant Lila is here to keep me honest and make sure we don’t get lost in jargon.

Imagine you’re building something with LEGOs. In the old days, someone would give you a very strict, step-by-step instruction booklet. You had to follow it exactly. This was slow and frustrating if the instructions weren’t quite right for the cool spaceship you wanted to build.

Today, it’s more like you get a giant box of LEGOs with no instructions. You have the freedom to build whatever you want! This is great for creativity, but what if someone else was counting on you to build a specific red car, and you decided to use those red bricks to build a rocket ship instead? Their project is now broken. This is the challenge many companies face with their data today.

One Tiny Change, One Giant Headache

Let me tell you a story about an engineer we’ll call Jez. Her company stores information about customer support tickets. Years ago, they switched computer systems, and to avoid mixing up old and new tickets, they added the word “zendesk:” to the start of every new ticket ID. For example, ticket number 123 became “zendesk:123”.

Jez, being a smart and efficient engineer, noticed that the old system was long gone. She thought, “Why are we wasting space storing ‘zendesk:’ on every single ticket? Let’s just store the number!” It seems like a perfectly logical, tiny optimization. So she wrote one line of code to remove it.

But imagine a world without the safety net we’re about to discuss. Here’s what would have happened:

  • Her change would be approved.
  • The new ticket IDs (without “zendesk:”) would start flowing into the company’s main data storage.
  • A super-important AI system that predicts trends would suddenly stop understanding 40% of the new data, because it was expecting every ID to start with “zendesk:”. It would silently start giving wrong results.
  • The finance team’s dashboard that tracks how many tickets they handle would also break.
  • It would take days or even a week of frantic work from multiple teams to figure out what went wrong and fix it. And poor Jez would be known as “the one who broke the AI with a single line of code.”
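To see why a failure like this can be silent, here is a minimal sketch of a downstream consumer. The function names and the "skip what I don't recognize" behavior are assumptions made up for illustration; the story doesn't describe how the real AI system or dashboard reads the IDs.

```python
from typing import List, Optional

PREFIX = "zendesk:"  # the prefix from the story


def parse_ticket_number(ticket_id: str) -> Optional[int]:
    """Hypothetical consumer logic: return the ticket number, or None if the ID looks wrong."""
    if not ticket_id.startswith(PREFIX):
        return None  # skipped quietly: no error, no alert
    return int(ticket_id[len(PREFIX):])


def count_tickets(ticket_ids: List[str]) -> int:
    """Count only the tickets this consumer can understand."""
    return sum(1 for t in ticket_ids if parse_ticket_number(t) is not None)


before = ["zendesk:123", "zendesk:124", "zendesk:125"]
after = ["123", "124", "125"]  # the same tickets once the prefix is removed

print(count_tickets(before))  # 3
print(count_tickets(after))   # 0
```

Nothing crashes in the second case. The number is simply wrong, which is exactly what makes this kind of bug so hard to track down.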

This is a huge problem! Developers are either moving too slowly because they’re afraid to break something, or they move quickly and accidentally cause a data disaster. NASA’s Mars Climate Orbiter, which was lost in 1999 as it arrived at Mars, is a real-life example of a similar mix-up: one piece of software reported its numbers in imperial units while another expected metric units, and nobody caught the mismatch until the spacecraft was gone.

The Hero of Our Story: The Data Contract

Luckily for Jez, her story had a happy ending. The moment she tried to submit her “optimized” code, she got an instant, automatic rejection message. It basically said:

“STOP! This change violates the rules. The ‘ticket_id’ field is supposed to look like ‘zendesk:123’, and your change breaks that rule. If you proceed, you will break the finance dashboard and the AI model.”

Jez immediately undid her change, and the whole incident took about 30 seconds. No harm done. What saved the day was something called a data contract.

Lila: “Hang on, John. A ‘data contract’ sounds like legal paperwork. What is it really?”

That’s a great question, Lila! Think of it less like a legal document and more like a very strict, automated agreement. A data contract is a piece of code that says, “Hey, any data that I produce will follow these exact rules.” In Jez’s case, the contract declared that the `ticket_id` must start with “zendesk:”. It’s an upfront promise about what the data will look like, and a computer checks that promise automatically before any damage can be done.
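For the curious, here is roughly what "a piece of code that makes a promise" could look like. This is only a sketch under my own assumptions: `FieldContract`, its `violations` method, and the exact pattern are hypothetical names I picked for illustration, not the format of any particular data contract tool.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldContract:
    """A hypothetical one-field contract: the value must fully match a pattern."""
    name: str
    pattern: str

    def violations(self, records: List[Dict[str, str]]) -> List[Dict[str, str]]:
        """Return the records whose field is missing or does not match the pattern."""
        regex = re.compile(self.pattern)
        return [r for r in records if not regex.fullmatch(str(r.get(self.name, "")))]


# The promise from the story: ticket_id looks like "zendesk:123".
TICKET_ID_CONTRACT = FieldContract(name="ticket_id", pattern=r"zendesk:\d+")

good = [{"ticket_id": "zendesk:123"}]
bad = [{"ticket_id": "123"}]

print(TICKET_ID_CONTRACT.violations(good))  # []
print(TICKET_ID_CONTRACT.violations(bad))   # [{'ticket_id': '123'}]
```

An empty list means the promise was kept. Anything else is a record that breaks it, and that is what the automatic check reports back to the developer.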

A Smarter Approach: “Shifting Left” with Data

This idea of checking for problems early is part of a bigger movement called “shift left.”

Lila: “Okay, ‘shift left’ definitely sounds like industry jargon. Can you break that down for us?”

Of course! Imagine a project’s timeline on a piece of paper, going from left (the start) to right (the finish). “Shifting left” simply means moving a task from the right side of the timeline to the left side—in other words, doing it much earlier in the process.

Instead of waiting until the data is already in the system and causing problems (far to the right of the timeline), we “shift left” and check the data right when the code that creates it is being written (at the very beginning, on the left). The core idea is simple: data is created by code, so let’s check the data rules in the same way we check the code itself.
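Here is one way to picture "checking the data rules the same way we check the code". It is a sketch with made-up names: `make_ticket_id` stands in for the code that produces the data, `make_ticket_id_optimized` stands in for Jez's change, and `check_producer` is a hypothetical test that would run alongside the other automated tests.

```python
import re
from typing import Callable

# Assumed rule, taken from the Jez story.
TICKET_ID_RULE = re.compile(r"zendesk:\d+")


def make_ticket_id(ticket_number: int) -> str:
    """Producer code today: builds the ID that gets stored."""
    return f"zendesk:{ticket_number}"


def make_ticket_id_optimized(ticket_number: int) -> str:
    """The proposed 'optimization': drop the prefix."""
    return str(ticket_number)


def check_producer(producer: Callable[[int], str]) -> bool:
    """Run the producer on sample inputs and check every output against the rule."""
    samples = (1, 123, 99999)
    return all(TICKET_ID_RULE.fullmatch(producer(n)) is not None for n in samples)


print(check_producer(make_ticket_id))            # True
print(check_producer(make_ticket_id_optimized))  # False
```

The check never touches the real data store. It looks only at what the code would produce, which is why it can run at the very start of the timeline.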

Giving Developers Superpowers

Once you start treating data this way, you can use the same kinds of automated tools that developers already use for their code. This “shift-left tool kit” for data includes a few cool things:

  • Automatic Code Scans: A tool can look at an engineer’s code before it even runs to see what kind of data it will create.
  • Data Contracts in the Build Process: This is what saved Jez. The data contract is checked automatically whenever a developer tries to add new code. (This check happens in a system called CI, or Continuous Integration. Think of CI as a tireless robot assistant that instantly proofreads every piece of code for errors before it gets mixed into the main project.)
  • Impact Warnings: The system can warn a developer, “Be careful! This small change you’re making will have a big impact on these three other systems down the line.”
  • Automatic Rule-Checking: You can set up rules for privacy (like “never let personal information leak out”) or data retention, and the system will check them at the very beginning, not months later during an audit.
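The impact warning in particular is easy to picture in code. The sketch below assumes a hand-written registry of which systems read which field; the `CONSUMERS` table and both function names are hypothetical, and a real platform would presumably discover these dependencies on its own rather than rely on a hard-coded dictionary.

```python
from typing import Dict, List

# Hypothetical registry: which downstream systems depend on which field.
CONSUMERS: Dict[str, List[str]] = {
    "ticket_id": ["finance dashboard", "AI model"],
}


def impact_warning(changed_fields: List[str]) -> List[str]:
    """Build one warning line for each changed field that has downstream consumers."""
    lines = []
    for field in changed_fields:
        consumers = CONSUMERS.get(field, [])
        if consumers:
            lines.append(f"Changing '{field}' affects: {', '.join(consumers)}")
    return lines


def ci_exit_code(changed_fields: List[str]) -> int:
    """Return 1 (fail the check) if any change has downstream impact, else 0."""
    return 1 if impact_warning(changed_fields) else 0


for line in impact_warning(["ticket_id"]):
    print(line)  # Changing 'ticket_id' affects: finance dashboard, AI model
```

Failing the check on any impact at all is the strictest choice. A gentler setup could print the warning and let the developer decide.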

New platforms are emerging that make this possible. They connect to the place where developers store their code and automatically identify where data is created, suggest contracts for it, and then stand guard to block any changes that would violate those contracts. The responsibility for data quality is moved to the person who has the most control over it: the developer writing the original code.
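How might a tool "suggest" a contract? I don't know how any specific platform does it, so treat this as a toy illustration only: the hypothetical `suggest_prefix` function looks at sample values and proposes a prefix rule when every value shares the same text before the first colon.

```python
from typing import List, Optional


def suggest_prefix(values: List[str]) -> Optional[str]:
    """Toy heuristic: propose a prefix if every value starts with the same 'name:' part."""
    if not values or not all(":" in v for v in values):
        return None
    prefixes = {v.split(":", 1)[0] + ":" for v in values}
    return prefixes.pop() if len(prefixes) == 1 else None


print(suggest_prefix(["zendesk:123", "zendesk:456"]))  # zendesk:
print(suggest_prefix(["zendesk:123", "456"]))          # None
```

A suggestion like this would still need a human to confirm it before it becomes a rule that blocks changes.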

Final Thoughts from John and Lila

John: To me, this just makes perfect sense. We’ve seen this happen before. We used to write code and have someone else test it for security holes later. Now, automated tools check for security issues as we write. Quality and security have both “shifted left.” It seems data is the next logical step. Preventing a problem is always, always better than cleaning up a mess. It lets everyone ship new things faster and, as the article says, sleep better at night.

Lila: As someone who is still learning about all this, the idea is really comforting! It feels like putting guardrails on a bridge. It makes it much safer for everyone to move forward quickly without worrying they’re going to accidentally drive off the edge. Knowing that a system will stop you and explain what’s wrong makes the whole process seem much less scary.

This article is based on the following original source, summarized from the author’s perspective:
It’s time to completely change how data management works
