Engineering meets data: the collaboration no one prepared you for
What engineers need to know when data analysts, data scientists, and metrics start showing up in the sprint backlog.
Today’s article is one that Marcos F. Lobo 🗻🧭 and I wrote a few months ago and published in his newsletter, The Optimist Engineer.
Through his long-running Substack, Marcos F. Lobo 🗻🧭 writes with the kind of clarity that only comes from deep technical experience (he has been a Tech Lead for a long time) and a genuine interest in helping others grow. He also hosts a podcast and is building a second newsletter (in Spanish).
Lastly, when I chatted with him, I learned that he worked at CERN! (yes, the European Organisation for Nuclear Research; how cool is that!)
Make sure you subscribe to his newsletter!
If you are reading this, I assume you have been working as an engineer for some time, and, over the years, you and your team have built things that matter: features that work, infrastructure that scales, and deployment pipelines that feel reliable. The engineering culture is established and solid: agile/scrum routines, code reviews, automation, CI/CD, and a clear sense of what “good” looks like.
I also assume that, this being 2025, you have worked with data extensively as an engineer, albeit through an engineering lens. You track uptime, you track cluster costs, you track traffic anomalies. Metrics, logs, alerts: that is your world.
But there is a different kind of data work entering the picture now. And it can change how everything fits together.
The scenario that might be new to you
Leadership wants to enhance the service you have owned through more than just engineering metrics. They want to understand how the product is used, where users bounce, and which features drive retention.
As you can imagine, this is no longer an engineering-only problem.
These problems require new profiles in the team. Data profiles.
Beware… Data analysts and Data Scientists arrive…
These new joiners start sending requests your way:
Can we log these new user actions?
Can you clean up this dataset?
Can you help productionise this model?
Your engineering team begins to wonder how their responsibilities are shifting. Questions surface in daily stand-ups and sprint planning.
Who owns the tracking plan?
Is this dashboard part of our delivery, or someone else’s?
Are we becoming a support team for the data function?
How are these requests being prioritised?
Will this slow us down?
These concerns are real. And they are common.
The arrival of data professionals (analysts, scientists, sometimes analytics engineers) often marks the beginning of a new phase in how a company operates. But it also challenges established ways of working. It introduces new stakeholders, new dependencies, and new definitions of success.
This post is not here to argue for or against that shift. Instead, it is a practical guide for engineers encountering this moment for the first time, when data people join the picture, and the rules of collaboration begin to change.
What will we cover in this post?
In the sections ahead, we will walk through:
What data analysts and data scientists actually do, and why their requests keep landing in your backlog.
Who to hire (and who to avoid) when building your first data team.
3 models of collaboration between engineering and data, including which one to start with, and why.
The most common friction points we have seen in the wild and how to avoid them before they derail your team.
If you are an engineer, a tech lead, or someone navigating this transition for the first time, this article is for you.
(Written by your friendly neighbourhood data scientist. 👨‍🔬 🕷️)
What data people actually do (and what engineers need to know)
If you have never worked closely with a data analyst or a data scientist, it can be difficult to picture what their day-to-day looks like, or why they are suddenly asking for things that land in your backlog.
So, let’s start with what data people actually do.
Data analysts: decoding what is happening
A good analyst helps the business understand how things are going. They spend their time writing SQL, building dashboards, and answering product questions. The work is often ad hoc:
Where are users dropping off during onboarding?
Did this change in the funnel come from a new feature or from seasonality?
What is the retention rate 60 days after acquisition for new cohorts?
If the business wants to understand what happened and make a decision based on it, the analyst is the one pulling the numbers and presenting the story.
They usually live in product, marketing, or commercial teams — not deep in engineering. But their ability to do the job depends almost entirely on what data engineering has made available.
If logs are missing, inconsistent, or changed without notice, their analysis breaks.
If events are not documented, it takes days to untangle the meaning of a single metric.
If a dashboard is suddenly empty, they do not know if it is a bug or a feature.
Therefore, clean, complete, well-described data is a “must” for their work to have a positive impact. Maybe that backlog ticket is starting to make more sense now, right?
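To make the retention question above concrete, here is a minimal sketch of the kind of calculation an analyst runs, assuming a simple event log with hypothetical `user_id` and `ts` columns. In practice this would usually be SQL against a warehouse rather than pandas:

```python
from datetime import timedelta

import pandas as pd

# Hypothetical event log: one row per user action.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "ts": pd.to_datetime([
        "2025-01-01", "2025-03-05",  # user 1 returns 60+ days later
        "2025-01-10", "2025-02-01",  # user 2 returns too early to count
        "2025-01-15",                # user 3 never returns
    ]),
})

# Acquisition date = first event per user.
first_seen = events.groupby("user_id")["ts"].min().rename("acquired_at")
joined = events.join(first_seen, on="user_id")

# A user is "retained at 60 days" if any event lands 60+ days after acquisition.
retained = joined[
    joined["ts"] >= joined["acquired_at"] + timedelta(days=60)
]["user_id"].unique()

rate = len(retained) / first_seen.size
print(f"60-day retention: {rate:.0%}")  # 1 of 3 users retained
```

Notice how the whole calculation hinges on the event log being complete and consistent, which is exactly why missing or silently changed logs break an analyst's work.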
Data scientists: moving from insight to optimisation
If analysts explain what happened, data scientists focus on what should happen next.
They build models, run experiments, and design algorithms that help a product adapt or improve itself.
They might predict churn. They might optimise ranking. They might experiment with different recommendation strategies.
Unlike analysts, their work is not always tightly scoped. It is often exploratory. Messy. Iterative.
They work in notebooks and Python scripts.
They are deep into experimentation frameworks.
They ask a lot of questions — and sometimes those questions lead to weeks of dead ends before something clicks.
What they need from engineering is not just logged data, but reproducibility.
If a model works once but fails in prod, it is useless.
If the feature they trained on disappears or changes without warning, all bets are off.
And if you have ever been asked to “productionise a model”, welcome to one of the messiest handoffs in tech. (PS: I am guilty of having provided really messy handoffs… I tell you, they don’t need to be messy).
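One lightweight way to catch the “feature changed without warning” failure mode is to pin the schema the model was trained on and validate incoming data before every inference run. This is a minimal sketch with made-up feature names, not a full data-contract framework:

```python
import pandas as pd

# Hypothetical schema pinned at training time: column name -> expected dtype.
EXPECTED_FEATURES = {
    "sessions_last_7d": "int64",
    "days_since_signup": "int64",
    "plan_tier": "object",
}

def validate_features(df: pd.DataFrame) -> pd.DataFrame:
    """Fail loudly before inference if the data drifted from the training schema."""
    missing = set(EXPECTED_FEATURES) - set(df.columns)
    if missing:
        raise ValueError(f"missing features: {sorted(missing)}")
    for col, dtype in EXPECTED_FEATURES.items():
        if str(df[col].dtype) != dtype:
            raise TypeError(f"{col}: expected {dtype}, got {df[col].dtype}")
    # Return columns in training order so the model sees a stable layout.
    return df[list(EXPECTED_FEATURES)]

batch = pd.DataFrame({
    "plan_tier": ["pro"],
    "sessions_last_7d": [3],
    "days_since_signup": [42],
})
checked = validate_features(batch)  # passes, and reorders columns
```

A check like this turns “the feature silently disappeared” into a loud failure at the boundary between engineering and data, which is a much cheaper place to find it.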
If the descriptions above are still not clear, here goes a Star Wars analogy for you.
If the product is the Millennium Falcon, the analyst is R2-D2 — plugged into the dashboard, monitoring everything in real time.
The data scientist is probably Luke — closing his eyes and trusting the model to hit the target… sometimes successfully.
(Sorry fellow analysts, I have always wanted to picture myself as Skywalker — but hey, R2-D2 is the coolest robot in the whole galaxy.)
What kind of profiles are worth hiring (and which ones to avoid early on)
There are plenty of job descriptions out there to copy and paste, but little guidance on how to build a data team from scratch.
Therefore, instead of listing what to look for, I will list the traps to avoid at all costs:
A fresh graduate with no context for working with real, messy product data.
A researcher-type who wants perfect data before starting anything.
A “machine learning engineer” with zero interest in actual product questions.
A junior analyst who cannot yet define a metric, let alone defend it.
In the early days, data people need to be autonomous, communicative, and capable of bridging disciplines. That means knowing the business and the tools. A pretty dashboard is not helpful if no one knows what question it answers.
Collaboration models: How to work together
Once you have brought data analysts or scientists into the picture, the next challenge is figuring out how to actually work together.
Who joins which meetings?
Where do data requests go?
How are priorities aligned — or not?
And what happens when engineering velocity and data ambiguity collide?
There is no single right answer. But there are 3 common models that companies use to structure collaboration between engineering and data. Each has trade-offs, and some are better suited for getting started than others.
Let me go through them.
1. Separate teams, collaborating as partners
In this setup, engineering and data live in different teams. They work on different backlogs, report to different managers, and collaborate through shared rituals — usually weekly syncs or project checkpoints.
👍 Advantages:
This model allows each discipline to retain its functional identity.
Analysts and scientists can support each other, enforce shared standards, and avoid the isolation that comes from being the lone data person in a sea of engineers.
It is easier to scale headcount, maintain tooling consistency, and build internal best practices.
👎 Disadvantages:
The collaboration can become transactional.
Data requests are treated like tickets.
Engineering changes become blockers for analysis.
Context is often lost in handovers.
Unless the teams have strong, intentional alignment rituals, things quickly slide into “service mode,” with long feedback loops and low mutual understanding.
2. Fully embedded data roles
Here, the data analyst or scientist is a full member of the engineering or product squad. They join standups, contribute to sprint planning, and share the same goals and rituals as their engineering peers.
👍 Advantages:
This is the tightest model of collaboration.
Data becomes part of the build process from day one — not something tacked on at the end.
Logging decisions get made with analysis in mind.
Models are scoped realistically.
Product direction benefits from both qualitative insight and quantitative signals.
Engineers and data people build shared language fast.
👎 Disadvantages:
It can be lonely and unsupported for the data person.
When you are outnumbered 6-to-1 in standup, it is easy to deprioritise your own work.
Engineering managers, even the best ones, often struggle to coach or grow data careers.
Over time, technical debt and misalignment build up if the data person is not senior enough to hold their own.
3. Virtual teams (V-teams)
This model creates a temporary, focused team made up of two or three selected engineers, one or two data people, and a product lead, all pulled together to work on a specific problem space.
👍 Advantages:
You get the best of both worlds.
There is tight day-to-day collaboration without needing a full-blown reorg.
It is a safe way to trial cross-functional work and build empathy between disciplines.
Each team member brings their domain knowledge, and you get to test how data fits into product delivery in real life — not just on slides.
👎 Disadvantages:
It adds some coordination overhead.
Multiple managers need to stay aligned.
Roles and responsibilities can feel fuzzy.
You might have dotted-line reporting, dual backlogs, and the occasional “who is driving this again?” moment.
Assuming this is the first time your engineering team will be working closely with data people, my recommendation is to start with a V-team.
✅ V-teams are the most flexible model, the easiest to spin up, and the best way to test collaboration in the wild without committing to an org-wide change.
⚠️ But — and this is important — V-teams are not a long-term solution.
They are a bridge, not a home. They work best when time-boxed and mission-driven. If they drag on, you start to see accountability blur, managers lose track of priorities, and team members feel stretched between two worlds.
Use the V-team to learn. To find out what rituals actually work. To see where the friction is. And once you know? Disband it.
Let us talk about how to manage a V-team well, because mixing deterministic sprints with exploratory data work is not as easy as it sounds.
Making V-teams work in practice
Most engineering teams work in sprints: story points, velocity, and deterministic planning.
Data work often follows Kanban-style flows: high uncertainty, changing priorities, and an uncomfortable relationship with estimation.
If you put both of these into the same V-team and run Scrum like nothing changed… prepare for chaos.
Instead, treat the V-team as an experiment in collaboration design:
How much work should be committed up front?
How are goals set when outcomes are uncertain?
How are blockers surfaced when there is no clear definition of “done”?
Some teams end up adopting hybrid rituals — sprints with more flexible commitment for data. Others maintain dual tracking boards. Some simply drop points altogether and focus on outcomes.
Unfortunately, there is no universal fix. But if you are open to adjusting ways-of-working and you test in a controlled V-team environment, you will be closer to an effective engineering ⇔ data collaboration model.
Where things often go wrong (and how to prevent them)
Even with the best intentions, collaboration between engineers and data people often runs into familiar traps. Below, I want to share with you 5 friction points I have seen emerge in my day-to-day leading projects with engineers and data scientists.
1. “Done” means different things to different people
One of the first issues tends to show up during delivery.
For an engineer, “done” often means the code is merged, tested, and live.
For an analyst, “done” might mean that stakeholders have signed off on the dashboard and that the metrics actually make sense.
For a data scientist, “done” can be even fuzzier — maybe a model has passed offline evaluation, but still needs validation in production.
This mismatch causes frustration on both sides. Engineers feel like the finish line keeps moving. Data folks feel like things are getting shipped before they are ready, or without enough rigour behind the numbers.
How to prevent it?
Teams need a shared definition of done. Not one-size-fits-all, but something tailored to the type of work — including uncertainty and iteration for data-heavy tasks.
For example, when writing an epic that involves both engineers and data scientists, you might break it into two coordinated work streams. The engineering “done” could be the backend support for a new feature flag, properly logged and deployed. The data “done” might come two weeks later, once enough data has been collected to run an experiment and share results with the product team.
What matters is setting expectations clearly. The story does not end at deployment. Nor should a model experiment block frontend shipping. The goal is mutual clarity: what is being delivered, by whom, and when it can be called “useful.”
2. Nobody really owns the data
Engineers create the systems that generate data, but they often do not think of themselves as responsible for what happens to it afterwards. Analysts, on the other hand, assume the data coming in is trustworthy. Then product teams start asking questions about retention, funnels, or attribution — and suddenly everyone is pointing fingers.
This lack of ownership shows up quickly. Logs are inconsistent. Metrics are defined differently across teams. An event fires three times on iOS but once on Android. Everyone feels the pain, but no one owns the fix.
How to prevent it?
Treat data as a first-class product surface. Data tracking plans need to be reviewed just like API changes. Engineers should own their logs — naming, frequency, structure — the same way they own code. Create data contracts and model deployment specs. Where possible, assign clear metric owners. If something breaks, someone should know it is their job to care.
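A tracking plan that is “reviewed just like API changes” can literally live in the codebase. Here is a minimal sketch, with hypothetical event and field names, of validating events against such a plan before they are emitted:

```python
# Hypothetical tracking plan kept in the repo, so changes go through code
# review just like an API change. Event and field names are illustrative.
TRACKING_PLAN = {
    "onboarding_step_completed": {"user_id", "step", "ts"},
    "subscription_started": {"user_id", "plan", "ts"},
}

def validate_event(name: str, payload: dict) -> dict:
    """Reject events that are not in the plan or are missing required fields."""
    if name not in TRACKING_PLAN:
        raise KeyError(f"unknown event {name!r}: update the tracking plan first")
    missing = TRACKING_PLAN[name] - payload.keys()
    if missing:
        raise ValueError(f"{name} is missing fields: {sorted(missing)}")
    return payload

validate_event(
    "onboarding_step_completed",
    {"user_id": 7, "step": 2, "ts": "2025-06-01T10:00:00Z"},
)
```

The point is not this particular helper, but that the plan has a single owner, a single source of truth, and a review gate, so an event cannot fire three times on iOS and once on Android without someone noticing in review.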
3. Work becomes invisible across tool boundaries
Engineers tend to live in GitHub, Jira, and IDEs. Data people tend to live in notebooks, dashboards, and analytics tools. Unless someone bridges the gap, you can go an entire sprint without really knowing what the other side is doing.
From an engineering side, data work can feel like a black box: unclear inputs, delayed outputs, no visibility into progress. From the data side, it often feels like decisions are made without context, or that insights are being ignored because they were never surfaced in the right forum.
How to prevent it?
Fixing this does not require a full tooling overhaul, but it does require some overlap. Like it or not, data people need to upskill in GitHub and code reviews. But engineers also need to understand that data is messy, and notebooks are a necessary initial playground. Make sure to include data work in sprint demos — not just code. Celebrate insights from data, not just feature releases.
4. Engineers feel like a support team
Early in the relationship, the data team will likely need help. They need logs added, schemas changed, feature flags exposed, and events instrumented. But all of this lives in engineering land, and over time, the volume of requests adds up.
When this happens without planning, engineers start to feel like a support team — not collaborators. Even well-intentioned requests start to feel like interruptions.
How to prevent it?
The fix is to plan better, not less. Data work should be scoped and prioritised with engineering — not layered on top.
For example, if a team is working on a new onboarding flow, the ticket should already include which events need to be logged, which fields need to be structured, and whether an A/B test is planned.
It is also worth introducing “data readiness” as a shared goal — not just whether the feature works, but whether it can be measured. This creates a shared incentive: engineers are not building for someone else’s roadmap; they are building for a joint outcome.
5. Credit gets lost in translation
This one is less obvious, but no less important.
When an initiative succeeds, credit often flows to the team that shipped the feature, not the team that found the opportunity or validated the impact.
You see this most clearly in product reviews. The deck says, “We launched X and saw a 12% increase in conversions.” It does not say, “The analyst spotted the trend. The scientist built the uplift model. The engineer shipped it.” One team gets the praise. The others fade into the background.
It might seem harmless, but over time it chips away at motivation. Analysts stop raising questions. Data scientists stop pushing ideas. Collaboration becomes quieter, and weaker.
How to prevent it?
Leaders can fix this by modelling full-story storytelling. Recognise the entire chain of work: discovery, design, delivery. Use demos, retros, and updates to call out the behind-the-scenes impact. Not only that, teams should review their competency frameworks and ensure that non-production work, such as data insights, is also recognised.
Final thoughts: start small, build better together
If this is the first time your engineering team is working with data people, welcome.
The questions you are asking — “who owns this?”, “why is this in our sprint?”, “are we a support team now?” — are common. So are the friction points. Collaboration between engineering and data is rarely clean on day one. But it is worth it.
If you take anything from this long post, do the following:
Start with one V-team.
Hire a couple of senior data roles.
Put a lot of effort into ways of working.
Define what “done” means for everyone.
Make your logs a product, not a side-effect.
Celebrate the insight, not just the commit.
Remember: data is not here to slow you down.
It is here to help you aim better and build things that actually work.
Now, I want to hear from you!
🧪 What was it like when data people first joined your team?
Did it go smoothly — or did dashboards mysteriously break and logs vanish overnight?
How did you navigate priorities, shared ownership, and messy model handovers?
Have you worked in a V-team setup before? Did it help or just create more dotted-line confusion?
And if you are a manager: how have you structured collaboration between engineering and data? What worked? What absolutely didn’t?
We would love to hear your war stories, lessons learned, and even the awkward moments. Drop your thoughts in the comments — especially if you have ever found yourself asking: “Wait, are we still building features… or cleaning up someone else’s tracking plan?”
And finally, another big thanks to Marcos F. Lobo 🗻🧭 for collaborating with me on this leadership topic!
Further reading
If you are interested in more content, here is an article capturing all my previous posts!