5 Comments
Muhammad Khurram

That’s very true for data migration projects. Another big factor is failing to get buy-in from business users, which leads to adoption failures on new platforms.

Clinton Jones

"If you don’t have an automated way to track dependencies, get ahead of the problem. Ask every team that relies on your data warehouse to submit a list of what they need migrated—set a deadline, and hold them to it."

Alternatively, look carefully at whether a metadata catalog will address a good portion of the dependency analysis you need, on-demand and in real time. Depending on the technology and the catalog vendor, a catalog often alleviates much of the documentation burden of dependency analysis and end-to-end understanding.

John Boddie

The major problem I've experienced with data migrations in over forty years of doing them is partially addressed by your article when you point out that they are approached without a viable plan. Part of a realistic plan is a budget to get the work done, and I've consistently found that the migration budget (if it exists at all) will be underfunded by at least fifty percent. For incremental migrations by business function, the percentage will be greater. For incremental migrations by physical location, you'll have better control of cost because what you learn from the first one can be applied to the next one.

For any complex data migration, there will be a corresponding migration on the processing side (moving from SAP to JD Edwards, for example), and you will need a migration environment to validate the data and processing changes in a coordinated manner. You need to expect (and plan for) disruptions both upstream and downstream of the changing process/data environment. Don't overlook the development and testing of your fallback process when you do this.

If you are looking for a stable career as the AI bubble sorts itself out, see if you can find a group that specializes in migrations. It's a little like being an IT plumber. It's not only water that flows downhill and people who can deal with that live in pretty nice houses.

John Boddie

Nghi Thanh Le

This is a great article, but can you tell me how we can compare data between two database systems? Which tool should we use (Great Expectations, Soda, etc.)? And how do we manage to run the comparison across different systems?
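One common approach, independent of any particular tool, is to pull cheap "fingerprints" (row counts, column sums) from both systems and diff them. A minimal sketch, using two in-memory SQLite databases as stand-ins for the source and target systems (table and column names are hypothetical):

```python
import sqlite3

def summarize(conn, table, numeric_col):
    """Collect cheap fingerprints: row count and a column aggregate."""
    count, total = conn.execute(
        f"SELECT COUNT(*), SUM({numeric_col}) FROM {table}"
    ).fetchone()
    return {"rows": count, "sum": total}

# Two in-memory SQLite databases standing in for source and target.
src, dst = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for conn, amounts in [(src, [10, 20, 30]), (dst, [10, 20, 31])]:
    conn.execute("CREATE TABLE orders (amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(a,) for a in amounts])

src_fp = summarize(src, "orders", "amount")
dst_fp = summarize(dst, "orders", "amount")
mismatches = {k for k in src_fp if src_fp[k] != dst_fp[k]}
print(mismatches)  # {'sum'}: row counts agree, but the totals drifted
```

Tools like Great Expectations or Soda automate the same idea at scale; the key is that each system computes its own aggregates, and only the small summaries cross the wire.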

Pipeline to Insights

Great article! It reminded me of a migration I worked on from Redshift to Snowflake. We encountered some discrepancies when using the same query, though they weren’t always significant. For instance, one of our metrics was supposed to be 23.8, but it showed as 22.9. In such cases, what would you recommend? Do you think an acceptable threshold or range is appropriate? When we discussed this with stakeholders, they mentioned setting a range, with anything within that range being acceptable. I’m curious to hear your thoughts on this approach! :)
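The threshold idea the stakeholders suggested can be expressed as a relative-tolerance check; a minimal sketch (the 5% and 1% thresholds are illustrative, not a recommendation):

```python
def within_tolerance(source, target, rel_tol=0.05):
    """Accept the migrated metric if it is within rel_tol of the source value."""
    if source == 0:
        return target == 0
    return abs(target - source) / abs(source) <= rel_tol

# The 23.8 vs 22.9 example: roughly a 3.8% relative difference.
print(within_tolerance(23.8, 22.9, rel_tol=0.05))  # True at a 5% threshold
print(within_tolerance(23.8, 22.9, rel_tol=0.01))  # False at a 1% threshold
```

Whatever threshold you pick, agreeing on it with stakeholders per metric (as described above) matters more than the exact number.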
