4 Comments
Simon Späti

Great read, love it. And it's true, it's getting harder to keep up every year I feel. Thanks for the shout-out, too, much appreciated.

Ryfl

"Every day there is some new technology we have to learn... by the way, have you heard about Estuary?" No, not something else... :-D

Peter Andrew Nolan

1. Hi Ben,

there have only been a couple of major innovations in the data warehousing area over the last 30 years. The first of the two biggest innovations was the move from "only send the data you think you need into the data warehouse" to "send all the data that might one day be important into the data warehouse". The second was the ability to support archives in dimensional models.

In the 80s disk was so expensive there was simply no way you could get a budget to replicate all the data in operational systems into a data warehouse. When I started in 1982 disk was USD100K per GB. A 9600 baud dedicated modem line was USD40,000 per year and you could put 8 terminals on the end of each 9600 baud modem. The concept of "replicating" data was IN-SANE. It was all we could do to actually process the data we had. I worked on some of the largest systems in the world in the second half of the 80s because I worked on the IBM Internal Billing Systems. I was the guy who got IBM Japan to accept and install our Billing System. It was the largest single image billing system in the world.

So Ralph and the guys at Metaphor hit on the idea of just putting the data into a dimensional data warehouse that matches the value drivers of the business so that you can answer 90% of all the most important questions that the business comes up with. I met Bill at the 1993 Metaphor users conference and he shocked all the attendees by proposing that ALL DATA in operational systems should not only be put into the data warehouse but that up to 10 years of history should be kept. This was such a radical idea that I didn't even get it. I thought he could not possibly be suggesting this. When we brought him to Australia and he did a full day presentation / workshop in Sydney I really "got" that he was proposing to store ALL the data... for at least 10 years.

Obviously dimensional models could not do that so we pretty much had to adopt his TV+SA models to do that. These were VERY hard to sell because of the cost and most companies I tried to sell them to would not buy them.

The first BIG innovation was invented by my mentor, who Ralph left in charge of data models at Metaphor when he left in 1989, and myself in 1996. I had moved to Hitachi and we were selling disk drives essentially. And so we were selling "put all your data into the dimensional data warehouse" to sell disk drives. Up to that time we used to build the data warehouse as either a dimensional data warehouse like Ralph/Metaphor proposed or we built it as an archive like Bill proposed. At Hitachi Asia Pacific, which is what I ran for DWH, we committed to only dimensional models.

Peter Andrew Nolan

2.

My mentor was working for one of the world's largest insurance companies in the US. He was heading up the PwC DWH Practice by 1996. When IBM fired all the Metaphor people in late 1994 he went to PwC on a mission to "screw over IBM" for firing him. Anyway, he knew I worked in insurance and so we went back and forth over what was in the proposal he was working on. His problem was common to Insurance. You absolutely HAVE to have dimensional models for end user query... but time frames are so long in Insurance that having an archive is like having a diamond mine and we knew it.

We went back and forth for weeks and somehow the discussion moved to "why not have both"? Meaning an archive like Bill proposes and a dimensional model down to the detailed level like we were building at Hitachi's clients. My mentor was like "that is going to be VERY expensive which is exactly what we want at PwC". So he put that into his proposal. They won the proposal and took a year to build the first ever data warehouse with BOTH dimensional models like Ralph's ideas and archives like Bill's ideas.

He reported back to me that it was a spectacular success and that ANY question could be answered. There had been an internal review at PwC and it had been accepted as the new standard and went into the methodology he wrote. So that was the first BIG innovation. To have BOTH an archive and dimensional models. Oracle, Teradata and IBM soon followed suit and it became pretty much the industry standard.

But it was E-xpensive and I mean with a capital E. And it was VERY hard to sell. I later went and worked for PwC and sold exactly that so I know how expensive it was. My mentor and I spent 1997-2001 trying to figure out if there was some way to implement the archive using a dimensional model. We could not figure it out and we thought it was impossible.

In February 2001 I relocated to Ireland to implement the Sybase IWS Data Models being sold like hot cakes by Sean Kelly who was VP for BI in Sybase EMEA. I had to go do the three day class. This guy walks into the classroom to teach the class and introduces himself. I had never heard his name before so I didn't know him.

Anyway, he is teaching the class in Brussels to an SI with about 30 people in the class and I am one of the "independent" guys in the class. He comes to this slide where the table is called a "profile" table. The profile tables were used to do things like profile customer demographics over time. It was an archive implemented in a dimensional model. Because I had just spent the best part of 5 years trying to figure out how to do this I knew what I was looking at. I put up my hand and asked the presenter "Who invented this?" And he said "I did". And I knew I was in the presence of the world's number one data modeler.

That guy and I became good friends. As it happened he lived just around the corner from where I was able to get a rental house in Dublin and we used to have him and his wife and children over for dinners.

Then in March 2002 I was doing a job for North Jersey Media Group in New Jersey and this guy, let's call him Fred, was back in the US with Sybase and he was bored. He knew the project that I was working on was critical for Sybase so he called me and asked me if he could come and help out for a few weeks. I was delighted and said sure.

So Fred comes over and we have our own conference room to work in. We have data model diagrams plastered all over the walls as was our custom. You just couldn't keep these models in your head. Anyway, as these things happen we get kicked out of the conference room and asked to work in an open plan office. So we put up all our data model diagrams on the walls in the open plan office. Some women complained that they didn't like "men standing behind them talking" and so we were told we had to take down all our data model diagrams.

Fred and I went to dinner that night and he is pissed off. He was nearing 60 in 2002 and he was saying: "Who the F do these people think they are saying we can't even have a data model diagram on a wall to help us deliver THEIR project. I am nearly 60 years old, I can't remember these things. My memory is not like it was 20 years ago."

He was really angry and I was sympathetic and calming him down. And then he said:

"Do you know what we need Peter? We need a way of designing data models so us old guys don't have to remember them. That would be great for us old guys. Data models that for some reason, somehow, you don't have to remember the relationships between the tables."

And in the middle of that night while I was sleeping my brain figured out the answer. It was to put an integer key on the profile table and link it to all the transactions it was related to. The next morning at breakfast I told Fred about my idea. He thought about it for about 5 seconds and said: "Yes, that will work, well done young man".

Coming from Fred this was like GOD Himself praising me. We went to the office and we tried it out and it worked fine. What this allowed us to do at Sybase was build models that not only archived all data in the dimensional model but removed the need to ever be bothered to remember what tables were joined to what tables.
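For readers who want something concrete, here is a minimal sketch of one possible reading of the idea described above: a profile table that archives how a customer looked over time, with an integer key on each profile row that every related transaction carries. This is not the Sybase IWS design, which the comment says was a trade secret. The table names, column names (`customer_profile`, `sales_txn`, `profile_key`, `valid_from`, `valid_to`) and the sample rows are all hypothetical, made up for illustration.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_profile (          -- hypothetical "profile" table
    profile_key INTEGER PRIMARY KEY,     -- integer key, one per profile version
    customer_id TEXT NOT NULL,
    segment     TEXT NOT NULL,           -- the attribute being archived over time
    valid_from  TEXT NOT NULL,           -- ISO dates, so text comparison sorts correctly
    valid_to    TEXT NOT NULL
);
CREATE TABLE sales_txn (                 -- hypothetical transaction table
    txn_id      INTEGER PRIMARY KEY,
    customer_id TEXT NOT NULL,
    txn_date    TEXT NOT NULL,
    amount      REAL NOT NULL,
    profile_key INTEGER REFERENCES customer_profile(profile_key)
);
""")

# Two archived versions of the same customer (made-up data).
conn.executemany(
    "INSERT INTO customer_profile VALUES (?, ?, ?, ?, ?)",
    [
        (1, "C1", "student", "2000-01-01", "2001-06-30"),
        (2, "C1", "professional", "2001-07-01", "9999-12-31"),
    ],
)

def load_txn(conn, txn_id, customer_id, txn_date, amount):
    """Insert a transaction, stamping it with the profile row valid on txn_date."""
    row = conn.execute(
        "SELECT profile_key FROM customer_profile "
        "WHERE customer_id = ? AND ? BETWEEN valid_from AND valid_to",
        (customer_id, txn_date),
    ).fetchone()
    profile_key = row[0] if row else None
    conn.execute(
        "INSERT INTO sales_txn VALUES (?, ?, ?, ?, ?)",
        (txn_id, customer_id, txn_date, amount, profile_key),
    )

load_txn(conn, 100, "C1", "2001-03-15", 20.0)
load_txn(conn, 101, "C1", "2002-02-01", 75.0)

# At query time the join is a plain integer equality; nobody has to
# remember the date-range logic that relates the two tables.
totals = dict(conn.execute(
    "SELECT p.segment, SUM(t.amount) "
    "FROM sales_txn t JOIN customer_profile p ON p.profile_key = t.profile_key "
    "GROUP BY p.segment"
).fetchall())
print(totals)
```

The point of the sketch is only that the date-range lookup happens once, at load time, and every later query joins on the integer key. Whether the real models resolved the key this way is an assumption.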

In the early days we called this "everything is linked to everything". Then in 2006 we changed the name to "The Mesh" meaning that our models were a "mesh" and that if any two tables should be joined they would be joined. This was kept a trade secret in Sybase. I told Bill and Ralph about it in 2003 under NDAs and they were both very impressed. Sybase sold this new idea like hot cakes until SAP bought Sybase in 2006 and they dropped selling the models.

Sean Kelly contacted SAP/Sybase and asked if we could do a new version of the models given they had end of lifed the old versions. SAP said as long as we never competed with SAP we could do what we liked and we brought out a new telco set of models in 2007 and sold a copy to Carphone Warehouse in 2008 and a copy to Sky Talk in 2010. And in October 2010 I was doxxed for my political activism and so we could not sell any more data models.

But those two things, the merger of archives + dimensional models in 1996, and then the ability to do perfect archives in dimensional models using profiles and then adding the integer key to the profiles, were the two big changes. And since then? Nothing much has really changed. The next "big change" is Bill's Textual ETL, in which I am an absolutely firm believer. I have done the training and have offered to work in the area but my political activism precludes me from doing that.

The only other really big change is that I created ETL Software that can now map 15,000+ fields per work month up from the 1,000+ fields per work month that was the standard from 1996-2017. My free ETL software can map 6-8K fields per work month.

All this other stuff like "data pipelines" and "cloud databases"? None of it is really that far in front of what we were doing with Sybase IQ and the IWS Models in 2001. And you can notice the distinct lack of anyone in the "data area" talking about business benefits in terms of return on investment. This was something that Ralph and Bill drilled into the heads of anyone who would listen to them. "Always talk about return on investment when talking about a data warehouse, don't talk about technology, talk about business." Few people listened to their advice. Sad to say.
