Stijn Christiaens is the Co-Founder and Chief Data Citizen of Collibra. The company he co-founded has established itself as one of the leading data management providers in the world. He has seen the company evolve from different angles: as the COO, then the CTO, and now the Chief Data Citizen.
But what exactly is a Chief Data Citizen? Sven Hermans sat with him to unpack it all.
Get comfy and listen to this new issue of the Data 121 Podcast where Stijn talks about his career, his views on the present and the future of the data management space, and why your company should – almost literally – draw a map of a castle.
Hi, everyone and welcome to our Data 121 Podcast. My name is Sven Hermans and I will be your host for today’s episode. We are joined today by Stijn Christiaens who is the Co-Founder and Chief Data Citizen at Collibra.
That’s right.
Thank you for joining us!
My pleasure. Thanks for having me over.
How are you doing today?
Ça va! It’s the first time in a long while that I’ve come out to have business travel again. So I’m a little bit excited and a little bit anxious, but really happy to be here.
Could you maybe start by introducing yourself?
Yeah. So, Stijn from Collibra, one of the Co-Founders of the company. It’s been 13 years since we started the company in 2008.
Over those 13 years, I really had a variety of roles. I’ve been Chief Operating Officer, where I was responsible for pre-sales, post-sales and partnerships. I’ve seen a lot of PoCs, pilots and implementations, and worked with a lot of our partners.
In 2014, I moved to New York where I became the Chief Technology Officer responsible for product, the roadmap, UX, research and education, marketplace and technology partnerships. Our company has grown consistently over 13 years, which means that you as an individual also have to grow a lot. For the last few years, I’ve been running our own data office as what we call our Chief Data Citizen.
Chief Data Citizen? That’s a word you typically use at Collibra, right?
Yeah, we like to talk about the Data Citizen.
Can you explain what the role entails?
Of course. So like I said earlier, our company has grown to a size where systems and processes become really, really important, just as they are for a lot of our large customers, even though we’re still a little bit smaller.
So we said, “Okay, it’s time for us now to put our money where our mouth is, eat our own dog food, drink our own champagne” whatever you want to call it, and we set up our own data office, which is one of the main places where we collaborate with our customers as well.
One reason we did that is to work on those systems and processes. But we also did it because at Collibra, we believe that today everybody is what we call a Data Citizen: everybody who uses data to do their job.
Today, we think everyone in an organisation somehow, somewhere, touches on data. This is really important to us. At Collibra everyone is a Data Citizen too, so let’s run our data office and let’s make the title the Chief Data Citizen, as opposed to what most people know as the Chief Data Officer. So I’m essentially leading data at Collibra.
I also have a little bit of an evangelism role. Everything we do in the data office, we’re very hands on. There are a lot of things we learn while we use our own products at Collibra. We like to give those learnings back to the markets, our customers and partners. That’s why the role has a little bit of an evangelism aspect to it and why we call it the Chief Data Citizen.
Yeah, so it’s not only internal but also external.
That’s right.
So at Collibra, you help other companies to manage their data. You have several products and services such as Data Catalog, Data Governance, Data Lineage. What would you say is the mission of Collibra?
We do indeed have a whole bunch of products at the moment, but it’s a good thing you’re asking about the mission because that’s really where we come from and what we’re going to try and achieve or change in the world if you will.
When we started the company at that moment in time, we already felt that data was important. But if you look at how organisations deal with data, it feels like there must be a better way, right? There must be a better way of doing this. For us, in our space, everybody talks a lot about data being the new oil, the new soil, the new gold, whatever. But we like to think of data as water. You want it clean, trustworthy like you’re drinking from the tap, but also reusable in different ways, as opposed to oil, which you burn and it’s gone.
For us, data will become an essential asset for every organisation in the world and they need a better way of managing it. So what we want to do is make sure that all the Data Citizens in your organisation, everyone who uses data to do their job, can easily connect the right data to the right algorithms, models, insights, and ultimately, very importantly, the business outcomes. So we want to make all the organisations data intelligent organisations, because if you look inside a company and at the data assets, it really only works “as a team sport”. So we’re all united by data in getting to those business outcomes.
You’ve worked with a lot of different organisations, you’ve seen many data related projects. What kind of need do you see in the market?
I think there are a number of things that I’m seeing as a need in the market.
One is: a lot of organisations talk about data as a strategic asset, but I feel that they’re not always properly organising or executing for that. One thing that’s missing, that I see a lot in the market, is a business infrastructure for data, which means that business people are really enabled to do whatever they need to do with data. That means they need to have certain tools and technologies, of course, but they also need certain roles and responsibilities so that they know what they can, but also should not do with data. They need certain organisational structures and processes in the company. For example, if you say “I want to get data that exists in our organisation”, how do you get it? What is the process for you to get it? Who do you talk to? So, that is one: the business infrastructure around the data assets that is still missing in most companies.
Second is: the executive understanding of the data assets. It’s already changed over the last 13 years, but in a lot of executive education, MBAs and so on, how much do they really teach about data? A lot of the C-level executives don’t really understand the data asset just yet. They need to.
What else is missing? Well, what it is that Collibra is offering of course. A Data Intelligence Platform if you will, a system of record for them to add, scale, manage data assets, with a lot of the components in there. People need to do a lot more about privacy obviously, whether that’s for regulatory reasons or for trust with their customers.
Quality is becoming increasingly important. AI eats data to work and so if the data is bad, the outcome is also bad.
In summary: that business infrastructure, the executive understanding, the system of record for the data assets, and then it breaks down into some other, more pointed needs.
You touched upon the different needs. But how does Collibra fill that need?
Well, first of all, I think it’s because of our mission. We’re trying to figure out all the things that are relevant to that mission to make everybody united by data, which breaks down in a few components as I said.
Actually, the best way that I found over these 13 years of explaining in a very short time, like in 30 seconds, what we do and the problem that we solve, is by playing a little betting game between us if that’s okay. It’s 30 seconds and the clock already started ticking by the way.
I will bet you 100 euros that it’s easier for you to buy a book online than it is to find your own data in your own organisation. Right?
That’s… a bet I cannot win I think.
Exactly!
Well, typically, that’s the outcome. Most people do understand: if they need to get access to data, they start thinking “how would I even begin?” Whereas buying a book online, you just go to some catalog and you buy it. So, one part of what Collibra solves is: we offer organisations that sort of catalog for their data assets. Now, you have a place where you can actually go search for the data that’s available inside the company.
But the second part to that is: let’s say the book never arrives. You bought it but it doesn’t arrive, it’s damaged, or you cannot reach the seller. You’re not going to use that service more than a few times because you don’t trust it, right? The same thing is true for that catalog of data inside your organisation. If people don’t trust it, if there is no cornerstone of trust, like “who is the owner for this data? How can I get access to it? Is it the right data?” If they don’t have that cornerstone of governance that brings trust, then the catalog initiative is typically not going to be sustainable.
So in the 30 seconds – I didn’t count – you have one piece of what Collibra solves: the catalog. The other pieces are: the governance portion, which is around roles, responsibilities and processes. And then we have a few other pieces, like lineage (“where did my data come from?”) and privacy: for example, the process registry, or which business processes are using data and on what legal basis, etc. So there are a few components to what we’re solving.
I can imagine that if you have all those tools managing the data of organisations, that data can be in different places, right? A few years ago, maybe a decade ago, I think organisations were still setting up their own clusters, managing those data lakes and having Hadoop clusters and everything. Today, more and more companies are moving their data to public clouds like Google Cloud, AWS, Azure. What do you think of this new cloud era?
I think it’s fascinating. For the last 13 years, let’s say decade, we’ve seen the data landscape change a lot.
Take the data warehouses in companies for example. You have existing data warehouses, they have existing ETL technology. But even in the past decade there’s already been disruption there, right?
So the data warehouses and the data infrastructure went through that whole Big Data phase, where people were talking about elephants in Hadoop, with Cloudera and Hortonworks as big players in there, and indeed, they were then setting up their own clusters and hiring network engineers to do that for them. But then, only a few years later, all the questions that I got, like “Can Collibra integrate with my Big Data cluster x, y, z?”, all of a sudden started shifting significantly to the cloud. I saw it first in the US, but now also in Europe. It was no longer “Can it work with Hortonworks?” but rather “Can it work with BigQuery?”, for example.
I started looking at this cloud thing and how it was affecting our data management space, and I realized that, from my viewpoint, the data management space is currently being commoditized at breakneck speed by the cloud players. Why do I say that? Because the traditional data management vendors, you know, they’re great players with great revenue and great businesses. But at the same time, they’re small compared to the software titans such as Microsoft, Google and AWS, which are orders of magnitude bigger and fighting a battle over that market. So at a minimum, for the next three to five years, I see that trend of moving to the cloud continuing, and probably a lot longer, because, if I’m not mistaken, those three big vendors combined did 100 billion in cloud revenue last year and are still growing at more than 40% year over year.
You can easily see how the trend is going to continue for a number of years. You know, organisations don’t set up their own water plants. They don’t set up their own electricity plants, apart from maybe a backup generator or so. Ultimately, these things are being commoditized. Similarly with data centers: they’ll have all of their storage and compute running in the cloud as a service.
Still, we clearly see that shift, but I don’t think that all organisations are convinced. Some still have concerns about their data being managed by a third party. What do you think of those concerns?
I understand the concerns. I think to a degree they’re valid, because there are some challenges. We’ll talk about that in a minute. But at the same time, I’m thinking, well, let’s look at systems of record for a bit: CRM and ERP, like a Salesforce, NetSuite, Workday, etc. They’ve been running in the cloud since the 2000s, right? Salesforce, I think, started in ’98 or ’99, something like that. So organisations have already been running a lot of their important processes, and the corresponding data, in the cloud for roughly two decades. Whereas the whole “hey, we need to put our data into the cloud” has been happening for the last 7 years or so. And for a lot of people, it’s like “Oh, is this the right thing to do, is it going to be safe?” Well, your data is already in the cloud, right? It’s already in the CRM and ERP. So why the extra concern now?
But I get it, because the challenge with the cloud is that, as an infrastructural component, it’s completely new for a lot of these companies. A lot of the technologies and services are also new. So maybe they don’t have the resources that are skilled up. And maybe they’re a little bit afraid. “Okay, if I’m going to put this data into the cloud, even though the cloud provider is fully secure, what if I do a wrong configuration and accidentally expose my bucket to the internet and open myself up to a certain level of risk of data breaches?”
So I understand there’s some hesitation because they need to skill up and tool up and make that move into the cloud. But as organisations acquire those skills, I’m sure the concerns will dissipate.
In the past years, there have been increasing regulations, especially GDPR in Europe. I don’t think that it’s always easy for big companies to manage those and to stay compliant.
No, not least because of the sheer number of them. For example, 2016 brought GDPR but in the meantime, at Collibra, we’re tracking over 150 privacy regulations, which are – and I’m simplifying – like GDPR spin-offs that every country or region has set up. For example, in the US, many states have their own privacy regulations.
So if you’re a global business, that in itself is quite a challenge to deal with. And we’re not even talking about regulations surrounding data sovereignty which have come up because certain countries and regions decided that they want to have the data on their territory to protect it, in a way.
My advice for regulation is, first of all, don’t panic. The regulations, especially around privacy, are meant to protect the consumer, the individual, against the abuses which are really easy to do when you have just a little bit of data. There’s so many things you can do wrong for those consumers. So putting some protection in place is a good thing because in this case, it’s going to increase the level of trust with your customers or the people you serve. That’s very important, right?
Like, if you’re dealing with a bank, you want to trust the bank. If the bank says “Okay, I’m going to abuse your data” do you feel comfortable leaving your money with them, for example? So that trusted relationship is important and is going to increase in importance over the next couple of years for sure. But the approach I would take is to say “Okay, the regulations, we need to do it.” So it’s creating certain budgets internally. And then you can use those budgets to do foundational things, which you have to do anyway.
For example, one thing that GDPR requires you to do is to have a process registry. “Which business processes do I have with which I create value for the customer, and which data is that business process consuming or using and for what purpose? And is there a legal basis? Or is there agreement with the customer that the data can be used for that purpose?” So having that in itself is good because as a business owner, it gives you the ability to say “Okay, what if we change our business process? How do we have to change our data flows, for example, which systems do we have to touch?”
But also, from a protection point of view, it gives you a map of your castle, right? Imagine the data castle. Imagine you have a castle you need to protect from people who want to steal your gold, your data. Without a map, you don’t know where the weak spots are, or where the most value sits and where you have to put the relevant protections in place. In that sense, what GDPR requires you to do is to put such a map in place, which is convenient for making certain business decisions, but also convenient for putting certain protections in place. So don’t panic and use the regulatory aspect to put these foundational aspects in place, which will help you not just with one regulation, but with multiple.
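To make the idea of a process registry concrete, here is a minimal sketch in Python. It is only an illustration of the questions raised above (which process uses which data, for what purpose, on what legal basis, in which systems). Every name in it (`ProcessEntry`, `REGISTRY`, the example processes and systems) is hypothetical and does not reflect Collibra's product, its API, or any wording required by GDPR.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ProcessEntry:
    """One row of a hypothetical process registry (illustrative only)."""
    process: str                 # business process that creates value for the customer
    data_used: Tuple[str, ...]   # data items the process consumes
    purpose: str                 # why the data is used
    legal_basis: Optional[str]   # e.g. "consent" or "contract"; None if not yet established
    systems: Tuple[str, ...]     # systems the data flows through


# Made-up example entries, not real data.
REGISTRY = [
    ProcessEntry("newsletter", ("email", "name"), "marketing", "consent", ("crm",)),
    ProcessEntry("invoicing", ("name", "address", "iban"), "billing", "contract", ("erp", "crm")),
    ProcessEntry("churn_scoring", ("email", "purchase_history"), "analytics", None, ("warehouse",)),
]


def missing_legal_basis(registry: List[ProcessEntry]) -> List[str]:
    """Processes that use data without a recorded legal basis."""
    return [entry.process for entry in registry if entry.legal_basis is None]


def processes_using(registry: List[ProcessEntry], data_item: str) -> List[str]:
    """Which processes consume a given data item (where does this 'gold' sit?)."""
    return [entry.process for entry in registry if data_item in entry.data_used]


def systems_touched(registry: List[ProcessEntry], process: str) -> List[str]:
    """Systems to review if a given business process changes."""
    for entry in registry:
        if entry.process == process:
            return sorted(entry.systems)
    return []
```

With a registry like this, `missing_legal_basis(REGISTRY)` points at the weak spots on the map, and `systems_touched(REGISTRY, "invoicing")` answers the "which systems do we have to touch?" question for a process change.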
I like the comparison with the castle and a map. But I can imagine that if you’re managing a small castle, it can be easy to find which data is where and how to protect it. But what if you have a big castle? How can you make sure that everyone finds what they need? And then, like you said, also protect it?
Well, if you have a big castle… let’s say you’re some large multinational, maybe you even have multiple castles, right? So the bigger it gets, the bigger the map should also get to a certain degree. And when you have to make a map that big, you have to change the way you do it.
For example, one thing that becomes really important is establishing a culture of data. Are your managers or executives asking for data to support decisions? Are they showing the example? Think about it a little bit. In a way, you have to make everyone in that large castle, or collection of castles, think about the data so that they can locally understand and compile their little piece of the map.
Let’s compare this with another asset in a company, something that every company knows: the money asset. If I’m a Chief Financial Officer, then my job is not just to manage that money properly, but also to make sure that everybody across all my castles thinks about money in a responsible way. So I’m going to do that by leading by example but also by setting up responsibilities and processes, like budgeting and expense reports, etc. It’s the same for data: I’m making everyone in the organisation think about the data assets. Thus I’m making everybody a participant or a contributor, or at a minimum a consumer of those maps that protect the castle.
We’ve been talking a lot about today: the challenges we have and the fact that it’s growing fast. What type of challenges do you think we will face in the future?
I think the regulatory aspects will definitely continue. It’s going to remain a challenge for organisations to take that into account. Whether it’s data protection, data privacy or sovereignty, those will continue. They need to stay in focus.
I think that trusted relationship with the consumer is also going to be important. And I think about what a lot of organisations are attempting right now, for example AI and machine learning initiatives. A lot of organisations are currently testing them out, you know, they’re in prototype and pilot phase, and they’re trying to get to production stages. That’ll take a few years.
But think about the challenge for that. You don’t just have people looking at a dashboard and making one decision per minute, or one decision per hour or day. The frequency is very low. When you have AI making decisions and automatically performing actions, maybe for the customer, you’re making 1000 decisions per minute. So if you’re making a decision as a human, you have the ability to shoot yourself in the foot once a minute, but as a machine learning algorithm, you have the ability to have a machine gun shoot you in the foot 1000 times per minute. So in that sense, I think the importance of quality is also going to massively increase for companies. AI is the beast that feeds on data and if it has bad data quality, it has a bad diet, and then it’s garbage in, garbage out.
Regulation, privacy, trust with the customer, and definitely quality are going to be upcoming challenges for companies.
If I’m a new company only starting my data analytics journey, what type of advice would you give me?
First of all, I would stress the data strategy as part of the business strategy.
Oftentimes, I see that data people can love data so much that they say “Okay, I’m doing data management for data management’s sake.” So you always have to think about “what is my data strategy? And how does it fit into the business strategy?” Essentially, how are you going to deliver those business outcomes, whether it’s making more money or saving more costs, or being more compliant or completing the mission, you know, fill in the blanks.
Next to that, the buy-in. You need the buy-in of the relevant executives because, once again, data is a team sport. Go out to your business stakeholders and collect the use cases they have around their business outcomes, where data could play a role. Map out those use cases and put them in a value matrix, if you will. Then you can use those to get that initial buy-in, and use the success of those first low-hanging-fruit, high-value use cases to win hearts and minds and really start building out that data culture, which will take a few years.
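One way to picture such a value matrix is a simple value-versus-effort grid. The sketch below is an assumption about what the matrix could look like, not a method described in the interview: the two axes, the 1 to 5 scale, the threshold, the quadrant labels and the example use cases are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UseCase:
    """A business use case scored by its stakeholders (hypothetical scale)."""
    name: str
    value: int   # expected business value, 1 (low) to 5 (high)
    effort: int  # expected effort to deliver, 1 (low) to 5 (high)


def quadrant(use_case: UseCase, threshold: int = 3) -> str:
    """Place a use case in one of four cells of the value/effort matrix."""
    high_value = use_case.value >= threshold
    low_effort = use_case.effort < threshold
    if high_value and low_effort:
        return "quick win"        # the low-hanging, high-value fruit
    if high_value:
        return "strategic bet"
    if low_effort:
        return "fill-in"
    return "deprioritise"


def first_candidates(use_cases: List[UseCase]) -> List[UseCase]:
    """Quick wins, highest value first, then lowest effort."""
    wins = [uc for uc in use_cases if quadrant(uc) == "quick win"]
    return sorted(wins, key=lambda uc: (-uc.value, uc.effort))


# Made-up example use cases, not real data.
USE_CASES = [
    UseCase("customer 360 report", value=5, effort=4),
    UseCase("sales dashboard cleanup", value=4, effort=2),
    UseCase("duplicate contact merge", value=3, effort=1),
    UseCase("archive tagging", value=1, effort=2),
]
```

Calling `first_candidates(USE_CASES)` returns the use cases to start with when building that initial buy-in; the scores themselves would come from the conversations with business stakeholders.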
It’s really about connecting business with the data team and then creating a culture. Do you also have a list of common mistakes that you see at companies that are just starting?
I think there are a lot of mistakes, and the most common ones are maybe the biggest ones.
I think a lot of organisations do a little bit of “Look at my hands, we’re not ready for things yet, we’re not mature enough yet.” They just don’t start. Not starting is in itself a mistake. Because you’re also making a decision to do nothing, which means your competitors can gain an advantage over you, right?
Second is to look at it way too technically. You know, if you have a data warehouse “A” and it’s, for whatever reason, not working for you, and then you replace it with technology “B”, but as an organisation and as people you’re doing exactly the same thing, just with a different technology, then you’re probably not going to solve the problem either. It’s not always the technology. A lot of times, data is seen too much as a technology initiative. People forget the people and the process aspects of it.
Another big mistake is not getting high enough executive visibility and executive sponsorship. They’re missing a data strategy, or they have a data strategy that doesn’t fit into the business strategy. And then for people at the executive level, the C-level, it’s not a priority, because there’s no strategy that shows it’s a priority for them. So it doesn’t get wide enough; it becomes too limited an initiative.
Last question about Collibra. What’s still on your roadmap to keep bringing innovative ideas to the market? What kind of services do you still have in mind?
Well, there’s a whole bunch.
Remember, when we started Collibra 13 years ago, we created data governance as a technology category and became the leader in that space. Then we added different components: we added the catalog in 2014/15-ish. We added privacy and risk for the privacy and regulatory aspects. We added lineage. So we’ve already added a bunch of components into what we call our data intelligence cloud.
In the short term, what’s ahead for us is definitely the data quality aspect. We acquired a company just in February of this year (2021). So for us, integrating those people into our company and the technology into the data intelligence cloud is a big priority, because we see a lot of need for it in the market. We just talked about that. So data quality is going to be a big priority for us in the next couple of years as well.
And then next to that, to continue building out our data intelligence cloud. With Collibra, we believe there’s a lot of overlap on the data asset. We see people in the data lake space who set up a catalog, so they know what’s in their lake. And then in a completely separate part of the organisation, you have people in legal who are doing process registries, and then somewhere else again, in IT for example, they’re building that data castle map for protection reasons. But really, these different groups in the company are doing the same thing, right? They’re all making a data map of the data assets, to put it simply. And at Collibra, we believe that should be in one place. And those people: data architects, data engineers, privacy people, security people, analysts, Chief Data Officers, what have you, they’re really all working together. They’re all one team doing a team sport of the data asset. They’re all united by data. So in that sense, we believe it should be on one platform.
For Collibra, that’s a big priority to continue building out that one platform to continue providing that with the relevant enterprise features to always make it bigger and better. So that, yeah, the platform supports all of the personas that are touching and collaborating around the data asset in the organisation.
Thank you very much Stijn. You clearly showed us that Collibra is at the forefront of the data & analytics industry. You are one of the most knowledgeable people I’ve met in this area. We’ve talked about the present and the future. I’m convinced that our listeners will get a lot of insights out of this.
Excellent! Thank you for having me over, it was a pleasure talking. Everybody, have a nice day!
If you enjoyed this interview, consider subscribing to our Data 121 Podcast on Spotify or iTunes. We will release a new interview every two weeks with guest speakers from Carrefour Belgium, Collibra and Unifiedpost Group among others.