In conversation with Marta @ Ververica...

By Nicholas Hemley | January 5, 2021

Transcript

Nic Hemley
Hi, it’s Nic here from Bristech, and I’m joined by Marta Paes, developer advocate at Ververica, who I believe are the original creators of Apache Flink, or at least major contributors to it, but maybe you can put us straight on that in a second, Marta. So my first question for you: had you heard of Apache Flink before you actually joined Ververica?

Marta Paes
Yeah. Hi, Nic. Thanks for having me. So, yeah, I had, before joining Ververica, and you’re right when you said that they’re the original creators of Flink. I’ll talk a little bit more about that. In my previous job, I was working as a data warehouse engineer, so I was very deep in the batch world. There was another team within that organisation that was using Flink, and we started consuming a service from them. I saw that this real-time processing, in comparison to the work I was used to, solved some of the problems I was dealing with in my day job, so I started getting very interested in real-time processing, and in Flink in particular. I ended up striking up conversations with some people in that team, and they were always very helpful in explaining to me how the innards, or the internals, of Flink work. So, yeah, my interest basically came from there: I was just consuming a Flink service and I got very interested in it.

Nic Hemley
So it sounds like you’ve been on a bit of a journey, then: you were deeply involved in the batch processing world and then moved more into real time and streaming. How have you found that transition? Has it been an interesting or a difficult one?

Marta Paes
Yeah, I think, and I’ll be speaking at your meetup about exactly this, that a lot of people downplay a little bit the jump you have to make when you have a “batch brain” and you suddenly want to move to real time, or to stream processing, because there are a lot of things that are different, right? Before, I wasn’t even very versed in distributed systems, because I was always working against, you know, bare metal, single-server kinds of things. Suddenly, at least in my case, I had to adapt to this new way of thinking about data as something that is flowing all the time, instead of something that is just sitting there in a database, where you write your queries, you schedule some jobs, and from time to time you just grab the results. And when something fails, you just re-run everything and wait, I don’t know, 12 hours for the whole thing to be back in place. I think it’s a big jump mentally. And, of course, also technically, right? But I think the technical side is not the hardest part to adapt to; it’s really making your brain understand different things. Suddenly the way you do fault tolerance is completely different, or you have a lot of different notions of time that you have to think about.

Nic Hemley
You call this the streaming mindset, and that’s what you’re going to be touching on in your talk on Thursday. It sounds like it’s quite a personally resonant topic for you.

Marta Paes
Yeah, definitely, because, like I said, I never went through the phase of using Hadoop or Spark or any of that. I jumped straight from Oracle and some orchestration or ETL frameworks to Flink, and Flink is also a very complex stream processing system, so there are a lot of moving parts, a lot of things you have to take in. For me, it was a completely new world. And even today, I’m still learning a lot of things, and my brain still gravitates towards the things I was used to in the batch world.

Nic Hemley
Okay. I mean, one of the things is that the terminology is pretty different in the streaming world, isn’t it? You’re talking about data pipelines or windowing, where the terminology is markedly different from the batch world. Would you say one of the challenges is actually getting a grip on the terminology and understanding why these concepts are important?

Marta Paes
Yeah, definitely. But I think the hardest thing is less the terminology and more the things you didn’t necessarily have to think about before. When you do batch, you don’t really need to think about late events, or events that arrive out of order, or anything like that. It’s all these intricacies that you suddenly have to think about, which were kind of secondary before just because they didn’t affect anything you were doing. It’s mostly these little changes that, in the end, are very important in streaming. In batch, everything was a bit more black and white, or a bit simpler, I think.

Nic Hemley
You mentioned error handling there, and that’s going to be very different. And maybe also latency: if it’s going to be real time or near real time, you’ve got latency issues that maybe don’t matter in batch, because there you’re expecting the results in 12 to 15 hours’ time. But those are all what people might describe as non-functional parameters. So, as a developer advocate for Ververica, are you involved in the use cases of how people are going to use this technology, or in showcasing the art of the possible with it?

Marta Paes
Usually, I think the best people to talk about use cases are always users. I wouldn’t call it a limitation, but it’s, well, a thing. I don’t know what to say here. In reality, I never really went through the hardships of streaming in production. I mean, I went through the hardships of using batch processing in a real work environment, doing stuff that has consequences in the real world, but as a developer advocate I have this playground where I can just, you know, break things and experiment with things. I don’t really experience any of the consequences of doing streaming wrong. I don’t have anyone paging me at 3am to fix a pipeline.

Nic Hemley
You’re off the hook there.

Marta Paes
So I always think that users are the ones to ask, and I myself watch a lot of talks from Flink users so that I can at least relate to the problems they have. I think that’s more my role: to relate to the problems people have with Flink and bring that back to the engineering team and the product team, and not so much showing use cases. What I like to showcase, for example, is how you can use Flink with other technologies. That’s something I really enjoy. When I see something interesting that I especially relate to, like Debezium (https://debezium.io/) or Apache Pinot, I always just want to jump in and see: okay, how can you use this with Flink? And how can I show people how they can use this with Flink? So it’s…

Nic Hemley
That’s an important point, though, isn’t it? Because Flink is within the Apache stable, it’s open source, and therefore it exists within that open source ecosystem. So presumably you’re forever interacting with all of the other technologies that Flink integrates with?

Marta Paes
Yeah, exactly. And that’s one thing that definitely keeps things interesting for me, because Flink has a big ecosystem around it, and you can not only interact with other communities but also collaborate with them. I mean, there are always two sides to it, right? If you want to ensure that something integrates with Flink, you need something from the engineering side. But once that is done, I like to jump in, showcase how people can actually do it, and try it myself, kind of dogfooding all this stuff. I think my job revolves mostly around listening to users, trying things, and circling feedback back to the engineering team, or to whoever in the community is working on it.

Nic Hemley
Yeah, I see. So you’re operating like a bridge between the users and the developers?

Marta Paes
Yeah, that too. One of the things I also need to do is make sense of the engineering efforts, right? So that when I talk about Flink, people can understand what these efforts translate to, and not just as a technical description: not “Oh, there’s this feature”, but “What can you use this feature for?” I think that sometimes gets lost in translation with the pace of the project, because, you know, there are around 300 people working on Flink every day. It’s a pretty crazy number of contributors.

Nic Hemley
How does that compare to other open source projects? Is that particularly busy in terms of contributions?

Marta Paes
I mean, even for Flink, it’s busy. If you compare it to two or three years ago, it’s probably more than three times the number of people who used to work on it. And, like I said before, there are a lot of separate efforts within Flink, because you have all the APIs, and you also have the sources and sinks, and what you were saying about the ecosystem. So I try to glue everything together and give some sense of how the whole thing comes together.

Nic Hemley
And how have you approached the talk on Thursday? Are you going to give an introduction to Flink in that context and do a demo? How are you approaching the talk you’re going to be giving?

Marta Paes
Yeah, I think it’s too early to say. [I put you on the spot there.] Yeah, I’m still mulling it over in my head. I’m still trying to work out the right way to approach it, also because I don’t want to make it super long, or just, oh, here’s me talking for 30 minutes, right? I’d like to make it... yeah, I’m not sure yet.

Nic Hemley
We’ve got a couple of other people joining us on the panel after the talk, and it’ll be very interesting to see what directions they want to take it in. I’d certainly be keen to explore some of the open source ecosystem. But the pace of change in technology is also accelerating. You sort of alluded to it there within the context of Flink: are there any particular trends in the industry that you’re interested in tracking currently, or are you still getting to grips with Flink itself?

Marta Paes
Not really. I mean, I try to keep an eye on the industry as a whole, but someone told me recently, and it’s true, that I’m usually not attracted to the shiny AI/ML things; I’m attracted to the boring stuff. The things that always make me more excited are things I can relate to. So, like, the whole change data capture thing, like Debezium, which I mentioned before, or, you know, real-time OLAP: things that I think would have made my work so much easier and so much better when I used to work in this space. I mean, I still work in the space, but, you know, when I call it the real world, you get it. And, also because of that, I’m always really interested in things like data governance, data quality, data lineage, and data discovery. It’s not that these things got lost, right? But between batch and streaming, or at least from my past experience, it feels like with streaming and real time these things are sometimes pushed into the background. And I think that’s why, now that the whole streaming thing has settled down a little bit, you’re seeing all these projects that cover everything else, like data quality issues. Because, sure, you can do online model training now, but if your data sucks, your results will suck, and your predictions will suck. So I think we’re now entering the phase of developing tools that actually focus on the data, or on the metadata, and not so much on the infrastructure. For me, that’s really interesting, because I was always very connected to the meaning of the data, not so much to just moving it from A to B. [I see.] So, yes, I like the boring stuff. A lot. I’m very interested in everything that is coming up there.

Nic Hemley
Well, that sounds like it’s straying into more of the DataOps, data science, or data governance realm, which is super important to an organisation, because they’ve got to be data-centric and not just data-driven. Data has got to be their lifeblood, and I think that’s recognised by many organisations across the planet. So, just to draw things to a close today, I’m interested in what you would say to your younger self. The Marta who was starting out as a newbie in the industry back in the day: what would you say to her if you had the chance?

Marta Paes
Wow. That’s a deeper question than I was expecting. [Laughter] But maybe one thing I would have said to myself is to get involved in open source earlier. Yeah, I think that could be it…

Nic Hemley
Interesting.

Marta Paes
Because it really is a very different feeling from working in a corporation, or just working in a team and using proprietary stuff. In open source, I also found a different way of working: a more collaborative way, and more of a safe space to experiment. Because I found that, yeah, not everyone out there wants you to fail. There are actually a lot of people who are willing to help you evolve or try new things. And so it doesn’t feel wrong to fail, I think.

Nic Hemley
Do you think that open source attracts a different kind of person? Is there a sort of personality type that fits open source, do you think?

Marta Paes
Hmm. Maybe I’m used to the values of the Flink community, but at least in the Flink community, and even within my company, because the biggest focus of my company is still working on open source Flink, there are all these guys who kicked it off, like Stephan from my company. He’s basically the creator of Flink, because the whole thing started with his PhD thesis. And, you know, he created this thing that is used nowadays by all the big tech companies in the world, and it’s constantly evolving; there are 300-something people working on it on a daily basis. He could feel entitled to... I don’t know, but he’s a guy who will sit with me and go through something I don’t understand about Flink, for example, or he will proactively reach out to me to explain something that is being worked on. I think it’s a very different mindset from what I was used to before, especially because I started in consulting, right? There it’s kind of a sink-or-swim situation. So it’s very different, I think.

Nic Hemley
It sounds like a really nice company to work for. That type of model, where you build a commercial business off the back of an open source product, is what you might call the Red Hat model, and it’s quite a well-trodden path nowadays, so they’re not the first to try it. But it sounds like they’ve hit on something here.

Marta Paes
Yeah. And, at least for me, it’s also very liberating, because I can focus on the open source and I don’t necessarily have to sell something. I think our product is also very conscious of the open source project, so that there isn’t any conflict of interest or that kind of thing. It’s a very wholesome, non-evil corporate experience, I think.

Nic Hemley
I hear you loud and clear. I think that’s a really important point. I’ve really enjoyed the conversation, but let’s close it there. We’re going to hear from you on Thursday, and I’m really looking forward to that talk. So thank you for your time today.

Marta Paes
Thanks, Nic.

Nic Hemley
Great. Thanks, Marta.