The Router

Working in Machine Learning with Samual MacDonald

UQ Computing Society Season 1 Episode 9

These days, there is much hype around data science and machine learning, but what's working in it actually like? Samual MacDonald, a machine learning researcher at Max Kelsen (a Brisbane-based AI/ML consultancy) gives insight into what a machine learning job actually looks like, and how he got there.

Liking The Router so far? Why not subscribe in your favourite podcast app? Check out https://router.uqcs.org/ for details.

Intro/Outro Music: Awesome Call by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3399-awesome-call
License: http://creativecommons.org/licenses/by/4.0/

Matt:

Welcome to The Router, the official podcast of the UQ Computing Society, where we explore the human side of tech. I'm your host, Matt, and today I'll be having a chat with Samual MacDonald from Max Kelsen, here to talk about working in machine learning and data science in Brisbane. Firstly, could you introduce yourself and your background?

Samual:

Okay. Um, my name is, uh, Samual MacDonald. Um, I'm a machine learning researcher, I suppose. Um, that title seems to always be changing, I think because of the sort of nature of the work. Um, and, um, my background in, I suppose the reverse order is, um, uh, being a, um, a researcher at Max Kelsen and then I'm a master of data science student, um, at UQ. And then before that I was, um, a, uh, uh, an environmental and engineering geophysicist, before that a geotechnical engineer, before that I was a uni bum. And, um, before that I was a house painter. And so that's my background, I suppose, back to eighteen.

Matt:

And, and I guess, um, so you said you spent, uh, I guess one or two years doing your master of data science.

Samual:

Yeah, I actually, um, I actually took, uh, three years to do that because, um, by the time I'd started, I was, uh, I was in, I think I was about 26. Um, and I, I, I think that's a mature age student and, um, I had a few life experiences that made me realize, I suppose, um, what employers are looking for. Um, I think if they look at your CV, they'll look at grades, of course, that's sort of like an easy filter to shortlist, but the final decision doesn't depend on grades. I think, um, they also look at how long you take to complete your degree. Um, they look at your experiences, um, and I think they use, uh, your experiences, thirdly, the experiences as, as the most important final factor that, uh, ends up deciding whether or not you get the work. So, um, with that in mind, like, um, thinking about uni as, um, optimizing three things: grades, speed, and experiences on top of, um, grades and, um, how fast you complete it. I decided to take it over three years, um, that would, um, allow me more time to do other work and, um, get, um, um, because I figured if I have three years, um, with, uh, all this, um, spare time to do all this extra work, I'd come out at the end with a lot more experience than the people I'm competing against. And so, um, yeah, I don't know if I'm answering a different question.

Matt:

No, that's good. Um, so I guess your university experience, you mentioned you had quite a bit of work and stuff during your, um, your university time? What was it like doing classes after working for so long?

Samual:

Uh, it was at that point, I'd worked for three years in a very, um, uh, uh, uh, competitive, um, product then. And so I was used to big hours, so my undergrad was tough, you know, doing engineering. My first degree was, um, uh, in civil engineering and, um, that was, I found, very tough compared to, um, high school. High school, I don't remember doing anything really, just floated on through and got, you know, a grade good enough to get into an engineering degree. I was very much focused on things other than work and career at that point in my life. Um, and, um, it was only through my first degree that I started to get career focused. Um, and so, yeah. Um, I think I found that that was really tough, but then when I went into the real world and started working, that was very, very challenging. Um, and I think the, like, you'll end up doing, you know, the odd week, like you're doing like up to 70 or maybe even 80 hours, um, uh, the odd shift where, um, I, you know, you would have to do 20 hour shifts or something. And, um, that, that was very intense. It wasn't always like that. It wasn't like, um, Apple back in the day or whatever, where they consistently do 75 hours. But, um, I think there's a lot of companies that do still do that. Um, and, uh, and I think, um, three years of experience of working really hard like that made doing a master's quite easy. I remember the, um, master of data science, even though I was learning a lot of new things about statistics and computer science and it was quite challenging, um, uh, I remember that being a breeze, uh, compared to, um, the industry. And so, um, uh, and so actually it was, it felt quite relaxing, um, working alone and, and studying a lot in comparison. And so, yeah, it was just relative, by contrast.

Matt:

Yeah. Yeah. Fair enough. I imagine even as a uni student, I've never done anything close to 75, 80 hour weeks before, so I guess.

Samual:

But that's different. You wouldn't be able to, 'cause it's a different thing. As a uni student, you're learning, and learning is such a different mode for the brain than, um, um, doing work that's not requiring, uh, as much learning. Anyway, the first year of work you'll find really odd. I think the first year of uni is really hard because of all the changes, but, um, uh, like I think the first year, uh, of any degree is the hardest for the person coming out of high school. Uh, on average, maybe.

Matt:

So, um, when you started working at, um, Max Kelsen, was there anything that your university experience, uh, didn't really prepare you for or that you weren't expecting?

Samual:

Um, I think, um, yeah, I think, uh, um, in, uh, the master of data science, I was in the first cohort for that. And I think the, the, um, the label data scientist, I think even still now isn't really, um, uh, consistently defined. Um, and so I think, um, that, that the master of data science did really, really well in preparing me for a lot of different things. And it had the flexibility for me to hone in on the particular areas that I was interested in. For example, I chose to focus more on statistics than, um, uh, database type subjects, but I did do a few database type subjects. I think I, I did, uh, um, uh, the, uh, data mining and a couple of subjects that, um, specifically focused on database principles. And then, um, there was a big data type analytics subject, and, um, I felt like those subjects, um, uh, didn't prepare students well for, uh, what I think is involved in, uh, working with big data. Um, I, that's not something I'm particularly focused on at work. There are people at Max Kelsen that, uh, do focus on data engineering and, and, MLOps, um, which is like taking everything to production. But I think that the UQ, um, uh, data science program, and I think the same can be said for computer science and, and, and software engineering more broadly, having looked at all the subjects, in my own opinion, from what I saw, they weren't up to date on what's being done now, but I, I do know that UQ are working with, um, uh, organizations such as, um, Amazon Web Services, AWS, and, um, uh, I think, um, uh, they're sort of, um, helping them catch up.

Matt:

Hmm. I see. And I guess, uh, do you know any good resources to fill the gaps for the things that you didn't pick up from university? Or was it sort of just like any area of work you didn't really need to?

Samual:

Uh, I think the only way is to learn on the job. Um, I think you can try and prepare yourself with terminology by getting a few certifications and stuff. I think Amazon is great for that. Um, and I think Google is great for that also. Um, so, uh, those certifications, um, uh, so, um, that's something that everyone is getting to put on their CV to make themselves more attractive. So certs, I think, uh, in and of themselves, um, probably aren't enough, um, to be a reliable data engineer or anything, um, uh, but, um, I think they will help and, um, being aware of, um, what are the common, uh, modern methods and, um, uh, and what is all the terminology. And, um, then I think it's really just about learning on the job and, um, the same, the other big thing I think that's missing from, uh, any degree, um, is just, uh, real world experiences. Like how do you interact with people? There's a big difference between doing a group project and actually working, um, in a company where there's the complex politics and client demands that are always changing. Um, and then, um, uh, you know, how do you interact with your supervisor? Uh, how do you know when to ask a question and when not to ask a question? That's a common thing I think people fail at. I think it's important to, if you have a problem, come up with a list of solutions that you think of, and, and, and come up with five different ways of answering the problem, even if they're all wrong, and then go to your manager or supervisor with that. That, rather than just a, a question, um, helps them, uh, prepare their mind to give you a better answer, but it also just shows that you're just doing your absolute best. Um, yep.

Matt:

I see. Um, so I guess as, as, as you started, like working in machine learning research or that sort of area, um, a lot of students choose to focus on, um, one or both of computer science or statistics in their undergraduate degree to try to like prepare themselves for that sort of work. Uh, what do you believe is more important between, um, between those two?

Samual:

Between statistics and computer science? Yeah, I think, um, computer science sort of subsumes statistics, that statistics is very much a part of it. And, um, and I think I'm very much like people always say like, and it just depends on what you define as machine learning and, um, statistics and, and, and computer science. And this is, and it also depends on where you're at and where you want to go. Um, for me, I, um, uh, and my, it was, I got very interested in the statistics because of my geophysics background sort of, um, gave me interest there. And I think civil engineering is more of a statistics focused, um, engineering branch compared to something like mechanical engineering where you can rely on first principles a bit more directly. Um, and, um, uh, so I think, um, I got very interested in statistics and then I found the Deep Learning book by Goodfellow and Yoshua Bengio. And, um, I read through that because I got absolutely obsessed with the concept of deep learning. I found it very interesting. And so that, I think, put me in a position that made me more attracted to the, the more, um, mathematical, um, perspectives compared to the more, um, you know, programmatic, um, perspectives. Um, I think if you, um, so that's like, um, my personal, like, um, set off of, um, I suppose, um, uh, I've just re-labeled as between a more mathematical approach and a more programmatic approach compared to a more statistical approach versus computer science. So that's like the terminology I'm sticking with to answer this question. It's a hard question to answer. And, um, uh, and I've also framed, uh, where, where, uh, I was positioned when I made the decision of which to, um, pursue, um, and, um, uh, then, uh, the only thing remaining is like, which to actually pursue. And I think ultimately everyone will have to focus on both. Um, and I think, um, and, and it depends on where you want to go. For me, I wanted to go into more research. 
I wanted to learn more about, um, how it all works and, um, what are the possibilities? What are the limitations, um, because if you understand the, the possibilities and limitations, you can reason about, uh, the safety and the fairness and the reliability of it, and, and talk more about, and, and that will prepare people for, um, realize, um, the, uh, follow one technology. Um, but I think, um, and I think everyone should try and be aware of the limitations of it. And I think if you're going to be aware of the limitations of it, you have to be aware of the mathematical perspectives. And so I think everyone has to do the basic statistics and everyone has to, um, uh, uh, try and understand, uh, um, as much as they reasonably can. Um, uh, so then, um, uh, uh, and it also depends on your personality as well. Um, uh, so then the programmatic perspective, um, you would, I suppose you'd want to, um, this, if you care more about seeing something that's, that's getting some sort of an output, um, if you, if you want to actually be at the edge of, uh, any industry, if you want to do consulting or you very much want to get very good at being fast at just programming something, um, knowing how to, um, uh, work with AutoML, knowing how to do all the pre-processing and knowing how to, um, uh, just, um, quickly benchmark and iterate over all the different models and, and learn about the, um, tuning of, uh, the hyperparameters from a very black box perspective. Um, and I think you can go very, very far by knowing very little about the mathematics. Um, and then, uh, but if you take that road, you'll, you'll find yourself more in the operations side of, of things, and you'll find yourself very much focusing on data engineering problems and, um, uh, and, and putting things into production. So it depends on what your strengths are and what you're interested in.

Matt:

I see. So I guess, yeah, it depends.

Samual:

So complicated, I guess, and I'm terrible at giving simple answers to complicated questions.

Matt:

Um, it is a complicated topic, and I guess it seems like the best thing to do is just, you know, get comfortable with both sorts of approaches.

Samual:

Yeah, I suppose the answer is a cliche and that's just, you've got to do both. You have to do, yeah. If you just do the mathematics, you're not going to get far at all. If you just do the program, um, programmatics, uh, I don't know if that's a word, if you just do the programming, you're not going to get very far at all. Um, try and do both, and then, um, decide on what you want to do more of just by how you enjoy it. You just do what you want to do and what you think you like, and don't, don't, don't, I guess you don't have to make it complicated. That's what I did, like when I decided to learn more about the mathematical perspectives, um, uh, I just enjoy that more. And so I did it, it was that simple.

Matt:

Fair enough. I guess I sort of want to pivot more to like, uh, your work, work life now. Um, one thing I'm curious about, um, what sort of like a typical day at Max Kelsen, what's that sort of like?

Samual:

Um, uh, and I've, I think I've been at Max Kelsen almost two years now, and it feels like every month the company has changed so much. It's, it's, um, there's a whole bunch of different new people and, um, the way, um, we work is very different. Um, uh, so I guess there is no typical day, like it's always changing. That's the amazing thing about data science and machine learning, uh, that you'll always be working on different types of problems, probably working with different personalities, very different personalities. That's, um, that's something I really like, that there's a lot of different sort of, um, uh, characters and, um, but it's, it's, um, to me, it's, it's, I, I work mostly in research and so, um, uh, it can be with, um, looking into, um, say, um, genomic information and, um, um, trying to answer questions about cancer, um, or it could be, um, uh, doing, um, blue sky research in Bayesian deep learning and learning how to ascertain uncertainty. That's like my key interest. And then, um, sometimes it could be just like popping in and talking to, um, some other companies about how we can work together and, um, how we can provide services to assist them, um, in the, um, consulting side of things. Um, so there's two, two main groups of Max Kelsen. There's the consulting. And then there's the research. I work in the research, um, the, in the consulting, it's a very different game. It's very time focused, very, um, intense, um, uh, uh, fast work. And, um, and that's more about, um, the, the problems can just be so diverse. 
It can be about, um, ordering pizzas, or it could be, um, helping, um, uh, uh, in health care, um, um, whether it's, um, uh, visualizing, uh, um, various, uh, diseases or, um, whether it is, um, estimating how long a patient's going to be in intensive care for, or, um, whether it is, um, does a surgeon have all the equipment, um, uh, available to do their work or, um, and, um, and every single there's just like so many different, you know, insurance, and then, um, uh, there's, there's, uh, so many different types of problems and just, all of them are just very, very different. Um, but usually it's just like working, um, working with a team and, um, and just every, every day feels different. I dunno. I can't say what a typical day is. Yeah.

Matt:

What's your, can you go a bit more into detail about like your area of research?

Samual:

Yep. Um, uh, so, um, are you more interested in, uh, uh, in the genomics or in the, um, Bayesian deep learning?

Matt:

Um, I guess both. They're both big, like, areas that you're interested in, right?

Samual:

The two groups I work with most are, so in the research team, there's like, uh, we all sort of, um, uh, uh, work, uh, in each area a little bit where we can, um, to keep things flexible and moving. Um, there's three main groups in the research team. Uh, there's the, um, uh, there's the, uh, interpretable AI group. That's where I mostly work. And then there's the genomics group. Uh, uh, I, I work there a lot too, and then there's the quantum group. Um, and, um, uh, so I can only really speak for the genomics and the interpretable AI. The interpretable AI is more about, um, uh, okay, we've got this, uh, all these new algorithms that are becoming very powerful because of the advent of GPUs and, uh, and, um, and, and, uh, computing around big data. So now we've got the possibility of, um, uh, learning from large amounts of data. So deep learning is becoming a thing. And, um, but the problem is, is that it's got this amazing predictive power. Um, but, um, uh, it's not, we can't really rely on it. We don't know when it's going to work and when it's not going to work. And we don't know, without true labels, we don't know when it is working and when it isn't working reliably. And so, uh, that's what motivates the use of uncertainty, uh, or the, the modeling of uncertainty. And then, uh, so in interpretable AI, we focus on, um, modeling, um, uncertainty, and, um, we focus on, uh, explaining, um, uh, which features or which input, um, uh, inputs are, um, uh, uh, contributing the most to the decision of a neural network. Um, and then also, um, we're starting to scratch the surface of, uh, uh, causality in that group too. And then that group sort of helps. Um, um, so there's, uh, other areas of Max Kelsen, including genomics, that, um, use a lot of methods from interpretable AI also, um, and, um, in, in, in the genomics group, um, we, we, uh, work to solve problems about, um, immunotherapy outcome prediction. 
So if you're given, um, uh, uh, um, some genomic information, uh, can we predict whether a patient will respond positively to some treatment and then, um, uh, then there's, um, um, problems about cancer of unknown primary, um, uh, which I think, um, is, um, uh, it's really difficult to treat if you have a cancer of unknown primary, and I think it's attributed to about 5% of people who die with cancer, I think, um, and then, um, in the genomics group, um, there's a, there's a bunch of other things that go away from, um, uh, cancer genomics, um, uh, we're interested in agriculture, we're interested in the Great Barrier Reef and coral, um, and, um, uh, and I think there's Maciej, the research lead, who has a big background in psychiatric genomics. He's sort of our genomics guru. Um, and then, um, so that's, that's sort of, it's that space there that I've just described where I spend about 90% of my time.

Matt:

I see. Okay. Um, I guess this is a bit more of a, like a, a broader question. Um, so these sorts of machine learning solutions, um, I guess if you read like the news or like if you just Google machine learning and things like that, you'll find a lot of things about how it's the solution to everything, like it can solve every single problem. And, um, are there any problems that those sorts of approaches don't really, um, work for, and are there sorts of problems that machine learning approaches are better suited to solve than others?

Samual:

Yeah, I hope I can answer this question. Um, the, um, I think as you'll very well know, um, machine learning has its limitations in that it's, um, uh, very, uh, good if you have a very specific well-defined task, um, that is non-stationary. Um, uh, and so, um, if you, um, uh, and, and, and beyond that, uh, it it's completely unreliable and useless. Um, whereas more traditional, um, engineering disciplines would work on using, uh, physics, um, to simulate some sort of a system and be able to make inference about, uh, the future or, um, various scenarios that the system can be, um, put through. And, um, and those, those sort of more physics based, um, uh, uh, um, methods are very reliable, um, in comparison to machine learning. Um, but, um, uh, uh, so yeah, I think if you've got a well-defined task and if it's narrow, machine learning is very good, um, but there's lots of, uh, narrow, uh, uh, tasks that can be well-defined. And so machine learning is, um, being found everywhere. The other big, the biggest problem, in my opinion, of, um, machine learning in general is that, um, it, it will be biased by, um, um, the, what it has experienced, just like everyone is. And, um, so for example, um, if you, if it's trying to predict between, um, uh, a cat and a dog, and 99% of the, um, uh, data points that it trains on is a dog, it will commonly say a cat is a dog. Um, and, um, because the, the class, uh, labels are imbalanced, and that's all fine because we can just balance, there's very easy ways to deal with that problem. But the concerning thing is, is that often with tasks, there are hidden variables. For example, if, if, if you're looking at genomic data and, um, uh, and you're trying to predict, uh, let's just use the example of Max Kelsen, if you're trying to predict, um, immunotherapy outcome, um, and, uh, you're using, um, genomic information. 
Um, if 98% of the, um, genomic information comes from, um, Caucasian men, then it will be unfair, uh, on, uh, uh, uh, um, black women or, um, uh, or any, any, um, minority, uh, group, um, um, more generally any, um, underrepresented, uh, subpopulations. And so it can be unfair against those people. And that's a really important thing. Um, when you think about, um, uh, where all the data's coming from, um, and it's a really important thing for, um, fairness, and it's a really, um, big shame, I suppose, that it's starting to, um, be used everywhere. And, um, a lot of, um, uh, um, groups, not just, not just, um, racial or gender groups, um, but just, uh, any kind of subpopulation you can think of that is underrepresented, uh, will be unfairly discriminated against.
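
Show note: the cat-and-dog point above can be sketched in a few lines of plain Python. This is only an illustration with made-up numbers (99 dogs, 1 cat), not code from the episode or from Max Kelsen, and the function names are hypothetical. It shows a "model" that always guesses the majority class scoring 99% accuracy while never finding the cat, and one common way to rebalance, inverse-frequency class weights (the same formula scikit-learn uses for its "balanced" setting).

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common label: a 'model' that ignores its input."""
    return Counter(labels).most_common(1)[0][0]


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {label: n_samples / (len(counts) * c) for label, c in counts.items()}


# Hypothetical, made-up data set: 99 dogs and 1 cat.
y_true = ["dog"] * 99 + ["cat"]
guess = majority_baseline(y_true)
y_pred = [guess] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)
weights = balanced_class_weights(y_true)

print("accuracy:", accuracy)       # high overall, because dogs dominate
print("recall per class:", recall)  # the cat class is never recovered
print("class weights:", weights)    # the rare class gets a much larger weight
```

Reporting recall per class (or per subpopulation) rather than a single accuracy number is what exposes the problem here. Class weights only help when the imbalance is in a label you can see; they do not address the hidden-variable case described above, where the underrepresented group is not recorded in the data at all.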

Matt:

Hmm. And I guess, I guess that means, like, I guess the data set size and the way that the data set is collected is really important, um, to have the solutions.

Samual:

Yeah. That's one, that's one way around it. There's other ways around it. Um, uh, yeah, but I'm not going to even start talking about that.

Matt:

That's, that's all good. Um, I think you've given a really good overview of everything and I guess, like that uni experience up to work, um, and also, I guess a few, um, specifics about research, which I found really cool. Um, that's pretty much all the questions from me. Did you have anything you wanted to say?

Samual:

Oh, just thanks very much. Yeah. It was a pleasure. It's really awesome that you're doing this podcast. I think it's really cool. Uh, I hope I could be of help to anyone at all. Cool.

Matt:

All right. Thanks so much.

Samual:

Thank you, Matthew.

Matt:

That's all we have for you today. We hope you'll join us in two weeks for the next episode of The Router. And until then, come join our community at slack.uqcs.org.