Machine Learning – Art or Science?
The surge of big data and the challenge of confirmation bias lead data scientists to seek a methodological approach to uncovering hidden insights. In predictive analytics, they often turn to machine learning to save the day. Machine learning seems an ideal candidate to handle big data using training sets. It also enjoys a strong scientific scent by making data-driven predictions. But is machine learning really bias-free? And how can we leverage this tool more consciously?
Why Is Machine Learning a Science:
We often hear that machine learning algorithms learn and make predictions on data. As such, they are supposedly less exposed to human error and biases. We humans tend to seek confirmation of what we already think or believe, leading to confirmation bias, which makes us overlook facts that contradict our theory and overemphasize those that affirm it.
In machine learning, the data teaches us, and what could be purer than that? When using a rule-based algorithm or an expert system, we are counting on the expert to make up the ‘right’ rules. We cannot avoid having his/her judgments and positions infiltrate such rules. The study of intuition would go even further and say that we want the expert’s experiences and opinions to influence these rules – they are what make him/her an expert!
Either way, when we work our way bottom-up from the data using machine learning algorithms, we seem to have bypassed this bias.
Why Is Machine Learning an Art:
Facts are not science, and neither is data. We invent scientific theories to give data context and explanation, helping us distinguish causation from correlation. The apple falling on Newton’s head is a fact; gravity is a theory that explains it. But how do we come up with the theory? Is there a scientific way to predict eureka moments?
We test assumptions using scientific tools, but we don’t generate assumptions that way, at least not innovative ones that manifest out-of-the-box thinking. Art, on the other hand, draws on imaginative skill to express and create something new. In behavioral analytics, it can take the form of a rational or irrational human behavior. A user clicking on content is a fact; the theory that explains the causation can be that the content answered a question he/she was asking, or that it relates to an area of interest based on his/her previous actions.
The inherent ambiguity of human behavior, and even more so of its causes and motivations, gives art its honorable place in predictive analytics. Machine learning is the art of induction. Even unsupervised learning uses objective tools that were chosen, tweaked and validated by a human, based on his/her knowledge and creativity.
Another way is to think of machine learning as both an art and a science, much like Schrödinger’s cat that is both alive and dead, the Buddhist middle way, or quantum physics that tells us light is both a wave and a particle. At least until we measure it… You see, if we use scientific tools to measure the predictiveness of a machine-learning-based model, we subscribe to the scientific approach, giving our conclusions some sort of professional validation. Yet if we focus on measuring the underlying assumptions, or the representation or evaluation method, we realize the model is only as ‘pure’ as its creators.
In behavioral analytics, a lot rides on the interpretation of human behavior into quantifiable events. This piece stems from the realm of art. When merging behavioral analytics with scientific facts, as often occurs when using medical or health research, we truly create an artistic science or a scientific art. We can never again separate the scientific nature from the behavioral nurture.
While this might be an interesting philosophical or academic discussion, the purpose here is to offer practical tools and tips. So what does this mean for people developing machine-learning-based models or relying on them for behavioral analytics (based on my own experience, plus insights from this post’s contributors, listed below)?
Invest in the methodology
Data is not enough. The theory that narrates the data is what gives it context. The choices you make along the three stages of representation, evaluation and optimization are susceptible to bad art. So, when in need of a machine learning model, consult with a variety of experts about choosing the best methodology for your situation before rushing to develop something.
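As a minimal sketch of where these three choices live in practice – assuming a scikit-learn-style workflow and a generic placeholder dataset – the stages might look like this:

```python
# A sketch of the three methodological stages, assuming scikit-learn;
# X and y stand in for your own (behavioral) data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

# Representation: which families of models can express our theory of the data?
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
}

# Evaluation: which score tells us a model is 'good' for this problem?
scoring = "roc_auc"

# Optimization: here, a simple search over the candidates via cross-validation.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring=scoring)
    print(f"{name}: mean {scoring} = {scores.mean():.3f}")
```

Each of those choices is a judgment call, which is exactly where the art sneaks in.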
Garbage in, garbage out
Machine learning is not alchemy. The model cannot turn coal into diamond. Preparing the data is often more art (or “black art”) than science, and it takes up most of the time… Keep a critical eye out for what goes into the model you are relying on, and be as transparent about it as possible if you are on the designing side. Remember that more relevant data beats a smarter algorithm any day.
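One way to keep that critical eye is a quick audit of whatever feeds the model. A minimal sketch, assuming pandas and a purely hypothetical events table:

```python
# A quick data audit on a hypothetical events table (illustrative only).
import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "clicks": [5, 5, None, 2, 400],   # a missing value and a suspicious outlier
    "segment": ["a", "a", "b", "b", "b"],
})

print(events.isna().sum())           # how much is missing, per column?
print(events.duplicated().sum())     # exact duplicate rows
print(events["clicks"].describe())   # ranges that expose outliers (e.g. 400 clicks)
```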
Data preparation is domain specific
There is no way to fully automate data preparation (i.e., feature engineering). Some features may only add value in combination with others, creating new events. Often these events need to make product or business sense just as much as they need to make algorithmic sense, as the sketch below illustrates. Remember that feature design, or event extraction, requires a very different skill set than modeling.
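For instance – with hypothetical session data and a made-up engagement feature – a combination can carry meaning that neither raw column does on its own:

```python
# A hypothetical interaction feature: raw clicks and raw session time are
# weak signals alone, but clicks per minute encodes engagement intensity,
# an 'event' that makes product sense as well as algorithmic sense.
import pandas as pd

sessions = pd.DataFrame({
    "clicks": [3, 40, 12],
    "minutes": [30, 5, 12],
})

sessions["clicks_per_minute"] = sessions["clicks"] / sessions["minutes"]
print(sessions)
```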
The key is iterations across the entire chain
You collect raw data, prepare it, then learn and optimize, test and validate, and finally put it to use in a product or business context. But this cycle is only the first iteration. A promising algorithm often sends you back to collect slightly different raw data, slice it from another angle, model, tweak and validate it differently, and even use it differently. Your ability to foster collaboration across this chain, especially between Martian modelers and Venusian marketers, is key!
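One way to keep the whole chain cheap to re-run on every iteration – sketched here with scikit-learn and placeholder data – is to encapsulate preparation and modeling in a single pipeline:

```python
# A sketch of an iterable chain, assuming scikit-learn; the generated
# data stands in for your raw collection step.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

chain = Pipeline([
    ("prepare", StandardScaler()),    # the data preparation step
    ("learn", LogisticRegression()),  # the learning/optimization step
])

# Each iteration re-evaluates the chain end to end, so changing the
# preparation step automatically re-runs everything downstream.
print(cross_val_score(chain, X, y, cv=5).mean())
```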
Make your assumptions carefully
Archimedes said: “Give me a lever long enough and a fulcrum on which to place it and I shall move the world.” Machine learning is a lever, not magic. It relies on induction. The knowledge and creative assumptions you make going into the process determine where you stand. The science of induction will take care of the rest, provided you choose the right lever (i.e., methodology). But it is your artistic judgment that decides the rules of engagement.
If you can, get experimental data
Machine learning can help predict results based on a training data set. Split testing (aka A/B testing) is used for measuring causal relationships, and cohort analysis helps split and tailor solutions per segment. Combining experimental data from split testing and cohort analysis with machine learning can prove more effective than sticking to one or the other. The way you choose to integrate these two scientific approaches is itself a creative act.
Do not let the artistic process of tweaking the algorithm contaminate your scientific testing of its predictiveness. Remember to keep complete separation between training and test sets. If possible, do not expose the test set to the developers until after the algorithm is fully optimized.
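A minimal sketch of that separation, assuming scikit-learn and a placeholder dataset:

```python
# Keeping the test set out of the artistic loop (scikit-learn sketch;
# X and y are placeholders for your own data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Split once, up front; the test set is then locked away.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# All tweaking and tuning happens on the training set only
# (e.g., via cross-validation inside X_train).
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Only after the model is frozen do we touch the test set – once.
print("held-out accuracy:", model.score(X_test, y_test))
```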
The king is dead, long live the king!
The model (and its underlying theory) is only valid until a better one comes along. If you don’t want to be the dead king, it is a good idea to start developing the next generation of the model the moment the previous one is released. Don’t spend your energy defending your model; spend it trying to replace it. The longer you fail to replace it, the stronger it becomes…
Machine learning algorithms are often used to help make data-driven decisions. But machine learning algorithms are not all science, especially when applied to behavioral analytics. Understanding the ‘artistic’ side of these algorithms and its relationship with the scientific one can help us build better machine learning algorithms and make more productive use of them.
I’m happy to read your feedback. Please leave your comments below.
Contributors:
Mohamad Hindawi, PhD, FCAS, Vice President of Data Science at Allstate Insurance, U.S.
Fabio Ohara Morita, Technical Director (Chief Actuary) at Porto Seguro Insurance, Brazil
Ariel Shamir, Deputy Dean and Professor at the Efi Arazi School of Computer Science, IDC, Israel

Reference:
Domingos, Pedro. “A few useful things to know about machine learning.” Communications of the ACM 55.10 (2012): 78–87.