
Research With AI - The Potential Of Accelerated Discovery

Andres

27 Nov 2021 • 8 min read

If there is one thing research could always use more of, it is more advanced technology. It is a time-proven fact that the process of exploring our universe never ends - it stretches from understanding the mechanisms that breathe life into us to the endless chemical compounds we have yet to synthesise in the lab. One might think that, over time, the millions of scientists all around the world would uncover so much that new research publications would eventually slow to a halt. Though this might happen some day (I doubt it), for now we are seeing the exact opposite trend.

This year has seen more research and scientific discovery than any other in human history - and next year will probably be the same. Data on Science & Engineering publications from the National Science Board in the US corroborates this: over the last two decades, the number of research papers published worldwide has grown by about 4% a year. Accordingly, the number of articles published annually rose from 1,800,000 to over 2,500,000 between 2004 and 2018 - and the numbers keep rising.

'Annual Science & Engineering articles published by selected regions and economies' - Image taken from NCSES

Not only that, the numbers are rising exponentially. Even with the volatility of the political world, the chaos of the COVID-19 pandemic and the ever-present threat of climate disaster, science is still accelerating. How is this possible? For one, there is the internet - the catalyst of our interconnected society, which lets research be communicated within seconds and is further boosted by ever-wider access to it worldwide.
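A steady percentage increase is exactly what makes growth exponential: each year's output is the previous year's multiplied by the same factor. As a rough illustration (plain Python, with helper names of my own choosing rather than anything from the NSB data), here is what a constant 4% a year implies:

```python
import math


def project(start: float, annual_rate: float, years: int) -> float:
    """Output after `years` of compounding at a constant fractional rate (0.04 = 4%)."""
    return start * (1 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years for output to double: solve (1 + r) ** t = 2 for t."""
    return math.log(2) / math.log(1 + annual_rate)


def implied_rate(start: float, end: float, years: int) -> float:
    """Constant annual rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


print(f"At 4% a year, output doubles every {doubling_time(0.04):.1f} years")
print(f"and grows by a factor of {project(1.0, 0.04, 20):.2f} over 20 years")
```

At that rate, output doubles roughly every 18 years. `implied_rate` works the other way round, recovering the average yearly rate from any two published annual totals.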

Newer generations are also taking a keener interest in STEM careers, and the growing population has made the scientific community that much larger, so there's that. But what is truly driving this explosion of discoveries is simple: the technology for it is advancing. And with progress, of course, come lower prices, as machinery and equipment become more accessible for the general public to try. Back in the early 2000s very few people really spoke of artificial intelligence (AI) and machine learning, of quantum computers or hybrid clouds. Now you can turn a street corner and suddenly overhear a group of kids talking about the new hyper-realistic, AI-driven virtual reality hardware they'll play some video game on. It is awesome, and it is daunting.

One application of modern machinery that was recently brought to my attention is accelerated discovery. Supposedly, this combines the best of cutting-edge technology and software and primes them for research purposes, literally accelerating how quickly we can produce new ideas and test them in the lab. When did this come about, however, and what can we do with it?

Future Promises & Current Research

I came across the concept of accelerated discovery at a seminar at my university, and I must admit that it felt somewhat more like a sales pitch than an actual research talk. In hindsight, this makes sense; the speaker was trying to draw attention (and most likely funding!) towards his organisation, and the most efficient way to do that was to proclaim the incredible potential of their new technology. I do not criticise it, either - the talk was honestly astonishing.

The big company behind accelerated discovery is IBM. Over the last few years, its researchers have been pushing hard to develop methods that optimise research procedures, and they have been rather convincing about it (at least to me!). According to them, such advancements are positioned to tackle societal issues worldwide, such as:

global warming, by creating technology to capture and minimise carbon emissions in the atmosphere;
landfill waste from toxic batteries, by finding alternative, more environmentally friendly ways of storing energy to replace them;
the overuse of synthetic fertilisers that damage our soil, which could be made obsolete by AI-operated systems that imitate soil bacteria, fixing atmospheric nitrogen directly into the ground for a healthier, nitrate-rich soil (as I speak about in my other post);
sustainable material manufacturing;
and even the development of medical treatments, which the combination of AI and cutting-edge data-analysis software may be able to speed up for the next widespread virus or condition we face.

Environmental Pollution, a world issue that may be solved with technology developed with accelerated discovery — Photo by Alexander Schimmeck / Unsplash

These are all very real possibilities for today's rising technologies, and any one of them could drastically improve our society if managed properly. That is a big 'if', however. How do we know that the wares being developed today will advance enough to prevent future catastrophes? Or whether industries will even find the motivation to direct their products towards the good of the people (a bit cliché, but a serious question)? For one, we have already seen how rapidly research is moving and gaining traction in today's society. And, again, this trend will not stop anytime soon - particularly in the development of profitable high tech. As people become steadily more aware of world problems, there is also good reason to expect such technology to be built to tackle them - though to what extent remains to be seen (here's to preventing climate disaster!).

Nonetheless, accelerated discovery is at least already proving quite valuable for the issues mentioned above. One recent study (Zhong et al., 2020), for example, used active machine learning on candidate compound structures to predict and identify electrocatalysts that convert carbon dioxide and water into ethylene more efficiently. In turn, the experiment produced CO₂-derived ethylene more efficiently than any previously reported electrocatalyst, which could lower the cost of future efforts to capture and reuse greenhouse gas emissions. Given the climate situation we face at the time of writing, such sophisticated data analysis techniques could very well prove crucial to preventing climate disaster.
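To give a feel for what "active" learning means, here is a toy pool-based loop in Python. It is not the method Zhong et al. used: the integer candidates, the `lab_measurement` function and the nearest-neighbour "model" are all hypothetical stand-ins. The idea it illustrates is the same, though. Rather than testing every candidate, the loop repeatedly picks the one it is most optimistic about (a good predicted score plus a bonus for being far from anything already measured), runs the "experiment" and updates itself.

```python
# Toy pool-based active learning: choose which candidate to "test" next.
# Everything here is hypothetical: candidates are integer compositions 0..100
# and lab_measurement() stands in for a slow, expensive experiment.


def lab_measurement(x: int) -> int:
    """Hypothetical hidden property (higher is better), peaking at x = 30."""
    return -(x - 30) ** 2


def nearest(measured: dict[int, int], x: int) -> int:
    """Closest already-measured candidate (ties go to the earliest measured)."""
    return min(measured, key=lambda m: abs(m - x))


def acquisition(measured: dict[int, int], x: int, kappa: int) -> int:
    """Optimistic score: the nearest neighbour's value plus a distance bonus,
    a crude stand-in for model uncertainty (upper-confidence-bound style)."""
    m = nearest(measured, x)
    return measured[m] + kappa * abs(m - x)


def active_search(candidates: list[int], budget: int, kappa: int = 60) -> dict[int, int]:
    # Seed the "model" with the two extreme candidates.
    measured = {candidates[0]: lab_measurement(candidates[0]),
                candidates[-1]: lab_measurement(candidates[-1])}
    for _ in range(budget):
        pool = [x for x in candidates if x not in measured]
        if not pool:
            break
        # Query the untested candidate the model is most optimistic about.
        nxt = max(pool, key=lambda x: acquisition(measured, x, kappa))
        measured[nxt] = lab_measurement(nxt)
    return measured


results = active_search(list(range(101)), budget=5)
best = max(results, key=results.get)
print(f"{len(results)} experiments, best candidate {best} with score {results[best]}")
```

Here only seven of the 101 candidates are ever "tested". Raising `kappa` makes the loop explore more widely; lowering it makes it exploit its current best guess.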

Other interesting applications of accelerated discovery lie in the pharmaceutical field. A recent study led by Dr Payel Das, from Columbia University in New York, used deep learning and AI-assisted molecular dynamics simulations to rapidly obtain new antimicrobial molecules to fight virulent pathogens. In only 48 days the team was able to identify and image two new peptides that met their criteria (where typically over 6 months would be needed to collect decent results). These proved potent against several pathogens - including a drug-resistant strain of the bacterium Klebsiella pneumoniae - while showing low toxicity. Though these results only apply to bacterial infections, the potential of accelerated discovery in helping to fight pathogens is clear. With the rise of antiviral drugs, I would not be at all surprised if we could use this approach to develop treatments for future diseases more quickly.

Drugs and antimicrobials/antivirals being discovered with deep learning — Photo by Michał Parzuchowski / Unsplash
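Das et al. relied on trained deep generative models and molecular dynamics simulations, but the overall shape of such a pipeline (generate many candidates, keep only the promising ones) can be sketched in a few lines. In this hypothetical Python sketch, the candidates are random peptide sequences and the screen is a fixed rule of thumb: antimicrobial peptides are often positively charged and partly hydrophobic. The thresholds are illustrative assumptions, not values from the study.

```python
import random

# Toy "generate, then screen" loop for antimicrobial peptide candidates.
# The screening thresholds below are illustrative assumptions only.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVWY")


def net_charge(seq: str) -> int:
    """Approximate charge at neutral pH: K/R count +1, D/E count -1 (H ignored)."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def hydrophobic_fraction(seq: str) -> float:
    """Share of residues that are hydrophobic."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def passes_screen(seq: str, min_charge: int = 2,
                  band: tuple[float, float] = (0.3, 0.6)) -> bool:
    """Keep cationic sequences with a moderate hydrophobic share."""
    lo, hi = band
    return net_charge(seq) >= min_charge and lo <= hydrophobic_fraction(seq) <= hi


def generate_and_screen(n: int, length: int = 12, seed: int = 0) -> list[str]:
    rng = random.Random(seed)  # seeded so the run is reproducible
    candidates = ("".join(rng.choice(AMINO_ACIDS) for _ in range(length))
                  for _ in range(n))
    return [s for s in candidates if passes_screen(s)]


hits = generate_and_screen(1000)
print(f"{len(hits)} of 1000 random candidates pass the toy screen")
```

In a real pipeline the random generator would be replaced by a learned model and the screen by trained classifiers and simulations, so that far fewer candidates ever need to reach the lab.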

Those are a few examples of how accelerated discovery is being used today, and they are only the start. There is a pressing question that we have not yet covered, however. How does it all work?

Reading Fast & Quantum Simulations

There are 4 overarching types of technology used for accelerated discovery: hybrid clouds, AI, supercomputers and quantum computers. We have mentioned some of these already, but it is useful to discuss what each one actually offers to the process. To start with, we have the hybrid cloud. Typically used by large organisations able to afford extensive computing infrastructure, it consists of 2 parts: a private cloud and a public cloud. The former refers to the computing and storage kept on an organisation's own on-premises servers, generally holding private details (e.g., the personal medical records stored in the NHS's databases) or just the main data sets managed on a day-to-day basis. In contrast, public clouds are essentially rented computer servers, hosted by a dedicated company that manages and updates them for you. Think Google Cloud or Microsoft Azure. Though this is admittedly very flexible and easy to use (relatively, anyway), it can be somewhat unsafe in the long run; you won't have full control of your data, after all. As such, organisations tend to use a mix of the two, and that is what we know as a hybrid cloud.

Hybrid cloud, which combines a public cloud with an on-premises private cloud - Photo by Ian Battaglia / Unsplash
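In code, the core of a hybrid-cloud policy is just a placement decision. This sketch is hypothetical: two in-memory dictionaries stand in for the on-premises servers and the rented public service, and an assumed `sensitive` flag on each record decides where it goes.

```python
from dataclasses import dataclass, field
from typing import Optional

# Minimal sketch of a hybrid-cloud placement rule. Both "clouds" are plain
# dictionaries; real systems would use storage services and access control.


@dataclass
class Record:
    key: str
    payload: str
    sensitive: bool  # assumed labelling, e.g. personal medical data


@dataclass
class HybridCloud:
    private: dict[str, str] = field(default_factory=dict)  # on-premises servers
    public: dict[str, str] = field(default_factory=dict)   # rented, provider-managed

    def store(self, record: Record) -> str:
        """Keep sensitive records in-house; send everything else to the public cloud."""
        if record.sensitive:
            self.private[record.key] = record.payload
            return "private"
        self.public[record.key] = record.payload
        return "public"

    def fetch(self, key: str) -> Optional[str]:
        # Look in the private cloud first, then fall back to the public one.
        return self.private.get(key, self.public.get(key))


cloud = HybridCloud()
print(cloud.store(Record("patient-042", "blood test results", sensitive=True)))  # private
print(cloud.store(Record("dataset-7", "published spectra", sensitive=False)))    # public
```

Real deployments add encryption, access control and provider-specific storage APIs on top, but the principle of keeping sensitive data in-house while renting capacity for the rest is the same.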

Research institutes also use hybrid clouds for their own data storage, saving all their previous findings and ideas in them. If they choose to, they can also share this information publicly on the internet - and that is what enables the cooperation we see in the scientific community. With this in hand, scientists can use deep learning AI (which basically consists of complex, multi-layered algorithms that analyse data in order to 'learn' from it and optimise towards a certain outcome) to go over their own cloud servers and peer-reviewed journals online, extracting and understanding relevant information up to 100 times faster than a human. The speaker at my university seminar made sure to highlight this; deep search, as IBM calls it, could be pictured as a computer that reads 20 pages of technical information (which might take the average person 1 hour to read and comprehend) in only 1 minute, a 60-fold speed-up. In turn, the capability of AI can be improved by more powerful hardware - and that is where supercomputers come in.
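IBM's deep search relies on its own trained models, which I won't try to reproduce here. As a much simpler stand-in, this Python sketch ranks a few made-up abstracts against a query with classic TF-IDF scoring (how often a term appears in a document, weighted by how rare it is across the collection). It is the kind of triage that lets software skim a library far faster than a person could; the documents and names are hypothetical.

```python
import math
import re
from collections import Counter

# Toy relevance ranking with TF-IDF; not IBM's method, just a classic baseline.


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    tokenized = {name: Counter(tokenize(body)) for name, body in docs.items()}
    n_docs = len(docs)
    scores = {}
    for name, counts in tokenized.items():
        total = sum(counts.values())
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(term in c for c in tokenized.values())  # docs containing term
            if df == 0:
                continue
            idf = math.log(n_docs / df) + 1.0  # +1 so common terms still count a little
            score += (counts[term] / total) * idf
        scores[name] = score
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


papers = {  # hypothetical abstracts
    "paper_1": "Copper-based electrocatalysts convert CO2 into ethylene.",
    "paper_2": "Deep generative models propose new antimicrobial peptides.",
    "paper_3": "Quantum simulators mimic other quantum systems.",
}
for name, score in rank("ethylene electrocatalysts", papers):
    print(f"{name}: {score:.3f}")
```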

At the moment, supercomputers are the most powerful computers we have. With them, computer engineers can run AI algorithms astonishingly quickly, and that is what has enabled the guided use of accelerated discovery in research. Besides their role in deep search and information screening, however, they serve another purpose. After extracting the data, scientists need their program to go a step further and extrapolate from it, making automated predictions of other possible data points. In other words, they require simulations, or generative models. This is the final part of the procedure and, in some ways, also the most limiting. While it is relatively straightforward to collect and process data with a few algorithms, actually learning enough from it to create entirely new ideas is a whole different task. Luckily, quantum computers are proving very promising for just that.
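A generative model learns the patterns in existing data and then samples new examples that follow them. The simplest possible version is a first-order Markov chain over characters; this hypothetical Python sketch trains one on a few made-up peptide-like strings and generates new ones. The accelerated-discovery work described above uses far richer deep generative models, so treat this only as an illustration of the idea.

```python
import random
from collections import defaultdict

# A deliberately tiny generative model: a first-order Markov chain (bigram model)
# learnt from example sequences. The training strings are made up.

START, END = "^", "$"


def train(sequences: list[str]) -> dict[str, list[str]]:
    """Record every observed next-character after each character."""
    transitions = defaultdict(list)
    for seq in sequences:
        chars = [START, *seq, END]
        for a, b in zip(chars, chars[1:]):
            transitions[a].append(b)
    return dict(transitions)


def sample(model: dict[str, list[str]], rng: random.Random, max_len: int = 30) -> str:
    """Walk the chain from START, picking successors in proportion to their counts."""
    out, current = [], START
    while len(out) < max_len:
        current = rng.choice(model[current])
        if current == END:
            break
        out.append(current)
    return "".join(out)


training = ["KWKLFKKIGAV", "GIGKFLKKAKKF", "KLAKLAKKLAKLAK"]  # hypothetical examples
model = train(training)
rng = random.Random(42)
for _ in range(3):
    print(sample(model, rng))
```

Every string it produces is new, yet each adjacent pair of letters has been seen in the training data. That trade-off between novelty and plausibility is what richer generative models manage far better.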

I won't go into how quantum computers work here - not right now - but trust me when I say that they are advancing quickly. Many even believe that, within the next decade, quantum computers will be readily accessible to research institutions worldwide. And the same goes for accelerated discovery, of course. So, in other words - the present is proving to be a prime age to become a scientific researcher!

References

Khan, B., Robbins, C. & Okrent, A. (2020). Global Science and Technology Capabilities. National Science Board. Retrieved from https://ncses.nsf.gov/pubs/nsb20201/global-science-and-technology-capabilities
Zhong, M. et al. (2020). Accelerated discovery of CO₂ electrocatalysts using active machine learning. Nature 581:178–183. Retrieved from https://doi.org/10.1038/s41586-020-2242-8
Das, P. et al. (2021). Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations. Nature Biomedical Engineering 5:613–623. Retrieved from https://doi.org/10.1038/s41551-021-00689-x
Johnson, T. H., Clark, S. R. & Jaksch, D. (2014). What is a quantum simulator? EPJ Quantum Technology 1:10. Retrieved from https://doi.org/10.1140/epjqt10
