From Tour de France and Indy Cars to Cardiology: Yasuyuki Kataoka Follows the Data

A specialist in applied machine learning (ML) and control engineering, Yasuyuki Kataoka has, to date, focused on time-series, heterogeneous and real-time data in areas ranging from IoT/wearables to robotics and natural language. As a scientist in the NTT Research Medical and Health Informatics (MEI) Lab, which he joined in July 2019, Kataoka will leverage both data-driven and model-based approaches to tackle cardiovascular disease prediction as part of the Lab’s bio digital twin initiative.

Kataoka joined the MEI Lab after serving as a data scientist at the NTT Innovation Institute in East Palo Alto and a research scientist at NTT R&D in Japan. He received a master’s degree in engineering in 2011, finishing as the valedictorian of his class at the Tokyo Institute of Technology, where he focused on mechanical and control system engineering. While at the NTT Innovation Institute, he was responsible for delivering AI tools and building a development team; creating a wearable data validation and performance-improvement tool and web UI for an Indy Car team; and devising a power-prediction model for Tour de France riders that was deployed into a real-time production system. He also developed ML model interpretation and causality analytics tools for an automobile industry application. In 2016, Kataoka won prizes in major hackathons held by industry-leading companies such as Mylan, Mercedes-Benz and Toyota.

We recently asked this talented data scientist a few questions about his previous and current research. Here’s our discussion:

What takeaways can you share from your previous work in areas such as monitoring bike racing and race-car drivers?

Overall, I’ve learned about the joys and difficulties of improving human performance. Every previous project has had different challenges, but I found that performance improvement, whether being in good health or skillful at something, is a fundamental human desire. It has been enjoyable and challenging for me to help domain experts, such as professional athletes, go beyond their previous capabilities by understanding the mechanics of how they exert their bodies. I have also learned that the analysis of human performance is of interest to a wider audience.

Even though applying cutting-edge data science or AI techniques yielded new insights, achieving a dramatic improvement in human performance in the real world is difficult. The human body may be too complicated to be represented by data-driven approaches alone. I feel that both the data-driven approach and the model-based approach will be needed to deeply understand and improve human performance. I also think this area will become more interdisciplinary, reaching beyond the medical and healthcare industries. This makes me feel that human performance improvement is a long-term challenge.

Why did you make the shift into medical informatics? How do the problems you are addressing at the MEI Lab compare with those you were working on before?

I like the Japanese word 心技体 (shin-gi-tai), which describes the importance of a holistic approach in life that encompasses mental, technical and physical training. Over the past few years, I have been mainly working on technical aspects. It was a great time for me to shift my passion a little bit to the physical aspects. Being physically well-conditioned is everybody’s desire. Researching how AI or data analytics can improve human health is definitely one of the most interesting topics to me. In addition, it was a great honor for me to work with such great colleagues in our laboratories, which inspired me to dive into this new field.

What challenges do you see going forward in terms of obtaining the right kind of data and using AI to improve the accuracy of risk prediction?

The challenges are both qualitative and quantitative. I think the diversity of data types used for AI will become more important. In the case of heart arrhythmia detection, previous studies have shown that deep learning on single-channel ECG outperformed cardiologists. However, I believe that more diverse data will be needed to get more accurate risk predictions for other diseases, as medical doctors judge risk using all of a patient’s available data.

I also think long-term data collection is important. If the model targets proactive rather than reactive predictions, data needs to be collected before the disease is recognized, in order to capture the very first signs that the patient is not yet aware of. Unless potential patients see the benefit of providing their data, most may not want to continue providing it. Obtaining such a new combination of data over the long term is the challenge on the qualitative side.

On the quantitative side, model performance is also highly dependent on the size of the dataset. I think one of the most promising research directions is to augment data, such as through the class of machine-learning (ML) techniques known as generative adversarial networks (GANs). Although the use of artificially augmented data is open to debate, demonstrating the power of these techniques could help inform that discussion.

How does greater knowledge of cardiovascular models potentially complement or enhance the use of ML or AI in medical scenarios?

I am extremely interested in this topic, and I believe combining the model-based and data-driven approaches will become more important in this field. One potential benefit is knowledge-based data augmentation. If we can infer the cardiovascular system dynamics from the data obtained by a limited set of non-invasive sensors, that will be a powerful feature for the machine learning model to improve disease-prediction accuracy. Moreover, if we can simulate a personalized heart model (we call it the bio digital twin model), the augmented feature will be even more powerful. In addition, knowing the internal behavior of the heart by identifying the core cardiovascular parameters would help doctors diagnose diseases. I believe that being able to transform raw data into explainable information, going beyond a black-box machine learning model, can help doctors.