NTT Research Physics & Informatics (PHI) Lab Scientist Hidenori Tanaka delivered two papers at the Annual Conference on Neural Information Processing Systems (NeurIPS) 2023. The papers addressed both sides of the event’s traditional scope: the science of artificial intelligence (AI) and the application of AI to neuroscience. In the first case, a Tanaka-led team studied the compositional abilities of diffusion models; in the second, the focus was a training method for building a digital twin of recorded neural activity in mice. A flagship conference on the machine learning (ML) and neural network subset of AI, NeurIPS 2023 was held in New Orleans, December 10-16, 2023.
Most papers at NeurIPS focus on the science of AI. According to a semantic map in this Zeta Alpha post, seven topics dominated this year (see the largest circles on the map), including diffusion models, a class of generative models that relates a set of observable variables to a set of hidden variables. In their paper, “Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task,” Tanaka and his three co-authors explored the timely question of how diffusion model-based AI systems acquire emergent capabilities to generate new samples, such as visual data. (The paper’s co-authors are PHI Lab and Harvard Center for Brain Science (CBS) Scientist Maya Okawa, University of Michigan Ph.D. candidate and CBS affiliate Ekdeep Singh Lubana, and University of Michigan Associate Professor Robert Dick.)
A costly status quo has made this a hot topic. In large language models like ChatGPT, a more-is-better approach has prevailed to date: larger neural networks, bigger datasets, more compute. But experts have begun to wonder whether that is the right way forward. A 2022 paper by scientists from Google Research, for instance, pointed to thresholds beyond which capabilities emerge, raising the question of whether one could tap the brakes. “Can emergent capabilities be unlocked via other methods without increased scaling?” they asked.
A debate has ensued. At NeurIPS 2023, the authors of the Best Paper Award winner, “Are Emergent Abilities of Large Language Models a Mirage?”, answered their title question affirmatively: yes, emergent abilities are a mirage. “Depending on how we evaluate the language model,” Tanaka said, summarizing their paper, “these sudden curves of capability acquisition may look either sudden, or just continuous and gradual.” For their part, the Tanaka team disagreed. “Our position is that the story is not as simple as that.”
“Let’s say you want to understand quantum mechanics in physics,” he said, by way of analogy. “You have to understand some mathematical concepts, like linear algebra, some physics concepts, like classical mechanics, electricity, and magnetism, and so forth. The point is that any of these intelligent tasks have sub-building blocks. It has compositional structure. So, if you give a math word-problem to a machine, then there is not just a single capability that it must have.”
Dr. Tanaka and his colleagues expanded on this more compositional view in their paper. Representing NTT Research, the University of Michigan, and Harvard University (where Tanaka is also an Associate at the Center for Brain Science), and using a model-experimental-systems methodology, the team delivered three results. First, a confirmation that modern AI systems can compose concepts to generate new samples. Second, evidence that the hidden mechanism behind “emergence” is this compositional structure, i.e., a multiplicative reliance on the performance of constituent tasks. Third, an alert that ensuring fairness, such as generating images of concepts underrepresented in the training data, has cost implications.
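As a rough illustration of what “multiplicative” means here (a sketch of the intuition, not the paper’s exact formalism), suppose a composite generation task requires k constituent concepts, each mastered to accuracy p_i(t) at training step t. Then, roughly,

P_{\text{composite}}(t) \;\approx\; \prod_{i=1}^{k} p_i(t).

Even if each p_i(t) improves smoothly, the product stays close to zero until every factor is reasonably large, so the composite ability appears to switch on abruptly once the last constituent skill matures.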
“Composing concepts with lower frequency in the training data to generate out-of-distribution samples requires considerably more optimization steps compared to generating in-distribution samples,” the authors wrote. How much does fairness cost? Dr. Tanaka said you have to budget 10 to 100 times more to train a model to generate those out-of-distribution samples.
The second NeurIPS paper involved AI and neuroscience. Like an earlier work that Dr. Tanaka presented at NeurIPS, this paper drew on behavioral research conducted at Stanford University. Whereas that NeurIPS 2020 paper analyzed data generated by the optic nerves of salamanders, this one leaned on large datasets of neural activity recorded from the brains of mice. (The paper’s co-authors are Stanford University Ph.D. candidate and PHI Lab Intern Fatih Dinc, Stanford University Research Associate Adam Shai, and Stanford University Professor Mark Schnitzer. Professor Schnitzer has joint appointments in the Departments of Biology, Applied Physics, and Neurosurgery, is an investigator of the Howard Hughes Medical Institute, and is co-director of the Cracking the Neural Code program at Stanford University.)
Like the other NeurIPS 2023 paper, this one had intriguing implications. A promising way to extract computational principles from large data sets generated from behaving animals – principles that could open up new possibilities for interpretation and control – is to train data-constrained recurrent neural networks (dRNNs). The inefficiency and limited scalability of existing training algorithms for dRNNs, however, make it a challenge to analyze large neural recordings, even in offline scenarios. To address these issues, the authors introduced a training method called Convex Optimization of Recurrent Neural Networks (CORNN).
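To make the idea concrete, here is a minimal sketch of the kind of convex weight fit that such an approach enables, assuming a simple discrete-time rate-RNN model and ridge regularization. The model form, nonlinearity, function names, and hyperparameters below are illustrative assumptions, not the authors’ CORNN implementation.

```python
# Hedged sketch: fitting a data-constrained rate RNN to recorded activity by
# reducing the weight fit to a convex ridge-regression problem that decomposes
# across neurons. Illustrates the general idea of convex RNN training only;
# the dynamics model and parameters are assumptions, not the CORNN code.
import numpy as np

def fit_rnn_weights(rates, alpha=0.1, ridge=1e-3, eps=1e-4):
    """rates: (T, N) recorded firing rates, assumed scaled to (0, 1).
    Assumed model: r[t+1] = (1 - alpha) * r[t] + alpha * sigmoid(W @ r[t]).
    Returns W (N, N) from a per-neuron convex least-squares fit."""
    r_now, r_next = rates[:-1], rates[1:]                    # (T-1, N)
    # Invert the update rule to get each neuron's target pre-activation.
    phi = np.clip((r_next - (1 - alpha) * r_now) / alpha, eps, 1 - eps)
    z = np.log(phi / (1 - phi))                              # logit = sigmoid^-1
    # Ridge regression: solve (X^T X + lambda I) W^T = X^T Z, with X = r_now.
    X, n_neurons = r_now, rates.shape[1]
    W_T = np.linalg.solve(X.T @ X + ridge * np.eye(n_neurons), X.T @ z)
    return W_T.T

# Usage on toy data (random surrogate standing in for recorded activity).
rng = np.random.default_rng(0)
toy_rates = rng.uniform(0.05, 0.95, size=(500, 20))
W = fit_rnn_weights(toy_rates)
print(W.shape)  # (20, 20)
```

Because the fit reduces to independent convex problems (one row of W per recorded neuron), it avoids backpropagation through time and scales far better with recording size, which is the property the speed comparison below refers to.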
The results were impressive. In studies of simulated recordings – in effect, digital twins – CORNN attained training speeds approximately 100 times faster than traditional optimization approaches, while maintaining or enhancing modeling accuracy. The potential applications are also compelling. “This paper is just about the computational work,” Dr. Tanaka said. “But the motivation was to develop machine learning methods…so that in real time you have the digital twin compute how to intervene on the system, and then give feedback to the real brain. That’s [the] longer-term goal.”