
Prof. Melanie Mitchell 2.0 - AI Benchmarks are Broken!
Machine Learning Street Talk (MLST), Sun Sep 10 2023
AI Understanding:
- Melanie Mitchell argues that AI systems should be evaluated with rigorous, granular tests of abstract generalization, and that "understanding" in AI is a multidimensional, ill-defined concept rather than a single property.
- Large language models have sparked debate over whether they genuinely understand language and the world, given that their results rival human performance across diverse benchmarks.
Benchmarks and Evaluation:
- Typical benchmarks report only aggregate performance, which obscures failure modes and masks underlying mechanisms; AI research needs more focus on proper experimental methods.
- Developmental psychology offers examples of rigorous cognitive testing that AI research can borrow, and benchmarks need to evolve as capabilities improve.
Intelligence Assessment:
- Intelligence is not a unified notion but a multidimensional one, so any assessment has to specify which capability it is measuring.
- Distinguishing machine-induced prior knowledge from actual machine learning or human expertise remains a challenge in benchmarking large language models.
Intelligence and Computation:
- Intelligence is not easily abstracted: it is specific to particular domains, situated, and tied to the environment.
- The brain does computations but in a highly evolved, domain-specific manner that may not make sense without the rest of the organism.
Benchmarking AI Systems:
- Benchmarking for intelligence should evolve as capabilities improve, with a focus on proper experimental methods from cognitive science.
- Reporting instance-level failures rather than just aggregate accuracy can provide insights into machine learning systems' real capabilities.
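
The point about instance-level reporting can be illustrated with a minimal Python sketch. It is not taken from the episode: the record fields (`id`, `category`, `correct`), the category names, and the results are all made up for illustration. It shows how a single aggregate score can hide a category in which every item fails.

```python
from collections import defaultdict


def instance_level_report(results):
    """Summarize results per category and keep the ids of failed instances.

    `results` is a list of dicts with hypothetical keys:
    'id' (str), 'category' (str), 'correct' (bool).
    """
    total = len(results)
    aggregate = sum(1 for r in results if r["correct"]) / total if total else 0.0

    buckets = defaultdict(lambda: {"n": 0, "n_correct": 0, "failed_ids": []})
    for r in results:
        bucket = buckets[r["category"]]
        bucket["n"] += 1
        if r["correct"]:
            bucket["n_correct"] += 1
        else:
            # Keep the failing instance itself, not just a count.
            bucket["failed_ids"].append(r["id"])

    per_category = {
        cat: {"accuracy": b["n_correct"] / b["n"], "failed_ids": b["failed_ids"]}
        for cat, b in buckets.items()
    }
    return {"aggregate_accuracy": aggregate, "per_category": per_category}


# Made-up results: 8 familiar items answered correctly, 2 novel variations missed.
results = [
    {"id": f"seen-{i}", "category": "seen_pattern", "correct": True}
    for i in range(8)
] + [
    {"id": f"novel-{i}", "category": "novel_variation", "correct": False}
    for i in range(2)
]

report = instance_level_report(results)
print(report["aggregate_accuracy"])
for category, stats in report["per_category"].items():
    print(category, stats["accuracy"], stats["failed_ids"])
```

In this invented data the aggregate accuracy looks strong, while the per-category view shows that every novel variation fails, which is the kind of detail an aggregate number leaves out.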
Complexity Theory and Scaling Laws:
- Complexity theory explores scaling laws, focusing on what happens to a system as it increases in size or population.
- Work on scaling extends to cities, measuring phenomena like energy usage, innovation rates, and happiness levels to understand how social systems function.
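
As a rough illustration of what a scaling law looks like in practice, the sketch below fits a power law Y = c * N^beta by linear regression in log-log space. The power-law form is an assumption here (the summary does not state a formula), the function name `fit_power_law` is hypothetical, and the data are synthetic, generated from a known exponent rather than measured from any real city.

```python
import math


def fit_power_law(sizes, values):
    """Least-squares fit of log(Y) = log(c) + beta * log(N).

    Returns (c, beta). Assumes all sizes and values are positive.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(y) for y in values]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                       # slope in log-log space = exponent
    c = math.exp(mean_y - beta * mean_x)   # intercept in log-log space = log(c)
    return c, beta


# Synthetic "population" sizes and a quantity generated from Y = 2 * N**1.15.
sizes = [1e4, 1e5, 1e6, 1e7]
values = [2.0 * n ** 1.15 for n in sizes]

c, beta = fit_power_law(sizes, values)
print(f"c = {c:.3f}, beta = {beta:.3f}")
```

An exponent above 1 means the quantity grows faster than the system's size, and an exponent below 1 means it grows more slowly. With real, noisy measurements the fit would only be approximate.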
Understanding Intelligence:
- Collective intelligence plays a significant role in human understanding, with much individual intelligence grounded in collective intelligence.
- There is interest in exploring how intelligence scales from both individual and collective perspectives.