
NeurIPS 2023 Recap — Best Papers
Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0 · Sat Dec 23 2023
NeurIPS 2023 Recap:
- The NeurIPS 2023 conference in New Orleans featured a total of 3,586 papers, making comprehensive coverage challenging at that scale. The hosts ask for audience feedback through a listener survey to improve the show and shape future content.
- The coverage tries a new experimental format, jumping from paper to paper, person to person, and founder to founder for insights.
Best Paper Awards - Test of Time Award:
- Jeff Dean and Greg Corrado accepted the Test of Time Award for the 2013 word2vec paper, revisiting its exploration of loss functions and optimized word-embedding representations in the skip-gram model. They emphasized the power of semi-supervised objectives and fast, parallel, weakly synchronized computation in natural language understanding.
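The skip-gram objective with negative sampling can be sketched in a few lines. This is a minimal illustration of the loss for a single (center, context) pair with random toy embeddings, not trained vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 50, 8

# Two embedding tables, as in the skip-gram model: one for center
# words, one for context words.
W_in = rng.normal(scale=0.1, size=(vocab, dim))
W_out = rng.normal(scale=0.1, size=(vocab, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center, context, negatives):
    """Negative-sampling loss for one (center, context) pair.

    Maximizes the score of the true pair and minimizes the score
    of k randomly drawn negative words.
    """
    v = W_in[center]
    pos = -np.log(sigmoid(W_out[context] @ v))
    neg = -np.log(sigmoid(-W_out[negatives] @ v)).sum()
    return pos + neg

loss = sgns_loss(center=3, context=7, negatives=rng.integers(0, vocab, size=5))
print(float(loss))
```

Negative sampling is what made the objective cheap enough for the fast, parallel training the authors highlighted: each update touches only the sampled words, not the full vocabulary softmax.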
Emergent Abilities Mirage Paper:
- The paper by Rylan Schaeffer et al., "Are Emergent Abilities of Large Language Models a Mirage?", argues that emergent abilities observed in large language models under certain metrics may be artifacts of the evaluation methodology rather than absolute phenomena.
- The study proposes an alternative hypothesis for emergent abilities, highlighting the interplay between known scaling properties, evaluation data quality, specific metrics used in evaluations, and statistics as crucial factors in predicting changes in model capabilities with increasing scale.
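A toy calculation shows the core argument: a smooth improvement in per-token accuracy looks like a sudden jump under an all-or-nothing exact-match metric. The 10-token answer length and the accuracy values below are illustrative assumptions, not numbers from the paper:

```python
import numpy as np

# Per-token accuracy improves smoothly with (log) model scale.
scale = np.linspace(0, 1, 6)
per_token_acc = 0.5 + 0.49 * scale  # smooth, linear improvement

# A task requiring all 10 answer tokens to be correct: exact-match
# accuracy is per-token accuracy to the 10th power, which stays near
# zero and then shoots up -- looking "emergent" under this metric
# even though the underlying capability improved smoothly.
exact_match = per_token_acc ** 10

print(np.round(per_token_acc, 2))
print(np.round(exact_match, 3))
```

Under the smooth per-token metric there is no discontinuity; only the nonlinear metric manufactures one.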
Direct Preference Optimization (DPO):
- DPO is a simpler and computationally cheaper alternative to Proximal Policy Optimization (PPO) for training large language models.
- It removes the need for an explicit reward model and a separate RL optimization stage: fitting a simple classification loss on preference data directly yields the policy that is optimal under the standard RLHF objective. This approach is stable and offers significant efficiency gains over PPO.
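The DPO loss for a single preference pair can be sketched as below; the log-probabilities and the beta value are illustrative numbers, not values from the paper:

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    The implicit reward of a response is beta times its log-probability
    ratio against the frozen reference model; the loss is a logistic
    loss pushing the chosen response's implicit reward above the
    rejected one's.
    """
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)

# If the policy already prefers the chosen response more than the
# reference does, the loss falls below log 2; otherwise it is above.
low = dpo_loss(-10.0, -12.0, -11.0, -11.0)
high = dpo_loss(-12.0, -10.0, -11.0, -11.0)
print(low < np.log(2) < high)  # True
```

Note there is no sampling, reward network, or value function anywhere: the whole pipeline is a supervised loss over logged preference pairs, which is where the efficiency gain over PPO comes from.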
Scaling Data-Constrained Language Models:
- The premise focuses on data constraints in pre-training large language models due to the exhaustion of high-quality language data sources such as papers and books.
- Experimentation shows that repeating data for a few epochs during training yields performance close to training on the same amount of unique data, suggesting room to keep scaling within existing data constraints.
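The diminishing value of repeated epochs can be illustrated with a toy effective-data model. The exponential-decay form and the `half_value_epochs` constant here are assumptions for illustration, not the paper's fitted scaling law:

```python
import numpy as np

def effective_tokens(unique_tokens, epochs, half_value_epochs=15.0):
    """Illustrative effective-data model: each additional epoch over the
    same unique tokens contributes less than the last, with the value of
    repeats decaying exponentially. Constants are illustrative
    assumptions, not fitted values from the paper.
    """
    repeats = epochs - 1
    extra = unique_tokens * half_value_epochs * (1 - np.exp(-repeats / half_value_epochs))
    return unique_tokens + extra

unique = 100e9  # hypothetical pool of 100B unique tokens
one = effective_tokens(unique, epochs=1)
four = effective_tokens(unique, epochs=4)
forty = effective_tokens(unique, epochs=40)

# Fraction of "full value" retained per token seen:
print(four / (4 * unique))    # high: a few repeats are nearly as good as new data
print(forty / (40 * unique))  # low: returns diminish with many repeats
```

This captures the qualitative finding: light repetition is almost free, while heavy repetition wastes most of the extra compute.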
QLoRA: Efficient Finetuning of Quantized LLMs:
- QLoRA enables fine-tuning large language models with significantly reduced memory requirements, making it accessible to far more researchers.
- It introduces the 4-bit NormalFloat (NF4) data type, which preserves 16-bit fine-tuning performance despite compressing the network weights to 4 bits.
- The main challenge is to preserve performance while achieving 4-bit compression.
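The quantile-quantization idea behind NF4 can be sketched as follows. This simplified version builds a 16-level code book from the empirical quantiles of the weights themselves; real NF4 instead uses fixed standard-normal quantiles with per-block absmax scaling:

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretrained weights are roughly zero-centered and normally
# distributed, which is what NF4's quantile-based code book exploits.
weights = rng.normal(scale=0.02, size=10_000).astype(np.float32)

# Simplified quantile quantization to 4 bits (16 levels): the code
# book is the set of quantile midpoints, so each level covers an equal
# share of the weight distribution.
levels = np.quantile(weights, (np.arange(16) + 0.5) / 16)
codes = np.abs(weights[:, None] - levels[None, :]).argmin(axis=1).astype(np.uint8)
dequantized = levels[codes]

# 4x smaller storage (4 bits vs 16), at a modest reconstruction error.
rel_err = np.abs(dequantized - weights).mean() / np.abs(weights).mean()
print(round(float(rel_err), 3))
```

Equal-mass levels put quantization precision where the weight mass is, which is why this beats a uniform 4-bit grid on bell-shaped weight distributions.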
DataComp Benchmarking Effort:
- DataComp is a benchmarking effort focused on multimodal datasets, providing thorough evaluations across a range of downstream tasks.
- Participants can either apply data-selection methods to a fixed provided pool or bring their own data.
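A filter-the-fixed-pool entry reduces to ranking candidates by some quality score and keeping the top fraction. The scores and the 30% fraction below are hypothetical placeholders for a real scoring model such as CLIP image-text similarity:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical candidate pool: each image-text pair carries a quality
# score (stand-in for e.g. CLIP similarity between image and caption).
pool = [{"id": i, "score": float(s)} for i, s in enumerate(rng.random(1000))]

def select_top_fraction(pool, fraction=0.3):
    """Keep the highest-scoring fraction of the candidate pool."""
    ranked = sorted(pool, key=lambda ex: ex["score"], reverse=True)
    return ranked[: int(len(ranked) * fraction)]

subset = select_top_fraction(pool, fraction=0.3)
print(len(subset))  # 300
```

The benchmark then holds the training recipe fixed and trains on `subset`, so entries differ only in how the data was chosen.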
LLaVA Model Overview:
- LLaVA is an open-source visual instruction tuning model that enables reasoning about the visual world in natural language.
Training the LLaVA Model:
- LLaVA's two-stage training pipeline pre-trains the projector for feature alignment in stage one, then performs end-to-end visual instruction tuning on generated instruction data in stage two.
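The stage-one projector amounts to a learned map from vision-encoder features into the language model's embedding space. A sketch with hypothetical dimensions and random placeholder weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a CLIP-style vision encoder emitting 1024-d
# patch features, and a language model with a 4096-d embedding space.
num_patches, vision_dim, lm_dim = 256, 1024, 4096

# Stage one trains only this projection so that image features land in
# the LM's token-embedding space; stage two fine-tunes end to end.
# (Weights here are random placeholders, not trained parameters.)
W_proj = rng.normal(scale=0.01, size=(vision_dim, lm_dim))

image_features = rng.normal(size=(num_patches, vision_dim))
visual_tokens = image_features @ W_proj  # (256, 4096)

# The projected features are concatenated with text-token embeddings
# and fed to the language model as one sequence.
text_embeddings = rng.normal(size=(20, lm_dim))
lm_input = np.concatenate([visual_tokens, text_embeddings], axis=0)
print(lm_input.shape)  # (276, 4096)
```

Because only the projection is trained in stage one, alignment is cheap: the vision encoder and the language model both stay frozen.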
Importance of Vision Encoder:
- The significance of the vision encoder lies in its ability to understand visual attributes and content, allowing effective integration into the language decoder during training and inference stages.
Tree of Thoughts - Deliberate Problem Solving with Large Language Models:
- The Tree of Thoughts proposes a method that combines language models and search algorithms for deliberate reasoning across diverse tasks.
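A minimal breadth-first variant of the search can be sketched with stubbed-out LM calls; `propose` and `score` below are toy stand-ins for the model's thought generator and value function, not the paper's prompts:

```python
# Tree-of-Thoughts BFS sketch: a (stubbed) language model proposes
# candidate next "thoughts", a value function scores each partial
# solution, and only the best b candidates survive each level.
def propose(state):
    # Stand-in for sampling candidate next steps from an LM.
    return [state + [d] for d in range(3)]

def score(state):
    # Stand-in for an LM-based value estimate of a partial solution.
    return sum(state)

def tree_of_thoughts_bfs(depth=3, beam_width=2):
    frontier = [[]]  # start from the empty chain of thoughts
    for _ in range(depth):
        candidates = [c for state in frontier for c in propose(state)]
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]  # keep the b best states
    return max(frontier, key=score)

best = tree_of_thoughts_bfs()
print(best)  # [2, 2, 2] with these toy propose/score functions
```

Unlike plain chain-of-thought, the model explores several partial solutions in parallel and can abandon weak branches, which is what "deliberate" problem solving refers to.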
Toolformer: Language Models Can Teach Themselves to Use Tools:
- Toolformer explores training language models to use external tools by augmenting natural language text with API or tool calls.
Evaluation Protocol for Cognitive Capacities:
- CogEval introduces a systematic protocol for evaluating cognitive capacities by operationalizing latent abilities across multiple tasks, structures, domains, and task conditions.
Mamba Model & Signal Processing-Based Models:
- The Mamba model was introduced as a novel approach to the computational inefficiency that deep learning models, notably attention-based ones, exhibit on long sequences.
- Signal-processing-based models were presented as an alternative, replacing attention with sequence-mixing blocks built on signal-processing ideas.