Lev McKinney

# Hi, I'm Lev!

<span class="rightimg"><span class="tinyimg"> ![[headshot.jpg]] </span></span>

I'm a graduate student at the University of Toronto focusing on AI safety, supervised by [Sheila McIlraith](https://www.cs.toronto.edu/~sheila/) and [Roger Grosse](https://www.cs.toronto.edu/~rgrosse/). Currently, I'm working on applying training data attribution techniques, like influence functions, to understand processes like out-of-context reasoning in large language models. I've previously done research on unlearning in large language models, understanding transformer predictions at FAR AI, reward learning at the [Center for Human Compatible Artificial Intelligence (CHAI)](https://humancompatible.ai/), and model-based reinforcement learning (MBRL) with Keiran Paster here at U of T.

---

> [!example] Papers & Preprints
>
> * **Lev E McKinney**, Anvith Thudi, Juhan Bae, Tara Rezaei Kheirkhah, Nicolas Papernot, Sheila A McIlraith, and Roger Baker Grosse. Gauss-Newton Unlearning for the LLM Era. In _ICML 2025 Workshop on Machine Unlearning for Generative AI_, 2025. URL https://openreview.net/forum?id=VFfttnDvW6
> * Nora Belrose, Zach Furman, **Lev E McKinney**, Logan Smith, Danny Halawi, Igor Ostrovsky, Stella Biderman, and Jacob Steinhardt. Eliciting latent predictions from transformers with the tuned lens. arXiv preprint arXiv:2303.08112, 2023. URL https://arxiv.org/pdf/2303.08112.pdf
> * **Lev E McKinney**\*, Yawen Duan\*, David Krueger, and Adam Gleave. On the fragility of learned reward functions. In _Deep Reinforcement Learning Workshop NeurIPS 2022_, 2022. URL https://arxiv.org/abs/2301.03652
> * Keiran Paster\*, **Lev E McKinney**\*, Sheila A. McIlraith, and Jimmy Ba. BLAST: Latent dynamics models from bootstrapping. In _Deep RL Workshop NeurIPS 2021_, 2021. URL https://openreview.net/forum?id=VwA_hKnX_kR

> [!info]- Contact Info
> levmckinney [at] cs [dot] toronto [dot] edu