
Rajan Vivek

I'm an AI researcher and engineer at Contextual AI, where we are building enterprise-grade AI with retrieval-augmented foundation models.


Previously, I was an AI researcher and MSCS student at Stanford, advised by Douwe Kiela, Diyi Yang, and Kawin Ethayarajh. I also worked on video foundation models at Scale AI, document understanding models at JP Morgan, and ML for large graphs and scene segmentation at Lockheed Martin. Before that, I studied electrical engineering at Georgia Tech and researched explainable AI with Sonia Chernova.


Some of my interests are deep learning, systems, racquet sports, physics, and psychology.

Publications


  • Anchor Points: Benchmarking Models with Much Fewer Examples

    Rajan Vivek, Kawin Ethayarajh, Diyi Yang, Douwe Kiela
    EACL 2024 Main (Long Paper, Oral)
    [paper] [code]

    We discover that language model predictions are low-rank: correct-class probabilities are strongly correlated across many pairs of examples. We exploit this to benchmark language models (on GLUE and MMLU) with far fewer examples and to infer performance on specific, unseen points.
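The low-rank observation can be made concrete: if correct-class confidences are strongly correlated across examples, a handful of well-chosen examples predicts the rest. A toy greedy selection sketch in NumPy (illustrative only; `select_anchors` is a hypothetical simplification, not the paper's actual algorithm):

```python
import numpy as np

def select_anchors(conf, k):
    """Greedily pick k 'anchor' examples whose correct-class confidence
    patterns best cover all other examples.

    conf: (n_models, n_examples) array of correct-class probabilities.
    Returns the indices of the k selected examples."""
    corr = np.corrcoef(conf.T)                 # example-by-example correlation
    n = corr.shape[0]
    chosen, best_cov = [], np.full(n, -np.inf)
    for _ in range(k):
        # coverage gain if candidate j were added as an anchor
        gains = np.maximum(corr, best_cov).sum(axis=1)
        gains[chosen] = -np.inf                # never re-pick an anchor
        pick = int(np.argmax(gains))
        chosen.append(pick)
        best_cov = np.maximum(best_cov, corr[pick])
    return chosen
```

Each remaining example is then "explained" by its most correlated anchor, which is what lets a few points stand in for the full benchmark.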

  • Explainable Activity Recognition for Smart Home Systems

    Devleena Das, Yasutaka Nishimura, Rajan Vivek, Naoto Takeda, Sean T. Fish, Thomas Ploetz, Sonia Chernova
    ACM Transactions on Interactive Intelligent Systems, Volume 13, Issue 2, 2023
    [paper]

    We present a framework that generalizes leading explainable AI techniques (Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and Anchors) to time series data and generates natural language explanations of human activities. We conduct user studies and more broadly discuss AI-assisted smart home care for the sick and aging.

Projects


  • Test-Time Training for Speaker Adaptation in Automatic Speech Recognition Systems

    Rajan Vivek, Matt Harvill
    CS 224S Spring 2024
    [paper]

    We adapt test-time training with self-supervision (Sun et al., 2019) to the Wav2Vec2 audio foundation model, achieving reliable adaptation to different speakers at test time.

  • Synthetic Data Generation for Few Shot Learning

    Rajan Vivek, Vaishnavi Shrivastava, Ofure Ebhomielen
    CS 330 Fall 2022 (Outstanding Project Award)
    [paper]

    We propose a meta-learning-based synthetic data generation strategy in which image foundation models are optimized to generate or augment training data so that the performance of a downstream classifier improves.

  • Natural Language Generation with Pixels

    Gautam Mittal, Rajan Vivek
    CS 224N Winter 2023
    [paper]

    We investigate non-autoregressive language generation by training a diffusion-based decoder that can generate plausible, coherent text rendered as images. We pair our decoder with a powerful off-the-shelf language encoder for machine translation, and compare against autoregressive transformer baselines.

  • Can BERT Tell Me What GPT-3.5 Will Say? An Analysis of Predictive Correlations across Language Models

    Rajan Vivek
    CS 329D Spring 2023
    [paper]

    We investigate and characterize the phenomenon of strong correlations between the predictions of language models at the data point level. We show that correlations in the predictions of a dozen models from one family (e.g. BERT) can be used to estimate the behavior of models from another family (e.g. GPT-3.5). This work is closely related to our Anchor Points work.

  • Benchmark Distillation: Selecting Representative Evaluation Subsets via Component Relevance

    Rajan Vivek
    CS 399 Winter 2023 (Independent Study with Douwe Kiela)
    [paper]

    We investigate a wide range of embedding and data selection strategies for extracting microsets: small subsets of evaluation benchmarks that reliably rank model performance. We discover that for techniques leveraging the Logarithm of Maximum Evidence (LogME), the microset size is lower-bounded by the dimensionality of the embedding space, which we partially overcome by selecting only the relevant embedding components.

  • Designing a Reliable Crew Member

    Matthew Harvill, Rajan Vivek
    CS 238 Winter 2023
    [paper]

    We demonstrate the effectiveness of a modified Monte Carlo Tree Search with the UCB-1 selection algorithm to play the cooperative trick-taking card game The Crew.
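UCB-1 balances exploiting a child node's average reward against exploring rarely visited children. A minimal sketch of the selection rule (stand-alone toy code, not the project's implementation):

```python
import math

def ucb1(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB-1 score: average reward plus an exploration bonus."""
    if visits == 0:
        return float("inf")          # always expand unvisited children first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits):
    """children: list of (total_value, visit_count) stats.
    Returns the index of the child to descend into."""
    scores = [ucb1(v, n, parent_visits) for v, n in children]
    return scores.index(max(scores))
```

The exploration constant `c` tunes how aggressively the search revisits under-sampled moves; `sqrt(2)` is the standard default.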

  • Benchmark Hill Climbing During Large Model Pretraining: Some Preliminary Investigations

    Rajan Vivek
    CS 399 Fall 2022 (Independent Study with Douwe Kiela)
    [paper]

    We propose a broad framework for rapidly predicting benchmark performance from zero-shot performance on a small number of downstream examples. We investigate several stepping stones towards this vision, including assessing the correlation between zero-shot and fine-tuned performance and measuring data relatedness at both the task and data point level through analysis of training dynamics.
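One stepping stone above, checking whether zero-shot scores rank models the same way fine-tuned scores do, comes down to a rank correlation. A small NumPy sketch (a generic Spearman helper, not code from the project):

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks
    (assumes no ties in either score list)."""
    ra = np.argsort(np.argsort(a)).astype(float)   # ranks of a
    rb = np.argsort(np.argsort(b)).astype(float)   # ranks of b
    return float(np.corrcoef(ra, rb)[0, 1])
```

A value near 1 means zero-shot performance preserves the fine-tuned ranking, which is what would make it a cheap proxy during pretraining.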