Prioritize Uncertain Examples
When we have a large pool of unlabeled examples that could be used in a prompt, how should we decide which examples to manually label? Active prompting is a method used to identify the most effective examples for human annotation. The process involves four key steps:
- Uncertainty Estimation: Assess the uncertainty of the LLM’s predictions on each possible example
- Selection: Choose the most uncertain examples for human annotation
- Annotation: Have humans label the selected examples
- Inference: Use the newly labeled data to improve the LLM’s performance
Uncertainty Estimation
In this step, we define an unsupervised method to measure the uncertainty of an LLM in answering a given example.
Uncertainty Estimation Example
Let’s say we ask an LLM the following query: query = “Classify the sentiment of this sentence as positive or negative: I am very excited today.” and the LLM returns: response = “positive”. The goal of uncertainty estimation is to answer: how sure is the LLM in this response?
To do this, we query the LLM with the same example k times and then measure how dissimilar the k responses are. Three possible metrics, illustrated in the sketch after this list, are:
- Disagreement: Ratio of unique responses to total responses.
- Entropy: Measurement based on frequency of each response.
- Variance: Calculation of the spread of numerical responses.
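Below is a minimal sketch of the disagreement and entropy metrics. The query_llm function is a hypothetical stand-in for whatever model call you actually use, and the sentiment-classification prompt mirrors the example above; variance would apply analogously when responses are numerical.

```python
import math
import random
from collections import Counter

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; replace with your provider's API."""
    return random.choice(["positive", "negative"])

def disagreement(responses: list[str]) -> float:
    """Ratio of unique responses to total responses."""
    return len(set(responses)) / len(responses)

def entropy(responses: list[str]) -> float:
    """Shannon entropy of the empirical distribution over responses."""
    counts = Counter(responses)
    total = len(responses)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def estimate_uncertainty(example: str, k: int = 5) -> float:
    """Query the same example k times and score how much the answers differ."""
    prompt = f"Classify the sentiment of this sentence as positive or negative: {example}"
    responses = [query_llm(prompt) for _ in range(k)]
    return disagreement(responses)  # or entropy(responses)
```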
Selection & Annotation
Once we have a set of examples and their uncertainties, we can select n of them to be annotated by humans. Here, we choose the examples with the highest uncertainties.
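A small sketch of this selection step, under the same assumptions as above (the example list and uncertainty scores are hypothetical inputs):

```python
def select_for_annotation(examples: list[str], uncertainties: list[float], n: int) -> list[str]:
    """Pick the n examples the LLM is least certain about, for human annotation."""
    ranked = sorted(zip(examples, uncertainties), key=lambda pair: pair[1], reverse=True)
    return [example for example, _ in ranked[:n]]
```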
Inference
Now, each time the LLM is prompted, we can include the newly annotated examples as few-shot demonstrations.
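One way this might look in practice, again as a sketch rather than a prescribed format: the human-annotated (sentence, label) pairs are prepended as few-shot demonstrations before the new query.

```python
def build_prompt(annotated: list[tuple[str, str]], query: str) -> str:
    """Prepend the newly annotated (sentence, label) pairs as few-shot examples."""
    demos = "\n\n".join(
        f"Sentence: {sentence}\nSentiment: {label}" for sentence, label in annotated
    )
    return f"{demos}\n\nSentence: {query}\nSentiment:"

# Example usage (hypothetical labels):
# prompt = build_prompt([("I am very excited today.", "positive")],
#                       "This movie was a waste of time.")
```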
References
1. Active Prompting with Chain-of-Thought for Large Language Models (https://arxiv.org/abs/2302.12246)
2. The Prompt Report: A Systematic Survey of Prompting Techniques (https://arxiv.org/abs/2406.06608)