Prioritize Uncertain Examples

When we have a large pool of unlabeled examples that could be used in a prompt, how should we decide which examples to manually label? Active prompting is a method used to identify the most effective examples for human annotation. The process involves four key steps:
  • Uncertainty Estimation: Assess the uncertainty of the LLM’s predictions on each possible example
  • Selection: Choose the most uncertain examples for human annotation
  • Annotation: Have humans label the selected examples
  • Inference: Use the newly labeled data to improve the LLM’s performance

Uncertainty Estimation

In this step, we define an unsupervised method to measure the uncertainty of an LLM in answering a given example.

Uncertainty Estimation Example

Let’s say we ask an LLM the following query:

query = “Classify the sentiment of this sentence as positive or negative: I am very excited today.”

and the LLM returns:

response = “positive”

The goal of uncertainty estimation is to answer: how sure is the LLM in this response?
To do this, we query the LLM with the same example k times, then measure how dissimilar the k responses are. Three possible metrics are:
  • Disagreement: Ratio of unique responses to total responses.
  • Entropy: Measurement based on frequency of each response.
  • Variance: Calculation of the spread of numerical responses.
Below is an example of uncertainty estimation for a single input example using the disagreement metric; minimal sketches of the entropy and variance metrics follow it.
import instructor
from pydantic import BaseModel
from openai import OpenAI


class Response(BaseModel):
    height: int


client = instructor.from_openai(OpenAI())


def query_llm():
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=Response,
        messages=[
            {
                "role": "user",
                "content": "How tall is the Empire State Building in meters?",
            }
        ],
    )


def calculate_disagreement(responses):
    # Disagreement: ratio of unique responses to total responses
    unique_responses = set(responses)
    return len(unique_responses) / len(responses)


if __name__ == "__main__":
    k = 5
    responses = [query_llm() for _ in range(k)]  # Query the LLM k times
    for response in responses:
        print(response)
        #> height=443
        #> height=443
        #> height=443
        #> height=443
        #> height=381

    print(
        calculate_disagreement([response.height for response in responses])
    )  # Calculate the uncertainty metric
    #> 0.4
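
The entropy and variance metrics can be computed from the same k responses. Below is a minimal sketch using only the standard library; the heights list is the set of responses collected above, and the helper names are illustrative.
import math
from collections import Counter


def calculate_entropy(responses):
    # Frequency-based entropy: higher when the k responses are spread
    # more evenly across distinct answers
    counts = Counter(responses)
    total = len(responses)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def calculate_variance(responses):
    # Spread of numerical responses around their mean
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


heights = [443, 443, 443, 443, 381]  # the k responses collected above
print(calculate_entropy(heights))  # higher value = more uncertain
print(calculate_variance(heights))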
This process will then be repeated for all unlabeled examples.
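
For example, here is a sketch of scoring a whole pool, assuming a parameterized variant of query_llm (hypothetical, since query_llm above hardcodes its question) and reusing calculate_disagreement; the pool contents are illustrative.
def query_llm_with(question: str) -> Response:
    # Hypothetical variant of query_llm that takes the question as a parameter
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=Response,
        messages=[{"role": "user", "content": question}],
    )


def estimate_uncertainty(question: str, k: int = 5) -> float:
    # Query the LLM k times and score how much its answers disagree
    responses = [query_llm_with(question).height for _ in range(k)]
    return calculate_disagreement(responses)


unlabeled_pool = [
    "How tall is the Empire State Building in meters?",
    "How tall is the Eiffel Tower in meters?",
]
uncertainties = {q: estimate_uncertainty(q) for q in unlabeled_pool}  # example -> score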

Selection & Annotation

Once we have a set of examples and their uncertainties, we can select n of them to be annotated by humans. Here, we choose the examples with the highest uncertainties.
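
A minimal sketch, assuming uncertainties maps each unlabeled example to its score as in the previous sketch:
n = 2  # annotation budget
# Pick the n examples the model is least consistent on
to_annotate = sorted(uncertainties, key=uncertainties.get, reverse=True)[:n]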

Inference

Now, each time the LLM is prompted, we can include the newly annotated examples as few-shot demonstrations.
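
Below is a sketch of one possible format, reusing client and Response from the example above and prepending the annotated examples as prior conversation turns; the annotated pair shown is illustrative.
# Human-provided labels for the selected examples (illustrative values)
annotated_examples = [
    ("How tall is the Eiffel Tower in meters?", "330"),
]

few_shot_messages = []
for question, answer in annotated_examples:
    few_shot_messages.append({"role": "user", "content": question})
    few_shot_messages.append({"role": "assistant", "content": answer})

response = client.chat.completions.create(
    model="gpt-4o",
    response_model=Response,
    messages=few_shot_messages
    + [{"role": "user", "content": "How tall is the Empire State Building in meters?"}],
)
print(response.height)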

Uncertainty Estimation in PHP

The same disagreement metric (unique responses / total responses over k queries of the same example) can also be computed with the PHP Instructor library:
<?php
require 'examples/boot.php';

use Cognesy\Instructor\Extras\Scalar\Scalar;
use Cognesy\Instructor\StructuredOutput;

class EstimateUncertainty {
    public function __invoke(int $k = 5) : float {
        $values = [];
        for ($i = 0; $i < $k; $i++) {
            $values[] = $this->queryHeight();
        }
        return $this->disagreement($values);
    }

    private function queryHeight() : int {
        return (new StructuredOutput)->with(
            messages: [['role' => 'user', 'content' => 'How tall is the Empire State Building in meters?']],
            responseModel: Scalar::integer('height'),
        )->get();
    }

    private function disagreement(array $responses) : float {
        $n = count($responses);
        if ($n === 0) return 0.0;
        return count(array_unique($responses)) / $n;
    }
}

$score = (new EstimateUncertainty)(k: 5);
dump($score);
?>

References

  1. Active Prompting with Chain-of-Thought for Large Language Models (https://arxiv.org/abs/2302.12246)
  2. The Prompt Report: A Systematic Survey of Prompting Techniques (https://arxiv.org/abs/2406.06608)