Evaluation

Welcome to the LangSmith Evaluation documentation. The following sections help you create datasets, run evaluations, and analyze results:

Datasets: Create datasets for evaluation through the UI or SDK, and manage existing datasets.
Evaluations: Run evaluations on your applications, choosing among the available evaluator types and evaluation techniques.
Analyze experiment results: View and analyze your evaluation results, including comparing experiments, filtering results, and downloading data.
Annotation & human feedback: Collect human feedback on your application outputs through annotation queues and inline annotation.
Tutorials: Follow step-by-step tutorials to evaluate different types of applications, from chatbots to complex agents.

For terminology definitions and core concepts, refer to the introduction on evaluation.