- Datasets: Create evaluation datasets through the UI or SDK, and manage existing ones.
- Evaluations: Run evaluations on your applications using different evaluator types and evaluation techniques.
- Analyze experiment results: View and analyze your evaluation results, including comparing experiments, filtering results, and downloading data.
- Annotation & human feedback: Collect human feedback on your application outputs through annotation queues and inline annotation.
- Tutorials: Follow step-by-step tutorials to evaluate different types of applications, from chatbots to complex agents.