Dingo

by DataEval

Dingo is a data quality evaluation tool that helps you automatically detect data quality issues in your datasets. It provides a variety of built-in rules and model evaluation methods, and also supports custom evaluation methods.

What is Dingo?

Dingo is a comprehensive data quality evaluation tool designed to automatically detect data quality issues in various datasets. It supports commonly used text datasets and multimodal datasets, including pre-training, fine-tuning, and evaluation datasets.

How to use Dingo?

Dingo can be run locally through its CLI or SDK, which makes it easy to integrate into evaluation platforms. It supports multiple data formats (plaintext, JSON, JSONL) and data sources (local files, Hugging Face datasets, S3 storage). You can evaluate with built-in rules or LLM-based prompts, or create custom rules and models. After evaluation, Dingo generates a summary report, a detailed report, and a frontend page for visualizing the results.
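
To make the idea of rule-based evaluation over JSONL data more concrete, here is a minimal sketch in plain Python. It does not use Dingo's SDK: the rule names, the "content" field, the 20-character threshold, and the report layout are all assumptions chosen for illustration, and Dingo's own rules and report format may differ.

```python
import json
from collections import Counter
from typing import Callable, Dict, Iterable


# Hypothetical rules (not Dingo's built-in rules). Each rule receives the text
# of one record and returns True when the record FAILS the check.
def rule_empty_content(text: str) -> bool:
    return not text.strip()


def rule_too_short(text: str, min_chars: int = 20) -> bool:
    # Empty text is left to rule_empty_content so a record is not counted twice.
    return 0 < len(text.strip()) < min_chars


RULES: Dict[str, Callable[[str], bool]] = {
    "empty_content": rule_empty_content,
    "too_short": rule_too_short,
}


def evaluate_jsonl(lines: Iterable[str], field: str = "content") -> dict:
    """Apply every rule to every JSONL record and build a small report."""
    total = 0
    bad = 0
    rule_hits: Counter = Counter()
    details = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        text = str(record.get(field, ""))
        failed = [name for name, rule in RULES.items() if rule(text)]
        total += 1
        if failed:
            bad += 1
            rule_hits.update(failed)
            details.append({"line": line_no, "failed_rules": failed})
    score = (total - bad) / total if total else 0.0
    return {
        "total": total,                 # summary: number of records checked
        "bad": bad,                     # summary: records failing any rule
        "score": score,                 # summary: share of clean records
        "rule_hits": dict(rule_hits),   # summary: failures per rule
        "details": details,             # detail: which lines failed which rules
    }


sample = [
    '{"content": "A reasonably long sentence that passes both checks."}',
    '{"content": ""}',
    '{"content": "too short"}',
]
report = evaluate_jsonl(sample)
print(report["score"], report["rule_hits"])
```

The same function accepts an open file handle (for example `evaluate_jsonl(open("data.jsonl", encoding="utf-8"))`, where `data.jsonl` is a placeholder path), since it only needs an iterable of lines.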

Key features of Dingo

  • Multi-source & Multi-modal Support

  • Rule-based & Model-based Evaluation

  • Flexible Usage (CLI, SDK, Local & Spark)

  • Comprehensive Reporting (7-dimensional quality assessment)

  • Customizable Rules, Prompts and Models

Use cases of Dingo

  • Evaluating pre-training datasets

  • Evaluating fine-tuning datasets

  • Evaluating evaluation datasets

  • Detecting data quality issues in text datasets

  • Detecting data quality issues in multimodal datasets

FAQ from Dingo

What types of data sources does Dingo support?

Dingo supports local files, Hugging Face datasets, and S3 storage.

What data modalities does Dingo support?

Dingo supports text and image data modalities.

Can I create custom rules and models in Dingo?

Yes, Dingo allows you to create custom rules, prompts, and models to meet your specific evaluation needs.

How can I visualize the evaluation results?

Dingo generates a frontend page for visualization after evaluation. To start it manually, run the command python -m dingo.run.vsl --input output_directory, replacing output_directory with the path to the evaluation output directory.
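
If you prefer to start the visualization from a script rather than typing the command, the sketch below wraps it with Python's standard subprocess module. The module path dingo.run.vsl and the --input flag come from the answer above; the helper function names and the example directory "outputs/example_run" are hypothetical, and actually launching the page assumes Dingo is installed in the current Python environment.

```python
import subprocess
import sys
from typing import List


def build_visualization_command(output_directory: str) -> List[str]:
    """Return the argument list for `python -m dingo.run.vsl --input <dir>`."""
    # sys.executable ensures the same interpreter (and environment) is used.
    return [sys.executable, "-m", "dingo.run.vsl", "--input", output_directory]


def launch_visualization(output_directory: str) -> int:
    """Run the command and return its exit code (requires Dingo to be installed)."""
    completed = subprocess.run(build_visualization_command(output_directory), check=False)
    return completed.returncode


# Only print the command here; call launch_visualization(...) to actually run it.
# "outputs/example_run" is a placeholder path, not a real Dingo output directory.
print(" ".join(build_visualization_command("outputs/example_run")))
```

Passing the arguments as a list avoids shell quoting problems when the output directory contains spaces.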

What are the different rule groups available in Dingo?

Dingo provides pre-configured rule groups for different types of datasets: default, sft, and pretrain.