Dingo
by DataEval
Dingo is a data quality evaluation tool that helps you automatically detect data quality issues in your datasets. It provides a variety of built-in rules and model evaluation methods, and also supports custom evaluation methods.
What is Dingo?
Dingo is a data quality evaluation tool that automatically detects quality issues in datasets. It supports commonly used text and multimodal datasets, including pre-training, fine-tuning, and evaluation datasets.
How to use Dingo?
Dingo can be used through a local CLI or an SDK, which makes it easy to integrate into evaluation platforms. It supports multiple data formats (plaintext, JSON, JSONL) and data sources (local files, Hugging Face datasets, S3 storage). You can evaluate with built-in rules or LLM-based prompts, or create custom rules and models. After evaluation, Dingo generates a summary report, a detailed report, and a frontend page for visualization.
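To make the idea of rule-based evaluation over JSONL data concrete, here is a minimal sketch in plain Python. It does not use Dingo's SDK: the rule names (empty_content, repeated_lines), the "content" field, and the evaluate_jsonl helper are all hypothetical and chosen only for illustration. It shows the general shape of the workflow described above: read records, apply rules, and produce a summary.

```python
import json
from collections import Counter
from typing import Callable, Dict, Iterable

# Hypothetical rule type: a rule returns True when the text HAS the problem.
Rule = Callable[[str], bool]


def is_empty(text: str) -> bool:
    """Flag records whose text is empty or whitespace only."""
    return not text.strip()


def has_repeated_lines(text: str) -> bool:
    """Flag records in which the same non-blank line appears more than once."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return len(lines) != len(set(lines))


# Illustrative rule set; these names are assumptions, not Dingo's built-in rules.
RULES: Dict[str, Rule] = {
    "empty_content": is_empty,
    "repeated_lines": has_repeated_lines,
}


def evaluate_jsonl(lines: Iterable[str], field: str = "content") -> dict:
    """Apply every rule to each JSONL record and return a summary dict."""
    total = 0
    bad = 0
    hits: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between records
        record = json.loads(raw)
        text = str(record.get(field, ""))
        total += 1
        failed = [name for name, rule in RULES.items() if rule(text)]
        hits.update(failed)
        if failed:
            bad += 1
    score = (total - bad) / total if total else 0.0
    return {"total": total, "bad": bad, "score": score, "hits": dict(hits)}


# Three in-memory JSONL records stand in for a local file.
sample = [
    '{"content": "A clean sentence."}',
    '{"content": ""}',
    '{"content": "same line\\nsame line"}',
]
summary = evaluate_jsonl(sample)
print(summary)
```

With a real file, the same function could be called as evaluate_jsonl(open(path, encoding="utf-8")). A real evaluation would also keep the failing records for a detailed report rather than only counting them.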
Key features of Dingo
Multi-source & Multi-modal Support
Rule-based & Model-based Evaluation
Flexible Usage (CLI, SDK, Local & Spark)
Comprehensive Reporting (7-dimensional quality assessment)
Customizable Rules, Prompts and Models
Use cases of Dingo
Evaluating pre-training datasets
Evaluating fine-tuning datasets
Evaluating evaluation datasets
Detecting data quality issues in text datasets
Detecting data quality issues in multimodal datasets
FAQ from Dingo
What types of data sources does Dingo support?
Dingo supports local files, Hugging Face datasets, and S3 storage.
What data modalities does Dingo support?
Dingo supports text and image data modalities.
Can I create custom rules and models in Dingo?
Yes, Dingo allows you to create custom rules, prompts, and models to meet your specific evaluation needs.
How can I visualize the evaluation results?
Dingo generates a frontend page for visualization after evaluation, which can be manually started using the command: python -m dingo.run.vsl --input output_directory
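If the visualization step needs to be triggered from a script rather than typed by hand, the command quoted above can be assembled as an argument list. The sketch below only builds and prints the command. The helper name build_visualization_command is hypothetical, and "output_directory" is the same placeholder used in the answer above, not a real path.

```python
import sys
from typing import List


def build_visualization_command(output_directory: str) -> List[str]:
    """Return the argv list for the visualization command quoted in this FAQ.

    sys.executable is used so the module runs under the current interpreter.
    """
    return [sys.executable, "-m", "dingo.run.vsl", "--input", output_directory]


command = build_visualization_command("output_directory")
print(" ".join(command))

# To actually launch the page, pass the list to subprocess.run(command, check=True).
# That step assumes Dingo is installed and the directory holds evaluation output.
```

Passing the arguments as a list avoids shell quoting problems when the output directory contains spaces.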
What are the different rule groups available in Dingo?
Dingo provides pre-configured rule groups for different types of datasets: default, sft, and pretrain.