Classifier Format

The classifier CSV should follow this format:

file_name,mime_label,config_type
Cargo.toml,toml,ci_cd
.github/workflows/ci.yml,yaml,ci_cd
.dockerignore,text,non_config

Columns:

  • file_name: file name or path to match against
  • mime_label: MIME label reported by a scanner
  • config_type: either ci_cd or non_config
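The format above can be checked mechanically before the CSV is used. The following is a minimal sketch in Python, not part of the tool itself: the function name load_classifier and the constants are hypothetical, and it assumes the header row and the two config_type values are exactly as listed above.

```python
import csv
import io

# Assumed from the format described above; adjust if your CSV differs.
EXPECTED_HEADER = ["file_name", "mime_label", "config_type"]
ALLOWED_CONFIG_TYPES = {"ci_cd", "non_config"}


def load_classifier(csv_text):
    """Parse classifier CSV text into a list of row dicts, validating each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    # Data rows start on line 2, after the header.
    for line_no, row in enumerate(reader, start=2):
        if row["config_type"] not in ALLOWED_CONFIG_TYPES:
            raise ValueError(
                f"line {line_no}: unknown config_type {row['config_type']!r}"
            )
        rows.append(row)
    return rows


SAMPLE = """file_name,mime_label,config_type
Cargo.toml,toml,ci_cd
.github/workflows/ci.yml,yaml,ci_cd
.dockerignore,text,non_config
"""

rows = load_classifier(SAMPLE)
```

Rejecting unknown config_type values early is a design choice in this sketch; a more lenient loader could collect warnings instead of raising.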

Tips:

  • Avoid duplicate file names unless necessary
  • Normalize paths (e.g. .github/workflows/*.yml)
  • Keep MIME labels lowercase and simplified
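The tips above could be enforced with a small lint pass. This is a sketch under assumptions: lint_rows and matches are hypothetical helpers, rows are dicts keyed by the column names, and file_name entries such as .github/workflows/*.yml are assumed to be shell-style glob patterns (the source does not say how the tool interprets them).

```python
import fnmatch
from collections import Counter


def lint_rows(rows):
    """Return warnings for rows that go against the tips above."""
    warnings = []
    # Tip 1: flag file names that appear more than once.
    counts = Counter(row["file_name"] for row in rows)
    for name, n in counts.items():
        if n > 1:
            warnings.append(f"duplicate file_name: {name} ({n} rows)")
    # Tip 3: flag MIME labels that are not lowercase.
    for row in rows:
        label = row["mime_label"]
        if label != label.lower():
            warnings.append(f"mime_label not lowercase: {label}")
    return warnings


def matches(pattern, path):
    """Assumption: treat file_name as a case-sensitive glob.

    Note that in fnmatch, * also matches the '/' separator.
    """
    return fnmatch.fnmatchcase(path, pattern)


example_rows = [
    {"file_name": "Cargo.toml", "mime_label": "TOML", "config_type": "ci_cd"},
    {"file_name": "Cargo.toml", "mime_label": "toml", "config_type": "ci_cd"},
]
example_warnings = lint_rows(example_rows)
```

With a normalized pattern, a single row such as .github/workflows/*.yml can stand in for every workflow file, which also helps avoid near-duplicate entries.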

The CSV is extensible: add a row for each file name or path pattern you want classified. A more diverse set of entries generally makes classification more robust.