Confignet
Confignet is a lightweight, pluggable configuration file classifier built in Rust. It’s designed to identify CI/CD-related configuration files from a given project using a fast, Levenshtein-distance-based matching system over a CSV training set.
Built for integration into larger systems like dodo, Confignet allows intelligent automation pipelines to skip irrelevant files and focus only on what matters: CI/CD infrastructure.
- 🧠 Zero-network AI
- ⚡ Fast, accurate lookup
- 🧩 Simple CSV-based extensibility
- 📦 Available as a library or a CLI tool
Confignet is ideal for:
- Classifying files detected by file-type scanners (e.g. Magika)
- Filtering config files before parsing them
- Auto-generating structured project pipelines
Getting Started
Installation
You can add Confignet to your project by adding this line to your Cargo.toml:
confignet = "0.1"
Or install the CLI tool locally:
cargo install --path .
CLI Usage
confignet <file_path> <mime_type>
Example:
confignet ./Cargo.toml toml
This will output:
{
"file_name": "Cargo.toml",
"file_path": "./Cargo.toml",
"is_ci_cd": true
}
How It Works
Confignet is powered by a simple but effective heuristic system:
- The ConfigClassifier is built from a CSV of known config files with associated MIME types and their labels (e.g., ci_cd or non_config).
- When a file is passed to Confignet:
  - It extracts the filename from the full path.
  - It compares it against the CSV using Levenshtein distance on MIME-matched entries.
- If a best match is found, the classifier returns:
  - file_name: matched entry name from CSV
  - file_path: reconstructed absolute or relative path
  - is_ci_cd: boolean indicating whether the file is related to CI/CD
It is designed for speed, accuracy, and pluggability in environments like local inference pipelines.
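The steps above can be sketched with the standard library alone. This is a minimal illustration, not Confignet's actual source: the Entry struct, the classify_sketch function, and the absence of any distance cutoff are assumptions made for the example, and the levenshtein helper is a textbook two-row implementation.

```rust
use std::path::Path;

/// Hypothetical in-memory row mirroring the documented CSV columns.
struct Entry {
    file_name: &'static str,
    mime_label: &'static str,
    config_type: &'static str,
}

/// Textbook two-row dynamic-programming Levenshtein distance over chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let cost = if ca == *cb { 0 } else { 1 };
            let best = (prev[j] + cost) // substitution (or match)
                .min(prev[j + 1] + 1)   // deletion
                .min(curr[j] + 1);      // insertion
            curr.push(best);
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// Assumed flow: extract the file name, keep MIME-matched entries,
/// then pick the entry with the smallest edit distance.
fn classify_sketch(entries: &[Entry], path: &str, mime: &str) -> Option<(String, bool)> {
    let name = Path::new(path).file_name()?.to_str()?;
    entries
        .iter()
        .filter(|e| e.mime_label == mime)
        .min_by_key(|e| levenshtein(name, e.file_name))
        .map(|e| (e.file_name.to_string(), e.config_type == "ci_cd"))
}
```

Because the sketch applies no distance threshold, any file whose MIME label appears in the dataset gets some match; a real classifier may well reject matches that are too far away.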
Integration in Projects
Confignet is designed to be embedded easily.
As a Library
Import it in your Rust project:
use confignet::ConfigClassifier;

// Run this inside a function that returns a Result, since from_csv uses `?`.
let classifier = ConfigClassifier::from_csv("data/labeled/ci_cd.csv")?;
let result = classifier.classify("Cargo.toml", "toml");
As a CLI in Automation Pipelines
Use Magika (or similar tool) to detect file types:
magika path/to/file | jq '.mimetype'
Then pass the result to Confignet:
confignet path/to/file toml
Pipe JSON output to your parser or decision logic.
In dodo
Confignet is integrated directly into dodo to:
- Skip non-CI/CD files
- Send CI/CD-related configs to parsers
- Build dodo.toml incrementally
Classifier Format
The classifier CSV should follow this format:
file_name,mime_label,config_type
Cargo.toml,toml,ci_cd
.github/workflows/ci.yml,yaml,ci_cd
.dockerignore,text,non_config
- file_name: file to match against
- mime_label: MIME label from a scanner
- config_type: either ci_cd or non_config
Tips:
- Avoid duplicate file names unless necessary
- Normalize paths (e.g. .github/workflows/*.yml)
- Keep MIME labels lowercase and simplified
The CSV is extensible. The more diverse your dataset, the more robust your classification becomes.
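To check a dataset against this format before loading it, a small validator along these lines may help. It is a sketch under stated assumptions: Row, parse_row, and parse_csv are hypothetical names, fields are assumed to contain no quotes or embedded commas, and lowercasing the MIME label simply follows the tip above rather than anything Confignet is known to do.

```rust
/// Hypothetical row type mirroring the three documented columns.
#[derive(Debug, PartialEq)]
struct Row {
    file_name: String,
    mime_label: String,
    config_type: String,
}

/// Parses one data line, returning an error message for malformed rows.
/// Assumes plain comma-separated fields without quoting.
fn parse_row(line: &str) -> Result<Row, String> {
    let fields: Vec<&str> = line.split(',').map(str::trim).collect();
    if fields.len() != 3 {
        return Err(format!("expected 3 columns, found {}: {}", fields.len(), line));
    }
    if fields.iter().any(|f| f.is_empty()) {
        return Err(format!("empty field in row: {}", line));
    }
    Ok(Row {
        file_name: fields[0].to_string(),
        mime_label: fields[1].to_lowercase(), // keep MIME labels lowercase
        config_type: fields[2].to_string(),
    })
}

/// Parses a whole CSV text, skipping the header line and blank lines.
/// Stops at the first malformed row and returns its error.
fn parse_csv(text: &str) -> Result<Vec<Row>, String> {
    text.lines()
        .skip(1)
        .filter(|l| !l.trim().is_empty())
        .map(parse_row)
        .collect()
}
```

Calling parse_csv on the contents of the file (for example via std::fs::read_to_string) yields either every row or the first offending line, which makes format mistakes easier to locate.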
API Reference
This page documents the public API of the Confignet library. If you are embedding Confignet into another tool (like dodo), you’ll primarily interact with the ConfigClassifier type.
Structs
ConfigRecord
A deserialized record from the classifier CSV.
pub struct ConfigRecord {
    pub file_name: String,
    pub mime_label: String,
    pub config_type: String,
}
Fields:
- file_name: The canonical file name for comparison (e.g. Cargo.toml)
- mime_label: The MIME-type label assigned to the file (e.g. toml, yaml)
- config_type: A type such as ci_cd, build, or non_config
This struct is used internally by the classifier.
ConfigClassifier
The main classifier struct that loads and queries classification rules.
pub struct ConfigClassifier {
    // Hidden internals
}
Constructor
pub fn from_csv<P: AsRef<Path>>(path: P) -> Result<Self>
Loads a ConfigClassifier from a given CSV file.
- path: The path to the .csv file
- Returns: Result<ConfigClassifier>
Usage:
let classifier = ConfigClassifier::from_csv("data/labeled/ci_cd.csv")?;
Method
pub fn classify(&self, file_name: &str, mime_label: &str) -> Option<ClassifiedResult>
Attempts to classify a file given its name and MIME type.
- file_name: Name of the file (e.g., main.rs, Dockerfile)
- mime_label: MIME type label from tools like Magika (e.g., toml, json)
- Returns: Option<ClassifiedResult>, or None if no suitable match is found
Example:
let result = classifier.classify("Cargo.toml", "toml");
Structs
ClassifiedResult
Returned from classify() if a match is found.
pub struct ClassifiedResult {
    pub file_name: String,
    pub is_ci_cd: bool,
}
Fields:
- file_name: The best-matching canonical file name (e.g., from CSV)
- is_ci_cd: Whether this file is used for CI/CD, based on config_type
Internal Utilities
Confignet also includes a Levenshtein distance utility for fuzzy file matching:
fn levenshtein(a: &str, b: &str) -> usize
This is used internally in classify() to find the closest filename match in the dataset when multiple candidates exist with the same MIME type.
Example Integration
use confignet::{ConfigClassifier, ClassifiedResult};

// Run this inside a function that returns a Result, since from_csv uses `?`.
let classifier = ConfigClassifier::from_csv("data/labeled/ci_cd.csv")?;
let result = classifier.classify("Dockerfile.ci", "text");

match result {
    Some(r) => println!("File: {}, Is CI/CD? {}", r.file_name, r.is_ci_cd),
    None => println!("Unrecognized file"),
}
Troubleshooting
❌ Error: No match found
- Ensure the MIME type is correct
- Add more diverse entries to the CSV
- Normalize file names
❌ Panic: Failed to extract file name
- Ensure you are passing valid paths
- Use PathBuf methods to extract names reliably
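One way to avoid that panic is to treat a missing file name as an ordinary None instead of unwrapping. The helper below is a hypothetical sketch (file_name_of is not part of Confignet) built on std::path::Path::file_name, which returns None for paths with no final component, such as ".." or "/".

```rust
use std::path::Path;

/// Returns the final path component as UTF-8 text, or None when the path
/// has no file name (e.g. "..", "/", "") or the name is not valid UTF-8.
fn file_name_of(path: &str) -> Option<&str> {
    Path::new(path).file_name()?.to_str()
}
```

The caller can then skip or report such paths before passing anything to the classifier.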
❌ Invalid CSV format
- Check for unescaped commas or quotes
- All rows must follow file_name,mime_label,config_type
❌ All results return is_ci_cd: false
- Check your config_type column values
- Add more known CI/CD examples to improve accuracy
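A quick tally of the config_type column can reveal misspelled or inconsistently cased labels. The sketch below is illustrative only: count_config_types is a hypothetical helper, it assumes plain comma-separated rows with a header line, and the idea that a label such as CI_CD would not be treated as ci_cd is an assumption about how the classifier compares values.

```rust
use std::collections::BTreeMap;

/// Counts how often each config_type value (third column) appears,
/// skipping the header row. Assumes no quoted or comma-containing fields.
fn count_config_types(csv_text: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for line in csv_text.lines().skip(1) {
        // Rows with fewer than three columns are ignored here.
        if let Some(label) = line.split(',').nth(2) {
            *counts.entry(label.trim().to_string()).or_insert(0) += 1;
        }
    }
    counts
}
```

If the tally shows unexpected keys alongside ci_cd, normalizing those rows is a reasonable first step.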
✅ Tip
Use tools like magika, file, or xdg-mime to generate MIME labels.