
VibeCleaner: A Local-First, Human-in-the-Loop Agent for Natural-Language File Organization

By Dr. Vivek Gupta, Founder & CEO · August 2025

Author: Vivek Gupta (Softsensor.ai)

Keywords: local-first computing, human-in-the-loop, agentic AI, natural-language interfaces, personal information management, safety-critical automation.

Abstract

We present VibeCleaner, an experimental, local-first agent that translates plain-English intents into deterministic, auditable file actions for desktop hygiene. This manuscript integrates the full repo experience—capabilities, guardrails, usage—and, critically, empirical findings from real deployments that surface persistent failure modes in current agentic AI: context loss, attention drift, provider divergence, and archive-explosion cascades. Across two test environments (579→15,323 files), we document what worked (hash-based dedup, date-based rules, batch sizes of 20–30) and what failed (OCR-driven renaming at scale, unzip-then-organize without archive semantics). We provide mitigations (checkpoints, chunked subplans, rule fallbacks), a reproducible YAML policy, and a contributor roadmap.

1. Introduction

Downloads is where heterogeneous artifacts (PDFs, installers, screenshots, exports, archives) accumulate. Rule-based tools are predictable but require users to encode brittle recipes; unconstrained agents are convenient but opaque and risky for local file operations. VibeCleaner bridges the gap: users state outcomes (“organize by type and date; archive >30 days; remove exact duplicates; show me first”) while a deterministic executor performs safe, reversible actions under explicit human approval. AI planning is optional; previews, logs, undo, and whitelists are mandatory.

2. System Overview

Intent → Plan → Guardrails → Execute → Ledger/Undo
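
The five stages above can be illustrated with a minimal Python sketch. This is not code from the VibeCleaner repository: the names Action, Ledger, plan, apply_guardrails, and run are hypothetical, the planner is a toy that only files PDFs, and no file is actually touched. It only shows the shape of the flow, in which a plan is proposed, filtered by guardrails, and recorded in a ledger when it is not a dry run.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Action:
    kind: str   # "move" in this sketch
    src: str
    dst: str


@dataclass
class Ledger:
    entries: list = field(default_factory=list)

    def record(self, action: Action) -> None:
        self.entries.append(action)


def plan(intent: str, files: list) -> list:
    """Toy planner (assumption): only reacts to 'organize' and only files PDFs."""
    if "organize" not in intent.lower():
        return []
    return [
        Action("move", f, f"Documents/{PurePosixPath(f).name}")
        for f in files
        if f.lower().endswith(".pdf")
    ]


def apply_guardrails(actions: list, whitelist: list, max_actions: int) -> list:
    """Drop actions on whitelisted files and refuse plans that are too large."""
    kept = [
        a for a in actions
        if not any(fnmatch(PurePosixPath(a.src).name, pat) for pat in whitelist)
    ]
    if len(kept) > max_actions:
        raise ValueError("plan exceeds max_actions; split it into smaller batches")
    return kept


def run(intent, files, whitelist, max_actions=500, dry_run=True):
    """Intent -> Plan -> Guardrails -> Execute -> Ledger, without touching disk."""
    ledger = Ledger()
    approved = apply_guardrails(plan(intent, files), whitelist, max_actions)
    if not dry_run:
        for action in approved:
            ledger.record(action)  # a real executor would move the file here
    return approved, ledger
```

With dry_run left at its default, run returns the approved actions as a preview and the ledger stays empty, which mirrors the "show me first" behavior described in the introduction.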

3. Capabilities and Features

3.1 What VibeCleaner does

3.2 Natural-language interface

VibeCleaner understands requests such as “organize my downloads,” “delete old stuff,” and “find and remove duplicate photos.” If AI is unavailable or its confidence is low, the system falls back to rules.
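
The fallback behavior can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the threshold value, the (actions, confidence) return shape of the AI planner, and the action names are all hypothetical, since the article does not specify them.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical threshold; the article does not state a value.
CONFIDENCE_THRESHOLD = 0.7

# Assumed planner interface: takes the request, returns (actions, confidence).
Planner = Callable[[str], Tuple[list, float]]


def rule_based_plan(request: str) -> list:
    """Keyword rules on whole words, so 'folder' does not trigger 'old'."""
    words = re.findall(r"[a-z]+", request.lower())
    actions = []
    if any(w.startswith("organiz") for w in words):
        actions.append("organize_by_type")
    if any(w.startswith("duplicate") for w in words):
        actions.append("remove_exact_duplicates")
    if "old" in words:
        actions.append("archive_old_files")
    return actions


def plan_with_fallback(request: str, ai_planner: Optional[Planner] = None):
    """Use the AI plan only when it is available and confident enough."""
    if ai_planner is not None:
        try:
            actions, confidence = ai_planner(request)
            if confidence >= CONFIDENCE_THRESHOLD:
                return "ai", actions
        except Exception:
            pass  # provider unavailable: fall through to the rules
    return "rules", rule_based_plan(request)
```

Returning the source ("ai" or "rules") alongside the plan lets the preview tell the user which path produced the proposed actions.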

4. Configuration

4.1 Minimal quick-start YAML (user-facing)

# File: ~/.vibecleaner.yml
downloads_path: ~/Downloads

organize:
  Documents:
    extensions: [pdf, doc, docx, txt, odt]
    path: ~/Documents/Downloads
  Images:
    extensions: [jpg, jpeg, png, gif, svg, webp]
    path: ~/Pictures/Downloads
  Videos:
    extensions: [mp4, avi, mkv, mov, wmv]
    path: ~/Videos/Downloads
  Archives:
    extensions: [zip, rar, 7z, tar, gz]
    path: ~/Downloads/Archives

cleanup:
  delete_after_days: 90
  archive_after_days: 30
  min_file_size: 1MB
  remove_duplicates: true

safety:
  dry_run_default: true
  backup_before_delete: true
  whitelist_patterns:
    - "important_*"
    - "*.key"
    - "*.license"
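
To make the quick-start semantics concrete, here is a small Python sketch of how the organize and whitelist_patterns entries might be interpreted. It is an assumption about the behavior, not the tool's actual code: the config is mirrored as a plain dict to stay dependency-free, whitelist patterns are treated as shell-style globs matched against the file name, and destination_for is a hypothetical helper.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Mirrors the organize and safety sections of the YAML above.
ORGANIZE = {
    "Documents": {"extensions": ["pdf", "doc", "docx", "txt", "odt"],
                  "path": "~/Documents/Downloads"},
    "Images": {"extensions": ["jpg", "jpeg", "png", "gif", "svg", "webp"],
               "path": "~/Pictures/Downloads"},
    "Videos": {"extensions": ["mp4", "avi", "mkv", "mov", "wmv"],
               "path": "~/Videos/Downloads"},
    "Archives": {"extensions": ["zip", "rar", "7z", "tar", "gz"],
                 "path": "~/Downloads/Archives"},
}
WHITELIST = ["important_*", "*.key", "*.license"]


def is_whitelisted(name: str) -> bool:
    """True if the file name matches any protected glob pattern."""
    return any(fnmatch(name, pattern) for pattern in WHITELIST)


def destination_for(filename: str):
    """Return the target folder, or None if the file should be left alone."""
    name = PurePosixPath(filename).name
    if is_whitelisted(name):
        return None
    ext = PurePosixPath(name).suffix.lstrip(".").lower()
    for category in ORGANIZE.values():
        if ext in category["extensions"]:
            return category["path"]
    return None
```

Checking the whitelist before the extension map means a file such as important_scan.pdf is never moved, even though pdf has a destination.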

4.2 Conservative policy used in experiments

version: 0.3
paths:
  watch: ~/Downloads
  destinations:
    documents: ~/Documents/Downloads
    images: ~/Pictures/Downloads
    videos: ~/Videos/Downloads
    archives: ~/Archives/Downloads
    audio: ~/Music/Downloads
    code: ~/Dev/Downloads
policy:
  dry_run: true
  backup_before_delete: true
  trash_deletes: true
  safeguards:
    min_file_size_bytes: 10240
    max_move_count: 5000
  whitelist:
    - "*license*"
    - "*apikey*"
    - "*.pem"
    - "*.key"
  age:
    archive_older_than_days: 30
    delete_installer_older_than_days: 90
  dedup:
    strategy: exact
    methods: [sha256]
rules:
  - name: Organize by type and date
    if:
      any:
        - extension_in: [".pdf", ".doc", ".docx", ".txt"]
        - extension_in: [".png", ".jpg", ".jpeg"]
        - extension_in: [".mp4", ".mov", ".mkv"]
        - extension_in: [".zip", ".tar", ".gz", ".7z"]
        - extension_in: [".mp3", ".wav", ".flac"]
        - extension_in: [".py", ".ipynb", ".js", ".ts"]
    then:
      - move_to: "{category_destination}"
      - subfolder_by: "YYYY/MM"
  - name: Delete old installers
    if: { all: [ { extension_in: [".dmg", ".pkg", ".exe", ".msi"] }, { older_than_days: 90 } ] }
    then: [ { delete: "safe" } ]
  - name: Archive large old files
    if: { all: [ { larger_than_mb: 200 }, { older_than_days: 45 } ] }
    then:
      - move_to: "~/Archives/Downloads/large_old"
      - compress: "zip"
scheduling:
  enabled: false
  cron: "0 9 * * *"
llm:
  enabled: false
  provider: "none"   # e.g., "openai", "anthropic"
  max_actions: 500
  require_confirm: true
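
The dedup policy above (strategy: exact, methods: [sha256]) can be illustrated with a short Python sketch. It is not taken from the VibeCleaner repository: sha256_of and find_exact_duplicates are hypothetical names, and grouping by size before hashing is an optimization assumed here to avoid reading files that cannot have a twin.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root) -> list:
    """Return groups (sorted lists of paths) of files with identical content."""
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have an exact duplicate
        for path in same_size:
            by_hash[sha256_of(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

The function only reports groups; which copy to keep and whether to delete the rest would remain a separate, previewed decision under the policy's dry_run and trash_deletes settings.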

5. Methods and Experimental Design

6. Empirical Findings

6.1 E1: OneDrive Workspace — small-scale success

6.2 E2: Main Downloads — the archive-explosion failure

6.3 What failed (systematically)

6.4 What worked (reliably)

7. Workarounds and Engineering Mitigations

8. Cross-Provider Learning Experiment (Codex vs Claude)

9. The Scaling Trap — Root-Cause Analysis

11. Usage

11.1 AI-powered (recommended for non-technical users)

# Initialize
vibecleaner init

# Natural-language asks
vibecleaner ask "clean up my messy downloads folder"
vibecleaner ask "find and delete duplicate photos"
vibecleaner ask "organize PDFs from last month"
vibecleaner ask "what files are taking up the most space?"

# Interactive chat
vibecleaner chat

# Apply suggestions automatically (use with care)
vibecleaner ask "remove old files" --apply

11.2 Manual (deterministic) commands

vibecleaner clean --dry-run
vibecleaner clean
vibecleaner clean --older-than 30 --duplicates
vibecleaner watch ~/Downloads
vibecleaner schedule --daily --time 09:00

12. Advanced Features & Scheduling

# Custom rule
vibecleaner rule add --name "Screenshots" \
  --pattern "Screen Shot*" \
  --destination ~/Pictures/Screenshots

# Remove old large items
vibecleaner clean --older-than 60d --min-size 100MB

# Deduplicate and keep newest
vibecleaner duplicates --remove --keep-newest

# Cross-platform scheduling
vibecleaner schedule --cron "0 9 * * *"     # Linux/macOS
vibecleaner schedule --windows --daily --time 09:00
vibecleaner daemon start

13. Safety, Privacy, and Threat Model

No cloud dependency; all processing is local. Every action is logged; deletes go to Trash; undo is supported. Primary threat is unintended moves/deletes; mitigations include dry-run, whitelists, backups, move caps, and explicit approval for destructive actions.
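
One way to make "every action is logged" and "undo is supported" concrete is an append-only ledger that is replayed in reverse. The sketch below assumes a JSON-lines ledger file and covers moves only; logged_move and undo_all are hypothetical names, and the article does not describe the tool's real ledger format.

```python
import json
import shutil
from pathlib import Path


def logged_move(src, dst, ledger_path) -> None:
    """Move a file, refusing to overwrite, and append the move to the ledger."""
    src, dst = Path(src), Path(dst)
    if dst.exists():
        raise FileExistsError(f"refusing to overwrite {dst}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    with open(ledger_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps({"op": "move", "src": str(src), "dst": str(dst)}) + "\n")


def undo_all(ledger_path) -> int:
    """Reverse logged moves, newest first; return how many were undone."""
    lines = Path(ledger_path).read_text(encoding="utf-8").splitlines()
    undone = 0
    for line in reversed(lines):
        entry = json.loads(line)
        if entry["op"] == "move" and Path(entry["dst"]).exists():
            Path(entry["src"]).parent.mkdir(parents=True, exist_ok=True)
            shutil.move(entry["dst"], entry["src"])
            undone += 1
    Path(ledger_path).write_text("", encoding="utf-8")  # ledger fully replayed
    return undone
```

Undoing in reverse order matters when a file was moved more than once: the most recent move has to be reversed first for the earlier entries to find the file where they expect it.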

14. Limitations

OS heterogeneity; lack of formal verification for safety invariants; provider divergence; weak archive semantics; OCR-driven renaming can cause irreversible chaos at scale without strict constraints.

15. Open Problems & Call for Contributors

We invite contributions on: large-folder context management (≥1k files), attention systems for long operations, robust prompting, edge-case handling, provider integration (Claude/Codex, plus more), learning systems for folder-pattern recognition, and large-scale testing on messy real-world corpora. See CONTRIBUTING.md.

16. License & Availability

MIT-licensed open source. Code and experimental ledgers reside in the repository. VibeCleaner is designed to be local-first, explainable, and reversible—an agentic UX that stays accountable.

Appendix A — Minimal YAML (quick start)

(See §4.1.)

Appendix B — Command Reference

vibecleaner init (init config)
vibecleaner clean (clean)
vibecleaner watch (watch folder)
vibecleaner schedule (auto clean)
vibecleaner undo (rollback)
vibecleaner stats (metrics)
vibecleaner config (edit config)

Author’s Note. The core lesson is practical: LLM proposes; policies constrain; user approves; executor guarantees. This pattern, coupled with local-first constraints, is a viable template for trustworthy edge automation beyond Downloads hygiene.

Originally published on LinkedIn.
