File Processing Automation Basics

Automate file processing tasks with OpenClaw: handle PDFs, images, CSVs, and more.

🎯 What You’ll Learn

How to use OpenClaw to automate common file processing tasks:

Read and write files
Process multiple files in batch
Convert file formats
Extract data from documents
Create automated file workflows

Real-world example: Build an automated invoice processing system.

📋 Prerequisites

✅ Completed 15-Minute Quick Start
✅ OpenClaw Gateway running
✅ Basic file system knowledge
✅ Sample files for testing

🛠️ Understanding OpenClaw’s File Tools

OpenClaw includes several file-related tools that you can use through natural language:

File reading: Read text, JSON, CSV, and other file formats
File writing: Save data to various formats
File manipulation: Copy, move, rename, organize files
Batch processing: Process multiple files at once
File system operations: Create directories, list files, search

📝 Step 1: Your First File Processing Task (5 minutes)

Start the Gateway

# Ensure gateway is running
openclaw gateway --port 18789 --verbose

Open WebChat UI

Navigate to:

http://localhost:18789

Basic File Operations

Read a file:

Read the contents of ~/Documents/report.txt and summarize the key points.

Write a file:

Create a file called summary.md in my Documents folder with a summary of today's news.

List files:

List all files in my Downloads folder created in the last 7 days.

🔄 Step 2: Processing CSV Files (10 minutes)

Reading CSV Data

Read the file sales-data.csv on my Desktop
Show me the first 10 rows
Calculate the total of the amount column

OpenClaw will:

Locate the CSV file
Parse the data
Display the requested information
Perform the calculation

Filtering and Transforming CSV

Open orders.csv
Filter for orders where status is "completed" and amount is greater than $100
Create a new CSV file called large-completed-orders.csv with just those rows
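
How OpenClaw carries out this request is not shown in this tutorial. As a rough sketch of the equivalent logic in plain Python, the filter could look like the following. The column names `status` and `amount`, and the function name `filter_orders`, are assumptions made for illustration; adjust them to match your file.

```python
import csv
from pathlib import Path


def filter_orders(src: Path, dst: Path, min_amount: float = 100.0) -> int:
    """Copy rows with status 'completed' and amount > min_amount; return the count."""
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.DictReader(fin)
        # Assumes the file has a header row with 'status' and 'amount' columns.
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        kept = 0
        for row in reader:
            # Amounts may be written as "$1,250.00"; strip symbols before parsing.
            amount = float(row["amount"].replace("$", "").replace(",", ""))
            if row["status"].strip().lower() == "completed" and amount > min_amount:
                writer.writerow(row)
                kept += 1
    return kept
```

Calling `filter_orders(Path("orders.csv"), Path("large-completed-orders.csv"))` writes the matching rows and returns how many were kept, which is a quick way to check the result against what OpenClaw reports.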

Data Enrichment

Read products.csv
For each product, look up the current price from https://api.example.com/prices
Add a new column called "current_price"
Save the enriched data to products-with-prices.csv

📄 Step 3: Working with PDF Files (10 minutes)

Extract Text from PDF

Read the PDF file invoice-123.pdf
Extract all the text
Save it to invoice-123.txt

Extract Specific Data from PDF

Open the PDF statement.pdf
Extract all transaction data
Create a CSV file with columns: date, description, amount
Save to transactions.csv

Batch PDF Processing

Process all PDF files in my Documents/invoices folder
For each PDF:
- Extract the invoice number
- Extract the total amount
- Extract the date
Create a summary CSV file called invoice-summary.csv

🖼️ Step 4: Image Processing (8 minutes)

Image Conversion

Convert all PNG files in my Pictures folder to JPEG format
Save them to a new folder called converted-images

Image Resizing

Resize all images in ~/Photos to a maximum width of 1920px
Maintain aspect ratio
Save to ~/Photos/resized/

Batch Image Operations

For each image in ~/Downloads/screenshots:
1. Compress the image
2. Resize to max 1280x720
3. Add a watermark "Confidential"
4. Save to ~/processed-images/

📊 Step 5: Creating File Workflows (12 minutes)

Multi-Step Processing Pipeline

Process all files in my data-import folder:

1. For each CSV file:
   - Validate the data format
   - Remove duplicate rows
   - Standardize date formats
   - Save to processed/ folder

2. For each JSON file:
   - Parse and validate the JSON structure
   - Extract specific fields
   - Convert to CSV format
   - Save to processed/ folder

3. Create a summary report of all processed files

Conditional File Processing

Watch the ~/Downloads folder for new files
When a new file appears:
- If it's a PDF, extract text and save to .txt
- If it's an image, optimize and compress it
- If it's a CSV, validate the data format
- Move processed files to ~/Processed/

File Organization Automation

Organize my Downloads folder:

1. Create folders by file type:
   - Images/ (jpg, png, gif)
   - Documents/ (pdf, doc, txt)
   - Spreadsheets/ (csv, xlsx)
   - Archives/ (zip, tar, gz)

2. Move each file into the appropriate folder

3. For Images/, create subfolders by year and month

4. Generate a report of what was organized
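
To make the steps above concrete, here is a minimal Python sketch of the same sorting logic. It is not OpenClaw's implementation; the `FOLDERS` mapping and the `organize` function are illustrative names, and the year/month subfolders for Images/ are left out to keep the example short.

```python
import shutil
from pathlib import Path

# Extension-to-folder mapping taken from the prompt above.
FOLDERS = {
    "Images": {".jpg", ".png", ".gif"},
    "Documents": {".pdf", ".doc", ".txt"},
    "Spreadsheets": {".csv", ".xlsx"},
    "Archives": {".zip", ".tar", ".gz"},
}


def organize(folder: Path) -> dict:
    """Move files into type folders and return a report of what moved where."""
    report = {}
    # sorted() takes a snapshot, so folders created below are not re-scanned.
    for item in sorted(folder.iterdir()):
        if not item.is_file():
            continue
        suffix = item.suffix.lower()
        target = next((name for name, exts in FOLDERS.items() if suffix in exts), None)
        if target is None:
            continue  # leave unknown file types where they are
        dest_dir = folder / target
        dest_dir.mkdir(exist_ok=True)
        # Note: this sketch does not guard against name collisions in dest_dir.
        shutil.move(str(item), str(dest_dir / item.name))
        report.setdefault(target, []).append(item.name)
    return report
```

The returned dictionary maps each folder name to the files moved into it, which corresponds to the report requested in step 4.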

💾 Step 6: Data Export and Import (7 minutes)

Export to Different Formats

Read data.json
Convert it to:
- CSV format (data.csv)
- Excel format (data.xlsx)
- HTML table (data.html)
- XML format (data.xml)
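
For the CSV case, the conversion can be sketched in a few lines of standard-library Python. This assumes data.json holds an array of flat objects (no nested values); `json_to_csv` is a hypothetical helper name, and the Excel, HTML, and XML outputs are not covered here.

```python
import csv
import json
from pathlib import Path


def json_to_csv(src: Path, dst: Path) -> list:
    """Write a JSON array of flat objects to CSV; return the column names used."""
    records = json.loads(src.read_text(encoding="utf-8"))
    # Collect columns in first-seen order so records with missing keys still fit.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    with dst.open("w", newline="", encoding="utf-8") as fout:
        # restval fills in an empty cell when a record lacks a column.
        writer = csv.DictWriter(fout, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(records)
    return columns
```

Building the header from every record, rather than only the first one, avoids silently dropping fields that appear later in the file.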

Import External Data

Download the latest data from https://api.example.com/data
Save it to ~/data/current-data.json
Then create a backup copy with today's date

Database Integration

Read all JSON files from ~/exports/
Import them into my SQLite database
Use table name "imported_data"
Create indexes on the date and type fields
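
The request above roughly corresponds to the following sketch using Python's built-in `sqlite3` module. The table layout (a `date` column, a `type` column, and the full record stored as JSON text in `payload`) and the index names are assumptions for illustration; the schema OpenClaw actually creates may differ.

```python
import json
import sqlite3
from pathlib import Path


def import_json_files(export_dir: Path, db_path: Path) -> int:
    """Load records from every *.json file into imported_data; return rows inserted."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS imported_data (date TEXT, type TEXT, payload TEXT)"
        )
        inserted = 0
        for path in sorted(export_dir.glob("*.json")):
            # Assumes each file contains a JSON array of objects.
            for record in json.loads(path.read_text(encoding="utf-8")):
                conn.execute(
                    "INSERT INTO imported_data (date, type, payload) VALUES (?, ?, ?)",
                    (record.get("date"), record.get("type"), json.dumps(record)),
                )
                inserted += 1
        conn.execute("CREATE INDEX IF NOT EXISTS idx_imported_date ON imported_data (date)")
        conn.execute("CREATE INDEX IF NOT EXISTS idx_imported_type ON imported_data (type)")
        conn.commit()
        return inserted
    finally:
        conn.close()
```

Using `?` placeholders instead of string formatting keeps values from the JSON files from being interpreted as SQL.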

🔧 Step 7: Error Handling and Validation (8 minutes)

File Validation

Validate all CSV files in the data folder:
- Check if files are properly formatted
- Verify required columns exist
- Check for data type mismatches
- Report any issues found
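
The checks listed above can be illustrated with a short Python sketch. `validate_csv` is a hypothetical helper, not an OpenClaw function; it assumes the numeric columns you name are also listed as required columns.

```python
import csv
from pathlib import Path


def validate_csv(path: Path, required: list, numeric: list) -> list:
    """Return a list of human-readable issues; an empty list means the file passed."""
    issues = []
    with path.open(newline="") as fin:
        reader = csv.DictReader(fin)
        header = reader.fieldnames or []
        missing = [col for col in required if col not in header]
        if missing:
            return [f"missing required columns: {', '.join(missing)}"]
        for row in reader:
            line_no = reader.line_num
            # DictReader stores extra fields under the key None and fills
            # missing fields with the value None.
            if None in row or None in row.values():
                issues.append(f"line {line_no}: wrong number of fields")
                continue
            for col in numeric:
                try:
                    float(row[col])
                except ValueError:
                    issues.append(f"line {line_no}: {col!r} is not a number: {row[col]!r}")
    return issues
```

For example, `validate_csv(Path("data.csv"), ["date", "amount"], ["amount"])` returns an empty list for a clean file and one message per problem otherwise.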

Retry Logic

Process large-file.csv
If it fails due to memory:
- Split it into chunks of 1000 rows
- Process each chunk separately
- Combine the results
- Save to processed-large-file.csv
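
Chunked processing means reading a fixed number of rows at a time so the whole file never has to fit in memory. A minimal sketch of that idea in Python follows; `process_in_chunks` and its `transform` callback are illustrative names, and the sketch streams results straight into the output file rather than combining them at the end.

```python
import csv
from itertools import islice
from pathlib import Path
from typing import Callable


def process_in_chunks(src: Path, dst: Path, transform: Callable[[dict], dict],
                      chunk_size: int = 1000) -> int:
    """Apply transform to each row, chunk_size rows at a time; return the chunk count."""
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        chunks = 0
        # islice pulls at most chunk_size rows; an empty list ends the loop.
        while chunk := list(islice(reader, chunk_size)):
            writer.writerows(transform(row) for row in chunk)
            chunks += 1
    return chunks
```

Because each chunk is written out before the next one is read, memory use stays bounded by `chunk_size` regardless of the file's total length.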

Data Quality Checks

Read customer-data.csv
Check for:
- Missing required fields
- Invalid email addresses
- Duplicate records
- Out-of-range values
Create a quality report with all issues found

🚀 Advanced Automation Scenarios

Scenario 1: Automated Invoice Processing

Create an automated invoice processing workflow:

1. Monitor ~/inbox/invoices/ for new PDFs
2. For each new invoice:
   - Extract invoice number, date, amount, vendor
   - Validate the extracted data
   - Look up vendor information
   - Calculate due dates
   - Save to database
   - Move to processed/ folder
3. Generate daily summary reports

Scenario 2: Photo Management

Automate my photo organization:

1. Scan ~/Pictures/ for new photos
2. For each photo:
   - Extract EXIF data (date taken, camera, location)
   - Categorize by date: ~/Pictures/YYYY/MM/
   - Add tags based on folder name
   - Create thumbnails
   - Generate photo gallery index
3. Create statistics report

Scenario 3: Log File Analysis

Process server logs:

1. Read all .log files from ~/logs/
2. Extract:
   - Error messages
   - Warning messages
   - Request counts per hour
   - Response time statistics
3. Generate daily summary report
4. Alert if error rate exceeds 5%
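
As an illustration of the counting and the 5% alert rule, here is a small Python sketch. The log line format (`2025-03-01T14:05:09 LEVEL message`) is an assumption made for this example, as are the `LINE` pattern and the `summarize` function; real server logs will need a different pattern, and response time statistics are omitted.

```python
import re
from collections import Counter

# Assumed line shape: "2025-03-01T14:05:09 LEVEL message". Real logs will differ.
LINE = re.compile(
    r"^(?P<hour>\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2} (?P<level>[A-Z]+) (?P<msg>.*)$"
)


def summarize(lines: list, threshold: float = 0.05) -> dict:
    """Count entries per hour and per level, and flag an error rate above threshold."""
    per_hour = Counter()
    levels = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        per_hour[match["hour"]] += 1
        levels[match["level"]] += 1
    total = sum(levels.values())
    error_rate = levels["ERROR"] / total if total else 0.0
    return {
        "per_hour": dict(per_hour),
        "errors": levels["ERROR"],
        "warnings": levels["WARNING"],
        "error_rate": error_rate,
        "alert": error_rate > threshold,
    }
```

Here the error rate is the share of parsed log lines at level ERROR; if your logs record one line per request, that matches the per-request error rate in step 4.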

🔍 Troubleshooting Common Issues

Issue: “File not found”

Solution: Check the file path and permissions:

List files in ~/Documents/
Check if I have read permissions for report.txt

Issue: “Cannot parse file”

Solution: Validate the file format:

Check if data.csv is properly formatted
Show me the first few lines to diagnose the issue

Issue: “Permission denied”

Solution: Check the file permissions, or work on a copy in a location you can access:

Check permissions on ~/Documents/restricted-file.txt
Try copying it to a location I can access

Issue: “File too large”

Solution: Process in chunks:

Read large-file.json in chunks of 1000 records
Process each chunk separately
Combine results at the end

💡 Best Practices

1. Always Backup First

Before processing ~/Documents/, create a backup folder
Copy all files to ~/backup-docs-[timestamp]/
Then proceed with processing
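
If you want to take the backup yourself before handing the folder to OpenClaw, the same idea can be sketched in Python. `backup_folder` and the `backup-<name>-<timestamp>` naming are illustrative choices, not something OpenClaw prescribes.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(src: Path, backup_root: Path) -> Path:
    """Copy src into a new timestamped folder under backup_root and return its path."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = backup_root / f"backup-{src.name}-{stamp}"
    # copytree refuses to overwrite an existing folder, which protects older backups.
    shutil.copytree(src, dest)
    return dest
```

For example, `backup_folder(Path.home() / "Documents", Path.home())` creates a folder such as `backup-Documents-20250301-140509` in your home directory.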

2. Use Temporary Files

When processing large datasets:
1. Work with temporary files first
2. Validate the output
3. Only then overwrite the original files
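
The three steps above follow a common write-validate-replace pattern. A minimal Python sketch of it, with `safe_rewrite`, `transform`, and `validate` as hypothetical names, might look like this:

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def safe_rewrite(path: Path, transform: Callable[[str], str],
                 validate: Callable[[str], bool]) -> bool:
    """Write transformed text to a temp file and replace the original only if valid."""
    result = transform(path.read_text(encoding="utf-8"))
    # Create the temp file next to the original so the final rename stays
    # on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(result)
        if not validate(Path(tmp_name).read_text(encoding="utf-8")):
            return False  # the original file is left untouched
        os.replace(tmp_name, path)
        return True
    finally:
        # Clean up the temp file if it was not moved into place.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
```

If validation fails, the original file is never opened for writing, so a bad transformation cannot corrupt it. Note that `mkstemp` creates the file readable only by its owner on POSIX systems, so copy the original file's mode afterwards if other users need access.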

3. Log Operations

Keep a log of all file operations:
- What was processed
- When it was processed
- Any errors encountered
- Save to file-processing-log.txt

4. Validate Before Processing

Before processing data.csv:
- Show me a sample of the data
- Check the file size
- Validate the format
- Only proceed if everything looks good

5. Clean Up Temporary Files

After processing files:
- Delete temporary files
- Clear cache directories
- Remove duplicate copies
- Report how much space was saved

📚 Understanding File Paths

OpenClaw can work with various path formats:

Absolute paths:

Read /Users/username/Documents/report.txt

Home directory shortcuts:

Read ~/Documents/report.txt
Read ~/Desktop/data.json

Relative paths:

Read ./data/input.csv
Write to ../output/results.json

Wildcards:

List all JSON files in ~/data/*.json
Process all CSV files in ~/downloads/**/*.csv
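
For readers who want to see what these path forms mean in practice, here is how Python's `pathlib` interprets them. This is only an analogy: the tutorial does not specify how OpenClaw resolves paths internally, and `expand` and `find_csv_files` are illustrative helper names.

```python
from pathlib import Path


def expand(path_text: str) -> Path:
    """Turn '~' shortcuts and relative paths into an absolute path."""
    # expanduser replaces a leading "~" with the home directory;
    # resolve makes "./" and "../" absolute against the current directory.
    return Path(path_text).expanduser().resolve()


def find_csv_files(root: Path) -> list:
    """Return every .csv file under root at any depth (the ** wildcard)."""
    return sorted(root.glob("**/*.csv"))
```

The `**` wildcard matches zero or more directory levels, so `find_csv_files` returns CSV files directly inside `root` as well as those in nested folders, while a single `*` as in `~/data/*.json` matches only one level.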

🎯 Real-World Examples

Example 1: Data Migration

Migrate data from old system:

1. Read all CSV files from ~/legacy-system/export/
2. Transform the data to match new schema
3. Validate against new system requirements
4. Import to ~/new-system/import/
5. Generate migration report
6. Archive original files to ~/backup/legacy/

Example 2: Report Generation

Generate monthly sales report:

1. Read all sales data from ~/sales/2025-03/
2. Aggregate by product category
3. Calculate totals and averages
4. Create summary charts
5. Generate PDF report
6. Email to management team
7. Archive source data

Example 3: Backup Automation

Automated backup system:

Every day at 2 AM:
1. Scan ~/Documents/ for modified files
2. Copy modified files to ~/backups/daily/[date]/
3. Compress old backups (older than 7 days)
4. Delete backups older than 30 days
5. Send me a summary report

🎯 What’s Next?

🕷️ Your First Web Scraper - Collect data from the web
⛓️ Chaining Multiple Skills - Create complex workflows
🎨 Custom Skill Development - Build your own skills

🆘 Need Help?

💬 Ask OpenClaw: Describe what you want to do in plain language
📖 File System Docs - Detailed file operations reference
🌟 Community Examples - Real-world file automation examples
🐛 GitHub Issues - Report problems

⏱️ Total Time: 60 minutes 📊 Difficulty: Beginner 🎯 Result: Automating file processing tasks with OpenClaw

💡 Key Takeaways

Natural Language File Operations: Describe what you want to do with files in plain English
Built-in File Tools: OpenClaw includes comprehensive file processing capabilities
Batch Processing: Easily process multiple files at once
Format Support: Work with CSV, JSON, PDF, images, and more
Error Recovery: Ask for validation, chunked processing, and fallbacks when a task fails
Automation Ready: Schedule recurring file processing tasks

Next: Try automating your own file processing tasks by asking OpenClaw what you want to do!