File Processing Automation Basics
Automate file processing tasks with OpenClaw - handle PDFs, images, CSVs, and more.
What You'll Learn
How to use OpenClaw to automate common file processing tasks:
- Read and write files
- Process multiple files in batch
- Convert file formats
- Extract data from documents
- Create automated file workflows
Real-world example: Build an automated invoice processing system.
Prerequisites
- Completed 15-Minute Quick Start
- OpenClaw Gateway running
- Basic file system knowledge
- Sample files for testing
Understanding OpenClaw's File Tools
OpenClaw includes several file-related tools that you can use through natural language:
- File reading: Read text, JSON, CSV, and other file formats
- File writing: Save data to various formats
- File manipulation: Copy, move, rename, organize files
- Batch processing: Process multiple files at once
- File system operations: Create directories, list files, search
Step 1: Your First File Processing Task (5 minutes)
Start the Gateway
# Ensure gateway is running
openclaw gateway --port 18789 --verbose
Open WebChat UI
Navigate to:
http://localhost:18789
Basic File Operations
Read a file:
Read the contents of ~/Documents/report.txt and summarize the key points.
Write a file:
Create a file called summary.md in my Documents folder with a summary of today's news.
List files:
List all files in my Downloads folder created in the last 7 days.
Step 2: Processing CSV Files (10 minutes)
Reading CSV Data
Read the file sales-data.csv on my Desktop
Show me the first 10 rows
Calculate the total amount
OpenClaw will:
- Locate the CSV file
- Parse the data
- Display the requested information
- Perform the calculation
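For readers who want to see what this request amounts to outside OpenClaw, here is a minimal Python sketch. It is not OpenClaw's implementation; the function name `preview_and_total` and the `amount` column name are assumptions made for illustration.

```python
import csv
from decimal import Decimal
from pathlib import Path


def preview_and_total(csv_path, amount_column="amount", preview_rows=10):
    """Return the first rows of a CSV file and the sum of one numeric column."""
    with Path(csv_path).expanduser().open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    # Decimal avoids the rounding drift that float addition causes with currency.
    total = sum((Decimal(row[amount_column]) for row in rows), Decimal("0"))
    return rows[:preview_rows], total
```

Comparing a hand-computed total like this against the figure OpenClaw reports is a quick way to confirm that the right column was summed.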
Filtering and Transforming CSV
Open orders.csv
Filter for orders where status is "completed" and amount is greater than $100
Create a new CSV file called large-completed-orders.csv with just those rows
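The filter described in this prompt can be sketched in plain Python as follows. The column names `status` and `amount` are assumed from the prompt wording, and `filter_large_completed` is a hypothetical helper, not an OpenClaw API. The sketch also assumes the input file has a header row.

```python
import csv
from decimal import Decimal


def filter_large_completed(src_path, dst_path, min_amount=Decimal("100")):
    """Copy rows with status 'completed' and amount above min_amount; return the count kept."""
    kept = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Strip a leading "$" and thousands separators before parsing.
            amount = Decimal(row["amount"].replace("$", "").replace(",", ""))
            if row["status"].strip().lower() == "completed" and amount > min_amount:
                writer.writerow(row)
                kept += 1
    return kept
```

Note that "greater than $100" is strict here, so an order of exactly 100 is excluded.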
Data Enrichment
Read products.csv
For each product, look up the current price from https://api.example.com/prices
Add a new column called "current_price"
Save the enriched data to products-with-prices.csv
Step 3: Working with PDF Files (10 minutes)
Extract Text from PDF
Read the PDF file invoice-123.pdf
Extract all the text
Save it to invoice-123.txt
Extract Specific Data from PDF
Open the PDF statement.pdf
Extract all transaction data
Create a CSV file with columns: date, description, amount
Save to transactions.csv
Batch PDF Processing
Process all PDF files in my Documents/invoices folder
For each PDF:
- Extract the invoice number
- Extract the total amount
- Extract the date
Create a summary CSV file called invoice-summary.csv
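Pulling text out of a PDF needs a third-party library, so the sketch below covers only the step after that: finding the invoice number, date, and total in text that has already been extracted. The regular expressions are hypothetical and assume labels such as "Invoice Number:", "Date:", and "Total:" with ISO-style dates; real invoices vary and will need their own patterns.

```python
import re

# Hypothetical patterns; adjust them to the layout of your invoices.
PATTERNS = {
    "invoice_number": re.compile(
        r"\bInvoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"\bDate\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    # \b keeps "Subtotal" from being mistaken for "Total".
    "total": re.compile(
        r"\bTotal\s*(?:Due|Amount)?\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_invoice_fields(text):
    """Return a dict of the fields found in text; missing fields map to None."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields
```

Fields that come back as `None` are good candidates for manual review before they reach the summary CSV.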
Step 4: Image Processing (8 minutes)
Image Conversion
Convert all PNG files in my Pictures folder to JPEG format
Save them to a new folder called converted-images
Image Resizing
Resize all images in ~/Photos to a maximum width of 1920px
Maintain aspect ratio
Save to ~/Photos/resized/
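"Maintain aspect ratio" comes down to a small piece of arithmetic, shown here as a sketch. The helper `fit_width` is hypothetical and only computes the target size; the actual pixel resampling would be done by an imaging library.

```python
def fit_width(width, height, max_width=1920):
    """Return (new_width, new_height) scaled down to max_width, keeping aspect ratio."""
    if width <= max_width:
        return width, height  # never upscale images that already fit
    scale = max_width / width
    return max_width, max(1, round(height * scale))
```

For example, a 3840x2160 image maps to 1920x1080, while an 800x600 image is left unchanged.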
Batch Image Operations
For each image in ~/Downloads/screenshots:
1. Compress the image
2. Resize to max 1280x720
3. Add a watermark "Confidential"
4. Save to ~/processed-images/
Step 5: Creating File Workflows (12 minutes)
Multi-Step Processing Pipeline
Process all files in my data-import folder:
1. For each CSV file:
- Validate the data format
- Remove duplicate rows
- Standardize date formats
- Save to processed/ folder
2. For each JSON file:
- Parse and validate the JSON structure
- Extract specific fields
- Convert to CSV format
- Save to processed/ folder
3. Create a summary report of all processed files
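Two of the CSV steps above, standardizing date formats and removing duplicate rows, can be sketched in Python like this. The list of accepted input formats and the `date` column name are assumptions; dates are normalized first so that rows differing only in date notation count as duplicates.

```python
from datetime import datetime

# Assumed input formats; extend this tuple to match your data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def standardize_date(value):
    """Return value as ISO 8601 (YYYY-MM-DD), or raise ValueError if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def clean_rows(rows, date_column="date"):
    """Normalize dates, then drop exact duplicate rows while preserving order."""
    seen = set()
    cleaned = []
    for row in rows:
        normalized = {**row, date_column: standardize_date(row[date_column])}
        key = tuple(sorted(normalized.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return cleaned
```

Ambiguous notations such as 03/04/2025 are read as month/day/year here because that format is listed first; reorder `DATE_FORMATS` if your data uses day/month.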
Conditional File Processing
Watch the ~/Downloads folder for new files
When a new file appears:
- If it's a PDF, extract text and save to .txt
- If it's an image, optimize and compress it
- If it's a CSV, validate the data format
- Move processed files to ~/Processed/
File Organization Automation
Organize my Downloads folder:
1. Create folders by file type:
- Images/ (jpg, png, gif)
- Documents/ (pdf, doc, txt)
- Spreadsheets/ (csv, xlsx)
- Archives/ (zip, tar, gz)
2. Move each file into the appropriate folder
3. For Images/, create subfolders by year and month
4. Generate a report of what was organized
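Steps 1 and 2 of this prompt correspond roughly to the following Python sketch. The extension-to-folder mapping is copied from the prompt; `organize` is a hypothetical helper, and the year/month subfolders from step 3 are left out to keep it short.

```python
import shutil
from pathlib import Path

# Extension-to-folder mapping taken from the prompt above.
CATEGORIES = {
    "Images": {".jpg", ".png", ".gif"},
    "Documents": {".pdf", ".doc", ".txt"},
    "Spreadsheets": {".csv", ".xlsx"},
    "Archives": {".zip", ".tar", ".gz"},
}


def organize(folder):
    """Move files into category subfolders; return {filename: category} for the report."""
    folder = Path(folder).expanduser()
    moved = {}
    # sorted() snapshots the listing so new subfolders do not disturb the loop.
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        for category, extensions in CATEGORIES.items():
            if path.suffix.lower() in extensions:
                target_dir = folder / category
                target_dir.mkdir(exist_ok=True)
                shutil.move(str(path), str(target_dir / path.name))
                moved[path.name] = category
                break
    return moved
```

Files with unlisted extensions stay where they are. A file that already exists in the target folder may be overwritten, so try this on a copy of the folder first.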
Step 6: Data Export and Import (7 minutes)
Export to Different Formats
Read data.json
Convert it to:
- CSV format (data.csv)
- Excel format (data.xlsx)
- HTML table (data.html)
- XML format (data.xml)
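As an illustration of the first conversion, here is a minimal JSON-to-CSV sketch. It assumes `data.json` holds an array of flat objects; nested objects would need flattening first. The helper name `json_to_csv` is hypothetical.

```python
import csv
import json


def json_to_csv(json_path, csv_path):
    """Convert a JSON array of flat objects to CSV; return the column names written."""
    with open(json_path, encoding="utf-8") as handle:
        records = json.load(handle)
    # Collect columns in first-seen order so records with extra keys are not lost.
    columns = list(dict.fromkeys(key for record in records for key in record))
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(records)
    return columns
```

Records that lack a column get an empty cell, which keeps every row the same width.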
Import External Data
Download the latest data from https://api.example.com/data
Save it to ~/data/current-data.json
Then create a backup copy with today's date
Database Integration
Read all JSON files from ~/exports/
Import them into my SQLite database
Use table name "imported_data"
Create indexes on the date and type fields
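A rough Python equivalent of this request, using the standard `sqlite3` module, is sketched below. The table name and the indexed `date` and `type` fields come from the prompt; the `payload` column, the index names, and the assumption that each file holds a JSON array of objects are mine.

```python
import json
import sqlite3
from pathlib import Path


def import_json_files(export_dir, db_path):
    """Load every *.json file (each a JSON array of objects) into imported_data."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS imported_data "
            "(date TEXT, type TEXT, payload TEXT)"
        )
        for path in sorted(Path(export_dir).expanduser().glob("*.json")):
            records = json.loads(path.read_text(encoding="utf-8"))
            conn.executemany(
                "INSERT INTO imported_data (date, type, payload) VALUES (?, ?, ?)",
                [(r.get("date"), r.get("type"), json.dumps(r)) for r in records],
            )
        conn.execute("CREATE INDEX IF NOT EXISTS idx_imported_date ON imported_data (date)")
        conn.execute("CREATE INDEX IF NOT EXISTS idx_imported_type ON imported_data (type)")
    count = conn.execute("SELECT COUNT(*) FROM imported_data").fetchone()[0]
    conn.close()
    return count
```

Storing the full record as JSON text in `payload` keeps fields that have no dedicated column. Running the import twice inserts the rows twice, so add a uniqueness rule if re-runs are expected.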
Step 7: Error Handling and Validation (8 minutes)
File Validation
Validate all CSV files in the data folder:
- Check if files are properly formatted
- Verify required columns exist
- Check for data type mismatches
- Report any issues found
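The checks in this prompt can be made concrete with a short sketch. `validate_csv` is a hypothetical helper; which columns are required and which must be numeric is something you supply, since the tutorial does not fix a schema.

```python
import csv


def validate_csv(path, required_columns, numeric_columns=()):
    """Return a list of human-readable issues; an empty list means the file passed."""
    issues = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [c for c in required_columns if c not in header]
        if missing:
            return [f"missing columns: {', '.join(missing)}"]
        for row in reader:
            line_no = reader.line_num  # physical line number in the file
            # DictReader files extra fields under the key None and pads short rows with None.
            if None in row or None in row.values():
                issues.append(f"line {line_no}: wrong number of fields")
                continue
            for column in numeric_columns:
                try:
                    float(row[column])
                except ValueError:
                    issues.append(f"line {line_no}: {column}={row[column]!r} is not numeric")
    return issues
```

An empty result means the file passed these particular checks, not that the data is correct in every respect.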
Retry Logic
Process large-file.csv
If it fails due to memory:
- Split it into chunks of 1000 rows
- Process each chunk separately
- Combine the results
- Save to processed-large-file.csv
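Chunked processing as described here can be sketched with a small generator. The names `iter_chunks` and `process_in_chunks` are hypothetical, and `transform` stands for whatever per-row processing you need; it must return a dict with the same columns as the input.

```python
import csv
from itertools import islice


def iter_chunks(rows, size=1000):
    """Yield successive lists of at most size items from any iterable."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, size)):
        yield chunk


def process_in_chunks(src_path, dst_path, transform, size=1000):
    """Stream src_path through transform chunk by chunk; return rows written."""
    written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for chunk in iter_chunks(reader, size):
            results = [transform(row) for row in chunk]
            writer.writerows(results)
            written += len(results)
    return written
```

Because each chunk is written out before the next is read, at most `size` rows are held in memory at a time, and the "combine" step happens implicitly in the output file.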
Data Quality Checks
Read customer-data.csv
Check for:
- Missing required fields
- Invalid email addresses
- Duplicate records
- Out-of-range values
Create a quality report with all issues found
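The four checks listed above might look like this in Python. The column names `id`, `email`, and `age`, the 0 to 120 age range, and the deliberately loose email pattern are all assumptions for illustration; duplicates are detected by repeated `id` values.

```python
import re
from collections import Counter

# Deliberately loose pattern: one "@", no spaces, and a dot in the domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def quality_report(rows, required=("id", "email"), age_range=(0, 120)):
    """Return a dict of issue lists for an iterable of row dicts."""
    rows = list(rows)
    report = {"missing": [], "invalid_email": [], "duplicates": [], "out_of_range": []}
    id_counts = Counter(row.get("id") for row in rows)
    report["duplicates"] = sorted(i for i, n in id_counts.items() if i and n > 1)
    for index, row in enumerate(rows):
        for field in required:
            if not (row.get(field) or "").strip():
                report["missing"].append((index, field))
        email = (row.get("email") or "").strip()
        if email and not EMAIL_RE.match(email):
            report["invalid_email"].append((index, email))
        age = row.get("age")
        # int() raises ValueError on non-numeric ages; validate types beforehand.
        if age not in (None, "") and not age_range[0] <= int(age) <= age_range[1]:
            report["out_of_range"].append((index, "age", age))
    return report
```

Row positions in the report are zero-based indexes into the data rows, not file line numbers.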
Advanced Automation Scenarios
Scenario 1: Automated Invoice Processing
Create an automated invoice processing workflow:
1. Monitor ~/inbox/invoices/ for new PDFs
2. For each new invoice:
- Extract invoice number, date, amount, vendor
- Validate the extracted data
- Look up vendor information
- Calculate due dates
- Save to database
- Move to processed/ folder
3. Generate daily summary reports
Scenario 2: Photo Management
Automate my photo organization:
1. Scan ~/Pictures/ for new photos
2. For each photo:
- Extract EXIF data (date taken, camera, location)
- Categorize by date: ~/Pictures/YYYY/MM/
- Add tags based on folder name
- Create thumbnails
- Generate photo gallery index
3. Create statistics report
Scenario 3: Log File Analysis
Process server logs:
1. Read all .log files from ~/logs/
2. Extract:
- Error messages
- Warning messages
- Request counts per hour
- Response time statistics
3. Generate daily summary report
4. Alert if error rate exceeds 5%
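The counting and the 5% alert rule in this scenario can be sketched as follows. The log line layout (`YYYY-MM-DD HH:MM:SS LEVEL message`) is an assumption; real server logs differ and would need a different pattern. Response time statistics are omitted because they depend entirely on the log format.

```python
import re
from collections import Counter

# Assumed line layout: "2025-03-14 09:15:02 ERROR message"; adapt to your logs.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2} (\w+) ")


def summarize_log(lines, threshold=0.05):
    """Count levels and lines per hour; flag when the error share exceeds threshold."""
    levels, per_hour = Counter(), Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed layout
        day, hour, level = match.groups()
        levels[level.upper()] += 1
        per_hour[f"{day} {hour}:00"] += 1
    total = sum(levels.values())
    error_rate = levels["ERROR"] / total if total else 0.0
    return {
        "levels": dict(levels),
        "per_hour": dict(per_hour),
        "error_rate": error_rate,
        "alert": error_rate > threshold,
    }
```

Here the error rate is the share of parsed lines at level ERROR. If your definition is errors per request, count request lines in the denominator instead.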
Troubleshooting Common Issues
Issue: "File not found"
Solution: Check file path and permissions:
List files in ~/Documents/
Check if I have read permissions for report.txt
Issue: "Cannot parse file"
Solution: Validate file format:
Check if data.csv is properly formatted
Show me the first few lines to diagnose the issue
Issue: "Permission denied"
Solution: Fix file permissions:
Check permissions on ~/Documents/restricted-file.txt
Try copying it to a location I can access
Issue: "File too large"
Solution: Process in chunks:
Read large-file.json in chunks of 1000 records
Process each chunk separately
Combine results at the end
Best Practices
1. Always Backup First
Before processing ~/Documents/, create a backup folder
Copy all files to ~/backup-docs-[timestamp]/
Then proceed with processing
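If you prefer to take the backup yourself before handing a folder to OpenClaw, a timestamped copy is a few lines of Python. `backup_folder` is a hypothetical helper, and the timestamp format is one possible choice for the `[timestamp]` placeholder.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(source, backup_root):
    """Copy source into backup_root/backup-docs-<timestamp>/ and return the new path."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = Path(backup_root).expanduser() / f"backup-docs-{stamp}"
    # copytree refuses to overwrite an existing directory, which protects older backups.
    shutil.copytree(Path(source).expanduser(), destination)
    return destination
```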
2. Use Temporary Files
When processing large datasets:
1. Work with temporary files first
2. Validate the output
3. Only then overwrite the original files
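The write-validate-replace pattern above is commonly implemented as shown in this sketch. `atomic_write` and its `validate` callback are hypothetical names; the callback receives the temporary file path and returns True when the output looks right.

```python
import os
import tempfile


def atomic_write(path, text, validate=None):
    """Write text to a temp file beside path, validate it, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        if validate is not None and not validate(tmp_path):
            raise ValueError("validation failed; original left untouched")
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        # Clean up the temp file on any failure, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Creating the temporary file in the same directory as the target keeps both on one filesystem, so the original is either fully replaced or left unchanged.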
3. Log Operations
Keep a log of all file operations:
- What was processed
- When it was processed
- Any errors encountered
- Save to file-processing-log.txt
4. Validate Before Processing
Before processing data.csv:
- Show me a sample of the data
- Check the file size
- Validate the format
- Only proceed if everything looks good
5. Clean Up Temporary Files
After processing files:
- Delete temporary files
- Clear cache directories
- Remove duplicate copies
- Report how much space was saved
Understanding File Paths
OpenClaw can work with various path formats:
Absolute paths:
Read /Users/username/Documents/report.txt
Home directory shortcuts:
Read ~/Documents/report.txt
Read ~/Desktop/data.json
Relative paths:
Read ./data/input.csv
Write to ../output/results.json
Wildcards:
List all JSON files in ~/data/*.json
Process all CSV files in ~/downloads/**/*.csv
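The difference between the two wildcard forms is easiest to see with Python's `pathlib`, which uses the same glob conventions: `*.json` matches files directly inside one folder, while `**/*.csv` also descends into subfolders. How OpenClaw itself resolves wildcards is not specified here; `find_files` is a hypothetical helper for experimenting.

```python
from pathlib import Path


def find_files(base, pattern):
    """Return sorted matches for a glob pattern relative to base."""
    return sorted(Path(base).expanduser().glob(pattern))


# Example calls (paths are illustrative):
#   find_files("~/data", "*.json")         -> JSON files directly in ~/data
#   find_files("~/downloads", "**/*.csv")  -> CSV files in ~/downloads and all subfolders
```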
Real-World Examples
Example 1: Data Migration
Migrate data from old system:
1. Read all CSV files from ~/legacy-system/export/
2. Transform the data to match new schema
3. Validate against new system requirements
4. Import to ~/new-system/import/
5. Generate migration report
6. Archive original files to ~/backup/legacy/
Example 2: Report Generation
Generate monthly sales report:
1. Read all sales data from ~/sales/2025-03/
2. Aggregate by product category
3. Calculate totals and averages
4. Create summary charts
5. Generate PDF report
6. Email to management team
7. Archive source data
Example 3: Backup Automation
Automated backup system:
Every day at 2 AM:
1. Scan ~/Documents/ for modified files
2. Copy modified files to ~/backups/daily/[date]/
3. Compress old backups (older than 7 days)
4. Delete backups older than 30 days
5. Send me a summary report
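The retention rule in step 4 can be sketched as below. It assumes each backup is a directory under the backup root and uses the directory's modification time as its age; `prune_backups` is a hypothetical helper, and scheduling the 2 AM run is left to whatever scheduler you use.

```python
import shutil
import time
from pathlib import Path


def prune_backups(backup_root, max_age_days=30, now=None):
    """Delete backup directories whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for entry in sorted(Path(backup_root).expanduser().iterdir()):
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Since deletion is permanent, it is worth printing the candidate list and checking it before enabling the `rmtree` call on real backups.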
What's Next?
- Your First Web Scraper - Collect data from the web
- Chaining Multiple Skills - Create complex workflows
- Custom Skill Development - Build your own skills
Need Help?
- Ask OpenClaw: Describe what you want to do in plain language
- File System Docs - Detailed file operations reference
- Community Examples - Real-world file automation examples
- GitHub Issues - Report problems
Total Time: 60 minutes (the sum of the seven step estimates above)
Difficulty: Beginner
Result: Automating file processing tasks with OpenClaw
Key Takeaways
- Natural Language File Operations: Describe what you want to do with files in plain English
- Built-in File Tools: OpenClaw includes comprehensive file processing capabilities
- Batch Processing: Easily process multiple files at once
- Format Support: Work with CSV, JSON, PDF, images, and more
- Error Recovery: Robust error handling and retry mechanisms
- Automation Ready: Schedule recurring file processing tasks
Next: Try automating your own file processing tasks by asking OpenClaw what you want to do!