Enterprise-grade Salesforce data analysis tool built with Robot Framework and Python.
Supports scanning all queryable sObjects, retrieving accurate record counts, identifying Large Data Volume (LDV) risk areas, and generating structured Excel reports. Used in enterprise environments for data volume analysis, migration planning, and storage optimization across large-scale Salesforce orgs.
The Salesforce Objects Scanner is an automation-driven framework that analyzes your Salesforce org’s data footprint by retrieving accurate record counts across all queryable objects.
Built using Robot Framework + Salesforce CLI, the tool provides a structured and reliable way to assess org size, identify large objects (LDV risks), and support migration planning.
Native Salesforce tools offer limited visibility into object-level data size and lack a unified way to analyze all sObjects. This tool addresses those gaps by delivering comprehensive, structured reporting across your entire org.
Used by SDETs, Salesforce architects, and migration teams for large-scale Salesforce org analysis.
This tool is ideal when you need to:
- Analyze Salesforce org data volume
- Identify large objects (LDV risk)
- Perform storage and usage audits
- Prepare for data migration or sandbox refresh
- Plan data cleanup initiatives
Native Salesforce tools have limitations:
- No unified way to scan all objects
- Limited visibility into object-level data size
- No structured reporting across all sObjects
- Manual and time-consuming analysis
- Designed for safe execution in large orgs without hitting long-running query risks
- LDV (Large Data Volume) detection-ready outputs for identifying high-volume objects
- Scans all queryable objects using
sf sobject list --json - Executes
SELECT COUNT()queries across standard and Tooling API objects - Smart filtering of noisy and unsupported objects
- Dynamic Tooling API discovery with fallback
- Timeout-controlled execution (prevents long-running failures)
- Robust JSON parsing (handles CLI output inconsistencies)
- Structured skip classification:
- COUNT_NOT_SUPPORTED
- REQUIRES_WHERE
- INVALID_TYPE / restricted objects
- Per-object execution time tracking
- Generates structured JSON outputs and Excel report
- Excel report structured for LDV analysis (easy sorting, filtering, pivoting)
- Clear execution summary with success and skip metrics
Supports large-scale Salesforce orgs with hundreds to thousands of objects, ensuring predictable and observable execution behavior.
Execution model:
- Control Layer: Salesforce CLI (metadata + queries)
- Orchestration Layer: Robot Framework (logic + workflow)
- Execution Layer: Process-based execution with timeout protection
- Output Layer: JSON artifacts + Excel report
This design ensures predictable, scalable, and observable execution across large Salesforce orgs. Each execution creates an isolated run folder to ensure clean, reproducible outputs.
- Robot Framework – Serves as the orchestration layer, enabling keyword-driven automation to structure the scanning workflow, manage execution flow, and keep the solution readable and maintainable
- Salesforce CLI (sf) – Acts as the control interface to Salesforce, handling authentication and executing SOQL queries such as SELECT COUNT() to retrieve object-level record counts
- Python – Provides the extensibility layer for building custom utilities, handling JSON parsing, processing results, and enhancing overall automation capabilities
- ExcelWriter utility – Custom-built reporting component that transforms raw scan results into structured Excel reports, making it easy to analyze object distribution and identify LDV (Large Data Volume) risk areas
- Python 3.8+
- Node.js (required for Salesforce CLI)
- Salesforce CLI (
sf) - Robot Framework and dependencies
Salesforce CLI requires Node.js as a runtime dependency.
git clone https://github.com/b-vamsipunnam/salesforce-objects-scanner-tool.git
cd salesforce-objects-scanner-tool
python -m venv venv
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate
pip install -r requirements.txt- Authenticate to your Salesforce org:
sf org login web --alias MyOrg
- Run the scanner by passing the org alias:
robot -d results --variable ORG_ALIAS:MyOrg src/robot/orchestrator/scan.robot
- Check outputs:
JSON files : output/ Excel report : output/SF_Objects_<datetime>.xlsx Logs & reports : results/
salesforce-objects-scanner/
├── .github/
│ ├── workflows/
│ │ └── robot-ci.yml # GitHub Actions CI
│ └── PULL_REQUEST_TEMPLATE.md # Pull request template
├── ci/
│ └── robot/
│ └── smoke.robot
├── output/ # Generated runtime outputs
│ └── Run_<datetime>_<uuid>/ # Isolated folder for each execution
│ ├── json/ # Structured JSON artifacts
│ │ ├── data_<datetime>.json
│ │ ├── tooling_<datetime>.json
│ │ ├── skipped_<datetime>.json
│ │ └── durations_<datetime>.json
│ └── SF_Objects_<datetime>.xlsx # Consolidated Excel report
├── results/ # Robot Framework execution logs
│ ├── log.html
│ ├── output.xml
│ └── report.html
├── src/
│ └── robot/
│ ├── libraries/
│ │ └── ExcelWriter.py
│ ├── orchestrator/
│ │ └── scan.robot
│ └── resources/
│ └── keywords.robot # Core workflow and keywords
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── README.md
├── requirements.txt
└── SECURITY.md
| Variable | Required | Default Value | Description |
|---|---|---|---|
${ORG_ALIAS} |
Yes | — | Salesforce org alias (passed via CLI) |
${INCLUDE_TOOLING} |
No | ${TRUE} | Include Tooling API objects |
${DISCOVER_TOOLING_OBJECTS} |
No | ${TRUE} | Dynamically discover Tooling objects |
${DELAY_SECONDS} |
No | 0.1 | Delay between queries |
${MAX_QUERY_TIMEOUT_SECONDS} |
No | 120 | Per-query timeout (in seconds) |
--variable ORG_ALIAS:<your_org>robot -d results --variable ORG_ALIAS:DeveloperOrg src/robot/orchestrator/scan.robot| File | Purpose |
|---|---|
data_<datetime>.json |
Record counts for standard objects |
tooling_<datetime>.json |
Record counts for tooling objects |
skipped_<datetime>.json |
Skipped objects with reasons |
durations_<datetime>.json |
Execution time per object |
| File | Purpose |
|---|---|
SF_Objects_<datetime>.xlsx |
Consolidated report with record counts, execution times, and LDV analysis support |
Sort Excel by record count to quickly identify top LDV objects.
Typical console output:
Starting for org: DeveloperOrg
Raw objects found: 1500+
After filter: 800+
[Standard]-[1/800] Account: 1245 (t=0.9s)
Querying Tooling API objects...
===== SUMMARY =====
Success(Data): 780
Success(Tooling): 28
Skipped: 22
=====================
- Each object is queried independently with timeout protection
- Skipped objects are classified and logged with clear reasons
- Execution time is tracked per object
- Results are stored in structured JSON and Excel formats
- Validates ORG_ALIAS at runtime and fails fast if not provided
- COUNT() queries may be slow for very large datasets (millions of records)
- Some objects require WHERE clauses and are skipped
- Dependent on Salesforce CLI output format
- Subject to Salesforce governor limits, query timeouts, and API request limits
- Certain objects (e.g., EventLogFile, Big Objects) may have special behaviors
- Very large orgs may take 30-50+ minutes depending on size and network conditions
- Designed for headless execution
- Works with GitHub Actions, Jenkins, Azure DevOps
- No manual setup required
- Run during off-peak hours for large orgs
- Use a sandbox for initial analysis
- Increase
${MAX_QUERY_TIMEOUT_SECONDS}for very large objects - Reduce
${DELAY_SECONDS}carefully to balance speed vs stability
Planned enhancements:
- Parallel execution (Pabot integration for large org performance)
- Resume capability for long scans
- Cross-platform support improvements
- Advanced analytics (top objects, trends)
| Issue | Cause | Fix |
|---|---|---|
| sf not found | CLI not installed | Install Salesforce CLI |
| Org alias not found | Not authenticated | sf org login web |
| JSON parse error | CLI warnings | Safe Parse handles most cases |
| No results / empty output | Auth expired or permissions issue | Re-run sf org login web or check permissions |
- No credentials stored in code
- Uses Salesforce CLI authentication
- Sensitive files should be excluded via .gitignore
Contributions are welcome!
- Open issues for bugs
- Submit pull requests for improvements
- Follow existing coding patterns
- Performance improvements
- Better error classification
- Additional output formats
- Feature enhancements
If this project helps you:
- Star ⭐ the repository
- Share feedback
- Open issues for improvements
Bhimeswara Vamsi Punnam — Lead Software Development Engineer in Test (SDET)
This project is licensed under the MIT License.
See the LICENSE file for full terms and conditions.
