Salesforce Objects Scanner

Enterprise-grade Salesforce data analysis tool built with Robot Framework and Python.
Supports scanning all queryable sObjects, retrieving accurate record counts, identifying Large Data Volume (LDV) risk areas, and generating structured Excel reports. Used in enterprise environments for data volume analysis, migration planning, and storage optimization across large-scale Salesforce orgs.

Introduction

The Salesforce Objects Scanner is an automation-driven framework that analyzes your Salesforce org’s data footprint by retrieving accurate record counts across all queryable objects.

Built using Robot Framework + Salesforce CLI, the tool provides a structured and reliable way to assess org size, identify large objects (LDV risks), and support migration planning.

Native Salesforce tools offer limited visibility into object-level data size and lack a unified way to analyze all sObjects. This tool addresses those gaps by delivering comprehensive, structured reporting across your entire org.

Used by SDETs, Salesforce architects, and migration teams for large-scale Salesforce org analysis.

When to Use This Tool

This tool is ideal when you need to:

Analyze Salesforce org data volume
Identify large objects (LDV risk)
Perform storage and usage audits
Prepare for data migration or sandbox refresh
Plan data cleanup initiatives

Why This Exists

Native Salesforce tools have limitations:

No unified way to scan all objects
Limited visibility into object-level data size
No structured reporting across all sObjects
Manual and time-consuming analysis

Key Features

Designed for safe execution in large orgs: per-query timeouts keep long-running queries from stalling a scan
LDV (Large Data Volume) detection-ready outputs for identifying high-volume objects
Scans all queryable objects discovered via `sf sobject list --json`
Executes `SELECT COUNT()` queries across standard and Tooling API objects
Smart filtering of noisy and unsupported objects
Dynamic Tooling API discovery with fallback
Timeout-controlled execution (prevents long-running failures)
Robust JSON parsing (handles CLI output inconsistencies)
Structured skip classification:
- COUNT_NOT_SUPPORTED
- REQUIRES_WHERE
- INVALID_TYPE / restricted objects
Per-object execution time tracking
Generates structured JSON outputs and Excel report
Excel report structured for LDV analysis (easy sorting, filtering, pivoting)
Clear execution summary with success and skip metrics
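The "robust JSON parsing" feature can be illustrated with a minimal sketch, assuming the usual failure mode: extra text, such as update warnings, printed around the JSON document that `--json` produces. The name `safe_parse_cli_json` is hypothetical and does not come from the project's code. The function tries each `{` as a starting point and asks the standard-library decoder to read one complete object from there. The sample payload is illustrative only.

```python
import json
from typing import Any, Dict, Optional


def safe_parse_cli_json(output: str) -> Optional[Dict[str, Any]]:
    """Return the first complete JSON object found in CLI output, or None.

    Hypothetical helper: assumes warnings or other noise may surround the
    JSON document, so each '{' is tried as a possible starting point.
    """
    decoder = json.JSONDecoder()
    start = output.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value at the start of the string
            # and ignores any text that follows it.
            obj, _ = decoder.raw_decode(output[start:])
            return obj
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            start = output.find("{", start + 1)
    return None


raw = 'Warning: a newer CLI version is available\n{"status": 0, "result": {"totalSize": 1245}}'
print(safe_parse_cli_json(raw)["result"]["totalSize"])
```

Trying every brace costs a little extra work on noisy output. In exchange, a stray `{` inside a warning message does not break parsing.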

Architecture Overview

Supports large-scale Salesforce orgs with hundreds to thousands of objects, ensuring predictable and observable execution behavior.

Execution model:

Control Layer: Salesforce CLI (metadata + queries)
Orchestration Layer: Robot Framework (logic + workflow)
Execution Layer: Process-based execution with timeout protection
Output Layer: JSON artifacts + Excel report

Each execution writes to its own isolated run folder (`output/Run_<datetime>_<uuid>/`), keeping outputs clean and reproducible.
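The process-based execution layer can be sketched in Python as one child process per query, each with a hard timeout. This sketch is illustrative and is not the project's implementation, which lives in Robot Framework. `build_count_command` and `run_with_timeout` are hypothetical names. The `sf data query` options used here (`--query`, `--target-org`, `--use-tooling-api`, `--json`) are standard Salesforce CLI flags.

```python
import subprocess
import time
from typing import List, Tuple


def build_count_command(sobject: str, org_alias: str, tooling: bool = False) -> List[str]:
    """Build an `sf data query` command that counts records for one sObject."""
    cmd = [
        "sf", "data", "query",
        "--query", f"SELECT COUNT() FROM {sobject}",
        "--target-org", org_alias,
        "--json",
    ]
    if tooling:
        # Route the query through the Tooling API instead of the data API.
        cmd.append("--use-tooling-api")
    return cmd


def run_with_timeout(cmd: List[str], timeout_s: float) -> Tuple[str, str, float]:
    """Run cmd in a child process; return (status, output_or_reason, elapsed_s)."""
    start = time.monotonic()
    try:
        # subprocess.run kills the child if it exceeds the timeout.
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "TIMEOUT", f"exceeded {timeout_s}s", time.monotonic() - start
    except FileNotFoundError:
        return "ERROR", f"executable not found: {cmd[0]}", time.monotonic() - start
    status = "OK" if proc.returncode == 0 else "ERROR"
    # Keep the raw output even on failure so the error can be classified later.
    return status, proc.stdout or proc.stderr, time.monotonic() - start
```

Usage would look like `status, out, elapsed = run_with_timeout(build_count_command("Account", "MyOrg"), timeout_s=120)`. Running each query in a separate process means a hung query is killed when its timeout expires, and the rest of the scan continues. On Windows, `sf` is installed as a `.cmd` shim, so you may need to pass the full path from `shutil.which("sf")` as the first element.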

Technology Stack

Robot Framework – Serves as the orchestration layer, enabling keyword-driven automation to structure the scanning workflow, manage execution flow, and keep the solution readable and maintainable
Salesforce CLI (sf) – Acts as the control interface to Salesforce, handling authentication and executing SOQL queries such as SELECT COUNT() to retrieve object-level record counts
Python – Provides the extensibility layer for building custom utilities, handling JSON parsing, processing results, and enhancing overall automation capabilities
ExcelWriter utility – Custom-built reporting component that transforms raw scan results into structured Excel reports, making it easy to analyze object distribution and identify LDV (Large Data Volume) risk areas

Quick Start

Prerequisites

Python 3.8+
Node.js (required for Salesforce CLI)
Salesforce CLI (sf)
Robot Framework and other Python dependencies (listed in `requirements.txt`)

Salesforce CLI requires Node.js as a runtime dependency.

Installation

```
git clone https://github.com/b-vamsipunnam/salesforce-objects-scanner-tool.git
cd salesforce-objects-scanner-tool
python -m venv venv

# Windows:
venv\Scripts\activate

# macOS/Linux:
source venv/bin/activate

pip install -r requirements.txt
```

Run the Scanner

Authenticate to your Salesforce org:
```
sf org login web --alias MyOrg
```

Run the scanner by passing the org alias:

```
robot -d results --variable ORG_ALIAS:MyOrg src/robot/orchestrator/scan.robot
```

Check outputs:

```
JSON files     : output/Run_<datetime>_<uuid>/json/
Excel report   : output/Run_<datetime>_<uuid>/SF_Objects_<datetime>.xlsx
Logs & reports : results/
```
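Every run gets its own `Run_<datetime>_<uuid>` folder, so scripts that post-process results first need to find the newest one. The sketch below assumes the default `output/` root. `latest_run_dir` is a hypothetical helper, and it relies on folder modification time because the exact `<datetime>` format in the folder name is not documented here.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(output_root: str = "output") -> Optional[Path]:
    """Return the most recently modified Run_* folder under output_root, if any."""
    runs = [p for p in Path(output_root).glob("Run_*") if p.is_dir()]
    # max() with default=None handles the "no runs yet" case.
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)
```

For example, `latest_run_dir()` followed by `.glob("SF_Objects_*.xlsx")` would locate the most recent Excel report.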

Project Structure

```
salesforce-objects-scanner-tool/
├── .github/
│   ├── workflows/
│   │   └── robot-ci.yml                                   # GitHub Actions CI
│   └── PULL_REQUEST_TEMPLATE.md                           # Pull request template
├── ci/
│   └── robot/
│       └── smoke.robot
├── docs/
├── output/                                                # Generated runtime outputs
│   └── Run_<datetime>_<uuid>/                             # Isolated folder for each execution
│       ├── json/                                          # Structured JSON artifacts
│       │   ├── data_<datetime>.json
│       │   ├── tooling_<datetime>.json
│       │   ├── skipped_<datetime>.json
│       │   └── durations_<datetime>.json
│       └── SF_Objects_<datetime>.xlsx                     # Consolidated Excel report
├── results/                                               # Robot Framework execution logs
│   ├── log.html
│   ├── output.xml
│   └── report.html
├── src/
│   └── robot/
│       ├── libraries/
│       │   └── ExcelWriter.py
│       ├── orchestrator/
│       │   └── scan.robot
│       └── resources/
│           └── keywords.robot                             # Core workflow and keywords
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── requirements.txt
└── SECURITY.md
```

Configuration

| Variable | Required | Default Value | Description |
|---|---|---|---|
| `${ORG_ALIAS}` | Yes | (none) | Salesforce org alias (passed via CLI) |
| `${INCLUDE_TOOLING}` | No | `${TRUE}` | Include Tooling API objects |
| `${DISCOVER_TOOLING_OBJECTS}` | No | `${TRUE}` | Dynamically discover Tooling API objects |
| `${DELAY_SECONDS}` | No | `0.1` | Delay between queries (in seconds) |
| `${MAX_QUERY_TIMEOUT_SECONDS}` | No | `120` | Per-query timeout (in seconds) |

`${ORG_ALIAS}` must be provided at runtime:

```
--variable ORG_ALIAS:<your_org>
```

Example:

```
robot -d results --variable ORG_ALIAS:DeveloperOrg src/robot/orchestrator/scan.robot
```
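To launch the same scan from a CI wrapper script, you can build the command line programmatically. The sketch below is hypothetical, since the project itself runs `robot` directly. `build_robot_command` mirrors the fail-fast check on `ORG_ALIAS` and passes optional overrides as extra `--variable` options. Values passed with `--variable` reach Robot Framework as strings.

```python
from typing import Dict, List, Optional


def build_robot_command(org_alias: str,
                        overrides: Optional[Dict[str, str]] = None,
                        suite: str = "src/robot/orchestrator/scan.robot",
                        output_dir: str = "results") -> List[str]:
    """Build the robot command line for a scan, failing fast on a missing alias."""
    if not org_alias or not org_alias.strip():
        raise ValueError("ORG_ALIAS is required (use the alias from sf org login)")
    cmd = ["robot", "-d", output_dir, "--variable", f"ORG_ALIAS:{org_alias.strip()}"]
    # Optional overrides such as MAX_QUERY_TIMEOUT_SECONDS or DELAY_SECONDS.
    for name, value in (overrides or {}).items():
        cmd += ["--variable", f"{name}:{value}"]
    cmd.append(suite)
    return cmd
```

For example, `subprocess.run(build_robot_command("DeveloperOrg"), check=True)` would run the same scan as the command above.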

Output Files

JSON Files

| File | Purpose |
|---|---|
| `data_<datetime>.json` | Record counts for standard objects |
| `tooling_<datetime>.json` | Record counts for Tooling API objects |
| `skipped_<datetime>.json` | Skipped objects with reasons |
| `durations_<datetime>.json` | Execution time per object |

Excel Report

| File | Purpose |
|---|---|
| `SF_Objects_<datetime>.xlsx` | Consolidated report with record counts, execution times, and LDV analysis support |

Sort the Excel report by record count to quickly identify the largest objects (LDV candidates).
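The same ranking can be done in code once record counts are loaded into a plain `{object_name: count}` mapping. That mapping is an assumption: the exact schema of `data_<datetime>.json` is not documented here, so loading is left out. `rank_objects` and the one-million-record threshold are hypothetical choices, because there is no single official LDV cutoff. The sample counts are made up for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: an illustrative LDV threshold only; tune it for your org.
LDV_THRESHOLD = 1_000_000


def rank_objects(counts: Dict[str, int], threshold: int = LDV_THRESHOLD,
                 limit: int = 10) -> List[Tuple[str, int, bool]]:
    """Return the `limit` largest objects as (name, count, is_ldv_candidate).

    Ties on count are broken alphabetically so the output is stable.
    """
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(name, count, count >= threshold) for name, count in ranked[:limit]]


sample = {"Account": 1245, "Task": 2_500_000, "Contact": 1245, "Lead": 10}
for name, count, is_ldv in rank_objects(sample, limit=3):
    print(f"{name:<10}{count:>12,}  {'LDV candidate' if is_ldv else ''}")
```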

Example Execution

Typical console output:

```
Starting for org: DeveloperOrg
Raw objects found: 1500+
After filter: 800+

[Standard]-[1/800] Account: 1245 (t=0.9s)

Querying Tooling API objects...

===== SUMMARY =====
Success(Data): 780
Success(Tooling): 28
Skipped: 22
=====================
```

Execution Details

Each object is queried independently with timeout protection
Skipped objects are classified and logged with clear reasons
Execution time is tracked per object
Results are stored in structured JSON and Excel formats
Validates ORG_ALIAS at runtime and fails fast if not provided
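Skip classification can be sketched as an ordered list of substring rules that map a query error to one of the skip reasons listed under Key Features. Only the reason codes come from this README. The matched substrings are assumptions, because the messages Salesforce returns vary by object and API version. `INVALID_TYPE` is a standard Salesforce error code, and "requires a filter" mirrors the implementation-restriction wording some objects return. The `"OTHER"` fallback and the function name `classify_skip` are hypothetical.

```python
from typing import Tuple

# Assumed (skip reason, error substrings) rules, checked in order.
# The substrings are illustrative, not an exhaustive list of real messages.
SKIP_RULES: Tuple[Tuple[str, Tuple[str, ...]], ...] = (
    ("INVALID_TYPE", ("invalid_type",)),
    ("REQUIRES_WHERE", ("requires a filter",)),
    ("COUNT_NOT_SUPPORTED", ("count() is not supported", "does not support count")),
)


def classify_skip(error_text: str) -> str:
    """Map a query error message to a skip reason, or "OTHER" if none match."""
    text = error_text.lower()
    for reason, needles in SKIP_RULES:
        if any(needle in text for needle in needles):
            return reason
    return "OTHER"
```

Keeping the rules in a data structure rather than nested `if` statements makes it easy to extend the classification as new error messages are observed.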

Limitations & Trade-offs

COUNT() queries may be slow for very large datasets (millions of records)
Some objects require WHERE clauses and are skipped
Dependent on Salesforce CLI output format
Subject to Salesforce governor limits, query timeouts, and API request limits
Certain objects (e.g., EventLogFile, Big Objects) may have special behaviors
Very large orgs may take 30-50+ minutes depending on size and network conditions

CI/CD Compatibility

Designed for headless execution
Works with GitHub Actions, Jenkins, Azure DevOps
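No interactive steps required once the runner is authenticated (`sf org login web` needs a browser, so CI pipelines typically use a non-interactive login such as `sf org login jwt`)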

Performance Tips

Run during off-peak hours for large orgs
Use a sandbox for initial analysis
Increase `${MAX_QUERY_TIMEOUT_SECONDS}` for very large objects
Reduce `${DELAY_SECONDS}` cautiously to balance speed against stability
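The effect of the delay on total runtime can be estimated with simple arithmetic. The sketch assumes a sequential scan, since parallel execution is still on the roadmap. Each object then costs roughly its query time plus `${DELAY_SECONDS}`. `estimate_scan_minutes` is a hypothetical helper, and the inputs are illustrative. They borrow the `t=0.9s` shown for Account in the sample output, though real per-object times vary widely.

```python
def estimate_scan_minutes(n_objects: int, avg_query_s: float,
                          delay_s: float = 0.1) -> float:
    """Rough wall-clock estimate for a sequential scan, in minutes.

    Assumes one query per object followed by the configured delay and
    no parallelism.
    """
    if n_objects < 0 or avg_query_s < 0 or delay_s < 0:
        raise ValueError("inputs must be non-negative")
    return n_objects * (avg_query_s + delay_s) / 60.0


# Illustrative inputs: 800 objects at ~0.9 s each with the default 0.1 s delay.
print(f"{estimate_scan_minutes(800, 0.9):.1f} min")
print(f"delay alone: {800 * 0.1:.0f} s")
```

With these inputs the estimate is about 13 minutes, and the delay accounts for 80 seconds of that. Lowering the delay therefore saves relatively little unless the object count is very high.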

Roadmap

Planned enhancements:

Parallel execution (Pabot integration for large org performance)
Resume capability for long scans
Cross-platform support improvements
Advanced analytics (top objects, trends)

Troubleshooting

| Issue | Cause | Fix |
|---|---|---|
| `sf` not found | Salesforce CLI not installed | Install the Salesforce CLI |
| Org alias not found | Not authenticated | `sf org login web --alias <your_org>` |
| JSON parse error | CLI warnings mixed into the output | The scanner's safe JSON parsing handles most cases |
| No results / empty output | Auth expired or permissions issue | Re-run `sf org login web` or check permissions |
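The first two rows of this table can be checked before a scan starts. The sketch below is a hypothetical preflight helper, not part of the project. It looks for `sf` on `PATH` with `shutil.which` and runs `sf org display --target-org <alias> --json`. It assumes that a non-zero exit code means the alias cannot be used. `which` and `run` are injectable so the checks can be tested without the CLI installed.

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def preflight(org_alias: str,
              which: Callable[[str], Optional[str]] = shutil.which,
              run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> List[str]:
    """Return a list of problems found before scanning (an empty list means OK)."""
    problems: List[str] = []
    if not org_alias:
        problems.append("ORG_ALIAS is empty")
    sf_path = which("sf")
    if sf_path is None:
        problems.append("sf not found on PATH: install the Salesforce CLI")
        return problems
    if org_alias:
        # Assumption: a non-zero exit from `sf org display` means the alias
        # is unknown or its authentication can no longer be used.
        result = run([sf_path, "org", "display", "--target-org", org_alias, "--json"],
                     capture_output=True, text=True, timeout=60)
        if result.returncode != 0:
            problems.append(f"org '{org_alias}' is not usable: "
                            f"run sf org login web --alias {org_alias}")
    return problems
```

Using the full path returned by `shutil.which` also avoids the Windows issue where a bare `sf` does not resolve to the CLI's `.cmd` shim.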

Security

No credentials stored in code
Uses Salesforce CLI authentication
Sensitive files should be excluded via .gitignore

Contributing

Contributions are welcome!

Open issues for bugs
Submit pull requests for improvements
Follow existing coding patterns

Areas where contributions are especially welcome:

Performance improvements
Better error classification
Additional output formats
Feature enhancements

Support

If this project helps you:

Star ⭐ the repository
Share feedback
Open issues for improvements

Your support helps improve and maintain the project.

Author

Bhimeswara Vamsi Punnam — Lead Software Development Engineer in Test (SDET)

License

This project is licensed under the MIT License.
See the LICENSE file for full terms and conditions.
