Enterprise Website Experience Capture Engine

Production-ready TypeScript Playwright automation platform for scalable website experience capture. Designed for enterprise-scale operations supporting 500+ websites with high reliability and maintainability.

Architecture

Core Components

BrowserManager: Manages browser lifecycle, launches browser once and reuses across all captures
CaptureEngine: Central capture logic handling desktop and mobile captures with screenshots, HTML, and metadata
NavigationManager: Handles safe navigation with retry logic, networkidle waits, and lazy-loading support
CookieHandler: Enterprise cookie acceptance logic with custom selector support
AutoScroll: Utility for triggering lazy-loaded content on ecommerce sites
OutputValidator: Validates capture output structure and file integrity

Project Structure

experience-capture-engine/
├── config/
│   └── sites.json          # Site configurations
├── src/
│   ├── core/               # Core capture engine components
│   │   ├── browserManager.ts
│   │   ├── captureEngine.ts
│   │   ├── navigation.ts
│   │   ├── cookieHandler.ts
│   │   ├── htmlExporter.ts
│   │   └── metadata.ts
│   ├── workflows/          # Workflow orchestrators
│   │   ├── landingCapture.ts
│   │   └── loginCapture.ts
│   ├── utils/              # Utility modules
│   │   ├── logger.ts
│   │   ├── fileSystem.ts
│   │   ├── retry.ts
│   │   ├── autoScroll.ts
│   │   ├── delays.ts
│   │   ├── outputValidator.ts
│   │   └── validate.ts
│   ├── types/              # TypeScript type definitions
│   │   └── index.ts
│   └── index.ts            # Main entry point
├── output/                 # Capture output directory
│   └── {SiteName}/
│       ├── landing/
│       │   ├── desktop.png
│       │   ├── mobile.png
│       │   ├── page.html
│       │   └── metadata.json
│       └── login/
│           └── ...
└── package.json

Features

Enterprise Hardening

Navigation Stability: Networkidle waits, 60s timeout, 3-attempt retry with exponential backoff
Lazy Loading Support: Auto-scroll utility triggers lazy-loaded content before screenshots
Browser Lifecycle: Single browser instance reused across all captures, isolated contexts per capture
Desktop + Mobile: Full-page screenshots for both desktop (1920x1080) and mobile (iPhone 13)
Cookie Handling: Comprehensive cookie selector support with custom selectors per site
Resilience: Continues execution if one site fails, captures error screenshots
Anti-Bot Measures: Realistic user agents, random delays between actions, configurable headless mode

Scalability Features

Configuration-Driven: JSON-based site configuration with optional custom settings
Parallelization-Ready: Architecture prepared for async processing queues
Output Validation: Automated validation of capture outputs
Structured Logging: Timestamped, color-coded logs with INFO/WARN/ERROR levels

Installation

npm install
npx playwright install chromium

Usage

Basic Commands

# Capture all sites (landing + login pages)
npm run capture

# Capture only landing pages
npm run capture:landing

# Capture only login pages
npm run capture:login

# Validate output structure
npm run validate

Environment Variables

PLAYWRIGHT_HEADLESS: Set to "true" or "false" to control headless mode (default: true)

PLAYWRIGHT_HEADLESS=false npm run capture

Configuration

Site Configuration (`config/sites.json`)

Basic configuration:

[
    {
        "name": "Nike",
        "landingUrl": "https://www.nike.com",
        "loginUrl": "https://www.nike.com/login"
    }
]

Advanced configuration with custom settings:

[
    {
        "name": "Nike",
        "landingUrl": "https://www.nike.com",
        "loginUrl": "https://www.nike.com/login",
        "cookieSelectors": [
            "button#accept-cookies",
            "[data-testid='cookie-accept']"
        ],
        "waitSelectors": [
            ".main-content",
            "#product-grid"
        ],
        "navigationTimeout": 90000,
        "skipMobile": false
    }
]

Configuration Options

name (required): Site name used for output directory
landingUrl (required): Landing page URL
loginUrl (required): Login page URL
cookieSelectors (optional): Custom cookie button selectors (tried before common selectors)
waitSelectors (optional): Selectors to wait for before capture (ensures page is ready)
navigationTimeout (optional): Custom navigation timeout in milliseconds (default: 60000)
skipMobile (optional): Skip mobile capture for this site (default: false)

Adding New Websites

Add site configuration to config/sites.json:

{
    "name": "ExampleSite",
    "landingUrl": "https://example.com",
    "loginUrl": "https://example.com/login"
}

Run capture:

npm run capture

Output will be saved to output/ExampleSite/landing/ and output/ExampleSite/login/

Output Structure

Each capture produces:

desktop.png: Full-page desktop screenshot (1920x1080)
mobile.png: Full-page mobile screenshot (iPhone 13)
page.html: Fully rendered HTML after JavaScript execution
metadata.json: Capture metadata including URLs, timestamps, viewport, user agent

Metadata Example

{
    "siteName": "Nike",
    "type": "landing",
    "originalUrl": "https://www.nike.com",
    "finalUrl": "https://www.nike.com/us/en",
    "timestamp": "2026-02-13T06:30:12.731Z",
    "deviceType": "desktop",
    "captureDate": "February 13, 2026, 06:30 AM",
    "userAgent": "Mozilla/5.0...",
    "viewport": {
        "width": 1920,
        "height": 1080
    }
}

Validation

The validation command checks:

All required files exist (desktop.png, mobile.png, page.html, metadata.json)
Files are not empty
Metadata.json has valid structure
Output directory structure is correct

npm run validate

Scalability Notes

Current Architecture

Sequential processing: Sites are processed one at a time
Browser reuse: Single browser instance shared across all captures
Context isolation: Each capture uses isolated browser context

Future Parallelization

The architecture is prepared for parallelization:

Browser instance can support multiple concurrent contexts
No shared state conflicts between captures
Async processing queue can be added without architectural changes

Performance Considerations

Browser Launch: ~2-3 seconds (one-time cost)
Per Capture: ~15-30 seconds (navigation + stabilization + screenshots)
500 Sites: ~2-4 hours sequential, ~15-30 minutes with 10x parallelization

Troubleshooting

Navigation Timeouts

If a site consistently times out:

Increase navigationTimeout in site config
Check if site requires authentication
Verify URL is accessible

Cookie Banners Not Detected

Add custom cookieSelectors to site config
Check browser console for cookie banner selectors
Cookie acceptance is non-blocking - captures continue even if cookies aren't accepted

Missing Screenshots

Run validation: npm run validate
Check error screenshots: *-error.png files indicate where capture failed
Review logs for specific error messages

Browser Crashes

Ensure Playwright browsers are installed: npx playwright install chromium
Check system resources (memory, CPU)
Try running with PLAYWRIGHT_HEADLESS=false to see browser behavior

Development

Build

npm run build

Lint

npm run lint

Clean

npm run clean

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
config		config
docs		docs
src		src
tests		tests
.eslintrc.json		.eslintrc.json
.gitignore		.gitignore
CLIENT_DELIVERY_MESSAGE.md		CLIENT_DELIVERY_MESSAGE.md
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
playwright.config.ts		playwright.config.ts
tsconfig.json		tsconfig.json

Folders and files

Latest commit

History

Repository files navigation

Enterprise Website Experience Capture Engine

Architecture

Core Components

Project Structure

Features

Enterprise Hardening

Scalability Features

Installation

Usage

Basic Commands

Environment Variables

Configuration

Site Configuration (config/sites.json)

Configuration Options

Adding New Websites

Output Structure

Metadata Example

Validation

Scalability Notes

Current Architecture

Future Parallelization

Performance Considerations

Troubleshooting

Navigation Timeouts

Cookie Banners Not Detected

Missing Screenshots

Browser Crashes

Development

Build

Lint

Clean

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Site Configuration (`config/sites.json`)

Packages