Production-ready TypeScript Playwright automation platform for scalable website experience capture. Designed for enterprise-scale operations supporting 500+ websites with high reliability and maintainability.
- BrowserManager: Manages the browser lifecycle; launches the browser once and reuses it across all captures
- CaptureEngine: Central capture logic for desktop and mobile captures (screenshots, HTML, and metadata)
- NavigationManager: Handles safe navigation with retry logic, `networkidle` waits, and lazy-loading support
- CookieHandler: Cookie-banner acceptance logic with support for per-site custom selectors
- AutoScroll: Utility that triggers lazy-loaded content on e-commerce sites
- OutputValidator: Validates capture output structure and file integrity
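A minimal sketch of the launch-once, context-per-capture pattern that `BrowserManager` describes. The class shape, `withContext`, and `resolveHeadless` are hypothetical names, not the project's actual API. Playwright types are replaced by small structural interfaces so the sketch has no dependencies; a real Playwright `Browser` should satisfy them.

```typescript
// Structural stand-ins for the only Playwright methods this sketch uses
// (Browser.newContext, Browser.close, BrowserContext.close).
interface ContextLike {
  close(): Promise<void>;
}
interface BrowserLike<C extends ContextLike, O> {
  newContext(options?: O): Promise<C>;
  close(): Promise<void>;
}

// Hypothetical helper: PLAYWRIGHT_HEADLESS="false" disables headless mode;
// anything else (including unset) keeps the default of true.
function resolveHeadless(value: string | undefined): boolean {
  return value?.trim().toLowerCase() !== "false";
}

class BrowserManager<C extends ContextLike, O> {
  private browser: BrowserLike<C, O> | null = null;
  private launching: Promise<BrowserLike<C, O>> | null = null;

  constructor(private readonly launch: () => Promise<BrowserLike<C, O>>) {}

  // Launches on the first call; later (or concurrent) calls reuse the same browser.
  async getBrowser(): Promise<BrowserLike<C, O>> {
    if (this.browser) return this.browser;
    if (!this.launching) {
      this.launching = this.launch().then((browser) => {
        this.browser = browser;
        return browser;
      });
    }
    return this.launching;
  }

  // Runs one capture in a fresh, isolated context and always closes it,
  // even when the capture throws.
  async withContext<T>(options: O, capture: (context: C) => Promise<T>): Promise<T> {
    const browser = await this.getBrowser();
    const context = await browser.newContext(options);
    try {
      return await capture(context);
    } finally {
      await context.close();
    }
  }

  // Closes the shared browser once all captures are done.
  async close(): Promise<void> {
    const browser = this.browser;
    this.browser = null;
    this.launching = null;
    if (browser) await browser.close();
  }
}
```

With Playwright installed, the launcher would be something like `new BrowserManager(() => chromium.launch({ headless: resolveHeadless(process.env.PLAYWRIGHT_HEADLESS) }))`, and a mobile capture could pass `devices["iPhone 13"]` as the context options. Closing each context in `finally` is what keeps cookies and storage from leaking between captures.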
```
experience-capture-engine/
├── config/
│   └── sites.json          # Site configurations
├── src/
│   ├── core/               # Core capture engine components
│   │   ├── browserManager.ts
│   │   ├── captureEngine.ts
│   │   ├── navigation.ts
│   │   ├── cookieHandler.ts
│   │   ├── htmlExporter.ts
│   │   └── metadata.ts
│   ├── workflows/          # Workflow orchestrators
│   │   ├── landingCapture.ts
│   │   └── loginCapture.ts
│   ├── utils/              # Utility modules
│   │   ├── logger.ts
│   │   ├── fileSystem.ts
│   │   ├── retry.ts
│   │   ├── autoScroll.ts
│   │   ├── delays.ts
│   │   ├── outputValidator.ts
│   │   └── validate.ts
│   ├── types/              # TypeScript type definitions
│   │   └── index.ts
│   └── index.ts            # Main entry point
├── output/                 # Capture output directory
│   └── {SiteName}/
│       ├── landing/
│       │   ├── desktop.png
│       │   ├── mobile.png
│       │   ├── page.html
│       │   └── metadata.json
│       └── login/
│           └── ...
└── package.json
```
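Of the utilities in the tree, `utils/autoScroll.ts` has the most direct effect on screenshot completeness. The sketch below shows one common way to trigger lazy loading: scroll in fixed steps, pause, and re-read the page height on every step because it can grow as content loads. The `ScrollDriver` interface, option names, and default values are assumptions; the real utility may work differently (for example, entirely inside `page.evaluate`).

```typescript
// Hypothetical abstraction over the three page operations auto-scrolling needs.
interface ScrollDriver {
  scrollHeight(): Promise<number>; // current total page height in px
  scrollBy(pixels: number): Promise<void>; // scroll down by this many px
  pause(ms: number): Promise<void>; // give lazy loaders time to fire
}

// Scrolls until the bottom is reached or maxSteps is hit; returns the step count.
async function autoScroll(
  driver: ScrollDriver,
  { stepPx = 800, pauseMs = 250, maxSteps = 50 } = {},
): Promise<number> {
  let position = 0;
  let steps = 0;
  // The height is re-read each iteration because lazy-loaded content can
  // make the page taller; maxSteps guards against infinite-scroll pages.
  while (steps < maxSteps && position < (await driver.scrollHeight())) {
    await driver.scrollBy(stepPx);
    position += stepPx;
    steps++;
    await driver.pause(pauseMs);
  }
  return steps;
}
```

With Playwright, the driver could be built from `page.evaluate(() => document.body.scrollHeight)`, `page.mouse.wheel(0, px)`, and `page.waitForTimeout(ms)`. Scrolling back to the top (for example with `page.evaluate(() => window.scrollTo(0, 0))`) before a full-page screenshot is a typical follow-up, so sticky headers render in their initial position.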
- Navigation Stability: `networkidle` waits, 60 s default timeout, 3-attempt retry with exponential backoff
- Lazy Loading Support: Auto-scroll utility triggers lazy-loaded content before screenshots
- Browser Lifecycle: Single browser instance reused across all captures, isolated contexts per capture
- Desktop + Mobile: Full-page screenshots for both desktop (1920x1080) and mobile (iPhone 13)
- Cookie Handling: Comprehensive cookie selector support with custom selectors per site
- Resilience: Continues execution if one site fails, captures error screenshots
- Anti-Bot Measures: Realistic user agents, random delays between actions, configurable headless mode
- Configuration-Driven: JSON-based site configuration with optional custom settings
- Parallelization-Ready: Architecture prepared for async processing queues
- Output Validation: Automated validation of capture outputs
- Structured Logging: Timestamped, color-coded logs with INFO/WARN/ERROR levels
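The navigation-stability bullet (3 attempts with exponential backoff) can be sketched as a generic retry helper, roughly the role `utils/retry.ts` plays in the tree. `withRetry`, `RetryOptions`, and the injectable `sleep` are hypothetical names, and the 1-second base delay in the usage note is an assumption.

```typescript
interface RetryOptions {
  attempts: number; // total attempts, including the first
  baseDelayMs: number; // wait before the 2nd attempt; doubles after each failure
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before retry number `retryIndex` (0-based): baseDelayMs * 2^retryIndex.
function backoffDelay(baseDelayMs: number, retryIndex: number): number {
  return baseDelayMs * 2 ** retryIndex;
}

async function withRetry<T>(
  operation: (attempt: number) => Promise<T>,
  { attempts, baseDelayMs, sleep = defaultSleep }: RetryOptions,
): Promise<T> {
  if (attempts < 1) throw new RangeError("attempts must be >= 1");
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      // No wait after the final attempt; the last error is rethrown instead.
      if (attempt < attempts) await sleep(backoffDelay(baseDelayMs, attempt - 1));
    }
  }
  throw lastError;
}
```

For example, `await withRetry(() => page.goto(url, { waitUntil: "networkidle", timeout: 60_000 }), { attempts: 3, baseDelayMs: 1000 })` would wait 1 s and then 2 s between the three attempts before giving up.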
```bash
npm install
npx playwright install chromium
```

```bash
# Capture all sites (landing + login pages)
npm run capture

# Capture only landing pages
npm run capture:landing

# Capture only login pages
npm run capture:login

# Validate output structure
npm run validate
```

- `PLAYWRIGHT_HEADLESS`: Set to `"true"` or `"false"` to control headless mode (default: `true`)

```bash
PLAYWRIGHT_HEADLESS=false npm run capture
```

Basic configuration (`config/sites.json`):

```json
[
  {
    "name": "Nike",
    "landingUrl": "https://www.nike.com",
    "loginUrl": "https://www.nike.com/login"
  }
]
```

Advanced configuration with custom settings:
```json
[
  {
    "name": "Nike",
    "landingUrl": "https://www.nike.com",
    "loginUrl": "https://www.nike.com/login",
    "cookieSelectors": [
      "button#accept-cookies",
      "[data-testid='cookie-accept']"
    ],
    "waitSelectors": [
      ".main-content",
      "#product-grid"
    ],
    "navigationTimeout": 90000,
    "skipMobile": false
  }
]
```

- `name` (required): Site name, used as the output directory name
- `landingUrl` (required): Landing page URL
- `loginUrl` (required): Login page URL
- `cookieSelectors` (optional): Custom cookie-button selectors (tried before the common selectors)
- `waitSelectors` (optional): Selectors to wait for before capture (ensures the page is ready)
- `navigationTimeout` (optional): Custom navigation timeout in milliseconds (default: 60000)
- `skipMobile` (optional): Skip mobile capture for this site (default: false)
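The field list above maps naturally onto a TypeScript type plus a small loader that rejects malformed entries and fills in the documented defaults. This is a hypothetical sketch: `SiteConfig` mirrors the documented fields, but `resolveSiteConfig` and its error messages are illustrative, not the project's `validate.ts`.

```typescript
// Shape of one entry in config/sites.json, as documented above.
interface SiteConfig {
  name: string;
  landingUrl: string;
  loginUrl: string;
  cookieSelectors?: string[];
  waitSelectors?: string[];
  navigationTimeout?: number;
  skipMobile?: boolean;
}

// The same entry with every optional field filled in.
type ResolvedSiteConfig = Required<SiteConfig>;

const DEFAULT_NAVIGATION_TIMEOUT_MS = 60_000;

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((s) => typeof s === "string");
const isPositiveNumber = (v: unknown): v is number =>
  typeof v === "number" && Number.isFinite(v) && v > 0;
const isBoolean = (v: unknown): v is boolean => typeof v === "boolean";

// Returns the fallback when a field is absent; throws when it is present but malformed.
function optionalField<T>(value: unknown, check: (v: unknown) => v is T, field: string, fallback: T): T {
  if (value === undefined) return fallback;
  if (!check(value)) throw new Error(`Invalid value for "${field}"`);
  return value;
}

function requiredUrl(cfg: Record<string, unknown>, field: "landingUrl" | "loginUrl"): string {
  const value = cfg[field];
  if (typeof value !== "string") throw new Error(`Missing required field "${field}"`);
  new URL(value); // throws a TypeError for malformed URLs
  return value;
}

// Hypothetical validator: checks required fields and applies the documented defaults.
function resolveSiteConfig(raw: unknown): ResolvedSiteConfig {
  if (typeof raw !== "object" || raw === null) throw new Error("Site config must be an object");
  const cfg = raw as Record<string, unknown>;
  const name = cfg.name;
  if (typeof name !== "string" || name.trim() === "") {
    throw new Error('Missing required field "name"');
  }
  return {
    name,
    landingUrl: requiredUrl(cfg, "landingUrl"),
    loginUrl: requiredUrl(cfg, "loginUrl"),
    cookieSelectors: optionalField(cfg.cookieSelectors, isStringArray, "cookieSelectors", []),
    waitSelectors: optionalField(cfg.waitSelectors, isStringArray, "waitSelectors", []),
    navigationTimeout: optionalField(cfg.navigationTimeout, isPositiveNumber, "navigationTimeout", DEFAULT_NAVIGATION_TIMEOUT_MS),
    skipMobile: optionalField(cfg.skipMobile, isBoolean, "skipMobile", false),
  };
}
```

Loading the whole file could then be `(JSON.parse(readFileSync("config/sites.json", "utf8")) as unknown[]).map((entry) => resolveSiteConfig(entry))`, using `readFileSync` from `node:fs`. Failing fast on a bad entry surfaces typos before a long capture run starts.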
- Add a site entry to the array in `config/sites.json`:

```json
{
  "name": "ExampleSite",
  "landingUrl": "https://example.com",
  "loginUrl": "https://example.com/login"
}
```

- Run the capture:

```bash
npm run capture
```

- Output will be saved to `output/ExampleSite/landing/` and `output/ExampleSite/login/`
Each capture produces:
- desktop.png: Full-page desktop screenshot (1920x1080)
- mobile.png: Full-page mobile screenshot (iPhone 13)
- page.html: Fully rendered HTML after JavaScript execution
- metadata.json: Capture metadata including URLs, timestamps, viewport, user agent
```json
{
  "siteName": "Nike",
  "type": "landing",
  "originalUrl": "https://www.nike.com",
  "finalUrl": "https://www.nike.com/us/en",
  "timestamp": "2026-02-13T06:30:12.731Z",
  "deviceType": "desktop",
  "captureDate": "February 13, 2026, 06:30 AM",
  "userAgent": "Mozilla/5.0...",
  "viewport": {
    "width": 1920,
    "height": 1080
  }
}
```

The validation command checks:
- All required files exist (desktop.png, mobile.png, page.html, metadata.json)
- Files are not empty
- `metadata.json` has a valid structure
- Output directory structure is correct
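The `metadata.json` structure check in the list above could be approximated with a small validator that returns human-readable errors. This is a hypothetical sketch: `validateMetadata` is not the project's API, the field list comes from the example `metadata.json` shown earlier, and the allowed `type` and `deviceType` values (`landing`/`login`, `desktop`/`mobile`) are assumptions inferred from the output layout.

```typescript
const REQUIRED_STRING_FIELDS = ["siteName", "originalUrl", "finalUrl", "timestamp", "captureDate", "userAgent"];

const isNonEmptyString = (v: unknown): v is string => typeof v === "string" && v.length > 0;
const isPositiveInt = (v: unknown): boolean => Number.isInteger(v) && (v as number) > 0;

// Returns a list of problems; an empty list means the metadata looks valid.
function validateMetadata(value: unknown): string[] {
  if (typeof value !== "object" || value === null) return ["metadata must be a JSON object"];
  const m = value as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of REQUIRED_STRING_FIELDS) {
    if (!isNonEmptyString(m[key])) errors.push(`"${key}" must be a non-empty string`);
  }
  // Assumed value sets, inferred from the landing/login output folders.
  if (m.type !== "landing" && m.type !== "login") errors.push('"type" must be "landing" or "login"');
  if (m.deviceType !== "desktop" && m.deviceType !== "mobile") {
    errors.push('"deviceType" must be "desktop" or "mobile"');
  }
  if (isNonEmptyString(m.timestamp) && Number.isNaN(Date.parse(m.timestamp))) {
    errors.push('"timestamp" must be a parseable date');
  }
  const vp = m.viewport;
  const viewportOk =
    typeof vp === "object" && vp !== null &&
    isPositiveInt((vp as Record<string, unknown>).width) &&
    isPositiveInt((vp as Record<string, unknown>).height);
  if (!viewportOk) errors.push('"viewport" must have positive integer width and height');
  return errors;
}
```

The file-existence and non-empty checks would typically sit alongside this, for example with `statSync(path).size > 0` from `node:fs`. Returning all errors at once, rather than stopping at the first, makes a report across hundreds of sites easier to act on.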
Run the validation with:

```bash
npm run validate
```

- Sequential processing: Sites are processed one at a time
- Browser reuse: Single browser instance shared across all captures
- Context isolation: Each capture uses isolated browser context
The architecture is prepared for parallelization:
- Browser instance can support multiple concurrent contexts
- No shared state conflicts between captures
- Async processing queue can be added without architectural changes
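Since one browser can host several contexts at once, a bounded worker pool is one way to add the async processing queue mentioned above. In this sketch `runWithConcurrency` is a hypothetical helper, not part of the project; each worker call would capture one site (for example in its own `browser.newContext()`), and a failed site is recorded rather than aborting the run, matching the resilience behavior described in the feature list.

```typescript
type Outcome<T> = { ok: true; value: T } | { ok: false; error: unknown };

// Runs `worker` over `items` with at most `limit` calls in flight.
// Results keep the input order; failures are captured, not rethrown.
async function runWithConcurrency<I, T>(
  items: readonly I[],
  limit: number,
  worker: (item: I, index: number) => Promise<T>,
): Promise<Outcome<T>[]> {
  if (!Number.isInteger(limit) || limit < 1) throw new RangeError("limit must be a positive integer");
  const results: Outcome<T>[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain.
  // `next++` is safe because JavaScript runs this code on a single thread.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(items[index], index) };
      } catch (error) {
        // One failing site is recorded and does not stop the other runners.
        results[index] = { ok: false, error };
      }
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, () => runner()));
  return results;
}
```

A limit of 10 corresponds to the "10x parallelization" scenario in the timing estimates; the real speedup depends on CPU, memory, and how aggressively target sites rate-limit.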
- Browser Launch: ~2-3 seconds (one-time cost)
- Per Capture: ~15-30 seconds (navigation + stabilization + screenshots)
- 500 Sites: ~2-4 hours sequential, ~15-30 minutes with 10x parallelization
If a site consistently times out:
- Increase `navigationTimeout` in the site config
- Check whether the site requires authentication
- Verify the URL is accessible
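Because `navigationTimeout` and `waitSelectors` are the main levers for timeouts, here is a minimal sketch of how they might feed into navigation. `navigateForCapture` and the structural `NavigablePage` type are assumptions; the Playwright calls it mirrors (`page.goto` with `waitUntil: "networkidle"` and `timeout`, `page.waitForSelector`, `page.url()`) are real, but this is not the project's `navigation.ts`.

```typescript
// Structural subset of Playwright's Page used here.
interface NavigablePage {
  goto(url: string, options: { waitUntil: "networkidle"; timeout: number }): Promise<unknown>;
  waitForSelector(selector: string, options: { timeout: number }): Promise<unknown>;
  url(): string;
}

// Hypothetical subset of the per-site settings documented in the config section.
interface NavigationSettings {
  navigationTimeout?: number; // ms; defaults to 60000
  waitSelectors?: string[]; // elements that must appear before capture
}

// Navigates, waits for readiness, and returns the final URL after redirects.
// Any rejection (timeout, missing selector) propagates so a retry wrapper can catch it.
async function navigateForCapture(page: NavigablePage, url: string, site: NavigationSettings): Promise<string> {
  const timeout = site.navigationTimeout ?? 60_000;
  // Playwright's "networkidle": no network connections for at least 500 ms.
  await page.goto(url, { waitUntil: "networkidle", timeout });
  // Each selector gets the same timeout budget in this sketch.
  for (const selector of site.waitSelectors ?? []) {
    await page.waitForSelector(selector, { timeout });
  }
  return page.url();
}
```

The returned URL is what would populate `finalUrl` in `metadata.json`. Note that pages with constant background traffic (analytics beacons, chat widgets) may never reach `networkidle`, which is one reason a per-site `waitSelectors` list is useful.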
If a cookie banner is not dismissed:
- Add custom `cookieSelectors` to the site config
- Check the browser console for the cookie banner's selectors
- Cookie acceptance is non-blocking: captures continue even if cookies aren't accepted
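A minimal sketch of non-blocking cookie acceptance in which site-specific `cookieSelectors` are tried before a common fallback list. `acceptCookies`, the structural `PageLike` type, the 2-second per-selector timeout, and the fallback selectors are all hypothetical; the project's actual selector list is not shown in this README.

```typescript
// Structural subset of Playwright's Page/Locator used here.
interface ClickableLike {
  click(options?: { timeout?: number }): Promise<void>;
}
interface PageLike {
  locator(selector: string): { first(): ClickableLike };
}

// Hypothetical fallback list, for illustration only.
const COMMON_COOKIE_SELECTORS = [
  "#onetrust-accept-btn-handler",
  "button:has-text('Accept all')",
  "button:has-text('Accept')",
];

// Tries site-specific selectors first, then the common ones. Never throws:
// returns the selector that was clicked, or null if no banner was dismissed.
async function acceptCookies(
  page: PageLike,
  customSelectors: readonly string[] = [],
  perSelectorTimeoutMs = 2000,
): Promise<string | null> {
  for (const selector of [...customSelectors, ...COMMON_COOKIE_SELECTORS]) {
    try {
      await page.locator(selector).first().click({ timeout: perSelectorTimeoutMs });
      return selector;
    } catch {
      // Not found, not visible, or not clickable in time: try the next selector.
    }
  }
  return null;
}
```

A real Playwright `Page` should fit `PageLike`, since `page.locator(selector).first().click({ timeout })` exists with that shape. Because each miss can cost up to the per-selector timeout, a long fallback list adds noticeable time on pages that have no banner at all.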
If output is missing or incomplete:
- Run validation: `npm run validate`
- Check error screenshots: `*-error.png` files indicate where a capture failed
- Review the logs for specific error messages
If the browser fails to start or behaves unexpectedly:
- Ensure Playwright browsers are installed: `npx playwright install chromium`
- Check system resources (memory, CPU)
- Try running with `PLAYWRIGHT_HEADLESS=false` to watch the browser's behavior
Development commands:

```bash
npm run build
npm run lint
npm run clean
```

License: MIT