# Simple Scraper API

The Simple Scraper API is a modular, extensible FastAPI application for scraping job listings (currently from Indeed.com) and saving results to CSV and Supabase. The architecture supports easy addition of new job board scrapers via a registry system.
## Features

- Scrapes job listings from Indeed.com (extensible to other boards)
- Collects job title, company name, location, salary, benefits, description, employment type
- Saves data as CSV and (optionally) uploads to Supabase
- Extensible: add new scrapers by subclassing and registering
- Modular codebase with clear separation of scraping, data handling, and database logic
## Project Structure

```
SimpleScraper-API/
│
├── main.py                 # FastAPI entry point, uses scraper registry
├── scraper_registry.py     # Registry for all scraper classes
├── requirements.txt        # Dependencies
├── README.md               # Project documentation
├── Dockerfile              # Docker build file
├── docker-compose.yml      # Docker Compose config
├── .env                    # Environment variables
├── scrapers/               # Scraper classes for each job board
│   ├── __init__.py
│   ├── base_scraper.py     # Abstract base class for all scrapers
│   └── indeed_scraper.py   # Indeed scraper implementation
└── utils/                  # Utility modules
    ├── __init__.py
    ├── csv_handler.py      # Utility for saving CSV files
    ├── driver_utils.py     # Selenium driver setup utility
    ├── scrape_utils.py     # Reusable scraping helpers
    └── supabase_utils.py   # Supabase upload utility
```
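The registry pattern hinges on the abstract base class in `scrapers/base_scraper.py`. A minimal sketch of what that contract might look like (the actual implementation in this repo may differ; `DummyScraper` is purely illustrative):

```python
from abc import ABC, abstractmethod


class BaseJobScraper(ABC):
    """Abstract base class every job-board scraper must subclass."""

    @abstractmethod
    def get_job_links(self, job_title: str, location: str) -> list[str]:
        """Return a list of job-posting URLs for the given search."""

    @abstractmethod
    def extract_job_details(self, job_links: list[str], file_path: str) -> list[dict]:
        """Visit each link, extract job fields, and persist them to file_path."""


# A trivial subclass, just to show the contract a real scraper must satisfy:
class DummyScraper(BaseJobScraper):
    def get_job_links(self, job_title, location):
        return [f"https://example.com/{job_title}-{location}"]

    def extract_job_details(self, job_links, file_path):
        return [{"POSITION": "Demo", "JOB LINK": link} for link in job_links]
```

Because the base class uses `abc.abstractmethod`, a subclass that forgets to implement either method fails at instantiation time rather than mid-scrape.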
## Tech Stack

- Backend Framework: FastAPI
- Web Scraping: Selenium, BeautifulSoup
- Data Processing: Pandas
- Database: Supabase
- Other Tools: WebDriver Manager, Uvicorn
## Prerequisites

- Python 3.10+
- Google Chrome and ChromeDriver (managed automatically by WebDriver Manager)
- A Supabase account with a database table set up:
  - Table name: `Job_listing`
  - Columns:
    - POSITION
    - COMPANY NAME
    - LOCATION
    - SALARY
    - JOB LINK
    - BENEFITS
    - DESCRIPTION
    - EMPLOYMENT TYPE
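A scraped record destined for the `Job_listing` table is then a dict keyed by those column names. A hypothetical example row (values taken from the sample output later in this README):

```python
# One scraped job, keyed by the Supabase column names listed above
job_row = {
    "POSITION": "Data Analyst",
    "COMPANY NAME": "TechCorp",
    "LOCATION": "Remote, USA",
    "SALARY": "$80,000",
    "JOB LINK": "https://job-link.com/1",
    "BENEFITS": "Health, 401k",
    "DESCRIPTION": "Job details",
    "EMPLOYMENT TYPE": "Full-time",
}

# The upload helper would then insert a list of such rows, roughly:
#   supabase.table("Job_listing").insert([job_row]).execute()
```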
## Installation & Usage

1. Clone the repository:

   ```bash
   git clone https://github.com/your-repo/job_scraper.git
   cd job_scraper
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure Supabase (optional):
   - Edit `.env` or set the `SUPER_BASE_URL` and `SUPER_BASE_KEY` environment variables.
   - You can also pass credentials directly to `upload_to_supabase` in `utils/supabase_utils.py`.

4. Run the API:

   ```bash
   uvicorn main:app --reload
   ```

5. Access the API docs (FastAPI serves interactive documentation at `/docs` by default).
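The credential fallback described in the Supabase step could be implemented along these lines (the helper name `get_supabase_credentials` is illustrative, not taken from the repo):

```python
import os


def get_supabase_credentials(url=None, key=None):
    """Resolve Supabase credentials: explicit arguments win, then env vars."""
    url = url or os.environ.get("SUPER_BASE_URL")
    key = key or os.environ.get("SUPER_BASE_KEY")
    if not url or not key:
        raise RuntimeError("Supabase credentials are not configured")
    return url, key
```

Raising early keeps a misconfigured deployment from failing silently halfway through an upload.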
## API Endpoint

- Parameters:
  - `job_title` (string, required): the job title to search for
  - `location` (string, required): the job location
- Description: scrapes jobs using all registered scrapers, saves the results to CSV, and uploads them to Supabase.
- Response:

  ```json
  {
    "results": [
      { "scraper": "IndeedJobScraper", "uploaded_data": [ ... ] },
      { "scraper": "OtherScraper", "uploaded_data": [ ... ] }
    ]
  }
  ```
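Client code can walk that response shape like so (field names taken from the sample response above; the payload contents are illustrative):

```python
# A response in the shape documented above
response = {
    "results": [
        {"scraper": "IndeedJobScraper", "uploaded_data": [{"POSITION": "Data Analyst"}]},
        {"scraper": "OtherScraper", "uploaded_data": []},
    ]
}

# Count uploaded rows per scraper
counts = {r["scraper"]: len(r["uploaded_data"]) for r in response["results"]}
print(counts)  # {'IndeedJobScraper': 1, 'OtherScraper': 0}
```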
## Adding a New Scraper

To add support for another job board:

- Create a new class that inherits from `BaseJobScraper` and implements `get_job_links` and `extract_job_details`.
- Add an instance of your new scraper to the `SCRAPER_REGISTRY` list in `scraper_registry.py`.
- Your scraper will automatically be used by the API.

Example:

```python
# my_scraper.py
from scrapers.base_scraper import BaseJobScraper

class MyJobBoardScraper(BaseJobScraper):
    def get_job_links(self, job_title, location):
        # ...implementation...
        pass

    def extract_job_details(self, job_links, file_path):
        # ...implementation...
        pass
```

```python
# scraper_registry.py
from scrapers.indeed_scraper import IndeedJobScraper
from my_scraper import MyJobBoardScraper

SCRAPER_REGISTRY = [IndeedJobScraper(), MyJobBoardScraper()]
```

## Output

The job data is saved locally as `output.csv` with the following structure:
| POSITION | COMPANY NAME | LOCATION | SALARY | JOB LINK | BENEFITS | DESCRIPTION | EMPLOYMENT TYPE |
|---|---|---|---|---|---|---|---|
| Data Analyst | TechCorp | Remote, USA | $80,000 | https://job-link.com/1 | Health, 401k | Job details | Full-time |
| Junior Analyst | BizSolutions | New York, USA | $50,000 | https://job-link.com/2 | None | Job details | Part-time |
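Since Pandas is already part of the stack, `output.csv` can be inspected or post-processed directly (the sample below feeds an in-memory CSV so it runs anywhere; in practice you would call `pd.read_csv("output.csv")`):

```python
import io

import pandas as pd

# Stand-in for output.csv, using the same columns and a row from the table above
sample = io.StringIO(
    "POSITION,COMPANY NAME,LOCATION,SALARY\n"
    'Data Analyst,TechCorp,"Remote, USA","$80,000"\n'
)
df = pd.read_csv(sample)

# Filter for remote positions
remote = df[df["LOCATION"].str.contains("Remote")]
print(len(remote))  # 1
```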
## Error Handling

- If no job links are found, the API returns:

  ```json
  { "detail": "No job links found." }
  ```

- If scraping fails or data extraction is incomplete:

  ```json
  { "detail": "No job data could be extracted." }
  ```

## Deployment

Run the app locally with Uvicorn:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000
```

Create a Dockerfile with the following content:
```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY . /app
RUN pip install --no-cache-dir -r requirements.txt
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Build and run the Docker image:
```bash
docker build -t job_scraper .
docker run -p 8000:8000 job_scraper
```

## Future Improvements

- Add support for other job boards (e.g., LinkedIn, Glassdoor) by creating new scraper classes
- Implement user authentication for secure access
- Schedule automated scraping tasks using a job scheduler like Celery
- Optimize scraping logic to handle large-scale data efficiently
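As a lightweight stopgap before wiring in Celery, the scheduling idea can be sketched with the standard-library `sched` module (`run_all_scrapers` is a placeholder, not a function from this repo):

```python
import sched
import time


def run_all_scrapers():
    # Placeholder: in the real app this would iterate SCRAPER_REGISTRY
    print("running all registered scrapers")


scheduler = sched.scheduler(time.time, time.sleep)

# Queue a run 24 hours from now; calling scheduler.run() would then block
# until the event fires. A production setup would use Celery beat or cron.
scheduler.enter(24 * 60 * 60, 1, run_all_scrapers)
```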
## Contact

Joe - JoeHardey@proton.me
## License

This project is licensed under the MIT License.