LDFLK/datasets

🇱🇰 Sri Lanka Government Statistics Datasets (2019–2024)

Clean, structured datasets from Sri Lankan government sources

📊 What's Inside

5 Years of Data | 4 Key Ministries | Multiple Departments

  • Foreign Affairs & Relations
  • Immigration & Emigration
  • Foreign Employment
  • Tourism Development

🗂️ Data Categories

  • 🏛️ Foreign Affairs: Diplomatic missions, communications, organizational data
  • 🛂 Immigration: Asylum seekers, visas, passports, refugee statistics
  • 💼 Employment: Worker complaints, remittances, registration data, legal performance
  • 🏖️ Tourism: Arrivals, accommodations, occupancy rates, revenue statistics

📋 Data Matrix

Note

🚨 Action Required: View the Missing Datasets Report to see which datasets need to be populated.

Minister of Foreign Relations / Foreign Affairs

| Dataset Name | Years Available | Collection Status | Verification Status |
| --- | --- | --- | --- |
| Media Releases from Ministry of Foreign Affairs | 2020, 2021, 2022, 2023 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023) |
| Cadre Management of Ministry of Foreign Relations | 2020, 2022 | ✅ Collected | ✅ Verified (2020, 2022) |

Minister of Tourism / Tourism and Civil Aviation / Tourism and Lands

| Dataset Name | Years Available | Collection Status | Verification Status |
| --- | --- | --- | --- |
| Accommodations by District | 2020, 2021, 2022, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023, 2024) |
| Accommodations by Province | 2020, 2021, 2022, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023, 2024) |
| Annual Tourism Receipts | 2020, 2021, 2022, 2023 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023) ⚠️ Unavailable (2024) |
| Arrivals by Age | 2020, 2021, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2023, 2024) |
| Arrivals by Carrier | 2020, 2021, 2022, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023, 2024) |
| Arrivals by Country | 2020, 2021, 2022 | ✅ Collected | ✅ Verified (2020, 2021, 2022) ⚠️ Unavailable (2023) ⚠️ Partial (2024) |
| Arrivals by Month | 2020, 2021, 2022, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023, 2024) |
| Arrivals by Port | 2020, 2021, 2022, 2023 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023) ⚠️ Unavailable (2024) |
| Arrivals by Purpose | 2020, 2021, 2022, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023, 2024) |
| Arrivals by Sex | 2020, 2021, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2023, 2024) |
| Arrivals by Month vs Country | 2020, 2021, 2022, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023, 2024) |
| Location vs Revenue vs Visitors Count | 2020, 2021, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2023, 2024) |
| Occupancy Rate by District | 2020, 2021 | ✅ Collected | ✅ Verified (2020, 2021) |
| Occupancy Rate by Month | 2020, 2021 | ✅ Collected | ✅ Verified (2020, 2021) |
| Top 10 source markets | 2020, 2021, 2022, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023, 2024) |

State Minister of Foreign Employment / Minister of Labour and Foreign Employment

| Dataset Name | Years Available | Collection Status | Verification Status |
| --- | --- | --- | --- |
| Number of complaints received | 2020, 2021, 2022, 2023 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023) ⚠️ Unavailable (2024) |
| Number of complaints resolved | 2020, 2021, 2022, 2023 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023) ⚠️ Unavailable (2024) |
| Legal division performance | 2020, 2021, 2022, 2023 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023) ⚠️ Unavailable (2024) |
| Local arrivals | 2020, 2021, 2022, 2023 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023) ⚠️ Unavailable (2024) |
| Local departures | 2020, 2021, 2022, 2023 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023) ⚠️ Unavailable (2024) |
| Monthly foreign exchange earnings | 2020, 2021, 2022, 2023 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023) ⚠️ Unavailable (2024) |
| Number of raids conducted | 2022 | ✅ Collected | ✅ Verified (2022) |
| Private Remittances (Region-wise) | 2020, 2021 | ✅ Collected | ✅ Verified (2020, 2021) ⚠️ Unavailable (2024) |
| SLBFE Registration by Age & Manpower Level | 2020, 2021, 2022, 2023 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023) ⚠️ Unavailable (2024) |
| SLBFE Registration by Age | 2020, 2021, 2022, 2023 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023) ⚠️ Unavailable (2024) |
| SLBFE registration by country vs manpower level | 2020, 2021, 2022, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023, 2024) |
| SLBFE registration by country | 2020, 2021, 2022 | ✅ Collected | ✅ Verified (2020, 2021, 2022) ⚠️ Unavailable (2024) |
| SLBFE Registration by District, Manpower Level & Gender - 2020 | 2020 | ✅ Collected | ✅ Verified (2020) ⚠️ Unavailable (2024) |
| SLBFE Registration by District, Manpower Level & Gender - 2023 | 2023 | ✅ Collected | ✅ Verified (2023) |
| SLBFE Registration by District, Manpower Level & Gender | 2021, 2022 | ✅ Collected | ✅ Verified (2021, 2022) |
| SLBFE Registration by district | 2024 | ✅ Collected | ✅ Verified (2024) |
| SLBFE registration by gender | 2020, 2021, 2022, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2024) ⚠️ Unavailable (2023) |
| SLBFE Registration by Manpower Level & Gender | 2020, 2021, 2022, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023, 2024) |
| SLBFE registration by manpower level | 2020, 2021, 2022, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023, 2024) |
| SLBFE Registration through Private Sources by Country | 2020, 2021 | ✅ Collected | ✅ Verified (2020, 2021) ⚠️ Unavailable (2024) |
| SLBFE Registration all Sources by Country | 2022 | ✅ Collected | ✅ Verified (2022) |
| Workers Remittances | 2020, 2021, 2022, 2023 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023) ⚠️ Unavailable (2024) |

State Minister of Internal Security / Minister of Investment Planning / Minister of Investment Promotion / Minister of Public Security and Parliamentary Affairs

| Dataset Name | Years Available | Collection Status | Verification Status |
| --- | --- | --- | --- |
| asylum_seekers_by_nationality | 2020, 2021, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2023) ⚠️ Unavailable (2024) |
| deportations_by_nationality | 2020, 2021, 2022, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023, 2024) |
| refugees_by_nationality | 2020, 2021, 2022, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023, 2024) |
| refused_entry_by_nationality | 2020, 2021, 2022, 2023, 2024 | ✅ Collected | ✅ Verified (2020, 2021, 2022, 2023) ⚠️ Unavailable (2024) |
| fake_passport_detection_by_nationality | 2023 | ✅ Collected | ✅ Verified (2023) |
| fraudulent_visa_detection_by_nationality | 2023 | ✅ Collected | ✅ Verified (2023) |

📅 Years Available

  • 2019 ❗❗ This data has not been verified yet
  • 2020-2021
  • 2022-2023
  • 2024

🚀 Quick Start

📖 Browse all data interactively →

🌐 View online at GitHub Pages →

All datasets are in clean JSON format with metadata.

This repository contains cleaned and organized datasets from various Sri Lankan government public sources, compiled by the Lanka Data Foundation. The data spans from 2019 to 2024 and covers multiple ministries and departments.

🛠️ Installation & Setup

To run the data ingestion and utility scripts, you'll need to set up the Python environment. We recommend using Mamba (or Conda).

  1. Create the environment:

    mamba env create -f environment.yml

    (If using Conda: conda env create -f environment.yml)

  2. Activate the environment:

    mamba activate datasets_env

  3. Run the scripts:

    # Run the optimized ingestion script
    python insert.py
    
    # Run the attribute writer (optional year filter)
    python write_attributes.py --year 2023

📊 Dataset Overview

  • Total Years: 6 (2019-2024)
  • Total Datasets: 175+ JSON files
  • Ministries Covered: 4 main categories
  • Data Sources: Public government sources

🏗️ Repository Structure

datasets/
├── data/                           # Main data directory
│   ├── 2019/                      # Year-based organization
│   ├── 2020/
│   ├── 2021/
│   ├── 2022/
│   └── 2023/
├── generate_static_html.py         # HTML generator script
├── index.html                      # Generated static HTML
├── styles.css                      # CSS stylesheet
└── README.md                       # This file

📁 Data Organization

Data is organized hierarchically:

  • Year → Government → President → Ministry → Department → Data Files

Data File Structure

Each dataset contains:

  • data.json - The main dataset
  • metadata.json - Metadata about the dataset (optional)
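
Reading one dataset folder could look like the following minimal sketch. It assumes only what is stated above: `data.json` is present and `metadata.json` may be missing. The function name `load_dataset` and the shape of the returned dictionary are illustrative and are not part of the repository's scripts.

```python
import json
from pathlib import Path


def load_dataset(dataset_dir):
    """Read data.json and, if present, metadata.json from one dataset folder.

    Hypothetical helper: the name and return shape are assumptions.
    """
    dataset_dir = Path(dataset_dir)
    with open(dataset_dir / "data.json", encoding="utf-8") as f:
        data = json.load(f)

    metadata = None
    metadata_path = dataset_dir / "metadata.json"
    if metadata_path.exists():  # metadata.json is optional
        with open(metadata_path, encoding="utf-8") as f:
            metadata = json.load(f)

    return {"data": data, "metadata": metadata}
```

Returning `None` for absent metadata lets callers tell "no metadata file" apart from an empty metadata object.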

🔄 How to Update Data and Regenerate HTML

1. Adding New Data

Adding Data for a New Year

  1. Create a new folder under data/ (e.g., data/2024/)
  2. Follow the existing folder structure:
    data/2024/
    └── Government of Sri Lanka(government)/
        └── [President Name](citizen)/
            └── [Ministry Name](minister)/
                └── [Department Name](department)/
                    ├── [category]/
                    │   ├── data.json
                    │   └── metadata.json (optional)
    

Adding Data to Existing Year

  1. Navigate to the appropriate year folder in data/
  2. Follow the existing hierarchy to find the correct ministry/department
  3. Add your data.json and optional metadata.json files
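
The folder hierarchy described above can also be assembled programmatically. The sketch below is one possible approach, not a script shipped with the repository: `dataset_dir` and `write_dataset` are hypothetical names, and the president, ministry, department, and category values are placeholders supplied by the caller.

```python
import json
from pathlib import Path


def dataset_dir(root, year, president, ministry, department, category):
    """Build the folder path for one dataset, following the README hierarchy.

    Hypothetical helper; argument names are assumptions for illustration.
    """
    return (
        Path(root)
        / str(year)
        / "Government of Sri Lanka(government)"
        / f"{president}(citizen)"
        / f"{ministry}(minister)"
        / f"{department}(department)"
        / category
    )


def write_dataset(target_dir, data, metadata=None):
    """Create the folder and write data.json plus an optional metadata.json."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    (target_dir / "data.json").write_text(
        json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    if metadata is not None:
        (target_dir / "metadata.json").write_text(
            json.dumps(metadata, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    return target_dir
```

For example, `write_dataset(dataset_dir("data", 2024, "Example President", "Example Ministry", "Example Department", "example_category"), rows)` would create the nested folders and write `data.json` into the innermost one.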

Data File Requirements

  • data.json: Must contain valid JSON data
  • metadata.json: Optional, should contain dataset metadata (description, source, etc.)
  • Files must be placed in appropriately named folders with category indicators
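
The "valid JSON" requirement can be checked mechanically before committing. The following is a minimal sketch under the layout described in this README; `find_invalid_json` is a hypothetical name, and the check covers syntax only, not any particular schema.

```python
import json
from pathlib import Path


def find_invalid_json(data_root):
    """Return (path, error message) pairs for JSON files that fail to parse.

    Hypothetical checker; it only tests syntax, not the dataset's contents.
    """
    problems = []
    for path in sorted(Path(data_root).rglob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            problems.append((path, str(exc)))
    return problems
```

An empty list from `find_invalid_json("data")` means every `data.json` and `metadata.json` under `data/` parsed successfully.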

2. Update the Website (Optional)

The API documentation website is built with Jekyll on GitHub Pages. The data listing is auto-generated and injected into docs/index.md.

To update the data listing:

  1. Run the update script:
    python3 update_dataset_index.py
  2. This will:
    • Scan the data/ directory.
    • Generate ZIP files for each year.
    • Inject the file listing into docs/index.md.
  3. Commit and push changes to main branch.

3. What Gets Generated

ZIP Files

  • Automatically created for each year folder
  • Contains all JSON files from that year
  • Named as [YEAR]_Data.zip (e.g., 2019_Data.zip)
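
To illustrate the behaviour listed above, here is a sketch of per-year archiving with the standard `zipfile` module. It is not the code in `update_dataset_index.py`; the function name `zip_year`, its arguments, and the choice to keep the year folder inside the archive are assumptions.

```python
import zipfile
from pathlib import Path


def zip_year(data_root, year, out_dir):
    """Write [YEAR]_Data.zip containing every .json file under data_root/year.

    Hypothetical sketch of the described behaviour; out_dir must already exist.
    """
    data_root = Path(data_root)
    year_dir = data_root / str(year)
    out_path = Path(out_dir) / f"{year}_Data.zip"
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for json_file in sorted(year_dir.rglob("*.json")):
            # Store paths relative to data_root so the year folder is kept.
            archive.write(json_file, json_file.relative_to(data_root).as_posix())
    return out_path
```

Calling `zip_year("data", 2019, ".")` would produce `2019_Data.zip` in the current directory, skipping any non-JSON files in the year folder.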

HTML Features

  • Interactive collapsible sections
  • Download buttons for yearly ZIP files
  • In-browser JSON viewer with copy/download functionality
  • Responsive design with CSS styling

4. Folder Structure Guidelines

Special Naming Conventions

  • Use (government), (citizen), (minister), (department) suffixes for proper categorization
  • Use (AS_CATEGORY) for sub-categories
  • Underscores in folder names will be converted to spaces in display
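
As an illustration of these conventions, the sketch below splits a folder name into a display name and its type suffix. It is a hypothetical parser, not the logic used by the repository's scripts, and it assumes that `(AS_CATEGORY)` appears in the same trailing position as the other suffixes.

```python
import re

# Matches a trailing "(kind)" suffix such as (government) or (AS_CATEGORY).
_SUFFIX = re.compile(r"^(?P<name>.*?)\((?P<kind>[A-Za-z_]+)\)$")


def parse_folder_name(folder):
    """Split 'Name(kind)' into a display name and its kind.

    Hypothetical parser. Folders without a suffix get kind None.
    """
    match = _SUFFIX.match(folder)
    name, kind = (match["name"], match["kind"]) if match else (folder, None)
    # Underscores are shown as spaces in the generated listing.
    return name.replace("_", " ").strip(), kind
```

For instance, `parse_folder_name("Government of Sri Lanka(government)")` yields the name `Government of Sri Lanka` with kind `government`, while a plain category folder such as `arrivals_by_month` yields `arrivals by month` with kind `None`.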

5. Customization

Adding New Emojis

Edit the get_emoji_for_type() function in generate_static_html.py:

emoji_map = {
    'your_category': '🎯',
    # ... existing mappings
}

Modifying CSS

Edit styles.css to customize the appearance:

  • Colors, fonts, spacing
  • Responsive breakpoints
  • Modal styling for JSON viewer

Updating Statistics

The script automatically counts datasets, but you can manually update the description in the main() function.

🚀 Deployment

The generated index.html is ready for deployment on:

  • GitHub Pages
  • Any static hosting service
  • Local web servers

📞 Contact

For any enquiries please contact: [email protected]

Codebase at: https://github.com/LDFLK/datasets

📄 License

See LICENSE file for details.

About

Raw data extracted to be inserted into the databases
