MonkCode

Exploring the digital world!

GitHub Actions

Below is an example from my own information system, powered by my GitHub account.

On April 15, 2020, I set up this repo to capture the WHO situation report each day and commit it back to the repo.

I made some attempts to parse and standardize the data for other uses, but that code has not been maintained and the report format has since changed.

# This workflow installs Python dependencies, runs main.py, and pushes any new files back to the repo
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Python application

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]
  schedule:
    - cron:  '30 3 * * *'  # every day at 03:30 UTC

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.8
      uses: actions/setup-python@v1
      with:
        python-version: 3.8
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
    - name: Run My Programs
      run: |
        python ./main.py
    # Despite its name, this step commits any new files and pushes them back to the repo.
    - name: Archive code coverage results
      uses: mikeal/publish-to-github-action@master
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
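
If the cron line in the workflow is hard to read, here is a minimal Python sketch of what '30 3 * * *' asks for: one run per day at 03:30 UTC. The function name next_daily_run is made up for this illustration and is not part of the repo. GitHub may also start scheduled runs later than the requested time, so treat the result as the earliest expected start.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now, hour=3, minute=30):
    """Return the next time a daily 'minute hour * * *' schedule is due, in UTC.

    Hypothetical helper for illustration only; `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's slot is already here or has passed, the next one is tomorrow.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime.now(timezone.utc)))
```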

The contents of ./main.py:

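import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

PDFs = './PDFs/'

# Make sure the download folder exists before listing its contents.
os.makedirs(PDFs, exist_ok=True)

response = requests.get('https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/')
response.raise_for_status()
soup = BeautifulSoup(response.content, "html.parser")

# The report links sit inside this div; the id depends on the page layout.
situation_report_div = soup.find("div", {"id": "PageContent_C006_Col01"})
if situation_report_div is None:
    raise SystemExit("Report container not found; the page layout may have changed.")
situation_report_links = situation_report_div.find_all("a", href=True)

for link in situation_report_links:
    # Keep only the file name: drop the path and any query string.
    name = link['href'].split('/')[-1].split('?')[0]
    # Download only PDFs that are not already saved in the repo.
    if name.endswith('.pdf') and name not in os.listdir(PDFs):
        with open(os.path.join(PDFs, name), 'wb') as f:
            f.write(requests.get(urljoin("https://www.who.int/", link['href'])).content)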
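
The part of main.py most likely to break quietly is the logic that decides which links are new PDFs. Below is a rough sketch of how that decision could be pulled into a pure function and checked without touching the network. The names pdf_name and new_pdf_downloads are hypothetical, and the sample links are invented for the example; none of this is in the repo.

```python
import os
from urllib.parse import urljoin, urlsplit

BASE_URL = "https://www.who.int/"


def pdf_name(href):
    """Return the file name of a PDF link, or None if the link is not a PDF."""
    # urlsplit separates the path from the query string (e.g. ?sfvrsn=...).
    name = os.path.basename(urlsplit(href).path)
    return name if name.lower().endswith(".pdf") else None


def new_pdf_downloads(hrefs, existing_names):
    """Map each not-yet-saved PDF file name to the absolute URL to fetch."""
    seen = set(existing_names)
    downloads = {}
    for href in hrefs:
        name = pdf_name(href)
        if name and name not in seen:
            downloads[name] = urljoin(BASE_URL, href)
            seen.add(name)  # skip duplicate links to the same file
    return downloads


if __name__ == "__main__":
    # Invented sample links, only to show the shape of the input.
    sample = ["/docs/report-86.pdf?sfvrsn=abc", "/about", "/docs/report-85.pdf"]
    print(new_pdf_downloads(sample, existing_names=["report-85.pdf"]))
```

With the decision isolated like this, the download loop only has to iterate over the returned mapping and write each file, and a change in the link format shows up as a failing check instead of a silent run that saves nothing.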

You can see that this system stopped: https://github.com/SamMonk/who-situation-reports

Information systems can take a lot of maintenance, but you only need to maintain the things that continue to bring you joy.

What information would you want to capture daily? Could you build a bot like this to handle it?