MonkCode

Exploring the digital world!

GitHub Actions

GitHub gives us free public repos, and Actions provides a certain amount of free compute. Actions is advertised as an API for workflow automation.

Be careful when you set your triggers; I would start with a manual trigger so that you can see how much you are using and avoid exceeding the free tier, which (at the time of writing) includes 2,000 minutes per month and 500 MB of package storage.

on: workflow_dispatch

A scheduled trigger can be good too:

on:
  schedule:
    - cron:  '30 3 * * *'
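Cron expressions are easy to misread. As a sanity check, here is a small sketch (a helper of my own, not part of any workflow) that computes the next firing time for a simple daily schedule like '30 3 * * *', i.e. minute 30, hour 3, every day, in UTC:

```python
from datetime import datetime, timedelta

def next_daily_run(after, hour, minute):
    """Next occurrence of a daily 'M H * * *' cron schedule after `after`."""
    candidate = after.replace(hour=hour, minute=minute,
                              second=0, microsecond=0)
    if candidate <= after:
        # Today's slot already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate

# '30 3 * * *' fires at 03:30 UTC daily.
print(next_daily_run(datetime(2020, 4, 15, 12, 0), hour=3, minute=30))
# -> 2020-04-16 03:30:00
```

Note that GitHub runs scheduled workflows in UTC, so translate to your local timezone before wondering why nothing has happened yet.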

Below is an example from my own information system, powered by my GitHub account.

# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Python application

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]
  schedule:
    - cron:  '30 3 * * *'

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.8
      uses: actions/setup-python@v1
      with:
        python-version: '3.8'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
    - name: Run My Programs
      run: |
        python ./main.py
    - name: Commit results back to the repo
      uses: mikeal/publish-to-github-action@master
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
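If you are worried about burning through the free minutes, GitHub's REST API exposes an Actions billing endpoint you can poll. A minimal sketch, assuming you have a personal access token with the appropriate scope (the `user` and `token` values here are placeholders):

```python
import json
import urllib.request

API = "https://api.github.com/users/{user}/settings/billing/actions"

def build_usage_request(user, token):
    """Build an authenticated request for the Actions billing endpoint."""
    return urllib.request.Request(
        API.format(user=user),
        headers={"Authorization": f"token {token}",
                 "Accept": "application/vnd.github+json"},
    )

def minutes_used(user, token):
    """Return (used, included) Actions minutes for the given user."""
    with urllib.request.urlopen(build_usage_request(user, token)) as resp:
        data = json.load(resp)
    return data["total_minutes_used"], data["included_minutes"]

# used, included = minutes_used("your-username", os.environ["GH_TOKEN"])
# print(f"{used} of {included} minutes used")
```

This is a sketch, not a drop-in script; check the API docs for the current response fields before relying on it.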

The contents of ./main.py:

import requests
import os
from urllib.parse import urljoin
from bs4 import BeautifulSoup

response = requests.get('https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/')
soup = BeautifulSoup(response.content, "html.parser")

situation_report_div = soup.find("div", {"id":"PageContent_C006_Col01"})
situation_report_links = situation_report_div.find_all("a")

PDFs = './PDFs/'
os.makedirs(PDFs, exist_ok=True)  # make sure the download folder exists

for link in situation_report_links:
    # Last path segment of the href, with any query string stripped
    filename = os.path.join(PDFs, link['href'].split('/')[-1].split('?')[0])
    if '.pdf' in filename:
        # Skip files already downloaded on a previous run
        if filename.split('/')[-1] not in os.listdir(PDFs):
            with open(filename, 'wb') as f:
                f.write(requests.get(urljoin("https://www.who.int/", link['href'])).content)
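The densest line in that script is the filename extraction, so here is that logic in isolation (the href below is a made-up example, not a real WHO URL):

```python
def pdf_filename(href):
    """Take the last path segment of a link and strip any query string."""
    return href.split('/')[-1].split('?')[0]

print(pdf_filename('/docs/default-source/sitrep-86.pdf?sfvrsn=1'))
# -> sitrep-86.pdf
```

Splitting on '?' matters because WHO's links carry cache-busting query strings; without it the same report would be saved again under a new name on every run.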

On April 15, 2020 I started having this repo capture the WHO situation report and commit it back to the repo. I made attempts to parse and standardize the data for other uses, but they have not been maintained, and the report format has changed.

You can see that this system has stopped: https://github.com/SamMonk/who-situation-reports. Information systems can take a lot of maintenance, but you only need to maintain the things that continue to bring you joy.

What information would you want to capture daily? Can you build a bot like this to handle it? Why not process something that interests you on this freely available platform? If you want private repos, check out my article on Bitbucket Pipelines.