Session 5 - Utilities and Modules

Today we will continue working with modules, focusing on the third-party module BeautifulSoup for web scraping. You will also learn how to record the modules you have installed with pip install in a requirements file, so that your project's packages can be kept under version control and shared.

Note

In an earlier version of this session document, Docker was a big part of it. If you already started on that version and would like to carry on, you can find it here

Learning goals

After this week you will be able to:

  • Use Python's built-in modules.

  • Find and use third-party modules.

  • Save and share your project's dependencies.

  • Work with the module BeautifulSoup for web scraping.

Materials

Exercises

requirements

Ex 1: Clone and run

Ex 2: Working together in teams with Python

This exercise should be done in groups.

  • You should create a project that makes use of the `requests` module.

  • You should push this project to a GitHub account, and everyone in the group should either
    • have push rights to this repository or

    • other group members should create a `fork` of this repository.

  • The project should contain a requirements.txt and a .gitignore that leaves the non-essential files and folders out of the commits.

  • All group members should now clone the repository, or create a fork, and then
    • install the requirements

    • and successfully run the application

When this setup is up and running each group member should:

  • Install a new third-party module (look at pypi.org).

  • Create some simple (maybe even stupid) code that makes use of this module

  • Run `pip freeze > requirements.txt`

  • Push the changes to GitHub

  • Other group members pull the changes and run `pip install -r requirements.txt`
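After pulling and installing, it can be handy to check that your environment really matches the shared requirements file. The following is a minimal sketch of such a check, not part of the exercise itself. The function name `missing_requirements` is hypothetical, and the sketch only understands simple `name==version` lines of the kind `pip freeze` writes; anything else (comments, blank lines, other specifiers) is skipped.

```python
from importlib.metadata import PackageNotFoundError, version


def missing_requirements(lines):
    """Return the pinned requirement lines the current environment does not satisfy.

    Hypothetical helper for illustration: only "name==version" pins are checked.
    """
    missing = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not a simple pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, wanted = line.partition("==")
        try:
            installed = version(name.strip())
        except PackageNotFoundError:
            # The package is not installed at all.
            missing.append(line)
            continue
        if installed != wanted.strip():
            # The package is installed, but in a different version.
            missing.append(line)
    return missing
```

To use it on a real project, the lines could be read with `Path("requirements.txt").read_text().splitlines()` from `pathlib`. An empty result would suggest that every pinned package is installed in the expected version.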

Python

Ex 5: Build a Web Scraper With Python

Solution

  1. Build a Web Scraper With Python

  2. Find all relevant Python jobs on one of these websites: jobnet.dk or jobindex.dk
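As a starting point for the second task, here is a rough sketch of how job listings could be filtered with BeautifulSoup. The HTML below is made up for illustration: the `job`, `title` and `company` class names are assumptions and do not reflect the real markup of jobnet.dk or jobindex.dk, so you will need to inspect the actual pages and adapt the selectors. The function name `find_jobs` is hypothetical as well.

```python
from bs4 import BeautifulSoup

# Made-up listing markup; the real sites use different tags and class names.
HYPOTHETICAL_LISTING = """
<div class="job"><h2 class="title">Python Developer</h2><span class="company">Acme</span></div>
<div class="job"><h2 class="title">Java Engineer</h2><span class="company">Initech</span></div>
<div class="job"><h2 class="title">Data Engineer (Python)</h2><span class="company">Globex</span></div>
"""


def find_jobs(html, keyword):
    """Return the job cards whose title contains the keyword (case-insensitive)."""
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for card in soup.select("div.job"):
        title = card.select_one(".title")
        company = card.select_one(".company")
        if title is None:
            continue  # skip cards without a title element
        title_text = title.get_text(strip=True)
        if keyword.lower() in title_text.lower():
            matches.append({
                "title": title_text,
                "company": company.get_text(strip=True) if company else "",
            })
    return matches


for job in find_jobs(HYPOTHETICAL_LISTING, "python"):
    print(job["title"], "at", job["company"])
```

Keeping the parsing in a function that takes an HTML string makes it easy to try out on saved pages before pointing it at a live site.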

Ex 6: Simple scraper with requests (and BS)

Redo the exercise Ex 7: Simple scraper with requests from last week, but this time also use the BeautifulSoup module.
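If you have not used BeautifulSoup before, the sketch below shows the basic pattern on a small hard-coded page, so it runs without network access. The sample HTML and the function names `extract_links` and `page_title` are assumptions made for illustration. In the exercise, the HTML string would instead come from the response you fetch with `requests`.

```python
from bs4 import BeautifulSoup

# A small hard-coded page standing in for a downloaded one.
SAMPLE_HTML = """
<html><head><title>Sample page</title></head>
<body>
  <a href="/first">First link</a>
  <a href="https://example.com/second">Second link</a>
  <a>No href here</a>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


def page_title(html):
    """Return the text of the <title> tag, or None if the page has no title."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.string if soup.title else None


print(page_title(SAMPLE_HTML))
for text, href in extract_links(SAMPLE_HTML):
    print(text, "->", href)
```

Compared with picking the page apart by hand using string methods, `find_all` lets you ask for tags and attributes directly, which tends to be more robust when the page layout changes slightly.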