Recruitment task

Python Intern Task - Web Crawler

Clarence got lost while surfing the internet. Help him find his way out by creating a map of the domain he is on.

Write a function site_map(url) that takes a site URL as an argument and creates a mapping of that domain as a Python dictionary. The mapping should contain all the accessible pages within that domain. Every entry should consist of:
* key: URL
* value: dictionary with:
** site title (HTML <title> tag)
** links - set of all target URLs within the domain on the page but without anchor links
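One way to approach the task described above is a breadth-first crawl built only on the standard library. The sketch below is not the repository's crawler.py; it is a minimal illustration under several assumptions: the helper names `_PageParser` and `fetch_html`, the optional `fetch` parameter, and the dictionary keys "title" and "links" are all hypothetical, and "without anchor links" is read as "skip pure `#fragment` links and strip fragments from the rest".

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class _PageParser(HTMLParser):
    """Collects the <title> text and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.hrefs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_html(url):
    """Hypothetical default fetcher: page body as text, or None if unreachable."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def site_map(url, fetch=fetch_html):
    """Map every accessible page in the domain of `url` to its title and links."""
    domain = urlparse(url).netloc
    pages = {}
    seen = {url}
    queue = deque([url])
    while queue:
        current = queue.popleft()
        html = fetch(current)
        if html is None:
            continue  # inaccessible pages get no entry of their own
        parser = _PageParser()
        parser.feed(html)
        parser.close()
        links = set()
        for href in parser.hrefs:
            if href.startswith("#"):
                continue  # assumption: pure anchor links are ignored
            target, _fragment = urldefrag(urljoin(current, href))
            if urlparse(target).netloc == domain:
                links.add(target)
        pages[current] = {"title": parser.title.strip(), "links": links}
        for target in links:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return pages
```

The `fetch` parameter exists only so the crawl can be exercised without a network, for example by passing the `get` method of a dictionary of URL to HTML strings. URLs are compared as plain strings, so a start URL without a trailing slash and a link to the same page with one are treated as two different pages; a real implementation may want to normalize them.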

Example: Confused? Worry not! Here is an example site with a map. Unzip the example.zip file into some directory and enter it. Run the command python3 -m http.server. You are now serving a website! Check that everything is okay by visiting http://0.0.0.0:8000. If everything works, you can run your program with the following parameter and verify that it gives the correct answer.

How to use crawler.py?

Open your terminal and type python crawler.py {url}.
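For illustration, a command-line entry point matching the invocation above could look roughly like the sketch below. It is an assumption about how crawler.py might be wired, not a copy of it: the function name `main` and its `crawl` parameter (standing in for site_map) are hypothetical.

```python
import sys
from pprint import pprint


def main(argv, crawl):
    """Run `crawl` on the URL given in argv and pretty-print the resulting map.

    argv  -- full argument list, including the script name (like sys.argv)
    crawl -- function taking a URL and returning the site map dictionary
    """
    if len(argv) != 2:
        print("usage: python crawler.py {url}", file=sys.stderr)
        return 2
    pprint(crawl(argv[1]))
    return 0


# In crawler.py this might be hooked up as (assumption):
# if __name__ == "__main__":
#     sys.exit(main(sys.argv, site_map))
```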

How to test crawler module?

Download example.zip and unzip it on your computer.
Open a terminal and navigate to the example directory.
Type python -m http.server to start serving the test website.
To check that the website is running, visit http://localhost:8000/ if you're using Windows or http://0.0.0.0:8000 if you're using Linux.
Now you can start testing. The tests use the pytest framework (if you don't have pytest, type pip install pytest in the terminal).
Run pytest tests.py in the terminal to test the crawler.py module.
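The manual server start in the steps above can also be automated inside a test. The sketch below is not taken from tests.py; it assumes a hypothetical helper `serve_directory` that serves a folder from a background thread on a free local port, plus a sample pytest test that uses pytest's built-in `tmp_path` fixture.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen


def serve_directory(directory):
    """Serve `directory` over HTTP in a daemon thread; return (server, base_url)."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    # Port 0 asks the operating system for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    return server, f"http://{host}:{port}/"


def test_index_is_served(tmp_path):
    """Sample pytest test: the served folder answers with its index page."""
    (tmp_path / "index.html").write_text("<title>Home</title>")
    server, base_url = serve_directory(str(tmp_path))
    try:
        with urlopen(base_url) as response:
            assert b"<title>Home</title>" in response.read()
    finally:
        server.shutdown()
        server.server_close()
```

A test for the crawler itself could call `serve_directory` on the unzipped example folder and pass the returned base URL to site_map, which removes the need to keep a server running in a second terminal.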


mar123zaj/Web-Crawler
