HITs (Human Intelligence Tasks) are tasks posted on the Mturk website. Workers complete the tasks and get paid in return. You may sometimes need to access and extract useful HIT data from the Mturk website.
Extracting this data manually is tedious and slow. An automated web scraper simplifies the job by scraping the website for the HIT data you want. This is where the HIT scraper comes in. But what is it?
HIT scraper is a tool for finding HITs. It is one of the most accessible and efficient ways to extract HIT information from the Amazon Mechanical Turk (Mturk) website. Before you can use the scraper, however, you first need to install a browser extension.
Read on to learn more about the HIT scraper and how you can use it to scrape data from the Mturk website effectively.
What Are the Prerequisites for Using the HIT Scraper?
You need to install a userscript manager before you can use the HIT scraper to extract data from the Mturk website. The two options are Tampermonkey and Greasemonkey.
Install Tampermonkey if you're using the Chrome browser, or Greasemonkey if you're using the Firefox browser. After the installation, go to the link provided and select 'Install this script.' After installing the scraper, open the link below to access Mturk.
Tip: Bookmark this link, because you'll need it often.
How to Use HIT Scraper on the Mturk Website
Opening HIT Scraper on Mturk launches it with the default settings. The first step is to adjust these defaults. Use the guide below to do so.
Adjust the 'Auto-refresh delay' to your preferred interval. The auto-refresh delay is the time the HIT scraper waits between scrapes of the Mturk website for new HITs. A value of 10 seconds is advisable; however, you can set it to your preferred value.
Proceed to 'Pages to scrape.' Configure the value to one.
Go to 'Results per page.' A value between 20 and 50 is advisable. Note that you cannot set the value higher than 100, and setting it too high can cause lag.
The next setting to configure is the minimum reward. You can leave the value at zero, or set it to a value between 0.1 and 1.0 if you only want higher-paying surveys. Configuring this setting removes any HIT below the set threshold from the search results.
The next setting is the type of HIT to scrape. You can set the scraper to scrape only the HITs you're qualified for.
Set the minimum batch size if your main focus is on batches rather than surveys. A common choice is to have the scraper show only HIT groups with at least 50 HITs per batch.
Check the Global option if you search by latest, so that the minimum batch size applies to that search as well.
Check the new HIT highlighting option and set how long, in seconds, a new HIT stays highlighted after it pops up on the HIT scraper.
Enable the option that pings you every time a new HIT appears, so that the scraper notifies you whenever a new HIT shows up in the results.
The next setting is how to sort your search. You can search for HITs by title, latest, reward, or most available. To display the newest HITs on the Mturk website first, set this to Latest. Most Available puts the groups with the most HITs available first, Reward puts the highest-paying HITs first, and Title lists HITs alphabetically by title.
Check the Hide panel box to hide the settings panel while working with the HIT scraper. Hiding the settings lets you run the scraper without interruption.
Press the start button once you have finished customizing the settings. This starts the web scraping process. The columns in the scraper have short descriptions explaining what each column contains, which makes the results easy to follow.
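To keep the recommended values in one place, the settings above can be written down as a small data structure. The Python sketch below is only an illustration: the class name, field names, and checks are hypothetical and are not part of the HIT scraper itself; the limits come from the guide above.

```python
from dataclasses import dataclass

MAX_RESULTS_PER_PAGE = 100  # upper limit mentioned in the guide
SORT_OPTIONS = ("latest", "most_available", "reward", "title")


@dataclass
class ScraperSettings:
    """Hypothetical container for the options described above."""
    refresh_delay_s: int = 10     # seconds between scrapes
    pages_to_scrape: int = 1
    results_per_page: int = 30    # 20 to 50 is the suggested range
    min_reward: float = 0.0       # dollars; 0 keeps every HIT
    qualified_only: bool = True
    min_batch_size: int = 0
    sort_by: str = "latest"

    def problems(self) -> list:
        """Return a list of problems; an empty list means the values look sane."""
        found = []
        if self.refresh_delay_s < 0:
            found.append("refresh_delay_s must not be negative")
        if self.pages_to_scrape < 1:
            found.append("pages_to_scrape must be at least 1")
        if not 1 <= self.results_per_page <= MAX_RESULTS_PER_PAGE:
            found.append(
                f"results_per_page must be between 1 and {MAX_RESULTS_PER_PAGE}"
            )
        if self.min_reward < 0:
            found.append("min_reward must not be negative")
        if self.sort_by not in SORT_OPTIONS:
            found.append("sort_by must be one of " + ", ".join(SORT_OPTIONS))
        return found


settings = ScraperSettings(results_per_page=150)
print(settings.problems())  # ['results_per_page must be between 1 and 100']
```

The defaults mirror the values suggested in the guide, so `ScraperSettings().problems()` returns an empty list.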
What Are the Additional Settings You Need to Customize on the HIT Scraper?
Other than the settings listed above, there are additional settings you need to customize. These settings ensure the accuracy of the HIT information you need. The additional settings are listed below.
Select Edit Include List to add requesters, then enter the requester. Do the same to add HIT titles.
Check the R box to add requesters.
Click the T box to block a HIT title. You can access this setting through the requester column. Note that blocking a HIT title does not block the attached requester. Instead, it blocks HITs from all requesters that have the same title. Alternatively, you can block HIT titles by selecting Edit Blocklist and entering the titles there.
Click on the vB box to display the export of the HIT. This exporter is built into the HIT scraper. The export is shareable, and you can copy and paste it into other applications.
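The difference between blocking a requester and blocking a title is easy to see in a small example. The Python sketch below is a simplified model, not the scraper's own code: the `Hit` class, the `visible_hits` function, and the case-insensitive exact matching are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    requester: str
    title: str


def visible_hits(hits, blocked_requesters=(), blocked_titles=()):
    """Drop HITs whose requester or title is on the blocklist.

    Matching is case-insensitive and exact (an assumption for this sketch).
    """
    requesters = {r.lower() for r in blocked_requesters}
    titles = {t.lower() for t in blocked_titles}
    return [
        h for h in hits
        if h.requester.lower() not in requesters
        and h.title.lower() not in titles
    ]


hits = [
    Hit("Acme Research", "Short survey"),
    Hit("Beta Labs", "Short survey"),
    Hit("Beta Labs", "Image tagging"),
]

# Blocking a title hides that title for every requester.
print(visible_hits(hits, blocked_titles=["short survey"]))

# Blocking a requester hides only that requester's HITs.
print(visible_hits(hits, blocked_requesters=["Acme Research"]))
```

The first call leaves only the "Image tagging" HIT, because both requesters posted a HIT with the blocked title. The second call keeps both Beta Labs HITs.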
What About HIT Scraper With Export?
HIT Scraper with Export is a script that provides an alternative way to scrape HIT data. It was created by Feihtality.
Follow the procedure below to scrape the website using the HIT Scraper with Export script.
Install the HIT Scraper with Export script.
Open the Mturk website and copy the URL.
Append hitscraper to the path of the Mturk URL you copied.
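The last step can be sketched in a few lines of Python. The function name is hypothetical and the address in the example is a placeholder, not the real Mturk URL; substitute the URL you copied.

```python
from urllib.parse import urlsplit, urlunsplit


def with_hitscraper_path(url: str) -> str:
    """Append a 'hitscraper' segment to the path of the given URL."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") + "/hitscraper"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


# Placeholder address for illustration only.
print(with_hitscraper_path("https://example.com/"))  # https://example.com/hitscraper
```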
You can use any of the URL links below to initialize the script:
Understanding the user interface of the HIT Scraper with Export script is crucial when working with the tool. Below is a guide to the various sections of the script.
The control panel is the section at the top of the interface. It contains all the navigation options and search settings. In this section you will find the settings you'll need to customize most often.
Below are the various control panel options.
Pages to scrape. This setting controls the number of pages to scrape.
Results per page. This configuration determines the number of search results the scraper returns per page. The maximum is 100 results per page.
Auto-refresh delay. This setting switches the scraper from manual mode to automatic mode and controls how often the scraper runs.
Minimum reward. This section configures the minimum payment threshold.
Correct for skips. This setting compensates for blocked results. If more than 66% of the scraped HITs are blocked, the scraper adds an extra page to bring the blocked share back below that percentage.
Minimum batch size. This setting applies to HIT groups when searching by Most Available. It makes the scraper display only HIT groups that meet the set threshold.
Global. This setting applies the minimum batch size to all search options, so the scraper only shows HITs that meet it.
Qualified. This setting restricts the scraper to showing only the HITs you are qualified for.
Masters Only. This setting restricts the scraper to showing only HITs that require the Masters qualification.
Hide Masters. This setting removes all HITs that require the Masters qualification. The scraper only displays HITs that don't require it.
New HIT highlighting. This configuration allows the scraper to highlight new HITs in bold or larger fonts. It also determines the time the new HIT highlights take.
Hide Infeasible. This setting directs the scraper not to show HITs that require qualifications you don't have and can't request.
Restrict to includelist. This setting removes all HITs that do not match the includelist. However, if you leave the includelist empty, the scraper will block all HIT results.
Disable TO. This setting allows the scraper to display the scrape results directly without extracting Turkopticon data.
Search by. This setting allows you to customize how you search for data results on the Mturk website. The available options under this section are listed below.
Latest - This option lets you search for HITs based on the creation date. It sorts HITs from the newest HITs to the oldest.
Most Available - This option lets you search for HITs based on the number of HITs available. Groups with the most HITs come first.
Reward - This option lets you search for HITs based on the amount you'll earn from them. The higher-paying HITs come first.
Title - This option sorts HITs alphabetically by title, from A to Z.
Search Terms. This section lets you specify the search terms to use when searching for specific HITs.
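How several of these options interact is easier to follow in code. The Python sketch below models the minimum reward, minimum batch size, Global, Hide Masters, Search by, and Correct for skips options as described above. It is an illustration under stated assumptions, not the script's actual implementation: the `Hit` fields, function names, and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    title: str
    reward: float      # dollars
    available: int     # HITs left in the group
    created: int       # larger means newer, e.g. a Unix timestamp
    masters: bool = False


# One sort key per "Search by" option.
SORT_KEYS = {
    "latest": lambda h: -h.created,
    "most_available": lambda h: -h.available,
    "reward": lambda h: -h.reward,
    "title": lambda h: h.title.lower(),
}


def select_hits(hits, sort_by="latest", min_reward=0.0, min_batch=0,
                global_batch=False, hide_masters=False):
    """Filter and order HITs following the control panel descriptions."""
    # Minimum batch size applies to Most Available only, unless Global is on.
    batch_active = global_batch or sort_by == "most_available"
    kept = [
        h for h in hits
        if h.reward >= min_reward
        and not (batch_active and h.available < min_batch)
        and not (hide_masters and h.masters)
    ]
    return sorted(kept, key=SORT_KEYS[sort_by])


def needs_extra_page(blocked: int, total: int, threshold: float = 0.66) -> bool:
    """Correct for skips: True when more than 66% of the results were blocked."""
    return total > 0 and blocked / total > threshold


hits = [
    Hit("Tag images", 0.05, 900, created=3),
    Hit("Academic survey", 1.50, 1, created=2),
    Hit("Audio transcription", 0.80, 40, created=1, masters=True),
]

print([h.title for h in select_hits(hits, sort_by="reward")])
print([h.title for h in select_hits(hits, sort_by="latest", min_batch=50)])
print([h.title for h in select_hits(hits, sort_by="latest", min_batch=50,
                                    global_batch=True)])
print(needs_extra_page(blocked=7, total=10))
```

With Global off, the minimum batch size of 50 has no effect on the Latest search, so all three HITs are listed. With Global on, only "Tag images" remains.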
How Do I Use HIT Scraper in the Most Effective Way Possible?
The most effective way to use the HIT scraper is to have it refresh at short intervals, for example every five seconds. You can also set the minimum reward to a penny. This way, you get to see every new HIT that comes in.
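The idea behind a short refresh interval is that each scrape is compared with what has already been seen, so new HITs stand out right away. The Python sketch below shows that loop in isolation. The `watch` function and its `fetch` argument are hypothetical stand-ins; nothing here talks to the Mturk website.

```python
import time


def watch(fetch, delay_s=5.0, rounds=3, sleep=time.sleep):
    """Call fetch() once per round and yield the HIT ids not seen before."""
    seen = set()
    for i in range(rounds):
        # dict.fromkeys removes duplicates while keeping the original order.
        new_ids = [hit_id for hit_id in dict.fromkeys(fetch())
                   if hit_id not in seen]
        seen.update(new_ids)
        yield new_ids
        if i < rounds - 1:
            sleep(delay_s)  # wait before the next refresh


# Fake scrape results standing in for three consecutive refreshes.
pages = iter([["A", "B"], ["B", "C"], ["C"]])
for new in watch(lambda: next(pages), delay_s=0, rounds=3):
    print(new)
# ['A', 'B']
# ['C']
# []
```

A shorter delay means new ids show up sooner, at the cost of more frequent requests.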
What Is HIT Scraper With Export?
HIT Scraper with Export is a script created by Feihtality that provides an alternative way to scrape HIT data.
What Is MTurk Used For?
Amazon Mechanical Turk (MTurk) is a crowdsourcing marketplace that makes it easier for individuals and businesses to outsource their processes and jobs to a distributed workforce who can perform these tasks virtually.
Is MTurk legal?
Yes, it is. In US labor law, according to Professor Miriam Cherry, Saint Louis University School of Law, workers on Mechanical Turk are "no different than construction workers who show up at job sites and work for a day or two on a project."