|
Lex |
2 |
Extract URLs from Common Crawl data |
Mar 18, 2022 |
|
Go |
32 |
🕸 A simple way to extract data from Common Crawl |
Jun 13, 2022 |
|
Python |
2 |
AWS Elastic Map Reduce Streaming Templates |
Dec 05, 2017 |
|
Shell |
18 |
Useful tools to extract Malayalam text from the Common Crawl datasets |
Oct 04, 2022 |
|
Java |
4 |
Application for downloading text data from Common Crawl |
Sep 07, 2021 |
|
Jupyter Notebook |
10 |
GitHub repository related to the course Mastering Elastic Map Reduce for Data Engineers |
Apr 16, 2023 |
|
Jupyter Notebook |
4 |
Scientific articles using or citing Common Crawl data |
Mar 23, 2023 |
|
Ruby |
30 |
A simple Ruby example of how to process Common Crawl files using Elastic MapReduce |
Apr 25, 2021 |
|
Python |
21 |
Gathers URLs from Common Crawl |
Jun 14, 2022 |
|
Python |
2 |
Crawl data from Amazon using Scrapy |
Mar 10, 2023 |
|
Python |
2 |
Bulk imports into Elasticsearch from common data sources |
Aug 12, 2020 |
|
Shell |
22 |
Tools to construct and process webgraphs from Common Crawl data |
Aug 23, 2022 |
|
Go |
18 |
Extraction of Web Archive data using Common Crawl index API |
Feb 23, 2022 |
|
Go |
5 |
Using Golang to crawl data from Stack Overflow |
Sep 29, 2020 |
|
JavaScript |
15 |
Data sources for Elastic Map Service |
Mar 26, 2022 |
|
Python |
15 |
GP Tools for Amazon Web Services Elastic Map Reduce (Hosted Hadoop Framework) |
Apr 22, 2019 |
|
Jupyter Notebook |
20 |
Various Jupyter notebooks about Common Crawl data |
Aug 14, 2022 |
|
Python |
2 |
Distributed download scripts for Common Crawl data |
Apr 10, 2023 |
|
Julia |
6 |
Elastic and fault tolerant parallel map and parallel map reduce methods. Part of the COFII … |
Aug 10, 2022 |
|
Java |
3 |
Common metadata layer for Hadoop's Map Reduce, Pig, and Hive |
Jun 21, 2017 |
|
Java |
78 |
Common metadata layer for Hadoop's Map Reduce, Pig, and Hive |
Apr 02, 2024 |
|
Python |
5 |
Extract and plot a global map of effective elastic thickness values |
Apr 27, 2022 |
|
JavaScript |
4 |
VMap Four: The Data Crawl of V Map |
Apr 10, 2016 |
|
Python |
76 |
Search the Common Crawl using lambda functions |
Mar 27, 2023 |
|
Go |
3 |
Common Crawl processing |
Feb 15, 2023 |
|
HTML |
2 |
Common Crawl extractor |
May 01, 2023 |
|
Java |
4 |
Apache Fluo application that creates a web index using Common Crawl data |
Jan 12, 2022 |
|
Python |
234 |
Process Common Crawl data with Python and Spark |
Aug 29, 2022 |
|
Python |
452 |
Tools to download and clean up Common Crawl data |
Aug 09, 2022 |
|
Python |
121 |
A Python utility for downloading Common Crawl data |
Aug 09, 2022 |
|
JavaScript |
34 |
Exercises on using map, filter, and reduce |
Jan 29, 2022 |
|
None |
2 |
Crawl websocket data from cryptocurrency exchanges using Sogou workflow |
Jul 22, 2022 |
|
Ruby |
10 |
Wgit allows you to crawl and extract the data you want from the web |
Nov 29, 2021 |
|
R |
6 |
Extract raster map data into R |
Aug 18, 2019 |
|
Python |
2 |
Crawl data from Yahoo! Finance |
Jan 28, 2021 |
|
Python |
5 |
Crawl traffic data from PEMS |
Oct 06, 2022 |
|
JavaScript |
8 |
Crawl data from the AppStore |
Oct 01, 2022 |
|
Python |
3 |
Extract African roads from OpenStreetMap |
Jan 15, 2023 |
|
Python |
2 |
Crawl Weibo data using Python. |
Jan 27, 2023 |
|
HTML |
52 |
Common Crawl Index Server |
May 09, 2023 |
|
Java |
2 |
A plugin for importing Common Crawl data into CrateDB. |
Oct 15, 2019 |
|
Python |
855 |
Elastic Common Schema |
Aug 11, 2022 |
|
None |
2 |
Elastic Common Schema |
Mar 21, 2024 |
|
Python |
2 |
Extract table data from PDFs using OCR |
Nov 14, 2021 |
|
Python |
7 |
Extract longest common subsequences from texts |
Oct 17, 2019 |
|
Common Lisp |
37 |
Extract documentation from Common Lisp systems |
Apr 03, 2023 |
|
Rust |
6 |
Builds a tantivy index from Common Crawl warc.wet files |
May 15, 2022 |
|
Python |
6 |
Extract data from SAP applications using Operational Data Provisioning |
May 13, 2022 |
|
Python |
2 |
Crawl data from GitHub API v3 |
Jul 03, 2022 |
|
None |
6 |
Browser based map-reduce |
Nov 19, 2021 |