Stars
1093
Forks
581
Language
Java
Last Updated
May 17, 2024
Similar Repos
Language | Stars | Description | Updated At | |
---|---|---|---|---|
Java | 6 | Apache Airavata Data Lake | Jun 28, 2022 | |
Java | 5 | An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client | Dec 17, 2021 | |
Java | 3 | An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client | Nov 20, 2021 | |
Java | 3 | An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client | Jan 29, 2023 | |
Java | 2 | An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client | Mar 19, 2024 | |
Python | 296 | Enterprise-grade, production-hardened, serverless data lake on AWS | Aug 11, 2022 | |
Jupyter Notebook | 40 | This repository hosts the code/projects/demos/slides for Big Data technologies under Apache Hadoop and Apache Spark … | Dec 20, 2022 | |
Python | 2 | Bundle for analyzing data with Apache Spark and Apache Hadoop | Aug 31, 2016 | |
Jupyter Notebook | 3 | Source code CDC and Apache Hudi data lake demonstration | Apr 06, 2023 | |
None | 2 | Course work in a team for the course "Data Science Big Data Analysis" with using … | Sep 22, 2022 | |
Python | 3 | Bundle for analyzing syslog data with Apache Spark and Apache Hadoop | Apr 13, 2021 | |
Java | 926 | An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu | Aug 31, 2022 | |
Java | 2 | An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu | Oct 19, 2021 | |
None | 2 | An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu | Mar 04, 2023 | |
Python | 2 | Tools and scripts to load data from Hadoop clusters to Azure Data Lake Storage using … | Dec 06, 2021 | |
Scala | 6 | Library built on top of Apache Spark to speed-up data lakes development. | Jun 02, 2022 | |
PigLatin | 20 | Learning how to tame the Big Data with Hadoop and related technologies | Jul 11, 2022 | |
HTML | 34 | Documentation for Hyperledger Besu enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu | Jul 13, 2022 | |
Scala | 2 | Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs | Mar 28, 2022 | |
Scala | 199 | Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs | Jul 28, 2022 | |
None | 4 | Source Code for "Practical Enterprise Data Lake Insights" by Saurabh Gupta and Vonkayala Venkata Giri | Nov 08, 2021 | |
None | 3 | Demonstrate the Dremio Data Lake engine accessing Apache Iceberg tables stored in HDFS | Nov 10, 2022 | |
Java | 2 | Read and write data to/from ElasticSearch within Hadoop (including Apache Crunch) | Mar 10, 2015 | |
Java | 3 | Hadoop library for large-scale data processing, now an Apache Incubator project | Feb 17, 2021 | |
Java | 587 | Hadoop library for large-scale data processing, now an Apache Incubator project | Feb 28, 2023 | |
C++ | 9 | A database of HyperDAGs derived from various application areas. HyperDAG data are licensed CC-BY, tools … | Apr 27, 2023 | |
Java | 8 | Demo code for SpringOne2GX 2013 Getting started with Spring Data and Apache Hadoop | Jun 09, 2015 | |
Python | 2 | Let’s Big Data. Hue is an open source Web interface for analyzing data with Apache … | Nov 20, 2023 | |
Java | 52 | DataStax Bulk Loader (DSBulk) is an open-source, Apache-licensed, unified tool for loading into and unloading … | Jul 29, 2022 | |
Java | 15 | Cal/Val and User Services: Utilising Apache Hadoop and SNAP for Earth Observation Data | Apr 26, 2023 | |
Scala | 27 | A dynamic data completeness and accuracy library at enterprise scale for Apache Spark | Jul 15, 2022 | |
Jupyter Notebook | 30 | Common data science and data engineering utilities to help us perform analytics. Our toolbox for … | Jan 17, 2022 | |
None | 9 | A cheat sheet for Big Data technologies at and from The Apache Software Foundation | Sep 08, 2020 | |
Java | 197 | Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB | Aug 02, 2022 | |
Python | 5 | Build Glue (Spark) Streaming pipeline for clickstreams and power data lake with Apache Hudi and Query … | Apr 10, 2023 | |
Python | 2 | Python Scripts showing implementation of big data techniques on Hadoop Ecosystem like HDFS, MapReduce, YARN, … | May 13, 2022 | |
Java | 242 | Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache … | Jul 29, 2022 | |
Python | 3 | Code to be contributed to the Apache Airflow (incubating) project for ETL workflow management for … | Jan 19, 2021 | |
Scala | 3 | Collection of utilities for managing data on Hadoop powered by Apache Spark. For example - … | Nov 02, 2017 | |
Java | 2 | Example project that uses MapQuest Directions API (Open Data) to get a route, instead of … | Jul 03, 2015 | |
Go | 130 | Package smbios provides detection and access to System Management BIOS (SMBIOS) and Desktop Management Interface … | Jul 21, 2022 | |
Java | 1182 | Dinky is an out of the box one-stop real-time computing platform dedicated to the construction … | Aug 19, 2022 | |
Java | 2 | Dinky is an out of the box one-stop real-time computing platform dedicated to the construction … | Feb 25, 2023 | |
TypeScript | 15 | Google Cloud Dataproc is a managed Apache Spark and Apache Hadoop service that lets you … | Aug 07, 2022 | |
Java | 10 | Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and … | Feb 17, 2023 | |
Jupyter Notebook | 2 | Designed data models, built data warehouses and data lakes, Automated data pipelines, and worked with … | May 19, 2023 | |
None | 2 | Documenting My Learnings & Showcasing My Capstone Project While Being a Big Data Engineer Intern … | Jan 02, 2023 | |
JavaScript | 6 | A templated webstack that uses NodeJS,Google Closure, JSDoc, Express, and django/jinja like templates. Basically this … | Dec 11, 2017 | |
C# | 2 | A reliable subsystem to distribute data across multiple datacenters using multiple languages (C/C++, .NET, JVM … | Jul 18, 2023 | |
Python | 9 | An End-to-End ETL data pipeline that leverages pyspark parallel processing to process about 25 million … | May 02, 2023 |