kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies …

Stars: 1093
Forks: 581
Language: Java
Last Updated: May 17, 2024

Similar Repos

Repo	Language	Stars	Description	Updated At
airavata-data-lake	Java	6	Apache Airavata Data Lake	Jun 28, 2022
sidechains-besu	Java	5	An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client	Dec 17, 2021
besu	Java	3	An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client	Nov 20, 2021
besu	Java	3	An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client	Jan 29, 2023
pantheon	Java	2	An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client	Mar 19, 2024
aws-serverless-data-lake-framework	Python	296	Enterprise-grade, production-hardened, serverless data lake on AWS	Aug 11, 2022
bigdata	Jupyter Notebook	40	This repository hosts the code/projects/demos/slides for Big Data technologies under Apache Hadoop and Apache Spark …	Dec 20, 2022
bundle-apache-hadoop-spark-zeppelin	Python	2	Bundle for analyzing data with Apache Spark and Apache Hadoop	Aug 31, 2016
cdc-hudi-data-lake-demo	Jupyter Notebook	3	Source code CDC and Apache Hudi data lake demonstration	Apr 06, 2023
analysis-github-data	None	2	Course work in a team for the course "Data Science Big Data Analysis" with using …	Sep 22, 2022
bundle-realtime-syslog-analytics	Python	3	Bundle for analyzing syslog data with Apache Spark and Apache Hadoop	Apr 13, 2021
besu	Java	926	An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu	Aug 31, 2022
besu	Java	2	An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu	Oct 19, 2021
besu	None	2	An enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu	Mar 04, 2023
databox-adls-loader	Python	2	Tools and scripts to load data from Hadoop clusters to Azure Data Lake Storage using …	Dec 06, 2021
datalake-lib	Scala	6	Library built on top of Apache Spark to speed-up data lakes development.	Jun 02, 2022
Hadoop-hands-on	PigLatin	20	Learning how to tame the Big Data with Hadoop and related technologies	Jul 11, 2022
besu-docs	HTML	34	Documentation for Hyperledger Besu enterprise-grade Java-based, Apache 2.0 licensed Ethereum client https://wiki.hyperledger.org/display/besu	Jul 13, 2022
azure-event-hubs-spark	Scala	2	Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs	Mar 28, 2022
azure-event-hubs-spark	Scala	199	Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs	Jul 28, 2022
practical-enterprise-data-lake-insights	None	4	Source Code for 'Practical Enterprise Data Lake Insights' by Saurabh Gupta and Vonkayala Venkata Giri	Nov 08, 2021
dremio-iceberg-demo	None	3	Demonstrate the Dremio Data Lake engine accessing Apache Iceberg tables stored in HDFS	Nov 10, 2022
elasticsearch-hadoop	Java	2	Read and write data to/from ElasticSearch within Hadoop (including Apache Crunch)	Mar 10, 2015
datafu	Java	3	Hadoop library for large-scale data processing, now an Apache Incubator project	Feb 17, 2021
datafu	Java	587	Hadoop library for large-scale data processing, now an Apache Incubator project	Feb 28, 2023
HyperDAG_DB	C++	9	A database of HyperDAGs derived from various application areas. HyperDAG data are licensed CC-BY, tools …	Apr 27, 2023
springone-hadoop	Java	8	Demo code for SpringOne2GX 2013 Getting started with Spring Data and Apache Hadoop	Jun 09, 2015
hue	Python	2	Let’s Big Data. Hue is an open source Web interface for analyzing data with Apache …	Nov 20, 2023
dsbulk	Java	52	DataStax Bulk Loader (DSBulk) is an open-source, Apache-licensed, unified tool for loading into and unloading …	Jul 29, 2022
calvalus2	Java	15	Cal/Val and User Services: Utilising Apache Hadoop and SNAP for Earth Observation Data	Apr 26, 2023
atum	Scala	27	A dynamic data completeness and accuracy library at enterprise scale for Apache Spark	Jul 15, 2022
bdr-analytics-py	Jupyter Notebook	30	Common data science and data engineering utilities to help us perform analytics. Our toolbox for …	Jan 17, 2022
apache-big-data-cheat-sheet	None	9	A cheat sheet for Big Data technologies at and from The Apache Software Foundation	Sep 08, 2020
emr-dynamodb-connector	Java	197	Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB	Aug 02, 2022
Build-Glue-Spark-Streaming-pipeline-for-clicksstreams-and-power-data-lake-with-Apache-Hudi-and-Quer	Python	5	Build Glue(Spark) Streaming pipeline for clicksstreams and power data lake with Apache Hudi and Query …	Apr 10, 2023
BigData	Python	2	Python Scripts showing implementation of big data techniques on Hadoop Ecosystem like HDFS, MapReduce, YARN, …	May 13, 2022
Firestorm	Java	242	Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache …	Jul 29, 2022
airflow-snowflake	Python	3	Code to be contributed to the Apache Airflow (incubating) project for ETL workflow management for …	Jan 19, 2021
spark-dba	Scala	3	Collection of utilities for managing data on Hadoop powered by Apache Spark. For example - …	Nov 02, 2017
Android-RouteWebService	Java	2	Example project that uses MapQuest Directions API (Open Data) to get a route, instead of …	Jul 03, 2015
go-smbios	Go	130	Package smbios provides detection and access to System Management BIOS (SMBIOS) and Desktop Management Interface …	Jul 21, 2022
dlink	Java	1182	Dinky is an out of the box one-stop real-time computing platform dedicated to the construction …	Aug 19, 2022
dinky	Java	2	Dinky is an out of the box one-stop real-time computing platform dedicated to the construction …	Feb 25, 2023
nodejs-dataproc	TypeScript	15	Google Cloud Dataproc is a managed Apache Spark and Apache Hadoop service that lets you …	Aug 07, 2022
apache-incubator-gobblin	Java	10	Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and …	Feb 17, 2023
Data-Engineer-Udacity-NanoDegree	Jupyter Notebook	2	Designed data models, built data warehouses and data lakes, Automated data pipelines, and worked with …	May 19, 2023
Big-Data-Engineering-Internship	None	2	Documenting My Learnings & Showcasing My Capstone Project While Being a Big Data Engineer Intern …	Jan 02, 2023
ClosureStackNodeJS	JavaScript	6	A templated webstack that uses NodeJS,Google Closure, JSDoc, Express, and django/jinja like templates. Basically this …	Dec 11, 2017
DataDistributionManager	C#	2	A reliable subsystem to distribute data across multiple datacenters using multiple languages (C/C++, .NET, JVM …	Jul 18, 2023
Prescriber-ETL-data-pipeline	Python	9	An End-to-End ETL data pipeline that leverages pyspark parallel processing to process about 25 million …	May 02, 2023