GitHub - JuanCampbsi/project_BigData: The project aims to develop a distributed architecture using Apache Spark and Databricks to optimize the management of the Rural Environmental Registry. It focuses on data migration and efficient analysis in the Databricks File System (DBFS).

Development and Evaluation of a Distributed Architecture for Rural Environmental Registration in Databricks

This project aims to explore the capabilities of distributed database architectures, using advanced technologies such as Apache Spark and Databricks, with a focus on the Rural Environmental Registry. The proposal involves the creation, implementation and evaluation of a distributed architecture that integrates the processing power of Spark and the flexibility of Databricks. This approach will allow an efficient migration of data stored in the Databricks File System (DBFS), in addition to facilitating the performance of comparative tests. The goal is to optimize the management and analysis of this information, taking advantage of the scalability and performance of these technologies to deal with large volumes of data efficiently and effectively.

Data source

Installation in Databricks environment

%sh
  curl https://dados.agricultura.gov.br/dataset/58bdc09c-9778-42b9-8fce-7d5c2c4fa211/resource/daf8053b-5446-4cd4-986a-f141b4a434ec/download/temas_ambientais.csv --output /tmp/temas_ambientais.csv

Moving csv to dbfs

dbutils.fs.mv("file:/tmp/temas_ambientais.csv", "dbfs:/FileStore/Projeto/temas_ambientais.csv")

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
Project_BigData.ipynb		Project_BigData.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Development and Evaluation of a Distributed Architecture for Rural Environmental Registration in Databricks

Data source

About

Releases

Packages

Languages

JuanCampbsi/project_BigData

Folders and files

Latest commit

History

Repository files navigation

Development and Evaluation of a Distributed Architecture for Rural Environmental Registration in Databricks

Data source

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages