In this project I created two solutions: the first uses GeoPandas with Python and the second uses PySpark. Both analyse various files containing information about GPS positions and store positions in order to analyse customer behaviour.
This project was built with the Databricks service hosted on the AWS cloud platform, running on EC2 instances as the cluster, and using AWS S3 object storage to hold the training data and sample data and to store the results (you will also find a CSV copy of the results in this repo).
The S3 bucket was mounted to the Databricks DBFS (Databricks File System) using an AWS instance profile, which allows us to explore its paths and files (using Databricks dbutils) as if the S3 bucket were a local file system.
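A minimal sketch of the mount step, assuming the bucket name shown in the structure further down and a cluster that already has the instance profile attached (so no access keys are needed); the mount point name is just an example:

```python
# Mount the S3 bucket to DBFS; with an instance profile attached to the cluster
# no credentials need to be passed here. "/mnt/train-data" is an arbitrary name.
dbutils.fs.mount(
    source="s3a://train-data-20221903",
    mount_point="/mnt/train-data",
)

# Once mounted, the bucket can be browsed like a local file system.
display(dbutils.fs.ls("/mnt/train-data/"))
```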
The Python notebooks were exported in three different formats (.ipynb, Python source code and HTML), and all three formats were uploaded to this GitHub repository.
You can check the HTML version of the notebooks to see the code together with its execution output and results.
To run the script, you can upload any of the formats to your Databricks workspace and simply execute it (after changing the file paths or mounting your own S3 bucket), or run the Python source code version of the notebooks locally.
The notebook's cluster already contains the necessary dependencies, which were installed through pip:
%pip install geopandas
%pip install shapely
%pip install fsspec
%pip install s3fs
%pip install rtree
(if you are going to run it locally, you will probably also need to pip install pandas, fiona and GDAL)
The S3 bucket contains three different data sources:
- GPS signals of users
- Berlin store polygons
- User affinity datasets
The S3 bucket had the following structure:
https://s3.console.aws.amazon.com/s3/buckets/train-data-20221903 :
|- Berlin_map_data/plz_5-stellig_berlin.kml : .kml file containing the Berlin map information for GeoPandas (see the loading sketch below)
|- oregon-prod/ : contains job logs and metadata
|- Results_data/ : separated into folders named in 'year-month-day' format; each folder contains the results data of the corresponding day
|- sample_data/ : contains sample GPS data
|- train_data/ : contains the full GPS data, store polygons and affinity datasets
The Berlin map was downloaded from: https://www.suche-postleitzahl.org/berlin.13f
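The .kml file under Berlin_map_data/ can be read with GeoPandas once the KML driver is enabled in Fiona; a minimal sketch, assuming the example mount point from above:

```python
import fiona
import geopandas as gpd

# Enable the KML driver, which Fiona does not activate by default.
fiona.drvsupport.supported_drivers["KML"] = "rw"

# Read the Berlin postcode map from the mounted bucket.
berlin_map = gpd.read_file(
    "/dbfs/mnt/train-data/Berlin_map_data/plz_5-stellig_berlin.kml",
    driver="KML",
)
```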
The visualisations and insights below were produced after applying a spatial join between the GPS signal dataframes and the store polygons dataframe.
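As a rough illustration of the join step (not the exact code from the notebooks), it could look like the sketch below; the file names and the lon/lat/device column names are hypothetical and depend on the actual data layout:

```python
import pandas as pd
import geopandas as gpd

# Hypothetical file names under the mounted bucket; adjust to the real layout.
gps_df = pd.read_csv("/dbfs/mnt/train-data/train_data/gps_signals.csv")
stores_gdf = gpd.read_file("/dbfs/mnt/train-data/train_data/store_polygons.geojson")

# Turn the raw GPS rows into a GeoDataFrame of WGS84 points.
gps_gdf = gpd.GeoDataFrame(
    gps_df,
    geometry=gpd.points_from_xy(gps_df["lon"], gps_df["lat"]),
    crs="EPSG:4326",
)

# Keep only the signals that fall inside a store polygon.
visits = gpd.sjoin(
    gps_gdf,
    stores_gdf.to_crs("EPSG:4326"),
    how="inner",
    predicate="within",  # use op="within" on older GeoPandas versions
)
```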
In the first visualization we see the sum of unique visitors across all stores for each day. We can detect some anomalies where we have a very low number of unique visitors to all stores compared to the rest of the days, on the following dates:
- 2021-01-01 corresponds to Friday and the first day of the new year
- 2021-01-03 corresponds to Sunday
- 2021-01-10 corresponds to Sunday
- 2021-01-17 corresponds to Sunday
So on Sundays we have the lowest number of visitors or consumers at the stores.
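A possible way to reproduce these daily totals, assuming the joined `visits` frame from the sketch above and hypothetical `timestamp` and `device_id` columns:

```python
import pandas as pd

# Unique devices per calendar day across all stores.
visits["date"] = pd.to_datetime(visits["timestamp"]).dt.date
daily_visitors = (
    visits.groupby("date")["device_id"]
    .nunique()
    .reset_index(name="unique_visitors")
)

# Plot the daily totals to spot the low-traffic days (e.g. Sundays, New Year's Day).
daily_visitors.plot(x="date", y="unique_visitors", kind="line")
```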
In the second visualization, where we have a more detailed view, we see that on the mentioned dates (2021-01-01/03/10/17) fast food businesses (like McDonald's and Burger King) are an exception to the rule and keep their unique visitor counts close to their average, unlike hypermarket and retail businesses (like Aldi, Kaufland, Rewe).
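The per-store view can be sketched in a similar way, again assuming a hypothetical `store_name` column carried over from the store polygons:

```python
# Unique devices per store per day.
per_store_daily = (
    visits.groupby(["date", "store_name"])["device_id"]
    .nunique()
    .reset_index(name="unique_visitors")
)

# Compare each day's count with the store's own average across all days;
# ratios close to 1 on 2021-01-01/03/10/17 would indicate fast-food-like behaviour.
store_avg = per_store_daily.groupby("store_name")["unique_visitors"].transform("mean")
per_store_daily["ratio_to_average"] = per_store_daily["unique_visitors"] / store_avg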
The visualization of the distribution of all the GPS signals looks like a polygon, which means the device locations are not random, so we can predict the various locations for consumer behaviour or the best places to open new stores and attract more visitors.
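A quick sketch of this point-cloud view, assuming the `gps_gdf` frame from the join sketch above:

```python
import matplotlib.pyplot as plt

# Scatter all GPS points; a low alpha makes the dense, diamond-shaped area stand out.
fig, ax = plt.subplots(figsize=(10, 10))
gps_gdf.plot(ax=ax, markersize=0.5, alpha=0.3)
ax.set_title("Distribution of GPS signals")
plt.show()
```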
In this part we observe the locations of the stores, specifically in Berlin, which allows us to interpret the variation in the number of unique visitors to each store through its relation to the store's distance from the centre of Berlin, or to the diamond shape of the GPS visualization.
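One way to quantify this relation is to measure each store's distance from an approximate Berlin centre point and compare it with the store's average unique visitors; the centre coordinates, CRS choice and column names below are assumptions, not values from the notebooks:

```python
import geopandas as gpd
from shapely.geometry import Point

# Approximate Berlin centre (lon, lat), reprojected to a metric CRS (UTM zone 33N).
center = (
    gpd.GeoSeries([Point(13.405, 52.52)], crs="EPSG:4326")
    .to_crs("EPSG:25833")
    .iloc[0]
)

# Distance of each store polygon's centroid from the centre, in kilometres.
stores_m = stores_gdf.to_crs("EPSG:25833")
stores_m["distance_to_center_km"] = stores_m.geometry.centroid.distance(center) / 1000

# Relate distance to the average daily unique visitors per store.
avg_visitors = (
    per_store_daily.groupby("store_name")["unique_visitors"]
    .mean()
    .rename("avg_unique_visitors")
    .reset_index()
)
stores_m = stores_m.merge(avg_visitors, on="store_name")
print(stores_m[["distance_to_center_km", "avg_unique_visitors"]].corr())
```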