feat: add analytics #1582

Merged on May 6, 2024 (10 commits)
42 changes: 42 additions & 0 deletions docs/advanced_settings/analytics.md
@@ -0,0 +1,42 @@
# Analytics & Telemetry

## Overview

`ydata-profiling` is a powerful library designed to generate profile reports from pandas and Spark DataFrame objects.
As part of our ongoing efforts to improve user experience and functionality, `ydata-profiling`
includes a telemetry feature. This feature collects anonymous usage data, helping us understand how the
library is used and identify areas for improvement.

The primary goals of collecting telemetry data are to:

- Enhance the functionality and performance of the ydata-profiling library
- Prioritize new features based on user engagement
- Identify common issues and bugs to improve overall user experience

### Data Collected

The telemetry system collects non-personal, anonymous information such as:

- Python version
- `ydata-profiling` version
- Frequency of use of `ydata-profiling` features
- Errors or exceptions thrown within the library

## Disabling usage analytics

We respect your choice to not participate in our telemetry collection. If you prefer to disable telemetry, you can do so
by setting an environment variable on your system. Disabling telemetry will not affect the functionality
of the ydata-profiling library, except for the ability to contribute to its usage analytics.


### Set an Environment Variable

Set the `YDATA_PROFILING_NO_ANALYTICS` environment variable to `True`, either in your terminal or command prompt or, as shown here, from Python before generating a report.

````python
import os

os.environ['YDATA_PROFILING_NO_ANALYTICS'] = 'True'
````
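
Environment variables are always read back as strings, so an opt-out check has to compare text rather than a boolean. A minimal sketch of such a check follows, assuming a hypothetical helper named `analytics_disabled` (it is not part of `ydata-profiling`) and assuming that `true`, `1`, and `yes` should all count as opting out:

```python
import os

# Hypothetical helper for illustration only; not part of ydata-profiling.
# Assumption: these strings (case-insensitive) all mean "opt out".
_TRUTHY = {"1", "true", "yes"}


def analytics_disabled(environ=None) -> bool:
    """Return True when the opt-out variable holds a truthy string."""
    env = os.environ if environ is None else environ
    value = env.get("YDATA_PROFILING_NO_ANALYTICS", "")
    # Environment values are strings, so normalise before comparing.
    return value.strip().lower() in _TRUTHY


print(analytics_disabled({"YDATA_PROFILING_NO_ANALYTICS": "True"}))
print(analytics_disabled({}))
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to test without touching the real process environment.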


1 change: 1 addition & 0 deletions mkdocs.yml
@@ -25,6 +25,7 @@ nav:
    - General settings: 'advanced_settings/available_settings.md'
    - Changing settings: 'advanced_settings/changing_settings.md'
    - Caching: 'advanced_settings/caching.md'
    - Analytics: 'advanced_settings/analytics.md'
  - Integrations:
    - Other dataframes: 'integrations/other_dataframe_libraries.md'
    - Pyspark: 'integrations/pyspark.md'
36 changes: 36 additions & 0 deletions src/ydata_profiling/utils/common.py
@@ -1,5 +1,8 @@
"""Common util functions (e.g. missing in Python)."""
import collections.abc
import os
import platform
import subprocess
import zipfile
from datetime import datetime, timedelta

@@ -8,6 +11,10 @@
from pathlib import Path
from typing import Mapping

import requests

from ydata_profiling.version import __version__


def update(d: dict, u: Mapping) -> dict:
"""Recursively update a dict.
@@ -88,3 +95,32 @@ def convert_timestamp_to_datetime(timestamp: int) -> datetime:
        return datetime.fromtimestamp(timestamp)
    else:
        return datetime(1970, 1, 1) + timedelta(seconds=int(timestamp))


def analytics_features(dataframe: str, datatype: str, report_type: str) -> None:
    endpoint = "https://packages.ydata.ai/ydata-profiling?"

    # os.getenv returns a string (or None), so compare against text, not the bool True.
    if os.getenv("YDATA_PROFILING_NO_ANALYTICS", "").lower() != "true":
        package_version = __version__
        try:
            # nvidia-smi only runs successfully when an NVIDIA GPU and driver are present.
            subprocess.check_output("nvidia-smi")
            gpu_present = True
        except Exception:
            gpu_present = False

        # Report only major.minor, e.g. "3.10".
        python_version = ".".join(platform.python_version().split(".")[:2])

        try:
            request_message = (
                f"{endpoint}version={package_version}"
                f"&python_version={python_version}"
                f"&report_type={report_type}"
                f"&dataframe={dataframe}"
                f"&datatype={datatype}"
                f"&os={platform.system()}"
                f"&gpu={str(gpu_present)}"
            )

            requests.get(request_message)
        except Exception:
            # Telemetry failures must never interrupt profiling.
            pass
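
The query string above is assembled by hand with f-strings. As a hedged alternative, the same parameters could be passed through `urllib.parse.urlencode` so that any unusual characters are escaped. The sketch below is an illustration, not the merged implementation: `build_analytics_url` and the `"0.0.0"` version string are hypothetical, and only the endpoint and parameter names are taken from the diff.

```python
import platform
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint copied from the diff; everything else here is illustrative.
ENDPOINT = "https://packages.ydata.ai/ydata-profiling"


def build_analytics_url(version, dataframe, datatype, report_type, gpu_present):
    """Assemble the telemetry URL with escaped query parameters (hypothetical helper)."""
    params = {
        "version": version,
        # Keep only major.minor, mirroring the diff.
        "python_version": ".".join(platform.python_version().split(".")[:2]),
        "report_type": report_type,
        "dataframe": dataframe,
        "datatype": datatype,
        "os": platform.system(),
        "gpu": str(gpu_present),
    }
    return f"{ENDPOINT}?{urlencode(params)}"


# "0.0.0" is a placeholder version used only for this example.
url = build_analytics_url("0.0.0", "pandas", "tabular", "regular", False)
query = parse_qs(urlsplit(url).query)
print(query["dataframe"], query["report_type"], query["gpu"])
```

A caller could then pass the result to `requests.get(url, timeout=...)`. Choosing a short timeout is a suggestion rather than something the diff does, but it would keep a slow endpoint from delaying report generation.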
42 changes: 42 additions & 0 deletions src/ydata_profiling/utils/logger.py
@@ -0,0 +1,42 @@
"""
Logger function for ydata-profiling reports
"""

import logging

import pandas as pd

from ydata_profiling.utils.common import analytics_features


class ProfilingLogger(logging.Logger):
    def __init__(self, name, level=logging.INFO):
        super().__init__(name, level)

    def info(
        self,
        msg: object,
    ) -> None:
        super().info(f"[PROFILING] - {msg}.")

    def info_def_report(self, dataframe, timeseries: bool):
        if dataframe == pd.DataFrame:
            dataframe = "pandas"
            report_type = "regular"
        elif dataframe == type(None):
            dataframe = "pandas"
            report_type = "compare"
        else:
            dataframe = "spark"
            report_type = "regular"

        datatype = "timeseries" if timeseries else "tabular"

        analytics_features(
            dataframe=dataframe, datatype=datatype, report_type=report_type
        )

        super().info(
            f"[PROFILING] Calculating profile with the following characteristics "
            f"- {dataframe} | {datatype} | {report_type}."
        )
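
The branching in `info_def_report` maps the type of the input to three labels before anything is sent. A minimal sketch of that mapping as a pure function is shown below, which makes it testable without any network call. `classify_report` and `FakeDataFrame` are hypothetical names introduced here; `FakeDataFrame` stands in for `pandas.DataFrame` so the example runs without pandas installed.

```python
from typing import Optional, Tuple, Type


class FakeDataFrame:
    """Stand-in for pandas.DataFrame so the sketch runs without pandas."""


def classify_report(
    df_type: Optional[Type], timeseries: bool, pandas_type: Type = FakeDataFrame
) -> Tuple[str, str, str]:
    """Map an input type to (dataframe, datatype, report_type) labels."""
    if df_type is pandas_type:
        dataframe, report_type = "pandas", "regular"
    elif df_type is type(None):
        # The diff labels a missing dataframe as a comparison report.
        dataframe, report_type = "pandas", "compare"
    else:
        # Any other type is labelled as Spark, as in the diff.
        dataframe, report_type = "spark", "regular"
    datatype = "timeseries" if timeseries else "tabular"
    return dataframe, datatype, report_type


print(classify_report(FakeDataFrame, timeseries=False))
print(classify_report(type(None), timeseries=True))
```

Separating the labelling from the side effects (the HTTP request and the log line) is a design suggestion, not something the merged code does.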