Skip to content

Data Explorer

Julia Silge edited this page Jun 3, 2024 · 4 revisions

The Data Explorer is intended to complement code-first exploration of data, allowing you to display data in a spreadsheet-like grid, temporarily filter and sort data, and provide useful summary statistics directly inside of Positron. The goal of the Data Explorer is not to replace code-based workflows, but rather supplement with ephemeral views of the data or summary statistics as you further explore or modify the data via code.

The Data Explorer has three primary components, discussed in greater detail in the sections below:

  • Data Grid: Spreadsheet-like display of the individual cells and columns, as well as sorting
  • Summary panel: Summary statistics and missing data for each column
  • Filter bar: Ephemeral filters for specific columns

Open a dataframe in the Data Explorer

Each instance of the Data Explorer tool is powered by a language runtime and can display dataframes in Python (pandas) or R (data.frame, tibble, data.table). Additional Python dataframe libraries will be added in the future.

Each instance of the data explorer will be refreshed with changes made to the underlying data. This allows combined workflows between the UI-centric Data Explorer and a code-first approach.

To open a new instance of the Data Explorer on a specific data frame:

  • Navigate to the variables pane and click on the Data Explorer icon for a specific dataframe object
  • Use the language runtime directly:
    • Via Python: %view dataframe label
    • Via R: View(dataframe, "label")

Data grid

The data grid is the primary display, with a spreadsheet-like cell by cell view. It is intended to scale efficiently to relatively large in-memory datasets, up to millions of rows or columns. Each column header has the column name above the data type as used in the language runtime. There is a context menu at the top right of each column to control sorting or quick-add of a filter for the selected column. Columns are resizable by click and drag of the column borders.

Row labels default to the observed row index, with a zero-based index in Python and a one-based index in R. Alternatively, pandas and R users may also have rows with modified indices or string-based labels.

Summary panel

The summary panel displays a vertical scrolling list of all of the column names and an incon representing their respective type. It also displays the amount of missing data as a increasing percentage and an inline bar graph.

Each column can be expanded by clicking on the dropdown icon to display additional summary statistics that vary by column type. Double clicking on a column name will also bring the column into focus in the data grid, allowing for quickly navigating wider data. The summary panel can be placed on the left or right side of the Data Explorer or temporarily hidden via the Layout control.

Filter bar

The filter bar has controls to show, hide, or remove existing filters as well as a + button to add a new filter. The status bar at the bottom of the Data Explorer also displays the percentage and number of remaining rows after applying a filter. When creating a new filter, you will first need to select a column either by scrolling the full list or searching across columns for a specific string. Once a column is selected, the available filters for that column type will be displayed. Alternatively, the context menu in each column label of the data grid also allows for creating filters with the column name pre-populated.

Available filters vary according to the column type. For example, string columns have filter affordances for: contains, starts or ends with, is empty, or exact matches. Alternatively, numeric columns have logical operations such as: is less than or greater than, is equal to, or is inclusively between two values.

Clone this wiki locally