This repository automates the creation of output tabulations in .xlsx files from a Pandas DataFrame. You can define styles for the tabulations (font type, font size, gradient colors, borders, etc.), sections or pages of the Excel workbook where the tabulations will be inserted, select the variables to be tabulated from the DataFrame, the cross or secondary variables with which the selected variables will be disaggregated, and establish how the data will be presented (in absolute values, in total percentages, in percentages by columns, or in percentages by rows).
This repository starts from a database loaded in DataFrame format in Pandas and with the definition of a settings DataFrame:
VAR | PAGE | SUM | DISAGGREGATE |
---|---|---|---|
P1 - Test question 1 | 1 | Expansion_factor | FALSE |
P1 - Test question 1 | 1 | P2 - Test question 2,${multiply},Expansion_factor | FALSE |
P3 - Test question 3 | 1 | Expansion_factor | FALSE |
P4 - Example Category 1 | 2 | Expansion_factor | TRUE |
P5 - Example Category 2 | 2 | Expansion_factor | TRUE |
With these settings and by defining the table styles in the config_style.json
file, the order in which the disaggregation or secondary variables should be cataloged (optional) in the config_values.json
file, and the table_type
variable in main.py
, you can obtain the following results:
Adjustments should only be made to the start_code
function in the main.py
file by specifying the name of the Excel file to be generated (db_name
), the type of tables to be generated (table_type
), the dataframe with the data to be tabulated (db
), the dataframe with the adjustment data for processing the tables (db_vars
), and optionally a filter for the dataframe (Split_df_by
) and a footer text for each table (footer_text
).
Optionally, adjustments can also be made to the files within the settings
directory to configure the way data is presented and the format of the tables. In config_values.json
, the order in which variables should be displayed in the tables is specified, and in config_style.json
, all the style parameters of the tables are configured.
22/05/2024
- Simple Tables: Added the option to produce simple tables without crossing with any variable, where data is presented in quantities and percentages (option 4 in the
table_type
variable). - Bug Fixes: Fixed a bug in the code when calculating totals for variables that have blank values.
- Value Sorting: Improved the code so that the values in the
Var
column of the output tables are also sorted according to the order specified in theconfig_values.json
file. - Chi-Squared Test: In the absolute values option (option 0 in the
table_type
variable), added the chi-squared test to evaluate the degree of independence between variables. - Variable Operations: Modified the logic of the instructions placed in the
SUM
column ofdb_vars
to support the specification of variables to operate. It is now possible to use the multiplication of two variables as a factor by passing the valueVariable 1,${multiply},Variable 2
.
23/05/2024
- Total row: Added a totals row to all tables except %rows (
table_type=3
). - Bug Fixes: Fixed an error in the code that originated when adding the simple tables option.