- A console program that parses a set of JSON files describing the main entity and generates statistics (the total count of each value) for one of its attributes.
- As launch parameters, it receives the path to the folder where the JSON files are stored (there may be several files) and the name of the attribute to generate statistics for.
- The program must support several attributes; the user specifies one of them.
- One of the attributes must be text and hold several values (comma-separated categories, hashtags, etc.).
- As a result, the program creates an XML file with statistics sorted by count from largest to smallest.
- The name of the result file will be `statistics_by_{attribute}.xml`.
- Project Structure
- Usage
- Description of domain entities
- Description of services
- Input and output example files
- Results of parsing tests with different thread counts
- Author Info
- The application consists of UI and service layers.
- The app modules:
  - `model`: contains the data model classes used throughout the application. These classes represent the core entities of the application.
  - `service`: encapsulates the business logic of the application, including data processing and calculations.
  - `ui`: handles the user interface aspects of the application.
- I recommend cloning the project from GitHub.
To do so, use this command:
git clone https://github.com/dima666Sik/ProfITsoft-Internship.git
- To run this project, you will need to install:
- JDK 17 or higher;
- Then you can check that the program works by running the JUnit tests.
- You can also work with the app by running the `main` method of the `Main` class. If you start the app through `Main`, you enter commands in the console to use the app's features.
- The JSON file(s) are generated automatically when you start the application and choose the first case. The path to the JSON file(s) is `src/main/resources/json-files`.
- The result of the second case, which parses the JSON file(s), can be found in the `src/main/resources/xml-files` directory.
In this description you will see info about three main domain entities:
- `Planet`: represents a celestial body in space; this class is used to generate JSON files and statistics.
  - Attributes:
    - `id`: Unique identifier for the planet.
    - `name`: Name of the planet.
    - `mass`: Mass of the planet, represented by an instance of `Mass`.
    - `diameter`: Diameter of the planet, represented by an instance of `Diameter`.
    - `hasRings`: Indicates whether the planet has rings.
    - `hasMoons`: Indicates whether the planet has moons.
    - `atmosphericComposition`: Composition of the planet's atmosphere.
    - `planetarySystem`: The planetary system to which the planet belongs.
- `PlanetarySystem`: represents a system of celestial bodies, typically containing one or more planets.
  - Attributes:
    - `id`: Unique identifier for the planetary system.
    - `name`: Name of the planetary system.
- `StatisticsInfo`: represents statistical information about planets for a given attribute.
  - Attributes:
    - `attribute`: The attribute value being analyzed. It has a generic type `T` that extends `Comparable<T>`.
    - `numberOfRepetitions`: The number of times the attribute value has been observed.
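As a rough illustration of the `StatisticsInfo` description, here is a minimal sketch written as a Java record. The name `StatisticsInfoSketch`, the record form, and the tie-breaking rule in the comparator are assumptions made for this example; the class in the repository may be structured differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical shape of StatisticsInfo; only the two field names come from the README.
record StatisticsInfoSketch<T extends Comparable<T>>(T attribute, int numberOfRepetitions) {

    // Orders by count, largest first. Breaking ties by the attribute's natural
    // order is an assumption of this sketch.
    static <T extends Comparable<T>> Comparator<StatisticsInfoSketch<T>> byCountDescending() {
        return Comparator
                .comparingInt((StatisticsInfoSketch<T> s) -> s.numberOfRepetitions())
                .reversed()
                .thenComparing(StatisticsInfoSketch::attribute);
    }

    public static void main(String[] args) {
        List<StatisticsInfoSketch<String>> stats = new ArrayList<>(List.of(
                new StatisticsInfoSketch<>("Oxygen", 8),
                new StatisticsInfoSketch<>("Nitrogen", 17),
                new StatisticsInfoSketch<>("Carbon dioxide", 13)));
        stats.sort(byCountDescending());
        stats.forEach(s -> System.out.println(s.attribute() + " = " + s.numberOfRepetitions()));
    }
}
```

Bounding `T` by `Comparable<T>` is what lets the same type hold text attributes such as `atmosphericComposition` and numeric ones such as `id`.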
This section describes the four modules of the service layer
(only the implementations are described):
`generator`
- This module contains two classes that implement interfaces:
  - `JsonObjectMultipleFileGenerator` implements the `ObjectMultipleFileGenerator` interface. It generates multiple JSON files from lists of objects.
  - `XmlFileCreator` implements the `FileCreator` interface and generates XML files from the statistics collected from JSON data.
`parser`
- This module contains a class that implements the `FileParser` interface:
  - `JsonToXmlParser` implements the `FileParser` interface and turns JSON files into an XML file containing statistics for the specified attribute. It processes the JSON files in a given directory, builds the XML file, and saves it to a default location.
`reader`
- This module contains a class that implements the `FileReader` interface:
  - `JsonFileReader` uses Jackson to parse and deserialize objects from a JSON file. It reads each JSON file in small parts rather than loading the whole file into memory, which keeps performance high and avoids an `OutOfMemoryError` when the file(s) are large.
`statistic`
- This module contains the abstract class and the interface that `PlanetStatisticsProcessor` builds on:
  - `AbstractStatisticsProcessor` is an abstract class that provides common functionality for processing statistics. It implements the `StatisticsProcessor` interface and provides methods to collect and manage statistics. Subclasses extend it to implement the processing logic for specific data types.
  - `PlanetStatisticsProcessor` extends `AbstractStatisticsProcessor` and collects statistics from JSON data about planets. It processes the JSON files, extracts the specified attribute, and counts its values.
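To make the counting step concrete, here is a minimal sketch of how a multi-valued text attribute such as `atmosphericComposition` could be tallied. The class `AttributeCounterSketch` and its `count` method are hypothetical names for this example, not the repository's `PlanetStatisticsProcessor`, and splitting on commas with trimming is an assumption based on the sample data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the repository's PlanetStatisticsProcessor.
final class AttributeCounterSketch {

    // Splits each raw value on commas, trims the parts, and counts every non-empty part.
    static Map<String, Integer> count(List<String> rawValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (String raw : rawValues) {
            Arrays.stream(raw.split(","))
                    .map(String::trim)
                    .filter(part -> !part.isEmpty())
                    .forEach(part -> counts.merge(part, 1, Integer::sum));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two values in the style of the atmosphericComposition attribute.
        Map<String, Integer> counts = count(List.of(
                "Oxygen, Nitrogen, Carbon dioxide",
                "Nitrogen, Carbon dioxide"));
        System.out.println(counts);
    }
}
```

A single-valued attribute such as `name` passes through the same code unchanged, because a string without commas yields exactly one part.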
Patterns used in this project:
- Singleton pattern
- Facade pattern
- Builder pattern
In this project I tried to adhere to all of these principles.
For example, if the subject area is `Planet`, the input format can be as follows:
[
{
"id": 1,
"name": "Planet530",
"mass": {
"value": 9.398156073270247,
"unit": "KILOGRAM"
},
"diameter": {
"value": 7136.0,
"unit": "KILOMETER"
},
"hasRings": true,
"hasMoons": true,
"atmosphericComposition": "Oxygen, Nitrogen, Carbon dioxide",
"planetarySystem": {
"id": 1,
"name": "Planetary System X1"
}
},
...
{
"id": 7,
"name": "Planet401",
"mass": {
"value": 4.927016280930118,
"unit": "KILOGRAM"
},
"diameter": {
"value": 9502.0,
"unit": "KILOMETER"
},
"hasRings": true,
"hasMoons": false,
"atmosphericComposition": "Nitrogen, Carbon dioxide",
"planetarySystem": {
"id": 1,
"name": "Planetary System X1"
}
}
]
To change the subject area, add a new implementation of `StatisticsProcessor` and a class with constants for the new subject area.
As a result of parsing, we get the following statistics (in this example, for the `atmosphericComposition` attribute):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<statistics>
<item>
<value>Nitrogen</value>
<count>17</count>
</item>
<item>
<value>Carbon dioxide</value>
<count>13</count>
</item>
<item>
<value>Oxygen</value>
<count>8</count>
</item>
<item>
<value>Helium</value>
<count>4</count>
</item>
<item>
<value>Hydrogen</value>
<count>4</count>
</item>
<item>
<value>Beryllium</value>
<count>3</count>
</item>
<item>
<value>Magnesium</value>
<count>1</count>
</item>
</statistics>
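Output in this shape can be produced with the JDK's built-in DOM and `Transformer` APIs, which typically emit a declaration with `standalone="no"` like the one shown. The sketch below is only an illustration under that assumption: `StatisticsXmlSketch` and `toXml` are hypothetical names, and the repository's `XmlFileCreator` may work differently.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical writer; not the repository's XmlFileCreator.
final class StatisticsXmlSketch {

    // Emits one <item> per entry, in the map's iteration order (assumed already sorted).
    static String toXml(Map<String, Integer> sortedCounts) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("statistics");
            doc.appendChild(root);
            for (Map.Entry<String, Integer> entry : sortedCounts.entrySet()) {
                Element item = doc.createElement("item");
                Element value = doc.createElement("value");
                value.setTextContent(entry.getKey());
                Element count = doc.createElement("count");
                count.setTextContent(String.valueOf(entry.getValue()));
                item.appendChild(value);
                item.appendChild(count);
                root.appendChild(item);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the statistics XML", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps insertion order
        counts.put("Nitrogen", 17);
        counts.put("Carbon dioxide", 13);
        System.out.println(toXml(counts));
    }
}
```

The writer relies on the caller to pass entries in descending order of count, which is why the example uses a `LinkedHashMap`.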
You can also generate statistics for another attribute, for example the `id` of `Planet`:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<statistics>
<item>
<value>5</value>
<count>3</count>
</item>
<item>
<value>6</value>
<count>3</count>
</item>
<!--... skip tags ...-->
<item>
<value>12</value>
<count>1</count>
</item>
</statistics>
Thread count | Processing time (seconds) |
---|---|
one | 0.250 |
two | 0.201 |
three | 0.184 |
four | 0.189 |

Thread count | Processing time (seconds) |
---|---|
one | 16.353 |
five | 11.106 |
ten | 12.265 |
twelve | 11.945 |
fifty | 12.264 |
one hundred | 12.326 |
To measure the file parsing time, I created the `ua.code.intership.proft.it.soft.service.util.TimeChecker` class.
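The API of `TimeChecker` is not shown in this README, so the following is only a sketch of how such a timing helper could look. The names `TimingSketch`, `Timed`, and `measure` are hypothetical and chosen for this example.

```java
import java.util.function.Supplier;

// Hypothetical timing helper; the real TimeChecker may look different.
final class TimingSketch {

    // Holds a task's result together with the elapsed wall-clock time in seconds.
    record Timed<T>(T result, double seconds) {}

    static <T> Timed<T> measure(Supplier<T> task) {
        long start = System.nanoTime(); // monotonic clock, suitable for measuring intervals
        T result = task.get();
        long elapsedNanos = System.nanoTime() - start;
        return new Timed<>(result, elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        Timed<Integer> timed = measure(() -> {
            int sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i % 7;
            }
            return sum;
        });
        System.out.printf("result=%d, took %.3f s%n", timed.result(), timed.seconds());
    }
}
```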
The number of JSON files in the directory is three.
With fewer than three threads, files are processed as threads become free, and the remaining files wait
for a free thread, which takes more time.
With more than three threads, the extra threads exist but stay idle, so the run does not get any faster.
The best case is when the number of threads equals the number of files: each file is processed in its own thread
without waiting, and no memory is spent on idle threads.
However, the efficiency also depends on the number of available processor cores and the volume of the processed
files.
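The relationship between thread count and file count can be sketched with a fixed thread pool. This is a simplified illustration, not the project's code: `ParallelCountSketch` and `countInParallel` are hypothetical names, and each inner list stands in for the attribute values read from one JSON file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; each inner list stands in for one JSON file's values.
final class ParallelCountSketch {

    static Map<String, Integer> countInParallel(List<List<String>> files, int threadCount) {
        Map<String, Integer> totals = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<String> file : files) {
                // One task per file; with fewer threads than files, tasks wait in the queue.
                pending.add(pool.submit(
                        () -> file.forEach(value -> totals.merge(value, 1, Integer::sum))));
            }
            for (Future<?> task : pending) {
                task.get(); // block until the task finishes and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A counting task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return totals;
    }

    public static void main(String[] args) {
        List<List<String>> files = List.of(
                List.of("Nitrogen", "Oxygen"),
                List.of("Nitrogen", "Helium"),
                List.of("Nitrogen"));
        // Three files and three threads: no task waits for a free thread.
        System.out.println(countInParallel(files, 3));
    }
}
```

Passing a `threadCount` larger than the number of files changes nothing in the result; the extra threads simply never receive a task.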