This is Qmin, the Mineral Chemistry Virtual Assistant. The models presented here perform mineral classification, missing-value imputation by multivariate regression, and mineral formula prediction through several nested Random Forest classification and regression models.
The models have been developed by researchers of the Geological Survey of Brazil (SGB/CPRM), with the assistance of the technical manager of the EPMA laboratory of the Institute of Geosciences/University of Brasília (IG/UnB).
Additional information about how the models were built is available online in the preprint (the original manuscript, not yet peer reviewed) or in the published version of our work in the journal Computers and Geosciences.
You can also watch the presentation (in Portuguese only) given at the release of the beta version of the application.
-
AMPHIBOLES (13 minerals): ACTINOLITE, ARFVEDSONITE, CUMMINGTONITE, EDENITE, HASTINGSITE, HORNBLENDE (SENSU LATO), KAERSUTITE, KATOPHORITE, MAGNESIOHASTINGSITE, PARGASITE, RICHTERITE, RIEBECKITE, TREMOLITE.
-
APATITE: APATITE (SENSU LATO)
-
CARBONATES (13 minerals): ANCYLITE, ANKERITE, BURBANKITE, CALCITE, CARBOCERNAITE, DOLOMITE, GREGORYITE, KUKHARENKOITE (SENSU LATO), KUTNAHORITE, MAGNESITE, NATROFAIRCHILDITE/NYEREREITE/ZEMKORITE, SHORTITE, SIDERITE
-
CHLORITE: CHLORITE (SENSU LATO)
⚠️ STILL UNSTABLE!⚠️ -
CLAY-MINERALS (5 minerals): BEIDELLITE, CORRENSITE, ILLITE, MONTMORILLONITE, SAPONITE
-
EPIDOTE: EPIDOTE (SENSU LATO)
⚠️ STILL UNSTABLE!⚠️ -
FELDSPARS (8 minerals): ALBITE, ANDESINE, ANORTHITE, ANORTHOCLASE, BYTOWNITE, K-FELDSPAR, LABRADORITE, OLIGOCLASE
-
FELDSPATHOIDS (8 minerals): ANALCIME, CANCRINITE, HAUYNE, LEUCITE, NEPHELINE, NOSEAN, TRIKALSILITE/KALSILITE/KALIOPHILITE/PANUNZITE, SODALITE
-
GARNETS (5 minerals): ALMANDINE, ANDRADITE, GROSSULAR, PYROPE, SCHORLOMITE
-
ILMENITE
-
MICAS (6 minerals): BIOTITE (SENSU LATO), CELADONITE, MUSCOVITE, PARAGONITE, YANGZHUMINGITE, ZINNWALDITE (SENSU LATO)
-
OLIVINES (3 minerals): FAYALITE, FORSTERITE, MONTICELLITE
-
PEROVSKITE
-
PYROXENES (9 minerals): AEGIRINE, AUGITE, DIOPSIDE, ENSTATITE/CLINOENSTATITE, FERROSILITE/CLINOFERROSILITE, HEDENBERGITE, OMPHACITE, PIGEONITE, TITAN-AUGITE
-
QUARTZ
-
SPINELS (5 minerals): CHROMITE, HERCYNITE, MAGNETITE, SPINEL, ULVOSPINEL
-
SULFIDES (18 minerals): ALABANDITE, ARSENOPYRITE, BORNITE, CHALCOCITE, CHALCOPYRITE, CHLORBARTONITE, CUBANITE/ISOCUBANITE, GALENA, HEAZLEWOODITE, MACKINAWITE, PENTLANDITE, POLYDYMITE, PYRITE, PYRRHOTITE, RASVUMITE, SPHALERITE, STROMEYERITE
-
TITANITE
-
ZIRCON
The mineral formulas implemented here for Feldspar, Garnet, Mica, Olivine, Pyroxene and Spinel are calculated from EPMA data. Where possible, the total Fe3+ content is obtained by charge balance after the number of atoms per formula unit (apfu) has been calculated. The formula printed in the output is therefore the result of several calculations, concatenated into a single string column.
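The charge-balance step described above can be illustrated with a minimal sketch. It is not the code used in Qmin: the source does not state which charge-balance scheme is applied, so the sketch assumes the common stoichiometric criterion F = 2X(1 - T/S) in the style of Droop (1987), where X is the number of oxygens, T the ideal cation total and S the cation total obtained with all iron as Fe2+. The lookup table `OXIDES` and the functions `cations_per_oxygens`, `split_iron` and `formula_string` are hypothetical names introduced only for this example.

```python
# Hypothetical lookup table (illustration only): cation label, oxide molar
# mass (g/mol), cations per oxide formula, oxygens per oxide formula.
OXIDES = {
    "MgO":   ("Mg", 40.304, 1, 1),
    "Al2O3": ("Al", 101.961, 2, 3),
    "Cr2O3": ("Cr", 151.990, 2, 3),
    "TiO2":  ("Ti", 79.866, 1, 2),
    "FeO":   ("Fe", 71.844, 1, 1),  # total iron, reported as FeO
}


def cations_per_oxygens(wt_pct, n_oxygens):
    """Atoms per formula unit normalised to n_oxygens, all Fe as Fe2+."""
    moles = {}
    mol_oxygen = 0.0
    for oxide, wt in wt_pct.items():
        cation, mass, n_cat, n_ox = OXIDES[oxide]
        mol = wt / mass
        moles[cation] = moles.get(cation, 0.0) + mol * n_cat
        mol_oxygen += mol * n_ox
    factor = n_oxygens / mol_oxygen
    return {cation: n * factor for cation, n in moles.items()}


def split_iron(wt_pct, n_oxygens, ideal_cations):
    """Estimate Fe3+ by charge balance (assumed scheme: F = 2X(1 - T/S))."""
    apfu = cations_per_oxygens(wt_pct, n_oxygens)
    observed = sum(apfu.values())                            # S
    ferric = 2 * n_oxygens * (1 - ideal_cations / observed)  # F
    # Renormalise every cation to the ideal cation total T.
    scale = ideal_cations / observed
    apfu = {cation: n * scale for cation, n in apfu.items()}
    fe_total = apfu.pop("Fe", 0.0)
    # Fe3+ cannot be negative or exceed the iron actually present.
    ferric = min(max(ferric, 0.0), fe_total)
    apfu["Fe2+"] = fe_total - ferric
    apfu["Fe3+"] = ferric
    return apfu


def formula_string(apfu, n_oxygens):
    """Concatenate the per-cation results into one string."""
    parts = [f"{cation}({n:.2f})" for cation, n in apfu.items() if n >= 0.005]
    return " ".join(parts) + f" O({n_oxygens})"


# Pure magnetite analysed by EPMA: all iron reported as FeO (about 93.09 wt%).
magnetite = split_iron({"FeO": 93.09}, n_oxygens=4, ideal_cations=3)
print(formula_string(magnetite, 4))
```

For an ideal magnetite, normalising to 4 oxygens with all iron as Fe2+ gives 4 cations instead of 3, and the excess is resolved by assigning 2 Fe3+ and 1 Fe2+ per formula unit. The scheme assumes that iron is the only element with variable valence and that there are no vacancies, which is why the source restricts it to cases "where possible".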
For Amphiboles, the formula will be calculated by a multivariate regression for each crystallographic site. This part is still in development and will be made available in this repository later.
This model is in active development and subject to significant code changes to:
- Increase the number of groups and minerals covered
- Improve performance
- Increase the size of samples used for training
The data_raw directory contains all raw data considered when building the models. The main source of training data is the GEOROC database, which is maintained by the Max Planck Institute for Chemistry in Mainz.
Additional data were provided by researchers of the Geological Survey of Brazil and were used for model testing and calibration. They are available in the OtherSources folder.
The project was developed in R and Python 3.
Data wrangling, the first missing-value imputation, the conversion of elements to oxides, and the balancing of mineral instances were done in R. The code is available in the Code_R folder.
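As an illustration of the element-to-oxide conversion mentioned above, here is a minimal sketch. The project performs this step in R; the sketch is written in Python only for illustration and is not taken from the Code_R folder. The tables `ATOMIC_MASS` and `OXIDE_OF` and the functions `oxide_factor` and `elements_to_oxides` are hypothetical, and reporting all iron as FeO is an assumption made for the example.

```python
# Standard atomic masses (g/mol), rounded to three decimals.
ATOMIC_MASS = {
    "O": 15.999, "Si": 28.086, "Al": 26.982,
    "Mg": 24.305, "Ca": 40.078, "Fe": 55.845,
}

# Hypothetical mapping: element -> (oxide name, cations per oxide, oxygens per oxide).
OXIDE_OF = {
    "Si": ("SiO2", 1, 2),
    "Al": ("Al2O3", 2, 3),
    "Mg": ("MgO", 1, 1),
    "Ca": ("CaO", 1, 1),
    "Fe": ("FeO", 1, 1),  # assumption: all iron expressed as FeO
}


def oxide_factor(element):
    """Return (oxide name, factor) with oxide wt% = element wt% * factor."""
    name, n_cat, n_ox = OXIDE_OF[element]
    cation_mass = n_cat * ATOMIC_MASS[element]
    oxide_mass = cation_mass + n_ox * ATOMIC_MASS["O"]
    return name, oxide_mass / cation_mass


def elements_to_oxides(element_wt_pct):
    """Convert a dict of element wt% into a dict of oxide wt%."""
    oxides = {}
    for element, wt in element_wt_pct.items():
        name, factor = oxide_factor(element)
        oxides[name] = wt * factor
    return oxides


# Quartz contains about 46.74 wt% Si, which should convert to about 100 wt% SiO2.
print(elements_to_oxides({"Si": 46.74}))
```

The factor is simply the oxide molar mass divided by the mass of the cations it contains, so for Si it is about 2.139 and for Al about 1.889.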
The final models used in this work were developed in Python 3 and are available in the model_py folder. All Python code is available in the Code_Python folder.
- Guilherme Ferreira da Silva, (E-mail: guilherme.ferreira@cprm.gov.br)
- Marcos Vinícius Ferreira, (E-mail: marcos.ferreira@cprm.gov.br)
- Iago Sousa Lima Costa, (E-mail: iago.costa@cprm.gov.br)
- Renato Bernardes Borges, (E-mail: renato.bernardes@unb.br)
- Carlos Eduardo Miranda Mota, (E-mail: carlos.mota@cprm.gov.br)
The source code for Qmin is licensed under the BSD 3-Clause License; see LICENSE.