Database structure

The web interface for the database is written in the Django web framework, and the data itself is stored in an SQL database. In principle, any relational database should work for hosting (e.g., MySQL, SQLite), but we recommend MySQL/MariaDB. This section gives an overview of the Django models used by the website. The presentation focuses on how the models are defined in the Python source code, not on the actual SQL tables. For example, fields such as the primary key are not listed below because they are created automatically in the SQL tables.

Most models inherit from a base model, which records how each entry was created and updated. Since relational databases do not support inheritance as such, Django copies these fields into each child model.

Base

created

date the entry was created

updated

date the entry was last modified

created by

user that created the entry

updated by

user that updated the entry
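As a rough illustration of how the base fields end up on every child model, the following sketch uses plain Python dataclasses instead of Django models. The field names created_by and updated_by, the use of a user name string in place of a user object, and the example values are assumptions made for illustration only.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime, timezone
from typing import Optional


def _now() -> datetime:
    """Current time in UTC, used as the default timestamp."""
    return datetime.now(timezone.utc)


@dataclass
class Base:
    """Bookkeeping fields shared by most entries (stand-in for the base model)."""
    created: datetime = field(default_factory=_now)
    updated: datetime = field(default_factory=_now)
    created_by: Optional[str] = None  # assumption: a user name, not a user object
    updated_by: Optional[str] = None


@dataclass
class Property(Base):
    """Child entry: carries all Base fields plus its own."""
    name: str = ""


@dataclass
class Unit(Base):
    label: str = ""


band_gap = Property(name="band gap", created_by="alice")

# Every child lists the inherited bookkeeping fields next to its own fields.
property_columns = [f.name for f in fields(Property)]
print(property_columns)
```

The child classes list the bookkeeping fields alongside their own, which is analogous to each child table carrying its own copy of the base columns.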

All properties are stored in a table that contains the name of the property.

Property(Base)

name

displayed name of the property

All units are stored in a table that contains the label field.

Unit(Base)

label

“nm”, “cm2 V-1 s-1”, …

A solid system is defined by the following properties.

System

compound name

displayed name of the material

formula

chemical formula for the compound

group

alternate names

organic

organic component

inorganic

inorganic component

iupac

IUPAC name

last_update

date the system was last modified

derived_to_from

a ManyToManyField, used if the system is somehow directly linked to another system

description

description of the compound

Authors and references are stored in the following tables.

Author

first_name

first name

last_name

last name

institution

institution, does not have to be the full address

reference

a ManyToManyField that maps authors to references

Reference

title

title of the paper

journal

journal name

vol

volume

pages_start

starting page number

pages_end

end page number

year

year of publication

doi_isbn

DOI/ISBN if applicable

All experimental and theoretical results are contained in data sets. A data set typically refers to a single value, table, or figure found in a reference. The quantity of primary interest is called the “primary property”. For example, given some data where the band gap depends on temperature, the band gap and temperature would be the “primary” and “secondary” properties, respectively (think of these as y- and x-values in a plot).

Dataset(Base)

caption

description of the data set

system

foreign key for System

primary_property

foreign key for Property

primary_unit

foreign key for Unit

primary_property_label

custom label for the y-axis (typically left empty and the property name is used instead)

secondary_property

foreign key for Property

secondary_unit

foreign key for Unit

secondary_property_label

custom label for the x-axis (typically left empty and the property name is used instead)

reference

foreign key for Reference

visible

whether the data is visible on the website

is_figure

whether the data should be plotted

plotted

whether data is plotted by default

experimental

whether the data is of experimental origin (theoretical if false)

dimensionality

dimensionality of the inorganic component as understood in the HOIP literature (not the dimensionality of the crystal)

sample_type

single crystal, powder, …

extraction_method

short explanation for how the data was obtained

representative

in case of multiple entries of the same property for a given material, whether this data set should be shown on the material’s main page.

linked_to

a ManyToManyField, used if the numerical values of this data set are somehow directly linked to another data set

verified_by

list of users that have verified the correctness of the data set

A data set consists of one or more data subsets. One is always present but there could be several if it is possible to logically group the data somehow. For instance, different curves in a figure would correspond to separate data subsets.

Subset(Base)

dataset

foreign key for Dataset

label

short description of this subset

crystal_system

one of the seven crystal systems

A data subset consists of one or more data points. When describing a single value such as the band gap of a material with no additional dependencies, the whole data set would consist of one subset with only one data point with one numerical value.

Datapoint(Base)

subset

foreign key for Subset

Finally, the actual data is stored in the NumericalValue table.

NumericalValue(Base)

datapoint

foreign key for Datapoint

qualifier

“primary”, “secondary”

value_type

“accurate”, “approximate”, “lower/upper bound”

value

floating point number

counter

counts the number of values attached to a given data point
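The nesting of data sets, subsets, data points, and numerical values can be sketched in plain Python as follows. This is not the Django code: the real models link child to parent through foreign keys, whereas this sketch nests lists for brevity. The helper xy_pairs and all numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class NumericalValue:
    qualifier: str               # "primary" or "secondary"
    value: float
    value_type: str = "accurate"


@dataclass
class Datapoint:
    values: List[NumericalValue] = field(default_factory=list)


@dataclass
class Subset:
    label: str = ""
    datapoints: List[Datapoint] = field(default_factory=list)


@dataclass
class Dataset:
    caption: str
    primary_property: str
    secondary_property: str = ""
    subsets: List[Subset] = field(default_factory=list)


def xy_pairs(subset: Subset) -> List[Tuple[Optional[float], float]]:
    """Return (secondary, primary) pairs, i.e. (x, y) values of one curve."""
    pairs = []
    for point in subset.datapoints:
        by_qualifier = {v.qualifier: v.value for v in point.values}
        pairs.append((by_qualifier.get("secondary"), by_qualifier["primary"]))
    return pairs


# Band gap (primary) as a function of temperature (secondary); made-up numbers.
curve = Subset(label="sample A", datapoints=[
    Datapoint([NumericalValue("secondary", 100.0), NumericalValue("primary", 1.60)]),
    Datapoint([NumericalValue("secondary", 300.0), NumericalValue("primary", 1.55)]),
])
dataset = Dataset("band gap vs temperature", "band gap", "temperature", [curve])

# A single value with no dependencies: one subset, one data point, one value.
single = Subset(datapoints=[Datapoint([NumericalValue("primary", 1.55)])])

print(xy_pairs(dataset.subsets[0]))
print(xy_pairs(single))
```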

Any errors (uncertainties) associated with a numerical value are stored in a separate table. In the code, the errors are retrieved from the database by querying for numerical values with select_related('error') and then checking whether a value has an associated error (if hasattr(value, 'error')).

Error(Base)

numerical_value

foreign key for NumericalValue

value

floating point number
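At the SQL level, fetching values together with their optional errors amounts to a left join between the two tables. The sketch below uses the sqlite3 module from the standard library to show the idea. The table and column names are assumptions and will differ from those Django generates, and the UNIQUE constraint assumes at most one error per numerical value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE numerical_value (
        id INTEGER PRIMARY KEY,
        value REAL NOT NULL
    );
    -- Assumption: at most one error per numerical value (hence UNIQUE).
    CREATE TABLE error (
        id INTEGER PRIMARY KEY,
        numerical_value_id INTEGER NOT NULL UNIQUE
            REFERENCES numerical_value(id),
        value REAL NOT NULL
    );
""")

# Two values; only the first has an uncertainty attached (made-up numbers).
conn.executemany("INSERT INTO numerical_value (id, value) VALUES (?, ?)",
                 [(1, 1.55), (2, 1.60)])
conn.execute("INSERT INTO error (numerical_value_id, value) VALUES (?, ?)",
             (1, 0.02))

# One query returns every value plus its error, or NULL if there is none.
rows = conn.execute("""
    SELECT nv.value, e.value
    FROM numerical_value AS nv
    LEFT JOIN error AS e ON e.numerical_value_id = nv.id
    ORDER BY nv.id
""").fetchall()

for value, error in rows:
    if error is not None:      # plays the role of the hasattr check
        print(f"{value} +/- {error}")
    else:
        print(f"{value}")
```

Keeping errors in their own table means that values without an uncertainty need no row there at all, while a single joined query still retrieves everything at once.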

As with errors, when dealing with ranges, the upper bounds are stored in a separate table. The value field of the numerical value is then understood to contain the lower bound of the range.

UpperBound(Base)

numerical_value

foreign key for NumericalValue

value

floating point number

A separate table is used for values that are fixed across a data subset. For instance, if curves of band gap vs. dopant density are measured at different temperatures, then “band gap”, “dopant density”, and “temperature” would be “primary”, “secondary”, and “fixed”, respectively. There are far fewer fixed values than regular numerical values, so the errors and upper bounds can be attached directly to the fixed values without a performance penalty.

NumericalValueFixed(Base)

dataset

foreign key for Dataset

subset

foreign key for Subset

physical_property

foreign key for Property

unit

foreign key for Unit

value

floating point number

type

“accurate”, “approximate”, “lower/upper bound”, “error”

error

floating point number (optional)

upper_bound

floating point number (optional); if present, then value is understood to be the lower bound for the range
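To make the interplay of value, type, error, and upper_bound concrete, here is a hypothetical formatting helper in plain Python. The function name format_value, the exact type strings, and the output notation are assumptions for illustration. The "error" type listed above is not handled because the source does not spell out its meaning.

```python
from typing import Optional

# Assumed spelling of the value types; the stored strings may differ.
PREFIXES = {
    "accurate": "",
    "approximate": "~",
    "lower bound": ">",
    "upper bound": "<",
}


def format_value(value: float,
                 value_type: str = "accurate",
                 error: Optional[float] = None,
                 upper_bound: Optional[float] = None,
                 unit: str = "") -> str:
    """Render a fixed value as text, e.g. for a figure legend."""
    if upper_bound is not None:
        # With an upper bound present, value is the lower end of a range.
        text = f"{value:g} to {upper_bound:g}"
    else:
        text = f"{PREFIXES[value_type]}{value:g}"
        if error is not None:
            text += f" (+/- {error:g})"
    return f"{text} {unit}".strip()


print(format_value(300.0, unit="K"))
print(format_value(1.5, "approximate", unit="eV"))
print(format_value(1.55, error=0.02, unit="eV"))
print(format_value(100.0, upper_bound=200.0, unit="K"))
```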

If the primary property depends on something that cannot be stored as a floating point number, that quantity is stored in the Symbol table. Example: the user enters band gap values as a function of phase. The phases are then stored as strings in the following table.

Symbol(Base)

datapoint

foreign key for Datapoint

value

a string

counter

counts the number of symbols attached to a given data point

In case of an experimental study, the details of the synthesis method and the experiment can be stored in the following tables.

SynthesisMethod(Base)

dataset

foreign key for Dataset

starting_materials

starting materials of synthesis

product

product of synthesis

description

detailed description of the synthesis process

ExperimentalDetails(Base)

dataset

foreign key for Dataset

method

name of the experimental method

description

detailed description of the experiment

Similarly, in case of a theoretical study, the computational details are recorded in a separate table.

ComputationalDetails(Base)

dataset

foreign key for Dataset

code

computer code used for calculations

level_of_theory

level of theory used in the calculation

xc_functional

exchange-correlation functional

k_point_grid

details about the k-point grid

level_of_relativity

level of relativity (this includes the description of spin-orbit coupling)

basis_set_definition

anything related to the basis set used (this includes pseudopotential details, if applicable)

numerical_accuracy

information about parameters that control the accuracy of the calculation

Each entry of synthesis method, experimental details, or computational details may have a comment, which is stored in a separate table.

Comment(Base)

synthesis_method

foreign key for SynthesisMethod

experimental_details

foreign key for ExperimentalDetails

computational_details

foreign key for ComputationalDetails

text

comment body

Besides being stored in a structured database, all numerical data is also stored in the form of files. This way, the original user-uploaded data is kept without modifications, e.g., preserving any comments that the input file may contain.

InputDataFile(Base)

dataset

foreign key for Dataset

dataset_file

a file upload field

Any additional files, if present, are stored in AdditionalFile (input/output files for a calculation, image of the sample, …).

AdditionalFile(Base)

dataset

foreign key for Dataset

dataset_file

a file upload field

Phase transition properties, such as the phase transition pressure, require special treatment and are stored in PhaseTransition.

PhaseTransition(Base)

subset

foreign key for Subset

crystal_system_final

final crystal system; crystal_system of the subset is then understood to be the initial crystal system

space_group_initial

initial space group

space_group_final

final space group

direction

direction of the phase transition

hysteresis

details about the hysteresis of the phase transition

value

floating point number

value_type

“accurate”, “approximate”, “lower/upper bound”

counter

number of values attached to a given subset

error

uncertainty of the value

upper_bound

upper bound of the value

All user information is stored in the UserProfile table.

UserProfile

user

the default Django user model

description

description of the user (e.g., undergraduate)

institution

name of the institution

website

website of the user