ETL
The Extractor, Transformer and Loader (ETL) module for OrientDB moves data to and from OrientDB databases using ETL processes.
- Configuration: The ETL module is driven by a configuration file written in JSON.
- Extractor: Pulls data from the source database.
- Transformers: Convert the data in the pipeline from its source format to one accessible to the target database.
- Loader: Loads the data into the target database.
How ETL Works
The ETL module receives a backup file from another database, converts the fields into an accessible format, and loads them into OrientDB.
EXTRACTOR => TRANSFORMERS[] => LOADER
For example, consider the process for a CSV file. Using the ETL module, OrientDB loads the file, applies whatever changes it needs, then stores each record as a document in the current OrientDB database.
+-----------+-----------------------+-----------+
|           |              PIPELINE             |
+ EXTRACTOR +-----------------------+-----------+
|           |     TRANSFORMERS      |  LOADER   |
+-----------+-----------------------+-----------+
|   FILE   ==>  CSV->FIELD->MERGE  ==> OrientDB |
+-----------+-----------------------+-----------+
You can modify this pipeline, allowing the transformation and loading phases to run in parallel, by setting the configuration variable "parallel" to true.
{"parallel": true}
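As a rough sketch of how the pieces above could fit together in one configuration file, the following mirrors the FILE ==> CSV->FIELD->MERGE ==> OrientDB pipeline from the diagram. The file path, database URL, class name, field names and lookup target are all hypothetical placeholders, and the placement of "parallel" inside a "config" block is an assumption about the file layout. Component names and options can differ between releases (some releases provide CSV parsing as an extractor rather than a transformer), so check the component reference for your version before relying on it.

```json
{
  "config": {
    "parallel": true
  },
  "source": {
    "file": { "path": "/tmp/example/people.csv" }
  },
  "extractor": {
    "row": {}
  },
  "transformers": [
    { "csv": {} },
    { "field": { "fieldName": "name", "expression": "$input.name.toUpperCase()" } },
    { "merge": { "joinFieldName": "id", "lookup": "Person.id" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/tmp/example/databases/people",
      "dbType": "document",
      "class": "Person"
    }
  }
}
```

Here the extractor reads the file line by line, the transformers parse each line as CSV, rewrite one field, and merge the result with any existing record that has the same id, and the loader writes the outcome to the target database as documents.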
Installation
Since version 2.0, OrientDB bundles the ETL module with the official release.
Usage
To use the ETL module, run the oetl.sh script with the configuration file given as an argument.
$ $ORIENTDB_HOME/bin/oetl.sh config-dbpedia.json
NOTE: If you are importing data for use in a distributed database, you must set ridBag.embeddedToSbtreeBonsaiThreshold=Integer.MAX_VALUE for the ETL process to avoid replication errors when the database is updated online.
Available Components
Examples: