ETL

The Extractor Transformer and Loader, or ETL, module for OrientDB provides support for moving data to and from OrientDB databases using ETL processes.

  • Configuration: The ETL module uses a configuration file, written in JSON.
  • Extractor Pulls data from the source database.
  • Transformers Convert the data in the pipeline from its source format to one accessible to the target database.
  • Loader loads the data into the target database.

How ETL Works

The ETL module receives a backup file from another database, it then converts the fields into an accessible format and loads it into OrientDB.

EXTRACTOR => TRANSFORMERS[] => LOADER

For example, consider the process for a CSV file. Using the ETL module, OrientDB loads the file, applies whatever changes it needs, then stores the reocrd as a document into the current OrientDB database.

+-----------+-----------------------+-----------+
|           |              PIPELINE             |
+ EXTRACTOR +-----------------------+-----------+
|           |     TRANSFORMERS      |  LOADER   |
+-----------+-----------------------+-----------+
|   FILE   ==>  CSV->FIELD->MERGE  ==> OrientDB |
+-----------+-----------------------+-----------+

You can modify this pipeline, allowing the transformation and loading phases to run in parallel by setting the configuration variable "parallel" to true.

{"parallel": true}

Installation

Since version 2.0, OrientDB bundles the ETL module with the official release.

Usage

To use the ETL module, run the oetl.sh script with the configuration file given as an argument.

$ $ORIENTDB_HOME/bin/oetl.sh config-dbpedia.json
NOTE NOTE: If you are importing data for use in a distributed database, then you must set ridBag.embeddedToSbtreeBonsaiThreshold=Integer.MAX\_VALUE for the ETL process to avoid replication errors, when the database is updated online.

Available Components

Examples:

results matching ""

    No results matching ""