ETL - Loaders

Loader components are the last stage of the ETL process. They are responsible for saving the records produced by the earlier stages.

Available Loaders

  • Output
  • OrientDB

Output

This is the default loader. It prints the transformation result to the console.

  • Component name: output
  • Accepted input classes: [Object]
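
As a minimal sketch, the default loader could also be declared explicitly. This assumes that loaders are declared by component name under a top-level "loader" key of the ETL configuration, and that "output" takes no parameters (none are listed here). The rest of the configuration (source, extractor, transformers) is omitted.

```json
{
  "loader": {
    "output": {}
  }
}
```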

OrientDB

Loads records and vertices into an OrientDB database.

  • Component name: orientdb
  • Accepted input classes: [ODocument, OrientVertex]

Syntax

| Parameter | Description | Type | Mandatory | Default value |
|---|---|---|---|---|
| dbURL | Database URL | string | true | |
| dbUser | User name | string | false | admin |
| dbPassword | User password | string | false | admin |
| dbAutoCreate | If the database does not exist, create it automatically | boolean | false | true |
| dbAutoCreateProperties | Automatically create properties in the schema | boolean | false | false |
| dbAutoDropIfExists | Automatically drop the database if it already exists | boolean | false | false |
| tx | Whether to use transactions | boolean | false | false |
| txUseLog | Use the log in transactions. If WAL is disabled, you can still use non-reliable transactions by setting txUseLog=true. This is useful for grouping many operations into a batch, such as creating edges | boolean | false | |
| wal | Use WAL (Write-Ahead Logging). Disable WAL for better performance | boolean | false | true |
| batchCommit | With transactions enabled, commit every X entries. Use this to avoid holding one huge transaction in memory | integer | false | 0 |
| dbType | Database type, either 'graph' or 'document' | string | false | document |
| class | Name of the class used to store the new record | string | false | |
| cluster | Name of the cluster in which to store the new record | string | false | |
| classes | Creates classes if they are not already defined in the database. See the Classes syntax for more information | inner document | false | |
| indexes | Contains the indexes used in the ETL process. Before the process starts, any declared index not present in the database is created automatically. An index configuration must have "type", "class" and "fields". See the Indexes syntax for more information | inner document | false | |
| useLightweightEdges | Changes the default setting for the use of Lightweight Edges | boolean | false | false |
| standardElementConstraints | Changes the default setting for the TinkerPop Blueprints constraints: values cannot be null and 'id' cannot be used as a property name | boolean | false | true |
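
Because only dbURL is mandatory, a loader definition can rely on the defaults listed above. The sketch below assumes the loader is placed under a top-level "loader" key of the ETL configuration; the database path is a placeholder.

```json
{
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/temp/databases/mydb"
    }
  }
}
```

Going by the defaults in the table, this configuration would connect as admin/admin, create the database if it does not exist, treat it as a 'document' database, and write without transactions.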

Classes

| Parameter | Description | Type | Mandatory | Default value |
|---|---|---|---|---|
| name | Class name | string | true | |
| extends | Super class name | string | false | |
| clusters | Number of clusters to create under the class. Since 2.1 | integer | false | 1 |
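
As an illustration of the classes syntax, the fragment below declares one vertex class and one edge class for a 'graph' database. The class names and the database path are hypothetical; "V" and "E" are OrientDB's base vertex and edge classes, and the enclosing "loader" key is assumed from the overall ETL configuration format.

```json
{
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/temp/databases/mydb",
      "dbType": "graph",
      "classes": [
        { "name": "Person", "extends": "V", "clusters": 4 },
        { "name": "Friend", "extends": "E" }
      ]
    }
  }
}
```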

Indexes

| Parameter | Description | Type | Mandatory | Default value |
|---|---|---|---|---|
| name | Index name | string | false | |
| class | Name of the class on which to create the index | string | true | |
| type | Index type, chosen from the available ones | string | true | |
| fields | Array of field names to index. To specify the field type, use the syntax <field-name>:<field-type> | string | true | |
| metadata | Additional index metadata | string | false | |
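
The fragment below sketches an index declaration that uses the optional "name" parameter and the <field-name>:<field-type> syntax on more than one field. The index name, field names, and database path are hypothetical, and the enclosing "loader" key is assumed from the overall ETL configuration format.

```json
{
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/temp/databases/mydb",
      "dbType": "graph",
      "classes": [
        { "name": "Person", "extends": "V" }
      ],
      "indexes": [
        {
          "name": "personNameTown",
          "class": "Person",
          "type": "NOTUNIQUE",
          "fields": ["name:string", "town:string"]
        }
      ]
    }
  }
}
```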

Example

Below is an example configuration that loads data into an OrientDB database called "dbpedia", located in the directory "/temp/databases", opened with the "plocal" protocol and used as a "graph" database. The loading is transactional and commits every 1,000 inserts. Two classes are created if they are missing: "Person", which extends the base vertex class "V", and "Customer", which extends "Person" and uses 8 clusters. Two indexes are declared: a unique index on the string property "URI" of the "V" class, and a non-unique index on the string property "town" of the "Person" class.

"orientdb": {
  "dbURL": "plocal:/temp/databases/dbpedia",
  "dbUser": "importer",
  "dbPassword": "IMP",
  "dbAutoCreate": true,
  "tx": true,
  "batchCommit": 1000,
  "wal": false,
  "dbType": "graph",
  "classes": [
    {"name": "Person", "extends": "V"},
    {"name": "Customer", "extends": "Person", "clusters": 8}
  ],
  "indexes": [
    {"class": "V", "fields": ["URI:string"], "type": "UNIQUE"},
    {"class": "Person", "fields": ["town:string"], "type": "NOTUNIQUE",
      "metadata": {"ignoreNullValues": false}
    }
  ]
}
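
To show where a loader definition might sit in a complete ETL configuration, here is a hedged sketch of a small pipeline. The "source", "extractor", and "transformers" sections are not described in this section: their component names ("file", "csv", "vertex") are taken from the wider OrientDB ETL documentation and should be checked against the version in use. The file path, database path, and class name are placeholders.

```json
{
  "source": { "file": { "path": "/temp/people.csv" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "vertex": { "class": "Person" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/temp/databases/people",
      "dbType": "graph",
      "tx": true,
      "batchCommit": 1000,
      "classes": [
        { "name": "Person", "extends": "V" }
      ]
    }
  }
}
```

The "vertex" transformer is used here because the OrientDB loader accepts OrientVertex input, which matches a "dbType" of "graph".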