Nomic DSL Guide
In this guide I explain the main concepts of the Nomic DSL and show how to write nomic.box
descriptor files. We use Groovy as the host language and built a declarative DSL on top of it.
A descriptor is a declarative way to tell the application what can be deployed and removed.
The core of a descriptor file is a collection of items called Facts. Each fact can be installed
and reverted. A minimal descriptor script contains 3 required facts: the box name,
the box group and the version.
group = "app"
name = "some-box"
version = "1.0.0"
Variables in descriptor
In your descriptor scripts, you can also use some useful global variables:
- group: the box group; it can be any string. You can also structure groups with the / character.
- name: the box name; it must be unique because it’s used for identification.
- version: the box version.
- user: the username Nomic uses for installing/uploading files into HDFS, configured in nomic.conf.
- homeDir: each user in HDFS might have their own home directory. It’s useful when you want to sandbox your applications/analyses.
- appDir: path to the directory in HDFS where applications are installed. The default value is ${homeDir}/app.
- nameNode: the hostname of the Name Node (the value of the fs.defaultFS parameter in your Hadoop configuration).
Each module (Hive, HDFS etc.) can also expose its own parameters:
- hiveSchema: the default Hive schema, configured via nomic.conf. Also useful if you want to sandbox your apps.
- hiveJdbcUrl: the value of hive.jdbc.url in nomic.conf, used by Hive facts.
- hiveUser: the value of hive.user in nomic.conf, used by Hive facts.
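As a sketch of how these variables might be combined, the descriptor below uses a structured group and interpolates user, name and appDir. It only reuses constructs that this guide shows in the Table and Schema sections; the group value mycompany/analytics and the _staging suffix are illustrative assumptions, not values Nomic requires.

```groovy
group = "mycompany/analytics"   // hypothetical group, structured with '/'
name = "some-box"
version = "1.0.0"

// Schema name composed from the global variables user and name.
hive("${user}_${name}_staging") {
    // Expose a path under appDir to the Hive script as a placeholder value.
    fields 'APP_DATA_DIR': "${appDir}/data"
    table 'authors' from "create_authors_table.q"
}
```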
Modules & dependencies
You can create an application box with multiple modules. This is especially useful for larger applications, where you need to organize your content. Modules also have a second use case: because facts inside a box know nothing about dependencies, you can solve dependency problems with modules as well.
Let’s consider our application ‘best-analytics’ with some resources and the following ./nomic.box:
group = "mycompany"
name = "best-analytics"
version = "1.0.0"
hdfs {
    ...
}
The box is built with the command:
$ jar cf best-analytics.nomic ./*
Let’s imagine we would like to split the content into two modules, kpi and rfm.
We will create 2 new folders, each with its own nomic.box, which will represent our
new modules.
The ./kpi/nomic.box:
group = "mycompany"
name = "kpi"
version = "1.0.0"
...
and the ./rfm/nomic.box:
group = "mycompany"
name = "rfm"
version = "1.0.0"
...
The final step is to declare these 2 new folders as modules in the main ./nomic.box:
group = "mycompany"
name = "best-analytics"
version = "1.0.0"
module 'kpi'
module 'rfm'
The module fact ensures that the main application box has 2 new dependencies,
which are installed before any resource in the main box. That means the installation
installs each module first and then best-analytics. When we install this
bundle, we should see 3 new boxes:
$ ./bin/nomic install best-analytics.nomic
$ ./bin/nomic list
mycompany:best-analytics:1.0.0
mycompany:kpi:1.0.0
mycompany:rfm:1.0.0
Removing best-analytics will also remove all modules in the right order.
Sometimes we also need to declare that our rfm module depends on kpi.
That can be achieved with the require fact. Let’s modify our ./rfm/nomic.box:
group = "mycompany"
name = "rfm"
version = "1.0.0"
require name: "kpi", group: this.group, version: this.version
Now the rfm module requires kpi, which means the kpi module will be
installed first.
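The install order that module and require imply can be modelled as a dependency-first walk. The following is a minimal plain-Groovy sketch, not Nomic’s actual implementation: the needs map and the installOrder helper are hypothetical names introduced for illustration, and cycle detection is left out.

```groovy
// Hypothetical model: box name -> boxes that must be installed before it.
def needs = [
    'best-analytics': ['kpi', 'rfm'],  // from the two module facts
    'rfm'           : ['kpi'],         // from the require fact
    'kpi'           : []
]

// Depth-first walk that emits a box only after everything it needs.
def installOrder(Map<String, List<String>> needs, String root) {
    def order = []
    def visit
    visit = { String box ->
        if (box in order) return            // already scheduled
        (needs[box] ?: []).each { visit(it) }
        order << box
    }
    visit(root)
    order
}

def order = installOrder(needs, 'best-analytics')
assert order == ['kpi', 'rfm', 'best-analytics']
```

In this model kpi comes first because both rfm and best-analytics need it, which matches the behaviour described for the require fact.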
Factions
You may have noticed there is no way to set the order in which facts are executed. The solution is factions. Factions are small blocks/groups of facts. Each faction has its own ID, unique within the box, and might depend on another faction.
Let’s imagine you want to ensure the resources are in place first and then create some Hive tables.
group = "mycompany"
name = "rfm"
version = "1.0.0"
faction ("resources") {
    resource 'file-1.csv'
    resource 'file-2.csv'
}
faction ("hivescripts", dependsOn = "resources") {
    table 'authors' from "create_authors_table.q"
}
Everything declared outside the faction blocks is treated as a global fact and is executed first. The factions are executed after all the global facts.
group = "mycompany"
name = "rfm"
version = "1.0.0"
faction ("resources") {
    resource 'file-2.csv'
}
resource "file-1.csv"
In this example, the file-1.csv fact will be applied first, even though it’s declared
after the faction.
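The rule "global facts first, then factions in dependency order" can be illustrated with a small plain-Groovy model. This is only a sketch under assumptions: executionOrder and the map layout (dependsOn, facts) are hypothetical, each faction depends on at most one other faction, and facts inside a faction are assumed to run in declaration order.

```groovy
// Hypothetical model of a box: global facts plus factions keyed by ID.
def executionOrder(List<String> globalFacts, Map<String, Map> factions) {
    def order = []
    order.addAll(globalFacts)               // global facts always run first
    def done = [] as Set
    def run
    run = { String id ->
        if (id in done) return
        def dep = factions[id].dependsOn    // null when the faction has no dependency
        if (dep) run(dep)
        order.addAll(factions[id].facts)
        done << id
    }
    factions.keySet().each { run(it) }
    order
}

def globals = ['resource file-1.csv']
def factions = [
    hivescripts: [dependsOn: 'resources', facts: ['table authors']],
    resources  : [facts: ['resource file-2.csv']]
]
assert executionOrder(globals, factions) ==
        ['resource file-1.csv', 'resource file-2.csv', 'table authors']
```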
Facts
Resource
The resource fact declares which resource from your box will be uploaded
and where in HDFS it goes. Let’s imagine we’ve got a box archive like:
/nomic.box
/some-file.xml
The descriptor below will install some-file.xml into the application’s
folder (its location depends on the configuration).
group = "app"
name = "some-box"
version = "1.0.0"
hdfs {
    resource 'some-file.xml'
}
With a small modification you can place any resource at any path. For example,
the following places a file under the root-level /app directory:
hdfs {
    resource 'some-file.xml' to '/app/workflow.xml'
}
If the path doesn’t start with the / character, the file will be placed into the working
directory, which is basically ${appDir}.
hdfs {
    resource 'some-file.xml' to 'workflows/some-workflow.xml'
}
The example above copies the content of some-file.xml to
${appDir}/workflows/some-workflow.xml.
You can also redefine the default working directory:
hdfs("/path/to/app") {
    resource 'some-file.xml'
}
The example above will install some-file.xml into /path/to/app/some-file.xml.
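The path rules for resource can be summarised in a few lines of plain Groovy. This is a sketch of the behaviour described in this section, not Nomic’s code: resolveTarget is a hypothetical helper, and /user/me/app stands in for whatever ${appDir} resolves to in your configuration.

```groovy
// Hypothetical helper: where a resource ends up in HDFS.
String resolveTarget(String workingDir, String resource, String to = null) {
    def target = to ?: resource             // without `to`, keep the resource's own name
    // Absolute targets are used as-is; relative ones land in the working directory.
    target.startsWith('/') ? target : workingDir + '/' + target
}

def appDir = '/user/me/app'                 // assumed value of ${appDir}
assert resolveTarget(appDir, 'some-file.xml') == '/user/me/app/some-file.xml'
assert resolveTarget(appDir, 'some-file.xml', '/app/workflow.xml') == '/app/workflow.xml'
assert resolveTarget(appDir, 'some-file.xml', 'workflows/some-workflow.xml') ==
        '/user/me/app/workflows/some-workflow.xml'
// hdfs("/path/to/app") { ... } simply swaps the working directory.
assert resolveTarget('/path/to/app', 'some-file.xml') == '/path/to/app/some-file.xml'
```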
As I mentioned, facts can be installed and uninstalled. In
the resource case, uninstall means the file will be removed. However, you can
mark a file by setting the keepIt property to true, and uninstall will
keep the file:
hdfs("/path/to/app") {
    resource 'some-file.xml' keepIt true
}
Dir
You can also declare the presence of a directory via the dir fact. The declaration
will create a new empty directory if it is not present yet.
hdfs {
    dir "data"
}
Because the path doesn’t start with the / character, the directory will be created in
the current working directory. This declaration also covers uninstalling, which
means the folder will be removed on uninstall or upgrade. If you wish to
keep it, you can use the keepIt parameter:
hdfs {
    dir "data" keepIt true
}
Table
You can also declare facts for Hive in the descriptor. You can declare tables and
schemas, and you can also ensure that Hive scripts are executed. Everything for
Hive must be wrapped in a hive block.
The following example shows how to create a simple table in the default schema you
have configured in nomic.conf:
group = "app"
name = "some-box"
version = "1.0.0"
hive {
    table 'authors' from "create_authors_table.q"
}
In your box, you need to have the Hive query file create_authors_table.q
that will create the table if it’s not present in the system:
CREATE EXTERNAL TABLE authors(
    NAME STRING,
    SURNAME STRING
)
STORED AS PARQUET
LOCATION '/data/authors';
In your Hive scripts you can use placeholders, which will be replaced with
values from the descriptor. The values are declared via fields. This is
sometimes useful when you want, for example, to place a table into a specific schema.
hive {
    fields 'APP_DATA_DIR': "${appDir}/data", 'DATABASE_SCHEMA': defaultSchema
    table 'authors' from "create_authors_table.q"
}
The create_authors_table.q script then uses these placeholders:
CREATE EXTERNAL TABLE ${DATABASE_SCHEMA}.authors(
    NAME STRING,
    SURNAME STRING
)
STORED AS PARQUET
LOCATION '${APP_DATA_DIR}/authors';
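To make the effect of fields concrete, here is a plain-Groovy model of placeholder substitution. How Nomic (or Hive) actually performs the replacement is not described in this guide, so treat this purely as an illustration: fillPlaceholders is a hypothetical helper, the field values are made up, and leaving unknown placeholders untouched is an assumption.

```groovy
// Hypothetical helper: replace ${KEY} in a script with values from a fields map.
String fillPlaceholders(String script, Map<String, String> fields) {
    // Single-quoted regex, so Groovy does not try to interpolate ${...} itself.
    script.replaceAll('\\$\\{(\\w+)\\}') { String all, String key ->
        fields.containsKey(key) ? fields[key] : all
    }
}

def script = 'CREATE EXTERNAL TABLE ${DATABASE_SCHEMA}.authors LOCATION \'${APP_DATA_DIR}/authors\';'
def fields = ['APP_DATA_DIR': '/user/me/app/data', 'DATABASE_SCHEMA': 'me_rfm']

assert fillPlaceholders(script, fields) ==
        "CREATE EXTERNAL TABLE me_rfm.authors LOCATION '/user/me/app/data/authors';"
```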
Schema
This fact creates a Hive schema during installation and drops it during uninstallation. It is useful if you want to declare multiple schemas or if you don’t want to rely on the default schema.
hive {
    schema 'my_schema'
}
As I mentioned, the example above will drop the schema during the uninstall process,
which also happens during an upgrade. If you want to prevent this, you can mark
the schema with keepIt.
hive {
    schema 'my_schema' keepIt true
}
You can also declare a schema on the hive block itself. In this case, the schema will
be used as the default schema across all facts inside the hive block. You can also
have multiple blocks. The example below demonstrates a more complex usage of schemas.
hive("${user}_${name}_staging") {
    table 'some_table' from 'some_script.q'
}
hive("${user}_${name}_processing") {
    fields 'DATABASE_SCHEMA': "${user}_${name}_processing"
    table 'some_table' from 'some_script.q'
}
hive("${user}_${name}_archive") {
    table 'some_table' from 'some_script.q'
}
This descriptor script will ensure 3 schemas, where each schema name is
composed of the user name, the box name and a suffix. As you can
see, each block might have its own fields declaration.
Coordinator
Nomic also integrates with Oozie. You can declare an Oozie coordinator,
which acts similarly to resource but also submits the coordinator with parameters.
This fact also ensures the coordinator will be stopped on removal.
Let’s assume we’ve got a simple coordinator available as coordinator.xml in our
box. In the descriptor file we will declare:
group = "examples"
name = "oozieapp"
version = "1.0.0"
oozie {
    coordinator "coordinator.xml" parameters SOME_PARAMETER: "value 1", "another.parameter": "value 2"
}
This example copies the XML into the application folder in HDFS and submits a
coordinator job with the given parameters, such as SOME_PARAMETER, together with the
following pre-filled parameters:
| name | value |
|---|---|
| user.name | The user from the Nomic configuration (e.g. me) |
| nameNode | The nameNode URL (e.g. hdfs://server:8020) |
| jobTracker | Job tracker hostname and port from the configuration (e.g. server:8032) |
| oozie.coord.application.path | Path to the coordinator XML in HDFS (e.g. /app/examples/oozieapp/coordinator.xml) |
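For the oozieapp example, the resulting set of job parameters can be pictured as the pre-filled entries plus the declared ones. The plain-Groovy sketch below only models that merge; coordinatorProperties and the conf keys (user, nameNode, jobTracker) are hypothetical names, and what happens if a declared parameter collides with a pre-filled one is not specified in this guide, so the example avoids that case.

```groovy
// Hypothetical model of the parameters submitted with the coordinator.
Map<String, String> coordinatorProperties(Map<String, String> conf,
                                          String xmlPath,
                                          Map<String, String> declared) {
    def prefilled = [
        'user.name'                   : conf.user,
        'nameNode'                    : conf.nameNode,
        'jobTracker'                  : conf.jobTracker,
        'oozie.coord.application.path': xmlPath
    ]
    // Assumption: declared parameters are simply added to the pre-filled ones.
    prefilled + declared
}

def props = coordinatorProperties(
    [user: 'me', nameNode: 'hdfs://server:8020', jobTracker: 'server:8032'],
    '/app/examples/oozieapp/coordinator.xml',
    [SOME_PARAMETER: 'value 1', 'another.parameter': 'value 2']
)
assert props.size() == 6
assert props['user.name'] == 'me'
assert props['another.parameter'] == 'value 2'
```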