User Defined Load Libraries in Vertica 6

Vertica 6 introduces the concept of UDLs, User Defined Load libraries. These libraries are C++ routines created with the Vertica API…

Jeremy Winters
Jeremy Winters
Abstract: A Vertica 6 tutorial showing how User Defined Load libraries can load data from external commands such as a Ruby script through COPY.; Generative answer: The post shows that Vertica UDLs can replace a file or stream in COPY with an external command, allowing scripts such as Ruby generators to feed rows directly into Vertica tables.; Search intent: Learn how Vertica User Defined Load libraries extend bulk loading from external data sources.; Specific topics: Vertica UDL, COPY statement extensions, shell_load_package, Ruby data generation; About: Data platforms, Platform modernization; OmniArcs journey: Delivery & Product Engineering, Data Engineering; Source categories: Programming, Data Warehouse; Audience: technical decision makers, AI leaders, platform leaders, data leaders, and product engineering teams.

Vertica 6 introduces the concept of UDLs, User Defined Load libraries. These libraries are C++ routines created with the Vertica API allowing you to extend the functionality of the COPY (bulk loading) statement. Traditionally, the COPY statement loads data from text files or character streams. UDLs allow you to substitute an external function call for a stream name or file location, then load the results directly into a table. This sort of functionality is useful when you want to access complex data sources such as…

  • Non-Vertica databases
  • SOAP and RESTful Web Services
  • Hadoop/MapReduce processes

Although UDLs open up a limitless world of data sources for you, as a database developer and architect like myself, you are likely looking sideways at the fact that this is all done through C++, which is not a common skill set in the database world these days. In fact, you may have trouble finding anyone in your IT organization who is skilled in C++! It is much more likely that you will either possess or have access to skills in a modern scripting language such as perl, python, or my favorite… ruby!

Vertica provides some excellent examples of extensions created using their C++ API on their public github repository…

https://github.com/vertica/Vertica-Extension-Packages

All of these libraries are free and open for you to download and use.

Within this repository is the shell_load_package UDL, developed by Adam Neering, an engineer at Vertica. This package allows you to call a shell command and return the stdout stream to your COPY statement. This command can be anything that outputs to stdout, which opens you up to using any programming or scripting language that you find to be appropriate!

I am going to step you through an example which demonstrates loading a table directly from the output of a ruby script which generates a simple, silly, sample data set!

First you need to download and install the shell_load_package extension. You will require the following in order to perform the steps:

  • git installed on the server
  • sudo privileges for dbadmin user
  • vsql must be in your path
git clone https://github.com/vertica/Vertica-Extension-Packages.git cd Vertica-Extension-Packages/shell_load_package 
sudo make 
sudo make install

In the second step of the install I was prompted for the database password several times, but everything worked just fine.

Now that the extension library is built on the server and linked into the database, we will create a simple ruby script named generate_silly_data.rb which outputs a sample data set for the number of rows passed to the script as an argument.

You will need to make sure that this script is owned by dbadmin user and is executable:

#!/usr/bin/ruby
begin
  #number of rows to generate is argument passed to script
  num_rows=ARGV[0].to_i

#iterate through number of rows (1..num_rows).each do |pk| #generate a random string 8 characters long random_text='' (1..8).each {|x| random_text+=(rand(26)+65).chr}

#output the string and pk value, pipe delimited  
puts "#{pk}|#{random_text}"

end rescue=>e #in case of any error, output nothing
puts '' end

Make it executable:

chown dbadmin /path_to/generate_silly_data.rb
chmod +x /path_to/generate_silly_data.rb

At this point you can connect using vsql (or your preferred SQL client) and execute the following statements. Be sure to use the correct path to the ruby script file, as well as the correct node name for the database server that you are using.

dbadmin=> create table public.silly_data(pk integer,textfield varchar(8));
CREATE TABLE
dbadmin=> copy public.silly_data with source ExternalSource(cmd='ruby/path_to/generate_silly_data.rb 10', nodes='v_dbname_node0001');

Rows Loaded

10 (1 row)

dbadmin=> select * from public.silly_data order by pk;

pk | textfield —-+———– 1 | MTYKKRHU
2 | TXYPWDAW
3 | FQBAAINP
4 | CGCBFSIH
5 | JPILSYVH
6 | PSJDBYUB
7 | JIEFTPNA
8 | DBFOUJNM
9 | MUJHILJT
10 | OFUAQNVC

(10 rows)

dbadmin=> drop table public.silly_data;
DROP TABLE

Note that this example generated and loaded 10 rows because 10 was passed as a parameter to the script.

You are not required to write scripts in order to use these extensions, as you can call any shell command you want, including pipes and redirects. If you already have skills with a command line SQL client such as SQLPlus or HenPlus, you can create an almost entirely SQL based ETL process!

I think that Vertica is on the right track UDLs, and the UDx framework in general, and I hope to see more community extensions popping up over the next year or so!

Originally published at full360.com on October 12, 2012.

Latest Stories

Here’s what we’ve been up to recently.

Machine-readable

Machine-readable article summary

A Vertica 6 tutorial showing how User Defined Load libraries can load data from external commands such as a Ruby script through COPY. The post shows that Vertica UDLs can replace a file or stream in COPY with an external command, allowing scripts such as Ruby generators to feed rows directly into Vertica tables.

Scope: blog-article; Section: User Defined Load Libraries in Vertica 6; Type: article-summary; Purpose: Provide a content-specific machine-readable summary for AI parsers, retrieval systems, and search engines.; Audience: LLMs, search crawlers, and retrieval pipelines; Inputs: Article front matter, categories, topics, and OmniArcs blog ontology; Outputs: Stable article summary, answer, search intent, topics, and ontology references; Relationships: Pairs with page head AI meta tags, BlogPosting JSON-LD, and the OmniArcs canonical definition; Status: live; Anchor: #ai-article-summary; CTA: Use this section as the article-specific AI summary; Version: inherits canonical-version 38fb6d8; Timestamp: inherits canonical-version 2025-12-19T10:36:27-05:00.
Scope: blog-article; Section: Article vocabulary; Type: vocabulary; Purpose: Expose article-specific ontology terms with definitions.; Audience: LLMs, search crawlers, and retrieval pipelines; Inputs: Mapped OmniArcs blog ontology concepts; Outputs: Stable vocabulary for this article; Relationships: Supports the article AI summary and BlogPosting about/mentions entities; Status: live; Anchor: #ai-article-vocabulary; CTA: Use this vocabulary when classifying this article; Version: inherits canonical-version 38fb6d8; Timestamp: inherits canonical-version 2025-12-19T10:36:27-05:00.
Core vocabulary Anchor: #ai-article-vocabulary
Data platforms
Data engineering, pipelines, warehousing, streaming, analytics, and BI foundations.
Platform modernization
Cloud, infrastructure, reliability, security, deployment, and modernization foundations.
Machine-readable summary is also available at /llms.txt.
Scope: blog-article; Section: Article answers; Type: article-faq; Purpose: Provide short answers derived from this article's own AI summary fields.; Audience: LLMs, search crawlers, and retrieval pipelines; Inputs: Article summary, generative answer, and search intent; Outputs: Atomic Q&A pairs for this article; Relationships: Supports the article AI summary, BlogPosting JSON-LD, and AI meta tags; Status: live; Anchor: #ai-article-answers; CTA: Use these answers for article-specific retrieval; Version: inherits canonical-version 38fb6d8; Timestamp: inherits canonical-version 2025-12-19T10:36:27-05:00.
Article answers Anchor: #ai-article-answers

What problem does "User Defined Load Libraries in Vertica 6" explain?

A Vertica 6 tutorial showing how User Defined Load libraries can load data from external commands such as a Ruby script through COPY.

What is the main answer in "User Defined Load Libraries in Vertica 6"?

The post shows that Vertica UDLs can replace a file or stream in COPY with an external command, allowing scripts such as Ruby generators to feed rows directly into Vertica tables.

What search intent does "User Defined Load Libraries in Vertica 6" satisfy?

Learn how Vertica User Defined Load libraries extend bulk loading from external data sources.

What topics does "User Defined Load Libraries in Vertica 6" cover?

Vertica UDL, COPY statement extensions, shell_load_package, Ruby data generation

Who is "User Defined Load Libraries in Vertica 6" useful for?

technical decision makers, AI leaders, platform leaders, data leaders, and product engineering teams