DuckDB 入門
これは何
- DuckDB 入門メモ
DuckDB とは
- DuckDB はオープンソースの DBMS
- SQLite っぽい感じだが OLAP 向けに設計されている
DuckDB – An in-process SQL OLAP database management system
https://duckdb.org/
duckdb/duckdb: DuckDB is an analytical in-process SQL database management system
https://github.com/duckdb/duckdb
とりあえず動きが見たい
- web ブラウザで動作する Demo がある
DuckDB Shell
https://shell.duckdb.org/
インストール
brew でインストール可能
brew install duckdb
DuckDB Installation – DuckDB
https://duckdb.org/docs/installation/
% duckdb -help
Usage: duckdb [OPTIONS] FILENAME [SQL]
FILENAME is the name of an DuckDB database. A new database is created
if the file does not previously exist.
OPTIONS include:
-append append the database to the end of the file
-ascii set output mode to 'ascii'
-bail stop after hitting an error
-batch force batch I/O
-box set output mode to 'box'
-column set output mode to 'column'
-cmd COMMAND run "COMMAND" before reading stdin
-c COMMAND run "COMMAND" and exit
-csv set output mode to 'csv'
-echo print commands before execution
-init FILENAME read/process named file
-[no]header turn headers on or off
-help show this message
-html set output mode to HTML
-interactive force interactive I/O
-json set output mode to 'json'
-line set output mode to 'line'
-list set output mode to 'list'
-markdown set output mode to 'markdown'
-newline SEP set output row separator. Default: '\n'
-nofollow refuse to open symbolic links to database files
-no-stdin exit after processing options instead of reading stdin
-nullvalue TEXT set text string for NULL values. Default ''
-quote set output mode to 'quote'
-readonly open the database read-only
-s COMMAND run "COMMAND" and exit
-separator SEP set output column separator. Default: '|'
-stats print memory stats before each finalize
-table set output mode to 'table'
-unredacted allow printing unredacted secrets
-unsigned allow loading of unsigned extensions
-version show DuckDB version
% duckdb
v1.1.1 af39bd0dcf
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D .help
.bail on|off Stop after hitting an error. Default OFF
.binary on|off Turn binary output on or off. Default OFF
.cd DIRECTORY Change the working directory to DIRECTORY
.changes on|off Show number of rows changed by SQL
.check GLOB Fail if output since .testcase does not match
.columns Column-wise rendering of query results
.constant ?COLOR? Sets the syntax highlighting color used for constant values
.constantcode ?CODE? Sets the syntax highlighting terminal code used for constant values
.databases List names and files of attached databases
.dump ?TABLE? Render database content as SQL
.echo on|off Turn command echo on or off
.excel Display the output of next command in spreadsheet
.edit Opens an external text editor to edit a query.
.exit ?CODE? Exit this program with return-code CODE
.explain ?on|off|auto? Change the EXPLAIN formatting mode. Default: auto
.fullschema ?--indent? Show schema and the content of sqlite_stat tables
.headers on|off Turn display of headers on or off
.help ?-all? ?PATTERN? Show help text for PATTERN
.highlight [on|off] Toggle syntax highlighting in the shell on/off
.import FILE TABLE Import data from FILE into TABLE
.indexes ?TABLE? Show names of indexes
.keyword ?COLOR? Sets the syntax highlighting color used for keywords
.keywordcode ?CODE? Sets the syntax highlighting terminal code used for keywords
.log FILE|off Turn logging on or off. FILE can be stderr/stdout
.maxrows COUNT Sets the maximum number of rows for display (default: 40). Only for duckbox mode.
.maxwidth COUNT Sets the maximum width in characters. 0 defaults to terminal width. Only for duckbox mode.
.mode MODE ?TABLE? Set output mode
.nullvalue STRING Use STRING in place of NULL values
.once ?OPTIONS? ?FILE? Output for the next SQL command only to FILE
.open ?OPTIONS? ?FILE? Close existing database and reopen FILE
.output ?FILE? Send output to FILE or stdout if FILE is omitted
.parameter CMD ... Manage SQL parameter bindings
.print STRING... Print literal STRING
.prompt MAIN CONTINUE Replace the standard prompts
.quit Exit this program
.read FILE Read input from FILE
.rows Row-wise rendering of query results (default)
.schema ?PATTERN? Show the CREATE statements matching PATTERN
.separator COL ?ROW? Change the column and row separators
.sha3sum ... Compute a SHA3 hash of database content
.shell CMD ARGS... Run CMD ARGS... in a system shell
.show Show the current values for various settings
.system CMD ARGS... Run CMD ARGS... in a system shell
.tables ?TABLE? List names of tables matching LIKE pattern TABLE
.testcase NAME Begin redirecting output to 'testcase-out.txt'
.timer on|off Turn SQL timer on or off
.width NUM1 NUM2 ... Set minimum column widths for columnar output
JSON ファイルをインポートする
- DuckDB は CSV や JSON ファイルをロードしてよしなに使うことが出来る
- JSON ファイルのドキュメントを読みつつ、動作を確認していく
JSON Loading – DuckDB
https://duckdb.org/docs/data/json/overview
データを用意
todos.json として、以下のようなデータを用意する
[
{
"userId": 1,
"id": 1,
"title": "Buy groceries",
"completed": false
},
{
"userId": 1,
"id": 2,
"title": "Attend meeting",
"completed": true
},
{
"userId": 2,
"id": 3,
"title": "Write report",
"completed": false
},
{
"userId": 2,
"id": 4,
"title": "Plan trip",
"completed": true
}
]
read_json
関数を使用する
duckdb で read_json
関数を使用する
SELECT *
FROM read_json('todos.json',
format = 'array',
columns = {userId: 'UBIGINT',
id: 'UBIGINT',
title: 'VARCHAR',
completed: 'BOOLEAN'});
実行結果の例
% duckdb
v1.1.1 af39bd0dcf
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D SELECT *
FROM read_json('todos.json',
format = 'array',
columns = {userId: 'UBIGINT',
id: 'UBIGINT',
title: 'VARCHAR',
completed: 'BOOLEAN'});
┌────────┬────────┬────────────────┬───────────┐
│ userId │ id │ title │ completed │
│ uint64 │ uint64 │ varchar │ boolean │
├────────┼────────┼────────────────┼───────────┤
│ 1 │ 1 │ Buy groceries │ false │
│ 1 │ 2 │ Attend meeting │ true │
│ 2 │ 3 │ Write report │ false │
│ 2 │ 4 │ Plan trip │ true │
└────────┴────────┴────────────────┴───────────┘
D
stdin でファイルを入力する使用する方法
% cat ./todos.json | duckdb -c "SELECT * FROM read_json('/dev/stdin')"
┌────────┬───────┬────────────────┬───────────┐
│ userId │ id │ title │ completed │
│ int64 │ int64 │ varchar │ boolean │
├────────┼───────┼────────────────┼───────────┤
│ 1 │ 1 │ Buy groceries │ false │
│ 1 │ 2 │ Attend meeting │ true │
│ 2 │ 3 │ Write report │ false │
│ 2 │ 4 │ Plan trip │ true │
└────────┴───────┴────────────────┴───────────┘
JSON からテーブルを作成する
JSON ファイルを COPY してテーブルとする
CREATE TABLE todos (userId UBIGINT, id UBIGINT, title VARCHAR, completed BOOLEAN);
COPY todos FROM 'todos.json';
SELECT で JSON ファイルを読み取ってテーブルとする
CREATE TABLE todos AS
SELECT * FROM 'todos.json';
JSON ファイルに SQL でクエリした結果をテーブルとする
COPY (SELECT * FROM todos) TO 'todos.json';
テーブル情報を確認する
.tables
でテーブルリストを確認
D .tables
todos
DESCRIBE
, SHOW
でテーブルスキーマを確認
D DESCRIBE todos;
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ userId │ BIGINT │ YES │ │ │ │
│ id │ BIGINT │ YES │ │ │ │
│ title │ VARCHAR │ YES │ │ │ │
│ completed │ BOOLEAN │ YES │ │ │ │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘
D SHOW todos;
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ userId │ BIGINT │ YES │ │ │ │
│ id │ BIGINT │ YES │ │ │ │
│ title │ VARCHAR │ YES │ │ │ │
│ completed │ BOOLEAN │ YES │ │ │ │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘
S3 Object として保存してある JSON をロードする
# AWS クレデンシャルのロード
D CREATE SECRET mysecret (
TYPE S3,
PROVIDER CREDENTIAL_CHAIN
);
# JSON をロード
D select * from read_json_auto('s3://mybucket/myobject.json');
AWS Extension – DuckDB
https://duckdb.org/docs/extensions/aws.html
S3 API Support – DuckDB
https://duckdb.org/docs/extensions/httpfs/s3api
感想
- サクッと動かせてよかった
- DuckDB は JSON に限らず様々なデータをロードできるため、ちょっとしたデータ分析を SQL で行いたい場合に便利に使えそう
- 個人的には AWS CLI などのツールで巨大な JSON を調べることが多いので、そういう分析に便利に使えたら良さそうだなと思っている
Guides – DuckDB
https://duckdb.org/docs/guides/overview
試した環境
% sw_vers
ProductName: macOS
ProductVersion: 14.7
BuildVersion: 23H124
% duckdb -version
v1.1.1 af39bd0dcf