PostgreSQL provide powerful operator and methods to process JSON data. I have taken mock data to explore different operators and methods. Before proceed make sure you have installed PostgreSQL 9.4+ and psql.
I have created mock date with help of Faker gem. Mock data available at https://gist.githubusercontent.com/sivakumarbdu/5c6838015f1ad9efc1959b015d939587/raw/71c0810ffd7dc9e41bc01be8c8ee76f99b6d3dc2/person-data.sql
JSON data will be look like this
{"0": {"name":"Kiera Halvorson","email":"keegan.bashirian@legros.name","interest":["memes","agriculture"],"age":25}, "1" :{}, ...}
In this example, “0” is the Id of a person and it has related information as values. This JSON has information about persons interest. It contains two interest for a person. Lets call it as primary and secondary interest. They might be same (Because generated by Random)
Date generated using ruby script with help of Faker Gem. Ruby script for generating mock data is available here https://gist.github.com/sivakumarbdu/591f6faa9fe55f2d1c4744703b77c193
We have mock data of 10000 users and their interest and age. We are going to frame the SQL query for following.
- Display all users’s details
- User who are interested it “DIY home”
- Users who are interested it “nature” and their age is less than 25
- User who has same primary and secondary interest
- User who are interested in “web design” and “ai”
Lets frame query for each. Before proceed lets create a database and table in PostgreSQL and insert our mock data.
Login to postgresql console and create database, table and mock data from file.
Download the mock data and save it on your desk. https://gist.githubusercontent.com/sivakumarbdu/5c6838015f1ad9efc1959b015d939587/raw/71c0810ffd7dc9e41bc01be8c8ee76f99b6d3dc2/person-data.sql
Lets create “person-data.json” at /home/$user/Documents/
CREATE DATABASE json_demo; CREATE TABLE person(data json)
Once done with above query exit from PostgreSQL and run the following command to insert data into person table. Below examples uses psql utility.
#insert data using psql client. psql -h $host-U $user -d json_demo -f person-data.sql
Now we are ready to query the data. Now we have only on row which contains 10000 person data in a JSON. Lets start with our queries.
- Display all users’s details
SELECT json_each_text(person.data) FROM person
Output will be like this. It contains two column key and value.
(0,"{""name"":""Kiera Halvorson"",""email"":""keegan.bashirian@legros.name"",""interest"":[""memes"",""agriculture""],""age"":25}") (1,"{""name"":""Golden Lynch"",""email"":""brandon.kautzer@boehm.org"",""interest"":[""DIY home"",""electronics""],""age"":43}") . . .
Now Lets take each persons interest. In the below query we are taking key(person’s Id) and navigating to next level to get person’s interest.
SELECT person_data.key ID, data -> person_data.key::text ->> 'interest' Interest FROM person, json_each_text(person.data) as person_data
Output will be
id |interest | ----|----------------------------------| 0 |["memes","agriculture"] | 1 |["DIY home","electronics"] | 2 |["garden design","DIY home"] | ...
Now lets take only primary interest of a person.
SELECT person_data.key ID, (data -> person_data.key::text ->> 'interest')::jsonb -> 0 Interest FROM person, json_each_text(person.data) as person_data
Output will be
id |interest | ----|----------------| 0 |"memes" | 1 |"DIY home" | 2 |"garden design" | . . . .
Note the selection option “(data -> person_data.key::text ->> ‘interest’)::jsonb -> 0”.
Similarly to display person’s secondary interest use the index 1
SELECT person_data.key ID, (data -> person_data.key::text ->> 'interest')::jsonb -> 1 Interest FROM person, json_each_text(person.data) as person_data
output will be
id |interest | ----|----------------| 0 |"agriculture" | 1 |"electronics" | 2 |"DIY home" | . . .
2. User who are interested it “DIY home”
SELECT (data -> person_data.key::text ->> 'name') as Name , (data -> person_data.key::text ->> 'email') as Email, (data -> person_data.key::text ->> 'interest') as Interest FROM person, json_each_text(person.data) as person_data where (data -> person_data.key::text ->> 'interest')::jsonb ? 'DIY home'
Above query will return person’s name, email and interest who are all interested in “DIY home”.
Output will be :
name |email |interest | -------------------------|-----------------------------------------|-----------------------------| Golden Lynch |brandon.kautzer@boehm.org |["DIY home","electronics"] | Alex Zemlak |braulio_kirlin@hane.biz |["garden design","DIY home"] |
3. Users who are interested it “nature” and their age is less than 25
SELECT (data -> person_data.key::text ->> 'name') as Name , (data -> person_data.key::text ->> 'age') as Age, (data -> person_data.key::text ->> 'interest') as Interest FROM person, json_each_text(person.data) as person_data WHERE (data -> person_data.key::text ->> 'age')::integer < 25 and (data -> person_data.key::text ->> 'interest')::jsonb ? 'nature'
4. User who has same primary and secondary interest
SELECT (data -> person_data.key::text ->> 'name') as Name , (data -> person_data.key::text ->> 'email') as Email, (data -> person_data.key::text ->> 'interest') as Interest FROM person, json_each_text(person.data) as person_data where (data -> person_data.key::text ->> 'interest')::jsonb -> 0 = (data -> person_data.key::text ->> 'interest')::jsonb -> 1
I am comparing the Primary interest with Primary and Secondary interest.
Primary interest (data -> person_data.key::text ->> ‘interest’)::jsonb -> 0
Secondary interest (data -> person_data.key::text ->> ‘interest’)::jsonb -> 1
5. User who are interested in “web design” and “ai”
SELECT (data -> person_data.key::text ->> 'name') as Name , (data -> person_data.key::text ->> 'email') as Email, (data -> person_data.key::text ->> 'interest') as Interest FROM person, json_each_text(person.data) as person_data where (data -> person_data.key::text ->> 'interest')::jsonb ? 'web design' and (data -> person_data.key::text ->> 'interest')::jsonb ? 'ai'
Optimization for above queries are welcome. 🙂