by Yohanes Ricky Andika Pradana, Analyst Programmer at Mitrais

Create Activity Feeds in PostgreSQL 9.5

As the implemented database in a project, PostgreSQL has many advantages: It is not associated with licensing costs for the software, has better reliability and stability, and is extensible and designed for high volume environments.

Within the PostgreSQL 9.5 version, there is one major feature called UPSERT to merge identical records, while inserting data. To give a better explanation about this feature, here is an example using the creation of activity feeds.
Let’s start by creating tables. First

create table in PostregSQL

Now we have these test users to work with:

id 

name 

Ricky 

Richard 

Richmond 

 

The next step is the the activities table: 

activities table in PostregSQL

Below is the explanation about columns in the activity table: 

  • Key - a string value containing the type of activity performed. 
  • Actor_ids - an array of users’ ids which performed the action. 
  • Subject_id / subject_type - polymorphic columns to associate the activity with a subject. For example, commenting on a post will result in an activity associated with Post subject. 
  • Recipient_id - id of the user receiving the activity in his feed. This is used for unique grouping activities.  

Our concern is to merge activities with the same key, subject and recipient. So, for example when two people comment on the same post, we want to display one activity that says: 

Richard and Ricky commented on your post 

rather than: 

Richard commented on your post 

Ricky commented on your post 

To achieve that, there should be a method to detect identical activities during an INSERT. So, a unique constraint needs to be added:

adding unique constraint

PostgreSQL automatically creates a unique index for it, rather than inserting it manually. 

After that, let's create the first activity. 

creating first activity

It will give a result with one row: 

id 

key 

actor_ids 

subject_id 

subject_type 

recipient_id 

post.commented 

{1} 

Post 

 

What happens if different users create the same subject activities? There will be duplicates only differing by actor_ids. Since there is no need to keep duplicating activities, the solution is by merging and combining actors into an array. To achieve that, the insert command should be modified: 

modify insert command

Note: updated_at = EXCLUDED.created_at takes care of “bumping” the activity to the top of the feed.

And the result: 

id 

key 

actor_ids 

subject_id 

subject_type 

recipient_id 

post.commented 

{2,1} 

Post 

 

After the modification, there will be one record, and actor_ids now include the new actor_id. But what happens if a user with id = 1 comments on the post again? The second time we run the same query we are going to see this: 

id 

key 

actor_ids 

subject_id 

subject_type 

recipient_id 

post.commented 

{1,2,1} 

Post 

 

To avoid this, the concatenation operator (||) will be required to combine both arrays, but unfortunately, this approach does not guarantee uniqueness. 

The goal is to: 

1.       Append the latest actor_id to the front of the existing array.

2.       Remove previous occurrences from said array to make it unique. 

The first goal is already achieved, while the second goal proved to be more complicated. The reason for this is: PostgreSQL does not come with built-in functions to de-duplicate arrays. 

There is a unique function inside the intarray module, but it has some cons: 

  • It only works with integer arrays, so it won't work with UUIDs 

  • It can only remove consecutive duplicates from an array: 

remove consecutive duplicates

In the previous version of PostgreSQL 9.4, there is a feature called UNNEST WITH ORDINALITY. This feature could expand arrays into rows, but also preserve their original order. Here’s the modified function using the feature: 

modify function on unnest with ordinality feature

It removes elements of one array from the other while preserving the order. After that, let's use UPSERT in the command:

use upsert in command

The result will be exactly as the expected one, user with id=2 is now first, while all the older duplicates are gone:

id 

key 

actor_ids 

subject_id 

subject_type 

recipient_id 

post.commented 

{2,1} 

Post 

 

Creating activities and merging them have been done, the remaining task is to display: 

Michael and Piotr commented on your post 

instead of: 

2 and 1 commented on your post 

It is not possible to simply join the users within the table using actor_ids because the result will lose order. To achieve that, let's utilize WITH ORDINALITY feature. First, the array of ids should be unnest. Next, JOIN users and put them in order:

join users

It will produce: 

id 

name 

Richard 

Ricky 

 

And match the order of ids in the input array. The next step is to wrap the result into a function that returns an array of user names:

wrap result

and the usage: 

usage

actor_names_in_order 

{Richard, Ricky} 

 

The last thing to do is to use the function in a regular SELECT query to fetch grouped activities: 

fetch grouped activities

The result will be: 

id 

key 

actor_names 

actor_ids 

subject_id 

subject_type 

recipient_id 

post_commented 

{Richard, Ricky} 

{2,1} 

Post 

 

By utilizing UPSERT, the query became much simpler while it still maintains the ability to count, paginate and filter down. This is something that could not be achieved easily using window functions and grouping. Furthermore, wrapping INSERT in a separate function is a good idea. However, make sure to add indexes to the columns, which might be filtered.