Remote Topic Views – Beyond Fan-out

Diffusion™ has long had the ability to replicate topics from an upstream (or primary) server to one or more secondary servers using the fan-out feature.

Fan-out is configured on the secondary server, enabling selected topics to be replicated from the primary server to the secondary. As those topics update on the primary server, the updates are fanned out to the secondary.

Multiple secondary servers can be used to provide scalability. Replicating the same topics to many secondary servers requires copying the fan-out configuration files to each secondary server.

Diffusion 6.3 introduced topic views, a mechanism to produce virtual topics (known as reference topics) that refer to other topics in the topic tree for their value. In 6.4, we added topic view expansion, enabling a single source topic to be expanded to produce many derived reference topics.

Now in Diffusion 6.5, the concepts of fan-out and topic views have been combined to make a new and powerful feature: remote topic views.

What is a ‘remote topic view’?

A simple topic view maps one branch of the local topic tree to another using a specification as follows:

map ?a/ to b/<path(1)>

This specification maps all source topics under the topic node a to matching topics under the node b. For example, a/x/y/z is mapped to b/x/y/z. The directive path(1) means “all path elements from index 1 onwards”. Path elements are indexed from 0, so for a/x/y/z it yields x/y/z.
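The effect of the path directive can be made concrete with a small standalone model. This is a hypothetical sketch, not Diffusion code: the class PathDirectiveSketch and its methods are illustrative names, and it only models how <path(n)> rebuilds a path from the elements of an already-selected source topic. Topic selection itself is not modeled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the <path(n)> directive; NOT the Diffusion implementation.
// It assumes the source topic has already been selected (e.g. by ?a/).
final class PathDirectiveSketch {

    // Returns the path elements of sourcePath from index n onwards, joined with "/".
    // Elements are indexed from 0, so path("a/x/y/z", 1) is "x/y/z".
    static String path(String sourcePath, int n) {
        List<String> elements = Arrays.asList(sourcePath.split("/"));
        if (n >= elements.size()) {
            return "";
        }
        return String.join("/", elements.subList(n, elements.size()));
    }

    // Applies a "to <prefix><path(n)>" template to one selected source topic path.
    static String mapTopic(String sourcePath, String prefix, int n) {
        return prefix + path(sourcePath, n);
    }
}
```

For example, mapTopic("a/x/y/z", "b/", 1) yields b/x/y/z, while path("a/x/y/z", 0) returns the source path unchanged, which is why <path(0)> gives a one-to-one mapping.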

Diffusion 6.5 adds a new from clause to the topic view specification, which indicates that the source topics to be mapped are hosted on another server. The format is as follows:

map ?a/ from Server1 to b/<path(1)>

In the above example, from Server1 indicates that the selected source topics are not on the local server but on a server named Server1. But what defines Server1? For this, a new configurable component called a remote server has been introduced.

Defining remote servers

A remote server definition can be created and maintained using client APIs in the same way as a topic view. Like a topic view, a remote server definition is persisted and replicated across the cluster. So if you create a remote server definition, it will exist and persist on every server in the cluster until it is explicitly removed using the client API.

A remote server is defined in the secondary server cluster and specifies how a connection to the primary server should be made: the primary server URL, and the principal and credentials needed to connect. It can also specify connection parameters to tune the behavior of the connection.

Remote server definitions are created and maintained using the new remote servers API feature. The following example shows how a remote server can be defined using the Java Client API:

RemoteServers remoteServers = session.feature(RemoteServers.class);
RemoteServer server = 
    remoteServers.createRemoteServer(
            "Server1",
            "ws://host:8080",
            "principalX",
            Diffusion.credentials().password("password")).get(5, SECONDS);

This defines a remote server called Server1 that can be connected to at the URL ws://host:8080 using the security principal principalX with the password password.

Creating such a definition will cause it to be persisted to a file on the server it is created on and distributed to all other servers in the cluster. However, no actual connection is made to the primary server until a topic view is created which specifies Server1 in its from clause.

When creating a remote server, you can also specify connection options to tune the behavior of the remote server connection. These options are much like those for configuring a fan-out client, setting properties such as transport buffer sizes, recovery buffer size, and reconnection/retry parameters. The defaults suffice in most cases. In the example above, the returned RemoteServer object reveals all of the options chosen for the connection definition.

The connection options can be specified in a map. For example, you can specify a non-default connection timeout like this:

RemoteServer server =
    remoteServers.createRemoteServer(
            "Server1",
            "ws://host:8080",
            "principal",
            Diffusion.credentials().password("password"),
            Collections.singletonMap(ConnectionOption.CONNECTION_TIMEOUT, "123456")).get(5, SECONDS);

As mentioned above, a remote server definition is persistent, and so if the server is restarted the remote server is restored. Such definitions can be removed at any time using the API as follows:

remoteServers.removeRemoteServer("Server1").get(5, SECONDS);

All of the remote servers that have been defined may be listed as follows:

List<RemoteServer> servers = remoteServers.listRemoteServers().get(5, SECONDS);

To create and remove remote server definitions, the client session needs the CONTROL_SERVER global permission; to list them, it needs the VIEW_SERVER global permission.

You can also create and remove remote server definitions using Diffusion’s web-based management console.

A remote server connection is established and kept open for as long as topic views use it. If the connection is lost for any reason, normal session reconnection can take place. If reconnection fails, the connection is retried periodically, according to the connection option configuration.
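The retry behavior can be pictured as a simple loop. This is a hypothetical sketch, not how Diffusion implements reconnection: the class name, the attemptConnect callback, and the fixed retryDelayMillis value standing in for the retry-related connection option are all assumptions. Sleeping is injected as a callback so the loop can be exercised without real delays.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

// Hypothetical sketch of "keep retrying periodically"; not Diffusion code.
// retryDelayMillis is an assumed stand-in for a retry-delay connection option.
final class PeriodicRetrySketch {

    // Attempts to connect until an attempt succeeds, waiting a fixed delay
    // between failed attempts. Returns the number of attempts made.
    static int connectWithRetry(BooleanSupplier attemptConnect,
                                long retryDelayMillis,
                                LongConsumer sleeper) {
        int attempts = 0;
        while (true) {
            attempts++;
            if (attemptConnect.getAsBoolean()) {
                return attempts;
            }
            // Wait before the next attempt (injected so tests need not sleep).
            sleeper.accept(retryDelayMillis);
        }
    }
}
```

In a real program the sleeper could be a lambda around Thread.sleep; a production client would also stop retrying once no topic view needs the connection any more.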

Topic views and remote servers

If a remote topic view is created but the remote server specified does not exist, the topic view will remain inactive until the remote server is created and can successfully establish a connection.

Any number of topic views can specify the same named remote server. All of them will use the same connection to map topics (as specified in the view selector) from the primary server to reference topics on the local secondary server, according to the mapping defined by the topic view specification.

If a remote server definition is removed while one or more topic views refer to it, the reference topics created by those views are removed and the views become inactive. If the remote server is recreated, those views reactivate. This means that network connections to remote servers are only in place when there are topic views that require them.
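These lifecycle rules can be summarized in a small model. This is a hypothetical sketch that tracks definitions only and is not Diffusion's internal design; the class and method names are illustrative. It captures three points from the text: no connection is needed until a view names the server, all views naming the same server share one connection, and removing the server definition deactivates its views until it is recreated.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of remote server / remote topic view lifecycle rules.
final class RemoteViewLifecycleSketch {

    private final Set<String> definedServers = new HashSet<>();
    // view name -> remote server named in that view's "from" clause
    private final Map<String, String> viewToServer = new HashMap<>();

    void defineServer(String server) { definedServers.add(server); }

    void removeServer(String server) { definedServers.remove(server); }

    void createView(String view, String fromServer) { viewToServer.put(view, fromServer); }

    void removeView(String view) { viewToServer.remove(view); }

    // A view can be active only while its remote server is defined
    // (a real server would also need the connection itself to succeed).
    boolean isViewActive(String view) {
        String server = viewToServer.get(view);
        return server != null && definedServers.contains(server);
    }

    // A connection is wanted only if the server is defined AND some view names it.
    boolean needsConnection(String server) {
        return definedServers.contains(server) && viewToServer.containsValue(server);
    }

    // One shared connection per defined server that at least one view uses.
    long connectionCount() {
        return viewToServer.values().stream()
                .distinct()
                .filter(definedServers::contains)
                .count();
    }
}
```

Defining Server1 alone needs no connection; adding two views that both name Server1 gives a connection count of one.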

When a remote server connection is lost, the reference topics of all topic views that refer to it are removed.

Normal topic view precedence rules determine which topic view takes priority. If two topic views (remote or local) would create a reference topic at the same path, the view created first takes precedence. So a local topic view might create a reference topic at a certain path, but if a remote topic view created before it had no remote server connection at the time and later establishes one, the reference topic at that path is replaced by the one mapped by the remote topic view.
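The precedence rule, including the case of a remote view that connects late, can be modeled as follows. This is a hypothetical sketch, not the server's actual algorithm; Candidate, creationOrder and producing are illustrative names. Among the views that can currently produce a reference topic at a contested path, the earliest-created one wins.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of topic view precedence; not Diffusion's implementation.
final class ViewPrecedenceSketch {

    // One view that would map some source topic to the contested reference path.
    static final class Candidate {
        final String viewName;
        final long creationOrder;  // lower value = created earlier
        final boolean producing;   // false for a remote view with no connection

        Candidate(String viewName, long creationOrder, boolean producing) {
            this.viewName = viewName;
            this.creationOrder = creationOrder;
            this.producing = producing;
        }
    }

    // Picks the earliest-created view among those able to produce the topic;
    // empty if no view can currently produce it.
    static Optional<String> winner(List<Candidate> candidates) {
        return candidates.stream()
                .filter(c -> c.producing)
                .min(Comparator.comparingLong(c -> c.creationOrder))
                .map(c -> c.viewName);
    }
}
```

With a remote view created first but not yet connected, the local view wins; once the remote view connects (producing becomes true), the same call returns the remote view, mirroring the replacement described above.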

In a Diffusion server cluster, both topic views and remote server definitions are automatically distributed across the cluster. Every member of a cluster with a remote topic view automatically connects to the same primary server and produces the same reference topics from remote topic views.

All of the capabilities of topic views (such as topic expansion, throttling, and the new delayed topics) are available for use in remote topic views.

Remote topic views vs fan-out

Remote topic views are effectively a replacement for fan-out, as at their most basic they provide the same functionality.

To define a remote topic view that does exactly the same as fan-out (that is, produce a one-to-one mapping of selected topics from primary to secondary server), you could use a topic view specification as follows:

map ?a/ from Server1 to <path(0)>

However, given the powerful mapping capabilities of topic views, remote topic views offer a great deal more potential than traditional fan-out, since you can process and transform the data instead of simply replicating it.

In addition, remote topic views require no file-based configuration on the server. All configuration is done via the client API (or the management console) and is automatically distributed across a cluster, removing the need to manually copy configuration files as with fan-out. You can enable replication of configuration across the cluster (including all remote topic views), even if there is no need for topic or session replication.

Remote topic views support automatic topic removal just as fan-out does. This means that if REMOVAL is specified for a topic on the primary server, subscriptions to reference topics created from that topic by a remote topic view will be counted for removal purposes, as well as sessions connected at the remote topic view’s server.

Bear in mind that only one level of reference is supported. Suppose a second remote topic view creates reference topics from reference topics that are themselves based on the original topic with automatic topic removal configured: subscriptions and sessions relating to these second-level reference topics are not counted (although we may consider this for a future release). This is no different from fan-out, where downstream subscribers to reference topics derived from fan-out topics are not included in subscription counts either.
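The one-level counting rule can be expressed as a tiny model. It is a hypothetical sketch, not Diffusion code: subscriptions are grouped by how many reference hops separate them from the source topic on the primary server, and only hops 0 and 1 contribute to the count used for automatic removal. The session-based counting mentioned above is not modeled.

```java
// Hypothetical sketch of the one-level rule for automatic topic removal.
final class RemovalCountSketch {

    // subscriptionsByDepth[d] = number of subscriptions at reference depth d:
    //   d = 0: subscriptions to the source topic on the primary server
    //   d = 1: subscriptions to reference topics a remote topic view created
    //          directly from that source topic (counted)
    //   d >= 2: subscriptions to reference topics derived from other
    //           reference topics (not counted)
    static int countedSubscriptions(int[] subscriptionsByDepth) {
        int counted = 0;
        int levels = Math.min(subscriptionsByDepth.length, 2);
        for (int depth = 0; depth < levels; depth++) {
            counted += subscriptionsByDepth[depth];
        }
        return counted;
    }
}
```

For instance, with 4 direct subscriptions, 3 on first-level reference topics and 10 on second-level reference topics, only 7 are counted.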

Fan-out supports the propagation of missing topic notifications from secondary to primary servers (where the fan-out link selection would select the missing topic), but remote topic views do not. Any topic that might be missing on the secondary is always the result of a topic view mapping, so, just as with reference topics mapped locally from fan-out topics, missing topic notifications for it are not propagated.

Remote topic views also support the readiness configuration of connectors. A new remote-topic-view-ready start condition can be configured to ensure that a connector is not started until a named remote topic view is available (that is, it has successfully subscribed to the topics in its selector).

Caveats

The normal caveats relating to topic views apply to remote topic views. For example, view specifications that derive topics from the value of highly volatile fields in the source topics should be avoided, as there can be high CPU and memory costs relating to rapid addition and removal of topics.

In addition, remote topic views maintain a local representation of each selected source topic on the secondary server (per connection) so that all of the functionality of topic views can be supported locally. Even though the memory footprint of such topic copies is much lower than for regular topics, there is an additional per-topic memory overhead compared with fan-out when a one-to-one mapping is used to replicate exactly what fan-out does. The significant advantage over fan-out is the ability to map to entirely different topics on the secondary servers.

Summary

Remote topic views provide the same functionality as traditional fan-out, combined with all of the powerful topic mapping capabilities of topic views. This new feature considerably extends the distribution capabilities of Diffusion and opens up exciting new possibilities for distributed, highly scalable applications.
