Jazz Reporting Service: Data warehouse or LQE?

My previous post described the Jazz reporting solutions, including the two data sources used by the Jazz Reporting Service (JRS): the Data Warehouse (DW) and the Lifecycle Query Engine (LQE). So which one should you use? It depends on your reporting needs. This post provides guidance on why you’d use one or the other – or maybe both.

A reminder of the overall reporting architecture:


Jazz reporting architecture

The DW is the more mature data store, and has been part of the solution for many years (DCC is slightly more recent, debuting in 2014).  It has a well-defined and documented schema. The JRS “ready-to-use” and “ready-to-copy” reports rely on the DW, as do many of the BIRT reports available in Rational Team Concert (RTC) and Rational Quality Manager (RQM).  If you plan to use any of those reports, you need the DW. The DW includes some data not available in LQE, such as build data, and a rich set of metrics and history for trend reports, particularly for work items.

However, if you use configuration management for project areas in RQM or DOORS Next Generation (DNG), you must use the LQE (scoped by a configuration) data source for those project areas; the DW does not support versioned artifacts.


Changing data source in Report Builder

To use Rational Engineering Lifecycle Manager (RELM), you also need LQE as the data source.

You can use LQE to report on project areas that aren’t enabled for configurations too, and there are benefits to doing so. The data in LQE is refreshed in nearly real time, while scheduled DCC jobs typically update the DW less frequently. LQE constructs its metamodel dynamically based on your data, which can make some reports easier to build than with the predetermined DW schema. With the best practice of defining external URIs for artifact types and attributes, you can equate attributes across project areas to facilitate cross-project reporting. LQE also includes some data not available in the DW, especially for RQM.
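The external-URI practice can be pictured with a small sketch (plain Python; the project names, attribute labels, and URI are invented for illustration): two project areas use different internal labels for a priority attribute, but because both map it to the same external URI, a cross-project query can treat them as one attribute.

```python
# Hypothetical attribute definitions from two project areas.
# The internal labels differ, but both map to the same external URI,
# so a cross-project report can equate them.
project_a_attrs = {"Prio": "http://example.com/attrs/priority"}
project_b_attrs = {"Priority Level": "http://example.com/attrs/priority"}

artifacts = [
    {"project": "A", "attrs": {"Prio": "High"}},
    {"project": "B", "attrs": {"Priority Level": "High"}},
    {"project": "B", "attrs": {"Priority Level": "Low"}},
]

def value_for_uri(artifact, uri):
    """Look up an attribute value by its external URI, whatever
    the project-local label happens to be."""
    mapping = project_a_attrs if artifact["project"] == "A" else project_b_attrs
    for label, attr_uri in mapping.items():
        if attr_uri == uri:
            return artifact["attrs"].get(label)
    return None

# Cross-project query: all artifacts with priority "High".
uri = "http://example.com/attrs/priority"
high = [a for a in artifacts if value_for_uri(a, uri) == "High"]
```

The point of the sketch: the query is written once against the shared URI, not once per project-specific label.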

LQE does have some disadvantages. The DW offers much richer history and metrics data. LQE has limited sample reports, and customizing queries requires SPARQL knowledge, which might be less familiar than SQL.

If you need LQE for configuration-enabled project areas, you might choose to continue using the DW for some reports. In particular, data for RTC work items continues to be available in the DW (since work items aren’t versioned). Even if all of your DNG and RQM project areas are configuration-enabled, you can use the DW to run out-of-the-box and trend reports for work items.


WI trend reports from DW

For non-enabled DNG/RQM projects, you might choose to build some reports using the DW, and others using LQE to take advantage of the dynamic schema and frequent updates.

As you decide on data sources, you do need to consider system resources as well.  The DW uses a database for storage and the Data Collection Component (DCC) application to extract and load the data; the LQE application acts as both the data indexer and data store.  Both require adequate resources for your data and usage scale. (See the Jazz.net Deployment wiki for sizing strategy and performance reports for DCC and LQE.)  With respect to sizing and system resources:

  • If you’re not using the DW for any reporting, you can disable DCC jobs from running. If most of your RM and QM project areas are enabled for configuration management, you’ll need less space for the DW database to grow, since those project areas do not contribute to the DW.
  • If you enable LQE, it collects data for all project areas in the registered application data sources; currently you can’t filter the data in the TRS feeds to reduce the size of the data store. However, from a query performance perspective, if you continue to use the DW for some reports, you reduce the reporting load on the LQE server, which could improve performance. That said, if you don’t use configuration management in RM or QM project areas, you might not want to invest in the extra resources for LQE.

There is another reporting option that uses neither of these data sources: Rational Publishing Engine (RPE) extracts data directly from the applications using the reportable REST API to generate document-style reports and spreadsheets. In some cases, RPE can access data not readily available in either the DW or LQE, and can also handle more complex data manipulation and formatting. RPE is available as a separate offering; it is not included with JRS.


RPE preview view (v6.0.6)

In closing, carefully consider your reporting needs as you decide whether to use the DW or LQE – or maybe even both.


The Jazz reporting alphabet

JRS, LQE, RB, DCC… When it comes to reporting for the IBM Collaborative Lifecycle Management (CLM) and Continuous Engineering (CE) solutions, there are enough acronyms to make alphabet soup! In this post, I’ll define the acronyms and describe the components and how they fit together, as illustrated in this architecture overview:


CLM/CE 6.x reporting architecture

The links in this article go primarily to the IBM Knowledge Center topics and Jazz.net pages for the components and offerings.

Let’s start with the two data stores used for reporting:

  1. Data warehouse (DW)

    The DW uses a database you specify when you configure your CLM/CE environment.  It includes operational data store (ODS) tables for reporting on current data about the resources, and fact and dimension tables to support metric and trend reports.

    To populate the DW, the Data Collection Component (DCC) application runs scheduled jobs that extract and load data from the CLM/CE applications. You register the DCC application when you configure your environment.

    Reports use SQL to query the DW.

  2. Lifecycle Query Engine (LQE)

    LQE is both a data store and an application; you register LQE when you configure your environment. LQE stores data in an index based on Resource Description Framework (RDF) graphs – essentially an aggregation of RDF triple statements.

    To generate and maintain the index, LQE consumes Tracked Resource Set (TRS) feeds that are provided by the CLM/CE applications.

    Reports use SPARQL to query LQE.

Note: If you are using configuration management to version requirements or tests, you’ll also see a third data source: “LQE scoped by a configuration”. This is another endpoint to the LQE data source to ensure correct reporting on versioned artifacts.
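To make the contrast between the two data stores concrete, here is a minimal Python sketch (the table and triple contents are invented): the DW side is a relational table queried with SQL, while the LQE side is a set of RDF-style triples queried by pattern matching, which is essentially what a SPARQL engine does under the covers.

```python
import sqlite3

# --- DW style: a relational table queried with SQL ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requirement (id TEXT, status TEXT)")
conn.executemany("INSERT INTO requirement VALUES (?, ?)",
                 [("REQ-1", "Approved"), ("REQ-2", "Draft")])
approved_dw = conn.execute(
    "SELECT id FROM requirement WHERE status = 'Approved'").fetchall()

# --- LQE style: RDF-like triples queried by pattern matching ---
triples = [
    ("REQ-1", "rdf:type", "Requirement"),
    ("REQ-1", "status", "Approved"),
    ("REQ-2", "rdf:type", "Requirement"),
    ("REQ-2", "status", "Draft"),
]

def match(pattern):
    """Return triples matching a (subject, predicate, object) pattern;
    None acts as a wildcard, like a ?variable in SPARQL."""
    s, p, o = pattern
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# Rough equivalent of: SELECT ?s WHERE { ?s :status "Approved" }
approved_lqe = [t[0] for t in match((None, "status", "Approved"))]
```

The SQL query depends on a predefined table schema; the triple pattern needs no schema at all, which is why LQE can build its metamodel dynamically from the data it indexes.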

There are various ways to report on the application data:

  • The Jazz Reporting Service (JRS) comprises multiple applications and services, including DCC and LQE, to provide interactive reporting across the solution. The “face” of JRS is Report Builder (RB), a graphical tool for authoring and running reports which also includes administrative tools and tutorials.

    JRS Report Builder

    You can report against data in either the DW or LQE; when you author the report in RB, you choose which data source to use.

  • Rational Engineering Lifecycle Manager (RELM) is a separate application that provides advanced visualizations to support impact analysis and other explorations of linked data. RELM uses data from LQE.
  • Rational Publishing Engine (RPE) is a separate offering for generating document-style reports from CLM/CE as well as other applications. RPE extracts “live” data from the CLM/CE applications directly using reportable REST APIs.
  • BIRT (Business Intelligence and Reporting Tools) is an open-source platform for visualization and reports. The Rational Team Concert and Rational Quality Manager applications include several BIRT reports and dashboard widgets, although we recommend using JRS to build any new reports. BIRT reports use data from the data warehouse, and in a few cases, directly from the application.
  • For those looking for advanced business intelligence beyond the scope of the CLM/CE applications, IBM Cognos provides stand-alone solutions that can integrate data from many sources for advanced mining and insight. The optional ALM Cognos Connector exposes CLM/CE data to IBM Cognos.

There are additional nuances to the CLM/CE reporting story, especially when it comes to global configuration management (which you can read more about in this Jazz.net article), specific applications, and previous releases. Read more about reporting in the IBM Knowledge Center.

I hope this has helped clarify the various components and acronyms related to CLM/CE reporting. Below is the alphabetical list of acronyms and definitions for reference. Have fun exploring these data sources and technologies as you implement your CLM/CE reporting strategy!

Quick Reference

BIRT – Business Intelligence and Reporting Tools. Open source project for interactive reports on data warehouse and some live data.
DCC – Data Collection Component. Extracts data from CLM/CE applications and loads data warehouse.
DW – Data Warehouse. Database store for current data and metric/trend reporting, using SQL.
JRS – Jazz Reporting Service. Component that comprises DCC, LQE, RB, and other services to support interactive reporting for CLM/CE applications.
LQE – Lifecycle Query Engine. Data index based on RDF graphs, using SPARQL. Also an application that creates and maintains the index using TRS feeds from the CLM/CE applications.
ODS – Operational Data Store. DW tables for reporting on the current state of the resources.
RB – Report Builder. Graphical tool for authoring and running reports against DW or LQE data.
RDF – Resource Description Framework. A framework for describing resources on the web using triple statements.
RELM – Rational Engineering Lifecycle Manager. Application for data visualization using LQE data.
RPE – Rational Publishing Engine. Application for generating document-style reports using reportable REST APIs to extract data directly from the CLM/CE applications.
TRS – Tracked Resource Set. An RDF representation of application resources or artifacts; LQE consumes TRS feeds to collect data from the CLM/CE applications.

What is a personal stream anyways?

If you are exploring or adopting global configurations in the IBM Collaborative Lifecycle Management (CLM) solution, you have likely come across personal streams – whether you realized it or not!

Personal streams are closely related to change sets, and only come into play if you are using global configurations. A personal stream groups your change sets within the broader GC context, meaning:

  • From your change set, you can still use and create links across component streams within the GC.
  • You can work in change sets for multiple streams of the GC at the same time.
  • You can share change sets with other team members.

Let’s take a closer look at how personal streams work.  Say you are working in a stream in Rational DOORS Next Generation (DNG), in the context of a global configuration that includes contributions from other DNG and Rational Quality Manager (RQM) components.  When you work in a change set for your DNG stream, you probably still want to see the links to and content from those other component streams in the GC. So when you create your change set, the system automatically creates a personal stream (PS) for you that becomes your configuration context.


Creating a change set adds it to your personal stream


Configuration context set to the user’s personal stream

Your PS includes or references the GC itself, and adds your change set at the top of the hierarchy.  As you make changes, you still have the context of the GC and can see the artifacts in the other component contributions, follow/create/delete links, add or change linked artifacts, and see changes that others make to the content in those streams.

The personal stream is per user, per global configuration. As you create and deliver change sets, the system automatically manages your PS to add and remove the change set. You can also view and modify your PS directly in the GCM application.


Marco’s personal stream for the AMR Server US GC, shown in the GCM application.  Notice the change set is at the top of the hierarchy, so it takes precedence over configurations lower in the hierarchy.

Where a personal stream is really important is when you are making changes across related component streams, especially where both streams mandate change sets. In that scenario, your configuration context must include change sets for both of those streams. To add change sets for more than one component stream:

  1. In the first component stream, in the GC context, create your change set. Your PS is automatically updated and set as your context.
  2. Switch to the second component stream. With your context set to either your PS or the same GC, create your change set. Your PS is automatically updated with the additional change set, and set as your context.

After creating a change set in a second RM component. Both change sets are added to the top of the hierarchy.

In most cases, the automation handles everything you need. You can also modify the PS directly in the GCM application, which is useful if you want to:

  • Add a change set that already exists, including one that someone else created
  • Switch between different change sets for the same component stream, since you can include only one change set per stream at a time. If you add a second change set from the same component stream, it replaces the original change set; the original change set is still there and active, and you can switch back to it later.

Adding a change set that a different user created. If we click OK, the selected change set will replace “changes for CR 1234”, since you can have only one change set per component stream.

A few things to be aware of:

  • Delivering a change set removes it from your PS; however, you still remain in the PS context, in case you had other change sets for other streams. You can manually switch to a different context. (Fun fact: a PS with no change sets is essentially the same as the GC it’s based on.) The PS persists and is reused the next time you create a change set for the same GC.
  • Personal streams are personal; each user has their own, and you can’t switch to another user’s PS context.
  • Be careful when creating “incoming links” to artifacts in a change set in a PS. If the artifact that owns the link is in a stream, the link might get created even if you don’t deliver the change set. For example, if you create a validated-by link to an artifact in a DNG change set, RQM creates and stores the link immediately, regardless of change set delivery.
  • Similarly, when linking across DNG component streams, you need to consider link direction when making changes, especially in version 6.0.5 and later where link direction and storage is enforced. If you’re working in a change set for one stream, but want to create an incoming link from an artifact in a different component stream, you’ll need a change set for that other stream as well, so you can store the link.
  • To use personal streams as described here, users need permission in the GCM application at least to create and modify personal streams (there are also permissions for archive/restore and scrubbing personal streams). The Contributor role includes the necessary permissions.
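The behavior described above can be modeled in a few lines of Python (a conceptual sketch, not the actual GCM implementation; the stream and artifact names are invented): a personal stream keeps at most one change set per component stream, replacing rather than stacking, and change sets take precedence over the underlying GC when resolving an artifact version.

```python
class PersonalStream:
    """Conceptual model of a personal stream: one per user, per GC."""
    def __init__(self, gc_versions):
        # Artifact versions visible through the GC itself.
        self.gc_versions = dict(gc_versions)
        # At most one change set per component stream.
        self.change_sets = {}

    def add_change_set(self, stream, name, versions):
        # Adding a second change set for the same stream replaces the
        # first in the PS (the first still exists and stays active).
        self.change_sets[stream] = {"name": name, "versions": versions}

    def deliver(self, stream):
        # Delivering removes the change set; the PS itself persists.
        self.change_sets.pop(stream, None)

    def resolve(self, artifact):
        # Change sets sit at the top of the hierarchy, so they win.
        for cs in self.change_sets.values():
            if artifact in cs["versions"]:
                return cs["versions"][artifact]
        return self.gc_versions.get(artifact)

ps = PersonalStream({"REQ-1": "v1", "TC-9": "v3"})
ps.add_change_set("RM stream A", "changes for CR 1234", {"REQ-1": "v2"})
assert ps.resolve("REQ-1") == "v2"   # change set takes precedence
ps.add_change_set("RM stream A", "other CS", {"REQ-1": "v5"})  # replaces
ps.deliver("RM stream A")
assert ps.resolve("REQ-1") == "v1"   # back to the GC's version
```

Note how a PS with no change sets resolves everything straight from the GC, matching the "fun fact" above.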

I hope that has helped demystify personal streams, and given you insight into how you can use them to do interesting things like share change sets or coordinate changes in change sets across multiple component streams.

Adoption guidance for CLM configuration management: new articles available!

Wow, it’s been a long while since I’ve posted. It’s been a busy couple of years working with the configuration management capabilities in the IBM Collaborative Lifecycle Management and IoT Continuous Engineering solutions!

In my last post, we had just introduced these new capabilities to the market.  There was considerable interest, but initial adoption was slow… many clients weren’t ready to take on the process transformation that goes along with using configurations, while others required additional capabilities.  Two years later, the solution has advanced significantly, a number of clients have successfully implemented the capabilities in production, and interest continues to grow.

So what have we learned over this time?  Our original guidance was to go slow: ensure you have a good understanding of your current process, the new capabilities of the solution, and take the time to work out what your new practices will be in light of those capabilities.  It turns out that is good advice that still applies.  Configuration management introduces new concepts, operations, roles, applications… there’s no way you can just “turn it on” and continue working the way you always have.  To be successful, you need to take the time to plan and pilot your adoption.

We’ve begun identifying some patterns based on what we’ve observed at clients who have implemented configuration management — including what worked and what didn’t — and articulating some additional guidance around adoption.  The initial set of articles is now available on Jazz.net, focusing on general guidance, component strategy, and stream patterns.

We plan to expand this series to address additional topics that might include baseline strategy, change management, and additional stream patterns.

We welcome your feedback and input on what to cover in future articles: what burning questions do you have about configuration management? Are there particular topics that you’re struggling with?  Let us know.  We hope you find these articles useful.


My initiation to the IBM Internet of Things

I recently moved to the IBM Internet of Things division, and needed to learn more about the IBM Watson IoT Platform. There is a lot of hype and hyperbole around IoT, and the amount of information available — even just on IBM’s offerings — can be overwhelming and confusing, especially when it assumes knowledge you don’t yet have!

For others who may be in the same boat, here’s a layperson’s introduction to IBM’s IoT Foundation offering, the underpinning of our IoT Platform.

You probably know that IoT solutions collect data from “things” (sensors or devices) and analyze that data to make decisions or take actions. As more “things” become instrumented and combined with data from other sources, advanced analytics, and cognitive systems, IoT solutions get very interesting, like the self-driving car that you’ve likely heard about.

Very cool, but also sounds very complex. How does this actually work?

Full disclosure: I’m not a programmer, although I can understand code; learning new languages and piecing together programs is not my idea of fun. So my goal was to understand how this all works without having to write code. [If you like to code and prefer to get your hands dirty, you might prefer to start with Exploring IBM Watson Internet of Things, or in the IBM Bluemix IoT Quickstart environment, where you can experiment in a sandbox.]

The IoT Foundation (IoTF) service enables communication between the devices that generate the data and the applications that want to interact with that data or with the devices themselves.  The very helpful IoT Foundation documentation provides an excellent overview of the IoTF architecture.

The IoT Foundation is available on IBM Bluemix as a hosted service, and recently became available to run in your own data center as a managed service (details here).

When you set up your IoTF service, you get an “organization id” that identifies and groups your devices and applications. This value is used during connection and authentication to generate tokens that devices and applications use for security. Connections can also be encrypted.

Then you register your devices with IoTF. You can use the IoTF browser-based dashboard to add them manually, or write application code to manage the registration using REST APIs or one of several programming languages (libraries are provided to help). You can register devices individually or in bulk, and even set things up so devices can register themselves. Devices can then transmit data to IoTF, usually using JSON messages over MQTT, a lightweight messaging protocol.

You also connect your applications to IoTF, using the organization id and an API key and token generated from your IoTF instance. The applications are how you make use of the data, whether that’s applying analytics, invoking actions based on triggers, or what-have-you.

Using IoTF, applications can subscribe to data “events” from devices, or send commands to the devices. IoTF provides device management commands to reboot, reset, and manage device firmware – which you can also issue from the IoTF dashboard – presuming the target device has the capability to respond. (Of course, someone did have to program that device in the first place.)
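As a rough sketch of what a device-side publish looks like (the org id, device type, and sensor values below are invented, and the exact client-id and topic layout should be verified against the IoTF documentation), a device identifies itself using its organization id and publishes a JSON event on an MQTT topic:

```python
import json

# Hypothetical identity values for a registered IoTF device.
org_id, device_type, device_id = "abc123", "thermostat", "dev001"

# Device-side MQTT client id and event topic, in the general shape
# the IoTF docs describe (check the current documentation).
client_id = f"d:{org_id}:{device_type}:{device_id}"
topic = "iot-2/evt/status/fmt/json"

# The event payload itself is ordinary JSON.
payload = json.dumps({"d": {"temp": 21.5, "humidity": 40}})

# With a real broker you would hand these to an MQTT library such as
# paho-mqtt, connect to your organization's IoTF endpoint, and call
# something like client.publish(topic, payload).
```

No application logic lives on the device side here; it simply emits events, and any application that subscribes to that topic through IoTF receives them.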

To see this in action, I highly recommend the IoTPhone sample application on IBM Bluemix, as described in this video and documented here. In the sample, you register your smartphone with IoTF and then view the data from its sensors (accelerometer and GPS) coming into the IoTF service. No programming required, although you can also view and modify the sample application code for registering and connecting your device.

There is a second part of the sample that shows how to use that device data with IBM Real-Time Insights, another IoT Platform service on Bluemix that provides analytics and rules. So for example, you can trigger an email or other action based on the data values. I’ll leave details on that for a future post.

I hope this intro to IoT Foundation helped someone besides myself with IoT on-ramping. Happy exploring!

New: Configuration Management in Rational Collaborative Lifecycle Management 6.0

In case you haven’t heard: Rational Collaborative Lifecycle Management (CLM) v6.0 introduces new support for configuration management, to better enable strategic reuse, change management, and product-line engineering.

So what IS configuration management?  You may be familiar with source code configuration management (SCM), widely used by software programmers.  Artifacts are versioned in the context of “streams”, so you can include different versions of the same artifact in multiple streams — perhaps to support changes for maintenance on an initial release, while allowing separate parallel changes to those artifacts for a new release. Changes can be delivered across streams, for example to add fixes from the maintenance stream to the new release. Multiple streams or configurations enable artifact reuse (unchanged artifacts are referenced, not copied) and isolate changes. You can also take baselines to capture the state of your artifacts at significant milestones, like a release date.

In v6, CLM extends configuration management to include requirement, test, and design artifacts. You can define streams of artifacts in each of DOORS Next Generation, Quality Manager, and Design Manager (and of course, Rational Team Concert’s SCM component); modify or include different artifacts or versions of artifacts in the context of each stream; and take baselines to reflect the status at a given point in time.

Not only can you do this within the individual application domain, you can also define “global configurations” that bring together streams from the different domains — including artifacts from requirement, test, design, and/or code domains.  And you can take global baselines of these configurations, capturing a point-in-time for all the artifacts associated with your release or product line.
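The core ideas here (streams that reference unchanged artifacts rather than copy them, and baselines as frozen snapshots) can be sketched in a few lines of Python; this is a conceptual illustration with made-up artifact ids, not how the products actually store data.

```python
# A stream maps artifact ids to versions; artifacts are referenced,
# not copied, so two streams can share the same unchanged version.
release1 = {"REQ-100": "v1", "REQ-101": "v1"}

# Branch a stream for the new release: start by referencing the
# same versions as the initial release...
release2 = dict(release1)
# ...then change only what the new release needs. REQ-100 stays
# shared; REQ-101 now differs between the two streams.
release2["REQ-101"] = "v2"

# A baseline is an immutable snapshot of a stream at a milestone,
# e.g. the release date; later changes to the stream don't affect it.
baseline_r1 = tuple(sorted(release1.items()))
```

A global configuration extends the same idea upward: it aggregates streams like these from different domains, and a global baseline snapshots all of them at once.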

Why should you care?  (If you’re already doing product-line engineering, the answer might be obvious to you.)

Consider: how often might you use test artifacts from a given release in a subsequent release? Do you ever have a requirement that applies to more than one release, maybe with modifications?  Maybe you want to manage change against requirements, isolating revisions into separate streams or “change sets” while teams continue to use a primary stream.  Perhaps you simply want to capture a complete picture of the artifacts that went into a particular release.

For a great overview of configuration management, check out this YouTube playlist.

And look for more posts about the new CLM capabilities on jazz.net, IBM developerWorks, and right here.

a herd of “aaS”s

Read about cloud computing, and you tend to see “aaS” quite a bit.  As you probably know, it stands for “as a service”; cloud computing is all about providing self-service and managed operations to clients, who pay for what they use.  The typical prefixes for “aaS” are I (for Infrastructure), P (for Platform), and S (for Software).  And the difference is in what the cloud service provider manages, vs what the cloud client takes care of.

Let’s start from the top down: “Software as a Service”.  The cloud provider pretty much manages the entire software stack from soup to nuts (or hardware up to the application itself). The client just logs in and uses an instance of the application(s) that they need.  Like WordPress – you just sign in and can start writing a blog. Or Google Docs.  (Of course, as web users, we don’t really know how something is hosted on the back-end, but just like cloud, we don’t have to care!)  In the SaaS model, the client doesn’t install anything, they just log in and go. They may be able to customize some options of the application, depending.

The next step down is “Platform as a Service”.  The cloud provider takes care of everything up to the runtime environment – including middleware (web application servers, database servers, etc) and operating system options. The client decides and manages the actual applications, code, and databases that get deployed into that environment.  So more work for the client, but also more control over the software that they use.

Then there’s “Infrastructure as a Service”.  The cloud provider manages the hardware (storage, servers), networking, and the virtualization services that make a cloud a cloud. The client takes care of operating systems, middleware and runtime, in addition to the applications.  This offers the greatest flexibility to the client.

While these seem like well-defined chunks, I suspect the reality is a little less cut and dried.  Could you have PaaS where the customer provides some of the middleware? Or where the IaaS includes operating systems?  Pretty sure those scenarios could be worked out with many cloud providers.

There are some great diagrams of the “aaS”s floating around. I didn’t want to plagiarize by copying one here.  I also found some references to “Business as a Service” (BaaS) – where the software provided has business processes and intelligence already built in. I wonder how many more “aaS”s will be defined as cloud computing further matures?  (hmm. did that sound wrong?)

The net net is that the cloud model allows multiple levels of service provision for clients – from the very basic virtualized environment to the entire software stack. It’s up to the client to decide how much they want their provider to manage for them, and how much they want to take on and control themselves.