Dataset Profile and Size:
- Total number of active items in your instance
- How items are arranged into projects
- Max number of items in a container
- Number of containers
- How items are related to each other
- Number of relationships per item
- Depth of relationship chains
- How often users comment on items
- Size and number of file attachments
Usage Patterns and Workflows:
- How often items are batch updated/deleted
- Operations on large numbers of items have a performance cost
- How users search for items
- Do you have targeted project filters or large complex ‘All Projects’ filters
- How users manage projects
- Reuse large numbers of items
- Batch import items from other sources
- API Integrations and item sync
- Usage of Reviews
- Are large numbers of items reviewed
- Do you publish a large number of revisions
- Usage of Baselines
- Are you creating baselines for a large number of items
- Are you updating the baselines often
- Usage of Test Center
- How many Test Cases per Test Plan
- How many Test Groups per Test Plan
- How many Test Runs per Test Case
- Usage of data exports/reports
- How many items are being exported
- How often are exports/reports run
- What time of the day
- Replicated Server (Vertical Scale)
- Disk Space
- Replicated Container Memory Allocation
- Network Topology
- Database Server
- Backup Jobs
Common Performance Issues
Performance issues can be categorized in a few different ways. Understanding which type of issue you are seeing will help in tuning Jama Connect to address it.
General slow performance
If the application is simply slow and never seems to recover, this is an indication your systems might be resource constrained. Working with your IT group you will want to monitor the CPU and Memory usage on the Replicated and DB servers. It may be as simple as adding more memory or moving the database to a more powerful server.
Slow performance during specific times of the day
It is possible that scheduled (internal or external) processes are running that cause Jama Connect to consume all its available resources.
As above, monitoring and adjusting the system resources is one solution. Internally, Jama Connect has cleanup jobs that are scheduled to run late at night (talk to your Account Manager for more information).
Additionally, some customers use API integrations or database tasks (e.g. backups) that may add stress to a system when they run. In these cases, you should consider rescheduling these tasks to ‘off hours’ when users are not on the system. Otherwise, you may need to ensure Jama has enough resource headroom to handle these tasks as well as meet user demand.
Random complaints of poor performance
It can be difficult for a Jama Connect Administrator when their users report that Jama ‘feels’ slow but only ‘sometimes’. Often, we will work with the affected users, collect their common workflows and still be unable to reproduce the issue internally.
In these cases, we typically find that the performance degradation is caused by one or two users performing large, often unnecessary, operations. In one case we found a product manager who created a baseline of an entire project (100,000+ items) every day so they could see what changed. After a quick discussion with Support, we showed them how to run a filter on ‘Last Modified Date’ instead, and the performance complaints stopped.
In some cases, the issues have been as simple as one user generating large reports at 8am. Having that user generate their reports at 5pm resolved the performance complaints. Sometimes Jama simply does not have enough headroom to handle these ‘performance spike events’. If you have provisioned your systems based on common user load, it may not be enough for periodic large operations.
It's important to remember that, for the most part, Jama Connect is a very performant application. Users can login, read comments, search for items, create new items and generally do their work without issue. However, there are some operations that can put temporary strain on a system.
- Generating reports that review thousands of items, or items with dozens or hundreds of relationships each
- Creating a Baseline or Review of thousands of items
- Reindexing a project with 100,000+ items
- Importing, Exporting, Reusing or Deleting thousands of items at once
- Duplicating a project with 50,000+ items
- Reordering the Tree and causing thousands of items to be updated
- Batch deletes of thousands of items
How Jama QA Tests Performance
Jama has a series of UI functional and REST API performance tests that it runs daily and then again on the official release version. These test results are published each release as part of the Jama Connect Validation Kit. For information on pricing and availability of the JVK, please contact your account manager.
Our performance tests are built and executed using an open source tool called Gatling (gatling.io). Gatling is similar to JMeter and HP LoadRunner: it generates HTTP traffic between a test client and Jama’s application web server. Using Gatling, we eliminate the need for physical web browsers and can simulate hundreds of virtual users simultaneously.
Current: API Test Approach
Our current approach to performance testing Jama focuses on stressing the system using our public REST API. This is a fairly common approach and is especially valuable for rapidly identifying degradation caused by code or database changes.
Our process involves scripting each of our REST endpoints into a separate test, then running them in isolated batches so each endpoint can be measured without interference. For example, when we test creating items, we log in a group of virtual users and schedule them to execute a single REST endpoint. All users create an item simultaneously, then wait for everyone to complete before moving on to the next endpoint.
Because of this rigid isolation and scheduling, we are able to measure performance for each endpoint with a high degree of accuracy. Without the isolation, we noticed that different users executed the tests at slightly different rates; over time this drift would impact the final results. The goal of this type of performance baseline is to allow using the same tests to compare different versions of Jama and identify performance changes.
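The rigid isolation described above can be sketched in Python (our actual tests use Gatling's Scala DSL, so this is an illustration only). A barrier releases every virtual user at the same instant, and the join ensures everyone finishes before the next endpoint starts; `create_item` is a stand-in for a real REST call:

```python
import threading
import time

NUM_USERS = 10

def create_item(user_id):
    # Placeholder for a real call such as POST /rest/v1/items;
    # here we simulate server work so the sketch is runnable.
    time.sleep(0.01)

def run_isolated_endpoint_test(num_users, endpoint_fn):
    """All users fire the same endpoint at the same instant, then
    everyone waits until the slowest user finishes (rigid isolation)."""
    start_barrier = threading.Barrier(num_users)
    durations = [0.0] * num_users

    def user(i):
        start_barrier.wait()          # release all users simultaneously
        t0 = time.perf_counter()
        endpoint_fn(i)
        durations[i] = time.perf_counter() - t0

    threads = [threading.Thread(target=user, args=(i,)) for i in range(num_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                      # wait for everyone before the next endpoint
    return durations

durations = run_isolated_endpoint_test(NUM_USERS, create_item)
print(max(durations))
```

Each endpoint would be run through this pattern in sequence, so per-endpoint timings never interfere with each other.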
We run all of our tests in batches we call steps. For each step (Warmup, Step 1, Step 2, etc.) we run exactly the same tests; the only difference is how many users execute each endpoint simultaneously (pictured below).
To further ensure accuracy, each step runs the same battery of tests 3 times in a row. There are different factors that can occasionally impact a performance test and cause anomalies in the results. By running the tests multiple times back to back we can exclude invalid results caused by network or external issues. We only publish our results when all 3 test runs are within a similar range.
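The "similar range" publish criterion can be sketched as follows; the ±10% tolerance around the median run is our own illustrative assumption, not Jama's published cutoff:

```python
def runs_consistent(run_means_ms, tolerance=0.10):
    """True when every run's mean response time is within +/-tolerance
    of the median run, i.e. the repeats agree closely enough to publish."""
    mid = sorted(run_means_ms)[len(run_means_ms) // 2]
    return all(abs(m - mid) / mid <= tolerance for m in run_means_ms)

print(runs_consistent([412.0, 398.0, 405.0]))   # three runs that agree
print(runs_consistent([412.0, 398.0, 610.0]))   # one anomalous run
```

When the check fails, the whole step is rerun rather than averaging away the anomaly.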
We run all our performance tests nightly using our latest development branch. While our test cases are stored in Jama, we generate thousands of test results a day, and it does not make sense to upload all of this data into Jama.
Instead we upload the raw results into an Amazon S3 Bucket. Using these results, we can perform our daily triage (looking for regressions or bugs) and then push the aggregate data into Jama Analyze for trending reports. We can then safely delete the raw S3 data to save space. Of course, we save the full data for all release test runs.
There are three things that will trigger an investigation into development changes made the day before:
- Errors – Any errors (404, 500 etc.) in the test results are investigated to determine the cause.
- Deltas – Any large delta in the performance numbers (a shift of 10% or more) is investigated to understand why.
- Thresholds – We have 1 and 2 second thresholds for different APIs. If a result extends beyond its threshold, it is investigated. Smaller [GET] calls are expected to be under 1 second, while larger [POST/PATCH] calls can move between 1.5 and 2 seconds under heavy load.
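A minimal triage pass combining the three triggers might look like this; the result shape and endpoint names are hypothetical, not our actual harness:

```python
def triage(results, baseline):
    """results: {endpoint: {"status": int, "ms": float, "method": str}}
    baseline: {endpoint: {"ms": float}} from the previous day.
    Returns (endpoint, reason) pairs that need investigation."""
    THRESHOLD_MS = {"GET": 1000, "POST": 2000, "PATCH": 2000}
    findings = []
    for name, r in results.items():
        if r["status"] >= 400:            # trigger 1: any error
            findings.append((name, "error %d" % r["status"]))
            continue
        base = baseline.get(name)
        if base and abs(r["ms"] - base["ms"]) / base["ms"] >= 0.10:
            findings.append((name, "delta >= 10%"))    # trigger 2
        if r["ms"] > THRESHOLD_MS.get(r["method"], 2000):
            findings.append((name, "over threshold"))  # trigger 3
    return findings
```

Each finding pairs an endpoint with the trigger that fired, so one endpoint can surface both a delta and a threshold breach in the same night.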
We typically release a new version of Jama every month. Once the final build is identified, we rerun all of our performance tests and build a comparison graph against the last release. That chart (along with any defects) is included in the Jama Connect Validation Kit.
Future: Persona/Workflow Test Focus
Using the REST API to measure the impact of code changes on system performance works well. However, these tests are not able to answer the common question of how real users are impacted in very specific situations. For example, how are Review Center moderators impacted when an admin user generates a large report in Jama?
Jama QAQC is committed to pursuing an innovative method of defining and measuring ‘simulations’ where multiple personas execute different workflows simultaneously and we are able to measure the performance for each user at each workflow action.
This work is still several months from completion. Mechanically it’s not very difficult to use Gatling to create a Simulation with multiple Personas each performing their own workflows.
The first problem is identifying what constitutes a valid simulation. Different customers use Jama differently: some might be focused on Test Center while others do a lot of item movement. There is no one-size-fits-all test for Jama.
Our plan is to identify several common situations to act as baselines (8am on a Monday, end-of-month reporting, sprint planning, etc.) and then perhaps work with select customers to share our framework so they can identify and test their own simulations.
Imagine a typical company at 8am. You will have groups of users sitting idle in the application while other users are performing their work. To test this, we would define those personas/workflows, how many users are logged in, and how often they are performing their actions.
The second problem occurs when we want to report the results of the simulation. The above simulation touches dozens of REST Endpoints (used in different ways) with thousands of individual requests. How do you condense all that data into values that can be used to compare versions of Jama?
Our plan is to define a simulation score based on defined performance thresholds for each Workflow (e.g. 2 seconds). When a Workflow is completed within the threshold, the Persona is satisfied; otherwise, the Persona is not satisfied.
Using these states, we can derive a percentage score rating the satisfaction with performance. We can even ‘roll’ the score up.
- The Simulation gets an overall score based on all the Personas/Workflows
- Every Persona is scored based on the scores of its own Workflows
- Every Workflow is scored based on the time it takes to execute all the requests in the Workflow
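This roll-up can be sketched in Python; the 0–100 scale and equal weighting across Workflows and Personas are our own assumptions for illustration:

```python
def workflow_score(timings_ms, threshold_ms):
    """Percent of workflow executions completed within the threshold
    (each one a 'satisfied' Persona moment)."""
    satisfied = sum(1 for t in timings_ms if t <= threshold_ms)
    return 100.0 * satisfied / len(timings_ms)

def persona_score(workflows):
    """workflows: list of (timings_ms, threshold_ms) for one Persona."""
    scores = [workflow_score(t, th) for t, th in workflows]
    return sum(scores) / len(scores)

def simulation_score(personas):
    """personas: list of each Persona's workflow list; overall satisfaction."""
    scores = [persona_score(w) for w in personas]
    return sum(scores) / len(scores)

# One Persona with a 2-second review workflow, one with a faster search:
print(simulation_score([
    [([1500, 1800, 2500], 2000)],   # 2 of 3 executions within threshold
    [([300, 450], 500)],            # both executions within threshold
]))
```

The same function tree yields the per-Persona and per-Workflow scores for drill-down reporting.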
Executing your own performance tests
Some customers want to build their own performance tests and validate Jama updates on an internal test system before pushing them to production. This section is intended to help make those efforts as successful as possible.
You can’t test Jama like a static web site
Several of the companies we have consulted with have run into issues trying to build their own performance tests. Unlike a traditional web site, you cannot simply record the UI and play back a user session. Jama has very complex workflows for creating and modifying data, and our internal DWR web calls can be confusing at times.
There is a lot of logic built into the UI. Moving an item is a single call, but it causes multiple other calls to be generated. Trying to map and parametrize all of these different calls is a difficult undertaking. Internally we use the REST API to generate comparative metrics across versions. It is the most stable and maintainable solution we have found.
Our APIs leverage the same server code as the UI and changes to Jama that affect performance will generally show up in the API. The UI is very ‘chatty’ as it tries to keep updated as the user works. Most of these extra UI calls hit our cache server and account for very little server load (milliseconds).
Finally, metrics used to measure static websites do not work well with some rich web applications (like Jama). Requests/pages/errors per second are common measurements, but Jama continually generates hundreds of small requests as the user works and only a small number of heavyweight requests that will actually be impacted by load/stress. The real data is lost in the noise of all the other calls.
The solution is to identify key Jama operations (item creation, searching, loading projects, etc.), isolate them from all the noise, and measure how long it takes for those calls to complete. How you set up your data and load/stress depends on what you want to test.
Building the Test Plan (Ad-hoc vs. Baseline Testing)
Before writing code to script calls to Jama, it is important to understand what you are going to measure. There are two common types of performance testing our customers perform.
Ad-hoc testing refers to the practice of running random or loosely ordered tests. For performance testing it means executing a number of different calls/scenarios concurrently and then looking at how the system behaves. Commonly, some threshold value is determined (like 2 seconds), a large number of operations are executed in different orders, and the tests are assumed to pass if the calls remain under 2 seconds.
This type of testing is often used by development teams to try lots of different combinations to see which ones fail or have issues. Customers can use this type of testing to build confidence a system will handle different types of load.
The downside to this type of testing is that the results are often not repeatable. Because the tests are not rigid and isolated, running the same tests over and over will often give different results, as slight delays or jobs on the system change the timing. This is why this method requires setting a threshold for failure: if the same operation takes 0.5 seconds the first time and 1.9 seconds the second, it is still considered a pass.
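The threshold-based pass criterion from the example above can be sketched as:

```python
def adhoc_pass(timings_ms, threshold_ms=2000):
    """Ad-hoc criterion: the run passes when every operation stays
    under the threshold, regardless of run-to-run variation."""
    return all(t < threshold_ms for t in timings_ms)

# 0.5 s and 1.9 s for the same operation both count as a pass:
print(adhoc_pass([500, 1900]))
```

Note the criterion deliberately ignores the variance between runs, which is exactly why it cannot be used for version-to-version comparisons.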
Baseline testing is a much more rigid and structured way to run a performance test. The goal of this type of testing is to remove the noise and drift of a long test by isolating the calls and waiting for all the users to complete each request before moving on to the next one.
The benefit of this type of testing is that you get clear deltas in performance between different systems. If you want to know the impact of upgrading to a new Jama version or adding more memory to the server, this type of testing is required. You simply run your tests, reset the database, upgrade the system, and run the tests again. You should get a clear view of the impact on performance.
The downside to this type of testing is that it is not a realistic view of how users interact with the system: 30 users do not press the ‘Save’ button at the same time, and it does not call out interactions between different types of operations.
Important considerations for baseline testing:
- If you expect the same results running the tests multiple times in a row:
  - You need to reset the data between test runs
  - You need to run the same tests in the same order
- If you expect to compare the results of two different Jama versions:
  - Both tests need to run on the same hardware/configuration
  - Both Jama instances need to use the same data (projects, items etc.)
- If you need to change or reorder the test scripts:
  - You need to rerun the tests on both systems again
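Assuming each baseline run is reduced to a mean response time per endpoint, the before/after comparison can be sketched as (endpoint names and numbers are hypothetical):

```python
def compare_baselines(before, after):
    """before/after: {endpoint: mean_ms} from two identical baseline runs
    (same tests, same order, data reset in between).
    Returns the percentage delta per endpoint (positive = slower)."""
    return {name: 100.0 * (after[name] - before[name]) / before[name]
            for name in before}

# Hypothetical numbers for two endpoints across a version upgrade:
print(compare_baselines({"POST items": 820.0, "GET items": 310.0},
                        {"POST items": 940.0, "GET items": 300.0}))
```

Because both runs used identical tests in identical order, the deltas can be attributed to the change under test rather than to test noise.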
How many users should I test with? (aka the 5-minute window)
It’s important to pick the right number of users to simulate with your tests. Too few and you will not stress the system; too many and you put the system in an unrealistic state. If you think about how users access Jama, you realize very few actions happen simultaneously. Most of the time users are reading or typing in Jama; the server is only accessed when the user saves or loads a new item.
At Jama we use the ‘5-minute window’ estimation. We look at the total number of licenses and decide how many of those users will perform actions within 5 minutes of each other. Then we take that number and decide how many of those users will perform an action at the exact same second. For example, when simulating 8am on a weekday with 1,000 licenses, we might decide that around 250 users are performing actions in the same 5-minute period.
- Users will periodically perform a workflow where they may click something in Jama Connect every 5 seconds. It is possible to have multiple users running these types of workflows simultaneously.
- 300 seconds (5 mins) / 5 seconds = 60 actionable requests per user
- 60 requests per user * 250 users = 15,000 total requests in 300 seconds (5 minutes)
- On average 50 of these 60 requests will be read/get requests and 10 will be write/update requests.
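The arithmetic above as a small helper (the function and parameter names are our own):

```python
def five_minute_window(licenses, active_fraction, seconds_per_action=5,
                       window_seconds=300):
    """Estimate load with the '5-minute window' method:
    how many users act inside the window, and how many requests they make."""
    active_users = int(licenses * active_fraction)
    requests_per_user = window_seconds // seconds_per_action
    return active_users, requests_per_user, active_users * requests_per_user

# 1,000 licenses with 25% active: 250 users x 60 requests = 15,000 requests
print(five_minute_window(1000, 0.25))
```

Adjusting `active_fraction` lets you model different scenarios (a quiet afternoon versus 8am on a Monday) with the same formula.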
Errors invalidate performance result comparisons
When executing performance tests, it’s important to ensure that your test runs do not have any errors. Errors invalidate the results because they change the dynamic of the tests. Imagine a chain of requests (Create Item, Edit Item, Delete Item): when a request fails, all the chained requests fail as well. Different runs with different errors will have different results that are not comparable.
Performance results are generally based on averages of the time it takes multiple requests to respond. When Jama encounters an error, the results are affected. A request that takes 5000ms might suddenly return in 100ms or the reverse may happen. Even if you filter out the errors, the errors and skipped chained requests fundamentally affect the dynamic of the test. Requests align differently and impact each other in different ways.
It is valid to write a test that ramps up usage until Jama starts having errors, but any results collected after errors start occurring should not be compared to other results; they simply will not line up.
HTTP Status 409 - Conflict Errors
A 409 error is returned when the REST API attempts to modify an item that is already being updated. When adding, deleting, or moving an item in Jama, there is a post-save process that updates the GlobalSortOrder for items in the tree. Jama uses this value to keep items ordered among their siblings.
The Jama UI prevents this error by first locking the item, allowing only one user to modify it at a time. With the REST API, developers need to handle this error with a short delay and a retry.
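A minimal retry sketch for API clients (the delay and retry count are illustrative assumptions, not Jama-recommended values):

```python
import time

def patch_with_retry(send_request, max_retries=5, delay_s=0.5):
    """send_request: callable that issues the REST call and returns an
    HTTP status code. Retries on 409 Conflict after a short delay,
    since the REST API does not lock items the way the UI does."""
    for _ in range(max_retries):
        status = send_request()
        if status != 409:
            return status
        time.sleep(delay_s)
    return 409  # still conflicting after all retries
```

In a real client, `send_request` would wrap the actual PATCH/POST call; remember that even successful retries change test timing, which matters for the next section.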
When writing tests against the API you need to eliminate 409 errors from your tests. The most common cause of these errors is having multiple users creating items in the same folder. Under load you run into timing issues where multiple items are created at the same time and try to update each other. One item wins and the other gets a 409.
As discussed above, any errors (even with a retry) will change the dynamic of the test, so the best practice is to structure the test scripts so that each user creates a unique folder for their tests.
At Jama we keep all test artifacts in dedicated components: each user creates their own component and structure (Set, Folder, and then Items). We also use a datetime stamp so we can tell artifacts from different test runs apart.