Over the past years, the way we test React components has shifted significantly. Where testing previously focused mainly on unit tests, the current standard is to make your tests as representative as possible of the experience of your users. This is known as behaviour testing.
Unfortunately, not every React developer or team can follow the latest trends whenever they appear. While React Testing Library makes it easier to write proper behaviour tests, adopting it is not a change that every team can realistically make. If a team already has a testing structure in place for its React code, then it’s not out of the ordinary to stick to it. The potential benefits might not outweigh the effort required to adopt an entirely new testing library, which is a reasonable consideration.
However, the result is that React developers want to write behaviour tests, but have to use suboptimal tools to do so. In the past three years, I have experienced this multiple times. Both as a team and as an individual, our focus shifted towards behaviour testing. But the team had already been using Enzyme for quite some time and it was deeply integrated into the frontend stack. Migrating from Enzyme to React Testing Library, or to any other library focused on behaviour testing for that matter, was not realistically possible.
Since that was out of the question, I decided to work with what I had available and try applying proper behaviour testing in Enzyme. During the past three years, I have worked exclusively with Enzyme and tried doing behaviour testing with it. I have come across a lot of hurdles and honed this skill quite a bit. Performing behaviour testing in Enzyme is not necessarily difficult, but doing it properly is a challenge on its own. In this article, I will share what I have learned over the years about applying behaviour testing in React with Enzyme. In particular, the focus is on how to do it properly so that the resulting tests are meaningful, reliable, and valuable.
There are three ways to render your components using Enzyme: shallow rendering (using `shallow`), full DOM rendering (using `mount`), and static rendering (using `render`). If you are focused on writing proper behaviour tests using Enzyme, the right choice is always to use full DOM rendering for the components under test.
As stated in the Enzyme API docs, shallow rendering’s main purpose is to isolate a component and make sure your tests are only asserting against that component, whereas static rendering generates HTML from your React tree to make it easier for you to validate the resulting HTML structure. Both of these rendering methods create an isolated environment of the components for you to test in.
Shallow rendering isolates the component from the rest of the application, while static rendering reduces it to only the resulting HTML structure. While these isolated environments are fine for unit tests and certain verifications, they are not the circumstances in which users will interact with your application and thus do not contribute towards proper behaviour tests. More often than not, using these rendering methods encourages bad testing habits: testing implementation details and gaining a false sense of security.
Full DOM rendering, on the other hand, does not isolate the component under test in any way and keeps all of its interactions with other components, the DOM, external resources, and the React lifecycle intact. This means that tests run in an environment that is similar to the browser and most closely resembles the experience of the user. So if your goal is to perform proper behaviour testing, always try to use full DOM rendering.
Most frontend testing libraries were not originally built with a focus on behaviour testing, and Enzyme is one of them. While this made perfect sense back in the day, the field of frontend testing has since shifted towards behaviour-focused testing.
This is not a problem, but it does mean that you have to be extra thoughtful when using Enzyme to write behaviour tests. In particular, you should take extra care that your tests do not rely on implementation details. A big part of your focus should be on making sure that your tests make sense and properly reflect how users would interact with your application.
This is especially crucial when you are attempting to interact with elements to trigger certain flows. When doing so, these interactions mustn’t rely on insider information that we have as developers, in other words on implementation details.
If this is done wrong, then one of the most fundamental parts of your tests is already fragile. This in turn makes your tests flaky and unreliable: small unrelated changes will constantly trigger false negatives, and the tests end up providing no value.
The most frequent use of implementation details in behaviour tests that I have seen is relying on CSS aspects of your frontend code. In general, I advise avoiding CSS selectors and sticking with interactive elements that make sense from a user’s point of view. From the perspective of your users, class names or other CSS features of your code don’t hold any meaning or purpose.
When testing your frontend code, it is very easy and convenient to make use of a CSS selector. That is also why so many people do so, but in the end, it is an implementation detail. From a development perspective, it is very much subject to change and only diminishes the credibility and reliability of your tests, since the smallest change can cause false negatives. Instead, try to stick to actual elements like `a` in combination with other visual properties. This is most representative of how users would identify interactive elements before interacting with them.
As mentioned, Enzyme was not necessarily created with behaviour testing in mind. This means that we sometimes need to be creative, and accept some tedium, if we want to write proper behaviour tests. The following snippet of code is an example of this, based on how I approach my behaviour tests in Enzyme nowadays.
const submitButton = wrapper.findWhere(
  node => node.type() === 'button' && node.text() === 'Submit'
);
Its purpose is to find a submit button on the page using the Enzyme API, which might be necessary to verify, for example, a form submit flow. It looks quite verbose, ugly, and tedious, especially if you compare it to something along the lines of `const submitButton = wrapper.find('.submit-button')`, which I have encountered quite often. On the other hand, the more tedious selector better represents how an actual user would look for a submit button in your application, namely by finding a visual button that has a “Submit” label.
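If the verbose predicate is needed in many tests, it can be wrapped in a small helper. The following is a minimal sketch, not something Enzyme ships: `byTypeAndText` and `fakeNode` are hypothetical names, and the helper only assumes that each node exposes `type()` and `text()`, as Enzyme wrappers do. Plain objects stand in for Enzyme wrappers so that the snippet runs on its own, and the `trim()` call is an added assumption to tolerate surrounding whitespace.

```javascript
// Hypothetical helper (not part of Enzyme): builds a predicate that can be
// passed to wrapper.findWhere(...). The type check comes first, so text()
// is only called on nodes of the requested element type.
function byTypeAndText(type, text) {
  return (node) => node.type() === type && node.text().trim() === text;
}

// Stand-in for an Enzyme wrapper so the sketch runs without React or Enzyme.
function fakeNode(type, text) {
  return { type: () => type, text: () => text };
}

const nodes = [
  fakeNode('button', 'Cancel'),
  fakeNode('a', 'Submit'),
  fakeNode('button', ' Submit '),
];

// Only the last node is a button labelled "Submit".
const matches = nodes.filter(byTypeAndText('button', 'Submit'));
console.log(matches.length); // 1
```

In an actual Enzyme test, the same helper would be used as `wrapper.findWhere(byTypeAndText('button', 'Submit'))`.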
Taking the more tedious option also avoids an issue that we described already, namely relying on implementation details. Let’s say that in the future we want to re-organise our components into a design system and have different sizes of buttons. Instead of `submit-button`, the button now has different class names.
If we had used the more convenient CSS selector approach to implement our behaviour test for the submit flow, then the test would now fail because of this styling change. The issue with this test suddenly failing is that it does not provide us with any meaningful information. The submit flow is not broken for the user. The feature itself is perfectly fine, yet the test is failing. Suddenly we have to go out of our way to address it, which feels completely meaningless in this scenario. This is what causes frustration with frontend tests: they suddenly fail without the feature being broken, and are therefore perceived as flaky, annoying, and susceptible to the smallest changes.
Using the more tedious approach described above to find the same submit button in our test avoids these kinds of unnecessary false-negative test results. This is because it does not rely on a fragile part of the code that is not relevant for the test, a CSS class, but instead relies on information that is meaningful to both us and the user. In this case, that is a `button` HTML element with a submit label on it. That is the most important information for us in this test. If we made similar CSS changes to the submit button in the future, the test result would not be affected because we no longer rely on that kind of information. This also matches the experience of the users, as such a change does not affect the submit flow itself.
But let’s say that at another moment we accidentally swapped the text labels of the submit button and a cancel button on the page. The test will now fail, which makes sense because a different flow is now triggered and the experience of our users is affected. The feature is now broken for our users, which is very important information for us and should trigger a change in test results. But if we had used the more convenient CSS selector approach, then the test would not be able to alert us to this broken feature because it relies on different, irrelevant information.
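The label-swap scenario can be illustrated with a small simulation. This is only a sketch under stated assumptions: plain objects stand in for rendered buttons, and `renderButtons`, `findByClass`, `findByLabel`, the `action` field, and the `cancel-button` class name are all hypothetical, not Enzyme APIs or names from a real codebase.

```javascript
// Plain objects stand in for rendered buttons. `action` is the flow that a
// click on the button triggers; `text` is the label the user sees.
function renderButtons({ labelsSwapped }) {
  return [
    {
      type: 'button',
      className: 'submit-button',
      text: labelsSwapped ? 'Cancel' : 'Submit',
      action: 'submit',
    },
    {
      type: 'button',
      className: 'cancel-button',
      text: labelsSwapped ? 'Submit' : 'Cancel',
      action: 'cancel',
    },
  ];
}

const findByClass = (buttons, className) =>
  buttons.find((b) => b.className === className);
const findByLabel = (buttons, label) =>
  buttons.find((b) => b.type === 'button' && b.text === label);

// A "test" passes when clicking the located button triggers the submit flow.
const passesWithClass = (buttons) =>
  findByClass(buttons, 'submit-button').action === 'submit';
const passesWithLabel = (buttons) =>
  findByLabel(buttons, 'Submit').action === 'submit';

const healthy = renderButtons({ labelsSwapped: false });
const broken = renderButtons({ labelsSwapped: true });

const results = {
  healthy: { byClass: passesWithClass(healthy), byLabel: passesWithLabel(healthy) },
  broken: { byClass: passesWithClass(broken), byLabel: passesWithLabel(broken) },
};
console.log(results);
```

With the labels swapped, the class-based lookup still passes even though the button the user sees as “Submit” now triggers the cancel flow, while the label-based lookup fails and flags the broken feature.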
Taking the easy approach can be convenient and save you time when implementing behaviour tests, but it will leave you with a false sense of security and potentially meaningless tests. Instead, sometimes it’s better to be more tedious. It takes more time and effort upfront, but that investment pays off in tests that are more reliable and valuable.
One of the most common ways to write tests in React is through snapshot tests. They are very convenient, easy to use, and require little to no testing effort. The issue with snapshot tests, though, is that more often than not they are close to meaningless and hold barely any value. This is because the only thing snapshot tests do is render your components and save the resulting DOM structure as a snapshot. The next time the test runs, another snapshot is made and compared to the previous one. If there is any difference, the snapshot test will fail.
The reason snapshots more often than not provide barely any value is that they do not go beyond the DOM structure of your components. This closely relates to the issue that we discussed before regarding shallow rendering, namely creating a testing environment that is not representative of how users will interact with your application by isolating components. Snapshot tests have no way to validate a user’s workflow properly. Their only purpose is to validate that the resulting DOM structure of your components does not change, which is not relevant in the realm of behaviour testing.
So, if you are keen on writing proper behaviour tests in React, then stop using snapshot tests as they do not provide any meaningful value towards that goal.
Small note: I do not think that snapshot tests are completely useless in all scenarios. But in the context of behaviour testing, as I mentioned, there is little reason for their existence because they cannot validate user behaviour in any way. Their purpose lies in a totally different domain of testing, which focuses on preventing DOM changes. In scenarios where the DOM structure of certain parts of your frontend is not allowed to change, snapshot tests are very meaningful and make perfect sense. But in my personal experience, I have seen them used for totally different reasons, which only leads to more noise in your test results.
One library that adds extra matchers on top of the ones Jest ships with is `jest-enzyme`. It integrates smoothly with Enzyme, and the matchers it provides are an upgrade over the out-of-the-box matchers. Without them, performing assertions is quite often a tedious, verbose, and unintuitive task, as can be seen in an earlier section of this article. With `jest-enzyme`, a lot of those issues are abstracted away in the matchers that it provides.
While `jest-enzyme` is an upgrade over the out-of-the-box matchers and has made my life so much easier with regard to performing proper behaviour testing, it is by no means a perfect one. However, I definitely recommend looking into additional extensions for Jest to make your tests more intuitive. This will in turn make it easier for you to create tests that more closely resemble how users interact with your application.
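Besides using an existing library, Jest can also be extended with your own matchers through `expect.extend`. The following is a minimal sketch of what such an extension could look like. It is not part of `jest-enzyme`: the matcher name `toHaveButtonLabelled` and the `fakeWrapper` stand-in are hypothetical, and the only assumption about the wrapper is that it has a `findWhere` method whose result exposes `length`, as Enzyme wrappers do.

```javascript
// Hypothetical custom matcher, shaped the way Jest's expect.extend expects:
// a function that receives the value under test and returns an object with
// a boolean `pass` and a `message` function.
const customMatchers = {
  toHaveButtonLabelled(wrapper, label) {
    const found = wrapper.findWhere(
      (node) => node.type() === 'button' && node.text() === label
    );
    const pass = found.length > 0;
    return {
      pass,
      message: () =>
        pass
          ? `expected no button labelled "${label}"`
          : `expected a button labelled "${label}"`,
    };
  },
};

// In a Jest setup file this would be registered with:
//   expect.extend(customMatchers);
// and used in a test as:
//   expect(wrapper).toHaveButtonLabelled('Submit');

// Minimal stand-ins for an Enzyme wrapper so the sketch runs on its own.
function fakeWrapper(nodes) {
  return { findWhere: (predicate) => nodes.filter(predicate) };
}
const fakeButton = (text) => ({ type: () => 'button', text: () => text });

const wrapper = fakeWrapper([fakeButton('Submit'), fakeButton('Cancel')]);
const hit = customMatchers.toHaveButtonLabelled(wrapper, 'Submit');
const miss = customMatchers.toHaveButtonLabelled(wrapper, 'Delete');
console.log(hit.pass, miss.pass, miss.message());
```

A matcher like this keeps the verbose lookup in one place, so the test itself reads in terms of what the user sees.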
Mocking is an essential part of writing test code, but it can also be a very easy trap that causes a false sense of security. As mentioned earlier when discussing shallow rendering, isolating a component brings us further away from our goal of writing proper behaviour tests that reflect how a user interacts with our application. Mocking is a powerful method to gain control over a certain part of your code, but it can also result in an isolated environment for your test, which is exactly what we do not want.
The key is to be thoughtful about what you’re mocking and why you’re mocking it. If you’re mocking child components just because they’re annoying, you’re actively making your tests less representative of how users perceive your application, just for the sake of having passing tests. Moreover, you’re also moving away from behaviour integration tests and reverting to unit tests by isolating the components under test.
Going into detail on every scenario in which it does or does not make sense to mock a certain piece of code in your tests is slightly outside the scope of this article. I have another article that goes more in-depth into these scenarios. But for now, here are some general guidelines for dealing with mocks in frontend behaviour tests:
- If running a piece of code is not related to how users interact with your application but does require quite some overhead, then it makes sense to mock.
- If the code affects the results of your tests and you want control over the different scenarios that it causes to verify them, like error handling, then it makes sense to mock.
- Try to keep your tests contained and only verify what’s necessary. This will keep your tests to the point and avoid mocking too much because you’re trying to verify too much.
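The second guideline can be sketched with a hand-rolled stub. Everything in this example is hypothetical: `saveDraft`, the `storage` object, and the messages are made up for illustration, and in a Jest suite `jest.fn()` would typically take the place of the manual stub. The idea it tries to show is that only the boundary whose outcome we want to control is replaced, while the logic that decides what the user sees still runs for real.

```javascript
// Hypothetical code under test: saves a draft and returns the message that
// would be shown to the user.
function saveDraft(storage, text) {
  try {
    storage.setItem('draft', text);
    return 'Draft saved';
  } catch (error) {
    return 'Could not save draft, please try again';
  }
}

// Hand-rolled stub for the storage boundary. It records its calls and can be
// told to fail, which gives the test control over the error scenario.
function createStorageStub({ shouldFail = false } = {}) {
  const calls = [];
  return {
    calls,
    setItem(key, value) {
      calls.push([key, value]);
      if (shouldFail) {
        throw new Error('quota exceeded');
      }
    },
  };
}

const workingStorage = createStorageStub();
const failingStorage = createStorageStub({ shouldFail: true });

const successMessage = saveDraft(workingStorage, 'hello');
const errorMessage = saveDraft(failingStorage, 'hello');

console.log(successMessage); // Draft saved
console.log(errorMessage); // Could not save draft, please try again
```

Both scenarios are verified through the message the user would see, not through internals of `saveDraft`.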
While performing behaviour testing using specialised libraries like React Testing Library is currently the preferred way of testing React components, it is unfortunately not something that every developer or team can immediately adopt. I have been in exactly this scenario, where the teams I worked on were stuck using Enzyme. In the past years, I have invested a lot of time and effort into applying behaviour testing in Enzyme nonetheless. While just doing it is not necessarily difficult, doing it in a proper manner where your tests are useful, reliable, and meaningful is not a trivial task.
To help you out with this, I have shared several tips in this article based on what I have learned over the years, so you don’t have to reinvent the wheel. The first step is to always use full DOM rendering where possible, so that components are tested in an environment that closely resembles how the user would experience them.
Then, it is important to make sure that the foundation of your tests is solid. This means that you should be careful not to rely on implementation details, avoid using CSS selectors when looking for elements to interact with, and accept that you sometimes have to be tedious with your tests because Enzyme wasn’t necessarily built for behaviour testing.
Snapshot tests should be avoided as much as possible. Although they are very popular, they do not contribute to behaviour testing in any meaningful way. Using a library to add additional matchers on top of the existing ones can go a long way towards making your life easier with regard to behaviour testing. Lastly, it’s important to be thoughtful about what you are mocking. Doing it recklessly and without a second thought can make your tests less representative and thus bring you further away from proper behaviour tests.
Writing proper behaviour tests is not an easy task, especially if you are given suboptimal tools to do so. But with some creativity, additional resources, and some guidelines, it is definitely possible to make it work with Enzyme. Hopefully, this article can provide those for you and help you create more useful, reliable, and meaningful behaviour tests.