Mocking the unmockable

Mocking is often challenging. Here is a situation I always tried to work around instead of solving: a function that is called when a module is imported.

The Problem

Sometimes a function has to be called at the import stage. It might be connecting to a database or retrieving secure data from KMS. I've created an example scenario. To replicate it, create a project with poetry:

➜  Medium > poetry new mocking_problem
➜  Medium > cd mocking_problem

Then add the following lib module with a test. The get_data() function checks for the CACHED_DATA environment variable and returns its contents parsed as JSON. Otherwise, it raises a NotImplementedError, which stands in for the code we don't want to run in the tests.

# mocking_problem/lib.py
import json
import os
 
 
def get_data():
    if os.environ.get("CACHED_DATA"):
        return json.loads(os.environ["CACHED_DATA"])
 
    raise NotImplementedError()

# tests/test_lib.py
from unittest import mock
import pytest
 
from mocking_problem.lib import get_data
 
 
def test_get_data_raises():
    with pytest.raises(NotImplementedError):
        get_data()
 
@mock.patch("mocking_problem.lib.os")
def test_get_data_from_environment(mock_os):
    mock_os.environ = {"CACHED_DATA": '{"cached": "data"}'}
    assert get_data() == {"cached": "data"}

The action module depends on lib. It calls get_data() and then passes the received data to the use_data() function.

# mocking_problem/action.py
from logging import getLogger
from mocking_problem.lib import get_data
 
 
logger = getLogger()
 
 
def use_data(data):
    logger.info(data)
 
received_data = get_data()
use_data(received_data)

# tests/test_action.py
from unittest import mock
from mocking_problem.action import use_data
 
@mock.patch("mocking_problem.action.logger")
def test_use_data(mock_logger):
    use_data({"some": "data"})
    mock_logger.info.assert_called_with({"some": "data"})

If we try to run pytest, it fails because get_data() is called as soon as use_data() is imported from the mocking_problem.action module.

➜  mocking_problem > poetry run pytest
============================ test session starts ============================
platform darwin -- Python 3.11.2, pytest-7.2.2, pluggy-1.0.0
rootdir: /Users/zalun/Projects/Medium/mocking_problem
collected 1 item / 1 error
 
================================== ERRORS ===================================
___________________ ERROR collecting tests/test_action.py ___________________
tests/test_action.py:2: in <module>
    from mocking_problem.action import use_data
mocking_problem/action.py:11: in <module>
    received_data = get_data()
mocking_problem/lib.py:8: in get_data
    raise NotImplementedError()
E   NotImplementedError
========================== short test summary info ==========================
ERROR tests/test_action.py - NotImplementedError
!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!
============================= 1 error in 0.04s ==============================
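The behaviour behind this traceback can be reproduced in isolation. The sketch below is only an illustration and not part of the project: it writes a throwaway module (the name import_time_demo is made up for this example) to a temporary directory and shows that a module-level call runs, and fails, at import time.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Source of a hypothetical module, used only for this demonstration.
MODULE_SOURCE = textwrap.dedent(
    """
    def get_data():
        raise NotImplementedError()

    def use_data(data):
        return data

    # Module-level call: runs as soon as the module is imported.
    received_data = get_data()
    """
)


def import_fails_with_not_implemented():
    """Write the module to a temporary directory and try to import it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "import_time_demo.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module("import_time_demo")
        except NotImplementedError:
            # The exception comes from the module body, not from a test.
            return True
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("import_time_demo", None)
    return False


print(import_fails_with_not_implemented())
```

Importing use_data alone is enough to trigger the failure, because Python executes the whole module body before any name can be imported from it.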

The Workaround

The usual workaround might be to provide the CACHED_DATA environment variable for the test. We can do it using a very helpful pytest-dotenv package.

➜  mocking_problem > poetry add --group dev pytest-dotenv

# pytest.ini
[pytest]
env_files =
    .test.env
 
# .test.env
CACHED_DATA='{"default": "data"}'

Now the failing test is in test_lib.py, because get_data() no longer raises NotImplementedError. The CACHED_DATA environment variable is set for all the tests.

➜  mocking_problem > poetry run pytest
============================ test session starts ============================
platform darwin -- Python 3.11.2, pytest-7.2.2, pluggy-1.0.0
rootdir: /Users/zalun/Projects/Medium/mocking_problem, configfile: pytest.ini
plugins: dotenv-0.5.2
collected 3 items
 
tests/test_action.py .                                                [ 33%]
tests/test_lib.py F.                                                  [100%]
 
================================= FAILURES ==================================
___________________________ test_get_data_raises ____________________________
 
    def test_get_data_raises():
>       with pytest.raises(NotImplementedError):
E       Failed: DID NOT RAISE <class 'NotImplementedError'>
 
tests/test_lib.py:7: Failed
========================== short test summary info ==========================
FAILED tests/test_lib.py::test_get_data_raises - Failed: DID NOT RAISE <class 'NotImplementedError'>
======================== 1 failed, 2 passed in 0.03s ========================

We can mock the os in test_get_data_raises() as well and change os.environ.get() to return None.

# tests/test_lib.py
from unittest import mock
import pytest
 
from mocking_problem.lib import get_data
 
 
@mock.patch("mocking_problem.lib.os")
def test_get_data_raises(mock_os):
    mock_os.environ.get.return_value = None
    with pytest.raises(NotImplementedError):
        get_data()
 
 
@mock.patch("mocking_problem.lib.os")
def test_get_data_from_environment(mock_os):
    mock_os.environ = {"CACHED_DATA": '{"cached": "data"}'}
    assert get_data() == {"cached": "data"}

The tests pass, and we can go and have a drink in the company's relaxing chair.

➜  mocking_problem > poetry run pytest
============================ test session starts ============================
platform darwin -- Python 3.11.2, pytest-7.2.2, pluggy-1.0.0
rootdir: /Users/zalun/Projects/Medium/mocking_problem, configfile: pytest.ini
plugins: dotenv-0.5.2
collected 3 items
 
tests/test_action.py .                                                [ 33%]
tests/test_lib.py ..                                                  [100%]
 
============================= 3 passed in 0.02s =============================

But the problem hasn't gone away: get_data() is still called on import, and we are unable to test whether it was.
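As a side note on the test_lib.py tests above, replacing the whole os module is broader than strictly necessary. A narrower variant, sketched here as one possible alternative, patches only os.environ with mock.patch.dict from the standard library. The get_data() function is copied inline so the block runs on its own, and the check_* helper names are made up for this example.

```python
import json
import os
from unittest import mock


def get_data():
    # Copy of mocking_problem.lib.get_data, inlined to keep the sketch
    # self-contained.
    if os.environ.get("CACHED_DATA"):
        return json.loads(os.environ["CACHED_DATA"])

    raise NotImplementedError()


def check_get_data_raises():
    # clear=True empties os.environ inside the block; the original
    # contents are restored when the block exits.
    with mock.patch.dict(os.environ, clear=True):
        try:
            get_data()
        except NotImplementedError:
            return True
    return False


def check_get_data_from_environment():
    # Only the listed key is added or overridden inside the block.
    with mock.patch.dict(os.environ, {"CACHED_DATA": '{"cached": "data"}'}):
        return get_data()
```

With this variant the rest of os keeps working inside the function under test, which matters once get_data() starts using other parts of the module.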

Solution 1

We can rewrite test_action.py so that it uses a mocked get_data() on import. We will patch sys.modules with a mocked lib module. We also need to replace the logging module with a mock, because action.py uses it on import as well.

# tests/test_action.py
from unittest import mock
import sys
 
 
# Prepare mocked lib module
mock_lib = mock.Mock()
mock_lib.get_data.return_value = {"received": "data"}
# Patch sys.modules with the mocked lib and logging modules
with mock.patch.dict(
    sys.modules, {"mocking_problem.lib": mock_lib, "logging": mock.Mock()}
):
    from mocking_problem.action import (
        received_data,
        use_data,
        logger as mock_logger,
    )
    from mocking_problem.lib import get_data as mock_get_data
 
 
def test_action_get_data_is_called_on_import():
    assert received_data == {"received": "data"}
    mock_get_data.assert_called_once()
 
 
def test_action_use_data_is_called_on_import():
    mock_logger.info.assert_called_once_with({"received": "data"})
 
 
def test_use_data_calls_logger():
    mock_logger.info.reset_mock()
    use_data({"some": "data"})
    mock_logger.info.assert_called_once_with({"some": "data"})

It is a good enough solution. We can test the effect of calling use_data() on import, but we can't check whether it was actually called: the test would still pass if someone changed the code to log the received data directly in the module.
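The sys.modules patching used in the test above can also be tried outside the project. The following is a minimal sketch under assumed names (demo_action and demo_lib are hypothetical, and demo_lib never exists on disk): the import succeeds only because a mock is registered in sys.modules, and mock.patch.dict restores the module cache afterwards.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from unittest import mock

# Hypothetical stand-in for action.py. The module "demo_lib" does not exist
# on disk; the import only succeeds because sys.modules is patched.
ACTION_SOURCE = (
    "from demo_lib import get_data\n"
    "\n"
    "received_data = get_data()\n"
)


def import_action_with_fake_lib():
    """Import demo_action while demo_lib is replaced by a mock."""
    fake_lib = mock.Mock()
    fake_lib.get_data.return_value = {"received": "data"}

    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_action.py").write_text(ACTION_SOURCE)
        sys.path.insert(0, tmp)
        try:
            with mock.patch.dict(sys.modules, {"demo_lib": fake_lib}):
                importlib.invalidate_caches()
                action = importlib.import_module("demo_action")
        finally:
            sys.path.remove(tmp)

    return action, fake_lib


action, fake_lib = import_action_with_fake_lib()
assert action.received_data == {"received": "data"}
fake_lib.get_data.assert_called_once_with()
# patch.dict restores sys.modules on exit, so neither module stays cached.
assert "demo_lib" not in sys.modules
assert "demo_action" not in sys.modules
```

Because the cache is restored, a later import of the same module elsewhere in the test session executes the module body again instead of reusing the version that was imported against the mock.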

Solution 2 (with refactoring the action module)

We want to mock the get_data() and use_data(), import the action.py module, and check if both were called.

This time we will refactor the action.py module and move use_data() into another file. For simplicity, we will place it in the lib.py:

# mocking_problem/lib.py
import json
import os
from logging import getLogger
 
 
logger = getLogger()
 
 
def get_data():
    if os.environ.get("CACHED_DATA"):
        return json.loads(os.environ["CACHED_DATA"])
 
    raise NotImplementedError()
 
 
def use_data(data):
    logger.info(data)

And the action.py will import it along with get_data():

# mocking_problem/action.py
from mocking_problem.lib import get_data, use_data
 
 
received_data = get_data()
use_data(received_data)

The tests change as well: use_data() is now tested in test_lib.py, so we no longer need test_action.py for that purpose.

# tests/test_lib.py
import pytest
 
from unittest import mock
from mocking_problem.lib import get_data, use_data
 
 
@mock.patch("mocking_problem.lib.os")
def test_get_data_raises(mock_os):
    mock_os.environ.get.return_value = None
    with pytest.raises(NotImplementedError):
        get_data()
 
 
@mock.patch("mocking_problem.lib.os")
def test_get_data_from_environment(mock_os):
    mock_os.environ = {"CACHED_DATA": '{"cached": "data"}'}
    assert get_data() == {"cached": "data"}
 
 
@mock.patch("mocking_problem.lib.logger")
def test_use_data(mock_logger):
    use_data({"some": "data"})
    mock_logger.info.assert_called_with({"some": "data"})

The tests are working, but we still need to find out if functions in action.py are called in the right way on import.

We will patch the sys.modules with a mocked lib module. Imports in action will now use the mocked version.

# tests/test_action.py
from unittest import mock
import sys
 
 
mock_lib = mock.Mock()
# We need to set the return_value before importing the action module.
mock_lib.get_data.return_value = {"received": "data"}
with mock.patch.dict(sys.modules, {"mocking_problem.lib": mock_lib}):
    from mocking_problem.action import received_data
    from mocking_problem.lib import (
        get_data as mock_get_data,
        use_data as mock_use_data,
    )
 
 
def test_action_received_data_instantiated_with_mocked_value():
    assert received_data == {"received": "data"}
 
 
def test_action_get_data_is_called_on_import():
    mock_get_data.assert_called_once()
 
 
def test_action_use_data_is_called_on_import():
    mock_use_data.assert_called_once_with({"received": "data"})

The tests pass. We are testing all the functions, and we can see the right functions being called with the right values when the action module is imported.

➜  mocking_problem > poetry run pytest
============================ test session starts ============================
platform darwin -- Python 3.11.2, pytest-7.2.2, pluggy-1.0.0
rootdir: /Users/zalun/Projects/Medium/mocking_problem, configfile: pytest.ini
plugins: dotenv-0.5.2
collected 6 items
 
tests/test_action.py ...                                              [ 50%]
tests/test_lib.py ...                                                 [100%]
 
============================= 6 passed in 0.03s =============================

Next Steps

It is possible to modify the solution in multiple ways. There might be a need to leave some of the functions from lib in their original state. In that case, instead of creating the module as a mock:

# tests/test_action.py
 
# ...
mock_lib = mock.Mock()
mock_lib.get_data.return_value = {"received": "data"}
# ...

We can import it and replace some of the functions with mocks, leaving the rest of the test untouched:

# tests/test_action.py
 
# ...
import mocking_problem.lib as lib
 
 
lib.get_data = mock.Mock()
lib.get_data.return_value = {"received": "data"}
lib.use_data = mock.Mock()
# ...
