Developer

The project has the following structure:

SNP_haplotype
    ├── docs
    ├── htmlcov
    ├── output
    ├── snp_haplotyper
    ├── test_data
    └── tests

The source code for BASHer is contained in the snp_haplotyper directory:

SNP_haplotype
    └── snp_haplotyper
            ├── autosomal_dominant_logic.py
            ├── autosomal_recessive_logic.py
            ├── debug_data.ipynb
            ├── excel_parser.py
            ├── exceptions.py
            ├── json2excel.py
            ├── sample_sheet_reader.py
            ├── snp_haplotype.py
            ├── snp_plot.py
            ├── stream_output.py
            ├── templates
            │   └── report_template.html
            └── x_linked_logic.py

snp_haplotype.py contains the majority of the code for processing samples.

autosomal_dominant_logic.py, autosomal_recessive_logic.py, and x_linked_logic.py contain the logic for assigning informative SNPs for each mode of inheritance.

sample_sheet_reader.py parses the command-line arguments from the supplied Excel sheet template containing the sample’s metadata.

report_template.html is the HTML template for the report.

stream_output.py streams the relevant output as JSON when in testing mode so that the output is easily readable by the test suite.
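As a rough illustration of the idea behind stream_output.py, the sketch below writes a results dictionary to a stream as JSON and reads it back the way a test could. The function name stream_results and the result keys are hypothetical and are not taken from the BASHer source.

```python
import io
import json
import sys


def stream_results(results: dict, stream=sys.stdout) -> None:
    """Write results to the stream as one JSON document followed by a newline.

    Hypothetical helper for illustration; not the actual stream_output.py API.
    """
    json.dump(results, stream, sort_keys=True)
    stream.write("\n")


# A test can capture the stream and parse it back into Python objects.
buffer = io.StringIO()
stream_results({"mode": "autosomal_dominant", "informative_snps": 3}, buffer)
parsed = json.loads(buffer.getvalue())
```

Emitting plain JSON keeps the test suite independent of the HTML report layout: a test only has to parse the stream and compare dictionaries.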

Automated Testing Using Pytest

BASHer has two types of tests, unit and functional:

SNP_haplotype
    └── tests
        ├── functional
        │   └── test_snp_filtering.py
        └── unit
            └── test_functions.py

Unit Tests

Unit tests are used to test the functionality of individual functions within the script to ensure they return expected results.
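A minimal sketch of what a unit test in this style might look like. The function is_heterozygous is a hypothetical helper defined here only to keep the example self-contained; it is not a function from snp_haplotyper.

```python
# Hypothetical helper for illustration only; not part of snp_haplotyper.
def is_heterozygous(genotype: str) -> bool:
    """Return True when a two-letter genotype call has two different alleles."""
    return len(genotype) == 2 and genotype[0] != genotype[1]


# pytest collects any function whose name starts with "test_".
def test_is_heterozygous_detects_ab():
    assert is_heterozygous("AB")


def test_is_heterozygous_rejects_homozygous_calls():
    assert not is_heterozygous("AA")
    assert not is_heterozygous("BB")
```

Each test checks one behaviour of one function, so a failure points directly at the function that regressed.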

Functional Tests

Functional tests run test data through the software and compare the output to benchmark results manually curated by the PGD team. The benchmark covers all possible scenarios (reference type, mode of inheritance, embryo sex, and consanguinity). The benchmark results were converted to machine-readable JSON and integrated into the test suite. Run these tests before any push to production to ensure that the results match the expected output.
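The comparison against a JSON benchmark could be sketched as follows. This is an illustration under stated assumptions, not the code in test_snp_filtering.py: the helper diff_results and the keys informative_snps and embryo_sex are hypothetical, and the benchmark is inlined as a string where the real suite would load a curated file.

```python
import json


def diff_results(actual: dict, expected: dict) -> dict:
    """Return {key: (actual_value, expected_value)} for every key that differs."""
    keys = set(actual) | set(expected)
    return {
        key: (actual.get(key), expected.get(key))
        for key in sorted(keys)
        if actual.get(key) != expected.get(key)
    }


def test_output_matches_benchmark():
    # Stand-in for a curated benchmark; keys are hypothetical.
    expected = json.loads('{"informative_snps": 12, "embryo_sex": "female"}')

    # Stand-in for the JSON the software would emit in testing mode.
    actual = {"informative_snps": 12, "embryo_sex": "female"}

    # An empty diff means the output matches the benchmark exactly.
    assert diff_results(actual, expected) == {}
```

Returning the differing keys, rather than a bare True or False, makes a failing functional test show which field diverged from the curated result.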

Coverage

Test coverage is calculated using coverage.py; the results are written as an HTML report to:

SNP_haplotype
    └── htmlcov
        └── index.html

Run Tests and Coverage

# To run the tests, use the following command from within the local repo:
python -m pytest

# Calculate test coverage and produce HTML report (SNP_haplotype/htmlcov/index.html)
pytest --cov=snp_haplotyper --cov-report=html

Debugging Tests

There is a known issue where the VS Code debugger doesn’t stop at breakpoints set in pytest test modules when pytest-cov is used to calculate coverage. This issue also affects PyCharm. When debugging tests, you can disable coverage by adding the --no-cov flag to the pytest arguments in VS Code’s settings.json, as shown below.

{
    "python.testing.pytestArgs": [
        ".",
        "--no-cov"
    ]
}

You can then debug the tests using breakpoints. Remember to revert settings.json once debugging is complete so that coverage is calculated again.

Sphinx Documentation

The documentation for BASHer is generated using Sphinx. Docstrings in the Python modules are parsed by sphinx-apidoc and added to the documentation automatically.
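For reference, a docstring that Sphinx can render into API documentation might look like the sketch below. It uses Sphinx's reStructuredText field-list style (:param:, :returns:, :raises:), which is an assumption; the project may follow a different docstring convention. The function allele_frequency is hypothetical and not part of snp_haplotyper.

```python
def allele_frequency(alt_count: int, total_count: int) -> float:
    """Return the fraction of observed alleles that are the alternate allele.

    :param alt_count: Number of alternate alleles observed.
    :param total_count: Total number of alleles observed; must be positive.
    :returns: ``alt_count / total_count`` as a float.
    :raises ValueError: If ``total_count`` is not positive.
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return alt_count / total_count
```

The first line becomes the summary shown in the generated module index, and the field list is rendered as a parameter table on the function's page.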

Editing Sphinx Documentation

The source files for the docs are located in the source folder, as shown below. Most of these files are written in Markdown, with some in reStructuredText (rst) or HTML. They can be edited in a text editor and compiled by following the instructions in the next section, which builds the corresponding HTML files and stores the output in the build folder.

SNP_haplotype
    └── docs
        ├── apidocs
        ├── build
        └── source

Compiling Sphinx Documentation

# From within the local repo, run the following commands.

# Auto generate documentation from docstrings
sphinx-apidoc -f -o docs/apidocs snp_haplotyper/

# Build documentation from source files
(cd docs && make clean) # Optional - removes any cached html files
sphinx-build -b html docs/source docs/build

To view the output, open index.html in the build folder (docs/build/index.html).

Release Process

TODO