scipy.stats.false_discovery_control — SciPy v1.13.1 Manual (2024)

scipy.stats.false_discovery_control(ps, *, axis=0, method='bh')

Adjust p-values to control the false discovery rate.

The false discovery rate (FDR) is the expected proportion of rejected null hypotheses that are actually true. If the null hypothesis is rejected when the adjusted p-value falls below a specified level, the false discovery rate is controlled at that level.
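
More precisely, in the notation of [1]: if \(V\) is the number of true null hypotheses that are rejected and \(R\) is the total number of hypotheses rejected, the false discovery rate is \(E[V/R]\), where \(V/R\) is defined to be zero when \(R = 0\).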

Parameters:
ps : 1D array_like

The p-values to adjust. Elements must be real numbers between 0 and 1.

axis : int

The axis along which to perform the adjustment. The adjustment is performed independently along each axis-slice. If axis is None, ps is raveled before performing the adjustment.
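
For example, with a 2-D array of p-values (an illustrative sketch; the array values and variable names are hypothetical):

>>> import numpy as np
>>> from scipy import stats
>>> ps_2d = np.array([[0.01, 0.04],
...                   [0.03, 0.002]])
>>> by_column = stats.false_discovery_control(ps_2d, axis=0)  # each column adjusted as its own family
>>> pooled = stats.false_discovery_control(ps_2d, axis=None)  # all four p-values adjusted as one family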

method : {‘bh’, ‘by’}

The false discovery rate control procedure to apply: 'bh' is for Benjamini-Hochberg [1] (Eq. 1), 'by' is for Benjamini-Yekutieli [2] (Theorem 1.3). The latter is more conservative, but it is guaranteed to control the FDR even when the p-values are not from independent tests.

Returns:
ps_adjusted : array_like

The adjusted p-values. If the null hypothesis is rejected where these fall below a specified level, the false discovery rate is controlled at that level.

Notes

In multiple hypothesis testing, false discovery control procedures tend to offer higher power than familywise error rate control procedures (e.g. Bonferroni correction [1]).

If the p-values correspond with independent tests (or tests with “positive regression dependencies” [2]), rejecting null hypotheses corresponding with Benjamini-Hochberg-adjusted p-values below \(q\) controls the false discovery rate at a level less than or equal to \(q m_0 / m\), where \(m_0\) is the number of true null hypotheses and \(m\) is the total number of null hypotheses tested. The same is true even for dependent tests when the p-values are adjusted according to the more conservative Benjamini-Yekutieli procedure.
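
To illustrate the difference in conservativeness, here is a small hand-computed comparison (illustrative values; with \(m = 3\) tests, the 'by' adjustment inflates the unclipped 'bh' values by the factor \(\sum_{i=1}^{m} 1/i = 11/6\)):

>>> from scipy import stats
>>> small = [0.01, 0.02, 0.04]
>>> stats.false_discovery_control(small, method='bh')
array([0.03, 0.03, 0.04])
>>> stats.false_discovery_control(small, method='by')
array([0.055     , 0.055     , 0.07333333])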

The adjusted p-values produced by this function are comparable to those produced by the R function p.adjust and the statsmodels function statsmodels.stats.multitest.multipletests. Please consider the latter for more advanced methods of multiple comparison correction.
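
For instance, a cross-check against statsmodels (a sketch assuming statsmodels is installed; multipletests returns the rejection mask, the adjusted p-values, and two corrected significance levels):

>>> from statsmodels.stats.multitest import multipletests
>>> reject, ps_adj, _, _ = multipletests([0.01, 0.02, 0.04], alpha=0.05,
...                                      method='fdr_bh')
>>> ps_adj
array([0.03, 0.03, 0.04])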

References

[1]

Benjamini, Yoav, and Yosef Hochberg. “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” Journal of the Royal Statistical Society: Series B (Methodological) 57.1 (1995): 289-300.

[2]

Benjamini, Yoav, and Daniel Yekutieli. “The control of the false discovery rate in multiple testing under dependency.” Annals of Statistics (2001): 1165-1188.

[3]

TileStats. FDR - Benjamini-Hochberg explained - YouTube. https://www.youtube.com/watch?v=rZKa4tW2NKs.

[4]

Neuhaus, Karl-Ludwig, et al. “Improved thrombolysis in acute myocardial infarction with front-loaded administration of alteplase: results of the rt-PA-APSAC patency study (TAPS).” Journal of the American College of Cardiology 19.5 (1992): 885-891.

Examples

We follow the example from [1].

Thrombolysis with recombinant tissue-type plasminogen activator (rt-PA) and anisoylated plasminogen streptokinase activator (APSAC) in myocardial infarction has been proved to reduce mortality. [4] investigated the effects of a new front-loaded administration of rt-PA versus those obtained with a standard regimen of APSAC, in a randomized multicentre trial in 421 patients with acute myocardial infarction.

There were four families of hypotheses tested in the study, the last of which was “cardiac and other events after the start of thrombolytic treatment”. FDR control may be desired in this family of hypotheses because it would not be appropriate to conclude that the front-loaded treatment is better if it is merely equivalent to the previous treatment.

The p-values corresponding with the 15 hypotheses in this family were

>>> ps = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
...       0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000]

If the chosen significance level is 0.05, we may be tempted to reject the null hypotheses for the tests corresponding with the first nine p-values, as the first nine p-values fall below the chosen significance level. However, this would ignore the problem of “multiplicity”: if we fail to correct for the fact that multiple comparisons are being performed, we are more likely to incorrectly reject true null hypotheses.
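
As a quick check, we can count the p-values below the significance level directly:

>>> import numpy as np
>>> np.count_nonzero(np.asarray(ps) < 0.05)
9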

One approach to the multiplicity problem is to control the family-wise error rate (FWER), that is, the probability of rejecting at least one null hypothesis that is actually true. A common procedure of this kind is the Bonferroni correction [1]. We begin by multiplying the p-values by the number of hypotheses tested.

>>> import numpy as np
>>> np.array(ps) * len(ps)
array([1.5000e-03, 6.0000e-03, 2.8500e-02, 1.4250e-01, 3.0150e-01,
       4.1700e-01, 4.4700e-01, 5.1600e-01, 6.8850e-01, 4.8600e+00,
       6.3930e+00, 8.5785e+00, 9.7920e+00, 1.1385e+01, 1.5000e+01])

To control the FWER at 5%, we reject only the hypotheses corresponding with adjusted p-values less than 0.05. In this case, only the hypotheses corresponding with the first three p-values can be rejected. According to [1], these three hypotheses concerned “allergic reaction” and “two different aspects of bleeding.”
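
As before, a quick count confirms the number of rejections under the Bonferroni correction:

>>> np.count_nonzero(np.array(ps) * len(ps) < 0.05)
3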

An alternative approach is to control the false discovery rate: the expected fraction of rejected null hypotheses that are actually true. The advantage of this approach is that it typically affords greater power: an increased rate of rejecting the null hypothesis when it is indeed false. To control the false discovery rate at 5%, we apply the Benjamini-Hochberg p-value adjustment.

>>> from scipy import stats
>>> stats.false_discovery_control(ps)
array([0.0015    , 0.003     , 0.0095    , 0.035625  , 0.0603    ,
       0.06385714, 0.06385714, 0.0645    , 0.0765    , 0.486     ,
       0.58118182, 0.714875  , 0.75323077, 0.81321429, 1.        ])

Now, the first four adjusted p-values fall below 0.05, so we would reject the null hypotheses corresponding with these four p-values. Rejection of the fourth null hypothesis was particularly important to the original study as it led to the conclusion that the new treatment had a “substantially lower in-hospital mortality rate.”
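
For comparison, the more conservative Benjamini-Yekutieli procedure ('by'), which controls the FDR even under dependence, would lead to only three rejections here (an illustrative check):

>>> np.count_nonzero(stats.false_discovery_control(ps, method='by') < 0.05)
3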
