scipy.stats.false_discovery_control — SciPy v1.13.1 Manual (2024)

scipy.stats.false_discovery_control(ps, *, axis=0, method='bh')

Adjust p-values to control the false discovery rate.

The false discovery rate (FDR) is the expected proportion of rejected null hypotheses that are actually true. If the null hypothesis is rejected when the adjusted p-value falls below a specified level, the false discovery rate is controlled at that level.
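
More precisely, in the notation of [1]: if \(V\) is the number of true null hypotheses that are rejected and \(R\) is the total number of hypotheses rejected, the false discovery rate is \(E[V/R]\), where \(V/R\) is defined to be zero when \(R = 0\).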

Parameters:
ps : 1D array_like

The p-values to adjust. Elements must be real numbers between 0 and 1.

axis : int

The axis along which to perform the adjustment. The adjustment is performed independently along each axis-slice. If axis is None, ps is raveled before performing the adjustment.
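
For example, with a 2-D array of p-values (an illustrative sketch; the array values and variable names are hypothetical):

>>> import numpy as np
>>> from scipy import stats
>>> ps_2d = np.array([[0.01, 0.04],
...                   [0.03, 0.002]])
>>> by_column = stats.false_discovery_control(ps_2d, axis=0)  # each column adjusted as its own family
>>> pooled = stats.false_discovery_control(ps_2d, axis=None)  # all four p-values adjusted as one family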

method : {‘bh’, ‘by’}

The false discovery rate control procedure to apply: 'bh' is for Benjamini-Hochberg [1] (Eq. 1), 'by' is for Benjamini-Yekutieli [2] (Theorem 1.3). The latter is more conservative, but it is guaranteed to control the FDR even when the p-values are not from independent tests.

Returns:
ps_adjusted : array_like

The adjusted p-values. If the null hypothesis is rejected where these fall below a specified level, the false discovery rate is controlled at that level.

Notes

In multiple hypothesis testing, false discovery control procedures tend to offer higher power than familywise error rate control procedures (e.g. Bonferroni correction [1]).

If the p-values correspond with independent tests (or tests with “positive regression dependencies” [2]), rejecting null hypotheses corresponding with Benjamini-Hochberg-adjusted p-values below \(q\) controls the false discovery rate at a level less than or equal to \(q m_0 / m\), where \(m_0\) is the number of true null hypotheses and \(m\) is the total number of null hypotheses tested. The same is true even for dependent tests when the p-values are adjusted according to the more conservative Benjamini-Yekutieli procedure.
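
To illustrate the difference in conservativeness, here is a small hand-computed comparison (illustrative values; with \(m = 3\) tests, the 'by' adjustment inflates the unclipped 'bh' values by the factor \(\sum_{i=1}^{m} 1/i = 11/6\)):

>>> from scipy import stats
>>> small = [0.01, 0.02, 0.04]
>>> stats.false_discovery_control(small, method='bh')
array([0.03, 0.03, 0.04])
>>> stats.false_discovery_control(small, method='by')
array([0.055     , 0.055     , 0.07333333])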

The adjusted p-values produced by this function are comparable to those produced by the R function p.adjust and the statsmodels function statsmodels.stats.multitest.multipletests. Please consider the latter for more advanced methods of multiple comparison correction.
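
For instance, a cross-check against statsmodels (a sketch assuming statsmodels is installed; multipletests returns the rejection mask, the adjusted p-values, and two corrected significance levels):

>>> from statsmodels.stats.multitest import multipletests
>>> reject, ps_adj, _, _ = multipletests([0.01, 0.02, 0.04], alpha=0.05,
...                                      method='fdr_bh')
>>> ps_adj
array([0.03, 0.03, 0.04])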

References

[1]

Benjamini, Yoav, and Yosef Hochberg. “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” Journal of the Royal Statistical Society: Series B (Methodological) 57.1 (1995): 289-300.

[2]

Benjamini, Yoav, and Daniel Yekutieli. “The control of the false discovery rate in multiple testing under dependency.” Annals of Statistics (2001): 1165-1188.

[3]

TileStats. FDR - Benjamini-Hochberg explained - YouTube. https://www.youtube.com/watch?v=rZKa4tW2NKs.

[4]

Neuhaus, Karl-Ludwig, et al. “Improved thrombolysis in acute myocardial infarction with front-loaded administration of alteplase: results of the rt-PA-APSAC patency study (TAPS).” Journal of the American College of Cardiology 19.5 (1992): 885-891.

Examples

We follow the example from [1].

Thrombolysis with recombinant tissue-type plasminogen activator (rt-PA) and anisoylated plasminogen streptokinase activator (APSAC) in myocardial infarction has been proved to reduce mortality. [4] investigated the effects of a new front-loaded administration of rt-PA versus those obtained with a standard regimen of APSAC, in a randomized multicentre trial in 421 patients with acute myocardial infarction.

There were four families of hypotheses tested in the study, the last of which was “cardiac and other events after the start of thrombolytic treatment”. FDR control may be desired in this family of hypotheses because it would not be appropriate to conclude that the front-loaded treatment is better if it is merely equivalent to the previous treatment.

The p-values corresponding with the 15 hypotheses in this family were

>>> ps = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
...       0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000]

If the chosen significance level is 0.05, we may be tempted to reject the null hypotheses for the tests corresponding with the first nine p-values, as the first nine p-values fall below the chosen significance level. However, this would ignore the problem of “multiplicity”: if we fail to correct for the fact that multiple comparisons are being performed, we are more likely to incorrectly reject true null hypotheses.
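
As a quick check, we can count the p-values below the significance level directly:

>>> import numpy as np
>>> np.count_nonzero(np.asarray(ps) < 0.05)
9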

One approach to the multiplicity problem is to control the family-wise error rate (FWER), that is, the probability of rejecting at least one null hypothesis that is actually true. A common procedure of this kind is the Bonferroni correction [1]. We begin by multiplying the p-values by the number of hypotheses tested.

>>> import numpy as np
>>> np.array(ps) * len(ps)
array([1.5000e-03, 6.0000e-03, 2.8500e-02, 1.4250e-01, 3.0150e-01,
       4.1700e-01, 4.4700e-01, 5.1600e-01, 6.8850e-01, 4.8600e+00,
       6.3930e+00, 8.5785e+00, 9.7920e+00, 1.1385e+01, 1.5000e+01])

To control the FWER at 5%, we reject only the hypotheses corresponding with adjusted p-values less than 0.05. In this case, only the hypotheses corresponding with the first three p-values can be rejected. According to [1], these three hypotheses concerned “allergic reaction” and “two different aspects of bleeding.”
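
As before, a quick count confirms the number of rejections under the Bonferroni correction:

>>> np.count_nonzero(np.array(ps) * len(ps) < 0.05)
3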

An alternative approach is to control the false discovery rate: the expected fraction of rejected null hypotheses that are actually true. The advantage of this approach is that it typically affords greater power: an increased rate of rejecting the null hypothesis when it is indeed false. To control the false discovery rate at 5%, we apply the Benjamini-Hochberg p-value adjustment.

>>> from scipy import stats
>>> stats.false_discovery_control(ps)
array([0.0015    , 0.003     , 0.0095    , 0.035625  , 0.0603    ,
       0.06385714, 0.06385714, 0.0645    , 0.0765    , 0.486     ,
       0.58118182, 0.714875  , 0.75323077, 0.81321429, 1.        ])

Now, the first four adjusted p-values fall below 0.05, so we would reject the null hypotheses corresponding with these four p-values. Rejection of the fourth null hypothesis was particularly important to the original study as it led to the conclusion that the new treatment had a “substantially lower in-hospital mortality rate.”
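
For comparison, the more conservative Benjamini-Yekutieli procedure ('by'), which controls the FDR even under dependence, would lead to only three rejections here (an illustrative check):

>>> np.count_nonzero(stats.false_discovery_control(ps, method='by') < 0.05)
3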
