Terminologies Used in Statistics

Data analysis and statistics have been around for a very long time, and their contribution has been immense in every field; after all, whenever research is needed, statistical data is employed and data analysis techniques are brought into play. Just like any other field, statistics had to develop its own lingo, which means that all you need to do is mention one word and everyone knows what is needed. Statistics is not a walk in the park, and if you are not well versed in the statistical lingo, you will neither be part of the conversation nor follow what is being said. But let that not be a problem for you; below, I'll walk you through some commonly used statistics terminologies.

  • Data set – A collection of values recorded for one or more variables, such as height, weight, temperature or volume, grouped together so that they can be analysed.
  • Skewness – The degree of asymmetry observed in a probability distribution. Skewness is normally judged from a graph, and there are three possible outcomes: positive skewness (skewed to the right), negative skewness (skewed to the left) or zero skewness.
  • Kurtosis – A measure of the peakedness of a frequency distribution. Three terms are used to describe the type of peak: leptokurtic (high peak), mesokurtic (medium peak) and platykurtic (low peak).
  • Probability – The likelihood that a particular event will occur, expressed as a number between 0 and 1.
  • Mean – The sum of all entries in a data set divided by the number of entries (see the first sketch after this list for a worked example).
  • Mode – The most repeated value in a data set.
  • Median – The middle value of a data set once it has been sorted; with an odd number of entries it is the middle entry, and with an even number it is the average of the two middle entries.
  • Standard deviation – The most commonly used measure of the spread of a set of observations, equal to the square root of the variance.
  • Cumulative frequency distribution: The tabulation of a sample of observations in terms of numbers falling below particular values.
  • Average: Most often used for the arithmetic mean of a sample of observations, but can also be used for other measures of location such as the median.
  • Confidence interval – A range of values, built from a sample statistic and its standard error, within which the population parameter is expected to lie with a stated level of confidence (see the second sketch after this list).
  • Standard error – The standard deviation of a sample statistic, most often the sample mean; it measures how much the statistic would vary from sample to sample.
  • Alpha factoring: A method of factor analysis in which the variables are considered samples from a population of variables.
  • Class – A grouping of values into which data are binned when computing a frequency distribution.
  • Data fusion: The act of combining data from heterogeneous sources with the intent of extracting information that would not be available for any single source in isolation
  • Bell curve – A graph of the normal distribution, shaped like a bell: most observations cluster around the mean and the frequencies taper off symmetrically towards both extremes.
  • Coefficient of variation – A measure of relative spread, equal to the standard deviation divided by the mean.
  • Time series analysis: It is a statistical procedure of analyzing data points indexed over time.
  • Alternative hypothesis – The hypothesis that contradicts the null hypothesis; it states that the effect or difference under investigation does exist.
  • Null hypothesis – The default assumption that there is no effect or no difference; a statistical test weighs the evidence against it.
  • Analysis of covariance (ANCOVA) – A statistical tool that evaluates differences in the means of a dependent variable across levels of a controlled independent variable while adjusting for the effect of uncontrolled continuous variables (covariates), which makes the comparison more accurate and less biased.
  • Analysis of variance (ANOVA) – A statistical tool that tests whether the means of different groups differ by comparing the variation between groups with the variation within groups.
  • Covariate – A continuous variable that influences the dependent variable but is not itself of interest in the study.
  • Causation – A relationship in which the occurrence of one event directly brings about the occurrence of another, as opposed to the two events merely being correlated.
  • Inferential statistics – Methods of data analysis that use a sample to draw conclusions about the wider population it was taken from.
  • Descriptive statistics – A general term for methods of summarizing and tabulating data that make their main features more transparent.
  • Differencing: A simple approach to removing trends in time series
  • Sample size: The number of individuals to be included in an investigation. Usually chosen so that the study has a particular power of detecting an effect of a particular size
  • Population – The entire group of interest, from which a sample is taken in order to give information about the whole group.
  • Frequency distribution – A summary of how many times each value, or each class of values, occurs in a data set.
  • Outlier – A data point that lies unusually far from the rest of the observations; commonly referred to as an extreme value.
  • Venn diagram- A graphical representation of the extent to which two or more quantities or concepts are mutually inclusive and mutually exclusive.
  • Graph – A diagrammatic representation of data, normally drawn on horizontal and vertical axes, that displays one or more sets of data for comparison. Various types of graph exist, each suited to a different purpose.
  • Histogram- A graphical representation of a set of observations in which class frequencies are represented by the areas of rectangles centred on the class interval. If the latter are all equal, the heights of the rectangles are also proportional to the observed frequencies.
  • Scatterplot – A visualization that plots two continuous variables against each other, one on each axis, to show the relationship between them.
  • Break-even point – The point at which total revenue equals the total cost incurred, so the profit at the break-even point is zero.
  • Quartile- The values that divide a frequency distribution or probability distribution into four equal parts.
  • Regression – A statistical technique that serves as a basis for studying and characterizing a system of interest by formulating a reasonable mathematical model of the relationship between a response variable y and a set of q explanatory variables x1, x2, …, xq (a fitted example is sketched below this list).
  • Stem and leaf diagram – A method of displaying data in which each observation is split into two parts labelled the ‘stem’ and the ‘leaf’, the stem being the leading digits shared by a group of observations and the leaf the final digit unique to each observation.
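
To make the measures of central tendency and spread above concrete, here is a minimal sketch in Python using only the standard library's statistics module; the small data set is invented purely for illustration.

    # Minimal sketch: mean, median, mode and standard deviation
    # computed with Python's built-in statistics module.
    import statistics

    data = [4, 8, 15, 16, 23, 42, 8]  # hypothetical data set, for illustration only

    print("Mean:", statistics.mean(data))      # sum of entries / number of entries
    print("Median:", statistics.median(data))  # middle value of the sorted data
    print("Mode:", statistics.mode(data))      # most repeated value (8 in this data set)
    print("Standard deviation:", statistics.stdev(data))  # square root of the sample variance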
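
To illustrate the standard error and confidence interval entries, this second sketch computes the standard error of a sample mean and an approximate 95% confidence interval; the sample values are hypothetical, and using the normal z value of 1.96 rather than a t value is a simplifying assumption.

    # Minimal sketch: standard error of the mean and an approximate
    # 95% confidence interval, using the large-sample normal approximation.
    import math
    import statistics

    sample = [67, 72, 70, 65, 74, 69, 71, 68, 73, 66]  # hypothetical heights, for illustration

    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error = s / sqrt(n)

    z = 1.96  # z value for roughly 95% confidence
    lower, upper = mean - z * se, mean + z * se
    print(f"mean = {mean:.2f}, standard error = {se:.2f}")
    print(f"approximate 95% confidence interval: ({lower:.2f}, {upper:.2f})")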
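
Finally, the regression entry can be made concrete with a simple linear regression fitted by ordinary least squares, written out by hand so the formula is visible; the x and y values are made up for illustration.

    # Minimal sketch: simple linear regression (one explanatory variable)
    # fitted by ordinary least squares.
    xs = [1, 2, 3, 4, 5]            # hypothetical explanatory variable
    ys = [2.1, 4.3, 6.2, 8.1, 9.9]  # hypothetical response variable

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x

    print(f"fitted model: y = {intercept:.2f} + {slope:.2f} * x")  # y = 0.30 + 1.94 * x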

These are just some of the most commonly used statistics terminologies; however, there is much more to learn before you achieve guru status in statistics. Also consider studying the main statistical laws and the people behind them. To top it all off, statistical formulae are an extremely crucial part of your statistical knowledge foundation.

The article has been prepared by our trained statistician. He has published numerous articles on data analysis.

See the links below:

How To Code and Enter Data in SPSS

R vs STATA vs SPSS

How To Choose Statistical Tests

Article written by Ngari Ngunjiri