Predicting JEE cut-off

The cut-off is a popular word during the admission season, especially in India. For us at Questionbang, the cut-off is something mock-set-plus users often ask about – Is my score good enough to qualify for the Joint Entrance Examination (JEE)? Hence, we decided to predict the cut-off for the next season and include it in the result analysis.

Background

Most of us used basic regression analysis in class 12 maths, e.g., for time series and forecasting. In reality, the outcome being predicted usually depends on several factors, and simple regression on a single variable may not be sufficient in such cases.

Let us consider our requirement – predicting cut-off marks for JEE. The table below shows cut-off scores for the last 6 years.

JEE cut-off data — Table 1. Cut-off scores for the last 6 years.

Let us try a simple curve-fitting approach (Figure 1); it gives a prediction of 69 for the year 2019. However, the fitted curve does not explain why the points move as they do. For example, the cut-off (Y) was 113 in 2013 and fell to 74 in 2018 (Table 1). Surely many factors influence the cut-off.

scatter plot diagram for JEE cut-off — Figure 1. Scatter plot diagram for table 1.


Assumptions

Let us assume those cut-offs (Table 1) are a measure of competition and hence, are a function of the following variables:

Number of seats available ( $x_1$ ),
Difficulty level ( $x_2$ ),
Number of applicants ( $x_3$ ).

How each of these variables influences the outcome is what the regression has to estimate.

Choosing a regression method

What is regression analysis?

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables [wiki].

There are many different types of regression techniques. They are mainly of two categories – linear regression and non-linear regression.

In our case, the cut-off score (the dependent variable) depends on three independent variables ( $x_1$ , $x_2$ , $x_3$ ) as discussed before. Hence, this is going to be a multiple linear regression scenario.

The table below (Table 2) is an extension of Table 1 (Cut-off scores for the last 6 years) to include the number of seats available, difficulty level and the number of applicants.

JEE cut-off observations — Table 2. Extended table to include other variables.
The difficulty level is a categorical variable having level 1 (high) or 0 (moderate).

About data

The data (cut-off scores, seat availability, number of applicants and difficulty level) have been gathered from online news portals. The JEE format has changed a few times over the past 20+ years. It has had a two-phase (Mains & Advanced) format since 2013, so we will use data from 2013 to 2018.


Revisiting the basics of the least squares regression method – a single independent variable condition

Assume a single independent variable condition and set of values as below (Table 3). Let us call these observations.

$y_i$	$x_i$
$y_1$	$x_1$
$y_2$	$x_2$
$\vdots$	$\vdots$
$y_n$	$x_n$

Table 3. Observations.
$x_i$ – Independent variable,
$y_i$ – Actual dependent variable.

Following is an equation for simple linear regression:

(1) $y = a + b x$

Let us compute predictions $(y_p)_i$ , using the above values (Table 3):

$(y_p)_1= a+ b {x_1}$

$(y_p)_2= a+ b {x_2}$

$\vdots$ $\vdots$

$(y_p)_n= a+ b {x_n}$ .

In generic form, the equation for predicted value will become:

(2) $(y_p)_i = a + b {x_i}$

$y_i$	$x_i$	$y_p$
$y_1$	$x_1$	$(y_p)_1$
$y_2$	$x_2$	$(y_p)_2$
$\vdots$	$\vdots$	$\vdots$
$y_n$	$x_n$	$(y_p)_n$

Table 4. Observations and predictions.
$y_p$ – Predicted value.

Let us verify the accuracy of our prediction (Table 5),

$y_i$	$x_i$	$y_p$	$e_i$
$y_1$	$x_1$	$(y_p)_1$	$y_1 - (y_p)_1$
$y_2$	$x_2$	$(y_p)_2$	$y_2 - (y_p)_2$
$\vdots$	$\vdots$	$\vdots$	$\vdots$
$y_n$	$x_n$	$(y_p)_n$	$y_n - (y_p)_n$

Table 5. Observations, predictions and errors.
$e_i$ = error term.

As you can see, we subtracted the predictions ( $y_p$ ) from the actual observations ( $y_i$ ) to compute errors ( $e_i$ ). The next objective would be to refit the line so that the error ( $e_i$ ) is minimized.

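(3) $E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - (y_p)_i \right)^2$

From (2) and (3),

(4) $E = \sum_{i=1}^{n} \left( y_i - a - b x_i \right)^2$

Eq (4) is a squared error function; we need to find the coefficients a and b that minimize it (the minimum is zero only if every observation lies exactly on the line). Take the partial derivatives of eq (4) with respect to a and b and set them to zero:

(5) $\frac{\partial E}{\partial a} = -2 \sum_{i=1}^{n} \left( y_i - a - b x_i \right) = 0$

(6) $\frac{\partial E}{\partial b} = -2 \sum_{i=1}^{n} x_i \left( y_i - a - b x_i \right) = 0$

From (5) and (6),

(7) $b = \frac{n \sum{x_i y_i} - \sum{x_i} \sum{y_i}}{n \sum{x_i^2} - (\sum{x_i})^2}$
(8) $a = \overline{y} - b \overline{x}$

where $\overline{x}$ and $\overline{y}$ are the means of the $x_i$ and $y_i$ values.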
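The single-variable least squares fit can be sketched in a few lines of Python. This is only an illustration: the helper names `sse` and `fit_line` and the five data points are made up for the example and are not JEE data.

```python
def sse(a, b, xs, ys):
    """Sum of squared errors between observations and the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Return (a, b) minimizing the squared error for y ~ a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope from the two normal equations, then intercept from the means
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical observations, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

a, b = fit_line(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")
print(f"SSE at the fitted line: {sse(a, b, xs, ys):.4f}")
print(f"SSE with a shifted intercept: {sse(a + 0.1, b, xs, ys):.4f}")
```

Moving either coefficient away from the fitted values increases the squared error, which is what "least squares" means in practice.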


Computing JEE Cut-off – three independent variable condition

In our case, we have three independent variables – $x_1$ , $x_2$ , $x_3$ and coefficients – $a$ , $b$ , $c$ , $d$ . The regression equation becomes,

(9) $y_p = a + b {x_1} + c {x_2} + d {x_3}$

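Intercept a is:

(10) $a = \overline{y} - b \overline{x}_1 - c \overline{x}_2 - d \overline{x}_3$

where $\overline{y}$ , $\overline{x}_1$ , $\overline{x}_2$ and $\overline{x}_3$ are the means of the respective variable values.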

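And the normal equations, with $y$ , $x_1$ , $x_2$ and $x_3$ measured as deviations from their respective means (which is why the intercept does not appear), are,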

(11) $\sum{y_ix_1} = b\sum{x_1^2}+c\sum{x_1x_2}+d\sum{x_1x_3}$,

(12) $\sum{y_ix_2} = b\sum{x_1x_2}+c\sum{x_2^2}+d\sum{x_2x_3}$,

(13) $\sum{y_ix_3} = b\sum{x_1x_3}+c\sum{x_2x_3}+d\sum{x_3^2}$.

Writing the above equations in matrix form and solving using Cramer’s rule, with $D$ denoting the determinant of the coefficient matrix (the common denominator):

$D=\sum{x_1^2}[\sum{x_2^2}\sum{x_3^2}-(\sum{x_2x_3})^2]-\sum{x_1x_2}[\sum{x_1x_2}\sum{x_3^2}-\sum{x_1x_3}\sum{x_2x_3}]+\sum{x_1x_3}[\sum{x_1x_2}\sum{x_2x_3}-\sum{x_1x_3}\sum{x_2^2}]$,

(14) $b=\frac{\sum{y_ix_1}[\sum{x_2^2}\sum{x_3^2}-(\sum{x_2x_3})^2]-\sum{x_1x_2}[\sum{y_ix_2}\sum{x_3^2}-\sum{y_ix_3}\sum{x_2x_3}]+\sum{x_1x_3}[\sum{y_ix_2}\sum{x_2x_3}-\sum{y_ix_3}\sum{x_2^2}]}{D}$,

(15) $c=\frac{\sum{x_1^2}[\sum{y_ix_2}\sum{x_3^2}-\sum{y_ix_3}\sum{x_2x_3}]-\sum{y_ix_1}[\sum{x_1x_2}\sum{x_3^2}-\sum{x_1x_3}\sum{x_2x_3}]+\sum{x_1x_3}[\sum{x_1x_2}\sum{y_ix_3}-\sum{x_1x_3}\sum{y_ix_2}]}{D}$,

(16) $d=\frac{\sum{x_1^2}[\sum{x_2^2}\sum{y_ix_3}-\sum{x_2x_3}\sum{y_ix_2}]-\sum{x_1x_2}[\sum{x_1x_2}\sum{y_ix_3}-\sum{x_1x_3}\sum{y_ix_2}]+\sum{y_ix_1}[\sum{x_1x_2}\sum{x_2x_3}-\sum{x_1x_3}\sum{x_2^2}]}{D}$.
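The same Cramer’s rule computation can be sketched in plain Python, which is easier to check than the expanded fractions. This is a minimal sketch under stated assumptions: the function names `det3` and `fit_three_variables` are hypothetical, the variables are centred on their means before the sums are formed, and the six data rows are invented for illustration (they are not the JEE observations).

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (p, q, r), (s, t, u), (v, w, x) = m
    return p * (t * x - u * w) - q * (s * x - u * v) + r * (s * w - t * v)


def fit_three_variables(x1, x2, x3, y):
    """Return (a, b, c, d) for y ~ a + b*x1 + c*x2 + d*x3 via Cramer's rule."""
    n = len(y)
    columns = (x1, x2, x3, y)
    means = [sum(col) / n for col in columns]
    # Deviations of every variable from its own mean
    dx1, dx2, dx3, dy = ([v - m for v in col] for col, m in zip(columns, means))

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    # Coefficient matrix and right-hand side of the three normal equations
    A = [[dot(dx1, dx1), dot(dx1, dx2), dot(dx1, dx3)],
         [dot(dx1, dx2), dot(dx2, dx2), dot(dx2, dx3)],
         [dot(dx1, dx3), dot(dx2, dx3), dot(dx3, dx3)]]
    rhs = [dot(dy, dx1), dot(dy, dx2), dot(dy, dx3)]

    D = det3(A)
    if D == 0:
        raise ValueError("normal equations are singular (collinear variables)")

    def with_column(k):
        """Copy of A with column k replaced by the right-hand side."""
        return [[rhs[r] if col == k else A[r][col] for col in range(3)]
                for r in range(3)]

    b, c, d = (det3(with_column(k)) / D for k in range(3))
    # Intercept from the means of the original (uncentred) variables
    a = means[3] - b * means[0] - c * means[1] - d * means[2]
    return a, b, c, d


# Hypothetical data generated exactly from y = 2 + 3*x1 - 1*x2 + 0.5*x3
x1 = [1, 2, 3, 4, 5, 6]
x2 = [0, 1, 0, 1, 1, 0]
x3 = [2, 1, 4, 3, 6, 8]
y = [6.0, 7.5, 13.0, 14.5, 19.0, 24.0]

a, b, c, d = fit_three_variables(x1, x2, x3, y)
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}, d = {d:.4f}")
```

Because the sample data lie exactly on a plane, the fit should recover the generating coefficients, which makes the sketch easy to sanity-check before feeding it real observations.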


Let us calculate the coefficients a, b, c and d using Microsoft Excel.
(Courtesy: https://onlinecourses.science.psu.edu/stat501/node/380/)

JEE cut-off computation — Figure 2: Excel showing observations.

Using the Analysis ToolPak (Data Analysis > Regression) in MS Excel:

a = 26.0402,

b = -0.00225,

c = -6.4645,

d = 0.000119.

After substituting the above coefficients, eq (9) becomes,

(17) $y_p = 26.0402 - 0.00225 {x_1} - 6.4645 {x_2} + 0.000119 {x_3}$

We can use the above eq (17) to compute the JEE cut-off.

We will assume the following values for the year 2019:

Number of seats ( $x_1$ ) = 36500,

Difficulty level ( $x_2$ ) = 1 or 0,

Number of applicants ( $x_3$ ) = 11 lakh.

A) Using eq (17), high difficulty ( $x_2 = 1$ )

${y_p}= 26.0402+ ( -0.00225 ) 36500 + ( -6.4645 ) 1 + ( 0.000119 ) 1100000$

$= 68.35$

B) Using eq (17), moderate difficulty ( $x_2 = 0$ )

${y_p}= 26.0402+ ( -0.00225 ) 36500 + ( -6.4645 ) 0 + ( 0.000119 ) 1100000$

$= 74.82$

Cut-off score range: $68$ – $75$
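The two substitutions above can be reproduced with a short Python sketch. The coefficients and the 2019 inputs are the ones quoted in this article; the function name `predict_cutoff` is only an illustrative choice.

```python
# Coefficients reported by the Excel regression in this article
A = 26.0402    # intercept
B = -0.00225   # per seat
C = -6.4645    # difficulty level (1 = high, 0 = moderate)
D = 0.000119   # per applicant


def predict_cutoff(seats, difficulty, applicants):
    """Predicted cut-off score from the fitted linear model."""
    return A + B * seats + C * difficulty + D * applicants


# Assumed 2019 inputs from the article: 36500 seats, 11 lakh applicants
for label, level in (("high", 1), ("moderate", 0)):
    score = predict_cutoff(36500, level, 1_100_000)
    print(f"{label} difficulty: {score:.2f}")
```

Keeping the inputs as function arguments makes it easy to try other seat counts or applicant numbers without redoing the arithmetic by hand.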

Conclusion

The above prediction may not be accurate, as it is based on a very limited set of data (six yearly observations for three independent variables). Note also that this prediction is not relevant for 2019 onwards, as the cut-off will be expressed as a percentile rather than a score.

Questionbang users can find these values in the mock-set-plus result analysis section.

