Predicting JEE cut-off


 

Cut-off is a popular word during admission season, especially in India. At Questionbang, mock-set-plus users often ask us about the cut-off: is my score good enough to clear the Joint Entrance Examination (JEE)? We therefore decided to predict the cut-off for the next season and make it part of the result analysis.

Background

Most of us have used basic regression analysis in class 12 maths, e.g., for time series and forecasting. In reality, an outcome such as a cut-off depends on several factors, and simple regression on a single variable may not be sufficient in such cases.

Let us consider our requirement: predicting the cut-off marks for JEE. Table 1 below shows the cut-off scores for the last 6 years.

Table 1. Cut-off scores for the last 6 years.

 

A simple curve-fitting approach (Figure 1) gives a prediction of 69 for the year 2019. However, such a fit offers no reasoning behind the points: the cut-off (Y) was 113 in 2013 and fell to 74 in 2018 (Table 1), and a fitted curve cannot explain why. Clearly, many factors influence the cut-off.

 

Figure 1. Scatter plot of the data in Table 1.

 


 

Assumptions

Let us assume that the cut-offs in Table 1 are a measure of competition and hence a function of the following variables:

  • Number of seats available (x_1),
  • Difficulty level (x_2),
  • Number of applicants (x_3).

How strongly each of these variables influences the outcome is what the model has to estimate.

 

Choosing a regression method

What is regression analysis?

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables [wiki].

There are many types of regression techniques, falling into two broad categories: linear regression and non-linear regression.

In our case, the cut-off score (the dependent variable) depends on the three independent variables (x_1, x_2, x_3) discussed above. With more than one independent variable, and assuming a linear relationship, this is a multiple linear regression problem.

Table 2 below extends Table 1 (cut-off scores for the last 6 years) with the number of seats available, the difficulty level and the number of applicants.

 

Table 2. Extended table to include other variables.
Difficulty level is a categorical variable having level 1 (high) or 0 (moderate).

About data

The data (cut-off scores, seat availability, number of applicants and difficulty level) were gathered from online news portals. The JEE format has changed a few times over the past 20+ years; it has followed a two-phase (Main and Advanced) format since 2013. We therefore use data from 2013 to 2018.


 

Revisiting the basics of the least squares regression method: single independent variable

 

Assume a single independent variable and a set of values as below (Table 3). We call these the observations.

y_i x_i
y_1 x_1
y_2 x_2
\vdots \vdots
y_n x_n

Table 3. Observations.
x_i – Independent variable,
y_i – Actual dependent variable.

The equation for simple linear regression is:

(1)   \begin{equation*} y= a+ b x. \end{equation*}

Using the values in Table 3, the predictions (y_p)_i are:

(y_p)_1 = a + b x_1,

(y_p)_2 = a + b x_2,

\vdots

(y_p)_n = a + b x_n.

In general form, the predicted value for the i-th observation is:

(2)   \begin{equation*} y_p= a+ b {x_i}. \end{equation*}

y_i x_i y_p
y_1 x_1 (y_p)_1
y_2 x_2 (y_p)_2
\vdots \vdots \vdots
y_n x_n (y_p)_n

Table 4. Observations and predictions.
y_p – Predicted value.

 

Let us check the accuracy of these predictions (Table 5):

y_i x_i y_p e_i
y_1 x_1 (y_p)_1 y_1 - (y_p)_1
y_2 x_2 (y_p)_2 y_2 - (y_p)_2
\vdots \vdots \vdots \vdots
y_n x_n (y_p)_n y_n - (y_p)_n

Table 5. Observations, predictions and errors.
e_i = error term.

 

As Table 5 shows, the errors (e_i) are obtained by subtracting the predictions (y_p) from the actual observations (y_i). The next objective is to choose a and b so that these errors are as small as possible overall; the least squares method does this by minimizing the sum of their squares.

(3)   \begin{equation*} e_i=y_i - y_p, \end{equation*}

From (2) and (3),

(4)   \begin{equation*} \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - (a + b x_i) \right)^2. \end{equation*}
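The error function in eq (4) is easy to evaluate for any candidate pair (a, b). The sketch below is a minimal illustration; the function name `sum_squared_errors` and the small data set are hypothetical, chosen only to show the calculation, and are not JEE data.

```python
from typing import Sequence


def sum_squared_errors(a: float, b: float,
                       xs: Sequence[float], ys: Sequence[float]) -> float:
    """Return the sum of e_i^2, where e_i = y_i - (a + b*x_i), as in eq (4)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    total = 0.0
    for x, y in zip(xs, ys):
        e = y - (a + b * x)  # error term e_i, eq (3)
        total += e * e
    return total


# Synthetic example (not JEE data): three points lie on y = 1 + 2x, the last one does not.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 8.0]
print(sum_squared_errors(1.0, 2.0, xs, ys))  # only the last point misses: (8 - 7)^2 = 1.0
```

Trying other values of a and b in this function and watching the total change is exactly what the minimization below does analytically.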

Eq (4) is the squared error function. We need the coefficients a and b that minimize it; the minimum is zero only when all the points lie exactly on a straight line. Setting the partial derivatives of eq (4) with respect to a and b to zero gives:

(5)   \begin{equation*} \frac{\partial \sum \limits_{i=1}^{n}{e_i^2}}{\partial a}=0, \end{equation*}

(6)   \begin{equation*}  \frac{\partial \sum \limits_{i=1}^{n}{e_i^2}}{\partial b}=0. \end{equation*}
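Carrying out the differentiation in (5) and (6) term by term, using the chain rule on each squared error, shows where the normal equations come from (the constant a summed over n observations gives na):

```latex
\begin{aligned}
\frac{\partial}{\partial a}\sum_{i=1}^{n}\left(y_i - a - b x_i\right)^2
  &= -2\sum_{i=1}^{n}\left(y_i - a - b x_i\right) = 0
  &&\Rightarrow\; \sum y_i = na + b\sum x_i, \\
\frac{\partial}{\partial b}\sum_{i=1}^{n}\left(y_i - a - b x_i\right)^2
  &= -2\sum_{i=1}^{n}x_i\left(y_i - a - b x_i\right) = 0
  &&\Rightarrow\; \sum y_i x_i = a\sum x_i + b\sum x_i^2.
\end{aligned}
```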

From (5) and (6), we obtain the normal equations:

(7)   \begin{equation*} \sum{y_i} = na + b\sum{x_i}, \end{equation*}

(8)   \begin{equation*} \sum{y_ix_i} = a\sum{x_i}+b\sum{x_i^2}.  \end{equation*}
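Equations (7) and (8) are two linear equations in two unknowns, so they can be solved in closed form: from (7), a = \overline{y} - b\overline{x}, and substituting this into (8) gives b = (n\sum x_i y_i - \sum x_i \sum y_i) / (n\sum x_i^2 - (\sum x_i)^2). A minimal sketch of that calculation follows; `fit_simple_linear` is a hypothetical helper name and the check data are synthetic.

```python
from typing import Sequence, Tuple


def fit_simple_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Solve normal equations (7) and (8) for a and b in y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx  # zero only when all x values are equal
    if denom == 0:
        raise ValueError("x values must not all be equal")
    b = (n * sxy - sx * sy) / denom
    a = (sy - b * sx) / n  # eq (7) rearranged: a = mean(y) - b * mean(x)
    return a, b


# Synthetic check (not JEE data): points lying exactly on y = 3 + 0.5x.
a, b = fit_simple_linear([0, 2, 4, 6], [3.0, 4.0, 5.0, 6.0])
print(a, b)  # 3.0 0.5
```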


 

 

Computing the JEE cut-off: three independent variables

 

In our case, there are three independent variables (x_1, x_2, x_3) and four coefficients (a, b, c, d). The regression equation becomes:

(9)   \begin{equation*} y_p = a + b x_1 + c x_2 + d x_3. \end{equation*}

 

The intercept a is:

(10)   \begin{equation*} a = \overline{y} - b\overline{x}_1 - c\overline{x}_2 - d\overline{x}_3, \end{equation*}

where \overline{y}, \overline{x}_1, \overline{x}_2 and \overline{x}_3 are the means of the respective variables.

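Because the intercept is handled separately by eq (10), the sums involving x_1, x_2 and x_3 in the normal equations are taken over deviations from their means (for example, x_1 - \overline{x}_1). The normal equations are: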

(11)   \begin{equation*} \sum{yx_1} = b\sum{x_1^2} + c\sum{x_1x_2} + d\sum{x_1x_3}, \end{equation*}

(12)   \begin{equation*} \sum{yx_2} = b\sum{x_1x_2} + c\sum{x_2^2} + d\sum{x_2x_3}, \end{equation*}

(13)   \begin{equation*} \sum{yx_3} = b\sum{x_1x_3} + c\sum{x_2x_3} + d\sum{x_3^2}. \end{equation*}
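Before solving (11) to (13) by hand, an independent cross-check can be useful. The sketch below, assuming NumPy is available, fits eq (9) directly with `numpy.linalg.lstsq` on a design matrix that includes a column of ones for the intercept a, so no centring is needed. This is a different route (a generic least-squares solver) to the same coefficients, not the method used in this article; `fit_multiple_linear` is a hypothetical name and the data are synthetic, not the Table 2 observations.

```python
import numpy as np


def fit_multiple_linear(x1, x2, x3, y):
    """Least-squares fit of y = a + b*x1 + c*x2 + d*x3 (eq 9); returns (a, b, c, d)."""
    # Design matrix: a column of ones (intercept a) followed by the three variables.
    X = np.column_stack([np.ones(len(y)), x1, x2, x3])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return tuple(float(v) for v in coef)


# Synthetic data (not JEE data) generated exactly from y = 10 + 2*x1 - 5*x2 + 0.5*x3.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])  # 0/1 variable, like difficulty level
x3 = np.array([4.0, 1.0, 7.0, 2.0, 9.0, 3.0])
y = 10 + 2 * x1 - 5 * x2 + 0.5 * x3
print(fit_multiple_linear(x1, x2, x3, y))  # close to (10, 2, -5, 0.5)
```

With only six observations and four coefficients, as in Table 2, the solver has just two residual degrees of freedom, so small changes in the data can move the coefficients noticeably.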

 

Writing the above equations in matrix form,

\begin{equation*} \begin{bmatrix} \sum{x_1^2} & \sum{x_1x_2} & \sum{x_1x_3} \\ \sum{x_1x_2} & \sum{x_2^2} & \sum{x_2x_3} \\ \sum{x_1x_3} & \sum{x_2x_3} & \sum{x_3^2} \end{bmatrix} \begin{bmatrix} b \\ c \\ d \end{bmatrix} = \begin{bmatrix} \sum{yx_1} \\ \sum{yx_2} \\ \sum{yx_3} \end{bmatrix}, \end{equation*}

and solving using Cramer's rule gives the following, where \Delta is the determinant of the coefficient matrix:

\begin{equation*} \Delta = \sum{x_1^2}\Big[\sum{x_2^2}\sum{x_3^2} - \big(\sum{x_2x_3}\big)^2\Big] - \sum{x_1x_2}\Big[\sum{x_1x_2}\sum{x_3^2} - \sum{x_1x_3}\sum{x_2x_3}\Big] + \sum{x_1x_3}\Big[\sum{x_1x_2}\sum{x_2x_3} - \sum{x_1x_3}\sum{x_2^2}\Big], \end{equation*}

(14)   \begin{equation*} b = \frac{\sum{yx_1}\Big[\sum{x_2^2}\sum{x_3^2} - \big(\sum{x_2x_3}\big)^2\Big] - \sum{x_1x_2}\Big[\sum{yx_2}\sum{x_3^2} - \sum{yx_3}\sum{x_2x_3}\Big] + \sum{x_1x_3}\Big[\sum{yx_2}\sum{x_2x_3} - \sum{yx_3}\sum{x_2^2}\Big]}{\Delta}, \end{equation*}

(15)   \begin{equation*} c = \frac{\sum{x_1^2}\Big[\sum{yx_2}\sum{x_3^2} - \sum{yx_3}\sum{x_2x_3}\Big] - \sum{yx_1}\Big[\sum{x_1x_2}\sum{x_3^2} - \sum{x_1x_3}\sum{x_2x_3}\Big] + \sum{x_1x_3}\Big[\sum{x_1x_2}\sum{yx_3} - \sum{x_1x_3}\sum{yx_2}\Big]}{\Delta}, \end{equation*}

(16)   \begin{equation*} d = \frac{\sum{x_1^2}\Big[\sum{x_2^2}\sum{yx_3} - \sum{x_2x_3}\sum{yx_2}\Big] - \sum{x_1x_2}\Big[\sum{x_1x_2}\sum{yx_3} - \sum{x_1x_3}\sum{yx_2}\Big] + \sum{yx_1}\Big[\sum{x_1x_2}\sum{x_2x_3} - \sum{x_1x_3}\sum{x_2^2}\Big]}{\Delta}. \end{equation*}
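The Cramer's-rule solution of (11) to (13) can be sketched in plain Python. This is a minimal illustration under two stated assumptions: the sums are formed from deviations of x_1, x_2, x_3 from their means, and the intercept is then recovered from eq (10). The helper names `det3`, `_dot` and `fit_by_cramer` are hypothetical, and the check data are synthetic.

```python
from typing import List, Sequence, Tuple


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of element-wise products, e.g. sum(x1 * x2)."""
    return sum(p * q for p, q in zip(u, v))


def fit_by_cramer(x1: Sequence[float], x2: Sequence[float], x3: Sequence[float],
                  y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (a, b, c, d) for y = a + b*x1 + c*x2 + d*x3 via Cramer's rule and eq (10)."""
    n = len(y)
    cols = (x1, x2, x3)
    means = [sum(col) / n for col in cols]
    y_bar = sum(y) / n
    # Deviations of each independent variable from its mean.
    dev = [[v - m for v in col] for col, m in zip(cols, means)]
    # Coefficient matrix of (11)-(13) and right-hand-side sums of y times each deviation.
    coeff = [[_dot(dev[j], dev[k]) for k in range(3)] for j in range(3)]
    rhs = [_dot(y, dev[j]) for j in range(3)]
    delta = det3(coeff)
    if delta == 0:
        raise ValueError("singular normal equations: variables are collinear")
    solution = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the right-hand side, divide by delta.
        m = [row[:] for row in coeff]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(det3(m) / delta)
    b, c, d = solution
    a = y_bar - b * means[0] - c * means[1] - d * means[2]  # eq (10)
    return a, b, c, d


# Synthetic check (not JEE data): y = 10 + 2*x1 - 5*x2 + 0.5*x3 exactly.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [0, 1, 0, 1, 1, 0]
x3 = [4, 1, 7, 2, 9, 3]
y = [10 + 2 * p - 5 * q + 0.5 * r for p, q, r in zip(x1, x2, x3)]
print(fit_by_cramer(x1, x2, x3, y))  # approximately (10.0, 2.0, -5.0, 0.5)
```

Cramer's rule is convenient for a 3x3 system written out by hand; for larger problems a general linear solver is usually preferred for numerical stability.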

 


 

Let us calculate the coefficients a, b, c and d using Microsoft Excel.
(Courtesy: https://onlinecourses.science.psu.edu/stat501/node/380/)

 

Figure 2. Excel sheet showing the observations.

Using the Regression tool in the Data Analysis ToolPak of MS Excel, we obtain:

a = 26.0402,
b = -0.00225,
c = -6.4645,
d = 0.000119.

Substituting these coefficients into eq (9) gives:

(17)   \begin{equation*} {y_p}= 26.0402 - 0.00225 x_1  - 6.4645 x_2 + 0.000119 x_3. \end{equation*}

 

Eq (17) can now be used to estimate the JEE cut-off.

 

We will assume the following values for the year 2019:

Number of seats (x_1) = 36500,

Difficulty level (x_2) = 1 or 0,

Number of applicants (x_3) = 11 lakh (1,100,000).

 

A)    Using eq (17), high difficulty (x_2 = 1):

y_p = 26.0402 + (-0.00225)(36500) + (-6.4645)(1) + (0.000119)(1100000)
    = 68.35.

 

 

B)    Using eq (17), moderate difficulty (x_2 = 0):

y_p = 26.0402 + (-0.00225)(36500) + (-6.4645)(0) + (0.000119)(1100000)
    = 74.82.

 

 

Cut-off score range: 68 to 75.
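The two scenarios can be reproduced by evaluating eq (17) directly. The sketch below uses only the coefficients and the 2019 assumptions stated above; `predict_cutoff` is a hypothetical helper name.

```python
# Coefficients of eq (17), as reported from the Excel regression above.
A, B, C, D = 26.0402, -0.00225, -6.4645, 0.000119


def predict_cutoff(seats: float, difficulty: int, applicants: float) -> float:
    """Evaluate eq (17): y_p = a + b*x1 + c*x2 + d*x3."""
    return A + B * seats + C * difficulty + D * applicants


# 2019 assumptions: 36500 seats, 11 lakh applicants, difficulty 1 (high) or 0 (moderate).
for level, label in ((1, "high"), (0, "moderate")):
    print(label, round(predict_cutoff(36500, level, 1_100_000), 2))
# prints: high 68.35, then moderate 74.82
```

Because the difficulty level x_2 is a 0/1 variable, the two predictions differ by exactly the magnitude of its coefficient, 6.4645 marks; the seat and applicant terms are identical in both cases.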

 

Conclusion

The above prediction may not be accurate, as it is based on a very limited set of data: six observations used to estimate four coefficients. Note also that this prediction does not apply to 2019 onwards, as the cut-off will then be expressed as a percentile rather than a score.

Questionbang users can find these values in the mock-set-plus result analysis section.

 


 

We value your feedback and welcome any comments to help us serve you better.

 

References: