2016 National Undergraduate Mathematical Contest in Modeling: First-Prize Paper (English)

Team # 47598 Page 1 of 40

Team Control Number: 47598
Problem Chosen: C

2016

MCM/ICM

Summary Sheet

We develop an optimal strategy for investing in a prioritized list of 30 schools and evaluate the corresponding return on investment (ROI).
To deal with the missing data, we divide them into two categories and adopt two methods accordingly, namely the zeroing operation and correlation analysis. After filling in the missing values, we examine the descriptive statistics of the data.
Based on the processed data, we build three models to develop an optimal investment strategy.
Model 1 discusses how to select schools. We choose four determinant factors for school selection. Based on these four aspects, we adopt Principal Component Analysis (PCA) to build an optimized index system. Using the standardized data as input, the model outputs a score for each school. Setting a score of 80 as the baseline, we select 30 schools as our investment targets.
Model 2 defines return on investment (ROI), which measures return by evaluating both educational performance and social performance. We again use PCA to obtain a set of linearly uncorrelated variables. Further, we apply a calibration coefficient α to eliminate the influence of factors other than education.
Focusing on the 30 institutions selected by Model 1, Model 3 finds the investment strategy that maximizes the ROI defined in Model 2. Starting with several possible functional relationships between investment and return, we construct a derived model that maximizes ROI using the idea of Markowitz’s portfolio theory. According to the strategy, most of the institutions we select are worth a five-year investment, whereas the amount of money per school varies from year to year.
Finally, we apply sensitivity analysis to the first two models by adjusting the weights of different factors. The results show that the weights we set are reasonable and that the models are robust to a certain degree.


The Optimal Investment Strategy
1. Introduction
Starting in July 2016, the Goodgrant Foundation is going to donate a total of 100 million dollars per year to an appropriate group of schools. The schools are chosen from more than 2000 undergraduate universities and colleges all over the United States. To give the best strategy, we divide the task into three sub-problems that are to be solved in order:
• Give a 1 to N optimized and prioritized candidate list of schools
• Define an estimated return on investment (ROI) in an appropriate manner
• Determine the optimal investment strategy to maximize ROI, including the investment amount per school, the return on a single investment and the time duration over which the money should be provided
Before we dive into a specific sub-problem, we first deal with the missing data by adopting two methods, namely the zeroing operation and correlation analysis.
To tackle the first problem, we adopt Principal Component Analysis (PCA) to build an optimized index system for school selection and select schools based on the weighted scores of these indexes.
As for the second problem, we evaluate return by considering both the educational performance and the social performance of the investment. We then apply PCA, calibration of indexes and standardization of input data to define return on investment (ROI).
Finally, based on the results of the first two models, we draw on the idea of Markowitz’s portfolio theory and construct a derived model to maximize ROI, which reveals the optimal investment strategy.


2. A few words about the problem
In order to obtain an optimal investment strategy, we divide the problem into the following sub-problems:
• Dealing with the data.
• Deciding which kinds of schools are our investment targets.
• Defining an estimated return on investment (ROI) in an appropriate manner.
• Determining the optimal investment strategy to maximize ROI, including the investment amount per school, the return on a single investment and the time duration over which the money should be provided.
3. Assumptions
• In the short term, student enrollment remains relatively stable. Factors related to student enrollment include, but are not limited to, the quality of applicants, the average scores of standardized tests, the admission rate and recruitment policies.
• Colleges and universities can maximize the use of the donations they receive by adopting suitable strategies.
• The Goodgrant Foundation can decide which schools to invest in and the investment amount per school, but it cannot determine how the money is spent. In other words, it is the schools themselves that decide the distribution and purposes of a single investment.
• We assume that the purposes of expenses are independent of the source of the money, so revenues from tuition, government appropriations and private gifts make no difference regarding the expense.


4. Dealing with the Missing Data
According to the available data sets, there are 2977 schools listed in the file of Potential Candidate Schools, which we refer to as Data Set 1. The file of Most Recent Cohorts Data contains 122 relevant indexes for 7804 different schools; we refer to it as Data Set 2. Because 41 schools in Data Set 1 are not included in Data Set 2, we eventually have information about 2936 schools on 122 indexes, which we call Data Set 3.
Numerous data are missing in Data Set 3 and are given the value “NULL”. Based on the reasons why certain data are missing, we divide the missing data into two categories: complementary missing data and statistical missing data. “Complementary missing data” means that there exist pairs of indexes, and in each pair the two indexes are complementary to each other. For instance, a public school is bound to miss data on factors that only describe private schools. In contrast, “statistical missing data” arise not because the relevant index is not applicable, but because the schools themselves did not collect or provide the data on certain aspects. For instance, average SAT scores exist in principle, but some universities simply do not collect or provide them. Because the two categories have different properties, we adopt different methods to deal with them.
• Complementary missing data are more of a systematic problem: there is not and should not be any positive value, because the index does not apply to the school. So we can simply replace “NULL” by the value 0.
• For statistical missing data, we adopt the method of correlation analysis and complete the missing data of the 2936 universities by matching data from the overall 7804 schools. The detailed process is as follows:
Standardization

For Data Set 2, we denote its j-th column as a vector X_j, so we have

$$X_j = (X_{1j}, X_{2j}, \dots, X_{7804,j})^T.$$

Then we denote the standardized X_j as X_j^*, so we have

$$X_j^* = (X_{1j}^*, X_{2j}^*, \dots, X_{7804,j}^*)^T.$$

The relationship between X_ij^* and X_ij satisfies

$$X_{ij}^* = \frac{X_{ij} - \bar{X}_j}{\mathrm{std}(X_j)},$$

where

$$\bar{X}_j = \frac{1}{7804}\sum_{i=1}^{7804} X_{ij}$$

stands for the mean of X_j and std(X_j) stands for the standard deviation of X_j.
In this way, we can obtain the standardized data set 2, recorded as Data Set 2’. Similarly, we can get the standardized data set 3, recorded as Data Set 3’.
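The standardization step can be illustrated with a minimal sketch. It assumes NumPy, represents “NULL” entries as NaN and skips them when estimating the column mean and standard deviation, and uses the sample standard deviation (the default of MATLAB's std); none of these choices is stated in the text, and the function names are hypothetical.

```python
import numpy as np

def standardize_columns(data):
    """Z-score each column, skipping NaN (missing) entries when estimating mean and std."""
    data = np.asarray(data, dtype=float)
    mean = np.nanmean(data, axis=0)
    std = np.nanstd(data, axis=0, ddof=1)  # assumption: sample standard deviation
    return (data - mean) / std, mean, std

def destandardize(z, mean, std):
    """Invert the transform to get values back on the original scale."""
    return z * std + mean

# Toy data: 3 schools x 2 indexes, with one missing value.
raw = np.array([[1.0, 10.0],
                [2.0, np.nan],
                [3.0, 30.0]])
z, mean, std = standardize_columns(raw)
restored = destandardize(z, mean, std)
```

Keeping the column means and standard deviations makes it possible to reverse the standardization once the missing values have been filled.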
Filling missing values
Based on the principle of maximum similarity, we match data from Data Set 2’ to fill the missing values of Data Set 3’.
We denote the data of the i-th school in Data Set 3’ as a vector Y_i^*, so we have

$$Y_i^* = (Y_{i1}^*, Y_{i2}^*, \dots, Y_{i,122}^*).$$

Then we denote the data of the j-th school in Data Set 2’ as a vector X_j^* (here the subscript indexes a school rather than a column), so we have

$$X_j^* = (X_{j1}^*, X_{j2}^*, \dots, X_{j,122}^*).$$

For each j ≠ i, their correlation coefficient can be calculated by

$$\rho_{ij} = \frac{\sum_{k=1}^{122} (Y_{ik}^* - \bar{Y}_i^*)(X_{jk}^* - \bar{X}_j^*)}{\sqrt{\sum_{k=1}^{122} (Y_{ik}^* - \bar{Y}_i^*)^2}\,\sqrt{\sum_{k=1}^{122} (X_{jk}^* - \bar{X}_j^*)^2}},$$

where

$$\bar{Y}_i^* = \frac{1}{122}\sum_{k=1}^{122} Y_{ik}^*$$

stands for the mean of the standardized data of the i-th school and \bar{X}_j^* is defined analogously for the j-th school in Data Set 2’.

Then, for school i in Data Set 3’, we can get its correlation coefficient with all the other schools in Data Set 2’ except itself. That is to say, for each i∈{1,2,…,2936}, we have a correlation coefficient vector

$$\rho_i = (\rho_1, \rho_2, \dots, \rho_{7803}).$$

Notice that in the vector ρ_i, we omit the subscript i in each ρ_ij to make it more succinct.
Sorting the vector elements in descending order, we get

$$\rho_{(1)} \ge \rho_{(2)} \ge \dots \ge \rho_{(7803)}.$$

For a statistically missing entry Y_ik^* of Y_i^*, we assign the mean of X_jk^* over the ten schools j that have the largest correlation coefficients.
Finally, we reverse the standardization process to get the original form of the data, so as to fill the statistical missing values.
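The matching procedure above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes NumPy, assumes the reference rows (standing in for Data Set 2’) are complete and do not contain the target school itself, and computes each correlation only over the indexes the target school reports. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def fill_by_similar_schools(target, reference, k=10):
    """Fill NaNs in each target row with the mean over the k most correlated reference rows."""
    target = np.asarray(target, dtype=float)
    reference = np.asarray(reference, dtype=float)
    filled = target.copy()
    for i, row in enumerate(target):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        # Pearson correlation with every reference school, using only observed indexes.
        rho = np.array([np.corrcoef(row[observed], ref[observed])[0, 1]
                        for ref in reference])
        rho[np.isnan(rho)] = -np.inf       # constant rows have no defined correlation
        top = np.argsort(rho)[::-1][:k]    # indices of the k largest coefficients
        filled[i, missing] = reference[top][:, missing].mean(axis=0)
    return filled

# Toy example with k=2 instead of ten neighbours.
reference = np.array([[1.0, 2.0, 3.0, 4.0],
                      [2.0, 4.0, 6.0, 8.0],
                      [3.0, 2.0, 1.0, 0.0],
                      [1.0, 3.0, 2.0, 9.0]])
target = np.array([[1.0, 2.0, 3.0, np.nan]])
filled = fill_by_similar_schools(target, reference, k=2)
```

In the toy example the first two reference rows are perfectly correlated with the target on the observed indexes, so the missing entry is filled with the mean of their last values.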

Descriptive Statistics

After solving the problem of the missing data, we compute descriptive statistics for all the schools and indexes.
Here we randomly select 9 indexes to provide an overview of the results.


With the help of descriptive statistics, we can have a general overview of the information about each index, which assists in further evaluation about the specific cases of each institution.

5. Models

Model Overview/symbols
Overviews
In order to come up with the optimal investment strategy, we mainly build three models to solve the three sub-problems that have been set forth in the introduction part.
Model 1 discusses the indexes we choose to evaluate each institution, as well as the methods we apply to selecting out a list of schools to invest. Model 2 deals with the definition of the “return” on an investment. Based on the results of the two models above, Model 3 further discusses how to form the optimized investment strategy, so as to maximize the total estimated return on investment (ROI).


Symbols
Symbol   Meaning
X_1      Input
X_11     Input in supportive resources, including the faculty team, scholarships and aid, etc.
X_12     Input in school infrastructure, including facilities such as dormitories, libraries and dining halls, etc.
X_13     Input in scientific research, including research programs and relevant equipment, etc.
X_2      Output
X_21     Output related to degree completion, including graduation, dropout, transfer and retention, etc.
X_22     Output related to career achievements after graduation, including employment rate, average earnings, etc.
Y_1      Degree of openness to special groups
Y_11     Admission rate of students from low-income families
Y_12     Admission rate of minority groups
Y_2      Educational resources offered to special groups
Y_21     Percentage of students receiving loans or aid
Y_22     Graduation rate of minority groups

5.1 Model 1: School Selection

Step 1. Build the index system
Based on the databases and our literature review, we identify the following four aspects as contributing most to the selection of our target schools:
• The situation of special groups of students in the school
• The potential of undergraduate students in the school
• The credit status of the school
• The educational resources owned by the school


We denote these four aspects as c_1, c_2, c_3 and c_4 respectively.
We now give more specific explanations of the indexes needed to measure these four aspects:
c_1 involves four types of special groups, which we denote as c_11, c_12, c_13 and c_14:
• Students from low-income families, whose annual income is less than 30000 dollars.
• Students of minority groups, which typically refers to Black and Hispanic students.
• Students majoring in certain subjects, namely Agriculture, Computer and Information Sciences, Education, Mathematics and Statistics, Transportation and Materials Moving, Health Professions and Related Programs.
• Older undergraduates over 25 years old.
c_2 covers three stages in time to comprehensively measure the potential of students, which we denote as c_21, c_22 and c_23 respectively:
• Students’ performance at the admission stage.
• Students’ performance during college or university.
• Students’ performance after graduation.
c_3 is about credit evaluation, which helps to ensure that the investment is utilized in reasonable ways. We evaluate the credit status by two indexes, which are denoted as c_31 and c_32 respectively:
• The credit status of the school.
• The credit status of the students, for instance the repayment rates.
c_4 refers to the resources owned by the school at present, which is used to measure the degree to which the investment is needed. In order to eliminate the influence of scale, we use the index of per capita resources, which is denoted as c_41.
Based on the above definitions, we can define further bottom-indexes in consideration of the available data. In this way, we have the preparatory index system as shown in Figure 1.


Step 2. PCA: determine fewer bottom indexes
We notice that some of the second-class and third-class indexes have too many subsets. It is likely that there exist correlations between these subsets, which would not only increase the computing workload but also affect the effectiveness and reliability of the results.
To assist in the following modelling process, we want to use the smallest number of bottom-indexes that covers the primary and comprehensive information needed for evaluation. Thus, we apply Principal Component Analysis (PCA) to each of those subsets, so as to select representative bottom-indexes and convert a set of possibly correlated variables into a set of linearly uncorrelated variables, which provide non-overlapping information.
Take the third-class index c_211 as an example. This index refers to the scores of some standard examinations that measure students’ performance at the admission stage. From Figure 1 we can see that it involves 23 bottom-indexes, most of which are SAT and ACT scores. So we apply PCA to its 23 bottom-indexes in the following three steps:
• Standardize the data of the 2936 institutions on the 23 bottom-indexes of c_211 so as to eliminate the effect of dimensions.
• Build the correlation matrix R of the indexes and compute its eigenvalues by solving

$$|R - \lambda I| = 0,$$

where

$$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{23} \ge 0.$$

Then for each λ_i, compute the corresponding eigenvector e_i, which is exactly the load-value vector of the i-th principal component.

• Denote the proportion of variance of the i-th principal component as q_i, so

$$q_i = \frac{\lambda_i}{\sum_{j=1}^{23} \lambda_j}.$$

Set the threshold level as Q; in most cases Q ≥ 85%.
Let k denote the number of principal components retained, that is, the smallest k for which

$$\sum_{i=1}^{k} q_i \ge Q.$$

These k principal components then reliably reflect at least a proportion Q of the original information covered by the 23 indexes, so they can be considered good representatives.
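The three PCA steps can be sketched in code. The paper reports using MATLAB; the following is an equivalent minimal sketch in Python with NumPy, run on synthetic data rather than the contest data, and the function name is hypothetical.

```python
import numpy as np

def pca_by_threshold(data, threshold=0.85):
    """Eigen-decompose the correlation matrix and keep the fewest components reaching the threshold."""
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data, rowvar=False)      # correlation matrix of the indexes (columns)
    eigvals, eigvecs = np.linalg.eigh(corr)     # symmetric matrix, eigenvalues ascending
    order = np.argsort(eigvals)[::-1]           # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    proportion = eigvals / eigvals.sum()        # q_i
    cumulative = np.cumsum(proportion)
    # Smallest k whose cumulative proportion of variance reaches the threshold Q.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    return eigvals, eigvecs[:, :k], cumulative, k

# Synthetic example: 6 correlated indexes driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
data = latent @ loadings + 0.1 * rng.normal(size=(200, 6))
eigvals, components, cumulative, k = pca_by_threshold(data, threshold=0.85)
```

Because the correlation matrix is used, the eigenvalues sum to the number of indexes, so each q_i is simply the eigenvalue divided by that count.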


We use MATLAB to carry out the above steps and obtain the load values, eigenvalues and proportions of variance. The eigenvalues and the cumulative proportion of variance are presented in Figure 4. From the figure we can see that the eigenvalues have an overall decreasing trend: the first four values decrease rapidly and the rest more gently. Accordingly, the cumulative proportion of variance first increases rapidly and the curve then begins to flatten. With the threshold value Q set to 85%, we get 10 principal components whose cumulative proportion of variance is 90.84859%, as shown in Table 2. This means that the 10 principal components reflect 90.84859% of the original information and can therefore represent the original 23 indexes well.
By applying the PCA procedure above, we obtain a principal-component index that describes c_211 in a comprehensive way. We rename this new index SAT&ACT. In this way, we have reduced the dimension of the third-class index c_211.
Similarly, we apply PCA to the other second-class and third-class indexes to obtain an optimized index system for school selection.


Step 3. Determine the weights of indexes.
Based on our literature review and our own analysis of the importance of different factors, we determine the weight of each index. It is worth noting that two-year community colleges and four-year universities have different missions and educational focuses, so we also consider the distinction between them when making evaluations. Two-year community colleges are more job-oriented, so we lay more emphasis on employment and increase the weights of employment rates. In contrast, four-year universities are more academically oriented, so we give more weight to their retention rates.

The weights of all the indexes are shown in Figure 6.

Step 4. Output the results of the model
Based on the final index system that we have constructed, we use the standardized data of the 2936 institutions on the 122 indexes as inputs to the model, which returns a total score C, ranging from 1 to 100, for each of the 2936 institutions. We set a score of 80 as the baseline and obtain 30 schools as our investment targets (Table 3).
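The scoring and selection step can be illustrated with a small sketch. The text does not state how the weighted indexes are mapped onto the range 1 to 100, so the min-max rescaling below is an assumption, as are the function names, the school labels and the toy weights.

```python
import numpy as np

def score_schools(indexes, weights):
    """Weighted sum of standardized indexes, rescaled to [1, 100] (assumed min-max rescaling)."""
    raw = np.asarray(indexes, dtype=float) @ np.asarray(weights, dtype=float)
    lo, hi = raw.min(), raw.max()
    return 1.0 + 99.0 * (raw - lo) / (hi - lo)

def select_schools(names, scores, baseline=80.0):
    """Return (name, score) pairs at or above the baseline, highest score first."""
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= baseline]

# Hypothetical example: 4 schools, 2 indexes with equal weights.
names = ["A", "B", "C", "D"]
indexes = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.9], [0.0, 0.0]]
scores = score_schools(indexes, [0.5, 0.5])
selected = select_schools(names, scores, baseline=80.0)
```

Sorting before filtering means the selected list doubles as the prioritized list.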


The results are shown in Table 3, which also gives us a prioritized list of schools.

5.2 Model 2: Define the ROI
If we want to develop an optimal strategy to maximize the return, the question is how we should define the term “return”. In fact, the “return” that the Goodgrant Foundation seeks is not a matter of money. Instead, it refers to positive effects on undergraduate students, and perhaps on society, that fulfill the foundation’s higher mission.
For this reason, we believe that the performance of an investment should be evaluated in two aspects:
• educational performance
• social performance
Accordingly, we construct two sub-models to make further investigations and assessments.
Educational Performance: Input-Output Model
In order to measure the positive effect of the investment on students’ educational performance, we mainly look at the input and the output. The input refers to the increase in the resources students own or share, and the output is the resulting improvement. The balance between input and output is important because there exists a “time lag”: the positive effects of the investment need some time before they can be fully observed. So the output at one point in time merely reflects changes over a short period and says little about the long-term effect. In contrast, the increase in resources allows us to predict the trend. Thus, it is reasonable to take both input and output into consideration. Doing so not only gives us a more comprehensive view of students’ educational performance, but also reflects the efforts made by the universities and colleges in utilizing the resources.
Step 1. Build the index system to measure input and output
In terms of input, we assign three second-class indexes, namely supportive resources, school infrastructure and scientific research, which are denoted as X_11, X_12 and X_13 respectively. As for the output, we evaluate it by the degree completion rate and career achievements, which are denoted as X_21 and X_22 respectively.
Furthermore, for each second-class index, we also select out some important bottom indexes, which can be measured in a quantitative way.
In this way, we build a primary index system to measure input and output.
Step 2. PCA: determine fewer bottom-indexes
Just as we did in Model 1, we apply Principal Component Analysis (PCA) to determine fewer bottom-indexes that contribute in a linearly uncorrelated way.
Step 3. Calibration of Output Indexes
According to the index system we have built, the output indexes mainly depend on degree completion and career achievements. Degree completion is determined almost entirely by educational factors, so the influence of other factors can be ignored. Career achievements are a different matter. One’s income, which indicates employment status and future promotion, is greatly affected by non-educational aspects. According to the Screening Theory, external factors such as family background, major and fortune, as well as internal factors like one’s intelligence and ability, can also contribute to career development. Thus, when career achievements are used to measure the effectiveness of our financial investment, educational factors account for only a part of the changes. So here we introduce a coefficient α.
The coefficient α acts as a calibration factor which indicates the degree to which one’s career development is directly related to the education received. The value of α can be determined by conducting a multivariate regression analysis of income on educational background and other representative variables. How we determine the value of α is crucial to the return on investment.
Denison used the method of factor analysis and assigned the value 0.6 to α. In other words, he attributes 60% of one’s income growth to education and 40% to factors other than education. Denison’s coefficient α is generally accepted because it is both scientific and feasible. Thus, we apply Denison’s result and set the value of α to 0.6 in our model.
Step 4. Determine the weights of indexes
Adopting methods just as we did in Model 1, we can determine the weights of different indexes. We also distinguish between the two-year community colleges and the four-year universities. Considering that two-year community colleges are more job-oriented, we lay more emphasis on employment and increase the weights of employment rates. On the contrary, four-year universities are usually more academic-oriented, so we add more weights to their retention rates.
The indexes and their respective weights are presented in Figure 7.


Now, we have constructed an optimized index system for the Input-Output Model to measure the educational performance of specific investments, as shown above.

Social Performance: Supporting Model
Step 1. Build the index system
Some disadvantaged and vulnerable groups are in dire need of educational support. Because supporting them is an effective way of realizing educational reform, special groups are always given some priority in compensatory fiscal policies. Through direct financial aid, and indirectly through the concepts of multicultural education reflected in course design, teaching and evaluation, special groups are given access to undergraduate education.
Thus, in addition to educational performance, we consider the social performance of the investments when making decisions. It mainly involves support for two groups of students: those from low-income families and those from minority groups. Accordingly, we set some first-class and second-class indexes.
Step 2. Determine the weights of indexes
Similar to what we have done in the Input-Output Model before, we now assign different weights to the indexes. Finally, we construct the system of weighted indexes for the Social Performance Model, as is represented below (Figure 8).

Now, we have constructed an optimized index system for the Supporting Model to measure the social performance of specific investments, as shown above.


Combined Model
Now that we have built the Input-Output Model and the Supporting Model to assess the educational performance and social performance, we are going to assign weights to these two models so that they can combine together to form a comprehensive one.
Based on the study by Banta, T. W. (2007), we set the weight of educational performance to 0.6 and the weight of social performance to 0.4, which we believe is appropriate for a charitable organization such as the Goodgrant Foundation.
Based on the combined model we have built, we can extract the real data under each index from the databases. We can then obtain the incremental data over a certain period of time, which reveal the return on investment in terms of each basic index. By applying the weights, we finally get the total return of the i-th school in the j-th year.
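How the pieces of Model 2 fit together can be shown with a short sketch. Only the values α = 0.6 and the 0.6/0.4 split between educational and social performance come from the text; applying α as a multiplier on the career-achievement score, the weights inside the educational model and the toy scores are assumptions made for illustration.

```python
ALPHA = 0.6          # Denison's calibration coefficient, as adopted in the text
W_EDUCATIONAL = 0.6  # weight of educational performance
W_SOCIAL = 0.4       # weight of social performance

def educational_return(input_score, completion_score, career_score,
                       w_input, w_completion, w_career):
    """Weighted educational performance; only the career output is calibrated by ALPHA (assumed)."""
    return (w_input * input_score
            + w_completion * completion_score
            + w_career * ALPHA * career_score)

def total_return(educational, social):
    """Combine the two sub-models with the 0.6 / 0.4 weights."""
    return W_EDUCATIONAL * educational + W_SOCIAL * social

# Hypothetical incremental scores for one school in one year.
edu = educational_return(10.0, 20.0, 30.0, w_input=0.4, w_completion=0.3, w_career=0.3)
total = total_return(edu, social=10.0)
```

With these toy numbers the educational return is 15.4 and the total return is 13.24.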

5.3 Model 3: MV-ROI Model: Find the Optimal Strategy

We denote the returns of all the schools we have selected as a vector

$$G = (G_1, G_2, \dots, G_N).$$

We standardize each element G_i into the interval [1, 100] and denote the standardized form of G as

$$F = (F_1, F_2, \dots, F_N).$$

Thus, F and G satisfy the following equation:

We denote the capital input of the institutions as

$$A = (A_1, A_2, \dots, A_N)$$

and assume that there exists some functional relationship between F and A, that is,

$$F = f(A).$$

We can then broadly observe the relationship in a scatter plot. The functional form of f has various possibilities: it may be linear or nonlinear, such as a quadratic, logarithmic, polynomial or piecewise function.


If the Goodgrant Foundation donates an amount X_i to the i-th school, then its estimated return will change into

$$f(A_i + X_i).$$

Given the original return

$$F_i = f(A_i),$$

the return on investment (ROI) is defined by:

Next, we discuss in turn the cases where f is a linear function, a quadratic function or a logarithmic function.
(1) If f is in the form of a linear function, then F can be expressed as

When the i-th school receives an investment of X_i, its estimated return changes into

We can conclude that an additional amount X_i will add λX_i to the estimated return of the i-th school, where λ can be obtained by using the least squares method (LSM) to fit the following model:

where ε is the residual.
Thus, we can get the estimated ROI for the i-th school:

By adding up the individual ROIs, we now have the estimated ROI for a total of N institutions that we decide to invest:

(2) If f is in the form of a quadratic function, then F can be expressed as

When the i-th school receives an investment of X_i, its estimated return changes into


We can conclude that an additional amount X_i will add

to the estimated return of the i-th school, where the coefficients can be obtained by using the least squares method (LSM) to fit the following model:

where ε is the residual.
Thus, we can get the estimated ROI for the i-th school:

By adding up the individual ROIs, we now have the estimated ROI for a total of N institutions that we decide to invest:

(3) If f is in the form of a logarithmic function, then F can be expressed as

When the i-th school receives an investment of X_i, its estimated return changes into

We can conclude that an additional amount X_i will add

to the estimated return of the i-th school, where λ can be obtained by using the least squares method (LSM) to fit the following model:

where ε is the residual.
Thus, we can get the estimated ROI for the i-th school:


By adding up the individual ROIs, we now have the estimated ROI for a total of N institutions that we decide to invest:

In this way, we can estimate return-on-investment (ROI) in accordance with the different functional forms of f .
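The least squares fitting of the three functional forms can be sketched as follows. The exact parameterizations used in the paper are not reproduced here; the forms below (a line with intercept, a general quadratic, and λ ln A plus a constant) are assumptions, as are the function names. The gain in estimated return from an extra amount x is taken as f(a + x) − f(a).

```python
import numpy as np

def fit_linear(A, F):
    """Fit F ~ lam * A + mu by least squares (assumed parameterization)."""
    lam, mu = np.polyfit(A, F, 1)
    return lambda a: lam * a + mu

def fit_quadratic(A, F):
    """Fit F ~ c2 * A**2 + c1 * A + c0 by least squares."""
    coeffs = np.polyfit(A, F, 2)
    return lambda a: np.polyval(coeffs, a)

def fit_logarithmic(A, F):
    """Fit F ~ lam * ln(A) + mu by least squares on ln(A); requires A > 0."""
    lam, mu = np.polyfit(np.log(A), F, 1)
    return lambda a: lam * np.log(a) + mu

def return_gain(f, a, x):
    """Increase in estimated return when capital input rises from a to a + x."""
    return f(a + x) - f(a)

# Synthetic, noise-free data for illustration only.
A = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
f_lin = fit_linear(A, 2.0 * A + 1.0)
f_quad = fit_quadratic(A, A ** 2)
f_log = fit_logarithmic(A, 3.0 * np.log(A) + 2.0)
```

In the linear case the gain depends only on x, whereas in the quadratic and logarithmic cases it also depends on the school's current capital input a.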
We note that the total amount of investment is limited to at most 100 million dollars, so the X_i (measured in dollars) need to satisfy

$$\sum_{i=1}^{N} X_i \le 10^8.$$

In addition, the investment strategy should also take other factors into consideration. For instance, the geographic and racial distribution of the schools should be reasonably broad, and the number of institutions to invest in should be neither too large nor too small.
Optimal Investment Model
The Mean-Variance Model (M-V Model) is a risk measurement model put forward by H. M. Markowitz in 1952. There are two main objectives in making portfolio decisions: one is a higher return and the other is a lower risk. The model was constructed to seek a balance between the two objectives.
According to our previous definition, return on investment (ROI) plays a role similar to that of the expected return in the M-V Model, and the correlation coefficient matrix of the normalized indexes can be regarded as the correlation coefficient matrix in the M-V Model. So as long as we obtain a standard deviation, the concepts of Markowitz’s model can be applied to our situation to get the optimal investment strategy.
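Here, we define the standard deviation S_i as the absolute value of the difference between the real return and the estimated return, so we have

$$S_i = |F_i - f(A_i)|.$$

Further, we can get the standard deviation vector

$$S = (S_1, S_2, \dots, S_N).$$

Writing x_ik for the k-th standardized index of the i-th school, we take the correlation coefficient of the i-th school and the j-th school to be

$$\rho_{ij} = \frac{\sum_{k=1}^{m} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{m} (x_{ik} - \bar{x}_i)^2}\,\sqrt{\sum_{k=1}^{m} (x_{jk} - \bar{x}_j)^2}},$$

where \bar{x}_i is the mean value of the standardized indexes for the i-th school,

$$\bar{x}_i = \frac{1}{m}\sum_{k=1}^{m} x_{ik},$$

and m refers to the number of indexes.
We can use the correlation coefficients of any two schools to construct a corresponding correlation matrix C, where C is a symmetric matrix and each element on the main diagonal of C is 1. Thus, the covariance matrix of the schools can be defined as

$$\mathrm{Cov}_{ij} = \rho_{ij}\, S_i\, S_j.$$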
In this way, we have constructed an investment strategy model, which we name the MV-ROI model.
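The construction of the covariance matrix described above can be sketched in Python. We assume the standard form cov[i][j] = S_i * C_ij * S_j and a Pearson correlation across each school's m standardized indexes. The function names and all numbers below are hypothetical illustrations, not values from our data.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length index vectors."""
    m = len(a)
    mean_a = sum(a) / m
    mean_b = sum(b) / m
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    # Undefined (division by zero) if either vector is constant.
    return cov / sqrt(var_a * var_b)

def correlation_matrix(indexes):
    """Symmetric matrix C with ones on the main diagonal."""
    n = len(indexes)
    return [[1.0 if i == j else pearson(indexes[i], indexes[j])
             for j in range(n)] for i in range(n)]

def covariance_matrix(S, C):
    """Assumed construction: cov[i][j] = S[i] * C[i][j] * S[j]."""
    n = len(S)
    return [[S[i] * C[i][j] * S[j] for j in range(n)] for i in range(n)]

# Hypothetical standardized indexes for three schools (m = 4 indexes each).
indexes = [
    [0.2, 1.1, -0.5, 0.4],
    [0.1, 0.9, -0.7, 0.8],
    [-1.0, 0.3, 0.6, -0.2],
]
# Hypothetical real and estimated returns; S_i = |real - estimated|.
real_roi = [0.80, 0.65, 0.90]
estimated_roi = [0.75, 0.70, 0.82]
S = [abs(r - e) for r, e in zip(real_roi, estimated_roi)]

C = correlation_matrix(indexes)
cov = covariance_matrix(S, C)
```

Each diagonal entry of `cov` equals the squared deviation of the corresponding school, and the matrix inherits its symmetry from C.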

The form of our MV-ROI Model is essentially the same as that of the M-V Model. Considering the needs of the Goodgrant Foundation and the schools, we can apply the theory of indifference curves and the efficient frontier. Finally, we form our optimized investment strategy by choosing certain points on the efficient frontier.
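To illustrate how a point on the efficient frontier can be chosen, the sketch below maximizes expected return minus a risk penalty over a coarse grid of budget shares. This is a generic mean-variance illustration, not our exact MV-ROI solver: the `risk_aversion` parameter, the grid search and all numbers are assumptions made for the example.

```python
def portfolio_return(w, mu):
    """Expected return of budget shares w given per-school returns mu."""
    return sum(wi * mi for wi, mi in zip(w, mu))

def portfolio_variance(w, cov):
    """Quadratic form w' * cov * w."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))

def simplex_grid(n, steps):
    """Yield all share vectors with n entries k/steps (k >= 0) summing to 1."""
    def rec(remaining, slots):
        if slots == 1:
            yield (remaining,)
            return
        for k in range(remaining + 1):
            for tail in rec(remaining - k, slots - 1):
                yield (k,) + tail
    for combo in rec(steps, n):
        yield [k / steps for k in combo]

def best_allocation(mu, cov, risk_aversion, steps=20):
    """Grid search for the shares maximizing return - risk_aversion * variance."""
    best_w, best_u = None, float("-inf")
    for w in simplex_grid(len(mu), steps):
        u = portfolio_return(w, mu) - risk_aversion * portfolio_variance(w, cov)
        if u > best_u:
            best_w, best_u = w, u
    return best_w, best_u

# Hypothetical estimated ROIs and covariance matrix for three schools.
mu = [0.82, 0.75, 0.60]
cov = [
    [0.04, 0.01, 0.00],
    [0.01, 0.02, 0.00],
    [0.00, 0.00, 0.01],
]

w_neutral, _ = best_allocation(mu, cov, risk_aversion=0.0)
w_averse, _ = best_allocation(mu, cov, risk_aversion=5.0)
```

With no risk penalty the whole budget goes to the school with the highest estimated ROI, while a positive penalty spreads the budget. Varying `risk_aversion` traces different points along the frontier.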
With the model constructed above, we can obtain the best investment strategy for the foundation for a single year (100 million dollars in total). Given the strategy of the previous year, we predict that year’s return on investment, use it to predict the schools’ updated basic indexes for the next year, and then compute the investment for the next year. The remaining years are handled in the same manner. In this way we obtain the strategy with the best return on investment, in the sense that the ROI in each of the five years is the highest attainable.
Based on the above, one year’s strategy produced by the model suffices to derive the strategies for all five years by analogy. Here we give only the model’s output for the 0th year.

Note: * denotes significance at the 10% level in a two-tailed test.
** denotes significance at the 5% level in a two-tailed test.
*** denotes significance at the 1% level in a two-tailed test.
The values in brackets are t-values corrected for heteroscedasticity.
This note applies to the tables below.

So we can get the optimal investment strategy for 5 years:

• If f is quadratic, the results are as follows:
0th year (historical data):

So we can get the optimal investment strategy for 5 years:

• If f is logarithmic, the results are as follows:
0th year (historical data):

So we can get the optimal investment strategy for 5 years:

From the above, we find that the adjusted goodness of fit of all three types of model increases as the investment period lengthens. We also find that if f is linear, the investment is more dispersed across schools; if f is quadratic, the amount invested in the university with ID 215293 stays at a high level throughout the five years; if f is logarithmic, the distribution of investment differs markedly from year to year.
Comparing the three functional forms, we think it is reasonable to regard the relationship between return and investment as logarithmic, because a logarithmic function grows ever more slowly as its argument increases.
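The comparison of the three functional forms by adjusted goodness of fit can be sketched as follows. The code fits each form by ordinary least squares (normal equations) and computes the adjusted R^2. The data are synthetic and constructed to be exactly logarithmic, and the function names are hypothetical, so the outcome only illustrates the procedure and is not our fitted result.

```python
from math import log

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def adjusted_r2(features, y):
    """Adjusted R^2 of a least-squares fit of y on the feature columns plus an intercept."""
    n, p = len(y), len(features)
    X = [[1.0] + [col[i] for col in features] for i in range(n)]
    k = p + 1
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(beta[a] * X[i][a] for a in range(k)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic data: investment (millions of dollars) and an exactly logarithmic score.
investment = [1, 2, 3, 5, 8, 12, 20]
score = [10 + 15 * log(x) for x in investment]

forms = {
    "linear": [investment],
    "quadratic": [investment, [x * x for x in investment]],
    "logarithmic": [[log(x) for x in investment]],
}
adj = {name: adjusted_r2(cols, score) for name, cols in forms.items()}
best = max(adj, key=adj.get)
```

On real data none of the forms fits exactly, and the adjusted R^2 values should be read together with the economic argument about slowing growth.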
The optimal investment strategy follows from the discussion above and is shown in Table 7.

We feed the final strategy into the model and compute its ROI with the MV-ROI method. The result is ROI = 82.213%, which means that an investment of 1 dollar gains a score of about 82. The meaning of the score is defined by the ROI Model built earlier.

6. Sensitivity Analysis

Sensitivity of the School Selection Model
First, we analyze the sensitivity of the parameters in the School Selection Model. According to the model built earlier, four main indexes, named c1, c2, c3 and c4, are used to evaluate whether a school is worth the investment. The weights of these four indexes are 0.5000, 0.3125, 0.0625 and 0.1250, respectively.
Clearly, c1 and c2 carry higher weights than c3 and c4, so they are the most important indexes in the index system. We therefore discuss only this case: how the final result responds when the weights of c1 and c2 change. In other words, we examine how the proportion of schools whose score is above 80, named

, responds to changes in the weights of c1 and c2. We know:

When:

Otherwise:

Therefore, we keep the sum of the weights of c1 and c2 fixed. With the sum held at 0.8125, we let the weight of c1 range from 0 to 0.81 (step length 0.01). Correspondingly, the weight of c2 ranges from 0.8125 to 0.0025 (step length -0.01), which gives 82 groups of data in total.
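The sweep described above can be sketched in Python. The weights of c3 and c4 stay at 0.0625 and 0.1250, the weight of c1 runs over 82 grid points, and the weight of c2 takes up the remainder of 0.8125. The per-index school scores and the helper `share_above` are hypothetical and serve only to show the mechanics.

```python
def share_above(scores_by_index, weights, threshold=80.0):
    """Proportion of schools whose weighted score exceeds the threshold."""
    count = 0
    for row in scores_by_index:
        total = sum(w * s for w, s in zip(weights, row))
        if total > threshold:
            count += 1
    return count / len(scores_by_index)

TOTAL_C1_C2 = 0.8125      # 0.5000 + 0.3125, held fixed during the sweep
W3, W4 = 0.0625, 0.1250   # weights of c3 and c4, unchanged

# Hypothetical per-index scores (0 to 100) for four schools: c1, c2, c3, c4.
schools = [
    [95, 70, 60, 80],
    [60, 98, 90, 85],
    [85, 85, 85, 85],
    [40, 50, 70, 60],
]

sweep = []
for k in range(82):               # weight of c1 = 0.00, 0.01, ..., 0.81
    w1 = k / 100                  # integer grid avoids accumulated float error
    w2 = TOTAL_C1_C2 - w1         # weight of c2 falls from 0.8125 to 0.0025
    sweep.append((w1, w2, share_above(schools, [w1, w2, W3, W4])))
```

Plotting the third entry of each tuple against the first gives the kind of curve discussed in this section.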

The results are as follows:

From the results above, when the weight of c1 is less than 0.5, the proportion

stays at approximately 1.4%; when the weight of c1 is greater than 0.5, the proportion

declines rapidly and settles at 0.2%. From this, we can infer that a reasonable value for the weight of c1 is approximately 0.5. Choosing a reasonable value for this weight is also extremely important for the stability of the result.
We can also examine how the school selection result responds when the weights of c1, c2 and c4 change.
Similarly, we keep the weight of c3 constant, which means keeping c1 + c2 + c4 = 0.9375. We let the weight of c1 range from 0 to 0.93 (step length 0.01). Correspondingly, we get an incomplete three-dimensional data set: because the sum of the weights of c1, c2 and c4 is constant, there are 2 degrees of freedom. The results are as follows:

We can see from the results that a reasonable value for the weight of c1 is approximately 0.5, while a reasonable value for the weight of c2 is approximately 0.3.

Sensitivity of the Return-On-Investment Model

In the ROI evaluation system, we set the weights of X1 and Y1 to 0.6 and 0.4. To learn more about the effect of these weights, we studied their influence on the function fitting. Specifically, we computed how the adjR^2 of the least-squares fit changes for the three functional forms relating the ROI score to investment (shown in Model 3). Since X1 + Y1 = 1, we increased the weight of X1 from 0 to 1 in steps of 0.01, which gives 101 sets of data. Because the sum of the two weights is constant, there is one degree of freedom, so the relationship between the weight of X1 and adjR^2 can be displayed as a one-dimensional curve.
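The one-dimensional curve of adjR^2 against the weight of X1 can be sketched as below for the linear functional form only. For each of the 101 weights, the two component scores are combined and a least-squares line against investment is fitted in closed form. The component scores, the investment amounts and the helper name are hypothetical.

```python
def adj_r2_simple(x, y):
    """Adjusted R^2 of the least-squares line y = a + b*x (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    r2 = sxy * sxy / (sxx * syy)   # for a single predictor, R^2 = r^2
    return 1.0 - (1.0 - r2) * (n - 1) / (n - 2)

# Hypothetical investment amounts and component scores for six schools.
investment = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x1 = [52.0, 58.0, 61.0, 70.0, 73.0, 80.0]   # component weighted by w
y1 = [75.0, 60.0, 82.0, 66.0, 71.0, 64.0]   # component weighted by 1 - w

curve = []
for k in range(101):                         # w = 0.00, 0.01, ..., 1.00
    w = k / 100
    roi_score = [w * a + (1 - w) * b for a, b in zip(x1, y1)]
    curve.append((w, adj_r2_simple(investment, roi_score)))
```

Repeating the loop with the quadratic and logarithmic fits would give the other two curves.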

As the figures above show, adjR^2 grows quickly as the weight of X1 increases, and its growth rate rises at first. adjR^2 then reaches a peak and begins to decrease slowly. So the choice of weights affects the result of the function fitting. The weight of X1 chosen in this model is 0.6, at which adjR^2 changes slowly. Moreover, the curves of the three functions are close to one another at this point. In conclusion, the model is sensitive to the weights: if the weight is too small, the fit may be poor, while if it is too large, the result is dominated by a single index. The choice of 0.6 is therefore reasonable.

7. Conclusion

What we have done

• We processed the data and solved its missing-data problem, and used PCA (Principal Component Analysis) to reduce the large set of indexes. We used descriptive statistical analysis to observe the features of the data.
• We established an effective model, the “School-Selection Model”, to determine the investment list and explored how this model responds to small changes. We also assessed the advantages and disadvantages of the model.
• We obtained a comprehensive definition system for ROI, which enables us to use schools’ basic index data to reflect their performance after being invested in. We also assessed the advantages and disadvantages of the model.
• We obtained the best investment strategy by solving the optimization problem based on the MV-ROI Model, which yields the highest ROI, and we estimated the investment return produced by this strategy.
Strengths & Weaknesses
Strengths:
• The filling of missing data is accurate. By dividing the data into two categories and using correlation analysis to fill the missing values in the statistical category, we recover the missing data more accurately.
• The models we built are thorough. We considered many aspects to ensure that our model is comprehensive.
• Thorough analysis. We ran many tests to ensure that our models are accurate and practical.
Weaknesses:
• The given indexes are limited, so we could not take into account some reasonable indexes we wanted, which may make our model less accurate.

9. Letter to the CFO of the Goodgrant Foundation
Dear Mr. Alpha Chiang,
It is our great honor to participate in the modelling process of your intended investment program. After careful consideration and analysis, our team finally comes up with an optimal investment strategy that we would like to present to you at the very beginning.
We suggest that your foundation invest in 33 colleges and universities in the United States, including Colorado School of Mines, College of William and Mary and MCPHS University. To visualize our strategy, the table below lists the top 5 schools with the investment amount per year for each institution. The smaller the sequence number, the more strongly we recommend the school as a target for your investment.

Based on the data we extracted from online datasets, we estimate that the ROI of this strategy is 82.213%.
Next, we would like to give a brief introduction to our basic concepts and approaches.
When it comes to investment decisions, the first step is always to figure out what kind of institutions should be given priority and what specific characteristics your organization values.
After collecting relevant information, we found that most traditional organizations, such as the Gates Foundation, tend to invest in elite students, so the top universities are always their first choices. But for an innovative organization such as Goodgrant, it is time to carve a new path. Therefore, we identify four determinant aspects for evaluating whether a school is worth investing in: its support for disadvantaged groups, the potential of its undergraduates, its credit status and its present educational resources. Based on the indexes of these four factors, we give a weighted score to each institution and give priority to those that stand out.
After we select the candidate list of schools, a measure is needed to evaluate the effect. That is to say, we need to compare the return on every dollar that Goodgrant is going to invest. To define the return, we consider two perspectives, namely educational performance and social influence. The improvement in students’ educational performance can be measured by comparing the input and the output in the short term. Input factors are supportive resources, school infrastructure and scientific research. Output factors mainly include degree completion in school and career achievements after graduation. In general, the investment is more effective if it yields better outputs with fewer inputs. As for social influence, the aim is to make undergraduate education more accessible and feasible for poor students and minority groups, which is often a long-run effect but is of great significance in promoting fairness and openness.
Under the principle of maximizing the estimated return for a given amount of money, we arrive at the specific investment strategy presented at the beginning of the letter. Although this strategy depends to some extent on our own perception of the roles and pursuits of the Goodgrant Foundation, we believe it is reasonable to keep a balance between individual benefits and social benefits. Since the strategy we propose is based on authoritative data and scientific models, we believe it is a reliable and optimized way to carry out your investment. What’s more, our concept of “return on investment” is applicable to further evaluation of your donations and educational investments. For all these reasons, we recommend this strategy to you and hope that it will be adopted.
Regards,
Team 47598