MA705 Fall 2019

Author: Internet


Project 3 Assignment
Date Assigned: Mon, 18 Nov, 2019
One of my friends (Tim Carter, no relation to me) is the president of Second Nature (https://secondnature.org/), a Cambridge-based non-profit organization that works with colleges and universities to facilitate positive climate action. More information about their goals is available from their website, such as https://secondnature.org/mission/.
They collect data from “signatories,” schools who have chosen to join their program and voluntarily state climate action goals and report their progress toward those goals. You can browse the data online at http://reporting.secondnature.org/ but it is not possible to download the full dataset from there; Second Nature has been kind enough to provide it to us in Excel form, and you can download it in that form from our Blackboard site.
Our class has been asked by Dr. Tim Carter of Second Nature to consider their dataset and tell them any insights that we find in it that help them further their mission.
Unlike previous assignments, this is not an imaginary scenario. Second Nature will appreciate any insights we share and may use them to impact the planet positively.
Second Nature has not had analytics done on their data regularly. The last time was by a firm called Fovea (https://www.foveaservices.com/) in late 2017. You can see the report of that analysis in the following webinar, from about 8:15 to 38:30. While Fovea provided important insights, I do not share this webinar to suggest that you should do a similar analysis. Rather, it can provide context and be an example of what helpful, goal-oriented, professional analytics looks like. The first 8:15 of the video are also helpful for context.
https://secondnature.org/webinars/analyzing-climate-leadership-network-progress/

Recall that this project is to be done individually. Students should not share work with one another, nor represent anyone else’s work as their own. If you’re uncertain what’s allowed, ask me and/or refer to the Bentley academic integrity policy. If you’re uncertain as to whether you need to cite an external source, I’m glad to discuss such issues with you to help you make the correct choice.

Here are the specific deliverables for this project (to be submitted together in a .zip file, or shared with your instructor using OneDrive or other cloud provider):
The third-to-last page of this document is entitled “Data Insight for Second Nature.” It shows how to write a one-page insight/recommendation with evidence from the data, with an optional one-page appendix. I will call each of these an “Insight Report.” You should produce three Insight Reports, each on a separate page. See an example on the last two pages of this document.
Unless you specify otherwise, I will share these with the team at Second Nature.
During the grading process, I will highlight for them which Insight Reports I think are the most valuable, in case they do not have time to read all 3 insights from all 65 students.
If you would prefer that I not share your work with Second Nature but use it only for your grade in this course, tell me so when you submit your work. I will, of course, respect your wishes.
Also submit your final merged dataset as a Python Pickle (.pkl) file.
It should include data from all of the following datasets, uniquely indexed by IPEDS ID, one school per row.
Homeland Infrastructure Foundation (see in-class work, Nov. 13)
US News Rankings (see in-class work, Nov. 13)
Chronicle of Higher Education (see in-class work, Nov. 20)
Second Nature (see in-class work, Nov. 20)
Optionally, any other datasets that you would like to include. For example, one student suggested the following very large dataset might be helpful: https://collegescorecard.ed.gov/data/ (Thanks, Michael!)
In the merged dataset, name columns as follows:
Columns from the HIF dataset should be named HIF_ followed by the original column name in lower case, such as HIF_name, HIF_address, HIF_city, HIF_state, HIF_zip, HIF_latitude, HIF_longitude, etc. Include at least these columns, plus whatever other ones you find relevant. It is not necessary to include all HIF columns.
Columns from the USNews dataset should be named USN_rank, USN_description, USN_tuition and fees, USN_in-state tuition and fees, USN_undergrad enrollment.
Columns from the Chronicle dataset should be named CHR_level, CHR_student_count, CHR_fte_value, CHR_med_sat_value, CHR_endow_value, etc. Include at least these columns, plus whatever other ones you find relevant. It is not necessary to include all CHR columns.
Columns from Second Nature should be named SN_xxxx, where you can choose whichever columns you find relevant to include. It is not necessary to include all SN columns.
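The naming and indexing rules above can be sketched roughly as follows. This is only a minimal illustration under assumptions, not the required method: the two tiny frames are made-up stand-ins for the real datasets, the source key names (`IPEDSID`, `ipeds_id`), the helper `prefixed`, and the choice of a left join are all hypothetical, and your own source files may need different handling.

```python
import pandas as pd

# Made-up stand-ins for two of the source datasets (not real data).
hif = pd.DataFrame({
    "IPEDSID": [100001, 100002, 100003],
    "NAME": ["Alpha College", "Beta University", "Gamma Institute"],
    "STATE": ["MA", "NY", "CA"],
})
chron = pd.DataFrame({
    "ipeds_id": [100002, 100001],
    "level": ["4-year", "4-year"],
    "student_count": [12000, 3500],
})

def prefixed(df, key, prefix, lower=False):
    """Index df by its IPEDS ID column and prefix the remaining column names.

    Hypothetical helper for illustration only.
    """
    out = df.set_index(key)
    out.index.name = "ipeds_id"
    if lower:
        # HIF columns are to be renamed in lower case.
        out.columns = out.columns.str.lower()
    return out.add_prefix(prefix)

hif_p = prefixed(hif, "IPEDSID", "HIF_", lower=True)
chr_p = prefixed(chron, "ipeds_id", "CHR_")

# Assumption: a left join keeps every HIF school; schools that are
# missing from the Chronicle data get NaN in the CHR_ columns.
merged = hif_p.join(chr_p, how="left")

# One school per row: the index must contain no duplicate IPEDS IDs.
assert merged.index.is_unique
print(list(merged.columns))
```

Indexing every source by IPEDS ID before joining means the join itself never has to guess which columns are keys, and the uniqueness check catches duplicated schools early.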
I will test your work by loading your dataframe with pd.read_pickle() and then executing some code on various columns to ensure that they seem to contain the correct data. I will provide you a portion of my testing code so that you can check your own work before submitting (see in-class work, Nov. 20).
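Before submitting, a self-check along these lines may be useful. It is a sketch under assumptions and is not the instructor's testing code: the small frame is made-up data, and the function `check_dataset`, the index name `ipeds_id`, and the file name `merged.pkl` are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Made-up merged dataset, indexed by IPEDS ID (not real data).
merged = pd.DataFrame(
    {
        "HIF_name": ["Alpha College", "Beta University"],
        "HIF_state": ["MA", "NY"],
        "USN_rank": [12, 48],
    },
    index=pd.Index([100001, 100002], name="ipeds_id"),
)

def check_dataset(df, required_columns):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if not df.index.is_unique:
        problems.append("index has duplicate IDs")
    missing = [c for c in required_columns if c not in df.columns]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    return problems

# Round-trip through a pickle file, the same way the file would be loaded.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "merged.pkl")  # hypothetical file name
    merged.to_pickle(path)
    loaded = pd.read_pickle(path)

problems = check_dataset(loaded, ["HIF_name", "HIF_state", "HIF_zip"])
print(problems)  # ['missing columns: HIF_zip']
```

Checking the reloaded frame rather than the in-memory one confirms that what is on disk is what will actually be tested.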
How you will be graded:
Things that are easy to do right if you just pay attention (20% total):
Submitted each one-page Insight Report as described in this assignment (each with an optional one-page appendix) 15% (3x5%)
Submitted your dataset pickle file as described in this assignment 5%
Things that are easy to do right if you put in the effort (30% total):
Your work looks professional, ready to hand off to an outside client 5%
The Explanation section of each of your Insight Reports is clear and well-written 15% (3x5%)
Your work uses a few different techniques that we learned in our class to accomplish its goals (such as filtering, merging, visualizing, simulating, grouping, imputing, pivoting, etc.) 10%
Things that take effort at doing math/stats/coding/business well (40% total):
The dataset in your pickle file passes the tests I will run on it for correctness 20%
The Explanation section of each of your Insight Reports uses reasoning, mathematics, statistics, and/or programming correctly 15% (3x5%)
At least one of your insights could make a positive impact on Second Nature 5%
Subjective elements that you can use to show excellence (10% total):
At least one of your insights showed creative/innovative thinking 5%
At least one of your Insight Reports makes me excited and/or proud to hand that work of Bentley students to our client, because I think they will be impressed 5%
Total 100%

Data Insight for Second Nature
Submitted by: [PLACE THE NAMES OF YOUR TEAM MEMBERS HERE]
December 15, 2019

Insight
[STATE YOUR INSIGHT, IN ONE BRIEF SENTENCE. AIM FOR ZERO TECHNICAL TERMS. USE BOLD FONT. TRY TO MAKE THE READER EXCITED/INTERESTED IN THE INSIGHT.]

Explanation
[PROVIDE A SHORT PARAGRAPH HERE ADDING ANY DETAILS YOU FEEL ARE RELEVANT AND EXPLAINING WHY THE INSIGHT IS VALUABLE. MAKE IT CLEAR HOW YOUR INSIGHT COMES FROM THE DATA. STILL TRY TO AVOID TECHNICAL DETAILS. INCLUDE A GRAPH IF RELEVANT. IF YOUR INSIGHT SENTENCE SUCCEEDED AT GRABBING THE READER’S ATTENTION, SO THAT THEY WANT TO LEARN MORE, THIS IS WHAT THEY WILL READ NEXT, SO KEEP THAT IN MIND.]

Methodology
[EXPLAIN WHAT COMPUTATIONS/INVESTIGATIONS YOU DID TO ACHIEVE THE INSIGHT. TECHNICAL TERMS ARE ACCEPTABLE HERE, IF NEEDED. FOR EXAMPLE, YOU MIGHT SAY WHICH DATA YOU USED, WHICH STATISTICAL METHOD YOU USED (IF ANY), WHICH VARIABLES YOU INCLUDED (AND WHAT THEY MEAN IF IT ISN’T OBVIOUS), ETC. THIS IS FOR THE READER WHO IS TECHNICAL AND WANTS TO KNOW HOW YOU WENT ABOUT REACHING THE CONCLUSIONS YOU STATED ABOVE, SO THEY CAN BE SURE YOU DID YOUR WORK CORRECTLY/SENSIBLY.]

[ALL OF THIS SHOULD FIT ON ONE PAGE. IF YOU NEED TO INCLUDE A LOT OF TECHNICAL INFORMATION, LIKE A LARGE TABLE OR A STATISTICAL PRINTOUT OR SOME EXAMPLE CODE, YOU MAY CREATE AN “APPENDIX” HEADER ON THE SECOND PAGE, AND USE JUST THAT ONE ADDITIONAL PAGE. WHEN PRINTED OUT, THE APPENDIX WOULD BE ON THE REVERSE SIDE OF THIS PAGE. IF YOU DID SOME PARTICULARLY INTERESTING PYTHON WORK, FEEL FREE TO USE THE APPENDIX TO SHOW IT OFF SO THAT I CAN GIVE AN APPROPRIATE GRADE.]

Data Insight for Second Nature
Submitted by: Nathan Carter (as an example for the class to read)
December 15, 2019

Insight
Second Nature should seek more signatories among large, state universities who have won major football championships in the past three years.

Explanation
Obviously, this is a silly insight, but if it were real, I might say something like this: The data shows that institutions with football championships are underrepresented among Second Nature’s signatories (7.1% as opposed to a national average of 18.9%). And yet they have a lot in common with existing signatories; they sit within one standard deviation of the mean (among Second Nature signatories) on three major metrics: total undergraduate enrollment, total endowment, and US News ranking. Most importantly, these schools could make a big impact if they joined, because their size is typically quite large. These facts are all summarized in the following figure.

Methodology
Data on universities was extracted from the Homeland Infrastructure Foundation, US News, and Chronicle of Higher Education datasets. We merged this with climate commitment data from Second Nature, then added sports data obtained from www.myfootballchamps.com, using their Python API. That API indexes institutions by IPEDS ID, so we could reliably match them into our existing dataset, using that standard identifier.

We then generated the chart above using data visualization methods in Python, and verified the trend using the well-known GKRPZ test invented by Nancy Pelosi and Mariah Carey in 1925. The relevant Python code and statistical output are shown in the appendix, for reference.

Appendix
Descriptive statistics
This data was extracted from our dataset with simple Python queries. (OK, not really, because this example is 100% made up, but for your work, you could say something like that.)
Institution group | Number of institutions in the nation | Number of institutions as Second Nature signatories | Percent
Large, state schools with football championships | 14 | 1 | 7.1%
All institutions | 7523 | 1422 | 18.9%
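The percentages in this made-up table are simply signatories divided by institutions. A short sketch of that arithmetic, using the fabricated counts from the example (the function name `percent_signatories` is hypothetical):

```python
def percent_signatories(signatories, total):
    """Share of institutions that are signatories, as a percentage to one decimal."""
    return round(100 * signatories / total, 1)

# Counts taken from the fabricated example table.
football = percent_signatories(1, 14)
overall = percent_signatories(1422, 7523)

print(football, overall)  # 7.1 18.9
```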

Similarity of football schools to existing signatories
This data was extracted from our dataset with simple Python queries. (OK, not really, because this example is 100% made up, but for your work, you could say something like that.)
Statistic | Total UG enrollment | Total endowment | US News Ranking
Mean (within SN) | 18,951 | $98.4M | 36.1
Std. Dev. (within SN) | 2,845 | $33.6M | 12.2
Range (1 std. of mean) | 16,106-21,796 | $64.8M-$132.0M | 23.9-48.3
Min (football schools) | 19,667 | $65.5M | 35
Max (football schools) | 20,035 | $112.0M | 48
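The "within one standard deviation" claim amounts to checking that the minimum and maximum for the football schools both fall between mean minus std. dev. and mean plus std. dev. A minimal sketch using the fabricated figures from this example (the names `one_sd_range` and `metrics` are hypothetical):

```python
def one_sd_range(mean, sd):
    """Return the interval one standard deviation either side of the mean."""
    return (mean - sd, mean + sd)

# (mean within SN, std. dev. within SN, min for football schools, max for football schools)
# All values come from the fabricated example table.
metrics = {
    "Total UG enrollment": (18951, 2845, 19667, 20035),
    "Total endowment ($M)": (98.4, 33.6, 65.5, 112.0),
    "US News ranking": (36.1, 12.2, 35, 48),
}

results = {}
for name, (mean, sd, group_min, group_max) in metrics.items():
    low, high = one_sd_range(mean, sd)
    # The whole group sits within one std. dev. only if both extremes do.
    results[name] = low <= group_min and group_max <= high

for name, inside in results.items():
    print(name, "within one std. dev.:", inside)
```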

Python code for data visualization

(Most Python code is not interesting, but if your crucial code is worth sharing, you may.)

Statistical software output for the GKRPZ test
Since this example is completely fabricated, and there really is no such test, I don’t have anything to say here. If this were a real test, I might show the output of the statistical software, like the completely unrelated (but realistically nerdy) tables below, with a suitable explanation of what it meant (not provided here in this pretend example):

Source: https://www.cnblogs.com/studydotnet2/p/12040075.html