Sagan Exoplanet Workshop Day 1

Morning

Eric Feigelson (Penn State) discusses Statistics and the Astronomical Enterprise, and why statistics is so essential to the discovery and study of exoplanets. He covers the history and development of statistics in relation to astronomy and the present state of the field, walks through key examples of essential statistical applications in astronomy and astrophysics, and gives an outlook on potential future developments and applications. Practical computing implementations in R are also shown.

Jessi Cisewski (Yale) goes over Bayesian Methods. She covers the basics of Bayesian analysis (Bayes’ Theorem, prior and posterior distributions, and inference with posteriors). Examples of different analyses using different models and distributions are given. Classical/Frequentist approaches are contrasted with the Bayesian approach, and best practices are covered.
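
For flavor, here is a minimal sketch of a conjugate Bayesian update in Python (my own toy example, not from the talk): a Beta prior on a success probability combined with binomial data gives a Beta posterior in closed form.

    # A toy Bayesian update with a conjugate prior (hypothetical numbers):
    # estimating a detection efficiency p from k "successes" in n trials.
    import numpy as np
    from scipy import stats

    a_prior, b_prior = 2, 2          # Beta(2, 2) prior: mild preference for p ~ 0.5
    n, k = 50, 31                    # hypothetical data: 31 successes in 50 trials

    # Conjugacy: Beta prior + Binomial likelihood -> Beta posterior
    a_post, b_post = a_prior + k, b_prior + (n - k)
    posterior = stats.beta(a_post, b_post)

    print("posterior mean:", posterior.mean())
    print("95% credible interval:", posterior.interval(0.95))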

Xavier Dumusque (Université de Genève) and Nikole Lewis (STScI) introduce the hands-on sessions of the week. Xavier covers disentangling stellar activity signals from planetary signals in radial velocity data, and Nikole covers detecting exoplanets in upcoming JWST data and analyzing their spectra.

Afternoon

David Kipping (Columbia) gives A Beginner’s Guide to Markov Chain Monte Carlo (MCMC) Analysis. MCMC is presented through worked examples as a way to sample a posterior distribution. The Metropolis and Metropolis-Hastings algorithms are discussed, along with situations where other algorithms may be more or less effective.
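
As a rough illustration of the Metropolis algorithm (my own sketch, not Kipping’s code), the core loop is just a random-walk proposal plus an accept/reject step on the posterior ratio:

    # Minimal Metropolis sampler for a 1-D unnormalized posterior.
    import numpy as np

    def log_post(theta):
        # Hypothetical unnormalized log-posterior: a standard normal for simplicity.
        return -0.5 * theta**2

    rng = np.random.default_rng(42)
    n_steps, step_size = 10000, 1.0
    chain = np.empty(n_steps)
    theta = 0.0                       # starting point
    for i in range(n_steps):
        proposal = theta + step_size * rng.normal()
        # Accept with probability min(1, p(proposal)/p(theta))
        if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
            theta = proposal
        chain[i] = theta

    burn_in = 1000                    # discard early samples before convergence
    print("posterior mean ~", chain[burn_in:].mean())
    print("posterior std  ~", chain[burn_in:].std())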

POPs Session I

Slides here.

  • Ines Juvan (Space Research Institute Graz) – PyTranSpot – A tool for combined transit and stellar spot light-curve modeling
  • Anthony Gai (Univ. at Albany) – Bayesian Model Testing of Models for Ellipsoidal Variation on Stars Due to Hot Jupiters
  • Emiliano Jofre (OAC-CONICET) – Searching for planets in southern stars via transit timing variations
  • Sean McCloat (Univ. of North Dakota) – Follow-up Observations of Recently Discovered Hot Jupiters
  • Romina Petrucci (OAC-CONICET) – A search for orbital decay in southern transiting planets
  • Luke Bouma (MIT) – What should we do next with the Transiting Exoplanet Survey Satellite?
  • Akshata Krishnamurthy (MIT) – A precision optical test bench to measure the absolute quantum efficiency of the Transiting Exoplanet Survey Satellite CCD detectors

David Kipping (Columbia) teaches us about using Bayesian Priors for Transits and RVs. Selection, implementation, strengths and weaknesses, and ideal use cases for different types of priors are discussed.
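
As a toy illustration of how much the choice of prior can matter (my own sketch; the ranges are arbitrary, not from the talk), compare a uniform and a log-uniform prior on an RV semi-amplitude K:

    # Uniform vs. log-uniform (Jeffreys-like) prior on a semi-amplitude K.
    # A uniform prior piles most of its mass at large K, while a log-uniform
    # prior spreads weight evenly across decades.
    import numpy as np

    rng = np.random.default_rng(0)
    K_min, K_max = 0.1, 100.0         # m/s, hypothetical allowed range

    K_uniform = rng.uniform(K_min, K_max, size=100000)
    K_loguniform = np.exp(rng.uniform(np.log(K_min), np.log(K_max), size=100000))

    # Fraction of prior mass below 1 m/s (the "small planet" regime)
    print("uniform prior,     P(K < 1 m/s):", np.mean(K_uniform < 1.0))
    print("log-uniform prior, P(K < 1 m/s):", np.mean(K_loguniform < 1.0))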

POPs Session II

Slides here.

  • Allen Davis (Yale) – Assessing the information content of spectra using PCA
  • Matteo Pinamonti (Univ. of Trieste) – Searching for planetary signals in Doppler time series: a performance evaluation of tools for periodogram analysis
  • Keara Wright (Univ. of Florida) – Stellar Parameters for FGK MARVELS Targets
  • Richard Hall (Univ. of Cambridge) – Measuring the Effective Pixel Positions of the HARPS3 CCD
  • Sarah Millholland (UCSC) – A Search for Non-Transiting Hot Jupiters with Transiting Super-Earth Companions
  • Tarun Kumar (Thapar Univ.) – Radial Velocity Curves of Polytropic Models of Stars of Polytropic Index N=1.5
  • Fei Dai (MIT) – The K2-ESPRINT Collaboration

Eric Feigelson (Penn State) covers Statistical Approaches to Exoplanetary Science. The talk focuses on time series analysis. Parametric and nonparametric methods are shown for time-domain and frequency-domain problems. Code examples and other potentially useful methods are given.
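
A minimal frequency-domain example in Python (my own sketch, not Feigelson’s R code), assuming a reasonably recent astropy, is a Lomb-Scargle periodogram of an unevenly sampled time series:

    # Recover a periodic signal from irregularly sampled data.
    import numpy as np
    from astropy.timeseries import LombScargle

    rng = np.random.default_rng(1)
    t = np.sort(rng.uniform(0, 100, 300))                   # irregular times (days)
    y = 2.0 * np.sin(2 * np.pi * t / 7.3) + rng.normal(0, 1.0, t.size)  # 7.3-day signal

    frequency, power = LombScargle(t, y).autopower()
    best_period = 1.0 / frequency[np.argmax(power)]
    print("recovered period ~ %.2f days" % best_period)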

dotAstronomy Day 1

James Webb Space Telescope and Astronomy

Sarah Kendrew (ESA, STScI)

JWST goes well into the infrared
Launch Autumn/winter 2018 — lots of things that can go wrong, but these engineers are awesome.
Science proposals start November 2017.
Routine science observations start six months after launch.
Compared to next-gen observatories, JWST is an old school telescope. We can bring it into the 21st century with better tools for research.
Coordination of development tools with Astropy developers.
Watch the clean room live on the WebbCam (ha!).

Bruno Merin (ESA)

ESASky – a Multi-Mission Interface

Open Source Hardware in Astronomy

Carl Ferkinhoff (Winona State University)

hardware.astronomy
Bringing the open hardware movement to astronomy
1) Develop low(er)-cost astronomical instruments
2) Invest undergrads in the development (helps keep costs low)
3) Make hardware available to the broader community
4) Develop an open standard for hardware in astronomy

Citizen Science with the Zooniverse: turning data into discovery

Ali Swanson (Oxford)

Crowdsourcing has been proven effective at dealing with large, messy data in many cases across different fields.
Amateur consensus agrees with experts 97% of the time (experts agree with each other 98% of the time), and the remaining 3% are deemed “impossible” even by experts.
Create your own zooniverse!

Gaffa tape and string: Professional hardware hacking (in astronomy)

James Gilbert (Oxford)

Spectra with fiber optic cables on a focal plane.
Move the cables to new locations.
Use a ring-magnet and piezoelectric movement to move “Starbugs” around — messy, inefficient.
Prototyped a vacuum solution that worked fine! This is now the final design.
Hacking/lean prototypes/live demos are effective in showing and proving results to people. Kinks can be ironed out later, but faith is won in showing something can work.

Open Science with K2

Geert Barentsen (NASA Ames)

Science is woefully underfunded.
Qatar World Cup ($220 billion) vs. Kepler mission ($0.6 billion)
Open science disseminates research and data to all levels of society.
We need more than a bunch of papers on the arXiv.
Zooniverse promotes active participation.
K2 mission shows the impact of extreme openness.
Kepler contributed immensely to science, but it was closed.
Large missions are too valuable to give exclusively to the PI team — don’t build a wall.
Proprietary data slows down science, misses opportunities for limited-lifetime missions, blocks early-career researchers, and reduces diversity by favoring rich universities.
People are afraid of getting scooped, but we can have more than one paper.
Putting work on GitHub is publishing, and getting “scooped” is actually plagiarism.
K2 is basically a huge hack — using solar photon pressure to stabilize one axis after two of Kepler’s reaction wheels failed.
Open approach: no proprietary data, funding other groups to do the same science, requires large programs to keep data open.
K2 vs. the original Kepler mission: the broken spacecraft with a 5x smaller budget has more authors and more publications, and more of them are early-career researchers because all the data is open. That’s a 2x increase in authors, and a fairer representation of the astro community.
Call to action: question restrictive policies and proprietary periods. Question the idea of one paper for the same dataset or discovery. Don’t fear each other as competition — fear losing public support.
The next mission will have open data from Day 0 thanks to K2.

Lightning Talks

#foundthem

Aleks Scholz (University of St Andrews)

SETI, closed science vs open science and communicating with the public.

astrobites

Ashley Villar (Harvard)

Send your undergrads to Astrobites! Advice, articles, tutorials.

There is no such thing as a stupid question comic book

Edward Gomez (Las Cumbres Observatory)

Neat astro comic book for kiddos.

Astronomy projects for the blind and visually impaired

Coleman Krawczyk (University of Portsmouth)

3D printing galaxies as a tool for the blind.

NOAO Data Lab

Matthew Graham (Caltech/NOAO)

Classifying Stellar Bubbles

Justyn Campbell-White (University of Kent)

Citizen science data being used in a PhD project.

The Pynterferometer

Adam Avison (ALMA)

A short history of JavaScript

William Roby (Caltech)

JavaScript is more usable thanks to ES6, and it follows functional principles. Give it another try if you’ve written it off!

Asteroid Day – June 30th, 2016

Edward Gomez (Las Cumbres Observatory)

International effort to observe NEAs with Las Cumbres.

The Most Interesting Research at SCMA VI

I’m spending this week at Carnegie Mellon for the 6th quinquennial Statistical Challenges in Modern Astronomy conference. I’ll keep this post updated with what I think are the most interesting research talks and posters (as judged by this statistically insignificant person’s opinion).

Day 1

Jessi Cisewski – Approximate Bayesian Computation for the stellar initial mass function
Uros Seljak – Optimal extraction of cosmological information from large scale structure
Kendrick Smith – CHIME: the Canadian Hydrogen Intensity Mapping Experiment
Elise Jennings – CosmoSIS: a flexible framework for parameter inference

Thoughts on the Potential of Open Data in Cities

The promise of Open Data has drawn most major US cities to implement some sort of program making city data available online and easily accessible to the general public. Citizen hackers, activists, news media, researchers, and more have all made use of the data in novel ways. However, these uses have largely been more information-based than action-based, and there remains work to be done in using Open Data to drive decisions in government and policy-making at all levels, from local to federal. Below I present some of the challenges and opportunities in making use of Open Data in more meaningful ways.

Challenges

Standardization and Organization

Open Data is dirty data. There is no set standard between different cities for how data should be formatted, and even similar datasets within a city are often not interoperable. Departments at all levels of government often act independently in publishing their data, so even if most datasets are available from the same repository (e.g. Socrata), their organization and quality can differ significantly. Without a cohesive set of standards between cities, it is difficult to adapt applications built for one city to others.

Automation

The way data is uploaded and made accessible must be improved. Datasets are often frozen and uploaded in bulk, so when someone downloads a dataset, they download it for a particular period in time; if they want newer data, they must either wait until it is released or find the bulk download for the newer data. This involves more human effort both in uploading the data and in downloading and processing it. Instead, new data should be made immediately accessible as a stream, with old data going back as far in time as it is available. This allows someone to access exactly as much data as they need without the hassle of combing through multiple datasets, and it removes the curator’s need to constantly compile and update newer datasets.
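
As a rough sketch of what consuming data as a stream could look like from the user’s side (the endpoint and dataset ID below are hypothetical; the $limit/$offset paging parameters follow the Socrata SODA convention):

    # Pull records incrementally from an open-data endpoint instead of a bulk file.
    import requests

    BASE = "https://data.example-city.gov/resource/abcd-1234.json"  # hypothetical endpoint
    page_size, offset, rows = 1000, 0, []
    while True:
        batch = requests.get(BASE, params={"$limit": page_size, "$offset": offset}).json()
        if not batch:
            break
        rows.extend(batch)
        offset += page_size

    print("fetched", len(rows), "records without touching a bulk download")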

Accessibility

Compared to the amount of data that the government stores, very little of it is digital, and very little of what is digital is publicly available. The filing cabinet should not be part of the government’s storage media; making all data digital from the start makes it simpler to analyze and release. Finally, much of the data the government does release is in awkward formats such as XLSX and PDF that are not easily machine-readable. If the data is not readily available and easily accessible, it in effect does not exist.

Transparency

Most of the publicly accessible records that the government holds are not readily available unless FOILed. The transparency argument for Open Data could be taken to a completely new level of depth and thoroughness if information at all levels of government were made readily available digitally as soon as it is generated. Law enforcement records, public meetings, political records, judicial records, finance records, and any other operation of government that can be publicly audited by its people should be digitally available to the public from the moment it is entered into a government system.

Private Sector Data

Companies such as Uber and Airbnb have come to collect immense amounts of data on transportation and real estate that have historically fallen under regulated jurisdiction. Agreements should be reached with private companies to allow governments to access as much data as is necessary to ensure proper regulation of these utilities. This data should in turn be added to the public record along with official government data on these utilities.

Opportunities

Analytic Technologies

Policy-making should be actively informed by the nature of a constituency. Data-driven decision making is much hyped, but making it a reality requires software that easily and quickly gives decision-makers the information they need. From the city to the federal level, governments should have dashboards that summarize information on all aspects of citizens’ lives. These dashboards can contain information about traffic, pollution, crime, utilities, health, finance, education, and more. Much of this data already exists within governments, and surely some dashboards analyze and visualize these properties individually, but combining all available data on the population of a city can give significantly more insight into a decision than any one of these datasets alone.

Predictive Technologies

Governments have data going far back into history. Cities like New York have logged every service request for years, and that data is readily available digitally. Using the right statistical analysis on periodic data like heating requests, cities can start to predict which buildings might be at risk for heating violations in the winter and address such issues before they happen. The same can be applied to potholes, graffiti, pollution issues, and essentially any city-wide phenomenon that occurs regularly. More precise preventative measures can be taken with more confidence, and eventually, many 311 calls could become unnecessary altogether.
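
A toy sketch of the idea in Python, assuming a hypothetical CSV export of 311 heating requests with created_date and building_id columns:

    # Flag buildings with heating complaints in every winter on record as
    # candidates for proactive inspection before the next heating season.
    import pandas as pd

    df = pd.read_csv("heating_requests.csv", parse_dates=["created_date"])  # hypothetical file
    df = df[df["created_date"].dt.month.isin([10, 11, 12, 1, 2, 3])]        # heating season only
    # Label each request with the winter it belongs to (Jan-Mar -> previous year's winter)
    df["winter"] = df["created_date"].dt.year - (df["created_date"].dt.month <= 3)

    winter_counts = (df.groupby(["building_id", "winter"])
                       .size()
                       .unstack(fill_value=0))

    repeat_offenders = winter_counts[(winter_counts > 0).all(axis=1)]
    print(repeat_offenders.sort_values(winter_counts.columns[-1], ascending=False).head())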

Future Outlook

These ideas have the potential to radically change the way we engage with our cities and our politics. We can make decisions based unambiguously on what is happening in the world, and we can refine those decisions based on measured changes in the world over time. A population can know exactly whether its citizens are getting healthier, safer, and smarter, and how to aid in these pursuits. Areas of governance that need more attention, and potential approaches to them, will become increasingly obvious as more information is combined and analyzed in meaningful ways. Decisions can be made with more confidence, and their outcomes evaluated, through a more rigorous process. By making the most of Open Data, we can go beyond interesting information and begin to drive political action that directly benefits our cities, states, and nation.

A Note on Privacy

All of the ideas presented above have serious implications for the privacy of individuals and populations. These ideas have only considered the best-case uses of data in our society. Whether a government is analyzing granular data or data on a population in bulk, care must be taken to respect the privacy of its citizens. There is ongoing dialogue about how to balance data collection and privacy, and it is essential that governments and citizens take part in this dialogue as new technologies are developed and our societies become more data-driven.

Science on Supercomputers

A slice of the universe, created on Stampede.
Our simulations are far too computationally intensive to run on normal computers, so they’re run on the Stampede supercomputer at the Texas Advanced Computing Center. Stampede is a massive cluster of computers. It’s made up of 6400 nodes, and each node has multiple processors and 32GB to 1TB of RAM. The total system has 270TB of RAM and 14PB of storage. It’s hard to put these numbers into terms we can compare to our laptops, but essentially, this is enough computing power to simulate the universe with.
Stampede
Sometimes people ask if I use the telescope on top of Pupin, and when I answer “No,” they wonder what on earth I’m doing with my time. Mostly I write code and run scripts. This sort of astrophysics sounds unglamorous, but it amazes me. All I need is my computer and an internet connection, and I have the real universe and any theoretical universe I can dream up at my fingertips. Computers and the internet have completely changed the way we do science, and Stampede is just one reminder of the capability and potential of these new scientific tools.

The Importance of Data Visualization in Astronomy

It is difficult to overstate the importance of data visualization in astronomy and astrophysics. Behind every great discovery, there is some simple visualization of complex data that makes the science behind it seem obvious. As good as computers are becoming at making fits and finding patterns, the human eye and mind are still unparalleled when it comes to detecting interesting patterns in data and reaching new conclusions. Here are a few of my favorite visualizations that simply illustrate complex concepts.

Large Scale Structure

As we’ve mapped increasing portions of the known universe, we’ve discovered astounding structures on the largest scales. Visualizing this structure in 2D or 3D maps gives us an intuitive grasp of the arrangement of galaxies within the universe and the forces behind the creation of that structure.

Galaxy filaments

Sloan Digital Sky Survey
The Sloan Digital Sky Survey is a massive scientific undertaking to map objects of the known universe. Hundreds of millions of objects have been observed going back billions of years. It may seem overwhelming to even begin processing this data, but a simple map of the objects in the sky provides immediate insight into the large scale structure of our universe. We find that galaxies are bound by gravity to form massive filaments, and that these filaments must contain mass beyond what we can see (in the form of dark matter) to form these web-like structures.

Fingers of God

Fingers of God
If you plot galaxies to observe large-scale structure, a peculiar pattern emerges. The structures seem to point toward and away from our position in the universe. This violates the Cosmological Principle, which states that no position in the universe should be favored over any other. So why do these filaments seem to point at us? The cause of these “Fingers of God” is an observational effect called redshift-space distortion. Galaxies move under the gravity of their clusters on top of the expansion of the universe, so their redshifts are shifted and their inferred distances get smeared toward or away from us along the line of sight. Correcting for this effect gives us the more random filaments we see above.
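
A toy numerical illustration of the effect (my own sketch, with made-up numbers): peculiar velocities add v_pec/H0 to each galaxy’s redshift-inferred distance, stretching a compact cluster into a “finger” along the line of sight.

    # Smear a mock cluster along the line of sight with random peculiar velocities.
    import numpy as np

    rng = np.random.default_rng(3)
    H0 = 70.0                                   # km/s/Mpc
    d_true = rng.normal(100.0, 1.5, 500)        # true distances: a ~3 Mpc-wide cluster
    v_pec = rng.normal(0.0, 700.0, 500)         # km/s, random motions inside the cluster

    d_inferred = d_true + v_pec / H0            # what we infer from redshift alone

    print("true spread along the line of sight:     %.1f Mpc" % d_true.std())
    print("inferred spread along the line of sight: %.1f Mpc" % d_inferred.std())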

Expansion of the Universe

Hubble's Law
In 1929, Edwin Hubble published a simple yet revolutionary plot. He plotted the distance of galaxies from us, and the velocities at which they moved toward or away from us. What he found was that the farther away a galaxy was, the faster it moved away from us. This could not be the case in what was thought to be a static universe. Hubble’s Law came to prove that our universe was in fact expanding.
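
The relation behind the plot is v = H0 × d. A quick sketch of recovering H0 from distance-velocity pairs with a least-squares fit (the numbers below are made up for illustration, not Hubble’s 1929 measurements):

    # Fit v = H0 * d (no intercept) to hypothetical galaxy data.
    import numpy as np

    d = np.array([10, 50, 120, 300, 500, 800])             # Mpc, hypothetical
    v = np.array([700, 3400, 8500, 21000, 35500, 56000])   # km/s, hypothetical

    H0 = np.sum(d * v) / np.sum(d**2)                       # least-squares slope through origin
    print("H0 ~ %.1f km/s/Mpc" % H0)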

Galaxy Rotation Curves

Rotation Curve of NGC3198
When we plot the rotational velocity of galaxies, we expect the velocity to fall off as the radius increases, based on the mass we can observe. As we get farther from the center and into less dense regions, the enclosed mass grows slowly, so stars should orbit more slowly, much like the outer planets of the Solar System. However, plotting rotation curves reveals something peculiar — the rotational velocity remains roughly constant with radius as you leave the center. This means there must be matter we aren’t seeing: dark matter.
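
A quick sketch of the discrepancy (illustrative numbers, not actual NGC 3198 data): the Keplerian expectation v(r) = sqrt(G·M_visible/r) falls with radius, while observed curves stay roughly flat.

    # Compare a Keplerian falloff from visible mass with a flat observed curve.
    import numpy as np

    G = 4.302e-6                       # kpc * (km/s)^2 / Msun
    M_visible = 5e10                   # Msun, hypothetical visible mass in the inner galaxy
    r = np.linspace(2, 30, 8)          # kpc

    v_keplerian = np.sqrt(G * M_visible / r)
    v_observed = np.full_like(r, 150.0)   # roughly flat at ~150 km/s, for illustration

    for ri, vk, vo in zip(r, v_keplerian, v_observed):
        print("r = %5.1f kpc   expected %6.1f km/s   observed %6.1f km/s" % (ri, vk, vo))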

Our own research

Density Plot
Visualization has proved important in our own research as well. Simple sanity checks on the large-scale structure of our simulations help us make sure they are running properly. Plots of different parameters show simple relationships that arise from the physics of our simulations.

Graphing Google Voice Data

I finally finished off some nice plots of my daily text message history for the past ~40 months. The most difficult part was dealing with Google Voice’s terrible exported HTML format. I will post the Python scripts and more detailed plots and interpretations of the data soon, once things are more polished, so more people can plot out and interpret the mundane details of their lives!
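
The parsing step boils down to something like the sketch below; the directory layout, tags, and class names here are assumptions about the exported HTML and will likely need adjusting for the real export format.

    # Count text messages per day from exported Google Voice HTML files (assumed layout).
    import glob
    from collections import Counter
    from bs4 import BeautifulSoup

    daily_counts = Counter()
    for path in glob.glob("Voice/Texts/*.html"):          # hypothetical export layout
        with open(path, encoding="utf-8") as f:
            soup = BeautifulSoup(f, "html.parser")
        for stamp in soup.find_all("abbr", class_="dt"):  # assumed timestamp markup
            date = stamp.get("title", "")[:10]            # ISO timestamp -> YYYY-MM-DD
            if date:
                daily_counts[date] += 1

    for date, n in sorted(daily_counts.items())[:5]:
        print(date, n)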

Operating on Assumptions

Over the past few weeks, our Design for America team has made a different type of progress in our ongoing research into the state of community and education in Harlem. When we started off, much of the difficulty of moving forward was in organizing all of our thoughts and goals effectively. We had no lack of ideas — we could talk about education for a sustained three hours every Monday night without a problem — but the weeks passed, and we soon found ourselves having actually created little despite having thought of a lot.

The white board after our first meeting — all kinds of ideas, idealism, and potential.

We had researched for weeks and discovered all sorts of useful information, but we could have gone on doing this for months — eventually, we had to converge on some more precise goals and directions forward. Looking back at the pages of research and piles of sticky notes, we converged on a few common themes and narrowed down our target audience and goals. Gaps in reading and literacy as a result of environmental disadvantages consistently came up in our research; parent-student-teacher communication and relationships were regularly found to be critical to early childhood development (parent-student relationships especially affected a child’s later emotional and educational success); and all of this started to set in at an early age. So we decided to focus on elementary school children in the younger end of the K-6 spectrum. Our proximity to the Harlem community and Columbia’s messy history with its residents made it the ideal environment to work with. Now we had a more concrete plan: improve the relationships and interaction between parents, students, and teachers centered around reading, hopefully improving literacy rates and fostering a love for reading in the process.

Looking back, the process of refinement began when we started to question and remove our assumptions. The more speculation and hypotheticals we had, the more thinking and research was being done into another semi-related topic — we could go down all sorts of rabbit holes forever had we kept at it. Eventually, the continuing expansion of the theoretical and speculative aspects of our sessions became counterproductive in the ultimate goal of actually implementing something. What allowed us to focus on something specific was simply getting out into the community, visiting libraries, talking to teachers, reaching out to outreach groups, and filling in our unknowns. In some sense, this validated some of our previous research and thought, but it was even more useful in simply letting us know what we shouldn’t focus on. Removing the extraneous left us free to discover and focus on the underlying core ideas of our project.

Of course, this is a constant work in progress, and given the amount of change we’ve seen in the last month alone, it’s to be expected that our implementation and ideas continue to adjust as we learn more and interact with the community at large. I suppose in the design process, we’re learning how to learn, too.

Reinvent 311 Mobile Content Challenge: Homeless Helper NYC

NYC 311, with help from Stack Exchange, held the Reinvent 311 Mobile Content Challenge, which called on developers to use NYC’s revamped Open 311 Inquiry API to make city information more readily available on people’s mobile devices.

I started out focused on education data, but it was messy and too loosely organized to be of any immediate use. You could extract meaningful information from it, but it would take some cleaning and organizing — it isn’t as simple as displaying the results of an API call on a mobile device.

After looking through more of the available data and considering the different use cases, I settled on an app designed to help homeless people in the city. This seemed like a terrible idea at first — how do you use technology to help those without access to it?

A few ideas came to mind. A map of food banks and soup kitchens (along with directions) could be useful. There were also lists of intake shelters online, but no coherent dataset of shelters, so a map of shelters would also be useful. Finally, there were the information-based services that the city offered — information on food stamps, homeless prevention, youth counseling, job services, and other outreach programs. Putting all of this data together wouldn’t be terribly difficult, and it would create an app that someone might find helpful. Homeless people typically don’t have access to smartphones, but outreach groups like Coalition for the Homeless could use technology to help those without it.

The app is modeled after the data from the API, with different objects for each type of API response. This allowed me to easily create maps given any set of facilities (shelters, food banks, etc.), and information pages given any city service. In this sense, the Open Inquiry API’s design has allowed for flexibility in adapting the app to easily include more data.

One of the main strengths of the Open 311 API is that once an app is created for one city, it should be seamlessly compatible with data from another city, since the API calls are all the same — all that changes is the city that serves up the data. This is a fantastic ideal outcome, but the implementation is slightly off, especially in this particular use case. The data I used needed minor refinements — nothing extreme, but I had to manually decide which services were relevant to this audience, since there is no “homeless” category. I also whipped up some quick Python scripts to add more useful latitude/longitude data from street addresses using the Google Maps Geocoding API.
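
The geocoding step amounted to something like this sketch (the address suffix and key handling are placeholders, not my original script; the endpoint and JSON structure follow the public Geocoding API):

    # Look up latitude/longitude for a street address via the Google Maps Geocoding API.
    import requests

    def geocode(address, api_key):
        resp = requests.get(
            "https://maps.googleapis.com/maps/api/geocode/json",
            params={"address": address + ", New York, NY", "key": api_key},
        )
        results = resp.json().get("results", [])
        if not results:
            return None, None
        loc = results[0]["geometry"]["location"]
        return loc["lat"], loc["lng"]

    lat, lng = geocode("129 Fulton St", "YOUR_API_KEY")   # hypothetical address and key
    print(lat, lng)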

Perhaps the biggest flaw in the data is the lack of information in API calls. One of the goals outlined by the people at NYC 311 was to reduce the number of calls to 311 asking for information that was readily available online. A mobile app is a great way to make this information more immediately available, but for most services, nothing more than a description was offered. The “More Information” and “FAQs” sections simply said “Call 311 for…” — the data isn’t useful if it simply redirects to the old method of calling for information.

Despite the lack of completeness and consistency in the city’s data, the potential for interesting tools and visualizations is clear. There’s a ton of data; the challenge is in sifting through it and making it easily usable.

The demo event itself was fantastic fun. The other contestants had impressive applications, most of which focused on low-income resources and real estate data. I met Joel Spolsky, who offered some sharp and honest advice for the contestants (think StackOverflow, but in person), Noel Hidalgo, who runs BetaNYC as a part of Code for America (check out their projects here), the talented people of StackExchange, designers from HUGE, and a gaggle of kind people from NYC 311. In the end, Homeless Helper NYC won a prize for “Best presentation of 311 information targeting a specific audience,” and it’s out now on the Play Store. You can also contribute to the source code on GitHub!

Sage for Android Testing APK Now Available!

After much revision and cleanup, the Sage Android application is now at a point where most basic features are functional, and bug reporting, feature requests, and general feedback are needed as work on the application progresses. If you’d like to try the latest APK, you can download it here. Features are always being added (an updated APK with History and Favorites will be available soon!), and you can track the latest updates at the GitHub repository.

As always, feedback and suggestions are much appreciated. Thank you!