- Below, we read in lines of data from the Advanced National Seismic System (ANSS), on earthquakes of magnitude 6+, between 2002 and 2016. We display the first 15 lines. (You don’t have to do anything yet.)
anss.lines = readLines("http://www.stat.cmu.edu/~ryantibs/statcomp-F16/data/anss.html")
head(anss.lines, 15)
## [1] "<HTML><HEAD><TITLE>NCEDC_Search_Results</TITLE></HEAD><BODY>Your search parameters are:<ul>"
## [2] "<li>catalog=ANSS"
## [3] "<li>start_time=2002/01/01,00:00:00"
## [4] "<li>end_time=2016/08/01,00:00:00"
## [5] "<li>minimum_magnitude=6.0"
## [6] "<li>maximum_magnitude=10"
## [7] "<li>event_type=E"
## [8] "</ul>"
## [9] "<PRE>"
## [10] "Date Time Lat Lon Depth Mag Magt Nst Gap Clo RMS SRC Event ID"
## [11] "-"
## [12] "2002/01/01 10:39:06.82 -55.2140 -129.0000 10.00 6.00 Mw 78 1.07 NEI 200201014017"
## [13] "2002/01/01 11:29:22.73 6.3030 125.6500 138.10 6.30 Mw 236 0.90 NEI 200201014018"
## [14] "2002/01/02 14:50:33.49 -17.9830 178.7440 665.80 6.20 Mw 215 1.08 NEI 200201024034"
## [15] "2002/01/02 17:22:48.76 -17.6000 167.8560 21.00 7.20 Mw 427 0.90 NEI 200201024041"
- This looks like webpage code mixed in with earthquake data. We don’t care about the first 11 lines, and it looks as if the data we want starts on line 12. Importantly, every line of data begins with a date, of the form YYYY/MM/DD, as in “2002/01/01”. Design a regular expression, call it date.pattern, to match these dates. (Hint: use quantifiers to make date.pattern concise.) Use this and grep() to retrieve the lines of text containing earthquake data. Call the result date.lines. How many lines of data are there? Show the first 2 and the last 2 lines. When was the last earthquake recorded and what was its magnitude?
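As a small illustration of quantifiers with grep(), here is a sketch on four lines copied from the printout above. The names toy.lines and toy.pattern are hypothetical, and the pattern is one possible choice, not necessarily the intended one.

```r
# Four lines copied from the head() printout above (toy input, not the full data)
toy.lines = c("<PRE>",
              "Date Time Lat Lon Depth Mag Magt Nst Gap Clo RMS SRC Event ID",
              "2002/01/01 10:39:06.82 -55.2140 -129.0000 10.00 6.00 Mw 78 1.07 NEI 200201014017",
              "2002/01/01 11:29:22.73 6.3030 125.6500 138.10 6.30 Mw 236 0.90 NEI 200201014018")

# {4} means "exactly four of the preceding item", {2} means "exactly two"
toy.pattern = "[0-9]{4}/[0-9]{2}/[0-9]{2}"

toy.idx = grep(toy.pattern, toy.lines)                  # indices of matching lines
toy.matched = grep(toy.pattern, toy.lines, value=TRUE)  # the matching lines themselves

length(toy.matched)   # how many lines matched
head(toy.matched, 1)  # first matching line
tail(toy.matched, 1)  # last matching line
```

By default grep() returns indices; value=TRUE returns the matching strings instead.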
Hw2 Q5 (1 point). Check that all the lines in date.lines actually start with a date, of the form YYYY/MM/DD, rather than containing a date of this form somewhere in the middle of the text. (Hint: one clean way to do this is with anchoring. Also, it might help to note that you can look for non-matches using invert=TRUE.)
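To see why anchoring matters, note that one of the header lines printed above contains a date in the middle of the text. The sketch below uses two lines from that printout. The names toy.lines, unanchored, and anchored are hypothetical, and the pattern is an assumption.

```r
# One header line and one data line, both copied from the printout above
toy.lines = c("<li>start_time=2002/01/01,00:00:00",
              "2002/01/01 10:39:06.82 -55.2140 -129.0000 10.00 6.00 Mw 78 1.07 NEI 200201014017")

unanchored = "[0-9]{4}/[0-9]{2}/[0-9]{2}"
anchored = paste0("^", unanchored)  # ^ requires the match to begin at the start of the string

idx.unanchored = grep(unanchored, toy.lines)  # a date anywhere in the line
idx.anchored = grep(anchored, toy.lines)      # a date at the very start only

# invert=TRUE returns the elements that do NOT match the pattern
not.starting = grep(anchored, toy.lines, invert=TRUE, value=TRUE)
not.starting
```

An empty result from the invert=TRUE call on a set of lines would indicate that every line starts with a date.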
- From date.lines, extract just the date strings themselves, and call the resulting vector date.str.vec. (Hint: use regexpr() and regmatches().) Check that the first three are “2002/01/01”, “2002/01/01”, “2002/01/02”, and that the length of date.str.vec matches that of date.lines.
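The regexpr() and regmatches() pairing from the hint can be sketched as follows, on two lines copied from the printout above. The names toy.lines, toy.pattern, toy.pos, and toy.dates are hypothetical.

```r
# Two data lines copied from the printout above
toy.lines = c("2002/01/01 11:29:22.73 6.3030 125.6500 138.10 6.30 Mw 236 0.90 NEI 200201014018",
              "2002/01/02 14:50:33.49 -17.9830 178.7440 665.80 6.20 Mw 215 1.08 NEI 200201024034")

toy.pattern = "^[0-9]{4}/[0-9]{2}/[0-9]{2}"

# regexpr() gives the start position of the first match in each string
# (-1 if there is none), with the match lengths stored as an attribute
toy.pos = regexpr(toy.pattern, toy.lines)

# regmatches() uses those positions to pull out the matched substrings
toy.dates = regmatches(toy.lines, toy.pos)
toy.dates
length(toy.dates) == length(toy.lines)
```

Because regmatches() silently drops strings with no match, comparing lengths is a useful sanity check.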
Hw2 Q6 (2 points). Which 5 days witnessed the most earthquakes, and how many earthquakes were there on each of these days? (Hint: use table() and sort().) Also, what happened on the day with the most earthquakes: can you find any references to this day in the news?
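The table() and sort() combination from the hint might look like the sketch below. The vector toy.dates is made up for illustration and is not taken from the earthquake data.

```r
# Made-up date strings (hypothetical, for illustration only)
toy.dates = c("2002/01/01", "2002/01/02", "2002/01/02", "2002/01/02",
              "2002/01/03", "2002/01/03")

toy.counts = table(toy.dates)                    # number of occurrences of each date
toy.sorted = sort(toy.counts, decreasing=TRUE)   # most frequent first
toy.top = head(toy.sorted, 2)                    # keep the 2 most frequent dates
toy.top
```

The names of the result are the dates and the values are the counts, so names(toy.top) recovers the days themselves.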
Hw2 Q7 (2 points). How many earthquakes were there during each year between 2002 and 2012? Which year had the most? (Hint: use substr(), then table().)
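Since the year occupies the first four characters of a YYYY/MM/DD string, substr() can isolate it before counting. A minimal sketch, with made-up dates and hypothetical names toy.dates, toy.years, and toy.per.year:

```r
# Made-up date strings (hypothetical, for illustration only)
toy.dates = c("2002/01/01", "2002/01/02", "2003/05/26", "2004/12/26", "2004/12/26")

toy.years = substr(toy.dates, 1, 4)   # characters 1 through 4 hold the year
toy.per.year = table(toy.years)       # count of entries per year
toy.per.year

toy.busiest = names(which.max(toy.per.year))  # year with the largest count
toy.busiest
```

To restrict attention to a range of years, one option is to subset toy.years before calling table().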
- Look back at the lines of earthquake data printed at the start of this lab document. The columns for Lat and Lon give the latitude and longitude, respectively, of the earthquake. Importantly, each value takes the form X.XXXX, XX.XXXX, or XXX.XXXX, or any of these forms with a leading minus sign, where X is a digit. Design a regular expression to match these entries, call it geo.pattern. Test it out on the trial string vector below, with grep(), and make sure that you match all the strings. (Hint: build the regular expression, “left to right”, following this logic: an optional minus sign, 1 to 3 digits, a period, then exactly 4 digits.)
trial.str.vec = c("-55.2140", "-129.0000", "6.3030", "125.6500", "-17.9830")
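One way to translate the hint piece by piece is sketched below. The name geo.sketch is hypothetical, and this is only one possible reading of the hint.

```r
trial.str.vec = c("-55.2140", "-129.0000", "6.3030", "125.6500", "-17.9830")

# Built left to right:
#   -?          optional minus sign
#   [0-9]{1,3}  1 to 3 digits
#   \\.         a literal period (escaped, since . alone matches any character)
#   [0-9]{4}    exactly 4 digits
geo.sketch = "-?[0-9]{1,3}\\.[0-9]{4}"

geo.idx = grep(geo.sketch, trial.str.vec)  # indices of the strings that match
geo.idx
length(geo.idx) == length(trial.str.vec)   # TRUE if every string matched
```

The backslash is doubled because R string literals consume one level of escaping before the regex engine sees the pattern.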
- Design a regular expression geo.pattern.sp that captures not only the latitude/longitude pattern (like geo.pattern), but additionally, any number of leading spaces (1 or more). Test it out on the trial string vector below, with regexpr() and regmatches(). Make sure that you match all the strings, and in each case the extracted text is the entire string.
trial.str.vec.sp = c(" -55.2140", " -129.0000", " 6.3030", " 125.6500", " -17.9830")
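A sketch of the leading-space variant follows, using the + quantifier for "1 or more". The names geo.sketch and geo.sketch.sp are hypothetical, and the trial vector is assumed to have a single leading space in each string, as displayed above.

```r
trial.str.vec.sp = c(" -55.2140", " -129.0000", " 6.3030", " 125.6500", " -17.9830")

geo.sketch = "-?[0-9]{1,3}\\.[0-9]{4}"   # optional minus, 1-3 digits, period, 4 digits
geo.sketch.sp = paste0(" +", geo.sketch) # " +" means one or more spaces

sp.pos = regexpr(geo.sketch.sp, trial.str.vec.sp)
sp.extracted = regmatches(trial.str.vec.sp, sp.pos)

# If each extracted piece is the whole string, the spaces were captured too
identical(sp.extracted, trial.str.vec.sp)
```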
- Finally, design a regular expression geo.pattern.pair that captures a latitude pattern, then any number of spaces (1 or more), then a longitude pattern. Really, this is just the concatenation of the regexes you already designed, geo.pattern and geo.pattern.sp. Use geo.pattern.pair, with regexpr() and regmatches(), in order to extract the latitude/longitude pairs from each line of earthquake data in date.lines. Call the result lat.lon.pairs, and display the first 3 entries, checking visually that they match the results printed at the top of this lab.
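The concatenation idea can be sketched on two lines copied from the printout at the top of the lab. The names toy.lines, geo.sketch, geo.sketch.sp, geo.sketch.pair, and toy.pairs are hypothetical, and the component patterns are assumptions.

```r
# Two data lines copied from the printout at the top of the lab
toy.lines = c("2002/01/01 10:39:06.82 -55.2140 -129.0000 10.00 6.00 Mw 78 1.07 NEI 200201014017",
              "2002/01/01 11:29:22.73 6.3030 125.6500 138.10 6.30 Mw 236 0.90 NEI 200201014018")

geo.sketch = "-?[0-9]{1,3}\\.[0-9]{4}"     # one latitude or longitude value
geo.sketch.sp = paste0(" +", geo.sketch)   # the same, preceded by 1 or more spaces

# Latitude, then spaces, then longitude: just glue the two patterns together
geo.sketch.pair = paste0(geo.sketch, geo.sketch.sp)

toy.pairs = regmatches(toy.lines, regexpr(geo.sketch.pair, toy.lines))
toy.pairs
```

The time and depth fields are not picked up here because they have only two digits after the period, while the pattern requires exactly four.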