Loading…
useR!2017 has ended
Back To Schedule
Thursday, July 6 • 6:05pm - 6:10pm
TAGS - Table Assorting Guided System: an HTML widget to create multiple tables from Excel spreadsheets

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Feedback form is now closed.
Keywords: Excel spreadsheet, htmlwidget, data import
Despite advancements in data storing and sharing which allows for direct access to database content though API calls (i.e., SQL, NoSQL, etc.), Microsoft Excel™ and other spreadsheet applications remain widespread as data entry, distribution and presentation platforms for scientists across all fields of research and development. Spreadsheet applications have obvious advantages in terms of intuitive representation of data as tables, ease of use, and minimal to no programming effort. However, they generally have limited graphing and analytical capabilities. Adding large number of formulas and interactive plots can significantly slow down the application. Up until now, most attempts to extend Excel’s capabilities have consisted of connecting an Excel interface to an analytical engine, usually R, using Visual Basics for Excel (VBA) programming language. While it allows for relatively seamless integration of the two, the downside of this solution is that it requires proficiency in both VBA and R programming to develop new procedures. We have decided to take an alternative approach and bring Excel into R. Using Shiny® technology, we are able to import, display and interact with Excel workbooks inside web applications. An immediate benefit of this approach is that all programming can now be done in R without ever resorting to VBA. This opens doors to large number of possibilities, from flexible selection of data ranges to new methods of storing and retrieving data. Spreadsheets typically are constructed with additional metadata, making them very convoluted. It is also common to have multiple tables included in a single spreadsheet separated by empty columns or rows, which complicates the import of data to an R session. The arduous task of “cleaning” the files normally falls to the data scientist, who must either hard code the exact location of the tables in the spreadsheets using customized commands to retrieve the data, or allocate it to separate CSV files for posterior use. This last option should always be avoided, due to the possibility of data duplication. TAGS - Table Assorting Guided System, is an (JS-based) htmlwidgets package created to simplify the work of the data scientist while retrieving data from complex Excel spreadsheets. The package loads the data into a webpage display configuration similar to that of Excel. The user can then click and drag the mouse to highlight the location of the data in the spreadsheet, and tag it. Basic information about the table, such as the file location, name and sheet, is automatically added to the tag metadata. The user can than add any number of key:value pairs by typing them into a dialog box therefore expanding the number of table descriptors. The metadata is saved in a JSON file that can be used later to retrieve the data. Simultaneously, an S4 object containing both the metadata and values from the selected spreadsheet range are available to current R session. We will discuss the creation of the package as well as demonstrate its use. TAGS will reduce time lost and frustration when data are imported from spreadsheets, eliminate copying/pasting, and improve reproducibility.

Speakers


Thursday July 6, 2017 6:05pm - 6:10pm CEST
2.02 Wild Gallery