
Showing posts with label Data Science. Show all posts

Monday, August 27, 2018

BUDSC18: Bucknell University Digital Scholarship Conference

I received this in email as it is written below.



The organizers for BUDSC18 (Bucknell University Digital Scholarship Conference) are excited to announce the opening of conference registration. The conference will take place at Bucknell University in Lewisburg, PA from October 5th-7th. The theme of the conference is “Digital Scholarship: Expanding Access, Activism, and Advocacy.”

More information about registration, keynotes, and the conference program will be posted to the conference website.

#BUDSC18 will bring together a community of practitioners–faculty, researchers, librarians, artists, educational technologists, students, administrators, and others–committed to promoting access to and through digital scholarship. We consider “access” in the broadest possible terms: accessible formats and technologies, access through universal design for learning, access to a mode of expression, access to stories that might not otherwise be heard or that might be lost over time, access to understanding and knowledge once considered beyond reach.

We hope to see you in Lewisburg this fall for another exciting installment of BUDSC.

Saturday, September 30, 2017

You and the Internet of Things

 This fall, the SU iSchool has begun to offer Graduate Immersion Milestone Seminars. The first one is on the topic of "You and the Internet of Things."  Graduate students across the iSchool's graduate programs are in attendance, including MSLIS students.  

From my perspective, the pros, cons, and pitfalls of the Internet of Things (IoT) are not widely discussed in library circles.  Yes, we recognize that devices are capturing information, but:
  • Do we think deeply about what data is being captured by or in the library? 
  • Have we thought about how the Internet of Things can make libraries better?  
  • Have we thought about how the collected data is being stored and secured in the cloud?  
  • Have we thought about what could happen if our data is hacked?
The speakers this morning were not focused on libraries, but that doesn't mean we can't apply their topics to our library environment.  Below you'll see I've inserted some "library thinking" into my notes.  Please add comments if you have information to add or questions to ask.
 
Megan Snyder - Internet of Things and Cyber Security

Concerns:
"Things" can live long, software does not.
  • New vulnerabilities are addressed with new software
  • While you might replace your phone, for example, every two years, it will receive several software updates during that period.  Of course, people might not apply all of the updates, which could leave a security gap.
  • Imagine people being able to hack into a car or other things, which could be used to do harm
Things with sensitive data are connected
  • While you immediately think of banks, there are low tech devices which can capture sensitive data
  • Securing sensitive data
    • proactive ethical data stewardship
    • end to end security processes
    • innovate with new technologies 
Things are making decisions
  • Think about smart locks, smart homes, and smart grids
    • need built-in monitoring and then identifying of risks
  • There have been attacks on infrastructure worldwide, which were carried out by attacking the software
The future of securing IoT
  • Both customers and businesses need to focus on this
  • Need to look at the entire supply chain
While Snyder did not talk about libraries, consider that libraries are using software which is stored in the cloud or delivered as software as a service (SaaS).  That software could be storing information on library users/patrons, including private information such as books borrowed.  A security breach could make that information public.  Or a security breach could be used to alter the user data or the information on the library's collection.

Is the personal data stored in libraries a vulnerability that needs more attention?
  • Imagine a child changing his/her personal information so the person can check out adult books.
  • Imagine someone hacking a library system and wiping out fines.
  • Imagine a library's collection information being altered or deleted.
  • Imagine the software being delivered as SaaS being altered at the source, rendering all of its implementations useless.
Snyder noted that the U.S. is behind in passing laws which would cause not-for-profits to pay attention to their cyber security concerns.

Radhika Garg (@gargradhika) - Does privacy disappear with IoT?

Are there implications, in terms of cyber security, that we as consumers are not aware of?

IoT is not a single technology; it is a combination of sensors, devices, networks, and software that work together to unlock valuable, actionable data.  If you are interacting with any part of that ecosystem, you should be concerned with cyber security. 

Garg asked if people use Dropbox and then asked if people know where the data is actually stored.  We use Dropbox to store a variety of different data, but we have no idea where that data really is and how it is being secured.

Data in the cloud can be used by the cloud service to learn about you, and then use that data, for example, to send you advertisements.

IoT dilemma - the information collected by sensors can be used for services that benefit and simplify people's lives, or it can be used for data mining and other use cases that raise security and privacy concerns.

Imagine the habits that your sensors know about you.

Garg noted that a sensor may only collect data, but then transmit the data to the cloud where it can be analyzed, shared, used, and abused.  Once the data is in the cloud, you have no idea what third parties that data might be shared with.

Although we do anonymize data, data gathered on a person from different sources may contain enough information to de-anonymize all of the data.
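
A minimal sketch of how that de-anonymization can happen, assuming two made-up datasets that share a few fields (ZIP code, birth year, gender).  The records, field names, and the `reidentify` helper are all hypothetical and are not from Garg's talk.

```python
# Hypothetical "anonymized" sensor records: no names, but shared fields remain.
sensor_records = [
    {"zip": "13210", "birth_year": 1985, "gender": "F", "avg_sleep_hours": 5.1},
    {"zip": "13210", "birth_year": 1990, "gender": "M", "avg_sleep_hours": 7.4},
    {"zip": "13244", "birth_year": 1985, "gender": "F", "avg_sleep_hours": 6.8},
]

# Hypothetical second source (for example, a membership roster) that includes names.
public_records = [
    {"name": "Pat Example", "zip": "13210", "birth_year": 1985, "gender": "F"},
    {"name": "Sam Sample", "zip": "13244", "birth_year": 1985, "gender": "F"},
]

SHARED_FIELDS = ("zip", "birth_year", "gender")


def key(record):
    """Build a lookup key from the fields that both datasets share."""
    return tuple(record[field] for field in SHARED_FIELDS)


def reidentify(anonymous, named):
    """Attach a name to each anonymous record that matches exactly one named record."""
    names_by_key = {}
    for person in named:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = []
    for record in anonymous:
        names = names_by_key.get(key(record), [])
        if len(names) == 1:  # a unique match means the record is no longer anonymous
            matches.append({"name": names[0], **record})
    return matches


reidentified = reidentify(sensor_records, public_records)
for row in reidentified:
    print(row["name"], "->", row["avg_sleep_hours"], "hours of sleep")
```

Neither dataset is sensitive on its own in this sketch; the exposure comes from joining them.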

Can we collect less data?  Is there a minimal amount of data that is needed for a specific function?

While Garg talked about sensors, it occurred to me that video cameras in our cities and buildings are collecting our images.  Software can be used to identify people in those videos and it can be done automatically.  Software can also then track where people are traveling and when.  Imagine combining that information with sensor data, which could disclose more about your state/health when you were traveling through and between locations.

Garg noted that companies assume that people do not read privacy policies.  She also asked how we are expected to read the privacy policy on sensors if sensors do not have screens.

Both Garg and Snyder noted that the privacy rules in the EU are better than in the U.S. The EU rules do affect U.S. residents because of U.S. companies doing business in Europe and needing to comply with EU policies.

In the U.S., state and federal laws are not harmonized on what is personal data.  We need to harmonize our laws in the U.S. and then harmonize our laws with the EU.

Next steps for organizations in the IoT ecosystem include:
  • privacy by design
  • privacy notice and transparency 
Garg ended by talking about the right to be forgotten, which has been written into EU law.

Kim Rose - How hospitals are embracing IoT

Rose talked about privacy legislation related to healthcare, such as the HITECH Act.

Medical devices inside the hospital
  • vital sign monitor
  • surgical procedures
  • intelligent bed
  • medical imaging
Outside the hospital
  • home sleep study
  • CPAP machine
  • cardiac monitor
  • diabetes blood sugar monitor
IoT has changed how medicine is being practiced.

Rose didn't connect her talk to libraries, but I can imagine a patient opting in to having their medical data shared with the hospital's medical library.  That would allow the library to deliver information to a patient which relates to the person's reason for being in the hospital. Yes, that would raise huge privacy concerns.  Would the benefits outweigh the risks?

The talks this morning have made me wonder about cyber security, the Internet of Things (IoT), and libraries. Is this an area that we're really talking about?  Who are the library leaders in this space?  What conferences are talking about this?

On Twitter (#IoTSUiSchool), Jason Griffey said he is writing a library tech report right now on sensors.  It should be available late 2017 or early 2018.

Sunday, November 06, 2016

NYLA2016 : Elaine Lasda - Get Fancy With Your Library Data

Data Collection Scenario A
Elaine Lasda (@ElaineLibrarian).  Her slides will be available at http://slideshare.net/librarian68

Some stakeholders respond better to data. In fact, many of our stakeholders respond well to data.   Data can tell us about our impact.  Anecdotes can play very well, too, with some people.

What is data?  Lots of things are, and the format can affect what you can do with it.

Elaine Lasda focused on quantitative data during the session, but wanted people to realize that data isn't always numbers.

What are the limitations to data?  
  • People can argue over the interpretation of the data.
  • It doesn't account for a person's gut (feelings).
Data can provide actionable insights.  (This is what we want.)

Data Collection is where it starts.
  • Remember garbage in, garbage out.
  • Was the data collected correctly?
  • Does the data fit the purpose?
Data collection scenario "A" (see image)
  • Need a clear definition of what you're looking for.
  • What is the best way of collecting the data?
  • Make sure that the data is collected accurately.
  • As much as possible, eliminate the possibility of errors in the data.
Data Cleaning: (See tools list below.)
  • Data cleaning can take up to 80% of your time.  While it is critically important, it is not "sexy."
  • This is putting the data into the format that you need and doing any normalizing.  
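
A minimal sketch of what that formatting and normalizing step can look like, assuming a made-up table of branch visit counts.  The field names and the helper functions are hypothetical and are not from Lasda's session or her tools list.

```python
import re


def clean_branch_name(raw):
    """Trim whitespace, collapse runs of spaces, and use consistent title case."""
    return re.sub(r"\s+", " ", raw.strip()).title()


def clean_count(raw):
    """Turn a count such as ' 1,204 ' into an int; return None if it is not a number."""
    digits = raw.strip().replace(",", "")
    return int(digits) if digits.isdigit() else None


# Hypothetical raw rows, as they might arrive from a hand-kept spreadsheet.
raw_rows = [
    {"branch": "  main  library", "visits": "1,204"},
    {"branch": "MAIN LIBRARY", "visits": " 980 "},
    {"branch": "east branch", "visits": "n/a"},
]

cleaned_rows = [
    {"branch": clean_branch_name(row["branch"]), "visits": clean_count(row["visits"])}
    for row in raw_rows
]

for row in cleaned_rows:
    print(row)
```

After cleaning, the two spellings of the main library share one label, and the unusable count is flagged as missing rather than silently treated as zero.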
Data Cleaning Resources
Data Analysis:
  • Going from data to information to knowledge to wisdom

Data Collection Scenario B

Remember that correlation does not mean causation.

How do you get data from non-library users?  One person paired public library staff with board members who then went to different places on a Saturday to interview people.

Data Presentation:  With charts and graphs, make sure the scale does not lead people astray in interpreting the information.
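
One way to put a number on a misleading scale is Edward Tufte's "lie factor": the size of the effect shown in the graphic divided by the size of the effect in the data.  The sketch below is only an illustration under that assumption.  It was not part of the session, and the visit counts are made up.

```python
def relative_change(first, second):
    """Relative change from the first value to the second."""
    return (second - first) / first


def lie_factor(first, second, axis_start):
    """Ratio of the change a bar chart appears to show to the change in the data.

    Bar heights are measured from axis_start, so a truncated axis
    exaggerates the apparent difference between the two bars.
    """
    shown = relative_change(first - axis_start, second - axis_start)
    actual = relative_change(first, second)
    return shown / actual


# Hypothetical monthly visits: 50 (thousand) last year, 55 (thousand) this year.
honest = lie_factor(50, 55, axis_start=0)
truncated = lie_factor(50, 55, axis_start=45)

print(f"Axis starting at 0:  lie factor {honest:.1f}")
print(f"Axis starting at 45: lie factor {truncated:.1f}")
```

A value near 1 means the picture matches the data; the truncated axis here makes a 10% increase look like a doubling.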

Top 10 Worst Graphs in Science (web page)

Elaine suggests that people use free and low cost data tools.  She said that you don't always need  expensive tools.

Her library has used data analysis to improve workflow.

Resources:  







Monday, December 14, 2015

What’s New in Competitive Intelligence?

Last Friday, I spoke to one of the councils of the Manufacturers Association of Central New York on competitive intelligence.  Since I don't like for a good handout to go to waste, I'm placing it here.  Competitive intelligence is what I did as a corporate librarian and then when I started Hurst Associates, Ltd.  Now that I'm an academic, I like talking with students about this work.  It is not something they initially consider when they think about library science.

Tuesday, June 10, 2014

#SLA2014 : Data Caucus

Elaine Lasda Bergman and Kimberly Silk (@SLAdatacaucus) ran the meeting.

This caucus is new and is now accepting members. Currently the email list is open, but will become closed later this year.

The caucus will want to create a program for the 2015 conference, and the group brainstormed a long list of possible topics.  In addition, the group discussed other SLA units that it might partner with.  Caucuses can have sponsors.  The meeting ended with discussion of how people wanted to communicate and a call for volunteers.  

#SLA2014 : Amy Affelt - The Accidental Data Scientist: A new role for librarians and information professionals 

What is big data? You know it when you see it.

McKinsey: amount of data collected will grow by 40% per year.
15 out of 17 industries will have more data than the information stored in the Library of Congress.
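
To put the 40% per year figure in perspective, here is a small compound-growth sketch.  It assumes the rate stays constant, which is my simplification and not a claim from the talk; the function names are hypothetical.

```python
import math

ANNUAL_GROWTH = 0.40  # the 40% per year figure quoted above


def growth_factor(years, rate=ANNUAL_GROWTH):
    """How many times larger the data volume is after the given number of years."""
    return (1 + rate) ** years


def doubling_time(rate=ANNUAL_GROWTH):
    """Years needed for the data volume to double at a constant growth rate."""
    return math.log(2) / math.log(1 + rate)


print(f"Doubling time: {doubling_time():.2f} years")
for years in (1, 5, 10):
    print(f"After {years:>2} years: {growth_factor(years):.1f}x the starting volume")
```

At a steady 40% per year, the volume doubles roughly every two years and is more than five times larger after five years.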

How is the data different?  It is being collected in the background and automatically, as well as being user generated. 

Gartner's five V's:
  • Volume
  • Velocity
  • Variety
  • Verification
  • Value
Verification and value are places where information professionals can have a role.  Determining the value is challenging, risky, and expensive.

Cool big data applications...

Healthcare
  • Msft readmissions manager
  • Stanford drug pairings
  • MyAchoo
Transportation
  • Street bump
  • Xerox ExpressLanes
  • Fixed
Entertainment
  • My magic +
  • RUWT
  • Qcue
We have the skills to work with big data.  We think about things in a critical way.  We should not say "it is easy", but we should work to ensure that our skills are valued. 

Big data busts:
  • Google flu trends
  • Crimson Tide v. Auburn
  • Target "targeted" coupons
  • Lego - did not use big data methods 
  • Boston Marathon Manhunt - did not take a big data approach
Bad big data advice
  • Sketchy citation algorithms - what if the citing article states that the citation is junk?
  • Re-use of data - how do you ensure that the recycled data is clean?
  • Global data sharing - garbage in, garbage out.  How do you prevent garbage in?
We can help people find data and make sure that it is authoritative.
Did you consider alternative data sources?
What biases are inherent in the interpretation?

We'll take it from here:
  • Search
  • Discover
  • Analyze
  • Communicate impact
  • Create deliverables
What's in it for me?
  • Look for big data projects in your industry.  How could you fit into those projects?
  • What are the vexing issues?
  • What is our mission?
  • Set the context to build connections between data points.  Patterns v. Predictions, Coincidence v. Causation
  • Embed into IT and Big Data teams to provide point of need research
  • Curiosity = high quality
  • Data science v. Data intelligence - not big data but better data
Big data communications framework
  • Understand the business platforms
  • Determine impact measurements
  • Discover data available
  • Decide which data is most valuable
  • Formulate hypothesis
  • Communicate the results - what's the story?
How do you get hired as a data scientist?  Gigaom.com article.  Also...
  • Core competencies
  • Learn to tell a story
  • Exercise creativity and curiosity/healthy skepticism 
  • Show up and be ready to learn 
New big data roles
  • Data policy expert 
  • Data release expert  
  • Exit survey on data expert

Monday, June 09, 2014

#SLA2014 : Big Data & Job Opportunities Panel

This session was moderated by Jane Dysart.  
Book - The Human Face of Big Data
Promotional video for the book, http://m.youtube.com/watch?v=7K5d9ArRLJE

Data is the exhaust of our lives.

Amy Affelt (@aainfopro) - Librarians have always worked with data.  Librarians have a role in working with data.  We may not be the programmers.

We could have roles around describing data and helping with its use.

She is writing "The Accidental Data Scientist" (provisional title), which will be out next year.  It can be preordered.  

She mentioned the article  on six big data tools that anyone can use.  See http://gigaom.com/2013/01/31/data-for-dummies-5-data-analysis-tools-anyone-can-use/

Daniel Lee (@YankeeInCanada) - small data enthusiast and a big data wannabe.  Need to learn how to scale from small data to big data.

How do you catalogue data at the question level? 

We need the business acumen, along with the data skills.  Some librarians do have the technology skills that a data scientist needs.

We need to understand the privacy issues.  This could be an area for information professionals.  Professional associations could be providing education around privacy.  

We could also get involved in helping organizations understand the security concerns.

Getting involved in data doesn't necessarily require a huge upfront cost.  There are open source tools.  He noted that our SLA vendor partners have data and data tools. He talked about taking data created at an SLA conference through Twitter and analyzing it.

Kim Silk (@KimberlySilk) - she is contributing to Amy's book. Her job title is data librarian.  She supports the research team at her institution. There is more than one data librarian at the University of Toronto. One person works with the licensing of data.  They help students with analysis.

It doesn't take a long time for data to get big.  Data can get too big for Excel quickly.  Then you need to use SAS, SPSS, or something else. 
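
As a rough illustration of working with data that has outgrown a spreadsheet, the sketch below streams a CSV one row at a time instead of loading it all at once.  The 1,048,576-row figure is the worksheet limit in current Excel versions; the `summarize` helper, the column names, and the tiny sample are hypothetical and were not part of the panel discussion.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit in current Excel versions


def summarize(lines, column):
    """Stream CSV rows, keeping only a running count and total in memory."""
    reader = csv.DictReader(lines)
    count = 0
    total = 0.0
    for row in reader:
        count += 1
        total += float(row[column])
    return {
        "rows": count,
        "mean": total / count if count else None,
        # +1 accounts for the header row that a worksheet would also hold
        "fits_in_excel": count + 1 <= EXCEL_MAX_ROWS,
    }


# A tiny in-memory stand-in for a large file; open("visits.csv", newline="")
# would be passed the same way for a real (hypothetical) file.
sample = io.StringIO("branch,visits\nMain,120\nEast,80\n")
summary = summarize(sample, "visits")
print(summary)
```

Because only one row is held at a time, the same function works whether the file has two rows or many millions.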

Data is just another media type.  There will be a need for data policy librarians. 

She mentioned "data ferret" as a tool for converting data sets.

She showed a graphic on "Toronto Public Library creates over $1 billion in total economic impact".  The graphic is on page 1 of this report, http://martinprosperity.org/media/TPL%20Economic%20Impact_Dec2013_LR_FINAL.pdf. The tables used to create the graphic are in the appendices of the report.  The graphic is something that could help the community understand the economic impact and would be something that the media could use.  The table uses the market value for equivalent services that libraries provide.  

Visualizing data makes big, hairy information understandable.  Allows you to overlay data.  She described a project in Toronto that surfaced and demonstrated transportation/transit deserts.  A single map can tell you a million things.

Jane Dysart (@jdysart) - mentioned a library that hired a number of data visualization people.   

In order to make decisions using data, we need to be able to understand it.  Visualizations help.

Consider what data would impress your boss.

What skills are needed (from the panel):
  • Coding - scripting languages 
  • Classification (coding, metadata)
  • Data privacy
  • Comfort with technology
  • Ability to understand your data collection (and their subject areas)
  • Sense of curiosity
  • Analysis
  • Can tell the story that the data is telling
Code4lib has job ads for data focused jobs.

Q&A :

Is there a need for backend computer skills or graphic design? - Silk has acquired more of her skills on the job.  Some of the work in her organization (e.g., graphics) is done by other people.  Lee asked why some information professionals are reticent to offer analysis.  It is a hump that we need to get over.

Places to get additional training? - Lee believes in training himself.  He looks for free resources like Udacity, MIT, Codecademy, and MOOCs.  The problem is choosing the training, not finding it.

Sunday, June 08, 2014

#SLA2014 : More Than Pretty Pictures: A Guide to Data Visualization for Info Pros

View from outside Conference Centre room 121
Marcy Phelps, owner of Phelps Research, presented on this topic to a standing room only crowd.  She began doing visualization because of her work with marketing professionals, who liked visualization.  Phelps has placed her slides on her web site (phelpsresearch.com).  This was an introductory session on visualization.  She did not go deep into how to visualize and only touched on some of the tools.

Phelps blames the trend toward visualization on TV, a visual medium.  Instagram and Pinterest are growing rapidly, which shows how much we want to communicate through pictures.  Our clients are drowning in information.  Visualization helps to make that more understandable.  Visualizing adds value to the research that we can do.  Visualization creates interest in boring numbers.

Visual information becomes more readable, logical/memorable, and usable.

Understanding data visualization - there is a huge range of how people are analyzing data and creating visualizations.  

Data art vs. Data visualization - infographics are data art.  Data art doesn't tell a story in the same way as a visualization.  

Data visualization = visualization design + visual analysis

Data visualization allows you to report or explore.  

Phelps recommends the book "Show Me the Numbers." See http://www.analyticspress.com/show.html

There are four functions for data visualization: analyze, communicate, monitor, and/or plan.  It can allow you to get fast answers, uncover hidden insights, investigate cause and effect, and track in real time.

BTW Gapminder is an online resource for a fact-based world view.  

The health industry has been doing visualizations for quite a while.  Check out Health InfoScape, for an example.

Selecting the correct graphic is key.  There is another book called "Say It With Charts", which she recommends.  Having the skill to select the correct visual for the data is important.

A question was asked about creating visualizations that would be understandable to people with color blindness or other sight disabilities.  The audience agreed that there is no one solution, but that there are sites and tools that could help someone create a more broadly usable visualization.  Possible resources on this include:
  • http://colororacle.org/resources/2007_JennyKelso_ColorDesign_hires.pdf
  • http://www.rgd-accessibledesign.com/wp-content/uploads/2010/11/RGD_AccessAbility_Handbook.pdf
When visualizing information, ask the following to determine the possible graphics:
  • What kind of information?
  • What is the message?
  • What's the relationship?
See graphic on phelpsresearch.com/SLA2014 that can help you decide on the correct graphic.

Another resource is "The Visual Display of Quantitative Information" by Edward Tufte.  He believes that you need to keep the clutter out of your visualization.

Visual analysis process:
  • Data/view specification
  • View manipulation
  • Process and provenance
Article - "Interactive Dynamics for Visual Analysis: A taxonomy of tools..."

Wednesday, December 18, 2013

Using Big Data for Library Advocacy (webinar recording)

Erin Bartolo
Yesterday, Dec. 17, Erin Bartolo and I did a one-hour webinar entitled "Using Big Data for Library Advocacy."  This webinar was based on the presentation that we did at the New York Library Association Annual Conference in September.   A recording of the session is available on this page, which also contains a link to our handout.  Since this was so similar to what we did at NYLA, I'm placing below the slides from NYLA.



One question that we did not receive was about how libraries are currently using big data/data science. I know from the NMC webinar that we did that we don't have good library examples yet, because we (libraries/librarians) are just thinking about how to use data science in our work.  I expect that those examples will come, as we begin using big data to help us with assessment and advocacy.  For now, we need to talk about what is possible and get people interested in using these techniques, which are already widely used in business.

Monday, December 02, 2013

NMC On the Horizon > Big Data (webinar recording)

In November, I had the honor of participating in the New Media Consortium webinar on big data.  The event was recorded and is now available through YouTube. Information on all of the presenters is available on the NMC web site.  Thanks to Dr. Ruben Puentedura for moderating the event and to the NMC staff for their coordination.




This webinar used the Google+ On Air platform and was broadcast live on YouTube. For me, it was very interesting to do a webinar in this way.  For example, how do you interrupt or get the attention of the moderator?  (Obviously, waving doesn't work!)  I'm used to other platforms that have a bit more functionality, yet I have to admit that this worked amazingly well. 

Wednesday, February 27, 2013

Visualization

This past week in class, the conversation turned to information that is "born digital" and how it can be analyzed and used.  I attended New York Data Week last October, so my mind quickly thought of the visualizations that I saw.  Yes, when you get data into digital form - whether it is born digital or digitized - you can then analyze and display the information.  If you have not viewed any visualizations of large/massive data sets, take a look at these.

Wind Map, http://hint.fm/wind/

Etsy loves NYC, http://www.etsy.com/nyc-data-week


Foursquare Check-ins in NYC During Hurricane Sandy,
http://blog.foursquare.com/2012/11/05/a-time-lapse-of-foursquare-activity-in-nyc-during-sandy-plus-a-simple-way-to-help-every-time-you-check-in/

