The Next Major Challenge in Records Management is Already Here: Social Media

August 19, 2019 By Kailee Schamberger


[ Silence ]
>>Welcome everyone to tonight’s MARA Colloquium. The topic is The Next Major
Challenge in Records Management. And yet, it’s on Facebook. Our speaker is Anil Chawla. He’s the founder and the CEO of
ArchiveSocial, and tonight’s guest lecturer. He will speak about two very
important topics, social media and also records and information management. But unlike other people, including myself,
who do a lot of talking about those subjects, he actually works on solutions to the challenges
that follow, and now is the founder and CEO of ArchiveSocial, a social media
archiving company that enables businesses to automate the management of content
created through or posted to social networks such as Facebook, Twitter and LinkedIn. And I’m sure you’re all on all three of those. During his presentation, he will
examine a number of approaches taken by organizations to address these challenges. And he will provide examples of social
media records management in practice today. So, I’m not going to take any more
of our valuable time this evening. I’m going to introduce to
you our speaker, Anil Chawla. Anil, you can take the mic.
>>Hi Pat, and good evening everyone. Thank you so much for the introduction as
well as the invitation to speak here tonight. We often get a lot of interest
when we speak just because social media itself
is such an intriguing topic. But I don’t often get the chance to talk
about this side of social media in front of a room full of records management
professionals, so this is particularly exciting for me and it’s a real privilege. And, I thought I’d mention that I do want
to keep it somewhat informal in the sense that I will take opportunities
throughout this presentation to pause, ask for comments in the chat room, and if we
can make it work, even have some folks speak up. Now, we could– I put this up
here for a couple of reasons. One, I wanted to make sure that you all have
a way to contact me if you have questions or if you’d like to discuss anything
[inaudible] after this presentation. I also wanted to give you a little
bit of an overview on my background. So essentially, I have a
background in computer science. I’ve been a software developer most of my life. And a few years back, as I was creating
various social media applications, I stumbled upon this problem that many,
many businesses are starting to face. And it’s a problem around, of course, records
management, of retaining important conversations and communications from all the social networks. I say that, though, to point out that I have
neither a degree in archival science nor any kind of training in records management. And I want to make you aware
of that for two key reasons. The first is that I’m likely to butcher some
of the terminology and not get the lingo right, so I ask you to forgive me in advance. And the second is to really point out that
when it comes to true records management, all of you are actually the experts. We at ArchiveSocial are trying to
do what makes sense to us and apply that to real world problems
and acquire customers and based on customer feedback, continue to evolve. But I’m very interested in hearing from
folks that are in this discipline in terms of how we can– maybe close any
loopholes in what we’re doing, and really tie social
media records management into records management as
a discipline as a whole. [ Pause ] So with that, I think it’s important to
first start with why we’re even talking about keeping social media records. And I don’t think I need to
really explain to all of you that social media has become an
important business communication. It’s obvious looking around that businesses of
all types and sizes have adopted social media for a variety of communication
including customer service, recruiting, obviously marketing and advertising,
business partnerships and so forth. But I thought this was an interesting stat. This is from about two weeks ago
and it really stuck out to me. So if you read it, basically,
what J.D. Power found was that two-thirds of the time a consumer
goes to a company’s social media site, they’re not going there because
they were drawn in by the marketing or to engage with some marketing campaign. Two-thirds of the time, they’re
going to that company’s site because they’re looking for
help or customer service. And that’s a really eye-opening
figure, I think, to start with. [ Pause ] Now, this is a slide that I pulled from one
of our slide decks that [inaudible] uses, and it just goes to show that, for
example, at universities, which all of you are in, and in governments, if any of you as
citizens are engaging with the government, you’ve probably noticed that many of these
organizations are now leveraging social media for very important disclosures of information. You know, photos of wanted
suspects, details and so forth. We’ve actually, in our engagement
with customers, occasionally come across the situation in which
the law enforcement division will put a posting on Facebook with information and a photo about
a suspect, and a citizen will actually respond to that Facebook post with some information. [Inaudible] it’s smart for that citizen to kind
of put their name on there or not, it happens, and maybe the citizen or the law
enforcement agency wants to clean that up. But the management of that record, that’s
really important because ultimately that information is the kind of
information that used to come in through a hotline or some other means. [ Pause ] Now, what’s interesting about this
topic, and we really kicked this off by calling it the next major challenge in
records management, is that it’s a major challenge because people are finally coming around
[inaudible] now to try to address it. But it’s not that folks haven’t
been thinking about this. If you look at some of these postings that date
back to 2009-2010, they discuss the challenge that various organizations are facing around
getting their hands on these communications and storing them for whatever
purposes that business or entity has for storing those communications. We have seen a variety of approaches
taken to address this issue, which I will talk you through in this presentation. But it’s something that people
are still grappling with today. In fact, the federal government has put out
various directives around records management with a long timeframe for
agencies to satisfy them. I think something like 2019 is the [inaudible]
date for that to– all that to be in place. But, that said, that really encompasses
all types of electronic communications, and the federal government has
taken stabs at solving that social media problem. And in fact, last year, they put together
a council, the Federal Records Council, and they made it one of the top priorities
to solve social media records management. And the report is really– it’s really
surprising because at the end of the day, they come back and one of their top
recommendations is for federal agencies to copy and paste Twitter tweets and Facebook posts into a
Word document and store that on a shared folder. So in many ways, that still
is the state of the art. And so, these organizations are
struggling to find a better way of doing this. [ Pause ] Now, I’ve got a couple of overarching themes
here on the slide of why you keep records. Business compliance is obviously one of the
biggest drivers, especially since, you know, the last five, six years, with e-mail
in markets like financial services and healthcare and legal and so forth. So compliance, definitely a driver. SEC [inaudible] around both
in financial services, SEC also applies to any kind
of publicly-traded company. FOIA is the Freedom of Information Act,
which applies to the government on all levels including your
cities, your counties, your state agencies, and of
course the federal level. Litigation and e-discovery, of course, is a
concern for any kind of mid-size to large business. If they get in a situation where they need
to produce those records, they may be required to produce those records, assuming that
they should be able to get them in some way, and if they can’t, that can be
disastrous probably [inaudible]. And lastly, digital preservation. So as we engage organizations
throughout the United States, and particularly in the government sector,
we are continually coming across archivists and records managers that are also interested
in the actual value of that content and how it reflects on society and
culture, whether that’s at a university or within a state, or across
like the country as a whole. Now, I’m going to pause for just
a second as I take a sip of tea. I wanted to see in the chat room if anyone
else had any other overarching themes that they could add in here in terms
of why keeping records is important. [ Pause ] Okay, so I think we have three very general
topics and those cover the use cases, but if someone comes up with something,
I’m happy to hear it, and I’ll go ahead and move on and take the next slide here. And talk a little bit about why this is
[inaudible] a big topic, why is this so hard? So if you think about social media, the
reasons you want to keep at least some portion of your social media communication are the
same reasons why you want to keep e-mails, important e-mails or records of a website that
are reflective of your business or, frankly, paper documents that are important
to your business. The problem is that keeping social media,
capturing it and preserving it, is in many ways far more difficult than any of
those other forms of communication. So I’ve listed them all up here but I want
to talk through them just for a few minutes. To start with, we all know social
media is very dynamic and real-time. Now, that’s not to say that an e-mail
can’t come in throughout the day or websites can’t be changed constantly, but
social media really is [inaudible] different in terms of how frequently
communications happen. And largely, that’s a result of the fact that
social media is often public. It’s part of these social networks,
which have a high viral factor to them. And so, you have this situation where
you’re communicating but, frankly, you know, tens of millions of people could be
seeing that and communicating back. And so, it creates an environment where
communications are happening very, very quickly. Social networks also tend to encourage short
communications, and when that text box is smaller, it
becomes easier for people to contribute, and so there’s more involvement as well. The reason why this is important
is that you can’t take an approach to social media that’s [inaudible] a website
where you’re capturing it every few months or even every week; by then, it may be too late. You really have to figure out how can I
capture social media as continually as possible to account for [inaudible] major. The next level– next bullet point
on here is that content is heterogeneous. And this is really important. If you think about e-mail,
it’s a long-standardized format and it’s a rather simple type of
format, same thing with websites. The HTML standard is really mature. It continues to get updates but it
does take a while and it is one standard. But, if you look at social media, a Twitter tweet is completely
different from a Facebook photo. It’s completely different
from a LinkedIn profile. These are not standardized. These are entirely proprietary formats created
by these companies and they continue to evolve. So tomorrow, Facebook could change the way
they represent a wall post and– there’s no standards body
that has to approve that. There’s no [inaudible], it just happens. And that’s a really big concern that I think
oftentimes in our industry is overlooked. We have competitors that want to, you know– just in general, folks want to think this is a
simple problem, let’s just treat social media like e-mail, and it’s really not that simple. The middle bullet point here is
one of the more obvious ones. But, again, it creates a
whole host of challenges. These communications never have
to pass through your network. You can post something on Facebook on
behalf of your company by using your iPhone. Even worse, when people then respond
to your Facebook post or comment on it, those communications may never pass through
your intranet; they go straight to Facebook. So these are communications hosted
[inaudible] on third-party servers. Getting to this data can be a challenge. You have to actually go out and get it. It’s not like e-mail where
all of the e-mail comes in to your e-mail server and just sits there. And in order to archive e-mail, you just reach into the e-mail server and
you just take a copy of it. You can’t really do the same thing
with social media; you have to go out and get it. Now, part of this is the fact that, you
know– part of this creates a problem for IT organizations because IT no longer
has the ability to provision access. The IT department can’t decide
who has an account the way it decides who has an e-mail address. And even worse, IT can’t decide
that when they want to get to someone’s account, they can just go do it. With e-mail,
they can just go do it. They can’t do that with social media. If they want to get to someone’s Facebook data,
IT or anyone in the company may not be able to just go to that Facebook account because
it may have been created by an employee. And therefore, you have to work through
that employee if that employee cooperates to be able to get access to that account. And [inaudible] the fact that it’s distributed. So now, what we’re seeing is if you go to– I’m sure San Jose State has this
problem as every university does. There are literally dozens, if not hundreds,
of Facebook and Twitter accounts being created that are branded for the university without
the university’s blessing, you know, giving their affirmation of that. So for example, we work with NC State University
here in North Carolina and I walked in and spoke to the chief communications
officer about a year ago. And I said to him, “How many Facebook
and Twitter accounts do you think are out there representing NC State
using the brand of your university?” And the chief communications
officer said, “Maybe 90.” I was then able to open up my laptop and show
him how we had collected more than 130 Facebook and Twitter accounts representing their brand. And of course, this is a huge problem in
the communications department’s eyes because, for the first time ever, they don’t have
the oversight and control that they used to have in terms of communications
that are going out on behalf of the business. And this happens everywhere. And this happens in the government. Especially the fact the government
[inaudible] distributed people across the city and they create Facebook
accounts for a variety of things, Twitter accounts for a variety of events. It happens in large businesses and small
businesses too where an employee simply has to go on one of these social networks and
in three or four clicks can create a profile that in some way may be branded
representing that organization. So I see some questions. I want to stop real quick and
the one question I see here from Jenny is can they take legal
action against [inaudible] accounts. Jenny, is that in regards to employees creating
accounts representing the organization’s brand without the organization knowing? Yeah, so that’s a great question. I’m not, by any means, an attorney, but I did
present on this topic recently with an attorney and it’s a very, very tricky issue. There’s actually a case– this
PhoneDog case, where a guy– he had a Twitter account for a company
called PhoneDog, and his name is Noah. So he created this
account called PhoneDog Noah. And as he worked for that company
servicing clients, he built up a pretty large Twitter following. I think something on the order of
18 to 20,000 Twitter followers. Noah then decided to leave the company
and he took that Twitter handle with him. Of course, he renamed it from PhoneDog Noah to
his actual name, which I forget at the moment. And the company at that time was fine with
this. They said, “Okay, you know what? If you don’t mind– okay, you’re just
responding to the tweets that come your way, that’s fine, just take your account.” But several months down the road, the
company showed up with a lawsuit that said, “You have an important customer
list of ours and we want that back. And we think every follower that you
took with you is worth a dollar fifty. And since you had 20,000 followers at a dollar
fifty apiece for the last eight months, you now owe us X hundreds
of thousands of dollars.” So this is the case that’s out there. I don’t think it’s been settled
because it’s so gray. What the company should have done is really,
you know– a lot of times, it comes down to policy. The company should have, up front, created a social media policy saying any
social media account created for engaging with our customers and performing
business activity and, particularly, if it looks like [inaudible] brand
is property of the company, period. Even if you’re managing it today,
it’s property of the company. And if they had done that up front, they
would’ve saved themselves a whole lot of grief. And I’ll add on to that, when people– you
know, sometimes it’s not even the employee. Sometimes, it’s just somebody random. So somebody at the University
of North Carolina might come and create an account representing NC State,
as rivals, and it gets really, really tricky. I think if you talk– we talk to our customers,
[inaudible] talk to anyone in terms of trying to go down the process of claiming trademark
infringement and getting that account shut down and claiming it back, it’s just a hassle. So we always encourage folks to
think about these issues up front and put that in a social media policy. And– thanks for that question. Good question. And finally, it’s distributed [inaudible]
employees [inaudible] you control because all sorts of format–
and it’s really dynamic. But let’s say you solve all of this and
you’re able to store the data, what now? Well, the whole reason you
stored it was to get it back out. And today’s systems are not well designed to get
social media back out in a way that makes sense, that really accommodates the fact that social
media is not just real-time and heterogeneous, but that it’s also very rich and interactive. So if you do a search and you find a
Facebook post, you probably want to get all the comments on that Facebook post. If you search for a photo, you
want all the text and comments surrounding that photo as
well as the photo itself. So that creates a bit of a challenge as well. [ Pause ] Okay, so I’m going to, again, take a
moment here and give this another shot. So I know that in the bar on the left
side, you’ve got this check mark icon. And I want to ask some questions
if I could do a quick poll here. Who in our audience has ever tried or worked
with any type of technology or application or solution to capture social media records? Give me a green check if you’ve
done it and an X if you have not. All right, so we’ve got one, two that have– and
those of you, if you could, in the chat room now, can you put just a brief statement
of what type of technology you use or what approach you use,
[inaudible] that’s manual capture or some type of archiving product
or any other technology. No volunteers, okay. That’s all right. Well, let’s go ahead and–
okay, Lisa’s typing, all right. [Inaudible] document also
use [inaudible], great. Appreciate that Lisa. [ Pause ] So those are– you know, it’s good to hear
that for your library, you’re taking steps to at least preserve this information in
some way because if you don’t do anything, it does become very [inaudible]
and hard to reproduce. We take the talk off here, so
Lisa, you want to speak up?
>>Oh, no. I just want to– I think it’d be
easier for me to speak than type. So, thank you very much for the question. I just wanted to add that my organization, which
is Royal Bank of Canada, has recently done an RFP for exactly this, this topic of capturing social
media, and you asked if anyone had done that, and we were looking at– I don’t want to–
I’ll just name them, Hearsay, Actiance, Smarsh. These are all tools that according to my
understanding can give a sort of capturing of external social media
sites for compliance and records management requirements. So, I will end that there only ’cause
you asked a question, so thank you.
>>Well, thank you Lisa so much for
speaking up and making that contribution. And I will definitely try to address
this as unbiased as possible [laughs]– I mean, in as unbiased a fashion as possible
as we go through some of these slides. Those are– some of those
folks are our competitors but there are definitely different
approaches to this problem, and every approach has some
advantages and some disadvantages. So, glad to hear that you have some
context on the types of solutions out there and I’ll definitely try to spend some– a
moment or two digging into what those types of solutions do and what they do well and some– where the gaps are, including with
solutions like ours as well. [ Pause ] So let’s start. And this is really a buildup and I’ll
be very, very transparent with you. This is a buildup of how we approach
this problem at ArchiveSocial in terms of what are the ways that people
can start to address this issue. Why is that a good way to do it? Why is it a bad way to do it? And where– if there are
holes, what’s the next step? What’s the next thing that we
can try to fill that gap? So starting off, this, I would say, is
actually not a solution, and hopefully all of you recognize that: relying on the
social networks to keep your data forever. And I think we– someone in the chat
room mentioned [inaudible] which is great because it does capture information
to some extent. Though, the social networks
themselves are not a record keeping– a place for keeping records,
they are not an archive. And for the simple reason that
anything can be deleted on Twitter and Facebook today and can
never be accessed again. So, by definition, that would fail a
number of [inaudible] requirements. In fact, interestingly enough, I’m
sure many of you saw the news– I think it was December that Twitter came out
and said we’re launching the Twitter archive. So, every Twitter user can now– or should
be able to soon go to their Twitter profile and download their entire Twitter archive. And I’m saying “archive”, I’m using
quotation marks with it here in my chair. And the reason why is that that
archive is not really an archive. What Twitter is allowing you to do is, at any
point in time, go and download all of the tweets on your public timeline that
are still in Twitter. So rather than an archive,
it’s more of a snapshot, right? So anything that you deleted from your public
timeline is not going to be in that download. It’s not going to be in that
archive, as they call it. So it’s definitely not a good
idea to rely on social networks for that one reason [inaudible]
that data can be deleted. But as we’ve seen with Twitter and the
next social network, they [inaudible]. There are often limitations to getting
back data even if it hasn’t been deleted. So it’s just being able to get
three, four, eight, 12 years, that [inaudible] your social media history. It’s really hard to trust that they’ll be able
to do that because these companies are not in the business of keeping your data
forever or keeping your data forever in a legally sound fashion
that most businesses require. And then just to touch on
[inaudible] real quick. The analogy I tend to make there is these
Twitter clients that you use, to some extent, are keeping a cached copy and that’s great, but
you have to think about these clients very much like Outlook, your e-mail client. Your Outlook interface or your e-mail client
on your smartphone is caching information but is– again, by no means a records
management solution because it’s just temporary. You would never rely on Outlook or
your smartphone as your e-mail archive. And so, it’s somewhat of a stopgap instead of
doing nothing, but it’s important to be aware of all the limitations of using
the client as a record store. [ Pause ] So, where do we go from here? And again, this is another– this is something– all the things that I’m going to walk you through
here are solutions, I’ll call them, that we get from our– that we get from talking
to customers and people in the industry, and I really want to address them one by one. So again, in my view, this is not really
a solution for records management. It’s sort of avoiding a problem. It’s kind of like saying, “We just don’t
use social media at all,” but there is a lot of confusion, I think, with folks out
there that are using social networks who aren’t necessarily savvy enough to
understand exactly how all these networks work. So we’ll hear a lot of times from prospective
customers that they are not worried about keeping social media records
because they’re using social media in a one-way fashion and they’re disabling comments. But I took a screenshot from Facebook just
as an example, and what really rings true for virtually all the networks is that you really can’t turn off the
two-way conversation on social media. So if you post on Facebook, you may be
able to turn off the ability for other folks to also post on your wall, but you can’t
turn off the ability for them to comment on what you’ve already put out there,
and those comments could be of value. Likewise, you may receive private messages
on both Facebook as well as Twitter. On Twitter, you may receive mentions. And a mention, given that it’s
addressed to someone with that @ symbol, very well can be perceived as a
message received, [inaudible] receive. And so, there’s no way to turn this off. And so, we like to make it very clear to folks
we talk to that this may sound like a solution at a high level, but in practice,
it doesn’t work. It doesn’t really make sense
in the context of social media. Copy and paste, I actually just did this to see
how it works and to get a screenshot out of it. Again, you know, doing something
is far better than nothing and this is obviously very cost
effective from a purchasing standpoint; it’s not very cost effective
from a human resource standpoint. But [inaudible] do, just copy
and paste and put it somewhere. And from a legal standpoint, again,
this is not ideal by any means. So if you were to go to court, the first question
would be, well, how do I know you just– you didn’t just type this up
in Word today and edit it. There’s really nothing here that proves
the authenticity of it and that ensures that it is the content that
you supposedly captured. But as you saw, the federal government
thought this was a key recommendation. So a lot of this boils down
to the state of the art. So today, this may actually fly. In certain situations, you may go to court
and be able to show this to someone and maybe the court would weigh this against
the other evidence and say, “All right, well, at least you kept some record and I trust
it,” but there’s obviously a lot of holes with this because it’s so easy to falsify. The picture to next [inaudible]. So rather than copy and pasting into Word and
crossing your fingers that Word, or whatever word processor, will preserve
the content you want, you could just take a screenshot, and
we often talk to prospective customers where they have somebody going
through their social networks on a– say, on a monthly basis or a
quarterly basis taking screenshots. We also see them not take screenshots
proactively, but if, for example, a government organization or a brand sees
something very offensive and obscene show up on their site, they may want to delete
that right away but before they delete it, they’ll take a screenshot so they
have at least some kind of record. So again, given the status of technology
and the understanding in the industry today, this is definitely better than nothing. Again, just like being able to edit
a Word document, it’s not hard at all for anyone to Photoshop something. So if it comes down to it and you really
need to be able to prove that this is real, you’re not going to be able to do that
because it’s too easy to falsify. But it may give a little bit of an edge if
you have some kind of record versus nothing. We actually saw a case a couple
months back with the city of Honolulu where the police department there was getting
posts on their Facebook page from a gun advocacy group. Essentially, this gun advocacy group was posting
things like you should carry your own guns because you can’t rely on
the police to defend you. And that was– whatever your political
views are or views on gun rights, you can clearly see why the police department
didn’t like that being on their page. But what they did was they
simply deleted that content. And now, a lawsuit has been filed by
this gun advocacy group claiming a First Amendment violation. [Inaudible] go to court and in court,
of course, the first question– one of the first questions the
court will have is, “Well, can you show me the content you deleted?” So if Honolulu– I don’t know whether or
not they did anything to keep these records, but if they had a screenshot, they
may be able to at least [inaudible] on context for why they did what they did. But if they don’t have a screenshot, then there’s
a chance that that could actually backfire against them because that’s evidence
that’s important and they’ve got a claim and they can’t produce it, and so the jury
or the judge may then be inclined just to assume the evidence would
have been in their favor. [ Pause ] So what about storing the original source? Definitely, some of you have heard this
before and this seems to be a common practice, from what I’ve gathered in the records
management world, that the content of the record obviously is what’s most
important and– not the physical form. So if you can store a record in at least one of
its forms, [inaudible] not really [inaudible] to work and the other various forms. And that makes sense to me. So if you, for example, issue a press
release and then maybe you have a tweet that very simply links to that press
release, you may not need to keep that tweet because the real content of that tweet
was already captured in the press release and you’re already keeping a copy
of that press release somewhere. However, this really gets back to the whole one-way
versus two-way and what is social media. When you put this content out on social media,
you may be posting it and keeping one record of it, but there may be an entire
conversation that forms around it. And you may– that conversation’s comments
[inaudible] may actually be of importance. And so, even if you don’t necessarily need to
keep your post, you probably need to think about keeping the replies and the comments
coming back in regards to your post. There’s also some [inaudible] here. For example, if you post on Facebook,
as many of you know, you may post a link and it’s probably [inaudible] derive the content
out of that link, so they’ll show a thumbnail, they’ll show a title and some description
and, again, that may be of importance and it may not be kept anywhere else, so
it’s just important to keep that in mind. [ Pause ] I’m just looking at the question
in the chat room right now, makes me wonder if there’s any
way for a person or organization to capture their own commentary
on someone else’s timeline. That actually is a very– thanks for asking. That’s actually a very difficult
technical problem. So today, I could go post on one of the
hundreds of millions of Facebook pages out there and Facebook [inaudible] doesn’t provide a
mechanism for some technology to go figure out where all [inaudible] post to. You can go look at my own wall
but you can’t easily figure out where other people are commenting. But that’s something that we’ve had with
Facebook that’s just a technology limitation. I’m trying to think if that’s an issue with
all networks, but what I think this raises is that there are gaps, no matter who
has the best technology on the market, no solution is 100 percent perfect. And the way we explain this
to clients is, you know, [inaudible] is not a zero percent or 100 percent game. You’re not either completely
risky or have no risk at all. But what record keeping and
archiving can do for you is get you as close to zero risk as possible. Well, that definitely is one of the gaps. Good question. [ Pause ] All right, so let’s look at
some real technology here. One of the most prominent ways that folks have
been dealing with this problem outside of, I would say, financial services
and some of the regulated– really highly regulated markets but especially
in the government and education and I think for many of the brands is
using website archiving. So the concept here is that website
archiving today can capture live pages. Facebook.com is a webpage. Twitter.com is a set of web pages, so
we can just archive those web pages and therefore keep social media records. And I think this, to start with, is one of
the first real solutions to this problem. If your website archiving technology is good
enough and that’s what you have in your hands and maybe you’re using a
[inaudible], it may make sense just to start pointing that to
your social media sites. Now, when I say “if it’s good
enough,” that’s an important point. We see technology like Archive-It,
for example, which is a non-profit, really great organization,
and we’re seeing folks try to apply Archive-It to social media archiving. We’ve seen a number of vendors
out there that are for-profit and that have a similar type of
technology to Archive-It. And the real issue here is that website
archiving is all about capturing a website, looking at all the links, following those links,
and then capturing what those links take you to. And so, you typically set up a seed, the first
pages to start on, and then you let it crawl those links to get as much content
as you’re willing to let it run for. So you might say, “We’ll
crawl it two or four links in.” One problem with that is that with social
media, if a comment shows up on a Facebook post, let’s say three years old, you may not
crawl far enough to get back to that. And that can happen. At any time, a social media post from
the past can blow up and go viral, and become very important in a
business record keeping context. And Facebook may prioritize
that to the top, but it may not. And so, your crawling is a limiting
factor in how far back it can go. But the more fundamental issue with
website archiving is that technology that just follows links gets really stuck
when it comes to today’s social media sites. So I’m sure all of you are familiar with Facebook. If you go on Facebook, the
links are no longer links. When you want to see more posts, you just scroll
down and, as if by magic, 10 more posts show up. When you see a link for “view more comments”
or “view previous comments” and you click on that, that’s not a link that takes you somewhere;
it magically loads a whole set of comments. And so, web archiving technology, which is
built to follow links, now gets confused. Rather than following that link, what it really
needs to do is start to interpret the scripting that’s happening on these sites. And for the most part, this
is a really hard problem. Aside from
one vendor, I haven’t seen any web archiving
tool handle this well. And there is one vendor that I
think does this better than others. But it is a very hard problem. And the moment that Facebook changes
their scripts, it’s brittle and it breaks. And as for which vendor,
I’m happy to give a plug to one of the vendors: it’s PageFreezer. And you will see PageFreezer,
it’s a competitor of ours, they’re a social media archiving
solution as well. So they have the best technology that I’ve
seen out of the products that I looked at and I haven’t necessarily listed
all of them, but I’ve seen them able to capture the dynamic behavior
of a website and replay it back. Now, that said, aside from having to crawl, there’s another major problem
with website archiving. Capturing the HTML from Facebook.com, at least
in my opinion, and I think I can convince many people of this, is
not capturing the real record. As for using website archiving to capture
social media, I like to look at an analogy. Would you use website
archiving to capture e-mail? Would you point Archive-It or
PageFreezer or any other tool out there at Gmail.com or Hotmail.com and
call that e-mail archiving? And you absolutely wouldn’t, right? Because what you’re doing is you’re capturing
a rendering of that content in HTML form. But, just like e-mails, social media
not only shows up on Facebook.com, but it also shows up on your iPad and
your iPhone and your BlackBerry, and all sorts of Android devices. And in each of those cases, it
may be a little bit different. And so, in my view, that’s not the real record. It’s a rendering of the record, but
it’s missing the actual metadata. It’s missing what’s underneath that HTML. So if you go to Facebook.com and you
view source, you’ll see a bunch of HTML for a Facebook post, but you won’t see all of the
metadata that is associated with that content, just like if you were to go to
Gmail.com and look at view source, you’re not going to see the MIME
message format of the e-mail. You’re going to see the HTML rendering of it. You have to click on a different button in Gmail to see the actual, original
source of the message. That, you know, is something
to really keep in mind. Website archiving can take you into
the realm of automated [inaudible] content and, to some degree, solve the problem,
but you have to be aware of the kind of records you’re keeping at that point. And then doing a search is
another issue [inaudible]. So you’ve got website captures of Facebook.com
for the last five years and now you want to search for a single keyword,
but what it’s going to do is search across
the entire webpage content. What if you wanted to search
just within Facebook photo posts? Well, you can’t because you’re
just searching websites. So there are some issues with that, but by
and large, this has been [inaudible] technology
that we’ve been able to work with– [inaudible] in this page. I wanted to point you to a
case study that’s out there. So, the State of North Carolina, I’m
not sure how many of you are familiar with the work they do at the State
Archives down there, but in many ways, they’re seen as thought leaders
at least in the public sector. In fact, the Council of State Governments
just in August published this case study on how North Carolina has been using
web archiving to capture social media. So I encourage all of you to
follow that link and check it out. Definitely, it’s very informative. Fortunately, for us, not to make too much
[inaudible] but we had also been engaging with the State of North Carolina
prior to this case study going live and really show them a different
way of capturing these records. And in December of 2012– actually, about
August, they switched to this new mode, but in December of 2012 we
did something really exciting which is we launched the world’s first fully
interactive social media archive that anyone around the world can search and poke around in. So that’s the link in the slide and with that,
thanks for sharing that in the chat room. Definitely encourage all of you to click on
that, run some searches and see how it works, and I’d love to hear your feedback. [ Pause ] So I named a number of issues and I kept
going on about webpage archiving because, again, it’s capturing the rendering of that
content, and I gave you an analogy: well,
you wouldn’t do that with e-mail, [inaudible] capture of Gmail.com
and probably, you know, archiving, you would go straight to the e-mail server. Well, that’s where social media
backup type tools came into play. And some of the first of
these came out back in 2008. Backupify is the one that you may have heard
of that came out as a consumer-oriented tool for just keeping your own social media history. [Inaudible] more than probably
to Google apps [inaudible]. They still have that consumer product
out there that all of you can sign up for, for free, and check out. But here are the posts from that. So one of the things that we saw when
we were looking at this space was that, well, what the social backup tools
are doing makes a lot of sense. Go to the network, go to Facebook
directly, go to Twitter directly, pull the real records from those APIs. Poll them frequently; don’t
wait two months to do this and run a crawl, but do
this as often as you can. And get that real data and store it. The problem we saw, and what really pushed
us to continue to look at this space, is that backup is very different from archiving. Backup is all about having a copy of
your data in case you need to restore it. In Backupify’s case with social
media, you can only kind of restore it, so it’s just a way of having a copy. But there’s a big distinction there and in fact,
Backupify themselves wrote a [inaudible] post on backup versus archiving [inaudible]
more, where backup is about restore, but archiving is really about long-term
preservation of records and indexing records, being able to search them and produce them. And so this, I think, for us
was that final stepping stone to say, “You can go to that [inaudible] get that
data, but how do you store that data and present it back in a way
that really fulfills the needs of records management and
really serves as an archive?” [ Pause ] So I [inaudible] on those because that
really takes us to where we are today and what we believe is the state of the art– truly the state of the art
when it comes to technology. But it boils down to four important questions
when you look at vendors like us that have come out around social media archiving
and we’re not the only ones. To me, it really boils down
to four key characteristics. The first is frequency. Again, you don’t want to
run this every two weeks. Maybe not [inaudible] okay. But in our view, you want to
capture this even more frequently. You can’t get it in real time
because the technology doesn’t exist. You can’t get it the instant [inaudible]. But how soon can you capture it? So when you’re looking at archiving,
especially for social media, frequency is definitely an important
factor because the larger that time window, the larger the risk that
that record will be lost. Comprehensiveness: again, you want to be able to capture this content regardless
of how it gets on Facebook.com. And there are a variety of technology approaches
that affect how much data you can capture, but you want to be able to get
that data if it’s out there and you want to be able to get as much of it as you can. And again, every archiving tool
or solution does this to a different degree. Authenticity is something that I
hammered on a little bit when it comes to screenshots and web archiving and so forth. Do you have the full metadata? In fact, what we see now is that in
various areas, a lot in e-discovery, metadata has become incredibly important. We’ve also seen it with State FOIA
laws, Freedom of Information laws. For example, the State of
Washington actually has it written in their public record law
that you need metadata. We’re hearing that some organizations
are worried now that [inaudible] out there will file lawsuits because
they know you’re not keeping metadata. So you have to get as much
of that data as you can keep. But even more importantly, [inaudible]
reason why you’re keeping this data is in case you have to produce it at some point. And oftentimes, businesses are driven
to do this because of legal reasons. And so, you want to make
sure it holds up in court. So what is that solution doing to
preserve that authenticity and to prove that the record wasn’t typed up today? And finally, storing it, again, is great,
but that’s only half the equation. When you produce it, context
is incredibly important. If you just search and you find a Facebook
comment, well, what post is that comment on? You know, what’s the full
context of that conversation? You’re storing photos, do
you have the full photo or do you have this tiny
thumbnail version of it? You know, various types of social media
communications have their own data formats, but a lot of them are rich and it’s not good
enough just to throw them all into a database. In fact, I’ll go back to
the backup screenshot here. You see status, status, inbox, comment,
status, status, status, comment, comment, and literally that’s all you get. So if you click on any of these comments, you have no idea what it relates
to and that’s a real problem. [ Pause ] So how do you make this possible? And I guess one thing I want to toss
with, with at least that storage– with asking the question
storage in the cloud, absolutely. The vast majority of vendors today are
storing social media data in the cloud. And the realization there is that this is cloud
data from birth, so it’s coming from the cloud that Facebook and Twitter
run, into some other cloud. And that’s really, you know, important. Because of this, it allows vendors like us
to deploy this solution in a large company in half an hour or an hour because there’s
no installation, there’s nothing to do; you just log in to your account, tell us
which Facebook and Twitter accounts you have and give us permission, and we can right
then [inaudible] network to the cloud on Facebook and Twitter, pull that data in. And of course what you want to see in an archiving
vendor is that, yes, it’s storing in the cloud, but make sure that they’re
never reducing the security or privacy of that information, ’cause they’re taking it out of
Facebook. Are they using encryption? You know, who can get to that database, you know, and
how is that data stored, and is it stored encrypted at [inaudible], is it
replicated so that it won’t be lost? You want to make sure that when it’s moving
from one cloud to the other to a vendor that the vendor is doing everything
they can to follow cloud best practices. And finally, I’m getting on to the end of this. When it comes to social media
archiving, what we’re seeing today in the market are really
two technology approaches. The first is proxy. And this is, I think, the first
approach out of the gate just because the APIs were not
as mature at that time. So back in 2008, 2009,
companies came out with this approach where you as a customer would deploy some kind of
network device or software that would sit on your network and listen in
[inaudible] network traffic. It’s basically a proxy. All of the traffic on the network passes
through your black box and from there, you try to preserve the social media records. The nice thing about this solution
is that it can capture anything. So if you’re on the network and folks are using
Facebook, whether they’re using the wall or events or Facebook answers or whatever else,
the proxy should be able to pick up all of that because it’s traveling right through that box. The problem is although it can
pick up any type of content, it can’t pick up all instances
of the communications. So you may be at work with this proxy, but you
may then just get on your smartphone and say, “Well, I’m trying to post to Facebook straight
from my smartphone using my 3G or LTE coverage,” and it never goes to the corporate network,
it never gets picked up by the proxy. And even worse, when people reply to
you and comment, of course that content
may never go through your intranet and therefore your proxy is
not going to pick that up. So that is definitely a tradeoff. The API on the other hand
can’t get any type of content. So Facebook might say, “Well, you can’t get to the answers communications just
’cause we don’t want to expose that in our API.” And so, any kind of API solution, which ArchiveSocial [inaudible] API
solution, is limited based on the type of content that the social network exposes. That said, an API solution
doesn’t care how that content gets there. So if it shows up on Facebook or Twitter,
it’s on the API and we can pull it down. You no longer have to worry
about bring your own device, BYOD; you no longer have to worry about,
well, when people respond, how do I get that? It just happens. And so, the advice that we give is, yes, we
can’t capture every single side of Facebook, but here are the sides of Facebook that we do
capture, the main ones, and in your organization, you know, use policy to enforce
what people are allowed to use. We are also seeing some folks
come out with proxy API hybrids. And again, these tend to be more enterprise
type solutions where you deploy something on your internal network, [inaudible]
should do to it, but then it uses the API to supplement what the proxy can’t capture. And finally, I’m going to try to leave–
hold on, let’s say, 10 minutes for questions. This is a [inaudible] from our product. So if you click on the link,
that’s what you will see. I think this may also be in the upcoming
book, plug for [inaudible] upcoming book. I believe the screenshot
is the one that was used. And again, I
encourage you to just click around [inaudible] and provide feedback to me. What I wanted to show here was authenticity:
capturing the underlying metadata underneath this record. So there’s a couple of sentences at the top. I mean, all that metadata is for that same post. Every comment individually would
have its own metadata as well. You know, we use digital signature techniques
and so forth to prove authenticity. And then on the context side, again,
this looks and feels like Facebook and you can actually see the comments in line,
on the post they belong to. So this is the first step that we tried to take to address the issues around
context and authenticity. And with that, I appreciate you listening
to me talk for most of these 53 minutes. I have my contact information here on the
slide and I’m about to release the mic. So if any of you have questions,
I’d love to hear them. [ Pause ]>>If you have a question, just click
on the hand sign and raise your hand. You could also do the same thing
if you’d just like to make a comment. [ Pause ]>>So I hope I’m not violating protocol here. But I just see a question
here from Deidra [phonetic]. Is there a common standard for the metadata that
speaks to the authenticity of these materials? Again, no. The metadata that we saw on
that previous slide was the data that Facebook provides about this post. It’s in a format called JSON which is a
technical format for exchanging information. LinkedIn has its own XML format. Each sort of has its own. And there’s no standard because these
are essentially private companies that have created this whole
site and created these formats and this is how they decided to represent them. [ Pause ]>>Any other questions, [inaudible]
on the chat area or using the mic? I see [inaudible].>>Yeah. That’s a good question: would
the metadata only be available through the API? Yes. This metadata is only available through the API. This is literally the
data you get back when you ask the API, “Give me this Facebook photo, this post.” So, you know, if you were to go to the website,
there may be some user IDs and some data embedded in the HTML to support the rendering of it. But certainly, this data is
not available in any other way than talking directly to Facebook via its API. [ Pause ]>>Go ahead.>>Hi. Just a quick question about–
we’re all archivists and records managers, so just about potential migration strategies as
technology changes. Social media is so cool and changing and vibrant, and,
you know, with whatever the next great tool is, what are your thoughts about, you
know, migrating this data as an archive to the next data format?>>Great question. One of the reasons that we focus on social media
specifically was to try to attack this problem of the fact that the data
formats continue to change. So internally, we have an architecture that is essentially versioning
all of these data formats. And so, if Facebook changes the
format of the wall post tomorrow, we can show you
the new wall post in the new format and still preserve the old wall
post in the old format. That’s a unique architecture and
something that we have as an advantage because we started with social media. But kind of to take it beyond
ArchiveSocial, there is no standard and there’s no [inaudible] approach in
the industry to properly deal with it. So what we’ve done, and
largely, to be honest with you, we’ve done it because we’re a small company and
people want to have confidence in our product, is that we preserve that raw
metadata exactly like it is. And at any point in time, you have the
option to get at your entire archive. So with two
clicks, you get a zip file that has a rendering of every record, whether you
want it as a PDF or as HTML. But what’s more important is the raw
metadata for every comment, post, photo; it comes straight out in a zip
file that you can walk away with and readily feed into some other system. And so, the belief there is that if you
were taking this data out of our system and into some maybe more standardized system
or the next vendor that you wanted to use, and we gave you the full metadata, untampered with, they would be able
to ingest that correctly. And I actually want to add
on that really quickly. One thing that I’m open to and I actually asked
Professor Frank [phonetic] is as archivists and records managers, what is the standard
today that’s out there that we as ArchiveSocial and the other vendors in this
industry should be looking at? Is there a center of your universe when it
comes to EDR-MF, if I have that correct, [inaudible] put through that, but a records
management central solution and a standard? If folks believe they have an opinion
on that, I’d definitely be interested in hearing what that standard might be.>>You say you work with electronic
records more than I but what, you know, seems to be the case is that there is guidance
for metadata as far as what should be captured. But we have no actual metadata that is required
or that is a standard for records management. What do you do at your place of employment?>>That one is a really great question. And I think what the courts would look for
and I think that there are ISO standards about how to migrate one format to a next. So I think if you did as best you’re possible
could about migrating from one format to a next according to ISO
standards, you’d be in a good shape. But I think that the courts and what, you know,
laws of evidence would say is that as long as you documented everything and had, you
know, well-documented policies and procedures on how you did that and what it looked like here
and what it looked like there and what you lost or what you didn’t lose, I think
you’d be in a really safe place. And I know,
as a financial organization, that’s what we’re more concerned about,
but to your point, we’re in a brave new world. So, yes, it’s all new. [ Pause ]>>If you [inaudible] any
magic answers tonight from us.>>Not a problem. I’ve come to understand that there
are no magic answers, just progress.>>But I wish we had been making more progress
yet, as we thought ISO 23081 part 3 was going to actually give us metadata that’s recommended
for records, but it didn’t go far enough. They couldn’t come to agreement,
the community that was working on it, so we’ve still got a problem there. I think Deidra has a question. [ Pause ] Deidra, can you tell me what
do you mean, as far as metadata, is it diplomatics or is it something else, archiving? [ Pause ] And now, what she means is in terms of applying
diplomatics standards: no, not yet. What she’s referring to is the University
of British Columbia and I do work with a doctoral student there who is
actually doing her research on social media. And the couple of Canadian local government, what they’re doing is again just
the exploratory stage and– so, no. I don’t know of anything. They want an enterprise project for several
years that was supposed to get to the point of resulting in metadata and that too
did not develop what had been hoped. They’ve done wonderful work
but, no, we don’t have anything. It’s hard [inaudible] standards
to come out of that either. [ Pause ] Yeah, I see your slide too, just fine, it’s your
last one with Q and A. [Laughs] You’re okay. I think at this point, because we
are two minutes after the hour, I’m going to say thank you very much. I appreciate your presenting
to the students tonight. We’ll have our recording ready after
this for those who could not be here. And within a couple of weeks as quickly
as the right [inaudible] can get to it, we’ll have a webcast on the same page that
announced your social media presentation. So thank you very much everyone for
attending and thank you Anil for presenting. It was just very educational, very informative. I enjoyed it.>>Thank you so much for the invitation. I really love– my pleasure. So I appreciate it.>>Good night everyone. [ Silence ]