Hadoop: Your Partner in Crime

August 24th, 2012

Pre-crime? Pretty close…

If you have seen the futuristic movie Minority Report, you probably have an idea of how many factors and decisions go into crime prevention. Pre-crime remains science fiction, but even today it is clear that many social, economic, psychological, racial, and geographical circumstances must be carefully considered to make crime prediction even partially possible and accurate. The predictive analytics made possible by Apache Hadoop can significantly benefit this area of government security.

The essence of crime prevention is to narrow thousands of “what if” cases down to a manageable, plausible handful of scenarios. Crime can happen anywhere and can range from cyber fraud to kidnapping, which yields a vast number of possible misdemeanors and felonies. With the help of big data analytics, government agencies can zero in on certain areas, demographics, and age groups to pick out specific types of crime and work toward reducing the one trillion dollar annual cost of crime in the United States.

Zach Friend, a crime analyst for the Santa Cruz Police Department, explained that there aren’t enough cops on the streets due to insufficient funds. Not only that, but many police departments are still technologically behind in the crime-monitoring field, so big data analytics tools could be a huge step forward for police all over the country. Evidence and information about cases could be stored much more efficiently, police action could be more proactive, and crime awareness could be much more prevalent.

Who’s on the case?

The Crime and Corruption Observatory (created by FuturICT, a European research initiative) is pushing for this kind of development and aims to predict the dynamics of criminal phenomena through massive data mining and large-scale computer simulations. The Observatory is structured as a network of scientists from many fields: “from cognitive and social science to criminology, from artificial intelligence to complexity science, from statistics to economics and psychology”.

The Observatory will operate within the framework of the Living Earth Simulator project, which is still under development: “a big data and supercomputing project that will attempt to uncover the underlying sociological and psychological laws that underpin human civilization.” The project, funded by the European Union, is an impressive technological advance that will not only help pinpoint crime but also put today’s big data to effective use.

PredPol has made predictive crime analytics available to police departments so that “pre-crime”, in a sense, can be put into action. Zach Friend explains, “We’re facing a situation where we have 30 percent more calls for service but 20 percent less staff than in the year 2000, and that is going to continue to be our reality. So we have to deploy our resources in a more effective way. This model does that.” PredPol allows law enforcement agencies to collect and organize data about crimes that have already happened and to use it to predict future incidents in specific areas as small as 500-by-500-foot blocks. It is not the same as knowing the exact perpetrator, victim, and cause of a crime ahead of time, as in Minority Report, but it is an impressive step toward better crime prediction.

The Santa Cruz Police Department, which is using PredPol’s software, has already seen significant improvements in police work. SCPD began by locating areas of possible burglaries, battery, and assault and handing out maps of these areas to officers so they could patrol them. Since then, the department has seen a 19% decrease in these types of crimes.

PredPol’s software makes calculations about crimes based on the times and locations of previous incidents while cross-referencing them with criminal behavior and patterns. Here is an example of how large-scale this can get: George Mohler, a UCLA mathematician who was testing the effectiveness of PredPol, looked at 5,000 crimes, and comparing every one of those crimes with every other one means 5,000 × 4,999 / 2, or nearly 12.5 million, pairwise comparisons. With impressive results already materializing from calculations like these, it is exciting to think how much more accurate predictive crime analytics could become.
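To make that scale concrete, here is a minimal Python sketch. It counts the pairwise comparisons for n crimes and scores risk with a toy self-exciting model, in which each past crime briefly raises the risk nearby, decaying exponentially in time and falling off with distance. This is an illustrative assumption, not PredPol’s actual model, and every parameter value (`mu`, `theta`, `omega`, `sigma`) is a hypothetical placeholder.

```python
import math

def pairwise_comparisons(n):
    """Number of distinct crime pairs when every crime is compared with every other."""
    return n * (n - 1) // 2

def risk_intensity(x, y, t, events, mu=0.1, theta=0.5, omega=1.0, sigma=100.0):
    """Toy self-exciting risk at location (x, y) in metres and time t in days.

    mu     -- hypothetical background rate
    theta  -- hypothetical expected number of follow-on crimes per crime
    omega  -- hypothetical temporal decay rate (per day)
    sigma  -- hypothetical spatial spread (metres)
    events -- list of (x, y, t) tuples for past crimes
    """
    total = mu
    for ex, ey, et in events:
        if et >= t:
            continue  # only crimes that have already happened can raise current risk
        dt = t - et
        d2 = (x - ex) ** 2 + (y - ey) ** 2
        temporal = omega * math.exp(-omega * dt)  # exponential decay in time
        spatial = math.exp(-d2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)  # 2-D Gaussian
        total += theta * temporal * spatial
    return total

print(pairwise_comparisons(5000))  # 12497500
```

The loop over past events is where the pairwise cost comes from: scoring every crime against every earlier one grows with the square of the number of crimes, which is one reason this kind of work benefits from a distributed platform.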

Hadoop lays down the law

With Apache Hadoop, perfecting crime prevention becomes an attainable goal. In a recent white paper on big data and law enforcement, CTOlabs made several important points showing how Hadoop could benefit smaller police departments that have little financial leeway. The LAPD, for example, is very well funded and can afford to work with companies such as IBM to develop crime prediction techniques.

Smaller or less advanced departments, however, cannot afford supercomputers or extensive command centers and instead rely on less efficient techniques (such as simple spreadsheets and homegrown databases) to keep track of all the information involved in law enforcement. “Nationwide, agencies and departments have to reduce their resources and even their manpower but are expected to continue the trend of a decreasing crime rate. To do so requires better service with fewer resources.” Open source offers an effective and less expensive option: Apache Hadoop is the superhero that can save the day, one cluster at a time.

With Hadoop’s ability to store and organize data, police departments can filter out unnecessary information and focus on the aspects of crime that matter most. By applying advanced analytics to historical crime patterns, weather trends, traffic sensor data, and a wealth of other sources, police can place patrol officers in areas with a higher probability of crime instead of distributing manpower evenly across quiet and dangerous neighborhoods. This conserves money, effort, and time. Hadoop can also help organize other factors such as police backup, calls for service, or screening for biases and confounding variables. Phone calls, videos, historical records, suspect profiles, and any other information that law enforcement agencies need to keep for a long time can be systematized and referenced whenever needed.
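As a rough illustration of the kind of job a department might run on Hadoop, the Python sketch below follows the Hadoop Streaming convention (scripts that read lines from standard input and write tab-separated key/value pairs) to count incidents per map grid cell, a basic input for deciding where to send patrols. The CSV layout, the 150-meter cell size, and the script name `cells.py` are hypothetical.

```python
import sys
from itertools import groupby

CELL_SIZE_M = 150  # hypothetical grid size in metres

def map_lines(lines, cell_size=CELL_SIZE_M):
    """Mapper: emit 'cell_x,cell_y<TAB>1' for every incident record.

    Assumes hypothetical CSV rows: incident_id,x_m,y_m,offense
    where x_m and y_m are projected coordinates in metres.
    """
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 4:
            continue  # skip malformed rows instead of failing the whole job
        try:
            x, y = float(parts[1]), float(parts[2])
        except ValueError:
            continue  # skip a header row or corrupted coordinates
        cell = f"{int(x // cell_size)},{int(y // cell_size)}"
        yield f"{cell}\t1"

def reduce_lines(lines):
    """Reducer: sum counts per cell; Hadoop delivers mapper output sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    for cell, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{cell}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    # Invoked as "python3 cells.py map" or "python3 cells.py reduce".
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

On a cluster it might be launched with something like `hadoop jar /path/to/hadoop-streaming.jar -files cells.py -mapper "python3 cells.py map" -reducer "python3 cells.py reduce" -input incidents/ -output cell_counts/`, where the streaming jar’s location depends on the installation. Because Hadoop sorts mapper output by key before the reduce step, the reducer only has to group consecutive lines.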

Using technology effectively to increase public safety is not a panacea, but it is here, and it is an effective tool for combating crime. Apache Hadoop serves as a foundation for this new approach and, most importantly, it is accessible to a wider range of police departments across the country and the world. Predictive policing and crime prevention certainly still have plenty of room to develop and have yet to tackle issues like crimes that depend on interpersonal relationships or random events. However, that progress is very possible, especially with Hadoop as a predictive analytics platform. Crime can be stopped. No PreCogs necessary.


After Boston: Terrorism and the Technology Gap

The Boston Marathon bombing, subsequent manhunt and current investigation are unprecedented – not only due to the nature of the attack but because of how much information has been available to law enforcement, the public and the suspects. Unlike any previous large-scale attack, data came in at a staggering velocity within seconds of the twin explosions, yielding constant changes and misreporting, but also the timely apprehension of the suspects.

For all its evident successes, however, this “big data” event exposed many limitations in existing technologies, demonstrating the need for new capabilities and creating new opportunities for collaboration between law enforcement and technology developers. This article focuses on the technological capabilities that Boston showed we need, rather than on the victims, the heroism of the Boston responders or the cooperation among the agencies involved. Some of the technologies discussed below may already exist in some form but are still not well suited to the needs of this kind of event.

Here’s my punch list:
Avid for Intel: The FBI and Boston Police Department (BPD) requested and received video and photos from witnesses to the blast and from private security cameras in the vicinity. Capabilities in this area are not ready for “prime time”. To my knowledge there is no image and video management system that can operate at scale to quickly stitch together the various images in both space and time. The best tools seem to come from the entertainment industry, something akin to the Avid video editing and production suite, rather than from the intelligence community. Each image from a smart phone carries telemetry data (a time stamp and, often, GPS coordinates) that can be used to orient it in space and time. Combine hundreds or even thousands of those images, taken from different vantage points at different times, and you have an amazingly detailed mosaic of the environment. Being able to ‘play it back’ to particular time stamps, say to see who put a package where, is both an enormous challenge and an enormous opportunity. I suspect that Boston pulled off this feat with brute force, but a technology solution for this type of image management seems to be in order. Similar ideas can be seen in movies, but they haven’t yet made the trip from the big screen to the real world. No matter how many video cameras a city installs, we should expect increasing amounts of consumer imagery and video to be available, and we must develop the technology to harness it.
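As a minimal sketch of the ‘play it back’ idea, assuming each image’s timestamp and GPS coordinates have already been extracted from its metadata into a hypothetical `Photo` record, the Python below selects the images taken near a location within a time window and orders them chronologically. Stitching the images into a mosaic is a far harder problem and is not attempted here.

```python
import math
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Photo:
    # Hypothetical record built from a photo's embedded time stamp and GPS metadata.
    source: str
    taken_at: datetime
    lat: float
    lon: float

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def play_back(photos, lat, lon, radius_m, start, end):
    """Photos taken within radius_m of (lat, lon) between start and end, oldest first."""
    hits = [p for p in photos
            if start <= p.taken_at <= end
            and haversine_m(lat, lon, p.lat, p.lon) <= radius_m]
    return sorted(hits, key=lambda p: p.taken_at)
```

A real system would also have to cope with camera clocks that are set wrong and with images whose location data has been stripped.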

Complex Event Processing (CEP): It’s difficult to imagine the barrage of information flying at the Boston law enforcement team on April 15th: citizen tips, social media posts, 911 calls and forensic evidence, to name a few. I suspect that the primary information management system was email, and in a rapidly evolving event such as this it wouldn’t take long to be drowned in message traffic and miss key pieces of information. CEP is an idea typically found in machine automation, but automating alerts based on key events could ensure that the right message gets to the right people automatically. That might mean any small event in a key location (or a certain type of activity anywhere) generates an alert. For CEP to be effective in a rapidly evolving situation, it would require a very simple configuration interface and easy integration with data streams and messaging systems. In the next Boston-type event there will be no time to call a support contractor for help configuring rules; this has to be almost consumer-friendly out of the box.
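To show how small the core of such a rule can be, here is a hedged Python sketch of a sliding-window CEP rule: it raises an alert when a given type of event arrives in a given zone some number of times within a time window. The `Event` fields, the zone names, and the assumption that events arrive in time order are all hypothetical; production CEP engines add far more (persistence, routing, rule languages).

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Event:
    # Hypothetical normalized message: a tip, 911 call, or social media post.
    ts: float   # seconds since some reference time
    kind: str   # e.g. "suspicious_package", "tip"
    zone: str   # e.g. "finish_line"

class WindowRule:
    """Fire when `threshold` events of `kind` arrive in `zone` within `window_s` seconds."""

    def __init__(self, name, kind, zone, threshold, window_s):
        self.name, self.kind, self.zone = name, kind, zone
        self.threshold, self.window_s = threshold, window_s
        self.recent = deque()  # timestamps of matching events still inside the window

    def feed(self, ev):
        if ev.kind != self.kind or ev.zone != self.zone:
            return None
        self.recent.append(ev.ts)
        # Drop timestamps that have slid out of the window.
        while self.recent and ev.ts - self.recent[0] > self.window_s:
            self.recent.popleft()
        if len(self.recent) >= self.threshold:
            self.recent.clear()  # reset so one burst raises one alert
            return f"ALERT {self.name}: {self.threshold}+ {self.kind} in {self.zone}"
        return None

def run(rules, stream):
    """Route every incoming event through every rule and collect alerts in order."""
    alerts = []
    for ev in stream:
        for rule in rules:
            msg = rule.feed(ev)
            if msg:
                alerts.append(msg)
    return alerts
```

Setting `threshold=1` expresses the “any small event in a key location” case; the hard part the text describes is letting non-specialists configure rules like this in minutes.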

Link Analysis: There is a critical need in rapidly unfolding situations to organize the information you have and tie it together in a way that allows you to tell a story or build a case. As the Boston authorities tried to figure out who the suspects were, little pieces of information came in all the time, answering critical questions like: How many suspects are there? Where do they live? Where do they work? How are they tied together? This is certainly the promise of link analysis software from vendors like Palantir, IBM/i2, Visual Analytics, Centrifuge and others. Unfortunately, without a room full of engineers from the vendor, customers don’t have the capability to use these tools rapidly enough and with the level of sophistication this type of event requires, and most agencies end up using these tools for a few simple activities and as basic drawing packages. The products, business models and capabilities destined for use in crises must evolve in order to make the kind of headway needed during a fast-moving event. Even a city the size of Boston doesn’t have the budget or the day-to-day need for the level of investment that would be required to have those capabilities using today’s solutions.
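At its core, link analysis treats people, addresses, phones and vehicles as nodes and known associations as edges. The Python sketch below, using made-up entity names, builds such a graph and uses breadth-first search to find the shortest chain of links between two entities. It is a generic illustration, not a description of how any of the products named above work internally.

```python
from collections import defaultdict, deque

def build_graph(links):
    """Undirected entity graph from (entity_a, entity_b) pairs, e.g. person-address."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def connection_path(graph, start, goal):
    """Shortest chain of links between two entities (breadth-first search), or None."""
    if start not in graph or goal not in graph:
        return None
    prev = {start: None}  # remembers how each entity was first reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None
```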

Geographic Information Systems (GIS): Every law enforcement and homeland security agency has GIS tools. But let’s face it: nobody can use them at the pace and level of complexity that Boston required. And that’s not Boston’s fault; it’s the tools’. Modern GIS systems are built on old software architectures designed to support geographers. They need to be rebuilt for the velocity of social media data, for easy and rapid data entry, for simple analysis, and for quick information sharing and reporting. Law enforcement needed to see the locations of detonations, devices that were discovered, suspect homes and other parts of the crime scenes, and then correlate that data with reporting from social media, random tips and their own personnel, but that was just out of reach. They had the tools and they knew how to use them, but the tools were not up to the task. Given the revolution in geo-enabled consumer apps such as Foursquare, Google Maps, Yelp and Find My iPhone, it’s disappointing that the professional tools are so lacking in capability.

Crowd Analytics: From the DARPA Challenge to the recent Intelligence Advanced Research Projects Activity (IARPA) crowd forecasting program, this has been a pretty hot topic for research. The FBI’s release of suspect photos proved that the crowd was able to identify the suspects better than facial recognition algorithms were apparently able to do against their drivers’ licenses and other publicly available photos. In addition to allowing witnesses who saw or knew the suspects to identify them, the crowd presents a massive computational reasoning capability with the entire Internet at its disposal. The crowd was able to find the suspects’ Russian-language social network VKontakte (VK), Twitter and other social media accounts faster than the government. Leveraging the crowd for search, translation, information dissemination and such bears much promise and much peril. More will be written, I’m sure, about the ill-fated reddit community attempt to analyze crime scene imagery, but make no mistake: a well-organized crowd can be a powerful tool.

Social Identity: Identity resolution and identity management capabilities are used every day by law enforcement and intelligence agencies. But these capabilities struggle with low-quality data sources. It’s one thing to find an identity match with a name, date of birth and social security number; it’s something else entirely when the name has multiple spellings and there’s no other good information. It’s particularly hard to find that person’s social media identity, perhaps the first place you’ll see their extreme views or other information that may provide additional leads or explanations of motives. And, in this case like many others, fraudulent websites are created as quickly as the event unfolds, further confusing the search for suspect identities. High quality but rapid social identity solutions are needed to understand a person’s identity when their official government identity is either unknown or insufficient. And these tools must not only be timely in order to have any value to law enforcement, they must also be accurate.
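As a simple illustration of matching a name that has multiple spellings, the Python sketch below normalizes names (case, accents, punctuation) and scores them with the standard library’s `difflib.SequenceMatcher`. Real identity resolution would add transliteration rules, phonetic encodings and other attributes; the example names and the 0.8 threshold are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher

def normalize(name):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())

def name_similarity(a, b):
    """Similarity in [0, 1] between two spellings (difflib ratio on normalized text)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def candidate_matches(query, names, threshold=0.8):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(n, name_similarity(query, n)) for n in names]
    return sorted((p for p in scored if p[1] >= threshold), key=lambda p: p[1], reverse=True)
```

String similarity alone produces false positives on common names, which is why the text stresses that these tools must be accurate as well as timely.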

Social TTL: The concept of tagging, tracking, and locating (TTL) is well known in the intel and special operations communities. But the fact that one of the suspects was logging into his VK and Twitter accounts from his smart phone during the event exposed the need for a different kind of TTL. The technology to identify the user and track the location of his mobile phone exists, but it was not readily available in a timely manner in Boston.

Phone Neutralization/Intercept: The explosive devices used during the marathon were apparently triggered with controllers from a radio-operated toy, but at first they appeared to have been detonated by mobile calls or messages, as in many other attacks of this nature. After the suspects were identified there was concern that they possessed additional devices and that those devices could be remotely detonated using mobile phones as well. Along with the Social TTL idea, there is a need to either neutralize, intercept or exploit the mobile phones of the suspects. This would have been even more essential with more assailants or a protracted standoff. Products exist that would allow law enforcement to disable a phone from communicating on the network, track it precisely and even send it direct messages.

Digital Canvassing: Digital cameras and video were not the only sources of information available at the time of, or leading up to, the explosions. There was also a high volume of Tweets, Facebook updates, Yelp check-ins, Instagram posts and even YouTube uploads. One idea for identifying potential witnesses or suspects is to play back all of those time-stamped posts to determine who was in the vicinity, and when. Similar to deploying policemen to canvass a neighborhood, a digital canvass would allow investigators to review what was in the public social space that might yield clues.

Behavioral Markers: Every friend of the suspects interviewed by the media said they were shocked by the attack: their friends had been normal Americans, so something must have triggered a fundamental change. Each time there’s an event like Boston or Sandy Hook or the Gabrielle Giffords attack or the Aurora movie theater shooting, we seem surprised that these acts occurred, and we see the evidence only after the fact. In reality, the behavioral ‘markers’ were there more often than not. But any attempt at analytical prevention or detection quickly encroaches on the privacy and civil liberties of people with psychological disorders or those of a given race or chosen religion. In light of the potential to save many lives, we must have the courage to do responsible research on the behavioral markers of people who are mentally or ideologically capable of committing mass murder. We must address the root causes and find signals that we can detect in advance so that we can prevent these events from happening.

Smart Phones for Law Enforcement: Government, from the Pentagon to local police departments, has been slow to embrace smart phones. This mainly stems from legitimate concerns about protecting sensitive information, determining acceptable use and limiting the high cost of migrating to a new device, and even from uncertainty about choosing the right vendor. But it seems obvious that the Boston suspects had a real-time information advantage over those responsible for tracking them down. The smart phones the suspects carried would have allowed them to listen to the police scanners (I’m not sure they did, but I did, so they could have), tweet to their growing list of followers, monitor the news and call their mother. This “net centric warfare” provided a time and information advantage over the chain-of-command information flow to radios and outmoded Blackberry email devices. Equipping cops with smart phones, connected to some of the information sources described above, would tip the playing field back in favor of law enforcement.

Information Security: The International Association of Chiefs of Police (IACP) and others have reported recently that law enforcement’s use of social media is primarily to disseminate information rather than to monitor or engage. As former Homeland Security Secretary Michael Chertoff wrote in The Wall Street Journal recently, BPD did a fantastic job of using Twitter as an authoritative information source to quell rumors and enlist the public’s help. However, this event also showed the need to be able to control publicly available information that may be used by the adversary. I suspect BPD had forgotten or didn’t know that its police scanner traffic, with detailed operational information, was being streamed over the Internet. The rapid flow of information that is easily accessed by even the simplest smart phone raises the stakes for information- and cyber-security during events like Boston.

To close, I welcome your ideas, your comments, your additions, and your opposing viewpoints. In such a dialogue lies a tremendous opportunity for refinement and innovation of the tools and products that support our public safety and intelligence agencies.
# # #

Disclaimer: These observations are made from a distance; I was not part of the Boston response nor do I have input on these technologies from anyone who was. Moreover, this is being written while the event is still unfolding and nothing has yet been published about the tools and technologies that were actually used during the event. These observations and opinions, and any errors, are my own.

Bryan Ware is the CTO of Haystax Technology, a new analytics company focused on the defense and intelligence sector. Mr. Ware was the co-founder and chief technology strategist of Digital Sandbox until its acquisition by Haystax. His current work is focused on intelligence, law enforcement, and financial industry applications, particularly real-time analytics, social media intelligence, and mobility.

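Examples of Big Data Use: Criminal Investigation

Pulled a cord from an outlet in a 20-something woman’s home to rape her, and then…
[JoongAng Ilbo Online] Posted 2013.02.17 01:34 / Updated 2013.02.17 10:01

Rapist caught after 10 years … How far has forensic science come?

Police forensic technicians practice spraying luminol at a mock bathroom murder scene and collecting the suspect’s bloodstains and fingerprints. Photo by Cho Mun-gyu

“JoongAng Sunday, the newspaper for opinion leaders”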

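Serious crimes that nearly went unsolved are being cracked one after another thanks to DNA information sharing between prosecutors and police.

The Gwangjin Police Station in Seoul on the 13th arrested Mr. Song (44), who in April 2003 raped two women in their 20s in a residential area of Hwayang-dong, Seoul, and fled with their jewelry and other valuables. The arrest came just 66 days before the 10-year statute of limitations was due to expire.

Right after the crime, police sent the suspect’s DNA to the National Forensic Service (NFS), but there was no matching record and the trail went cold. Later, however, the offender served a prison sentence for drug use, and after his release his DNA record remained with the prosecution; comparing it with the crime-scene sample identified him as the perpetrator of the decade-old case.

Prosecutors and police are expanding DNA information sharing in this way to solve serious cold cases and are making every effort to strengthen their forensic capabilities, for example by developing methods for analyzing partial fingerprints, but many hurdles remain. Chief among them is unifying the management of DNA analysis data, which the prosecution and the NFS currently hold separately.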

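Seo Jin-hwan, perpetrator of last year’s murder of a housewife in Junggok-dong, Seoul, during an on-site reenactment of the crime. [Newsis]

Cross-searching prosecution and police records solves cold cases

Seo Jin-hwan (43), who killed a housewife in her 30s in Junggok-dong, Seoul, on August 20 last year after failing to rape her, was a convicted sex offender. Just 13 days earlier he had committed another rape in nearby Myeonmok-dong. While investigating the Myeonmok-dong case, police obtained the perpetrator’s DNA from the victim’s body and sent it to the NFS for analysis. But the NFS DNA database had no record of him. The prosecution, by contrast, did have his DNA profile: it had collected the sample while he was imprisoned for another rape, but because the information was not shared with police, the chance to catch him before the murder was lost.

Sex crimes are by nature often repeated. So when a rape occurs, checking offenders with prior convictions for the same type of crime is a basic investigative step. Yet because prosecutors and police managed sex offenders’ DNA information separately, an innocent victim lost her life.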

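Keeping DNA analysis data in separate places has caused other harm as well.

Last October, the Gangseo Police Station in Seoul arrested Mr. A (57) on charges of raping a woman in August 2005. The DNA that police had collected from the crime scene had been stored at the NFS.

Mr. A was arrested by prosecutors for theft in July 2008 and released from Yeoju Prison in July 2011. Until then his rape had not come to light, but when the DNA collected at his release was registered with the prosecution, the sex crime from seven years earlier was uncovered. By then, however, Mr. A was already free, and it took another year and three months to catch him again.

When the DNA Act (the Act on the Use and Protection of DNA Identification Information) took effect in July 2010, prosecutors and police gained the authority to collect and store DNA from people who commit 11 categories of serious crimes, including murder, robbery and rape. Before the law took effect, DNA was collected and managed from people involved in or suspected of specific cases, but it was not built into a database and managed systematically. Since the DNA Act, prosecutors and police have each collected DNA separately: inmates’ DNA is held by the Supreme Prosecutors’ Office, and DNA that police collect at crime scenes is held by the NFS. A prosecution official explained, “Prosecutors and police each hold plenty of DNA records on suspects. Since police and prosecutors recently became able to cross-search each other’s DNA records, this has helped solve long-unsolved cases such as sexual assaults.”

The NFS, however, sees it differently. “Since the Seo Jin-hwan case, prosecutors and police have been working to exchange DNA information, but there is no denying that efficiency suffers considerably because each side stores and manages its own data,” it said.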
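The cross-search the prosecution official describes can be pictured as a join between two separately held databases. The Python sketch below is a deliberate simplification: it treats a DNA profile as a set of STR loci with allele pairs and reports only exact matches at every locus, whereas real forensic databases handle partial profiles and use many more loci. The record IDs and allele values are made up.

```python
def profile_key(profile):
    """Canonical, hashable form of an STR profile: locus -> unordered allele pair."""
    return frozenset((locus, tuple(sorted(alleles))) for locus, alleles in profile.items())

def cross_search(inmate_db, scene_db):
    """Return (inmate_id, case_id) pairs whose profiles match exactly at every locus.

    inmate_db -- hypothetical prosecution-held records: {inmate_id: profile}
    scene_db  -- hypothetical forensic-lab records: {case_id: profile}
    """
    # Index one database by profile so each lookup from the other side is O(1).
    index = {}
    for inmate_id, profile in inmate_db.items():
        index.setdefault(profile_key(profile), []).append(inmate_id)
    hits = []
    for case_id, profile in scene_db.items():
        for inmate_id in index.get(profile_key(profile), []):
            hits.append((inmate_id, case_id))
    return sorted(hits)
```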

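NFS correctly identifies a murder suspect’s nationality

In January, lawmakers Park Young-sun (Democratic United Party) and Kim Hee-jung (Saenuri Party) each introduced amendments to the DNA Act, aiming to eliminate the inefficiencies that stem from the current dual management system. Rep. Park argued, “When the Ministry of Justice enacted the DNA Act in July 2010, administration of DNA identification information, previously handled by the NFS, was split between the prosecution and the NFS. As a result, efficiency fell and costs rose, wasting budget,” adding that “the related work should be unified under the NFS, the forensic research institution.” She also noted that “because DNA identification information is not managed systematically and uniformly, there is considerable potential for human rights violations.”

Rep. Kim’s amendment differs somewhat. “Currently, police manage suspects’ DNA obtained from crime scenes, and the prosecution keeps the information collected from inmates,” she said. “As a result, there have been many cases where strong suspects were missed or caught late. For efficient and rapid arrests, the system should be improved so that DNA information is linked and operated jointly between prosecution and police.” In other words, prosecutors and police would continue to manage suspect DNA and inmate DNA separately, but the data could be linked.

NFS staff stress the need for unified management, but they are very cautious about it and visibly mindful of how the prosecution and police will react: their work depends heavily on the prosecution’s cooperation, and the police provide budget support. But there is something Korea’s NFS is proud of: it is very fast. Whether the evidence is blood or semen, once it arrives it is reportedly classified and entered into the database at lightning speed. One senior official said, “Our speed comes from the training we’ve gained through countless cases.” Its analytical work has also earned a worldwide reputation. The NFS won international recognition for clearly solving the 2006 case of French infants abandoned in Seorae Village, Banpo-dong, Seoul, and it has carried on that tradition since.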

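Recently, for example, in a murder case involving Southeast Asian nationals, it correctly identified the suspect’s nationality from DNA data alone, playing a decisive role in solving the case. Its technique for estimating a suspect’s age from a bloodstain alone is also regarded as among the best in the world. These achievements are the result of diversifying its analytical methods.

Through genetic analysis it can reportedly determine not only a suspect’s ethnicity or skin color but also identify individual plants and animals precisely. Where past analysis could only say “pine tree,” advances in analytical technology now let it pinpoint exactly which kind of pine, and not just “a cat” but what type of cat.

There is one area, however, in which the NFS treads carefully. It can almost determine a suspect’s surname from DNA, but given the social climate it does not use this ability openly. Examining genes on the paternally inherited Y chromosome reportedly reveals a distinctive pattern for each surname. “This can be analyzed with accumulated data, and there are almost no errors. But our society has so many taboos and such strong prejudices that we are cautious,” one NFS official said.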

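Cold cases with only partial fingerprints: 138 solved in three years

The Automated Fingerprint Identification System (AFIS), used to enter, store and search fingerprints, can confirm an identity from a print within 10 minutes. Korea’s AFIS holds fingerprint records for about 46 million people.
One evening in May 2005, outside the home of Ms. A (26) in Gongneung-dong, Seoul, a man suddenly lunged at her as she returned home. He threatened her, dragged her inside, raped her, and fled with her cash. All he left behind were three partial fingerprints (prints of which only part survives), none bigger than a fingernail, on the electrical cord he had pulled from an outlet to tie her up. Police could not identify him, and the case went cold. Seven years later, in the middle of last year, Mr. Koo (33), who had eight prior convictions, emerged as a suspect: an upgraded AFIS had identified the owner of the three partial prints that could not be matched at the time. Confirmed as the perpetrator, Koo was arrested.

Offenders who left only partial prints and sent cases into limbo are being caught one after another as fingerprint analysis technology improves. Using the upgraded system, police solved a total of 138 cold cases from 2010 through last year, including five murders.

Fingerprint identification was first introduced in Korea in 1911 by the Japanese Government-General of Korea. In its early days, however, it was not widely used, because collected prints had to be checked one by one by hand. Fingerprint investigation took off after AFIS was introduced in 1990. AFIS is a system for entering, storing and searching fingerprints, and it can confirm an offender’s identity within 10 minutes. As the fingerprint database was built and used more widely after AFIS arrived, fingerprint identifications rose from just 6 in 1985 to about 4,200 last year. In addition, from 2007 to 2010 the Korean National Police Agency re-entered fingerprints into AFIS to improve their clarity and upgraded the search system. This improved search accuracy by using the area of the shapes formed by connecting the minutiae (feature points) of a print with lines, as well as the angle, position and direction of the ridges (the lines of a fingerprint). AFIS contains fingerprint records for 46 million people, including foreigners and Korean adults (some offenders are entered more than once).

Fingerprint collection technology has also improved. Forensic identification, which began in 1948 as the identification section of the police department and continued as an identification unit and then an identification section within the National Police Agency, became the subject of systematic research after it was expanded and reorganized into the Forensic Science Division in 1999. Previously, the only collection tools were powders such as graphite and white light sources. Now compressed powders, harmless to humans and adhesive enough to lift prints even from rough surfaces, are standard. Light sources have also diversified, so that different wavelengths can be used depending on the specimen (the object bearing the print). Today it is even possible to lift prints from mummified, desiccated bodies and from waterlogged drowning victims.

Since the beginning of this year, police have also been collecting and using palm prints (掌紋), not just fingerprints, during investigations. Using the palm’s creases as reference points, they determine which part of the palm a print came from and compare that area with the suspect’s prints. Lee Hee-il, director of the International Forensic Science Institute, said, “Because Korea has a fingerprint database, its fingerprint identification technology is the best in the world,” adding, “Since DNA analysis is costly and time-consuming, fingerprint identification is the beginning and end of forensic investigation.”

Jang Cheol-hwan, a forensic examiner at the Korean National Police Agency, said, “It currently plays a supporting role, but its importance will grow,” adding, “We need more budget and staff, and related research must continue.”

Reporter Noh Jin-ho
[Source] Examples of Big Data Use: Criminal Investigation | Posted by 카몰리노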

Predicting Crime with Big Data

Thursday, March 7, 2013 at 12:05PM

Dr. Jennifer Bachner is Program Coordinator for the MA in Government program. Her work on predictive policing is supported by the IBM Center for the Business of Government.

The rise of big data is shifting decision-making practices in all sectors of society. Journalists like David Brooks observe that “data-ism” is the “rising philosophy of the day” and businesses like McKinsey and IBM have focused their efforts on developing products and services that harness the power of big data. The big data revolution means that organizations of all types will need to collect, clean, analyze and act upon increasingly massive amounts of quantitative information to remain competitive.

Among those on the frontier of this paradigm shift are law enforcement agencies. Through what is often referred to as “predictive policing,” police departments are experiencing unprecedented success using data and analytics. Intelligence-led policing (e.g., the adoption of CompStat) emerged in the 1990s and greatly improved accountability by tracking information such as crime and arrest rates. Predictive policing builds upon this foundation. By examining patterns in past crime data, in conjunction with environmental characteristics, analysts can generate remarkably accurate forecasts of where crime is likely to occur. Officers are then deployed according to these forecasts.

The Santa Cruz Police Department, for example, partnered with social scientists at UCLA and Santa Clara University to develop software that assigns the probability of crime occurring to 150 by 150 meter cells on a map. Prior to their shifts, officers are notified of the 15 cells with the highest probabilities, and during their shifts, they can log into a web-based system to access updated, real-time probabilities. While officers are encouraged to view the maps as one of many tools in their crime prevention kits, many of those who integrate the information into their on-the-ground decision making have experienced marked declines in crime rates on their beats.
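A minimal sketch of the cell-ranking step, assuming incident locations are already in projected coordinates measured in meters: the Python below assigns incidents to 150-by-150-meter cells and returns the 15 cells with the highest share of recent incidents. The simple frequency estimate is a stand-in; the actual Santa Cruz model is not described here.

```python
import heapq
from collections import Counter

CELL_M = 150  # cell edge length from the Santa Cruz description

def cell_of(x_m, y_m, size=CELL_M):
    """Map projected coordinates (metres) to a (column, row) grid cell."""
    return (int(x_m // size), int(y_m // size))

def top_cells(incidents, k=15, size=CELL_M):
    """Rank cells by their share of recent incidents and return the k highest.

    incidents -- iterable of (x_m, y_m) points (hypothetical input format)
    Returns a list of (probability, cell) pairs; a naive frequency estimate
    stands in for a real forecasting model.
    """
    counts = Counter(cell_of(x, y, size) for x, y in incidents)
    total = sum(counts.values())
    if total == 0:
        return []
    return heapq.nlargest(k, ((n / total, cell) for cell, n in counts.items()))
```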

The applications of predictive policing extend well beyond mapping locations with increased likelihoods of crime (“hot spots”). The Baltimore Police Department, for example, has used predictive methods to inform its offender interdiction tactics. In a serial robbery case, analysts can use these methods to pinpoint the suspect’s likely location. To do so, they first use an iterative algorithm to calculate the center of minimum distance (CMD) between the crime scenes, which is assumed to be the offender’s residence. An analysis of all possible routes from the CMD to the crime scenes and back frequently reveals a limited number of streets and times the offender uses. Police can then conduct an efficient stakeout and apprehend the suspect.
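The center of minimum distance is the point that minimizes the total distance to all the crime scenes (the geometric median). The text does not say which iterative algorithm Baltimore used; Weiszfeld’s algorithm is the classic method for this problem, and the Python sketch below assumes it, with coordinates already projected to meters. The route analysis that follows is not shown.

```python
import math

def center_of_minimum_distance(points, tol=1e-6, max_iter=1000):
    """Geometric median of crime-scene coordinates via Weiszfeld's iteration.

    points -- list of (x, y) in a projected coordinate system (metres)
    Returns the point that (approximately) minimizes the sum of straight-line
    distances to all scenes.
    """
    # Start from the centroid (mean centre).
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                # Estimate sits on a scene; skip it (a common simplification of Weiszfeld).
                continue
            num_x += px / d  # each scene pulls with weight 1 / distance
            num_y += py / d
            denom += 1 / d
        if denom == 0:
            break
        nx, ny = num_x / denom, num_y / denom
        if math.hypot(nx - x, ny - y) < tol:
            return nx, ny
        x, y = nx, ny
    return x, y
```

Unlike the simple average of the scene locations, the geometric median is not dragged far off by a single distant crime, which suits the assumption that the offender operates around a home base.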

Social network analysis has proven to be another effective prediction tool. Analysts with the Richmond Police Department recently used this type of analysis to identify central (mathematically speaking) members in a homicide suspect’s social network. Police had been searching for the suspect for over a month. A few days after police notified key members in the network of their search, the suspect turned himself in. The police had successfully shut down the suspect’s social resources and, with no safe haven, he submitted himself to the authorities.
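The article does not say which centrality measure Richmond’s analysts used. Betweenness centrality, which counts how many shortest paths between other people pass through a given person, is one common way to find the brokers in a network, and the Python sketch below computes it with Brandes’ algorithm on a small hypothetical network.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for unweighted, undirected graphs.

    graph -- dict mapping each person to a set of associates (hypothetical data);
             every associate must also appear as a key.
    Returns scores where higher means more shortest paths pass through that person.
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}
```

For example, in two small friend groups joined only through one person, that person gets the highest score, which mirrors the idea of identifying the few people whose cooperation cuts a suspect off from support.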

More and more police departments across the country are implementing predictive policing programs as the technology and training become more accessible. Further, the increasing computing power and data storage capacities available to police departments allow analysts to integrate more information into predictive analyses. Over the past few decades, criminologists have identified numerous characteristics associated with heightened criminal activity, including the availability of escape routes (e.g. highways and bridges), presence of adult retail establishments, weather patterns, payday schedules, times of day, days of the week and even moon phases. Through collaborative efforts, social scientists, crime analysts and police officers are discovering new ways to leverage this information and translate it into actionable recommendations that prevent crime.

We can also expect predictive policing to improve as information sharing becomes easier. Serial criminals often cross jurisdictional boundaries. This presents a problem for crime analysts, as the accuracy of predictions is positively correlated with information completeness. Recognizing this challenge, the federal government has supported the development of the Law Enforcement Information Exchange (LInX), which serves as a data warehouse for all participating police agencies. Individual agencies populate the database with their crime data, which can then be accessed by other agencies. Other information-sharing systems, such as the Digital Information Gateway (DIG), are likewise making data analysis, visualization and interpretation easier and more accurate.

Law enforcement agencies are certainly not the only organizations benefiting from predictive analytics. Retail companies, intelligence agencies, financial institutions and marketing firms are just a few of the organizations using predictive methods and big data to improve their efficiency and success rates. And this trend is likely to continue. IBM estimates that “90% of the data in the world today has been created in the last two years.” This is a great time for undergraduate and graduate students to focus their academic careers in the field of data and analytics.