
Apache Tika server using Golang

Apache Tika?




Apache Tika is a toolkit/library used to detect metadata and extract content from many types of files (such as .txt, .docx, .pdf and .ppt).
Tika parses all of these file types through a single interface, which makes it useful for search engine indexing, content analysis and much more.

Configuration

To set up a Tika server on your local machine, go to the Apache Tika download page.

On the download page, use the third link, "Mirrors for tika-server-1.25.jar", to download the server jar.
Once the download completes, open a command prompt in the folder that contains the downloaded file and run the command below to start the Tika server.

java -jar tika-server-1.25.jar

The Tika server is now running in your local environment, and it reports the URL at which it can be reached.

http://localhost:9998/
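
Before writing the Go client, it can help to confirm that the server really answers at that URL. The snippet below is a minimal sketch using only the standard library. It assumes the server exposes a "version" resource under the base URL (Tika server does this in the releases I know of, but treat it as an assumption for your version), and the helper names versionURL and checkServer are made up for this example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// versionURL appends the "version" path to the server base URL,
// tolerating a trailing slash on the base.
func versionURL(base string) string {
	return strings.TrimRight(base, "/") + "/version"
}

// checkServer asks the server for its version string.
// It returns an error if the server is unreachable or replies with a non-200 status.
func checkServer(base string) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}

	resp, err := client.Get(versionURL(base))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}

	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	// Assumed address: the default one shown above.
	version, err := checkServer("http://localhost:9998/")
	if err != nil {
		fmt.Fprintln(os.Stderr, "tika server not reachable:", err)
		os.Exit(1)
	}
	fmt.Println("Tika server version:", version)
}
```

If this prints an error, check that the java -jar command is still running in its command prompt.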

Once the Tika server is running, go to your GOPATH directory and create a file with a .go extension. You can give it any name you like; I'm using "tikaExample.go".

Run the command below to get the tika package.

go get -u github.com/google/go-tika/tika

tikaExample.go

package main

import (
    "context"
    "fmt"
    "os"

    "github.com/google/go-tika/tika"
)

func main() {
    fmt.Println("Main function started")

    // Get the file and open it
    file, err := os.Open("<file_path>/<file_name>.txt")
    if err != nil {
        fmt.Println(err)
        return
    }

    // Close the file when main returns
    defer file.Close()

    // Create a client for the Tika server
    client := tika.NewClient(nil, "http://localhost:9998/")

    // Send the file to the server and read back the extracted content
    body, err := client.Parse(context.Background(), file)
    if err != nil {
        fmt.Println(err)
        return
    }

    fmt.Println("Extracted content =", body)
}

Explanation

1. First we import the required packages; then, in the main function, we open the file with Open() from the "os" package.
2. Next, we create a client for the Tika server using the server's URL.
3. Finally, we call the Parse method of the tika client with a context and the file. It returns the content of the given file, which we print with Println().
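
To make the last step less of a black box, here is a rough sketch of the kind of HTTP request that sits behind a parse call, written with the standard library only. It is not the go-tika implementation. It assumes the server accepts a PUT of the document body on its "tika" resource and returns plain text when asked with an Accept header. The helper names newParseRequest and extractText are invented for this example, and the file path is taken from the command line.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// newParseRequest builds a PUT request to the server's "tika" resource
// with the document as the request body.
func newParseRequest(base string, doc io.Reader) (*http.Request, error) {
	url := strings.TrimRight(base, "/") + "/tika"
	req, err := http.NewRequest(http.MethodPut, url, doc)
	if err != nil {
		return nil, err
	}
	// Assumption: the server honours this header and replies with plain text.
	req.Header.Set("Accept", "text/plain")
	return req, nil
}

// extractText opens the file at path, sends it to the server and
// returns the response body as a string.
func extractText(base, path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	req, err := newParseRequest(base, f)
	if err != nil {
		return "", err
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}

	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: go run . <path-to-file>")
		os.Exit(2)
	}

	text, err := extractText("http://localhost:9998/", os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "extraction failed:", err)
		os.Exit(1)
	}
	fmt.Println("Extracted content =", text)
}
```

In practice the tika package is the more convenient choice, since it wraps this request and the server's other resources behind one client.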


